Kyutai's New "VOICE AI" SHOCKS The ENTIRE INDUSTRY! (Beats GPT4o!)

57,038
Published 2024-07-03
Learn A.I With me - www.skool.com/postagiprepardness
๐Ÿค Follow Me on Twitter twitter.com/TheAiGrid
๐ŸŒ Checkout My website - theaigrid.com/


Links From Today's Video:
x.com/kyutai_labs/status/1808557953957703722
x.com/kyutai_labs/status/1808526962941366415
kyutai.org/

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries) [email protected]

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

All Comments (21)
  • @Silas2-p7c
    This is a GPT2 moment. It’s only a matter of time before voice models become the new LLMs.
  • @EDLR234
    To the complainers. It's all in the context. It's a tiny quantized model, open source, and made by a small independent team of just 8 people, from scratch, in 6 months! It's so small their aim is that it can run locally on device, and it's actually a true multi-modal model. It's like having a conversation in real time even if it's still very janky and awkward at this point. With this context, it's astounding and the experience is like nothing I've experienced in AI so far. There is no distance from speaker, it's like it's right there listening and responding without any barrier.
  • @shidheadmemes
    im fucking SHOCKED, my legs are SHAKING, this QUITE LITERALLY BLEW MY MIND, my grandmother STOOD UP from her GRAVE because she was so SHOCKED
  • @MojaveHigh
    It starts off well, but at least for me, after about a minute, its functionality drops significantly, it starts repeating itself and just not understanding anything anymore.
  • @donaldclark1019
    I demoed today and tried to ask more about the Matrix. Apparently Neo was a rebel pilot who teamed up with a hacker to fight an AI controlled by an evil corporation. Its voice options were limited and seemed to always hallucinate a response and then say "sorry im here to help"
  • @jwetzel3141
    Today I learned that pirates have an American accent.
  • @reezlaw
    It's hit and miss but when it works it's unbelievable. The response time is superhuman and when you get good relevant replies in less than 200ms you really get a glimpse of the future. Of course way more often it goes nuts, starts repeating itself, loops and stops listening, but if this is the beginning and they keep training this has huge potential IMO
  • @andrewai2001
    Um im stunned they thought this was ready for demo
  • @MarshalArnold
    Subtitles: the first movie was called Matrix released in 1990 😂😂
  • @ppowell1212
    I think that two-way conversations are going to be the way forward.
  • @jimlynch9390
    Yes, the latency is impressive. The responses aren't quite as good as Pi for instance. Or the yet to be released gpt assistant. Like lots of things AI, it's only going to get better.
  • @vihangnair
    🎯 Key points for quick navigation:
    00:05 🎭 The voice AI can express over 70 emotions and speaking styles, including whispering, singing, and accents.
    00:27 🤯 The AI model revealed by Kyutai is state-of-the-art and shocked the industry with its real-time conversation capabilities.
    00:54 🗣️ Moshi, the voice AI, can respond with lifelike emotions and incredible speed.
    01:06 🇫🇷 Moshi demonstrates speaking with a French accent by reciting a poem about Paris.
    01:47 🏴‍☠️ Moshi switches to a pirate voice and discusses pirate life.
    02:56 🕵️ Moshi uses a whispering voice to tell a mystery story.
    03:22 🎬 Moshi narrates the plot of "The Matrix" with detailed accuracy.
    03:54 ⚠️ Discussion on the current limitations of voice AI, including latency and loss of non-textual information.
    05:02 🔄 Explanation of the new approach to integrate complex pipelines into a single deep neural network.
    07:16 🎤 Demonstration of Moshi understanding and generating speech by listening to a voice snippet.
    08:13 💡 Moshi thinks as it speaks, generating both text and audio simultaneously for richer interactions.
    09:12 🔊 Moshi supports dual audio streams, allowing it to speak and listen simultaneously for more natural conversations.
    10:20 📞 Example of Moshi's conversational capabilities using historical data sets.
    12:23 😮 Moshi can express over 70 different emotions and speaking styles using a text-to-speech engine.
    15:59 📱 Moshi can run on-device, ensuring privacy and security by eliminating the need for cloud processing.
    18:36 Measures are in place to detect and watermark audio generated by Moshi for safety and authenticity.
    20:11 Demonstration of Moshi's real-time conversational capabilities, showing quick responses and lifelike interaction.
    23:34 🚀 Moshi represents a revolutionary advancement in AI, promising significant changes in AI-human interactions.
  • @anta-zj3bw
    I bet you that conversation at the end got you really, REALLY excited.
  • @Tilofus
    Can't wait to try any of the State-of-the-Art Voice Models
  • @Yogsoggeth
    Gee thanks for the huge subtitles right in the middle of the screen where a video should have been. Hot tip, CC is optional on the site you don't need to force feed me your script, because my ears work fine thanks. And if they didn't I would turn the CC on if I wanted it.
  • @jeffkilgore6320
    Each day a yesteryear Nobel Prize is won. The word “shocked” has become a self mockery that reminds us that while we should be shocked, somehow, we’re not.
  • @dreamyxqc3812
    open ai will still be releasing gpt 4o in the next coming weeks ( infinity )