Kyutai's New "VOICE AI" SHOCKS The ENTIRE INDUSTRY! (Beats GPT4o!)

57,038 views · 1,388 likes

Published 2024-07-03

Learn AI with me - www.skool.com/postagiprepardness
🐤 Follow Me on Twitter twitter.com/TheAiGrid
🌐 Check out my website - theaigrid.com/

Links From Today's Video:
x.com/kyutai_labs/status/1808557953957703722
x.com/kyutai_labs/status/1808526962941366415
kyutai.org/

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries) [email protected]

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

All Comments (21)

@Silas2-p7c 14 days ago

This is a GPT-2 moment. It’s only a matter of time before voice models become the new LLMs.
@JohnSmith-gt3be 14 days ago

More pressure for OpenAI to release GPT-4o voice. Good.
@EDLR234 14 days ago

To the complainers: it's all in the context. It's a tiny quantized model, open source, and made by a small independent team of just 8 people, from scratch, in 6 months! It's so small that their aim is for it to run locally on device, and it's actually a true multimodal model. It's like having a conversation in real time, even if it's still very janky and awkward at this point. With this context, it's astounding, and the experience is like nothing I've had in AI so far. There is no distance from the speaker; it's like it's right there listening and responding without any barrier.
@shidheadmemes 14 days ago

im fucking SHOCKED, my legs are SHAKING, this QUITE LITERALLY BLEW MY MIND, my grandmother STOOD UP from her GRAVE because she was so SHOCKED
@strangereyes9594 14 days ago

The subtitles are hilarious.
@MojaveHigh 14 days ago

It starts off well, but at least for me, after about a minute, its functionality drops significantly, it starts repeating itself and just not understanding anything anymore.
@donaldclark1019 14 days ago

I demoed today and tried to ask more about the Matrix. Apparently Neo was a rebel pilot who teamed up with a hacker to fight an AI controlled by an evil corporation. Its voice options were limited and seemed to always hallucinate a response and then say "sorry im here to help"
@jwetzel3141 14 days ago

Today I learned that pirates have an American accent.
@reezlaw 12 days ago

It's hit and miss but when it works it's unbelievable. The response time is superhuman and when you get good relevant replies in less than 200ms you really get a glimpse of the future. Of course way more often it goes nuts, starts repeating itself, loops and stops listening, but if this is the beginning and they keep training this has huge potential IMO
@ElectricEric2030 14 days ago

*runs in circles while screaming* :face-blue-wide-eyes:
@andrewai2001 13 days ago

Um, I'm stunned they thought this was ready for a demo.
@MarshalArnold 14 days ago

Subtitles: the first movie was called Matrix released in 1990 😂😂
@ppowell1212 14 days ago

I think that two-way conversations are going to be the way forward.
@jimlynch9390 14 days ago

Yes, the latency is impressive. The responses aren't quite as good as Pi's, for instance, or the yet-to-be-released GPT assistant's. Like lots of things in AI, it's only going to get better.
@vihangnair 13 days ago

🎯 Key points for quick navigation:
00:05 🎭 The voice AI can express over 70 emotions and speaking styles, including whispering, singing, and accents.
00:27 🤯 The AI model revealed by Kyutai is state-of-the-art and shocked the industry with its real-time conversation capabilities.
00:54 🗣️ Moshi, the voice AI, can respond with lifelike emotions and incredible speed.
01:06 🇫🇷 Moshi demonstrates speaking with a French accent by reciting a poem about Paris.
01:47 🏴‍☠️ Moshi switches to a pirate voice and discusses pirate life.
02:56 🕵️ Moshi uses a whispering voice to tell a mystery story.
03:22 🎬 Moshi narrates the plot of "The Matrix" with detailed accuracy.
03:54 ⚠️ Discussion of the current limitations of voice AI, including latency and loss of non-textual information.
05:02 🔄 Explanation of the new approach that folds complex pipelines into a single deep neural network.
07:16 🎤 Demonstration of Moshi understanding and generating speech by listening to a voice snippet.
08:13 💡 Moshi thinks as it speaks, generating both text and audio simultaneously for richer interactions.
09:12 🔊 Moshi supports dual audio streams, allowing it to speak and listen simultaneously for more natural conversations.
10:20 📞 Example of Moshi's conversational capabilities using historical data sets.
12:23 😮 Moshi can express over 70 different emotions and speaking styles using a text-to-speech engine.
15:59 📱 Moshi can run on-device, ensuring privacy and security by eliminating the need for cloud processing.
18:36 🔐 Measures are in place to detect and watermark audio generated by Moshi for safety and authenticity.
20:11 🌐 Demonstration of Moshi's real-time conversational capabilities, showing quick responses and lifelike interaction.
23:34 🚀 Moshi represents a revolutionary advancement in AI, promising significant changes in AI-human interactions.
@anta-zj3bw 14 days ago

I bet you that conversation at the end got you really, REALLY excited.
@Tilofus 14 days ago

Can't wait to try any of the State-of-the-Art Voice Models
@Yogsoggeth 14 days ago

Gee, thanks for the huge subtitles right in the middle of the screen where a video should have been. Hot tip: CC is optional on the site, so you don't need to force-feed me your script, because my ears work fine, thanks. And if they didn't, I would turn CC on if I wanted it.
@jeffkilgore6320 13 days ago

Each day a yesteryear Nobel Prize is won. The word “shocked” has become a self-mockery that reminds us that while we should be shocked, somehow, we’re not.
@dreamyxqc3812 14 days ago

OpenAI will still be releasing GPT-4o in the coming weeks (infinity)