Deep-dive into the AI Hardware of ChatGPT

Published 2023-02-20
With our special offer you can get 2 years of NordPass with 1 month free for a personal account: www.nordpass.com/highyieldnordpass
Or use code highyieldnordpass at checkout.

Business accounts (must register with a business domain) can get a free 3-month trial of NordPass: www.nordpass.com/highyieldbusiness with code highyieldbusiness in the form.

What hardware was used to train ChatGPT and what does it take to keep it running? In this video we will take a look at the AI hardware behind ChatGPT and figure out how Microsoft & OpenAI use machine learning and Nvidia GPUs to create advanced neural networks.

Support me on Patreon: www.patreon.com/user?u=46978634
Follow me on Twitter: twitter.com/highyieldYT

Links
The Google research paper that changed everything: arxiv.org/abs/1706.03762
The OpenAI research paper confirming Nvidia V100 GPUs: arxiv.org/abs/2005.14165

0:00 Intro
0:28 AI Training & Inference
2:22 Microsoft & OpenAI Supercomputer
4:08 NordPass
5:57 Nvidia Volta & GPT-3
9:05 Nvidia Ampere & ChatGPT
13:23 GPT-3 & ChatGPT Training Hardware
14:41 Cost of running ChatGPT / Inference Hardware
16:06 Nvidia Hopper / Next-gen AI Hardware
17:58 How Hardware dictates the Future of AI

All Comments (21)
  • @wiredmind
    I think in a few years, AI accelerator cards will be the next video cards. A race to the top for the most powerful accelerator to be able to train and run AI locally on our own PCs, bypassing the need to pay for filtered models from large companies. Once people can run this kind of thing independently, that's when things will start getting really exciting.
  • @matthewhayes7671
    I'm a newer subscriber, working my way back through your recent videos. I just want to tell you that I think this is the best tech channel on YouTube right now, hands down. You are a wonderful teacher, you take frequent breaks to summarize key points, you provide ample context and visual aids, and when you do make personal guesses or offer your own opinions, it's always done in a transparent and logical manner. Thank you so much, and keep up the amazing work. I'll be here for every upload going forward.
  • @garyb7193
Great video! Hopefully it will put things into perspective: Nvidia, Intel, and AMD's world does not revolve around graphics card sales and squeezing the most performance out of Cyberpunk. Hundreds of millions of dollars are at stake in areas much more lucrative than $500 CPUs or $800 video cards. They must meet the demands of all their various customers, as well as investors and stockholders too. Thanks!
  • @guyharris8007
Dev here... gotta say I love it. Thoroughly enjoyable, thank you for your time!
  • I find the increase in compute performance incredible. As shown at 7:38, the GV100 from the beginning of 2018 had 125 TFLOPS of FP16 Tensor Core compute performance. The current generation of enterprise AI accelerators from NVIDIA is the H100, which provides up to 1,979 TFLOPS of FP16 Tensor Core compute performance. The FP32 and FP64 Tensor Core performance has obviously also increased massively. Within ~5 years, the raw compute performance of Tensor Cores has increased by around 16x. What previously required 10,000 GPUs could now be done with ~632. (A quick sanity check of this arithmetic is sketched after the comments.)
  • @Fractal_32
    I just saw your post on the community page, I wish YouTube would have notified me when the video was posted instead of pushing the post without the video. I cannot wait to see what is talked about in the video! Edit: This was great, I’m definitely sharing it with some friends, keep up the great work!
  • Great video! I'd like to add one thing: in the segment starting at 16:06 where you talk about the Nvidia Hopper H100, in the context of neural networks the most important number to compare against the previous A100 is memory bandwidth. As far as I know, as long as there is some kind of matrix multiplication acceleration, it doesn't matter much how fast it is; memory bandwidth becomes the major bottleneck again. I looked it up and found the figure of 3 TB/s, which would be 50% higher than the A100 80GB version. I wonder where the 4.9 TB/s shown in the video at 18:50 comes from; it seems unrealistically high to me. Nvidia's marketing does not like to admit this bottleneck; they prefer to compare other numbers, where they can claim some 10x or 20x or 30x improvement. (A back-of-the-envelope sketch of why bandwidth dominates follows after the comments.)
  • I have been an OpenAI beta member for 3 years now. It has only become better over the years. I wonder what it will look like in 5 years.
  • @marka5968
Great and very informative video, sir. I remember watching your very early videos and thought they were a bit meh. But this is absolutely world-class stuff, and I'm happy to listen to such a sharp and insightful mind.
  • @SteveAbrahall
Thanks for the tech background on what it's running on from a hardware angle. The interesting thing, I think, is when someone comes up with a hunk of code that saves billions of hours of computational power; that disruptive type of thing from a software angle. It is an amazing time to live. Thanks for all your hard work, and an interesting vid!
  • @KiraSlith
    Been beating my head against the Cost vs Scale issue of building my own AI compute rig for training models at home, and this gave me a better idea of what kind of hardware I'll need long-term by looking at what the current bleeding edge looks like. Thanks for doing all the research work on this one!
  • @og_jakey
    Fantastic presentation. Appreciate your pragmatic and reasonable research, impressive work. Thank you!
  • MS will for sure use its Azure-based cloud system for hosting ChatGPT, so that they can load-balance the demand, scale out to more VMs and instances if needed to meet demand, and increase resources on any individual instance if needed. That would be the best use of that setup and provide the best user experience. So basically, the hardware specifics would be whatever servers are running in the 'farms'. I doubt they will have separate, specific hardware set aside just for ChatGPT, as it would run like any other service out there.
  • @novadea1643
    Logically, the inference costs should scale pretty linearly with the number of users, since each query is pretty much a fixed amount of computation and data transfer. Can you elaborate on why the requirements would scale exponentially, as you state at 15:40? (A toy cost model illustrating the linear case is sketched after the comments.)
  • @backToFreedom
    Thank you very much for bringing this kind of information. Even ChatGPT is unaware of the hardware it is running on!
  • @theminer49erz
    Fantastic!! First of all, I could be wrong, but I don't remember you having an in-video sponsor before. Either way, that is awesome!! I'm glad you are getting the recognition you deserve!! You must have done a lot of work to get these numbers and configurations. Very interesting stuff! I am looking forward to AI splitting off from GPUs too, especially with the demand for them going up as investment in AI grows. I, as I'm sure many others are as well, am kinda sick of having to pay, or consider paying, a lot more for a gaming GPU because the higher demand is in non-gaming sectors that are saturated with capital to spend on them. Plus I'm sure dedicated designs will do a much better job. The design direction, at least in regards to Nvidia, is quite annoying because of it too. Tensor cores, for example, were mainly put there for AI and mining use; the marketing of them for upscaling, and the cost added for a gamer to use it, is kinda ridiculous. If you have a lower-end card with them where you would benefit from the upscaling, you could probably buy a card without them that wouldn't need to upscale. It seems to me that their existence is almost the cause of their need in that use case. I don't know how much of the cost of the card is just for them, but I imagine it's probably around 20-30% maybe?? IDK, just thinking "aloud". Anyway, thanks again for the hard work and please let us know when you get a Patreon account!! I would be proud to sponsor you as well!! Cheers!!
  • @EmaManfred
    Good job here, sir! Would you mind doing a quick breakdown of a language model like Bluewillow that also utilizes diffusion?
  • @frizzel4
    Congrats on the sponsorship!! Been watching since you had 1k subs
  • @AgentSmith911
    10:14 is so funny because "what hardware are you running on?" is one of the first questions I asked that bot 😀
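
A quick sanity check of the arithmetic in the compute-performance comment above, as a minimal Python sketch. The TFLOPS figures are the ones quoted in that comment (the H100 number appears to be Nvidia's with-sparsity figure), and the 10,000-GPU cluster size is the one reported for the Microsoft/OpenAI supercomputer:

```python
# Reader's sanity check: V100 -> H100 FP16 Tensor Core throughput.
V100_FP16_TFLOPS = 125      # GV100, early 2018 (figure quoted at 7:38)
H100_FP16_TFLOPS = 1979     # H100 FP16 Tensor Core (sparsity figure)
V100_CLUSTER_SIZE = 10_000  # reported Microsoft/OpenAI training cluster

speedup = H100_FP16_TFLOPS / V100_FP16_TFLOPS    # ~15.8x, i.e. roughly 16x
equivalent_h100s = V100_CLUSTER_SIZE / speedup   # ~632 GPUs

print(f"Raw speedup: {speedup:.1f}x")
print(f"H100s matching 10,000 V100s: {equivalent_h100s:.0f}")
```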
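
A back-of-the-envelope illustration of the memory-bandwidth comment above: for batch-1 autoregressive inference, every generated token has to stream the model weights from memory once, so bandwidth, not matrix-multiply speed, sets the floor on latency. The 175B parameter count is GPT-3's published size; FP16 storage and the bandwidth figures are assumptions based on public datasheets:

```python
# Toy model: token latency when inference is memory-bandwidth-bound.
PARAMS = 175e9          # GPT-3 parameter count (published)
BYTES_PER_PARAM = 2     # FP16 weights (assumed)
weights_bytes = PARAMS * BYTES_PER_PARAM   # ~350 GB in total

# Single-accelerator HBM bandwidth; in practice the weights are sharded
# across N GPUs, which multiplies the effective bandwidth by roughly N.
for name, tb_per_s in [("A100 80GB", 2.0), ("H100 (3 TB/s as quoted)", 3.0)]:
    ms_per_token = weights_bytes / (tb_per_s * 1e12) * 1e3
    print(f"{name}: ~{ms_per_token:.0f} ms per token per GPU-equivalent")
```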
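
And the inference-scaling question from @novadea1643 above, as a toy cost model: if each query costs a roughly fixed amount of compute, total inference cost grows linearly with query volume. All numbers here are invented for illustration, not actual OpenAI figures:

```python
# Toy model: a fixed cost per query implies cost linear in users.
def daily_inference_cost(users, queries_per_user=10, cost_per_query=0.005):
    """Hypothetical per-query cost in dollars; returns dollars per day."""
    return users * queries_per_user * cost_per_query

for users in (1e6, 10e6, 100e6):
    print(f"{users:>12,.0f} users -> ${daily_inference_cost(users):>10,.0f}/day")
```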