QLoRA—How to Fine-tune an LLM on a Single GPU (w/ Python Code)
52,593 views
Published 2024-02-27
In this video, I discuss how to fine-tune an LLM using QLoRA (i.e. Quantized Low-rank Adaptation). Example code is provided for training a custom YouTube comment responder using Mistral-7b-Instruct.
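For context, the "low-rank adaptation" half of the name can be sketched in a few lines of dependency-free Python. This is an illustration only, with toy sizes and names of my own choosing — the real training code (using peft and bitsandbytes) is in the Colab linked below:

```python
# Minimal sketch of the LoRA idea behind QLoRA (illustration only).
# The frozen base weight W stays quantized in QLoRA; only the small
# low-rank factors B and A are trained.

def matmul(X, Y):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, B, A, alpha, r):
    """Effective weight W + (alpha / r) * B @ A, as applied at inference time."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

d, k, r = 4, 4, 1                  # toy sizes; real layers use d, k in the thousands, r ~ 8-64
W = [[1.0] * k for _ in range(d)]  # frozen base weight
B = [[0.0] * r for _ in range(d)]  # B is zero-initialized, so training starts exactly at W
A = [[0.5] * k for _ in range(r)]  # A gets a small random init in practice

W_eff = lora_effective_weight(W, B, A, alpha=16, r=r)
assert W_eff == W                  # zero-init B => no change before any training

full_params = d * k                # parameters in a full update of W
lora_params = d * r + r * k        # parameters in the low-rank factors
print(full_params, lora_params)    # the low-rank update trains far fewer parameters
```

With realistic sizes (say d = k = 4096 and r = 8), the low-rank factors hold about 65K parameters versus roughly 16.8M for the full matrix — the source of LoRA's memory savings.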
More Resources:
👉 Series Playlist: • Large Language Models (LLMs)
🎥 Fine-tuning with OpenAI: • 3 Ways to Make a Custom AI Assistant ...
📰 Read more: medium.com/towards-data-science/qlora-how-to-fine-…
💻 Colab: colab.research.google.com/drive/1AErkPgDderPW0dgE2…
💻 GitHub: github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/ql…
🤗 Model: huggingface.co/shawhin/shawgpt-ft
🤗 Dataset: huggingface.co/datasets/shawhin/shawgpt-youtube-co…
[1] Fine-tuning LLMs: youtu.be/eC6Hd1hFvos
[2] ZeRO paper: arxiv.org/abs/1910.02054
[3] QLoRA paper: arxiv.org/abs/2305.14314
[4] Phi-1 paper: arxiv.org/abs/2306.11644
[5] LoRA paper: arxiv.org/abs/2106.09685
--
Book a call: calendly.com/shawhintalebi
Socials
medium.com/@shawhin
www.linkedin.com/in/shawhintalebi/
twitter.com/ShawhinT
www.instagram.com/shawhintalebi/
The Data Entrepreneurs
🎥 YouTube: youtube.com/@thedataentrepreneurs
👉 Discord: discord.gg/RSqZbF9ygh
📰 Medium: medium.com/the-data-entrepreneurs
📅 Events: lu.ma/tde
🗞️ Newsletter: the-data-entrepreneurs.ck.page/profile
Support ❤️
www.buymeacoffee.com/shawhint
Intro - 0:00
Fine-tuning (recap) - 0:45
LLMs are (computationally) expensive - 1:22
What is Quantization? - 4:49
4 Ingredients of QLoRA - 7:10
Ingredient 1: 4-bit NormalFloat - 7:28
Ingredient 2: Double Quantization - 9:54
Ingredient 3: Paged Optimizer - 13:45
Ingredient 4: LoRA - 15:40
Bringing it all together - 18:24
Example code: Fine-tuning Mistral-7b-Instruct for YT Comments - 20:35
What's Next? - 35:22
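The chapters above walk through the four QLoRA ingredients. The core idea of the "What is Quantization?" chapter can be sketched in plain Python — note this is NOT the NF4 data type from the QLoRA paper, just a toy symmetric int4 absmax scheme to show the mechanics:

```python
# Toy block-wise absmax quantization (illustration only; QLoRA uses the
# 4-bit NormalFloat data type, which places levels to match normally
# distributed weights rather than spacing them uniformly).

def quantize_block(block, bits=4):
    """Scale a block of floats into signed integers of the given width."""
    levels = 2 ** (bits - 1) - 1               # 7 levels each side for 4-bit
    absmax = max(abs(x) for x in block) or 1.0
    scale = levels / absmax                    # one scale constant stored per block;
    # QLoRA's "double quantization" additionally quantizes these per-block
    # scale constants to save even more memory.
    q = [round(x * scale) for x in block]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats from the integers and the block's scale."""
    return [x / scale for x in q]

weights = [0.02, -0.71, 0.33, 0.05, -0.11, 0.64]
q, scale = quantize_block(weights)
approx = dequantize_block(q, scale)
error = max(abs(a - b) for a, b in zip(weights, approx))
print(q)      # small integers, each storable in 4 bits
print(error)  # reconstruction error bounded by half a quantization step
```

Storing 4-bit integers plus one scale per block is what shrinks a 7B-parameter model from ~28 GB in fp32 to a few GB, which is what makes single-GPU fine-tuning feasible.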
All Comments (21)
-
👉 More on LLMs: youtube.com/playlist?list=PLz-ep5RbHosU2hnz5ejezwa…
-
Amazing work Shaw - complex concepts broken down to 'bit-sized bytes' for humans. Appreciate your time & efforts :)
-
Your explanations are amazing and the content is great. This is the best playlist on LLMs on YouTube.
-
This is the best explanation that I've ever heard, thanks for all the work!!
-
Wow, you have a genius for explaining super hard math concepts in layman's terms with good visual representations. Keep it coming.
-
Thank you Shaw for yet another awesome video succinctly explaining complex topics!
-
Exactly what I was looking for! Thanks for the video. Keep going!
-
Thank you for this amazing video, great explanations, very clear and easy to understand!
-
So far the best explanation on YouTube about this topic
-
Great video and your slides are very well organized!
-
Learned a lot. Great video and very accessible. Well Done!
-
Amazing video ! You are the best, man ! Thank you so much.
-
Loved this, very informative and clear!
-
First I thought, omg, this video is horrible — but it's actually excellent! (I wanted a practical, fast way to get my LLM fine-tuned using my own data, but found it really isn't that easy.) After this I understood a lot better what is going on in the background.
-
Dear Shaw, I've listened to the video many times, and aside from it being extremely well done (I learned so much), you should emphasize — or even make a dedicated video on — the fact that the key to fine-tuning with "one" GPU is using the quantized Mistral model. I'm sure many users would like to know more about these models; not many know how to use the most important quantized LLMs in their own Colab or even on their own PC, as the base of their own application. :)
-
Amazing explanation!!! Thank you Shaw!
-
Great content, thank you!
-
Thank you for sharing this knowledge; we need more videos like this!
-
Amazing work! Thanks mate :)
-
Another fire video in the books!