ZERO Cost AI Agents: Are ELMs ready for your prompts? (Llama3, Ollama, Promptfoo, BUN)

Published 2024-04-29
πŸš€ Are Efficient Language Models (ELMs) READY for On-Device Use?

How do you know when one is?

Using the ITV Benchmark with Llama 3, Gemma, and Phi-3, you can determine with confidence whether an ELM is ready for your use case.

Let's make 1 thing absolutely clear: The cost of the prompt is going to ZERO.

The world of AI is evolving at a BREAKNECK pace, and the latest advancements in efficient language models (ELMs) like Llama 3, Gemma, OpenELM, and Phi-3 are pushing the boundaries of what's possible with on-device AI. πŸ€–πŸ’‘

Llama 3 8B and Llama 3 70B hit the top 20 on the LMSYS Chatbot Arena Leaderboard within a week of launch. You can bet the open-source LLM community is tweaking and tuning Llama 3 to make it even better. It's likely we'll see the 8K context window extended to 32K and beyond in a matter of days.

But with so many options and rapid developments, how do you know if an ELM (efficient language model, a.k.a. on-device language model) is truly ready for YOUR specific use case? πŸ€”

Enter this video and the ITV Benchmark - a powerful tool that helps you quickly assess the viability of an ELM for your needs. πŸ“ŠπŸ’ͺ

In this video, we dive deep into the world of ELMs, exploring:

βœ… The key attributes you should consider when evaluating an ELM, including accuracy, speed, memory consumption, and context window
βœ… How to set your personal standards for each metric to ensure the ELM meets your requirements
βœ… A detailed breakdown of the ITV Benchmark and how it can help you determine if an ELM (Llama 3, Phi-3, Gemma, etc.) is ready for prime time
βœ… Real-world examples of running the ITV Benchmark on Llama 3 and Gemma to see how they stack up πŸ₯Š
βœ… Gain access to a hyper modern, minimalist prompt testing framework built on top of Bun, Promptfoo, and Ollama
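To make the "Bun + Promptfoo + Ollama" stack above concrete, here's a sketch of what a minimal Promptfoo config for comparing local ELMs might look like. The prompt, test input, and assertion values are illustrative assumptions, not taken from the ITV codebase:

```yaml
# promptfooconfig.yaml — hypothetical minimal eval comparing local ELMs via Ollama
prompts:
  - "Summarize the following in one sentence: {{text}}"

providers:
  - ollama:chat:llama3
  - ollama:chat:phi3
  - ollama:chat:gemma

tests:
  - vars:
      text: "Bun is a fast all-in-one JavaScript runtime."
    assert:
      # accuracy-style check: the summary should mention the subject
      - type: icontains
        value: "Bun"
      # speed-style check: response within 5 seconds
      - type: latency
        threshold: 5000
```

With Ollama serving the models locally, `bunx promptfoo eval` would run every prompt against every provider and score the assertions, giving you an accuracy/speed comparison across models in one table.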

We'll also discuss the game-changing implications of ELMs for your agentic tools and products. Imagine running prompts directly on your device, reducing the cost of building to ZERO! πŸ’Έ
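The "set your personal standards" idea boils down to a threshold check: define your bar for each metric, then pass or fail a model against it. A minimal TypeScript sketch (the type, field names, and all numbers are illustrative, not the ITV codebase's actual API):

```typescript
// Illustrative metrics for judging whether an ELM meets your personal standards.
type ElmMetrics = {
  accuracy: number;      // fraction of benchmark prompts passed (0–1)
  tokensPerSec: number;  // generation speed on your device
  memoryGB: number;      // peak RAM while running the model
  contextWindow: number; // max context length in tokens
};

// Your personal bar — tune these to your use case.
const standards: ElmMetrics = {
  accuracy: 0.9,
  tokensPerSec: 20,
  memoryGB: 8,
  contextWindow: 8192,
};

// A model is "ready" only if it clears every bar
// (memory is a ceiling; the rest are floors).
function meetsStandards(m: ElmMetrics, s: ElmMetrics): boolean {
  return (
    m.accuracy >= s.accuracy &&
    m.tokensPerSec >= s.tokensPerSec &&
    m.memoryGB <= s.memoryGB &&
    m.contextWindow >= s.contextWindow
  );
}

// Hypothetical measured numbers for a local Llama 3 8B run.
const llama3Sample: ElmMetrics = {
  accuracy: 0.92,
  tokensPerSec: 35,
  memoryGB: 6,
  contextWindow: 8192,
};

console.log(meetsStandards(llama3Sample, standards)); // → true
```

The point of writing the standards down as data is that when the next model drops, you rerun the same check instead of eyeballing leaderboards.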

By the end of this video, you'll have a clear understanding of how to evaluate ELMs for your specific use case and be well-equipped to take advantage of these incredible advancements for both LLMs and ELMs. πŸš€

ELMs, setting standards and clean prompt testing enable you to stay ahead of the curve and unlock the full potential of on-device AI! πŸ”“πŸ’‘

Like and subscribe for more cutting-edge insights into the world of AI, and let's continue pushing the boundaries of what's possible together! πŸ‘πŸŒŸ

πŸ’» Reduce your agentic costs with the ELM-ITV Codebase
github.com/disler/elm-itv-benchmark

πŸ”— Links:
Bun bun.sh/
Ollama ollama.com/
Promptfoo promptfoo.dev/
Apple's OpenELM machinelearning.apple.com/research/openelm

πŸ“š Chapters:
00:00 The cost of agentic tools is going to ZERO
00:48 Are ELMs ready for on device use?
02:28 Setting standards for ELMs
04:05 My (IndyDevDan) personal standards for ELMs
06:36 The ITV benchmark
07:05 ELM benchmark codebase
09:30 Bun, Ollama, Promptfoo, llama3, phi3, Gemma
12:10 Llama3, Phi3, Gemma, GPT3 TEST Results
16:10 New LLM class system
18:45 On Device PREDICTION
19:05 Make this prompt testing codebase your own
19:45 The cost of the prompt is going to ZERO
20:15 How do you know if ELMs are ready for your use case?

#promptengineering #aiagents #llama3

All Comments (15)
  • @thunken
    would be cool if you had finger puppets :)
  • @kenchang3456
    And now I know why I subscribed with alerts on.
  • Hi Dan, thanks a lot for sharing your knowledge. With around 700K open-source LLMs around, its really hard to pick a decent one. Usually we sort it by most downloaded or most likes but its not enough. This benchmarking will really help. BTW, I followed the readme and running bun elm throws 'error: Script not found "elm"
  • @alew3tube
    I would add to your list: tool/function calling as fundamental for a LLM
  • @AGI-Bingo
    I currently enjoy the groq free era, and i dont mind using it for development, but as production goes, i wouldnt want my or others private data going to any corp, so going local is definitely on the way to go
  • @wellbishop
    Pretty smart guy you are. Tks for sharing your divinity with us, poor mortals.
  • @larsfaye292
    In my opinion, the LPU (such as what Groq is developing) is going to be built into future PCs, dedicated for the sole task of running local models.
  • Left a comment under the Git Hub repo, may have forgotten to include the package.json. When running "bun i" I get an error because there are no dependencies in a package.json
  • @6lack5ushi
    I love your videos and posts, but even with ELM's the biggest Issue I find. unless it's a nasty bug inherent to my system. INSTRUCTION FOLLOWING! I would rather have the legacy GPT-4 than any 4-turbo model. because it follows commands WAY BETTER! I have a terrible feeling MMLU and other benchmarks are hiding the fact models may get more capable but less reliable. or "lazy" I thought it was bloated initial prompts (and human moderating and creating illogical gaps where it just omits) but I think its more sinister. We are optimising for the benchmarks but do not bench mark instruction following in said benchmarks!
  • @fraugdib3834
    Effin' Righteous man... Have a metric --> Use it often --> Know exactly where you stand in reference to an ever expanding whirlwind of clickbait and noise.
  • @fontende
    Apple models not impressed me at all, maybe they deliberately published only the smallest and useless ones. Normal quality LLM like llama 3 70 billions in best quality 8bit GGUF need 90GB of RAM just to start. All these hardware makers can't provide such, everyone showing the powerful CPUs which will be wasted in that laptops as Microsoft required just 16Gb of RAM. Using SSD here impossible, wearing out fast. 128GB of DDR4 costs me exactly $400 which is half of decent GPU or all these fancy laptops.
  • @xyster7
    just remove that echo, invest in better mike and you will be better than other this type channels