ZERO Cost AI Agents: Are ELMs ready for your prompts? (Llama3, Ollama, Promptfoo, BUN)

Published 2024-04-29
πŸš€ Are Efficient Language Models (ELMs) READY for On-Device Use?

How do you know when one is?

Using the ITV Benchmark with Llama 3, Gemma, and Phi-3, you can determine with confidence whether an ELM is ready for your use case.

Let's make 1 thing absolutely clear: The cost of the prompt is going to ZERO.

The world of AI is evolving at a BREAKNECK pace, and the latest advancements in efficient language models (ELMs) like Llama 3, Gemma, OpenELM, and Phi-3 are pushing the boundaries of what's possible with on-device AI. πŸ€–πŸ’‘

Llama 3 8B and Llama 3 70B hit the top 20 on the LMSYS Chatbot Arena Leaderboard within a week of launch. You can bet the open-source LLM community is tweaking and tuning Llama 3 to make it even better. It's likely we'll see the 8K context window extended to 32K and beyond in a matter of days.

But with so many options and rapid developments, how do you know if an ELM (efficient language model, a.k.a. on-device language model) is truly ready for YOUR specific use case? πŸ€”

Enter this video and the ITV Benchmark - a powerful tool that helps you quickly assess the viability of an ELM for your needs. πŸ“ŠπŸ’ͺ

In this video, we dive deep into the world of ELMs, exploring:

βœ… The key attributes you should consider when evaluating an ELM, including accuracy, speed, memory consumption, and context window
βœ… How to set your personal standards for each metric to ensure the ELM meets your requirements
βœ… A detailed breakdown of the ITV Benchmark and how it can help you determine if an ELM (Llama 3, Phi-3, Gemma, etc.) is ready for prime time
βœ… Real-world examples of running the ITV Benchmark on Llama 3 and Gemma to see how they stack up πŸ₯Š
βœ… Gain access to a hyper modern, minimalist prompt testing framework built on top of Bun, Promptfoo, and Ollama
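To make the "Bun + Promptfoo + Ollama" stack above concrete, here's a sketch of what a minimal Promptfoo config for comparing local ELMs might look like. The prompt, test input, and assertion values are illustrative assumptions, not taken from the ITV codebase:

```yaml
# promptfooconfig.yaml — hypothetical minimal eval comparing local ELMs via Ollama
prompts:
  - "Summarize the following in one sentence: {{text}}"

providers:
  - ollama:chat:llama3
  - ollama:chat:phi3
  - ollama:chat:gemma

tests:
  - vars:
      text: "Bun is a fast all-in-one JavaScript runtime."
    assert:
      # accuracy-style check: the summary should mention the subject
      - type: icontains
        value: "Bun"
      # speed-style check: response within 5 seconds
      - type: latency
        threshold: 5000
```

With Ollama serving the models locally, `bunx promptfoo eval` would run every prompt against every provider and score the assertions, giving you an accuracy/speed comparison across models in one table.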

We'll also discuss the game-changing implications of ELMs for your agentic tools and products. Imagine running prompts directly on your device, reducing the cost of building to ZERO! πŸ’Έ
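The "set your personal standards" idea boils down to a threshold check: define your bar for each metric, then pass or fail a model against it. A minimal TypeScript sketch (the type, field names, and all numbers are illustrative, not the ITV codebase's actual API):

```typescript
// Illustrative metrics for judging whether an ELM meets your personal standards.
type ElmMetrics = {
  accuracy: number;      // fraction of benchmark prompts passed (0–1)
  tokensPerSec: number;  // generation speed on your device
  memoryGB: number;      // peak RAM while running the model
  contextWindow: number; // max context length in tokens
};

// Your personal bar — tune these to your use case.
const standards: ElmMetrics = {
  accuracy: 0.9,
  tokensPerSec: 20,
  memoryGB: 8,
  contextWindow: 8192,
};

// A model is "ready" only if it clears every bar
// (memory is a ceiling; the rest are floors).
function meetsStandards(m: ElmMetrics, s: ElmMetrics): boolean {
  return (
    m.accuracy >= s.accuracy &&
    m.tokensPerSec >= s.tokensPerSec &&
    m.memoryGB <= s.memoryGB &&
    m.contextWindow >= s.contextWindow
  );
}

// Hypothetical measured numbers for a local Llama 3 8B run.
const llama3Sample: ElmMetrics = {
  accuracy: 0.92,
  tokensPerSec: 35,
  memoryGB: 6,
  contextWindow: 8192,
};

console.log(meetsStandards(llama3Sample, standards)); // → true
```

The point of writing the standards down as data is that when the next model drops, you rerun the same check instead of eyeballing leaderboards.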

By the end of this video, you'll have a clear understanding of how to evaluate ELMs for your specific use case and be well-equipped to take advantage of these incredible advancements for both LLMs and ELMs. πŸš€

ELMs, setting standards and clean prompt testing enable you to stay ahead of the curve and unlock the full potential of on-device AI! πŸ”“πŸ’‘

Like and subscribe for more cutting-edge insights into the world of AI, and let's continue pushing the boundaries of what's possible together! πŸ‘πŸŒŸ

πŸ’» Reduce your agentic costs with the ELM-ITV Codebase
github.com/disler/elm-itv-benchmark

πŸ”— Links:
Bun bun.sh/
Ollama ollama.com/
Promptfoo promptfoo.dev/
Apple's OpenELM machinelearning.apple.com/research/openelm

πŸ“š Chapters:
00:00 The cost of agentic tools is going to ZERO
00:48 Are ELMs ready for on device use?
02:28 Setting standards for ELMs
04:05 My (IndyDevDan) personal standards for ELMs
06:36 The ITV benchmark
07:05 ELM benchmark codebase
09:30 Bun, Ollama, Promptfoo, llama3, phi3, Gemma
12:10 Llama3, Phi3, Gemma, GPT3 TEST Results
16:10 New LLM class system
18:45 On Device PREDICTION
19:05 Make this prompt testing codebase your own
19:45 The cost of the prompt is going to ZERO
20:15 How do you know if ELMs are ready for your use case?

#promptengineering #aiagents #llama3

All Comments (15)
  • @thunken
    would be cool if you had finger puppets :)
  • @kenchang3456
    And now I know why I subscribed with alerts on.
  • Hi Dan, thanks a lot for sharing your knowledge. With around 700K open-source LLMs around, its really hard to pick a decent one. Usually we sort it by most downloaded or most likes but its not enough. This benchmarking will really help. BTW, I followed the readme and running bun elm throws 'error: Script not found "elm"
  • @alew3tube
    I would add to your list: tool/function calling as fundamental for a LLM
  • @AGI-Bingo
    I currently enjoy the groq free era, and i dont mind using it for development, but as production goes, i wouldnt want my or others private data going to any corp, so going local is definitely on the way to go
  • @wellbishop
    Pretty smart guy you are. Tks for sharing your divinity with us, poor mortals.
  • @larsfaye292
    In my opinion, the LPU (such as what Groq is developing) is going to be built into future PCs, dedicated for the sole task of running local models.
  • Left a comment under the Git Hub repo, may have forgotten to include the package.json. When running "bun i" I get an error because there are no dependencies in a package.json
  • @6lack5ushi
    I love your videos and posts, but even with ELM's the biggest Issue I find. unless it's a nasty bug inherent to my system. INSTRUCTION FOLLOWING! I would rather have the legacy GPT-4 than any 4-turbo model. because it follows commands WAY BETTER! I have a terrible feeling MMLU and other benchmarks are hiding the fact models may get more capable but less reliable. or "lazy" I thought it was bloated initial prompts (and human moderating and creating illogical gaps where it just omits) but I think its more sinister. We are optimising for the benchmarks but do not bench mark instruction following in said benchmarks!
  • @fraugdib3834
    Effin' Righteous man... Have a metric --> Use it often --> Know exactly where you stand in reference to an ever expanding whirlwind of clickbait and noise.
  • @fontende
    Apple models not impressed me at all, maybe they deliberately published only the smallest and useless ones. Normal quality LLM like llama 3 70 billions in best quality 8bit GGUF need 90GB of RAM just to start. All these hardware makers can't provide such, everyone showing the powerful CPUs which will be wasted in that laptops as Microsoft required just 16Gb of RAM. Using SSD here impossible, wearing out fast. 128GB of DDR4 costs me exactly $400 which is half of decent GPU or all these fancy laptops.
  • @xyster7
    just remove that echo, invest in better mike and you will be better than other this type channels