New GPT-4o VS GPT-4 - Ultimate Test (Prompts Included)

Published 2024-05-13

GPT-4o is a brand-new AI model from OpenAI that outperforms GPT-4 and other top AI models on benchmark tests.

In this video, I'll run a head-to-head test comparing GPT-4o with GPT-4 to see which one comes out on top.

1 - Text summarization
Prompt: Provide two summaries of this article. The first summary should be 2-3 sentences long. The second summary should be 5-6 sentences long and include more detail. (insert text here)

2 - Writing Text
Prompt 1: Concise Product Description
Prompt: Imagine you're launching a new software tool that helps businesses track social media analytics.
Write a short, punchy product description (approximately 50 words) suitable for a website or marketing material. Emphasize the key benefit for businesses.

3 - Multimodal Understanding
Prompt: Analyze this image and explain it to me in table format what's going on

4 - Image generation
Prompt: generate an image of two AI robots head to head in battle

5 - Research (completeness and accuracy)
Prompt: How could artificial intelligence (AI) potentially disrupt the accounting industry? Identify specific use cases, potential benefits, and challenges. Provide links to relevant articles or reports.

6 - Code generation
Prompt: Write Python code for a game of snake that I can run on my computer.
Then tell me the step by step guide on how to do it, assuming I know nothing about programming.
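
For anyone who wants to repeat this kind of comparison outside the chat window, the prompts above could be run side by side with a small script. This is only a minimal sketch and not how the video was made: `run_head_to_head`, `ModelFn`, and the two stand-in model functions are hypothetical names, and the stand-ins return placeholder text so the script runs offline. Swap them for real API calls to compare actual models.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that takes a prompt and returns a reply.
ModelFn = Callable[[str], str]

# Text-only prompts copied from the list above (image tasks are left out).
PROMPTS: Dict[str, str] = {
    "writing": (
        "Imagine you're launching a new software tool that helps businesses "
        "track social media analytics. Write a short, punchy product "
        "description (approximately 50 words) suitable for a website or "
        "marketing material. Emphasize the key benefit for businesses."
    ),
    "research": (
        "How could artificial intelligence (AI) potentially disrupt the "
        "accounting industry? Identify specific use cases, potential "
        "benefits, and challenges. Provide links to relevant articles or "
        "reports."
    ),
    "code": (
        "Write Python code for a game of snake that I can run on my computer."
    ),
}


def run_head_to_head(
    prompts: Dict[str, str], models: Dict[str, ModelFn]
) -> List[Dict[str, str]]:
    """Send every prompt to every model and return one row per task."""
    rows = []
    for task, prompt in prompts.items():
        row = {"task": task, "prompt": prompt}
        for name, ask in models.items():
            # Each model's reply is stored under the model's name.
            row[name] = ask(prompt)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Stand-in models (assumptions): they only echo the prompt length.
    def fake_gpt4(prompt: str) -> str:
        return f"[GPT-4 placeholder reply, prompt was {len(prompt)} chars]"

    def fake_gpt4o(prompt: str) -> str:
        return f"[GPT-4o placeholder reply, prompt was {len(prompt)} chars]"

    results = run_head_to_head(
        PROMPTS, {"GPT-4": fake_gpt4, "GPT-4o": fake_gpt4o}
    )
    for row in results:
        print(row["task"])
        print("  GPT-4 :", row["GPT-4"])
        print("  GPT-4o:", row["GPT-4o"])
```

Keeping the models as plain functions means the same loop works for any pair of models, and the replies for each task end up next to each other for manual review.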

Access Top Courses on ChatGPT and 50+ AI Tools on Our E-Learning Platform: bit.ly/skillleap

All Comments (21)

@arianaponytail 21 days ago

Love this comparison, hope to see you do more and deeper tests :)
@user-xn7hw7we8g 21 days ago

This is great! Thank you for this comparison! Love it! You got a new subscriber!
@ChatGPt2001 21 days ago

The comparison between GPT-4 and the new GPT-4o (hypothetical upgraded version) would likely revolve around several key aspects, such as accuracy, coherence, creativity, response time, and user satisfaction. To conduct a fair and comprehensive test, we can use a series of prompts that assess these qualities. Here are some potential prompts and the criteria for evaluation:

### Prompts for Comparison
1. *General Knowledge Question:*
   - *Prompt:* "Explain the process of photosynthesis."
   - *Criteria:* Accuracy, detail, clarity.
2. *Creative Writing:*
   - *Prompt:* "Write a short story about a time-traveling cat."
   - *Criteria:* Creativity, narrative coherence, engagement.
3. *Problem Solving:*
   - *Prompt:* "How would you solve the problem of traffic congestion in urban areas?"
   - *Criteria:* Practicality, originality, depth of analysis.
4. *Technical Explanation:*
   - *Prompt:* "What are the key differences between machine learning and deep learning?"
   - *Criteria:* Technical accuracy, clarity, depth of explanation.
5. *Emotional Support:*
   - *Prompt:* "I’ve been feeling really stressed lately. What can I do to manage my stress?"
   - *Criteria:* Empathy, helpfulness, appropriateness of advice.
6. *Humor:*
   - *Prompt:* "Tell me a joke that's suitable for all ages."
   - *Criteria:* Humor, appropriateness, originality.
7. *Language Translation:*
   - *Prompt:* "Translate the following sentence into Spanish: 'The weather today is beautiful.'"
   - *Criteria:* Accuracy, fluency.
8. *Cultural Awareness:*
   - *Prompt:* "What are some important cultural customs in Japan?"
   - *Criteria:* Accuracy, sensitivity, comprehensiveness.

### Evaluation Method
1. *Blind Testing:*
   - Ensure the evaluator doesn't know which model's response they are evaluating to prevent bias.
2. *Scoring System:*
   - Use a consistent scoring system for each criterion, e.g., a scale from 1 to 10.
3. *Multiple Evaluators:*
   - Involve multiple evaluators to average out subjective biases.
4. *Feedback Collection:*
   - Collect qualitative feedback on strengths and weaknesses of each response.

### Conducting the Test
1. *Generate Responses:*
   - Use both GPT-4 and GPT-4o to generate responses to each prompt.
2. *Evaluate Responses:*
   - Have evaluators rate and review each response based on the criteria.
3. *Compare Scores:*
   - Analyze the scores and feedback to determine which model performs better overall and in specific areas.

### Hypothetical Outcomes
1. *Accuracy and Detail:*
   - GPT-4o might show improvements in providing more detailed and precise answers due to enhanced training data or algorithm optimizations.
2. *Creativity and Engagement:*
   - Creative prompts might reveal whether GPT-4o has advanced capabilities in generating more engaging and imaginative content.
3. *Response Time:*
   - If GPT-4o is optimized for performance, it might generate responses faster than GPT-4.
4. *User Satisfaction:*
   - User feedback can highlight if the new version feels more intuitive and satisfying to interact with.

By following this structured approach, you can systematically compare GPT-4 and GPT-4o to determine which model offers superior performance across various dimensions.
@trevistang8857 21 days ago

Great video! Thanks for doing this
@HolographicLotus 21 days ago

I get the feeling that GPT5 will be announced soon, because paid subscribers just aren't getting their money's worth with the new free 4o.
@ordinary_businessman 21 days ago

Hi, I was very impressed with the comparison test you did in this video between GPT-4o and GPT-4, showing how GPT-4o has the edge in performance!
@CM-zl2jw 21 days ago

Thanks. Nice analysis.
@galefraney 21 days ago

Fantastic video!!
@IsabellaGarcia-ox8ii 21 days ago

Great Video! Below are the Timestamped Summaries from ChatWithPDF:
00:00 🆕 Introducing GPT-4o vs. GPT-4 comparison with insights on free vs. paid versions.
01:00 🆓 Free tier offers GPT-4o with data analysis, file uploading, web browsing, and vision capabilities.
02:00 📊 Benchmark test shows GPT-4o outperforming GPT-4 and other models in various tasks.
03:00 📝 Text summary prompt comparison between GPT-4o and GPT-4 reveals tone and length differences.
04:00 🚀 Product description prompt showcases similar performance between GPT-4o and GPT-4.
05:00 🖼 Multimodal test highlights GPT-4o's advantage in image analysis and table generation.
06:00 🎮 Image generation task demonstrates GPT-4o's superior output compared to GPT-4.
07:00 🔍 Research prompt results show GPT-4 excelling in formatting with immediate source references.
08:00 🐍 Python code writing task reveals GPT-4o creating a more interactive snake game than GPT-4.
09:00 🔄 Usage limit comparison between GPT-4o and GPT-4 raises questions for paid users.
10:00 🤔 Conflicting thoughts on upgrading to GPT-4 over GPT-4o due to similar capabilities and potential limitations.
11:00 ⚖ Comparison between GPT-4o and GPT-4 leaves users pondering the value of paid subscriptions.
12:00 🔄 Release confusion over the benefits of GPT-4o for all users and implications for paid su
@gasakjhon 14 days ago

Can this be used for custom-instruction GPTs, like GPT-4?
@markmuller7962 21 days ago

Btw at the end of the stream they said that they'll announce a new big thing, I'd assume that'll be the new pay stuff
@TheIllusionRecords 21 days ago

It is not a confusing release. For paid users, we get early access and more prompt entries. You get as many prompts in GPT 4 as before but now you also get the added benefit of a good-sized prompt count with GPT 4o
@cadence_videos 21 days ago

At 6:28 in the video, the GPT-4 output showed all GPQA results as N/A.
@FaheemBaluch-fu8rj 14 days ago

How can we find the voice feature? I have the paid version, but I am unable to get access to the voice and vision features. Does anyone know how to access the voice feature? Or maybe it is not yet available to the public?
@oxygon2850 21 days ago

I would hope that they add some kind of an icon or a server stress indicator so that people could back off and help
@pengshan 21 days ago

Thank you for the testing of gpt4o, it's very useful. However, this is just a comparison based on the web version, I'm looking forward more to the Mac version and mobile version, as well as more tests on the visual and audio aspects.
@SasukeGER 21 days ago

How? When I click at the top, I can only select 3.5 and 4... there is no 4o for me...
@ZM-vs8of 21 days ago

has anyone gotten the screenshare on the desktop app to work? I can't figure out how to get that going. just see a lot of marketing stuff about it.
@gaius_enceladus 21 days ago

I've tried 4o and I'm pleased with the results! It seems to give much longer code (I'm using Zig) and it seems quite fast too! It seems to be a bit less "lazy" too - the code is more complete and you don't have to ask it to do full, comprehensive code as often.