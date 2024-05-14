

6 Cool Things ChatGPT 4o Can Do That OpenAI Didn’t Highlight

Arjun Sha
OpenAI recently released its new flagship model, GPT-4o, and showed off some cool demos. The human-like voice chat has become the headline feature, but there is more to the model than that. ChatGPT 4o can do many cool things that OpenAI didn’t highlight in the presentation. The details are available on OpenAI’s page, and I went through all of them. On that note, let’s find out the cool new capabilities of ChatGPT 4o.

1. Accurate Text Generation in Images

We know that diffusion models struggle to render text in images. Even DALL-E 3 often fails to reproduce the exact text it is given. However, ChatGPT 4o, which is an end-to-end multimodal model, can render text accurately. OpenAI didn’t mention this in the presentation, but you can find examples on OpenAI’s page, where the company explores the model’s capabilities.

gpt-4o text rendering capability in image generation
Image Courtesy: OpenAI

It can generate images and add text to them effortlessly, and the consistency across many samples is remarkable. You can also attach images and ask it to generate the same character from different angles, and it keeps the character consistent across all scenarios. It can also generate views of an object from multiple angles, which you can combine to create a 3D render. Not to mention, it can generate fonts too.

  • gpt-4o image generation consistency
    Image Courtesy: OpenAI
  • gpt-4o image generation consistency 2
    Image Courtesy: OpenAI
  • gpt-4o image generation consistency 3
    Image Courtesy: OpenAI

Keep in mind that these capabilities are not available in ChatGPT yet. It still uses DALL-E 3 to generate images. OpenAI may unlock these features in the near future.

2. GPT-4o Can Process Videos Too

chatgpt 4o video processing
Image Courtesy: OpenAI

OpenAI didn’t mention in the presentation that GPT-4o can handle videos too. On the model page, however, OpenAI demonstrates that you can upload a video and ask GPT-4o to summarize it. From transcription to a bullet-point summary, it does everything. So it seems Gemini 1.5 Pro is not the only model that can process videos.

3. GPT-4o Can Be Your Tutor

In a presentation with Khan Academy’s Sal Khan, OpenAI showcased a fascinating demo using the GPT-4o model. Basically, on an iPad, you can share your screen with ChatGPT 4o, and it can see everything on your screen.

You can now ask it to explain and help you find solutions to a problem. Be it mathematics, sciences, charts, maps, or anything else, ChatGPT 4o will be your personal teacher guiding you throughout your study session. That’s such a great application of AI, powered by GPT-4o’s multimodal vision capability. By the way, it also works with the ChatGPT desktop app for macOS.

4. ChatGPT 4o Can Be Your Meeting Companion

In one of the demos, OpenAI showed that ChatGPT 4o can act as your live companion during meetings. You can share your screen with ChatGPT 4o, and it can see and hear all the participants. It can offer its own input, and participants can ask it questions directly. It replies spontaneously and stays engaged in the conversation. At the end, you can ask it to summarize the meeting as well. How cool is that?

5. Improved Non-English Language Performance

OpenAI has improved GPT-4o’s performance not just in English but in regional languages as well. It has significantly improved the tokenizer, which now encodes non-English text in fewer tokens, so more text fits within the same token limit.

gpt-4o language tokenization improvement
Image Courtesy: OpenAI

To give some examples, Gujarati takes up 4.4x fewer tokens, Hindi 2.9x fewer, Telugu 3.5x fewer, Urdu 2.5x fewer, Russian 1.7x fewer, and so on. Basically, for regional languages, ChatGPT 4o has become even more powerful.
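To get a feel for what those ratios mean in practice, here is a rough Python sketch that converts an older token count into an estimated GPT-4o token count. The ratios are the ones quoted above. The function names and the 1,000-token prompt are hypothetical and chosen only for illustration, and the real savings will vary with the actual text.

```python
# "X times fewer tokens" figures quoted in the article (from OpenAI's page).
FEWER_TOKENS_FACTOR = {
    "Gujarati": 4.4,
    "Telugu": 3.5,
    "Hindi": 2.9,
    "Urdu": 2.5,
    "Russian": 1.7,
}


def estimate_new_tokens(old_tokens: int, language: str) -> int:
    """Estimate the GPT-4o token count for text that took `old_tokens` before."""
    return round(old_tokens / FEWER_TOKENS_FACTOR[language])


def percent_saved(language: str) -> float:
    """Share of tokens saved, in percent, implied by the quoted ratio."""
    return round((1 - 1 / FEWER_TOKENS_FACTOR[language]) * 100, 1)


if __name__ == "__main__":
    old_tokens = 1000  # hypothetical prompt size, for illustration only
    for language in FEWER_TOKENS_FACTOR:
        new_tokens = estimate_new_tokens(old_tokens, language)
        print(f"{language}: ~{new_tokens} tokens ({percent_saved(language)}% fewer)")
```

Under these assumptions, a Hindi prompt that previously took 1,000 tokens would need roughly 345 tokens with the new tokenizer.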

6. ChatGPT 4o Beats All Other AI Models

OpenAI didn’t discuss the benchmark numbers and focused on delivering new experiences instead. However, ChatGPT 4o’s benchmark numbers beat those of other AI models from Google, Anthropic, Meta, etc. In fact, it performs better than OpenAI’s own GPT-4 Turbo model, which was released a few months back.

chatgpt 4o benchmark performance
Image Courtesy: OpenAI

From MMLU to HumanEval, GPQA, and DROP, ChatGPT 4o outranks both proprietary and open-source models. In the LMSYS arena too, the mysterious im-also-a-good-gpt2-chatbot model (which is actually the ChatGPT 4o model) got an overall Elo score of 1310, much higher than other AI models.
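If you are wondering what an Elo gap actually means, the textbook Elo formula turns a rating difference into an expected head-to-head win rate. The sketch below uses that standard formula. The 1250 rating for the rival model is a hypothetical number picked purely for illustration, and the arena’s own rating method may differ in its details.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of A against B (0.5 means evenly matched)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    gpt_4o_rating = 1310  # overall score quoted in the article
    rival_rating = 1250   # hypothetical rival, for illustration only
    win_rate = expected_score(gpt_4o_rating, rival_rating)
    print(f"Expected win rate against the rival: {win_rate:.1%}")
```

So even a 60-point lead translates to winning somewhat more than half of the head-to-head comparisons, not all of them.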


