6 Cool Things ChatGPT 4o Can Do That OpenAI Didn’t Highlight

OpenAI recently released its new flagship model, GPT-4o, and showed off some impressive demos. The human-like voice chat has become the headline feature, but there is more to it. GPT-4o can do many cool things that OpenAI didn’t highlight in its presentation. The details are available on OpenAI’s page, and I went through all of them. On that note, let’s find out the cool new capabilities of ChatGPT 4o.

1. Accurate Text Generation in Images

We know that diffusion models struggle to render text in images. DALL-E 3 still fails to reliably reproduce the text you give it. However, GPT-4o, which is an end-to-end multimodal model, can render text accurately. OpenAI didn’t mention this in the presentation, but you can find examples on OpenAI’s page where the company explores the model’s capabilities.

gpt-4o text rendering capability in image generation
Image Courtesy: OpenAI

It can generate and add text to images effortlessly, and the consistency across many samples is remarkable. You can also attach images of a character and ask it to generate the same character from different angles, and it stays consistent across all scenarios. It can also generate multiple views of an object, which you can combine to create a 3D render. Not to mention, it can generate fonts too.

gpt-4o image generation consistency (three sample images)

Keep in mind that these capabilities are not available in ChatGPT yet. It still uses DALL-E 3 to generate images. OpenAI may unlock these features in the near future.

2. GPT-4o Can Process Videos Too

chatgpt 4o video processing
Image Courtesy: OpenAI

OpenAI didn’t mention in the presentation that GPT-4o can handle videos too. On the model page, however, OpenAI demonstrates that you can upload a video and ask GPT-4o to summarize it. From transcription to a bullet-point summary, it does everything. So it seems Gemini 1.5 Pro is not the only model that can process videos.

3. GPT-4o Can Be Your Tutor

In a presentation with Khan Academy’s Sal Khan, OpenAI showcased a fascinating demo using the GPT-4o model. Basically, on an iPad, you can share your screen with ChatGPT 4o, and it can see everything on your screen.

You can then ask it to explain a problem and help you work toward the solution. Be it mathematics, science, charts, maps, or anything else, ChatGPT 4o can act as your personal teacher, guiding you throughout your study session. That’s a great application of AI, powered by GPT-4o’s multimodal vision capability. By the way, it also works with the ChatGPT desktop app for macOS.

4. ChatGPT 4o Can Be Your Meeting Companion

In one of the demos, OpenAI showcased that you can have ChatGPT 4o as your live companion during meetings. You can share the screen with ChatGPT 4o, and it can see and hear all the participants. It can offer its own input, and participants can ask it questions directly. It replies spontaneously and stays engaged in the conversation. At the end, you can ask it to summarize the meeting as well. How cool is that?

5. Improved Non-English Language Performance

OpenAI has not just improved GPT-4o’s performance in English but also in regional languages. It has significantly improved the tokenizer, which now encodes non-English text in far fewer tokens, so more text fits within the same token limit.

gpt-4o language tokenization improvement
Image Courtesy: OpenAI

To give some examples, Gujarati takes up 4.4x fewer tokens, Hindi 2.9x fewer, Telugu 3.5x fewer, Urdu 2.5x fewer, and Russian 1.7x fewer, among others. Basically, for regional languages, ChatGPT 4o has become even more powerful.
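To get a feel for what those reduction factors mean in practice, here is a minimal Python sketch. It only does the arithmetic: it assumes a hypothetical text that took 1,000 tokens with the older tokenizer and divides by the factors quoted above. The function names and the 1,000-token figure are illustrative assumptions, not anything from OpenAI, and real savings will vary from text to text.

```python
# Reduction factors quoted in the article (old token count / new token count).
REDUCTION_FACTORS = {
    "Gujarati": 4.4,
    "Telugu": 3.5,
    "Hindi": 2.9,
    "Urdu": 2.5,
    "Russian": 1.7,
}


def estimate_new_tokens(old_tokens: int, factor: float) -> int:
    """Estimate the token count after the improvement, rounded to the nearest integer."""
    return round(old_tokens / factor)


def estimate_all(old_tokens: int) -> dict:
    """Apply every quoted reduction factor to the same hypothetical token count."""
    return {
        language: estimate_new_tokens(old_tokens, factor)
        for language, factor in REDUCTION_FACTORS.items()
    }


if __name__ == "__main__":
    # 1000 is a hypothetical starting point, chosen only to make the ratios easy to read.
    for language, tokens in estimate_all(1000).items():
        print(f"{language}: 1000 tokens -> about {tokens} tokens")
```

Since API pricing and context limits are both counted in tokens, the same prompt in these languages should cost less and leave more room for the conversation.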

6. ChatGPT 4o Beats All Other AI Models

OpenAI didn’t discuss the benchmark numbers and focused on delivering new experiences. However, ChatGPT 4o’s benchmark numbers beat those of competing AI models from Google, Anthropic, Meta, and others. In fact, it performs better than OpenAI’s own GPT-4 Turbo model, which was released a few months back.

chatgpt 4o benchmark performance
Image Courtesy: OpenAI

On benchmarks such as MMLU, HumanEval, GPQA, and DROP, OpenAI’s published numbers put ChatGPT 4o at or near the top among both proprietary and open-source models. In the LMSYS Chatbot Arena too, the mysterious im-also-a-good-gpt2-chatbot model (which is actually the ChatGPT 4o model) got an overall Elo score of 1310, much higher than other AI models.
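If you are wondering what an Elo gap means in head-to-head terms, the sketch below uses the standard Elo expected-score formula. Treat it as a rough guide only: the arena’s exact rating method may differ from textbook Elo, and the opponent rating of 1250 is a hypothetical number picked for illustration, not a score from the leaderboard.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    gpt4o_rating = 1310     # score quoted in the article
    opponent_rating = 1250  # hypothetical opponent, for illustration only
    share = elo_expected_score(gpt4o_rating, opponent_rating)
    print(f"Expected score against a {opponent_rating}-rated model: {share:.2f}")
```

Under this formula, two equally rated models each get an expected score of 0.5, and every extra point of rating nudges the expected share of preferred answers a little higher.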
