OpenAI GPT-4: Multimodal, New Features, Image Input, How to Use & More

OpenAI made tremendous waves when it launched GPT-4 last night. The next-generation AI language model is a noticeable improvement over its predecessor and is capable of so much more. If you know a thing or two about ChatGPT and its alternatives, you are already aware of what this spells for chatbots and artificial intelligence in general. However, for those unaware of language models or GPT-4 in particular, we have your back. We have scoured OpenAI’s blogs and the Internet and curated a dedicated guide on GPT-4. So if you are someone with little to no clue about it, get a cup of coffee and sit down as we tell you all about this AI model.

What is GPT-4?

Put simply, GPT-4 is the latest iteration of OpenAI’s large language models (LLMs). Language models, in general, are systems that try to predict the next word in a sequence and use that ability to generate coherent text. They learn to do this by training on massive datasets, which lets them identify patterns in language and apply them.
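To make the idea of next-word prediction concrete, here is a toy sketch in Python. It simply counts which word most often follows another in a tiny sample text; real LLMs like GPT-4 learn far richer patterns with neural networks trained on vast datasets, so treat this purely as an illustration of the concept.

```python
from collections import Counter, defaultdict

# A tiny "training corpus" for illustration only
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows each word in the sample text
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # Return the most frequent continuation seen in the corpus, if any
    return following[word].most_common(1)[0][0] if following[word] else None

print(predict_next("the"))  # -> "cat", the most common word after "the"
```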

GPT-4 is the newest model in this series and is expected to be a big improvement over previous-gen models like GPT-3 and GPT-3.5. There are some specific things GPT-4 is better at, which we will discuss in depth below. However, the simple takeaway is that this new model will make chatbots like ChatGPT and MS Bing much more capable in their responses. So you can expect them to give better answers, be more creative, and handle both older and newer ChatGPT prompts differently.

GPT-4 Is Multimodal

If you’ve used previous GPT models, you might be aware that they were limited to interpreting the text you input. However, one of the biggest additions in the new model is that it is multimodal. This means that GPT-4 can accept prompts containing both text and images.

This means the AI doesn’t just receive an image; it actually interprets and understands it, and that understanding applies to prompts that mix text and visual inputs. Furthermore, GPT-4’s multimodal capability extends across all sizes and types of images and text, including documents combining text and photographs, diagrams (sketched or hand-drawn), and screenshots. GPT-4’s output remains just as capable as with text-only inputs.

In a developer livestream organized by OpenAI, the company showcased GPT-4’s multimodal nature. During the stream, GPT-4 was given a screenshot of a Discord window and asked to describe it in painstaking detail.

The model took a little over a minute and rendered an extremely descriptive and accurate response. The response captured almost every single element of the input screen. From the server name on the top left corner to the different voice channels and even naming all of the Discord members online in the right pane, GPT-4 captured everything.

GPT-4 was put through some more tests in which people submitted various random artworks, including a photo of a squirrel holding a camera. The model was then asked to identify “what was funny about this image.” It responded that the photo was funny because squirrels typically eat nuts and do not act like humans. Once again, it provided a very specific answer, much like a human would.

However, as mentioned above, the model’s capabilities go beyond screenshots to text and image inputs of all types. OpenAI showcased this when co-founder Greg Brockman captured a photo of a hand-drawn mockup of a joke website. He then uploaded it to a Discord server connected to GPT-4’s API. The model was then asked to ‘write brief HTML/JS code to turn the page into a website and replace the jokes with actual ones.’

Amazingly, GPT-4 generated working code for exactly that. Upon testing, it produced a fully functional website where pressing the buttons revealed the jokes. The fact that the model could decipher human handwriting and create code from a combination of text and image inputs is mind-blowing. GPT-4’s multimodal capability is a huge step toward AI fully understanding prompts and delivering highly accurate results.

While there weren’t any considerable hitches, OpenAI did acknowledge that speed is something GPT-4 could still improve on, and responses might take a while. Furthermore, visual inputs for GPT-4 remain in research preview and are not yet publicly available.
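For the curious, here is a purely hypothetical sketch of what an image-plus-text request could eventually look like through the API. Since visual input is still in research preview, the image field shown below is our assumption rather than a confirmed format, and the API key and URL are placeholders.

```python
# Hypothetical sketch only: image input is still in research preview, so the
# "image_url" part below is an assumed format, not a confirmed public API.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",  # assumes API access granted via the waitlist
    messages=[
        {
            "role": "user",
            # Assumed shape: one text part plus one image part in a single prompt
            "content": [
                {"type": "text", "text": "Describe everything you see in this screenshot."},
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)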

How Is GPT-4 Better than GPT-3.5/GPT-3?

Besides its breathtaking multimodal approach, GPT-4 has several other areas of improvement where the new model outperforms its older siblings. Some of these areas are:

1. Better Understanding of Nuanced Prompts

OpenAI claims that it might be difficult to actually see the difference between GPT-4 and GPT-3.5 at first glance. However, the former’s capabilities come to light when you go into the nitty-gritty. To demonstrate the difference, the new model was pitted against GPT-3.5 in a variety of human-level exams. OpenAI used the most recent publicly available tests and gave the models no specific training for this.

GPT-4 vs GPT-3.5 exam results (chart)

The data itself paints a clearer picture than words could. In all results, GPT-4 came out on top and scored above its previous version. While the gap was narrow in some exams (such as SAT EBRW), there was a tremendous leap in performance in others (the Uniform Bar Exam, AP Chemistry, and more). OpenAI stated, “GPT-4 is also more reliable, creative, and generally able to handle more nuanced instructions when compared to GPT-3.5.” In practice, this means the bot can understand more complex prompts with ease.

2. Significantly Larger Word Limit

While everyone loved GPT-3 and GPT-3.5, many wished they could handle even longer inputs. GPT-4 solves that problem. The new AI language model comes with an astounding 25,000-word input limit, which is significantly larger. For context, GPT-3.5 was limited to around 8,000 words.

This means users will be able to feed the bot much longer prompts for it to read and respond to. So when GPT-4 is widely available, you can expect it to accept longer inputs and produce much more detailed responses without problems. For developers, this means you will be able to feed new APIs and documentation to the chatbot and get help writing code or fixing bugs in existing code more easily. A minimal sketch of that workflow is shown below.
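The sketch below assumes you have the openai Python package and API access to the gpt-4 model; the file name and prompt are just illustrative placeholders.

```python
# A minimal sketch, assuming the openai Python package and gpt-4 API access;
# the documentation file and the task in the prompt are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

with open("api_reference.txt") as f:  # hypothetical documentation file
    document = f.read()

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": f"Here is some API documentation:\n\n{document}\n\n"
                       "Write a short Python function that uses this API to fetch a user by ID.",
        }
    ],
)
print(response.choices[0].message.content)
```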

3. Supports More Languages

ChatGPT has predominantly been used by English speakers around the globe. However, GPT-4 takes other languages into consideration. The newest model has demonstrated strong performance across 26 different languages, including the likes of Ukrainian, Korean, Germanic languages, and many more.

OpenAI tested this by translating the MMLU benchmark into a variety of languages. Across those 26 languages, GPT-4 outperforms the English-language performance of GPT-3.5 in 24 of them. However, more training remains to be done before GPT-4 fully supports all languages.

4. Different Personalities

Steerability is a concept wherein you can instruct the AI to act a certain way with a fixed tone of speech. A good example of this is asking ChatGPT to act like a cowboy or a police officer (assigning it a role, like we did while making our chatbot using the ChatGPT API). GPT-4 brings that steerability, and OpenAI has now made it harder for the AI to break character. Developers can fix their AI’s style from the get-go by describing those directions in the “system” message. Since these messages are easy to jailbreak, OpenAI is also working on making them more secure.

From the demos OpenAI showcased in its blog post, it was quite funny to see the user trying to get GPT-4 to stop being a Socratic tutor and just tell them the answer to their query. However, as it was programmed to be a tutor, GPT-4 refused to break character, which is the kind of behavior developers can expect when they configure their own bots in the future.
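Here is a minimal sketch of how that steerability works through the chat API, recreating the Socratic-tutor behavior from OpenAI’s demo. It assumes the openai Python package and gpt-4 API access; the exact wording of the system message is just an example.

```python
# A minimal sketch: the "system" message fixes the assistant's persona before
# the conversation starts. Assumes openai package + gpt-4 API access.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You are a Socratic tutor. Never give the answer directly; "
                       "always respond with a guiding question instead.",
        },
        {"role": "user", "content": "Just tell me the answer: how do I solve 3x + 5 = 14?"},
    ],
)
print(response.choices[0].message.content)  # expect a guiding question, not "x = 3"
```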

Possible Applications of GPT-4

While GPT-4’s multimodal capability isn’t available to users yet, OpenAI has already teamed up with Be My Eyes, an app for the visually impaired. GPT-4’s inclusion in the app lets you take a picture of whatever you’re looking at, and the AI describes exactly what’s in it, whether that’s a dress, a plant, a machine at the gym, a map, and much more.

OpenAI has also partnered with other services such as Duolingo and Khan Academy for intelligent learning, and even with the government of Iceland for language preservation. While GPT-4’s API is currently available on a waitlist basis, we can expect developers to come out with amazing experiences once it is widely released. Even before that happens, the applications above are already live for people to use.

Does GPT-4 Have Any Limitations?

Even though GPT-4 is being heralded as the next step in artificial intelligence, it still has its speedbumps.

For starters, GPT-4 lacks knowledge of world events that occurred after September 2021. The model also doesn’t learn from its experience. This can lead to it making reasoning errors, and it is even prone to accepting obviously false statements from a user.

GPT-4 can also fail at problems just like humans do. Like GPT-3.5, the new model can hallucinate and be confidently wrong in its predictions, sometimes failing to double-check its own work when it makes a mistake.

Despite that, OpenAI promises that GPT-4 has been trained better than previous models to avoid this. In the company’s own internal adversarial factuality evaluations, the model scored 35% higher than GPT-3.5 at reducing hallucinations. While the model’s perception and predictions have improved, its results should still be checked with human judgment.

OpenAI Evals – Make GPT-4 better together

OpenAI uses its own open-source software framework, OpenAI Evals, to create and run benchmarks for models like GPT-4, and the company has shared templates for the most commonly used types of evals. OpenAI has stated that Evals will be an integral part of crowdsourcing benchmarks, which can be used to ensure GPT-4 is better trained and performs even better.

As such, the company has invited everyone (yeah, every GPT-4 user) to test its models against benchmarks and submit their examples. You can find more information regarding the same on OpenAI’s GPT-4 research page.
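As a rough idea of what submitting an eval involves, here is a hedged sketch of a sample in the JSONL “match” format used by the open-source openai/evals repository. The file name and the registration step are illustrative, so check the repository’s documentation for the real workflow.

```python
# A hedged sketch of an eval sample in the JSONL "match" format used by the
# open-source openai/evals repository; file name and registration steps are
# illustrative assumptions, not the exact official workflow.
import json

samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the capital of Iceland?"},
        ],
        "ideal": "Reykjavik",
    },
]

with open("my_eval_samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")

# The samples file is then registered in the evals repo and run with its CLI,
# for example `oaieval gpt-4 my-eval` (exact invocation depends on the registry entry).
```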

How to Get Access to GPT-4 Right Now

Unfortunately, GPT-4 isn’t out for everyone just yet. OpenAI has currently released the new model only to ChatGPT Plus subscribers, with a usage cap. Plus subscribers also get limited token access to two different versions of GPT-4. While some users can use GPT-4’s 32K engine, which gives them a longer word limit, others are restricted to GPT-4 8K with a smaller capacity. OpenAI has stated that this will be dynamically adjusted based on demand. ChatGPT Plus subscribers who want access right now should check out our guide on how to get access to GPT-4 here.

If you can’t be bothered to get ChatGPT Plus, you will be pleased to know that Microsoft Bing is already using GPT-4. While you won’t get the same level of control over the language model as you would with OpenAI’s own tools, it’s still a good way to experiment and try out different things. Check out how to use MS Bing on any web browser to get started.

We hope you are leaving this explainer with more information about GPT-4 than when you came to it. GPT-4 is a model that is chock-full of opportunity and spells a lot of excitement for everyone. Once it is fully rolled out in ChatGPT, it will be interesting to see how people make full use of the new model to create new experiences. However, you don’t have to wait for that to experience ChatGPT. Check out all the cool things you can do in ChatGPT, then integrate ChatGPT with Siri and even get ChatGPT on your Apple Watch! So, what do you think about this exciting new model? Drop your thoughts in the comments below!

Is GPT-4 coming to ChatGPT?

GPT-4 has indeed already come to ChatGPT. As mentioned above, the new model is live for ChatGPT Plus subscribers. If you are signed in, all you need to do is select the correct model and begin chatting. You can also follow our link above on how to get ChatGPT Plus if you haven’t already.

Will GPT-4 Be Free to Use?

As of now, GPT-4 is unfortunately not free to use. It requires a ChatGPT Plus subscription, which costs $20 per month. However, OpenAI has stated that it hopes to offer free GPT-4 queries to everyone at some point. The company could also introduce a new subscription tier to deliver improved access to new AI language models like GPT-4.

Can I Fully Rely on GPT-4?

No, you cannot fully rely on GPT-4. The new model still suffers from some limitations, including an outdated dataset and occasional hallucinations. The model can also confidently give wrong answers, which, while not ill-intended, can still be misleading or harmful. While GPT-4 has indeed improved compared to GPT-3.5, it still has its share of problems. So if you end up using the newest model, apply proper human judgment alongside it.

What is GPT-4’s Dataset size?

While a lot of rumors were going around about GPT-4 having 100 trillion parameters, as opposed to GPT-3’s 175 billion, that is most likely false. In an interview with StrictlyVC, OpenAI CEO Sam Altman indirectly stated that this won’t be the case, calling the “GPT-4 rumor mill a ridiculous thing.”
Perhaps for this reason, OpenAI has tempered expectations around GPT-4’s dataset size and has not provided an exact number. Time will tell if it is ever revealed. Nonetheless, we believe it should perform quite well, given its initial demonstrations.

How has GPT-4 been trained?

Like its previous language models, GPT-4’s base model has been trained to predict the next word in a document. The data used is a combination of publicly available data and data licensed by OpenAI.
This data contains a mixture of correct and incorrect information, weak and strong reasoning, self-contradictory statements, and various other ideas. This gives GPT-4 a wide range of data to draw from and helps it recognize what’s being asked of it.
