The age of artificial intelligence is here, and Generative AI is playing a pivotal role in bringing unprecedented advancements to everyday technology. There are already several free AI tools that can help you generate incredible images, text, music, videos, and a lot more within seconds. Adobe's AI Generative Fill in Photoshop and Midjourney's amazing capabilities have certainly turned heads. But what exactly is Generative AI, and how is it fueling such rapid innovation? To find out, follow our detailed explainer on Generative AI.
Definition: What is Generative AI?
As the name suggests, Generative AI refers to a type of AI technology that can generate new content based on the data it has been trained on. It can produce text, images, audio, video, and synthetic data, generating a wide range of outputs in response to user input, or what we call "prompts". At its core, Generative AI is a subfield of machine learning in which models learn to create new data resembling a given dataset.
If a model has been trained on large volumes of text, it can produce new combinations of natural-sounding text. Generally, the larger the training dataset, the better the output. And if the dataset was cleaned before training, you are likely to get more nuanced responses.
Similarly, if you train a model on a large corpus of images with tags, captions, and plenty of visual examples, it can learn from those examples and perform image classification and generation. The sophisticated system behind this, an AI programmed to learn from examples, is called a neural network.
That said, there are different kinds of Generative AI models: Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Generative Pretrained Transformers (GPTs), Autoregressive models, and more. We briefly discuss these generative models below.
At present, GPT models have surged in popularity after the release of GPT-4/3.5 (ChatGPT), PaLM 2 (Google Bard), GPT-3 (which powers DALL-E), LLaMA (Meta), Stable Diffusion, and others. Most of these user-friendly AI tools are built on the Transformer architecture. So in this explainer, we are going to focus mainly on Generative AI and GPT (Generative Pretrained Transformer).
What Are the Different Types of Generative AI Models?
Among all the Generative AI models, GPT is favored by many, but let's start with GAN (Generative Adversarial Network). In this architecture, two networks are trained in parallel: one generates content (the generator) and the other evaluates the generated content (the discriminator).
Basically, the aim is to pit the two neural networks against each other to produce results that mirror real data. GAN-based models have mostly been used for image-generation tasks.
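To make the adversarial idea concrete, here is a deliberately tiny sketch in Python. It is not a real GAN (no neural networks, and the "discriminator" is a fixed scoring rule rather than a learned one): the generator is a single learnable offset, and the discriminator simply scores how close a sample looks to the real data. But the loop shows the core dynamic, with the generator updating its parameter in whatever direction fools the discriminator more.

```python
import random

random.seed(0)

# Toy "real" data: samples centered around 4.0
real = [random.gauss(4.0, 0.5) for _ in range(200)]
real_mean = sum(real) / len(real)

# Generator: shifts input noise by a learnable offset theta
theta = 0.0

def generator(z, theta):
    return theta + z

# Discriminator: higher score = "looks more real" (closer to the real data)
def discriminator(x):
    return -abs(x - real_mean)

# Adversarial loop: nudge theta in the direction that raises the
# discriminator's score for generated samples, i.e. fools it more often
lr, eps = 0.05, 1e-3
for step in range(500):
    z = random.gauss(0.0, 0.5)
    fake = generator(z, theta)
    # Finite-difference gradient of the discriminator score w.r.t. theta
    grad = (discriminator(generator(z, theta + eps)) - discriminator(fake)) / eps
    theta += lr * grad

# theta drifts from 0.0 toward the real data's center (~4.0)
print(round(theta, 2))
```

In a real GAN, both networks are deep and both are trained, with the discriminator constantly adapting to catch the generator's latest fakes.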
Next up, we have the Variational Autoencoder (VAE), which involves encoding, learning, decoding, and generating content. For example, given an image of a dog, the encoder captures key characteristics of the scene, such as color, size, and ear shape, and the model learns which features define a dog. The decoder then recreates a rough, simplified image from those key points and finally produces the finished output after adding more variety and nuance.
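As a caricature of that encode/decode pipeline (no learned weights and no probabilistic machinery, just the compress-perturb-reconstruct idea), consider this sketch, where 2-D points standing in for images are squeezed down to a single latent number and back:

```python
import random

random.seed(1)

# Toy VAE-style pipeline on 2-D points that lie near the line y = 2x.

# Encoder: compress a point into one latent number (its position along the line)
def encode(point):
    x, y = point
    return (x + y / 2) / 2  # rough projection onto the line

# Decoder: reconstruct a full point from the latent code
def decode(t):
    return (t, 2 * t)

# 1. Encode: compress the observation into a compact latent description
data = (1.0, 2.1)
latent = encode(data)

# 2. Reconstruct: decoding the latent gives back a simplified version
reconstruction = decode(latent)

# 3. Generate: sampling near the latent code and decoding yields
#    new-but-similar outputs -- the "variety and nuance" step
samples = [decode(latent + random.gauss(0.0, 0.1)) for _ in range(3)]

print(reconstruction)  # close to the original (1.0, 2.1)
```

A real VAE learns the encoder and decoder as neural networks and encodes each input as a probability distribution over the latent space, which is what makes sampling new content principled rather than ad hoc.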
Autoregressive models come close to the Transformer family but lack self-attention. They are mostly used for text generation: the model produces a sequence one step at a time, predicting the next element from everything it has generated so far. There are also Normalizing Flows and Energy-Based Models. Finally, we discuss the popular Transformer-based models in detail below.
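The left-to-right prediction loop can be illustrated with just about the simplest possible autoregressive text model: a bigram counter (our own toy example, far cruder than any production model) that always picks the most frequent next word given the previous one:

```python
from collections import Counter, defaultdict

# Toy autoregressive text model: predict the next word from the previous
# one using bigram counts -- no self-attention, just left-to-right statistics.
corpus = "the cat sat on the mat and the cat sat on the hat".split()

# Count word -> next-word transitions
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def generate(start, length):
    out = [start]
    for _ in range(length):
        # Greedily pick the most common continuation of the last word
        nxt = transitions[out[-1]].most_common(1)[0][0]
        out.append(nxt)
    return " ".join(out)

print(generate("the", 4))  # → "the cat sat on the"
```

Each new word depends only on what has been generated so far, which is exactly the autoregressive property; modern models replace the bigram table with a neural network conditioned on the entire preceding sequence.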
What Is a Generative Pretrained Transformer (GPT) Model?
Before the Transformer architecture arrived, neural networks such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs), along with generative approaches like GANs and VAEs, were extensively used for Generative AI. In 2017, researchers at Google published the seminal paper "Attention Is All You Need" (Vaswani et al., 2017), which introduced the Transformer architecture and paved the way for large language models (LLMs).
Google subsequently released the BERT model (Bidirectional Encoder Representations from Transformers) in 2018, which implemented the Transformer architecture. Around the same time, OpenAI released its first GPT-1 model, also based on the Transformer architecture.
So what was the key ingredient in the Transformer architecture that made it a favorite for Generative AI? As the paper's title suggests, it introduced self-attention, which was missing in earlier neural network architectures. When predicting the next word in a sentence, the model pays close attention to the surrounding words to understand the context and establish relationships between them.
Through this process, the Transformer develops a reasonable representation of the language and uses that knowledge to predict the next word reliably. This whole process is called the attention mechanism. That said, keep in mind that LLMs have been pejoratively called Stochastic Parrots (Bender, Gebru, et al., 2021) because a model merely mimics language, choosing words probabilistically based on the patterns it has learned. It does not determine the next word through logic and has no genuine understanding of the text.
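A minimal, pure-Python sketch of self-attention (leaving out the learned query/key/value projections and scaling that a real Transformer uses) shows the heart of the mechanism: each word's new representation becomes a similarity-weighted mix of all the words around it.

```python
import math

# Minimal self-attention over toy word vectors. Scores come straight
# from dot products between the raw vectors -- no learned projections.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Re-represent each position as a weighted mix of all positions,
    weighted by how strongly it 'attends' to each of them."""
    outputs = []
    for q in vectors:
        weights = softmax([dot(q, k) for k in vectors])
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors))
                 for i in range(len(q))]
        outputs.append(mixed)
    return outputs

# Three 2-D "word embeddings": the first two are similar, the third differs
words = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
out = self_attention(words)

# The first word's output leans heavily toward the two similar words
print([round(x, 2) for x in out[0]])  # → [0.8, 0.2]
```

In a full Transformer, each vector is first projected into separate query, key, and value spaces with learned weight matrices, and many such attention "heads" run in parallel, but the weighted-mixing step above is the same idea.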
Coming to the "pretrained" term in GPT, it means the model has already been trained on a massive amount of text data before it is ever fine-tuned or put to use. Through pre-training, it learns sentence structure, patterns, facts, phrases, and so on, giving the model a good grasp of how language syntax works.
How Do Google and OpenAI Approach Generative AI?
Both Google and OpenAI use Transformer-based models, in Google Bard and ChatGPT respectively, but there are some key differences in approach. Google's latest PaLM 2 model uses a bidirectional encoder (a self-attention mechanism plus a feed-forward neural network), which means it weighs all surrounding words. It essentially tries to understand the context of the sentence and then generates all words at once. Google's approach is essentially to predict the missing words in a given context.
In contrast, OpenAI's ChatGPT leverages the Transformer architecture to predict the next word in a sequence, from left to right. It's a unidirectional model designed to generate coherent sentences, and it keeps predicting until it has produced a complete sentence or paragraph. Perhaps that's why Google Bard can generate text so much faster than ChatGPT. Nevertheless, both models rely on the Transformer architecture at their core to power their Generative AI frontends.
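The bidirectional-versus-unidirectional difference boils down to the attention mask. A simple sketch: an encoder lets every position attend to every other position, while a GPT-style decoder uses a "causal" mask so each position only sees itself and earlier positions.

```python
# Attention masks for a sequence of 4 tokens: 1 = may attend, 0 = hidden.
n = 4

# Bidirectional (BERT/encoder-style): every token sees every token
bidirectional_mask = [[1] * n for _ in range(n)]

# Causal (GPT/decoder-style): token i sees only positions 0..i
causal_mask = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal_mask:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

The lower-triangular causal mask is what prevents a left-to-right model from "peeking" at future words during training, so that generation at inference time matches the conditions it was trained under.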
Applications of Generative AI
We all know that Generative AI has huge applications, not just for text but also for image, video, and audio generation, and much more. AI chatbots like ChatGPT, Google Bard, and Bing Chat leverage Generative AI, which can also be used for autocomplete, text summarization, virtual assistants, translation, and more. For music generation, we have seen examples like Google's MusicLM, and Meta recently released MusicGen.
Apart from that, image generators from DALL-E 2 to Stable Diffusion all use Generative AI to create realistic images from text descriptions. In video generation too, models like Runway's Gen-1 use Generative AI, while StyleGAN 2 and BigGAN rely on Generative Adversarial Networks to produce lifelike imagery. Generative AI also has applications in 3D model generation, with popular projects including DeepFashion and ShapeNet.
Not just that, Generative AI can be of huge help in drug discovery too, where it can help design novel drugs for specific diseases. We have already seen related models like AlphaFold, Google DeepMind's protein-structure prediction system. Finally, Generative AI can be used for predictive modeling to forecast future events in finance and weather.
Limitations of Generative AI
While Generative AI has immense capabilities, it's not without failings. First off, it requires a large corpus of data to train a model, and for many small startups, high-quality data may not be readily available. We have already seen companies such as Reddit, Stack Overflow, and Twitter close access to their data or charge high fees for it. Recently, the Internet Archive reported that its website became inaccessible for an hour because an AI startup started hammering it for training data.
Apart from that, Generative AI models have also been heavily criticized for lack of control and for bias. Models trained on skewed data from the internet can overrepresent one section of the community; we have seen how AI photo generators mostly render images with lighter skin tones. Then there is the huge issue of deepfake videos and images created with Generative AI models. As stated earlier, Generative AI models do not understand the meaning or impact of their words; they simply mimic output based on the data they have been trained on.
It's highly likely that, despite best efforts at alignment, companies will have a hard time taming Generative AI's limitations: misinformation, deepfake generation, jailbreaking, and sophisticated phishing attempts that exploit its persuasive natural-language capabilities.