While powerful AI chatbots like ChatGPT and Google Bard are powered by large language models, AI image and video synthesis is built on Diffusion and GAN models. All of these are part of the popular Generative AI experience. In this article, we take a closer look at the best AI video generators. So far, only a few text-to-video AI models have been released online, but which ones are good and usable? To find out, let’s go ahead and check out the list of the best AI video generators in 2023.
1. Runway Gen-2
The best AI video generator that you can use right now is Runway Gen-2. Runway earlier introduced video-to-video generation with Gen-1, and now with the Gen-2 model, you can generate videos from scratch using text prompts. Similar to Midjourney prompts, you can describe the scene, camera angles, etc. I tried some prompts on Runway, and it did a decent job.
The best part is that you can add an image to your prompt, and Runway can use the image in the video. That’s pretty cool, right? As for availability, it’s free to try within limits. You can generate videos of up to 4 seconds in 720p resolution, and the free tier covers roughly 10 videos.
If you choose to get the paid plan ($12 per month), you can export the videos in 4K, but the 4-second duration limit remains the same. So if you want to try the best text-to-video AI tool, check out Runway Gen-2.
Check out Runway Gen-2 (Free, Paid plan starts at $12 per month)
2. ModelScope
ModelScope is a text-to-video model developed by Alibaba’s DAMO Vision Intelligence Lab, and it has gotten pretty good over time. It’s built on the Diffusion model and has 1.7 billion parameters. Currently, it only supports English input and can generate videos that match the text input.
Thankfully, the project is available on Hugging Face, so you can use it to generate AI videos. But keep in mind, it can only generate a 2-second video, and there is a “Shutterstock” watermark on the video. I tried the model and it seemed like a work in progress.
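If you would rather run the model from a script than through the web demo, a minimal sketch might look like the one below. It assumes the Hugging Face `diffusers` library and the `damo-vilab/text-to-video-ms-1.7b` checkpoint, neither of which is named in this article. The `ClipRequest` helper, the 8 fps playback rate, and the step count are also illustrative assumptions, and the exact shape of the pipeline output can differ between `diffusers` versions.

```python
from dataclasses import dataclass


@dataclass
class ClipRequest:
    """Hypothetical helper describing one short clip to generate."""
    prompt: str
    seconds: float = 2.0  # the demo produces roughly 2-second clips
    fps: int = 8          # assumed playback rate, not stated in the article

    def num_frames(self) -> int:
        # Number of frames needed to fill the requested duration
        return max(1, round(self.seconds * self.fps))


def generate_clip(request: ClipRequest,
                  model_id: str = "damo-vilab/text-to-video-ms-1.7b",
                  out_path: str = "clip.mp4") -> str:
    """Generate a clip and write it to out_path (needs a GPU and a model download)."""
    # Heavy imports stay inside the function so the helper above works without them
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    pipe = DiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16"
    )
    pipe.enable_model_cpu_offload()  # lowers VRAM use; requires the accelerate package

    result = pipe(request.prompt,
                  num_frames=request.num_frames(),
                  num_inference_steps=25)
    # Recent diffusers releases return a batch of videos; older ones return
    # the frame list directly, in which case drop the [0].
    frames = result.frames[0]
    return export_to_video(frames, out_path, fps=request.fps)
```

Calling `generate_clip(ClipRequest("a panda eating bamboo on a rock"))` should write an MP4 file next to the script. Keep the prompt in English, since that is the only input language the model supports.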
Check out ModelScope (Free)
3. Zeroscope
Zeroscope is another text-to-video model derived from ModelScope. It’s capable of creating high-quality AI videos in 1024 x 576 resolution. The model has been trained from the original ModelScope weights using an additional 9,923 clips and 29,769 tagged frames at 24 frames (1024 x 576 resolution). As a result, it creates slightly better output than ModelScope.
There are two Zeroscope models: zeroscope_v2_576w and zeroscope_v2_XL. The zeroscope_v2_576w model is used for generating the video, and zeroscope_v2_XL is used to upscale the generated content to a higher resolution. You can check out the demo for this cool AI video generator on Hugging Face.
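The generate-then-upscale flow can be sketched in a few lines of Python. This is only an outline of the idea, not Zeroscope's actual code. The `generate` and `upscale` callables are hypothetical stand-ins for whatever runs zeroscope_v2_576w and zeroscope_v2_XL on your setup, and the 576 x 320 base size comes from the model card rather than from this article.

```python
from typing import Callable, List, Tuple

Frames = List[object]  # placeholder type for decoded video frames

BASE_SIZE: Tuple[int, int] = (576, 320)       # zeroscope_v2_576w size (from the model card)
UPSCALED_SIZE: Tuple[int, int] = (1024, 576)  # zeroscope_v2_XL size mentioned above


def two_stage_video(
    prompt: str,
    generate: Callable[[str, int, int], Frames],
    upscale: Callable[[str, Frames, int, int], Frames],
) -> Frames:
    """Create a low-resolution draft first, then upscale that draft."""
    width, height = BASE_SIZE
    draft = generate(prompt, width, height)       # stage 1: fast draft
    width, height = UPSCALED_SIZE
    return upscale(prompt, draft, width, height)  # stage 2: higher resolution
```

Splitting the work this way lets you iterate on prompts with the smaller model and spend time on upscaling only for the drafts worth keeping.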
Check out Zeroscope (Free)
4. VideoCrafter
VideoCrafter is an AI toolkit for creating videos from text prompts, and it has been developed by Tencent. Unlike other AI video generation models, it can create videos of up to 8 seconds and supports different resolutions as well.
There are three different ways to use VideoCrafter: text-to-video generation, personalized AI video generation using LoRA, and controllable video generation. All three modes let you create AI videos from scratch. You can run VideoCrafter locally on your machine if you have a powerful GPU with at least 7GB of VRAM. Alternatively, there is a Hugging Face demo available online, which you can try below.
Check out VideoCrafter (Free)
5. Synthesia
Synthesia is an AI tool that you can use to create professional AI videos within a few minutes. You can use it to create tutorials, video documentation, presentations, sales pitches, and so much more. In that sense, it is not an AI video generator that uses your text prompt to create something from scratch. On Synthesia, you can choose from more than 140 diverse AI avatars and turn any text into speech in over 120 languages.
Basically, you don’t have to build a studio and buy expensive hardware to create professional videos. With Synthesia’s AI character and built-in text-to-speech tool, you can quickly start creating content. All you have to do is enter the video script.
Check out Synthesia (One free video, Paid plan starts at $22.50 per month)
6. Kaiber
Kaiber is not an AI video generator per se, but it can generate animations of subjects in different art forms. You can enter a text prompt, upload your own image, or upload a song, and its AI generation engine combines them to create captivating animations. You can also upload your videos and transform them into various styles and aesthetics.
The app is not entirely free, though. You get a 7-day free trial, but for that, you will have to add your card details and subscribe to its $5-per-month plan. Simply put, Kaiber is an AI tool that you should try out if you want to turn your images and videos into eye-catching visuals.
Check out Kaiber (7-day Free trial, Paid plan starts at $5 per month)
7. Wonder Studio
Wonder Studio is not an AI video generation tool for general consumers. Instead, it’s targeted at filmmakers and content creators. It allows you to automatically animate a computer-generated character into a live-action scene without having to apply VFX manually. Basically, it can automate 80 – 90% of the VFX and 3D work, and it works well. There is no need for complex 3D software or expensive hardware.
Wonder Studio can automatically detect the actor in a scene and apply the CG character frame by frame without heavy VFX work. So if you are a budding filmmaker who needs to get a lot of taxing VFX work done quickly, you should take a look at Wonder Studio.
Check out Wonder Studio (Request Access)
8. Google Imagen Video and Phenaki
Google has not released its text-to-video models to the public, but it has announced the models that the company is working on. The search giant is working on Imagen Video, which is based on Cascaded Diffusion models. It can generate high-definition videos in 1280 x 768 resolution at 24 fps.
Google is also working on Phenaki, a text-to-video model that can synthesize realistic videos from text prompts. Both models are under development, and we don’t know when a working AI video generator will be in our hands. However, you can read the research papers from the links below.
9. Meta’s Make-A-Video
Meta has also announced its Make-A-Video AI tool, which can generate videos from text. You can create realistic, surreal, and personalized videos using text, images, or video input. Meta’s model is capable of creating motion videos from a single image. You can also add multiple images as input, and it can fill in the motion between them to create dreamy videos.
According to Meta’s research paper, its video generation model has a 3x better representation of text input and better efficiency than other models. The project is again not open to the public, but you can sign up and request access from Meta.
10. Nvidia’s Latent Diffusion Model
Finally, Nvidia has announced its high-fidelity Video Latent Diffusion model, which can efficiently generate high-resolution videos from text prompts. It can generate videos at 1280 x 2048 resolution at 24 fps, which is impressive. Most of its videos have a length of 5 seconds, but it can also generate longer 5-minute videos at 512 x 1024 resolution. You can also add image inputs and create personalized AI videos.
In the video synthesis space, I think Nvidia will emerge as one of the key players in the future. Meanwhile, Nvidia has showcased multiple video demos on its website which you can check out below.