- At Google Cloud Next 2024, Google unveiled the Imagen 2 model, which can generate live images (short video clips) from text prompts.
- The model can generate short clips of up to four seconds that show motion and multiple camera angles.
- Currently, it's available to business customers on Vertex AI.
At Google Cloud Next 2024, the search giant unveiled new cloud hardware and released a slew of products, mostly aimed at enterprise customers. Among them, the Imagen 2 model stands out, as it can create short video clips of up to four seconds from text prompts.
Technically, it’s still a text-to-image model, but Google is calling it a text-to-live-image model. Unlike AI-generated videos that simply animate a static photo with a small degree of motion, Imagen 2 can show different camera angles while keeping the scene consistent.
That said, the model can only output video clips (aka live images) at a low resolution of 640 x 360. Google is pitching Imagen 2 to enterprise customers, including marketers and creatives, who can quickly generate short clips for ads, campaigns, and more.
Apart from that, Google is using its SynthID technique to apply an invisible watermark to AI-generated clips and images. The company says SynthID can withstand edits and even compression. Google has also filtered the image generation model for safety and bias.
It must be noted that Google recently came under fire after Gemini refused to generate images of white people. Following the incident, Google paused image generation of people in Gemini, and even two months later, the company has not lifted the restriction.
Meanwhile, Imagen 2 is now generally available on Vertex AI for enterprise customers. It also supports inpainting and outpainting: the ability to edit an image with AI, expand its borders, or add and remove parts of the scene. OpenAI also recently brought similar editing to DALL-E-generated images.
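For developers who want to try it, Imagen 2 is exposed through the Vertex AI SDK. Here is a minimal sketch of a text-to-image call using the Python SDK; the project ID, region, and the "imagegeneration@006" model version are my assumptions, so check the Vertex AI documentation for the identifier that currently maps to Imagen 2.

```python
# Minimal sketch: generating an image with Imagen 2 on Vertex AI.
# Assumes the google-cloud-aiplatform package is installed and you are
# authenticated against a Google Cloud project with Vertex AI enabled.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

# Placeholders: swap in your own project ID and region.
vertexai.init(project="your-project-id", location="us-central1")

# "imagegeneration@006" is assumed to be the Imagen 2 version here;
# verify the current model ID in the Vertex AI Model Garden.
model = ImageGenerationModel.from_pretrained("imagegeneration@006")

# Generate a single image from a text prompt and save it locally.
images = model.generate_images(
    prompt="A product shot of a steaming coffee cup on a wooden table",
    number_of_images=1,
)
images[0].save("coffee.png")
```

The same `ImageGenerationModel` class in the preview SDK also exposes an `edit_image` method that takes a base image and a mask, which is presumably how the new inpainting and outpainting features are surfaced to developers.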
While the Imagen 2 model can generate video clips of up to four seconds, I am not sure how it will compete with other text-to-video generators. Runway offers video generation of up to 18 seconds at a much higher resolution, and OpenAI recently introduced its groundbreaking Sora model. To compete with them, Google will have to come up with a far more powerful diffusion model.