
- Genie 3 is Google DeepMind’s latest world model; it generates dynamic worlds that users can navigate in real time.
- It can simulate a wide range of environments, not just game-like ones, at 720p, and the simulation stays interactive for several minutes.
- Genie 3 keeps the world visually consistent over a long horizon, with a visual memory that reaches back up to one minute.
Google DeepMind has announced Genie 3, a frontier world model that can generate interactive environments. Genie 3 is a general-purpose world model designed to generate dynamic worlds from text prompts and let users navigate within the simulated world. DeepMind is pitching it as a major breakthrough, describing world models as a “key stepping stone on the path to AGI.”
Genie 3 can generate interactive environments in real time at 24 FPS and 720p resolution while maintaining consistency for a few minutes. By comparison, Genie 2 could only generate environments at 360p that lasted 10 to 20 seconds, and it was limited to 3D environments. Genie 3 is not restricted to a single type of environment, and its worlds can span multiple minutes.
The environment also stays consistent throughout those minutes of simulation. Objects and locations remain the same even when users move away, explore elsewhere, and later bring them back into view. DeepMind says Genie 3’s visual memory extends to one minute, which lets the model refer back to what it rendered up to a minute earlier.
What is surprising is that this environmental consistency emerged naturally from training. DeepMind did not employ special methods such as NeRFs or Gaussian Splatting to keep the environment consistent. Instead, Genie 3 generates the world frame by frame based on the text description and the user’s actions, which makes its worlds far more dynamic and diverse.
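
To make the frame-by-frame generation and the one-minute visual memory described above more concrete, here is a minimal toy sketch in Python. It is not DeepMind’s implementation, and Genie 3’s architecture is not described in this article. The function names (`next_frame`, `simulate`) and the sliding-window design are assumptions made purely for illustration; only the 24 FPS and one-minute figures come from the article.

```python
from collections import deque

FPS = 24                # frame rate reported for Genie 3
MEMORY_SECONDS = 60     # visual memory reported as up to one minute
MEMORY_FRAMES = FPS * MEMORY_SECONDS  # 24 * 60 = 1440 past frames


def next_frame(context, prompt, action, step):
    """Hypothetical stand-in for the model.

    A real world model would return pixels; this toy version only
    records what the model could 'see' when producing the frame.
    """
    return {
        "step": step,
        "prompt": prompt,
        "action": action,
        "context_size": len(context),  # how many past frames were visible
    }


def simulate(prompt, actions, memory_frames=MEMORY_FRAMES):
    """Generate one frame per user action, autoregressively.

    Each new frame is conditioned on the prompt, the latest action,
    and a sliding window of the most recent frames (assumed design).
    """
    context = deque(maxlen=memory_frames)  # oldest frames drop off automatically
    frames = []
    for step, action in enumerate(actions):
        frame = next_frame(context, prompt, action, step)
        context.append(frame)
        frames.append(frame)
    return frames
```

Calling `simulate("a foggy harbor", ["forward"] * 5, memory_frames=3)` produces five frames whose visible context grows from zero to three past frames and then stays capped at three. With the default window, the cap is 1,440 frames, which is one minute of footage at 24 FPS.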
Users can also change the world with text-based instructions. You can change the weather, add new objects and characters, and switch the location. In my review of Veo 2, I mentioned that Google’s video generation model has far better visual coherence than other AI models out there. Veo 3 improved on that further, and now Genie 3 makes the generated world navigable.
If you are wondering what the use cases are for world models like Genie 3, one is generating interactive games from simple descriptions. Simply by prompting, users can generate an endless variety of game worlds and explore them. Microsoft has already showcased Muse, its World and Human Action Model (WHAM), generating Quake II gameplay sequences using AI.
Apart from that, Genie 3 could be helpful in robotics, where robots can be trained in a practically unlimited number of simulated scenarios. In fact, Google is already testing its SIMA agent in worlds generated by Genie 3. This could eventually help AI labs train robots to achieve goals in the real world.