Last month, we reported that Google was developing an AI agent in the form of a browser extension that can perform actions for you in the web browser. In today’s Gemini 2.0 announcement, Google finally unveiled Project Mariner, an early research prototype that explores the future of human-agent interaction, starting with the browser.

Powered by the latest Gemini 2.0 model, Project Mariner can understand what it sees on your browser screen and use that information to perform tasks for you. It can understand web elements like forms, text fields, code, images, and more. The browser extension that runs Project Mariner can type, scroll, and click in the active tab, but for sensitive tasks like purchasing something, it requires final confirmation from the user.
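To make the confirmation step concrete, here is a minimal sketch of how an agent might gate sensitive actions behind user approval. This is not Google’s implementation: the `Action` class, the `SENSITIVE_KINDS` set, and the `run_actions` function are hypothetical names chosen for illustration, and the only behavior taken from the announcement is that routine actions proceed while a purchase waits for the user.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical set of action kinds that need user approval.
# The announcement only names purchases as an example of a sensitive task.
SENSITIVE_KINDS = {"purchase"}


@dataclass
class Action:
    kind: str    # e.g. "type", "scroll", "click", "purchase"
    target: str  # a human-readable description of the page element


def run_actions(actions: List[Action], confirm: Callable[[Action], bool]) -> List[str]:
    """Perform routine actions directly; ask `confirm` before sensitive ones."""
    log = []
    for action in actions:
        if action.kind in SENSITIVE_KINDS and not confirm(action):
            # The user declined, so the agent skips this step.
            log.append(f"skipped {action.kind} on {action.target}")
            continue
        # A real agent would drive the browser here; this sketch only records it.
        log.append(f"performed {action.kind} on {action.target}")
    return log
```

In this sketch, `confirm` stands in for whatever prompt the extension would show the user. Passing a function that always returns `False` means routine clicks still go through while the purchase is skipped.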

Google says the early prototype is currently slow and not always accurate, but that it will improve rapidly over time. In a demo that Google showcased, Project Mariner remembers company names from a Google Sheet, browses the web, finds each company’s website, and extracts the contact details.

On the WebVoyager benchmark, which tests the agentic capability of models on real-world web tasks, Project Mariner achieved 83.5%, which is the highest score to date. Google says it’s working with trusted testers to improve Project Mariner, but there is no information on its release date.

As for Project Astra, which was announced at Google I/O 2024, Google says it can now understand multiple languages and use tools like Google Search, Maps, and Lens to deliver a better experience. Project Astra is also getting better at remembering things. It now has up to 10 minutes of in-session memory for improved personalization. Google has significantly reduced the latency too.
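As a rough illustration of what a 10-minute in-session memory could look like, the sketch below keeps only notes recorded within the last 600 seconds. Google has not described how Project Astra stores memory, so the `SessionMemory` class and its methods are hypothetical, and the sketch assumes timestamps arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, List, Tuple


class SessionMemory:
    """Hypothetical time-windowed memory: entries older than the window are dropped."""

    def __init__(self, window_seconds: float = 600.0):  # 600 s = 10 minutes
        self.window_seconds = window_seconds
        self._items: Deque[Tuple[float, str]] = deque()

    def remember(self, timestamp: float, note: str) -> None:
        # Assumes timestamps are non-decreasing, so the oldest entry is on the left.
        self._items.append((timestamp, note))
        self._evict(timestamp)

    def recall(self, now: float) -> List[str]:
        self._evict(now)
        return [note for _, note in self._items]

    def _evict(self, now: float) -> None:
        # Drop entries that fell out of the window.
        while self._items and now - self._items[0][0] > self.window_seconds:
            self._items.popleft()
```

A deque keeps eviction cheap because expired entries are always at the front. A production assistant would likely summarize or rank what it keeps, which this sketch does not attempt.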

Project Astra’s release date is unknown, but Google says its capabilities will be integrated into the Gemini app and other form factors like glasses.

Apart from that, Google also announced that it’s working with game developers to explore how its AI agents behave in games like Clash of Clans and Hay Day. Google’s Gemini 2.0-powered AI agents can see the screen and offer suggestions in real time. These AI agents can also use Google Search to offer gaming knowledge on the go.

Finally, Google introduced Jules, an AI code agent for developers that integrates directly into a GitHub workflow. It can find issues, develop a plan, and execute it under the developer’s supervision. You can find more details about Jules here.