Meta Releases Llama 3.2 Models with Vision Capability for the First Time

Image: Llama 3.2 models released by Meta (Image courtesy: Meta)
In Short
  • Meta has released the Llama 3.2 family of models, which includes the smaller text-only Llama 3.2 1B and 3B models for on-device tasks on phones and laptops.
  • The other two models are Llama 3.2 11B and 90B, which bring multimodality with vision capability and can analyze images as well.
  • You can start using Llama 3.2 11B and 90B vision models through the Meta AI chatbot on the web, WhatsApp, Facebook, Instagram, and Messenger.

At the Meta Connect 2024 event, Mark Zuckerberg announced the new Llama 3.2 family of models to take on rivals like OpenAI’s GPT-4o mini and Anthropic’s Claude 3 Haiku. Moreover, for the first time, the Llama 3.2 models come with multimodal image support.

Llama 3.2 Models Are Optimized for On-Device Tasks

First of all, Llama 3.2 includes two smaller text-only models, Llama 3.2 1B and 3B, meant for on-device tasks. Meta says these small models are optimized to run on mobile devices and laptops.

Llama 3.2 1B and 3B models are best suited for on-device summarization, instruction following, rewriting, and even function calling to create an action intent locally. Meta also claims that its latest Llama models outperform Google’s Gemma 2 2.6B and Microsoft’s Phi-3.5-mini.
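To give a sense of what an on-device task like summarization looks like for developers, here is a minimal sketch using the Hugging Face transformers library and the "meta-llama/Llama-3.2-1B-Instruct" checkpoint. The model ID, access requirements, and tooling are assumptions based on common usage, not details confirmed by Meta's announcement:

```python
# Minimal sketch: local summarization with a small Llama 3.2 model.
# Assumes the Hugging Face `transformers` library and access to the
# "meta-llama/Llama-3.2-1B-Instruct" checkpoint (model ID is an assumption).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Summarize in one sentence: the team meeting moved to "
                "3 PM Friday, and everyone should bring their Q3 slides."},
]

result = generator(messages, max_new_tokens=60)
# The pipeline returns the full chat, with the model's reply appended last.
print(result[0]["generated_text"][-1]["content"])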

Image: Llama Stack distribution (Image courtesy: Meta)

Developers can deploy these models on Qualcomm and MediaTek hardware to power a wide range of on-device AI use cases. Meta further says the Llama 3.2 1B and 3B models were pruned and distilled from the larger Llama 3.1 8B and 70B models.

Llama 3.2 Models Give Vision to Meta AI

Now, coming to the exciting vision models: they come in larger sizes, Llama 3.2 11B and Llama 3.2 90B, and they replace the older text-only Llama 3.1 8B and 70B models. Meta goes on to say that the Llama 3.2 11B and 90B models rival closed models like Anthropic’s Claude 3 Haiku and OpenAI’s GPT-4o mini at visual reasoning.
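For developers who want to try the vision models outside the Meta AI apps, here is a hedged sketch of asking a question about an image, again assuming the Hugging Face transformers library and the "meta-llama/Llama-3.2-11B-Vision-Instruct" checkpoint. The model ID, processor calls, and file name below are assumptions based on typical transformers usage, not details from Meta's announcement:

```python
# Minimal sketch: asking a question about an image with Llama 3.2 Vision.
# Assumes `transformers`, `torch`, `Pillow`, and access to the
# "meta-llama/Llama-3.2-11B-Vision-Instruct" checkpoint (an assumption).
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("sales_chart.png")  # hypothetical local chart image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show?"},
    ]},
]

# Build the chat prompt, pair it with the image, and generate an answer.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))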

These new Llama 3.2 11B and 90B vision models will be available through the Meta AI chatbot on the web, WhatsApp, Instagram, Facebook, and Messenger. Since they are vision models, you can upload images and ask questions about them. For example, you can upload an image of a recipe, and Meta AI can analyze it and give you instructions on how to make the dish. You can also have Meta AI capture your face and reimagine you in a variety of scenarios and portraits.

The vision models also come in handy for understanding charts and graphs. On social media apps like Instagram and WhatsApp, the vision models can generate captions for you as well.

Overall, this is the first time Meta has released openly available multimodal models. It is going to be exciting to test the vision models against the competition.
