Microsoft Releases a Small Phi-3 Vision Multimodal Model


Image Courtesy: Microsoft

In Short

At Microsoft Build 2024, the company released an open-source multimodal model, Phi-3 Vision.
Phi-3 Vision has 4.2B parameters and a context length of 128K tokens.
Microsoft also released Phi-3 Small (7B) and Phi-3 Medium (14B) models at the event.

Back in April, Microsoft released the first model in its open-source Phi-3 family: Phi-3 Mini. Now, almost a month later, the Redmond giant has released a small multimodal model called Phi-3 Vision. At Build 2024, Microsoft also unveiled two more Phi-3 family models: Phi-3 Small (7B) and Phi-3 Medium (14B). All of these models are open-source under the MIT license.

As for the Phi-3 Vision model, it has 4.2 billion parameters, which makes it fairly lightweight. This is the first time a tech giant like Microsoft has open-sourced a conversational multimodal model. It has a context length of 128K tokens and accepts images as well as text. Google did release the PaliGemma model, but it's not meant for conversational use.


As for training data, Microsoft says that the Phi-3 Vision model was trained on publicly available, high-quality educational and code data. Microsoft also generated synthetic data covering math, reasoning, general knowledge, charts, tables, diagrams, and slides.

Phi-3 Vision model benchmarks. Image Courtesy: Microsoft

Despite its small size, the Phi-3 Vision model outperforms Claude 3 Haiku, LLaVA, and Gemini 1.0 Pro on many multimodal benchmarks. It even comes close to OpenAI's GPT-4V model. Microsoft says that developers can use Phi-3 Vision for OCR, chart and table understanding, general image understanding, and more.

If you want to check out the Phi-3 Vision model, head over to Azure AI Studio.


Arjun Sha

Passionate about Windows, ChromeOS, Android, security, and privacy issues. Has a penchant for solving everyday computing problems.
