
- Microsoft has released Phi-4 reasoning AI models, which come in 14B and 3.8B parameter sizes.
- Despite their small size, Phi-4 reasoning models rival much larger models like DeepSeek R1 and o3-mini.
- Microsoft says Phi-4 reasoning models can run on Windows Copilot+ PCs, thanks to their small size.
Microsoft has launched three new AI reasoning models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. These are small language models designed for edge devices like Windows PCs and mobile devices. The Phi-4-reasoning AI model has 14 billion parameters and can perform complex reasoning tasks.
The Phi-4-reasoning-plus model builds on the same base model but applies more inference-time compute, generating nearly 1.5x more tokens than Phi-4-reasoning to deliver higher accuracy. Despite being much smaller in size, Phi-4 reasoning models rival larger models such as DeepSeek R1 671B and o3-mini.
In the GPQA benchmark, the Phi-4-reasoning-plus-14B model achieves 69.3% while o3-mini scores 77.7%. Next, in the AIME 2025 test, Phi-4-reasoning-plus-14B gets 78% and o3-mini achieves 82.5%. This shows that Microsoft’s small model comes very close to flagship reasoning models that are much larger in size.
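To put those gaps in perspective, here is a quick calculation using the benchmark figures quoted above; the percentage-point differences are computed purely for illustration:

```python
# Benchmark scores (%) reported for Phi-4-reasoning-plus (14B) vs. OpenAI o3-mini.
scores = {
    "GPQA": {"phi4_plus": 69.3, "o3_mini": 77.7},
    "AIME 2025": {"phi4_plus": 78.0, "o3_mini": 82.5},
}

for bench, s in scores.items():
    gap = round(s["o3_mini"] - s["phi4_plus"], 1)
    print(f"{bench}: Phi-4-reasoning-plus trails o3-mini by {gap} points")
```

On these two tests, the 14B model lands within roughly 8.4 and 4.5 percentage points of o3-mini, which is the basis for the "comes very close" claim.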
Microsoft says Phi-4 reasoning models are trained via supervised fine-tuning “on carefully curated reasoning demonstrations from OpenAI o3-mini.” Further, Microsoft writes, “The model demonstrates that meticulous data curation and high-quality synthetic datasets allow smaller models to compete with larger counterparts.”
Apart from that, the smaller Phi-4-mini-reasoning model, with just 3.8B parameters, outperforms many 7B and 8B models. In benchmarks like AIME 24, MATH 500, and GPQA Diamond, the Phi-4-mini-reasoning-3.8B model delivers competitive scores, nearly matching o1-mini. The Phi-4-mini model has been “fine-tuned with synthetic data generated by Deepseek-R1 model.”
Microsoft’s Phi models already run locally on Windows Copilot+ PCs, leveraging the built-in NPU. It will be interesting to see how the Phi-4 reasoning models improve on-device AI performance.