- An NPU is a dedicated hardware accelerator designed to perform AI operations faster and more efficiently than CPUs and GPUs.
- NPU cores are built specifically to perform matrix multiplication at massive scale and to run many operations in parallel.
- NPUs now handle many AI tasks on Windows, macOS, iOS, iPadOS, and Android, freeing up the CPU and GPU and saving battery life.
After the CPU and GPU, the NPU is the new rage, and every company out there is harnessing its power to deliver generative AI features and experiences. Copilot+ PCs come with Qualcomm's Snapdragon X series processors that pack a powerful NPU, delivering up to 45 TOPS (trillion operations per second).
Apple's latest Neural Engine, its version of an NPU, can deliver up to 38 TOPS. Intel and AMD are also poised to release their next-gen NPUs for the Lunar Lake (48 TOPS) and Strix Point (50 TOPS) platforms. With so much hype around NPUs, the obvious question is: what is an NPU, and what exactly does it do? To answer all your questions, here is an explainer on NPUs.
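To get a feel for what those TOPS numbers mean, here is a back-of-the-envelope calculation. The matrix size and the assumption that every operation counts toward the TOPS rating are illustrative simplifications, not a benchmark of any real chip:

```python
# Rough arithmetic: how long would one large matrix multiplication
# take on a 45 TOPS NPU? (Illustrative figures, not a real benchmark.)

n = 4096                  # multiply two hypothetical 4096 x 4096 matrices
ops = 2 * n ** 3          # ~2*N^3 multiply-accumulate operations
tops = 45e12              # 45 trillion operations per second

seconds = ops / tops
print(f"{ops / 1e9:.0f} billion ops -> {seconds * 1e3:.2f} ms at 45 TOPS")
```

In other words, a multiplication involving well over a hundred billion operations finishes in a few milliseconds, which is why these chips can run AI features in real time.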
What is an NPU?
NPU stands for Neural Processing Unit, and it's designed specifically to perform AI-related tasks. By AI-related tasks, I mean running neural networks: machine learning inference and other AI workloads.
AI tasks boil down to certain types of mathematical calculations, the most common being matrix multiplication (also called 'matmul'). NPUs are designed to perform these matrix multiplication operations extremely fast.
In addition, parallel processing is of utmost importance for any AI task, since neural networks process many operations simultaneously. So NPUs have specialized accelerators that unlock parallelism at a large scale. Coupled with high-bandwidth memory, NPUs can quickly perform parallel matmul operations across many cores.
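To see why matmul parallelizes so well, here is a minimal sketch of the operation in plain Python. Note how each output cell is an independent dot product, so an NPU can hand each one to a different core:

```python
# A minimal sketch of the 'matmul' operation NPUs accelerate.
# Each output cell is an independent dot product of a row of `a`
# and a column of `b`, which is why the work spreads so naturally
# across many small cores running in parallel.

def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

A real NPU does this in dedicated hardware on huge matrices of low-precision numbers, but the structure of the work is the same.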
So to sum up, NPUs are designed specifically for AI tasks, with a focus on unlocking parallelism, executing matmul operations extremely fast, and enabling scalability. Keep in mind that different companies use different names for the NPU: Google calls its version the TPU (Tensor Processing Unit), and Apple calls its version the Neural Engine.
How Does NPU Differ From CPU and GPU?
As I said above, NPUs cater specifically to AI-related tasks, which makes them application-specific processors. The CPU, by contrast, is a general-purpose processor that has to handle a wide variety of tasks.
For example, CPUs run the operating system and general applications; their strength is that they can handle almost anything you throw at them. CPUs are very good at single-threaded tasks, but they are inefficient at highly parallel ones.
Next, the GPU is purpose-built for rendering graphics, which makes it great at powering games and creating simulations. GPUs are the closest relatives of NPUs because they also execute tasks in parallel. That's why GPUs can run AI workloads too, and they are widely used for training AI models. That said, since NPUs are designed solely for AI operations, they can beat GPUs in speed and power efficiency for those workloads.
Keep in mind that in the early days of computing, when there was no GPU or NPU, the CPU handled graphics entirely through software rendering. As technology advanced in the 1990s, GPUs arrived to handle graphics in dedicated hardware. And now, we are witnessing the age of NPUs.
All these compute units are developed for specialized tasks so that the CPU is not stressed by every kind of workload, which leads to better efficiency and performance. While NPUs are getting popular, bear in mind that GPUs are still used extensively for training AI models, whereas NPUs are now widely used for inference. That said, Google trained its Gemini model entirely on its TPUs.
What are the Applications of NPU in Laptops?
NPUs, or specialized AI hardware accelerators, were initially used by large companies for parallel processing. Now, however, consumer products like laptops and smartphones ship with NPUs. For example, Microsoft's new Copilot+ PCs come with a powerful NPU that can power features like Recall, which has been delayed for now but is expected in the coming months.
Recall takes screenshots of your screen, processes the data on the device using the NPU, and builds a vector index. If the CPU or GPU had to process that data, it would drain the battery. With a dedicated NPU, the same AI operations run efficiently without hurting battery life or stressing the CPU and GPU.
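To make "vector index" concrete, here is a hedged sketch of the idea: each screenshot is represented as an embedding vector, and a search query is matched by cosine similarity. The embedding values and screenshot names below are made up for illustration; a real system like Recall uses learned embeddings and an optimized index, not a Python dictionary:

```python
import math

# Cosine similarity: how closely two embedding vectors point
# in the same direction (1.0 = identical direction).
def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical vector index: screenshot name -> made-up embedding.
index = {
    "screenshot_001": [0.9, 0.1, 0.0],
    "screenshot_002": [0.1, 0.8, 0.3],
    "screenshot_003": [0.0, 0.2, 0.9],
}

# A query embedding (also made up) is matched against the index.
query = [0.85, 0.15, 0.05]
best = max(index, key=lambda name: cosine(query, index[name]))
print(best)  # screenshot_001
```

Producing those embeddings for every screenshot is exactly the kind of repetitive neural-network inference that an NPU can run continuously without draining the battery.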
Similarly, dedicated NPUs power features like Cocreator in MS Paint, image generation in the Photos app, background removal in video clips, Magic Mask visual effects in DaVinci Resolve, frame upscaling in games, Windows Studio Effects, real-time translation and transcription, and much more.
Over time, the applications of NPUs will only get wider. Offloading such tasks from the CPU and GPU makes devices faster and more battery-efficient.
Apple, on the other hand, uses its Neural Engine to power many Apple Intelligence features on iOS, iPadOS, and macOS. The on-device AI model uses the Neural Engine to summarize emails, prioritize notifications, summarize call recordings, generate images, and more. The new Siri also uses the Neural Engine to process many AI tasks.
Simply put, the NPU is a new class of hardware accelerator that unlocks fresh possibilities in the AI age. This is just the beginning, and new NPU-powered applications and experiences will arrive in the near future.