Apple Achieves Breakthrough for Running LLMs on iPhone

In Short
  • Apple's latest research paper suggests that the company is readying a ton of new AI-powered features for the iPhone.
  • Researchers claim they have found a breakthrough in deploying large language models (LLMs) on iPhones and other Apple devices with limited memory.
  • This technology will also set the stage for iPhones to run sophisticated on-device AI chatbots and assistants which Apple is said to be working on.

Apple was caught somewhat off-guard when generative AI began to take off. However, the Cupertino tech giant is believed to be working on its own LLMs and is aiming to integrate the technology more broadly into upcoming versions of iOS and Siri.

Apple AI researchers claim they’ve made a significant breakthrough in running Large Language Models (LLMs) on iPhones and other Apple devices with limited memory by introducing an ingenious flash-memory technique.

The research paper, titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory,” was released on December 12, 2023, but gained wider attention this Wednesday when it was highlighted on Hugging Face, a popular platform where AI researchers share their work. It is Apple’s second research paper on generative AI this month and follows earlier work that enables image-generating models, like Stable Diffusion, to run on its custom chips.

LLMs on iPhones


Until this breakthrough, running large language models on devices with limited memory was considered impractical, because LLMs need large amounts of RAM to hold their parameters and run memory-intensive computations. To get around this, Apple’s researchers devised a way to keep the model’s data on flash memory, the secondary storage that normally holds images, documents, and apps.

The researchers say the method “tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM.”

In other words, the entire LLM is still stored on the device, but it runs by treating flash storage as a form of virtual memory, streaming parameters into RAM only when they are needed. That is not much different from how macOS swaps memory to disk for tasks that require a lot of memory.
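To make the streaming idea concrete, here is a minimal Python sketch of the on-demand loading pattern, not Apple’s actual implementation. The file name model_weights.bin and the layer layout are hypothetical; the point is that NumPy’s memory mapping lets the operating system fault in from flash only the pages a layer actually touches:

```python
import numpy as np

NUM_LAYERS, DIM = 4, 8  # toy sizes; a real model would be far larger

# Create a small stand-in weight file on disk (our "flash") for the demo.
np.random.rand(NUM_LAYERS, DIM, DIM).astype(np.float16).tofile("model_weights.bin")

# mode="r" maps the file read-only; pages are faulted in from flash on access,
# so only the layers we touch ever occupy DRAM.
weights = np.memmap("model_weights.bin", dtype=np.float16,
                    mode="r", shape=(NUM_LAYERS, DIM, DIM))

def layer_forward(x, layer):
    # Indexing weights[layer] pulls just that layer's pages into memory.
    w = np.asarray(weights[layer], dtype=np.float32)
    return x @ w

x = np.random.rand(1, DIM).astype(np.float32)
for layer in range(NUM_LAYERS):
    x = layer_forward(x, layer)
print(x.shape)  # (1, 8)
```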

In simple terms, Apple’s researchers cleverly sidestepped the limitation by using two techniques that minimize data transfer and maximize flash-memory throughput:

Windowing: Think of this as a way to recycle data. Instead of loading fresh data for every token, the model reuses parameters it loaded while processing recent tokens, so it has to fetch far less from flash, making the process quicker and smoother (see the sketch below).
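As a purely illustrative sketch of that sliding-window reuse, the NeuronCache class and its load_from_flash helper below are hypothetical stand-ins, not Apple’s code. The cache keeps weights for neurons active in the last few tokens resident in DRAM and reads only the newly needed ones from flash:

```python
WINDOW = 5  # number of recent tokens whose active neurons stay cached

class NeuronCache:
    def __init__(self):
        self.cache = {}    # neuron id -> weights currently resident in DRAM
        self.history = []  # active-neuron sets for the last WINDOW tokens

    def load_from_flash(self, neuron_id):
        # Stand-in for an expensive flash read of one neuron's weight row.
        return f"weights[{neuron_id}]"

    def step(self, active_neurons):
        # Fetch only the neurons that are not already resident.
        for n in active_neurons - self.cache.keys():
            self.cache[n] = self.load_from_flash(n)
        # Slide the window and evict neurons unused for WINDOW tokens.
        self.history.append(set(active_neurons))
        if len(self.history) > WINDOW:
            self.history.pop(0)
            keep = set().union(*self.history)
            self.cache = {n: w for n, w in self.cache.items() if n in keep}
        return [self.cache[n] for n in active_neurons]

cache = NeuronCache()
cache.step({1, 2, 3})  # three flash reads
cache.step({2, 3, 4})  # only neuron 4 is read from flash; 2 and 3 are reused
```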

Row-Column Bundling: This technique is similar to reading text in larger chunks rather than one word at a time. When related data is grouped so it can be read from flash in bigger, contiguous blocks, throughput rises, increasing how quickly the AI can comprehend and generate language (a sketch follows below).
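Again illustrative rather than Apple’s actual layout: one way to bundle data is to store the up-projection row and down-projection column that belong to the same feed-forward neuron as a single contiguous record, so one sequential read recovers both:

```python
import numpy as np

d_model, d_ff = 8, 16
up = np.random.rand(d_ff, d_model).astype(np.float16)    # up-projection rows
down = np.random.rand(d_model, d_ff).astype(np.float16)  # down-projection columns

# bundle[i] = [up row i | down column i], stored contiguously on flash.
bundle = np.concatenate([up, down.T], axis=1)  # shape (d_ff, 2 * d_model)

def read_neuron(i):
    # A single contiguous read recovers both halves of neuron i.
    record = bundle[i]
    return record[:d_model], record[d_model:]

up_row, down_col = read_neuron(3)
assert np.array_equal(up_row, up[3])
assert np.array_equal(down_col, down[:, 3])
```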

The research paper reports that, combined, these techniques allow devices to run AI models up to twice the size of their available memory (DRAM). The method is said to make inference roughly 4-5 times faster on conventional processors (CPUs) and 20-25 times faster on graphics processors (GPUs) than naive loading from flash.

AI on iPhone

This advance in AI efficiency opens up new possibilities for future iPhones, including more sophisticated Siri capabilities, real-time language translation, and advanced AI-driven features for photography and augmented reality. It also sets the stage for iPhones to run the sophisticated on-device AI chatbots and assistants Apple is said to be working on.
