Siri Is Not Gemini: Apple’s 20B On-Device Model

Editor J

Contrary to headlines that Gemini powers the new Siri, Apple's own docs reveal its on-device brain is a homegrown 20B model: AFM 3 Core Advanced.

Before Apple took the stage at WWDC 2026 on June 8 to unveil a redesigned Siri, the industry had coalesced around a single narrative: the virtual assistant's new engine would be Google Gemini. Reports even specified the details: an annual licensing fee of approximately $1 billion and a model size of 1.2 trillion parameters.

However, Apple's release of its own model documentation that same week painted a different picture. The intelligence driving Siri on-device is not Gemini. Instead, it is a proprietary 20-billion-parameter model designed entirely by Apple.

The Initial Headlines: Gemini Running Siri

This narrative originated with Bloomberg. Mark Gurman reported last November that Apple planned to integrate a custom 1.2-trillion-parameter Google Gemini model into the redesigned Siri. At that size, the model would be eight times larger than the 150-billion-parameter proprietary cloud model Apple had previously operated.

The two companies formalized a multi-year partnership in a joint statement on January 12, 2026. Google announced that the next generation of Apple Foundation Models would leverage Google Gemini and cloud infrastructure, while reports indicated Apple would pay approximately $1 billion annually for the access. With the timing, financials, and official messaging aligned, the market concluded that Gemini would indeed serve as Siri's backbone. Pre-WWDC reporting reinforced this expectation.

The Real Brain: Apple's 20B On-Device Model

However, Apple's third-generation foundation model documentation, published in June, did not bear this assumption out. Apple described a family of five models 'custom-built in collaboration with Google,' but notably stopped short of referencing Gemini in relation to Siri.

The Apple Foundation Models (AFM) logo

At the core of this on-device intelligence is the AFM 3 Core Advanced model. It contains 20 billion parameters but uses a sparse architecture, built on Apple's proprietary Instruction-Following Pruning (IFP) technique, that activates only 1 to 4 billion of them per request. Because the full model weights stay in flash memory and only the expert modules required for a given prompt are loaded into DRAM, a model that exceeds typical device memory constraints can run directly on the phone.
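
To make the memory argument concrete, the following is a rough back-of-the-envelope sketch, not Apple's method. It compares the storage needed for all 20 billion weights with the storage needed for the 1 to 4 billion that are active per request. The 4-bit weight size and the helper name `weight_footprint_gb` are assumptions made for illustration only, since the article does not state how the weights are quantized. Activations, caches, and the operating system's own memory use are ignored.

```python
# Rough estimate of weight storage: full model vs. active subset.
# ASSUMPTION: 4 bits per weight. The article does not state the
# quantization Apple uses; change BITS_PER_WEIGHT to explore others.
BITS_PER_WEIGHT = 4

TOTAL_PARAMS = 20e9        # full model size, from the article
ACTIVE_PARAMS_LOW = 1e9    # lower bound of active parameters per request
ACTIVE_PARAMS_HIGH = 4e9   # upper bound of active parameters per request


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Return storage in decimal gigabytes for `params` weights."""
    bytes_needed = params * bits_per_weight / 8
    return bytes_needed / 1e9


full_gb = weight_footprint_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
active_low_gb = weight_footprint_gb(ACTIVE_PARAMS_LOW, BITS_PER_WEIGHT)
active_high_gb = weight_footprint_gb(ACTIVE_PARAMS_HIGH, BITS_PER_WEIGHT)

print(f"Full weights (kept in flash): {full_gb:.1f} GB")
print(f"Active weights (loaded into DRAM): "
      f"{active_low_gb:.1f} to {active_high_gb:.1f} GB")
```

Under the 4-bit assumption, the full weights come to about 10 GB while the active subset needs roughly 0.5 to 2 GB, which illustrates why loading only the required modules into DRAM matters on a memory-limited device.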

This model primarily powers features such as expressive voice generation and significantly more accurate dictation. According to Apple's internal evaluations, the new speech synthesis engine scored 4.15 out of 5, outperforming the previous system's score of 3.87, with an even wider performance gap observed on conversational text. However, these features are restricted to newer hardware equipped with at least 12GB of memory, including the iPhone 17 Pro, iPhone Air, and M4-equipped iPads.

What the Billion-Dollar Deal Actually Buys

Where, then, does Apple's annual $1 billion payment to Google go? Based on Apple's disclosures, the value lies not in the final model architecture itself, but in the infrastructure used to train it.

Apple disclosed that all its models were pre-trained using Google Cloud's latest TPU clusters. Furthermore, its largest server-based model, AFM 3 Cloud Pro, is deployed via Private Cloud Compute, which has been extended—in partnership with Google and NVIDIA—to run on NVIDIA GPUs hosted within Google Cloud.

Clearly, a significant gap exists between the popular headline that 'Gemini runs Siri' and Apple's more nuanced description of a suite 'custom-built in collaboration with Google.' In fact, the name Gemini does not appear once in Apple's official technical writeup. While the on-device intelligence is powered by a proprietary model designed by Apple, Google's involvement is concentrated in model training, server infrastructure, and collaborative development.

Ultimately, the redesigned Siri occupies a middle ground between an assistant powered by licensed technology and one built entirely in-house. The on-device Siri that users interact with daily is powered by Apple's proprietary model, and only the most complex queries are routed to server-based models running on Google's infrastructure. Summarizing this architecture under the umbrella of 'Gemini' oversimplifies a far more layered engineering design.
