Key Takeaways
- Apple Intelligence is a three-tier hybrid: on-device AFM Core (pure Apple, zero Google), Private Cloud Compute (Apple servers), and AFM 3 Cloud Pro (Nvidia GPUs in Google Cloud, refined with Gemini).
- Your iPhone's on-device model is pure Apple: AFM Core / AFM 3 Core Advanced is 20B sparse, activating 1–4B params per prompt via Instruction-Following Pruning.
- Gemini is a teacher signal, not the runtime: Apple's cloud model was refined using Gemini outputs; Gemini itself is not running on your device.
- Self-hosted local LLMs give you control that Apple cannot: open weights, any quantization, any tool, fully offline, model-swappable.
- WWDC 2026 (8 June, Tim Cook's final keynote): six OS betas, dedicated Siri app with iCloud history, homeOS preview for HomePad.
- EU/GDPR: on-device = data residency by default; Cloud Pro routes to Google Cloud (US), raising Chapter V transfer questions.
What Apple Announced at WWDC 2026
WWDC 2026 opened on 8 June with Tim Cook's final keynote as CEO. The headline was a rebuilt AI strategy: a new dedicated Siri app with iCloud-synced conversation history, six OS releases in beta (iOS 27, iPadOS 27, macOS 27, watchOS 27, tvOS 27, and visionOS 27, with full launch targeted for fall 2026), and a homeOS developer preview for a forthcoming HomePad smart-home hub.
The AI layer is called Apple Intelligence, now co-developed with Google using Gemini technology. The on-device models (AFM Core / AFM 3 Core Advanced) are Apple's own. The cloud model (AFM 3 Cloud Pro) is refined using Gemini outputs and runs on Nvidia GPUs in Google Cloud.
In One Sentence
At WWDC 2026, Apple announced Apple Intelligence as a three-tier hybrid: on-device AFM models (pure Apple), Private Cloud Compute (Apple servers), and AFM 3 Cloud Pro on Nvidia GPUs in Google Cloud (refined with Gemini).
In Plain Terms
Apple Intelligence is Apple's AI system, and only part of it runs on your device. Simple tasks (dictation, quick replies) run entirely on your iPhone's chip and never leave your device. Harder tasks can go to Apple-run cloud servers. The most complex reasoning goes to Google Cloud servers running an Apple model that was refined partly with outputs from Google's Gemini.
The Three-Tier Architecture: What Runs Where
Apple Intelligence routes each task through one of three tiers depending on complexity. Which tier a task hits determines the privacy story.
| Tier | Where it runs | What it handles | Touches Google? |
|---|---|---|---|
| On-device | Apple Silicon chip (AFM Core / AFM 3 Core Advanced) | Dictation, on-screen awareness, personal-context lookups, quick tasks | No: pure Apple. Zero Google code, Gemini, or Search involvement |
| Private Cloud Compute (PCC) | Apple Silicon servers (attested, code-audited) | Medium tasks needing more compute than the device provides | No: no third-party data access |
| Cloud Pro | Nvidia GPUs in Google Cloud (AFM 3 Cloud Pro) | Heaviest world-knowledge tasks and complex reasoning | Yes: Google Cloud infrastructure; model refined using Gemini outputs |
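
The table above can be expressed as a small lookup structure, which is handy if you want to reason about the privacy properties of each tier in your own tooling. This is a minimal sketch: the `Tier` class, the dictionary keys, and `privacy_summary` are hypothetical names invented for illustration, and nothing here reflects an Apple API or Apple's actual routing logic, which is not public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    runs_on: str
    leaves_device: bool   # does the request leave the user's device?
    touches_google: bool  # does it run on Google Cloud infrastructure?


# Values transcribed from the table above; the keys are illustrative only.
TIERS = {
    "on_device": Tier("On-device", "Apple Silicon chip", False, False),
    "pcc": Tier("Private Cloud Compute", "Apple Silicon servers", True, False),
    "cloud_pro": Tier("Cloud Pro", "Nvidia GPUs in Google Cloud", True, True),
}


def privacy_summary(tier_key: str) -> str:
    """Describe where a request handled by the given tier ends up."""
    tier = TIERS[tier_key]
    if not tier.leaves_device:
        return f"{tier.name}: stays on the device"
    if tier.touches_google:
        return f"{tier.name}: leaves the device, runs on Google Cloud infrastructure"
    return f"{tier.name}: leaves the device, stays on Apple servers"
```

Calling `privacy_summary("pcc")` returns the Apple-servers description. Note that the sketch only describes a tier once you know which one handled the request; it cannot tell you which tier a real request will hit.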
Gemini Is a Teacher, Not the Runtime
The most misunderstood part of WWDC 2026 is the Google relationship. Apple distinguishes 'trained using Gemini' from 'is Gemini'. The on-device models, AFM Core and AFM 3 Core Advanced, are Apple's own and have no Google involvement. Your on-device interactions never go to Google.
The cloud model (AFM 3 Cloud Pro) is different. It runs on Nvidia GPUs in Google Cloud. Apple states the model was refined using Gemini outputs, a knowledge-distillation step in which Gemini's outputs served as the training signal. The result is Apple's own model, but hosted on Google infrastructure.
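To make "teacher signal" concrete, here is a minimal sketch of output-level (sequence-level) distillation: the teacher generates text, and the student is scored on how much probability it assigns to each teacher token. This is the generic textbook technique, not Apple's procedure, which the source does not describe beyond "refined using Gemini outputs". The function name and the toy inputs are assumptions for illustration.

```python
import math
from typing import Dict, List


def distillation_loss(
    teacher_tokens: List[str],
    student_probs: List[Dict[str, float]],
) -> float:
    """Mean negative log-likelihood the student assigns to the teacher's tokens.

    teacher_tokens: the token sequence produced by the teacher model.
    student_probs:  for each position, the student's predicted distribution
                    over next tokens (token -> probability).
    """
    if not teacher_tokens:
        raise ValueError("teacher_tokens must not be empty")
    if len(teacher_tokens) != len(student_probs):
        raise ValueError("one student distribution is needed per teacher token")
    eps = 1e-12  # floor to avoid log(0) when the student gives a token no mass
    losses = [
        -math.log(max(dist.get(token, 0.0), eps))
        for token, dist in zip(teacher_tokens, student_probs)
    ]
    return sum(losses) / len(losses)
```

Training lowers this loss, pulling the student's predictions toward the teacher's outputs. The teacher is only needed to produce the training text, which is why a distilled model can run without the teacher present at inference time.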
Reported (unconfirmed): the partnership is worth approximately $1B/year; the cloud model is reportedly around 1.2T parameters. Apple reportedly attempted its own Private Cloud Compute hardware for heavy tasks first but found it too slow, leading to the Google Cloud arrangement.
In One Sentence
Gemini trained Apple's AFM 3 Cloud Pro via knowledge distillation; the on-device Apple models have no Google involvement and your iPhone interactions never go to Google.
Apple's On-Device Model vs a Self-Hosted Local LLM
Apple's on-device model and a self-hosted open-weight LLM both run inference on local hardware, but the differences are significant:
| | Apple AFM 3 Core Advanced (on-device) | Self-hosted local LLM (Qwen / Llama / Gemma) |
|---|---|---|
| Model size | 20B sparse; activates 1–4B params/prompt (Instruction-Following Pruning) | Your choice: 3B–70B+ |
| Control | Locked to Apple OS; not user-swappable | Full: any model, any quantization, any tool |
| Offline capability | On-device tier offline; heavy tasks route to cloud | Fully offline if you choose |
| Privacy | Strong for on-device tier; cloud tiers process your request | Absolute: nothing leaves your machine |
| Openness | Closed weights; Apple-only ecosystem | Open weights; inspectable and fine-tunable |
| Model cutoff / updates | Apple controls release schedule | You choose when to update or swap |
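
The model-size and quantization rows come down to simple arithmetic: weight storage is roughly parameter count times bits per weight, divided by eight. The sketch below works that out for the sizes in the table. It is a rough estimate under stated assumptions: it counts weights only (no KV cache or runtime overhead), the function names are illustrative, and the source does not say what quantization Apple uses, so any bit width applied to the 20B figure is a hypothetical.

```python
def weight_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB: parameters x bits / 8 bytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of a sparse model's parameters used for a single prompt."""
    return active_billion / total_billion


# Example calls (bit widths are assumptions, not Apple-published figures):
#   weight_gib(20, 4)      -> weights of a 20B model at 4 bits per weight
#   weight_gib(70, 4)      -> a 70B self-hosted model at the same precision
#   active_fraction(1, 20) -> lower bound of the 1-4B active range
#   active_fraction(4, 20) -> upper bound of the 1-4B active range
```

With 1–4B of 20B parameters active, between 5% and 20% of the weights take part in any one prompt. That reduces compute per token, though the full set of weights still has to be stored somewhere on the device.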
What It Means for Users: Privacy in Practice
The practical question: does my data stay on my device? The answer depends entirely on which tier handles the task. Apple provides some transparency, but you cannot directly observe which tier fires for any given request.
| What you ask | Which tier? | Leaves device? | Touches Google Cloud? |
|---|---|---|---|
| Dictation, set a timer, quick reply | On-device | No | No |
| Summarize a long email thread | PCC or Cloud Pro | Yes | Possibly (Cloud Pro) |
| Complex research or creative writing | Cloud Pro | Yes | Yes |
| Self-hosted LLM via Ollama | Your machine | Never | Never |
Keep medical notes, legal documents, and confidential business data off Apple Intelligence if you cannot guarantee the on-device tier. For verified data residency, self-hosted local LLMs remain the only confirmed option.
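For the self-hosted route, a request to a local Ollama server looks roughly like the sketch below. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model tag `llama3.2` and the helper names `build_request` and `ask_local` are assumptions for illustration. Only the Python standard library is used, and the request targets `localhost`, so the prompt does not leave the machine.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama server with the model pulled):
#   print(ask_local("Summarize this note in one sentence: ..."))
```

Because the endpoint is a loopback address, you can confirm the data-residency claim yourself by disconnecting from the network and checking that the call still succeeds.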
What It Means for Developers and Companies
The developer story from WWDC 2026 is less about model quality and more about surface area. Apple is extending App Intents so Apple Intelligence can call into third-party apps, but only through explicitly declared actions and data structures. Siri won't scrape your UI; it calls declared intents.
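The "declared actions only" idea can be illustrated with a small registry. This is a conceptual sketch in Python, not the App Intents framework (which is a Swift API) and not how Apple implements it; `DeclaredIntent`, `IntentRegistry`, and the `add_note` intent are all hypothetical names. The point is the shape of the contract: a caller can invoke only what the app has declared, with the parameter names and types it has declared.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DeclaredIntent:
    name: str
    params: Dict[str, type]      # declared parameter names and their types
    handler: Callable[..., str]  # the app code that performs the action


class IntentRegistry:
    """Holds the actions an app explicitly exposes to an assistant."""

    def __init__(self) -> None:
        self._intents: Dict[str, DeclaredIntent] = {}

    def declare(self, intent: DeclaredIntent) -> None:
        self._intents[intent.name] = intent

    def call(self, name: str, **kwargs) -> str:
        intent = self._intents.get(name)
        if intent is None:
            # Anything the app did not declare is simply unreachable.
            raise LookupError(f"intent not declared: {name}")
        if set(kwargs) != set(intent.params):
            raise TypeError(f"{name} expects parameters {sorted(intent.params)}")
        for key, expected in intent.params.items():
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        return intent.handler(**kwargs)


registry = IntentRegistry()
registry.declare(
    DeclaredIntent(
        name="add_note",
        params={"title": str, "body": str},
        handler=lambda title, body: f"saved note '{title}'",
    )
)
```

A call such as `registry.call("add_note", title="Milk", body="2 litres")` succeeds, while a call to an undeclared name raises `LookupError`. Granular, well-typed declarations give the caller more to work with, which is the design pressure the paragraph above describes.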
This is functionally parallel to GEO (Generative Engine Optimization). Instead of structuring content for AI search crawlers, you're structuring the action surface your app exposes to the OS model. Apps that declare clean, granular App Intents will appear in Apple Intelligence results; apps that don't, won't.
For EU/GDPR-regulated businesses: the on-device tier provides data residency by default, which may satisfy Article 32 obligations for simple tasks. The Cloud Pro tier routes to Google Cloud in the United States, raising the same Chapter V cross-border transfer questions that apply to any US cloud service. Legal teams should assess whether Apple Intelligence falls within DPIA scope for their workload.
The Honest Take
Apple just made 'private, on-device AI' a mainstream expectation for roughly a billion device owners, and that validation of the local-first premise matters. But Apple Intelligence is a hybrid, partly Google-backed, closed-weight system: a gateway to the local-AI mindset, not a replacement for running your own models.
If privacy is your primary motivation, the three-tier architecture introduces real caveats: cloud tiers process your requests, the Cloud Pro tier runs on US-based Google Cloud infrastructure, and you control none of the weights, routing logic, or update schedule.
Self-hosted local LLMs (Qwen, Llama, or Gemma on your own hardware) remain the only architecture where you can verify that nothing leaves your environment. Apple's on-device tier narrows the gap significantly for everyday tasks, but the 'trust Apple' requirement never disappears.
For EU users: on-device gives you data residency on simple tasks. For complex tasks routed to Google Cloud, the same GDPR Chapter V analysis applies as for any other US cloud service.
Frequently Asked Questions
Is Apple Intelligence a local LLM?
Not exactly. Apple Intelligence is a three-tier hybrid. Simple tasks use the on-device model (AFM Core / AFM 3 Core Advanced), which runs on Apple Silicon, so the request never leaves your device. Medium tasks go to Apple's Private Cloud Compute servers. Complex tasks go to AFM 3 Cloud Pro, running on Nvidia GPUs in Google Cloud. Only the first tier qualifies as a true local model.
Does Apple use Gemini on my iPhone?
No. The on-device models, AFM Core and AFM 3 Core Advanced, are Apple's own and have no Google involvement. Gemini was used as a teacher signal to train the cloud model (AFM 3 Cloud Pro), but Gemini itself is not running on your device. Your on-device Apple Intelligence interactions do not go to Google.
Is my data sent to Google?
Only for tasks routed to the Cloud Pro tier (AFM 3 Cloud Pro), which runs on Nvidia GPUs in Google Cloud. Simple on-device tasks never leave your device. Medium tasks go to Apple's Private Cloud Compute (not Google). Complex reasoning tasks go to Google Cloud infrastructure. Apple states no data is stored by Google, but this involves US-based infrastructure.
How big is Apple's on-device model?
Apple's AFM 3 Core Advanced is a 20B sparse model that activates only 1–4B parameters per prompt using Instruction-Following Pruning. This makes it memory-efficient enough to run on iPhone and Mac chips while remaining competitive on common everyday tasks.
Can I run my own local LLM instead of Apple Intelligence?
Yes. Ollama (free, cross-platform) lets you run open-weight models such as Qwen, Llama, and Gemma entirely on your own hardware. Unlike Apple Intelligence, self-hosted LLMs can run fully offline, use open weights you can inspect and fine-tune, and route nothing through Apple's or Google's infrastructure. See What Are Local LLMs? to get started.
Is Apple Intelligence private enough for EU/GDPR compliance?
The on-device tier provides strong data residency: data never leaves your Apple Silicon chip, likely satisfying GDPR Article 32 for simple tasks. The Cloud Pro tier routes to Google Cloud (US-based), raising GDPR Chapter V cross-border transfer questions. EU businesses handling sensitive personal data should conduct a DPIA and confirm which Apple Intelligence tasks stay on-device and which route to the cloud.
Does Siri work offline after WWDC 2026?
For on-device tasks (dictation, quick replies, on-screen awareness), yes: Siri works without an internet connection. Tasks requiring Private Cloud Compute or Cloud Pro need connectivity. Apple does not publicly document which tasks route to which tier for every scenario.
What is homeOS and the HomePad?
homeOS is a new OS previewed at WWDC 2026 for smart-home hub devices. Apple showed a developer preview tied to a forthcoming HomePad. Specs and a release date for the HomePad were not announced at WWDC 2026.