Can You Run a Local LLM on an Xperia Phone?
Quick Answer
Yes. The Xperia 1 VI (12 GB RAM, Snapdragon 8 Gen 3) runs Rinna 3.6B and Phi-4 Q4 via MLC Chat. The Xperia 5 V (8 GB) handles models up to Rinna 3.6B. The Xperia 10 VI (6 GB) is limited to TinyLlama and Gemma 2B.
- Xperia 1 VI: 12 GB RAM, runs Phi-4 Q4, Rinna 3.6B, Qwen2.5-3B
- Xperia 5 V: 8 GB RAM, runs Rinna 3.6B and Gemma 2B Q4
- Xperia 10 VI: 6 GB RAM, TinyLlama and Gemma 2B only
- Sony does not include Galaxy AI-style features; a local LLM via MLC Chat fills that gap
Updated: 2026-05
Key Takeaways
- Xperia 1 VI (12 GB RAM, Snapdragon 8 Gen 3) is the only Xperia that reliably runs 7B+ models; use it for Phi-4 Q4 (14B). Smaller models such as Qwen2.5-3B also run on it
- Xperia 5 V (8 GB RAM) handles models up to about 3.6B parameters, including Rinna 3.6B and Gemma 2B Q4, without issues
- Xperia 10 VI (6 GB RAM) is limited to models of roughly 2B parameters or fewer; TinyLlama 1.1B and Gemma 2B are the practical ceiling
- Sony does not ship on-device AI features; MLC Chat or PocketPal AI from Google Play is the practical replacement
- Battery drain is approximately 15% per hour with screen on during continuous inference on the Xperia 1 VI; use airplane mode to reduce drain
Xperia Model Compatibility
As of May 2026, three current Xperia models support local LLM inference, with capability determined entirely by RAM and chipset — the Xperia 1 VI leads, the Xperia 5 V covers the mid-range, and the Xperia 10 VI is limited to the smallest models. Sony does not pre-install on-device AI assistants (unlike Samsung Galaxy AI), so local LLM apps are the only route to private, offline AI on Xperia devices.
The Xperia 1 VI is the only Xperia capable of running quantized 7B+ models. Its Snapdragon 8 Gen 3 SoC and 12 GB of LPDDR5X RAM give it headroom for Phi-4 Q4 (14B parameters, quantized to ~8 GB), and smaller models such as Qwen2.5-3B run alongside day-to-day app usage. The Xperia 5 V with Snapdragon 8 Gen 2 and 8 GB RAM is the sweet spot for models in the 2B to 3.6B range: Rinna 3.6B and Gemma 2B Q4 run reliably. The Xperia 10 VI uses the mid-range Snapdragon 6 Gen 1 with only 6 GB RAM; at this tier, stick to TinyLlama 1.1B or Gemma 2B, because larger models will crash or run out of memory (OOM) during loading.
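The "14B quantized to ~8 GB" figure can be sanity-checked with simple arithmetic. The sketch below is a rough estimate only: it assumes about 4.5 effective bits per weight for a Q4 format (4-bit weights plus stored scale factors), which is an assumption rather than a figure from this guide, and it ignores the extra memory a runtime needs for context and buffers. The function name `quantized_size_gb` is hypothetical.

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight size in GB (decimal) for a quantized model.

    bits_per_weight=4.5 is an assumed average for Q4-style formats,
    not a value taken from any specific app or model file.
    """
    # params (billions) * bits per weight / 8 bits per byte = gigabytes
    return params_billions * bits_per_weight / 8


for name, params in [("TinyLlama 1.1B", 1.1), ("Rinna 3.6B", 3.6), ("Phi-4 (14B)", 14.0)]:
    print(f"{name}: ~{quantized_size_gb(params):.1f} GB")
```

Under these assumptions the estimates land close to the sizes quoted in this guide, which is why a 14B model is realistic only on the 12 GB device.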
Use the Xperia 1 VI for 7B+ models, the Xperia 5 V for models around 3B, and stay at roughly 2B or below on the Xperia 10 VI.
For app setup instructions, see our Android LLM apps for Japan guide.
| Xperia Model | RAM / Chip | Recommended Models |
|---|---|---|
| Xperia 1 VI | 12 GB / Snapdragon 8 Gen 3 | Phi-4 Q4, Rinna 3.6B, Qwen2.5-3B |
| Xperia 5 V | 8 GB / Snapdragon 8 Gen 2 | Rinna 3.6B, Gemma 2B Q4 |
| Xperia 10 VI | 6 GB / Snapdragon 6 Gen 1 | TinyLlama 1.1B, Gemma 2B only |
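The table above amounts to a lookup keyed on RAM. As a minimal sketch, the tiers can be expressed in Python; the tier data is copied from the table, while the names `TIERS` and `recommended_models` are hypothetical and not part of any app.

```python
# Tiers transcribed from the compatibility table: (minimum RAM in GB, models).
# Ordered from largest to smallest so the first match is the best tier.
TIERS = [
    (12, ["Phi-4 Q4", "Rinna 3.6B", "Qwen2.5-3B"]),
    (8, ["Rinna 3.6B", "Gemma 2B Q4"]),
    (6, ["TinyLlama 1.1B", "Gemma 2B"]),
]


def recommended_models(ram_gb: int) -> list[str]:
    """Return the model list for the highest tier the device's RAM reaches."""
    for min_ram, models in TIERS:
        if ram_gb >= min_ram:
            return models
    # Below 6 GB the table makes no recommendation.
    return []


print(recommended_models(8))
```

A device that falls between tiers (for example 10 GB) is treated as the lower tier, which is the conservative choice.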
3-Step Setup Guide
Installing a local LLM on an Xperia takes three steps and under 30 minutes, including model download time. The process requires no root access, no developer mode, and no special Xperia settings — it runs entirely through standard Android app and file management.
Step 1: Install MLC Chat or PocketPal AI from Google Play (Google Playストア). Both are free and available in Japan without a VPN or region workaround. MLC Chat is faster to first inference; PocketPal AI supports a broader range of GGUF model files from Hugging Face.
Step 2: Download your model over Wi-Fi. Model download sizes vary: TinyLlama 1.1B Q4 is approximately 0.7 GB, Gemma 2B Q4 is approximately 1.5 GB, Rinna 3.6B Q4 is approximately 2 GB, and Phi-4 Q4 is approximately 8 GB. For Phi-4, use an Xperia with at least 128 GB of storage. Close all other apps before loading Phi-4 Q4: it uses approximately 8 GB of the Xperia 1 VI's 12 GB RAM and needs as much free memory as possible to load without crashing. Do not download over mobile data; the files are large and your carrier plan will not thank you.
Step 3: Switch your keyboard to Japanese input. Gboard with Japanese enabled or ATOK (popular in Japan for business use) both work directly with MLC Chat and PocketPal AI — you type in Japanese, the model responds in Japanese. No extra configuration is required for Japanese language input to function.
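Before starting the download in Step 2, it can help to confirm that the model fits in free storage. The following is a minimal sketch using the approximate download sizes listed in Step 2; the 1 GB safety margin and the helper name `has_room` are assumptions for illustration, and the script checks whatever directory it is run from rather than any app-specific folder.

```python
import shutil

# Approximate download sizes in GB, as listed in Step 2.
MODEL_SIZE_GB = {
    "TinyLlama 1.1B Q4": 0.7,
    "Gemma 2B Q4": 1.5,
    "Rinna 3.6B Q4": 2.0,
    "Phi-4 Q4": 8.0,
}


def has_room(model: str, free_bytes: int, margin_gb: float = 1.0) -> bool:
    """True if free_bytes covers the model plus an assumed safety margin."""
    needed_bytes = (MODEL_SIZE_GB[model] + margin_gb) * 1e9
    return free_bytes >= needed_bytes


if __name__ == "__main__":
    # shutil.disk_usage reports total, used, and free bytes for a path.
    free = shutil.disk_usage(".").free
    for model in MODEL_SIZE_GB:
        status = "fits" if has_room(model, free) else "does not fit"
        print(f"{model}: {status}")
```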
Battery note: expect approximately 15% battery drain per hour with screen on during continuous inference on the Xperia 1 VI. Enable airplane mode (機内モード) during inference sessions to reduce background radio drain and extend session time. Power-saving mode further reduces drain but may throttle the Snapdragon's AI cores and slow inference speed.
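The 15% per hour figure translates directly into an expected session length. This is a rough sketch that assumes drain stays linear at that rate and that you want to stop with some charge in reserve; the 20% reserve default and the function name `session_hours` are assumptions for illustration.

```python
def session_hours(start_pct: float,
                  reserve_pct: float = 20.0,
                  drain_pct_per_hour: float = 15.0) -> float:
    """Hours of continuous inference before the battery hits the reserve level.

    Assumes a constant drain rate; real drain varies with model size,
    screen brightness, and thermal throttling.
    """
    usable_pct = max(start_pct - reserve_pct, 0.0)
    return usable_pct / drain_pct_per_hour


print(f"From 100%: ~{session_hours(100.0):.1f} h")
print(f"From 80%:  ~{session_hours(80.0):.1f} h")
```

Lowering `drain_pct_per_hour` models the effect of airplane mode, though this guide does not quantify how much drain it saves.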
Sony Xperia AI Agent (currently in beta) connects to cloud AI services and does not run on-device. Local LLM via MLC Chat is the only way to run AI inference entirely on the Xperia without sending data to external servers — an important distinction for privacy under Japan's Act on the Protection of Personal Information (APPI / 個人情報保護法). For a full guide to Android LLM setup including hardware requirements, see running AI on tablets and Android phones.
Quick Answers About Xperia LLMs
Does local LLM work on the Xperia 10 VI?
How much storage does a model need on Xperia?
How much battery does running an LLM drain on Xperia?
Does it work offline on Xperia?
What is the difference between Sony Xperia AI Agent and a local LLM?