Can You Run a Local LLM on an Xperia Phone?

Quick Answer

Yes — the Xperia 1 VI (12 GB RAM, Snapdragon 8 Gen 3) runs Rinna 3.6B and Phi-4 Q4 via MLC Chat. The Xperia 5 V (8 GB) handles lightweight models. The Xperia 10 VI (6 GB) is limited to TinyLlama and Gemma 2B.

▸Xperia 1 VI: 12 GB RAM — runs Phi-4 Q4, Rinna 3.6B, Qwen3-3B
▸Xperia 5 V: 8 GB RAM — runs Rinna 3.6B and Gemma 2B Q4
▸Xperia 10 VI: 6 GB RAM — TinyLlama and Gemma 2B only
▸Sony does not include Galaxy AI-style features — local LLM via MLC Chat fills that gap

Updated: 2026-05

Hardware Guides · Intermediate

Key Takeaways

✓Xperia 1 VI (12 GB RAM, Snapdragon 8 Gen 3) is the only Xperia that reliably runs models of 7B parameters or more, such as Phi-4 Q4 (14B); smaller models like Qwen3-3B run with memory to spare
✓Xperia 5 V (8 GB RAM) handles models in the 2B to 4B range, including Rinna 3.6B and Gemma 2B Q4, without issues
✓Xperia 10 VI (6 GB RAM) is limited to models of roughly 2B parameters or fewer: TinyLlama 1.1B and Gemma 2B are the practical ceiling
✓Sony does not ship on-device AI features — MLC Chat or PocketPal AI from Google Play is the practical replacement
✓Battery drain is approximately 15% per hour with screen on during continuous inference on the Xperia 1 VI; use airplane mode to reduce drain

Xperia Model Compatibility

As of May 2026, three current Xperia models support local LLM inference, with capability determined entirely by RAM and chipset — the Xperia 1 VI leads, the Xperia 5 V covers the mid-range, and the Xperia 10 VI is limited to the smallest models. Sony does not pre-install on-device AI assistants (unlike Samsung Galaxy AI), so local LLM apps are the only route to private, offline AI on Xperia devices.

The Xperia 1 VI is the only Xperia capable of running quantized models of 7B parameters or more. Its Snapdragon 8 Gen 3 SoC and 12 GB of LPDDR5X RAM are enough to load Phi-4 Q4 (14B parameters, quantized to ~8 GB) once other apps are closed, and to run Qwen3-3B alongside day-to-day app usage. The Xperia 5 V, with a Snapdragon 8 Gen 2 and 8 GB RAM, is the sweet spot for models up to about 3.6B: Rinna 3.6B and Gemma 2B Q4 run reliably. The Xperia 10 VI uses the mid-range Snapdragon 6 Gen 1 with only 6 GB RAM; at this tier, stick to TinyLlama 1.1B or Gemma 2B, because larger models will crash or run out of memory during loading.
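
The "~8 GB" figure for Phi-4 Q4 follows from simple arithmetic: parameter count multiplied by bits per weight. Here is a rough Python sketch of that calculation. It assumes about 4.5 effective bits per weight for a 4-bit quantization (an illustrative value, not a measurement; real formats vary), and it covers only the weights, not the extra memory the context cache needs at runtime.

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight size in GB (10^9 bytes) for a quantized model.

    bits_per_weight=4.5 is an assumption for 4-bit formats that also
    store per-block scale factors; it is not a measured value.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts taken from the model names used in this article.
for name, size_b in [("TinyLlama 1.1B", 1.1), ("Rinna 3.6B", 3.6), ("Phi-4 14B", 14.0)]:
    print(f"{name}: ~{quantized_size_gb(size_b):.1f} GB")
```

Under that assumption, the estimates land close to the approximate download sizes quoted in this guide, which is why a 14B model is a tight fit on a 12 GB phone.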

Use the Xperia 1 VI for 7B+ models and the Xperia 5 V for models up to about 3.6B; stay at roughly 2B or below on the Xperia 10 VI.

For app setup instructions, see our Android LLM apps for Japan guide.

Xperia Model	RAM / Chip	Recommended Models
Xperia 1 VI	12 GB / Snapdragon 8 Gen 3	Phi-4 Q4, Rinna 3.6B, Qwen3-3B
Xperia 5 V	8 GB / Snapdragon 8 Gen 2	Rinna 3.6B, Gemma 2B Q4
Xperia 10 VI	6 GB / Snapdragon 6 Gen 1	TinyLlama 1.1B, Gemma 2B only
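
If you want the table in a form you can query, it can be written as a small lookup. The sketch below copies the three rows into a Python dictionary. The helper `models_for_ram` is hypothetical, and matching on RAM alone is a simplifying assumption, since the chipset also matters.

```python
# Recommendations copied from the compatibility table.
RECOMMENDED = {
    "Xperia 1 VI": {"ram_gb": 12, "models": ["Phi-4 Q4", "Rinna 3.6B", "Qwen3-3B"]},
    "Xperia 5 V": {"ram_gb": 8, "models": ["Rinna 3.6B", "Gemma 2B Q4"]},
    "Xperia 10 VI": {"ram_gb": 6, "models": ["TinyLlama 1.1B", "Gemma 2B"]},
}


def models_for_ram(ram_gb: int) -> list[str]:
    """Return the model list of the highest table tier that fits in ram_gb."""
    tiers = sorted(RECOMMENDED.values(), key=lambda t: t["ram_gb"], reverse=True)
    for tier in tiers:
        if ram_gb >= tier["ram_gb"]:
            return tier["models"]
    return []  # below 6 GB the table makes no recommendation


print(models_for_ram(8))
```

Calling `models_for_ram(8)` returns the Xperia 5 V row; anything under 6 GB returns an empty list because the table does not cover it.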

3-Step Setup Guide

Installing a local LLM on an Xperia takes three steps and under 30 minutes, including model download time. The process requires no root access, no developer mode, and no special Xperia settings — it runs entirely through standard Android app and file management.

Step 1: Install MLC Chat or PocketPal AI from Google Play (Google Playストア). Both are free and available in Japan without a VPN or region workaround. MLC Chat is faster to first inference; PocketPal AI supports a broader range of GGUF model files from Hugging Face.

Step 2: Download your model over Wi-Fi. Approximate download sizes: TinyLlama 1.1B Q4 is 0.7 GB, Gemma 2B Q4 is 1.5 GB, Rinna 3.6B Q4 is 2 GB, and Phi-4 Q4 is 8 GB. Use an Xperia with 128 GB or more of storage for Phi-4. Close all other apps before loading Phi-4 Q4: it uses approximately 8 GB of the Xperia 1 VI's 12 GB RAM and needs as much free memory as possible to load without crashing. Do not download over mobile data; the files are large and your carrier plan will not thank you.
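
To plan the download, you can turn the sizes in Step 2 into a rough time and storage estimate. The Python sketch below does that. The 50 Mbit/s Wi-Fi speed is an assumed value for illustration, so substitute your own measured throughput.

```python
# Approximate Q4 download sizes in GB, as listed in Step 2.
DOWNLOAD_GB = {
    "TinyLlama 1.1B": 0.7,
    "Gemma 2B": 1.5,
    "Rinna 3.6B": 2.0,
    "Phi-4": 8.0,
}


def download_minutes(size_gb: float, wifi_mbps: float) -> float:
    """Minutes to download size_gb (10^9 bytes) at wifi_mbps megabits/s."""
    seconds = size_gb * 8_000 / wifi_mbps  # GB -> megabits, then divide by rate
    return seconds / 60


def total_storage_gb(models: list[str]) -> float:
    """Combined storage needed for the named models."""
    return sum(DOWNLOAD_GB[m] for m in models)


wifi_mbps = 50.0  # assumed sustained Wi-Fi throughput; measure your own
for name, size in DOWNLOAD_GB.items():
    print(f"{name}: {size} GB, ~{download_minutes(size, wifi_mbps):.0f} min")
print(f"All four models: {total_storage_gb(list(DOWNLOAD_GB)):.1f} GB")
```

At the assumed speed, even the largest file stays within the 30-minute setup window; slower connections will take proportionally longer.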

Step 3: Switch your keyboard to Japanese input. Gboard with Japanese enabled or ATOK (popular in Japan for business use) both work directly with MLC Chat and PocketPal AI — you type in Japanese, the model responds in Japanese. No extra configuration is required for Japanese language input to function.

Battery note: expect approximately 15% battery drain per hour with screen on during continuous inference on the Xperia 1 VI. Enable airplane mode (機内モード) during inference sessions to reduce background radio drain and extend session time. Power-saving mode further reduces drain but may throttle the Snapdragon's AI cores and slow inference speed.

Sony Xperia AI Agent (currently in beta) connects to cloud AI services and does not run on-device. Local LLM via MLC Chat is the only way to run AI inference entirely on the Xperia without sending data to external servers — an important distinction for privacy under Japan's Act on the Protection of Personal Information (APPI / 個人情報保護法). For a full guide to Android LLM setup including hardware requirements, see running AI on tablets and Android phones.

Quick Answers About Xperia LLMs

Does local LLM work on the Xperia 10 VI?

Yes, but only with TinyLlama 1.1B and Gemma 2B Q4. The Xperia 10 VI has 6 GB RAM and a Snapdragon 6 Gen 1, so larger models crash or produce out-of-memory errors during loading. Do not attempt Rinna 3.6B or any 7B model on the Xperia 10 VI.

How much storage does a model need on Xperia?

Rinna 3.6B Q4 requires approximately 2 GB of storage. Phi-4 Q4 requires approximately 8 GB. TinyLlama 1.1B Q4 requires approximately 0.7 GB. Use a 128 GB or larger Xperia for Phi-4; 64 GB storage is sufficient for Rinna 3.6B and Gemma 2B.

How much battery does running an LLM drain on Xperia?

Approximately 15% of battery per hour with the screen on during continuous inference on the Xperia 1 VI at full performance. On the Xperia 5 V with Rinna 3.6B, expect similar drain. Enable airplane mode (機内モード) to cut background radio usage and reduce drain by roughly 2–4 percentage points per hour.
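
To see what those drain figures mean for session length, here is a short Python sketch. It reads the 2–4% airplane-mode saving as percentage points per hour, which is an assumption, and it uses a hypothetical stopping point of 20% battery.

```python
def hours_of_inference(start_pct: float, stop_pct: float, drain_pct_per_hour: float) -> float:
    """Hours of continuous inference between two battery levels."""
    if drain_pct_per_hour <= 0:
        raise ValueError("drain_pct_per_hour must be positive")
    return max(start_pct - stop_pct, 0.0) / drain_pct_per_hour


BASE_DRAIN = 15.0  # % per hour with screen on, the figure quoted in this guide
for saving in (0.0, 2.0, 4.0):  # assumed airplane-mode saving, points per hour
    drain = BASE_DRAIN - saving
    hours = hours_of_inference(100.0, 20.0, drain)
    print(f"drain {drain:.0f}%/h: about {hours:.1f} h from 100% down to 20%")
```

The gap between the first and last line shows how much extra session time airplane mode can buy under these assumptions.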

Does it work offline on Xperia?

Yes — fully offline after the initial model download. MLC Chat and PocketPal AI do not require an internet connection, an API key, or a Sony account once the model is stored on the device. No data leaves your phone during inference.

What is the difference between Sony Xperia AI Agent and a local LLM?

Sony Xperia AI Agent (beta) routes requests through cloud AI servers — your prompts and responses pass through Sony's or a third-party's infrastructure. A local LLM running via MLC Chat executes entirely on the Xperia's Snapdragon chip — data never leaves the device. This on-device approach is the privacy-compliant alternative for users who handle sensitive data under APPI (個人情報保護法).
