Best Ollama Models for 4 GB VRAM?
Quick Answer
4 GB VRAM is tight but usable with small models. Phi-4 Mini Q4 fits at ~3.2 GB, while Gemma 2 2B at ~1.5 GB and SmolLM 1.7B at ~1.0 GB leave room to spare for context and other applications. Llama 3 8B will not fit.
- ▸Phi-4 Mini Q4: best quality in 4 GB (3.2 GB VRAM)
- ▸Gemma 2 2B: fast and lightweight (1.5 GB)
- ▸SmolLM 1.7B: smallest option, 1.0 GB VRAM
Updated: 2026-05
Key Takeaways
- ✓Best for 4 GB VRAM: Phi-4 Mini Q4 at ~3.2 GB — highest quality at this tier
- ✓Gemma 2 2B (1.5 GB) is the fastest option; SmolLM 1.7B (1.0 GB) is the smallest
- ✓Llama 3 8B will not fit: it needs ~5.5 GB at Q4_K_M
What Fits in 4 GB VRAM
As of May 2026, at 4 GB VRAM you are limited to models with about 4 billion parameters or fewer at Q4 quantization. This rules out every mainstream local model, including Llama 3 8B, Mistral Small, and Qwen 14B. Three modern small models perform surprisingly well: Phi-4 Mini approaches GPT-5.5 mini on instruction following, Gemma 2 2B handles fast chat, and SmolLM 1.7B runs on integrated graphics.
Phi-4 Mini is the top pick at this tier. Despite its small size, it handles general Q&A, light coding, and document summarization at ~25 tokens per second. Gemma 2 2B is faster for single-turn chat. SmolLM 1.7B is the fallback if even Phi-4 Mini pushes your VRAM too close to the limit.
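The ~25 tokens per second figure depends on your hardware, so it is worth measuring on your own machine. The sketch below is one way to do that against a locally running Ollama server. It assumes the default endpoint `http://localhost:11434/api/generate` and relies on the `eval_count` and `eval_duration` fields of the response. The function names are illustrative, and the model tag `phi4-mini` in the usage note is an assumption; check `ollama list` for the tag installed on your system.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumed; change if yours differs).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the given model tag."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama response.

    eval_count is the number of generated tokens;
    eval_duration is the generation time in nanoseconds.
    """
    return response["eval_count"] * 1e9 / response["eval_duration"]


def measure(model: str, prompt: str) -> float:
    """Send one prompt to the local server and return tokens per second."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return tokens_per_second(json.load(resp))
```

With the server running, `measure("phi4-mini", "Summarize the benefits of small language models.")` returns the generation speed for that single request. Running it a few times and averaging gives a steadier number, since the first call also includes model load time on the server side.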
| Model | VRAM | Best For |
|---|---|---|
| Phi-4 Mini Q4 | 3.2 GB | Best quality at 4 GB |
| Gemma 2 2B Q4 | 1.5 GB | Fast single-turn chat |
| SmolLM 1.7B Q4 | 1.0 GB | Minimal VRAM footprint |
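The table can be turned into a quick selection helper. The sketch below uses the VRAM figures from the table and picks the models that fit once some memory is held back for the desktop, context, and other applications. The `reserved_gb` default of 0.5 GB is an illustrative assumption, not a measured value, and the function name is hypothetical.

```python
# VRAM figures in GB, copied from the table in this section.
MODEL_VRAM_GB = {
    "Phi-4 Mini Q4": 3.2,
    "Gemma 2 2B Q4": 1.5,
    "SmolLM 1.7B Q4": 1.0,
}


def models_that_fit(total_vram_gb: float, reserved_gb: float = 0.5) -> list:
    """Return (name, vram_gb) pairs that fit in the budget, largest first.

    reserved_gb is memory held back for everything else on the GPU;
    0.5 GB is an assumed placeholder, so adjust it for your setup.
    """
    budget = total_vram_gb - reserved_gb
    fitting = [(name, gb) for name, gb in MODEL_VRAM_GB.items() if gb <= budget]
    return sorted(fitting, key=lambda pair: pair[1], reverse=True)
```

Calling `models_that_fit(4.0)` lists all three models with Phi-4 Mini first, while `models_that_fit(4.0, reserved_gb=1.0)` drops Phi-4 Mini and leaves Gemma 2 2B and SmolLM 1.7B, which mirrors the fallback advice above.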
What Won't Fit in 4 GB
These models are commonly requested but need more than 4 GB VRAM at Q4_K_M:
- ▸Llama 3 8B: needs ~5.5 GB at Q4_K_M (minimum)
- ▸Mistral Small: needs ~4.5 GB at Q4_K_M (marginal; risky at 4 GB with context overhead)
- ▸Phi-4 (full 14B): needs ~9.8 GB
- ▸Qwen 14B: needs ~9.5 GB at Q4_K_M
Upgrading to 6 GB unlocks Llama 3 8B and Mistral Small, the two most popular local models. See the best local LLMs for 6 GB VRAM. For a full hardware comparison, see fastest local LLMs for low-end PCs.
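The requirements listed in this section can be sanity-checked with a back-of-the-envelope estimate: quantized weight size is roughly parameter count times bits per weight divided by 8, plus some overhead for the KV cache and runtime buffers. The sketch below assumes about 4.85 bits per weight for Q4_K_M (an approximate figure) and a flat 0.6 GB overhead (a hypothetical placeholder that in practice grows with context length). It is a rough estimate, not how Ollama itself computes memory use.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def estimate_total_gb(params_billion: float, bits_per_weight: float,
                      overhead_gb: float = 0.6) -> float:
    """Weights plus an assumed flat overhead for KV cache and buffers."""
    return estimate_weights_gb(params_billion, bits_per_weight) + overhead_gb


def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 0.6) -> bool:
    """True if the estimated total stays within the given VRAM."""
    return estimate_total_gb(params_billion, bits_per_weight, overhead_gb) <= vram_gb
```

For an 8B model at the assumed 4.85 bits per weight, `estimate_total_gb(8.0, 4.85)` lands close to the ~5.5 GB quoted for Llama 3 8B, so `fits(8.0, 4.85, 4.0)` is false while `fits(8.0, 4.85, 6.0)` is true.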
Related Guides
- ▸Can You Run RAG on 2 GB RAM? -- RAG on low RAM
Quick Answers About 4 GB VRAM Models
Is 4 GB VRAM enough for a useful LLM?
Can I run Llama 3 on 4 GB VRAM?
What GPU has 4 GB VRAM?
Can CPU-only mode bypass the 4 GB VRAM limit?