PromptQuorum

Best Ollama Models for 4 GB VRAM?

Quick Answer

4 GB VRAM is tight but usable with small models: Phi-4 Mini Q4 at ~3.2 GB, Gemma 2 2B at ~1.5 GB, and SmolLM 1.7B at ~1.0 GB. The two smaller models leave ample headroom for context. Llama 3 8B will not fit.

  • Phi-4 Mini Q4: best quality in 4 GB (3.2 GB VRAM)
  • Gemma 2 2B: fast and lightweight (1.5 GB)
  • SmolLM 1.7B: smallest option, 1.0 GB VRAM

Updated: 2026-05

Quantization & VRAM | Intermediate

Key Takeaways

  • Best for 4 GB VRAM: Phi-4 Mini Q4 at ~3.2 GB — highest quality at this tier
  • Gemma 2 2B (1.5 GB) is the fastest option; SmolLM 1.7B (1.0 GB) is the smallest
  • Llama 3 8B will not fit: it needs ~5.5 GB even at Q4_K_M

What Fits in 4 GB VRAM

As of May 2026, 4 GB of VRAM limits you to models with roughly 4 billion parameters or fewer at Q4 quantization. This rules out the mainstream local models: Llama 3 8B, Mistral Small, and Qwen 14B. Three modern small models perform surprisingly well: Phi-4 Mini (3.8B parameters) approaches GPT-5.5 mini on instruction following, Gemma 2 2B handles fast chat, and SmolLM 1.7B runs on integrated graphics.

Phi-4 Mini is the top pick at this tier. Despite its small size, it handles general Q&A, light coding, and document summarization at ~25 tokens per second. Gemma 2 2B is faster for single-turn chat. SmolLM 1.7B is the fallback if even Phi-4 Mini pushes your VRAM too close to the limit.

| Model | VRAM | Best For |
|---|---|---|
| Phi-4 Mini Q4 | 3.2 GB | Best quality at 4 GB |
| Gemma 2 2B Q4 | 1.5 GB | Fast single-turn chat |
| SmolLM 1.7B Q4 | 1.0 GB | Minimal VRAM footprint |
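
The table can be turned into a quick fit check. The sketch below is illustrative only: the VRAM figures are copied from the table, while the `models_that_fit` helper and its `headroom_gb` reserve (for context and other GPU users such as the desktop) are assumptions, not part of Ollama.

```python
# VRAM figures (GB) copied from the table above.
MODEL_VRAM_GB = {
    "Phi-4 Mini Q4": 3.2,
    "Gemma 2 2B Q4": 1.5,
    "SmolLM 1.7B Q4": 1.0,
}


def models_that_fit(total_vram_gb, headroom_gb=0.5):
    """Return models whose listed VRAM leaves at least headroom_gb free.

    headroom_gb is a hypothetical reserve, not a measured value.
    Results are ordered largest first.
    """
    budget = total_vram_gb - headroom_gb
    fitting = [(name, gb) for name, gb in MODEL_VRAM_GB.items() if gb <= budget]
    fitting.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in fitting]


print(models_that_fit(4.0))                   # default 0.5 GB reserve
print(models_that_fit(4.0, headroom_gb=1.0))  # stricter reserve drops Phi-4 Mini
```

With a larger reserve, Phi-4 Mini drops out and Gemma 2 2B or SmolLM 1.7B becomes the practical choice, which matches the fallback advice above.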

What Won't Fit in 4 GB

These models are commonly requested but need more than 4 GB VRAM at practical quantization levels:

  • Llama 3 8B: needs ~5.5 GB at Q4_K_M (minimum)
  • Mistral Small: needs ~4.5 GB at Q4_K_M (marginal; risky at 4 GB with context overhead)
  • Phi-4 (full 14B): needs ~9.8 GB
  • Qwen 14B: needs ~9.5 GB at Q4_K_M

Upgrading to 6 GB unlocks Llama 3 8B and Mistral Small, the two most popular local models. See the best local LLMs for 6 GB VRAM. For a full hardware comparison, see fastest local LLMs for low-end PCs.
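
A back-of-the-envelope estimate shows why these models overshoot 4 GB. The sketch below rests on two loudly labeled assumptions: that Q4_K_M averages roughly 4.85 bits per weight, and that a flat 0.7 GB covers KV cache and runtime buffers at a short context. Neither number comes from this guide, and real usage varies with context length and runtime version.

```python
# Assumption: Q4_K_M averages roughly 4.85 bits per weight (approximate).
Q4_K_M_BITS_PER_WEIGHT = 4.85
# Assumption: flat allowance for KV cache and runtime buffers at short context.
OVERHEAD_GB = 0.7


def estimate_vram_gb(params_billions,
                     bits_per_weight=Q4_K_M_BITS_PER_WEIGHT,
                     overhead_gb=OVERHEAD_GB):
    """Rough VRAM estimate: quantized weight size plus a fixed overhead."""
    # billions of params * bits per weight / 8 bits per byte = gigabytes
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb


for name, params in [("Llama 3 8B", 8.0), ("Qwen 14B", 14.0)]:
    print(f"{name}: ~{estimate_vram_gb(params):.1f} GB")
```

Under these assumptions both estimates land well above 4 GB and in the same range as the figures listed above.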


Quick Answers About 4 GB VRAM Models

Is 4 GB VRAM enough for a useful LLM?
Yes, for basic tasks. Phi-4 Mini handles general Q&A and light coding at ~25 tok/s. For longer context, multi-step coding agents, or document analysis, 4 GB is a bottleneck — upgrade to 6 GB or more.
Can I run Llama 3 on 4 GB VRAM?
No. Llama 3 8B needs ~5.5 GB at Q4_K_M minimum. Llama 3.2 3B fits in ~2.5 GB if you specifically want a Llama variant. See the full VRAM requirements guide.
What GPU has 4 GB VRAM?
RTX 3050 Ti (4 GB), GTX 1650 Super (4 GB), and AMD RX 6500 XT (4 GB) are the most common. All three work with Ollama — NVIDIA via CUDA, AMD via ROCm or Vulkan.
Can CPU-only mode bypass the 4 GB VRAM limit?
Yes. Running without a GPU, Llama 3 8B Q4 uses ~6 GB of system RAM and runs at 3–6 tok/s on a modern 8-core CPU. It is slower, but it works if you have enough RAM.
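
One way to try CPU-only mode is to ask Ollama to offload zero layers to the GPU through the `num_gpu` option of its local REST API. The sketch below is a minimal example under assumptions: it presumes Ollama is listening on its default port 11434 and that a model tagged `llama3:8b` has already been pulled. The helper names `build_cpu_only_request` and `generate_cpu_only` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_cpu_only_request(model, prompt):
    """Build a generate request that keeps every layer on the CPU."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,              # return one JSON object, not a stream
        "options": {"num_gpu": 0},    # 0 layers offloaded to the GPU
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_cpu_only(model, prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_cpu_only_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example call, assuming the model tag exists locally:
# print(generate_cpu_only("llama3:8b", "Summarize what quantization does."))
```

The request builder is kept separate from the network call so the payload can be inspected without a server running.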