Quick Answer
Yes — Ollama supports all Qwen 3 model sizes from 0.6B to 72B. Run any size with ollama run qwen3:8b. The 8B model needs ~6 GB VRAM at Q4.
Updated: 2026-05
Key Takeaways
As of May 2026, Ollama supports all major Qwen 3 model sizes from 0.6B to 72B. Pull any size with a single command: ollama run qwen3:8b. Replace 8b with 0.6b, 1.5b, 3b, 14b, 32b, or 72b for other sizes.
Each size is available in multiple quantizations. Q4_K_M is the default and recommended starting point — it delivers the best quality-to-file-size ratio. Q8_0 is available for 7B and 14B if you have the VRAM headroom.
Tool calling is supported natively on all Qwen 3 sizes via the standard Ollama API. No custom Modelfile or special prompt template is required.
ollama run qwen3:8bThe right Qwen 3 size depends entirely on available VRAM. For most users on a mid-range GPU (6–8 GB VRAM), the 7B model at Q4_K_M is the practical choice — it needs ~6 GB and runs at ~20 tok/s.
The 14B model at Q4 is the recommended coding tier: it outperforms the 7B on code generation and fits comfortably in 10–12 GB VRAM. For a full comparison of Qwen 3 coding performance versus other local models, see the guide to running Qwen locally in 2026.
| VRAM | Qwen 3 Size | Best For |
|---|---|---|
| < 4 GB | 0.6B / 1.5B | Edge devices, testing, CPU-only |
| 4–6 GB | 3B | Budget GPU or low-RAM CPU |
| 6–12 GB | 7B / 14B | General use and coding |
| 12–24 GB | 14B / 32B | High-quality coding and reasoning |
| 40+ GB | 72B | Near-frontier local quality |
ollama run qwen3:8b in a terminal. Ollama downloads the model automatically on first run. Replace 8b with your target size: 0.6b, 1.5b, 3b, 14b, 32b, or 72b.