PromptQuorumPromptQuorum

Can You Run Qwen 3 on Ollama?

Quick Answer

Yes — Ollama supports all Qwen 3 model sizes from 0.6B to 72B. Run any size with ollama run qwen3:8b. The 8B model needs ~6 GB VRAM at Q4.

  • ollama run qwen3:0.6b — fits in 1 GB VRAM
  • ollama run qwen3:8b — needs ~6 GB VRAM
  • ollama run qwen3:72b — needs ~40 GB VRAM

Updated: 2026-05

OllamaIntermediate

Key Takeaways

  • Ollama supports all Qwen 3 sizes: 0.6B, 1.5B, 3B, 7B, 14B, 32B, and 72B
  • Pull any size with <code>ollama run qwen3:8b</code> — replace the tag with your target size
  • The 7B model needs ~6 GB VRAM at Q4 and runs at ~20 tok/s on a mid-range GPU
  • Qwen 3 supports tool calling natively via the standard Ollama API — no custom Modelfile required

Yes — Here's What's Available

As of May 2026, Ollama supports all major Qwen 3 model sizes from 0.6B to 72B. Pull any size with a single command: ollama run qwen3:8b. Replace 8b with 0.6b, 1.5b, 3b, 14b, 32b, or 72b for other sizes.

Each size is available in multiple quantizations. Q4_K_M is the default and recommended starting point — it delivers the best quality-to-file-size ratio. Q8_0 is available for 7B and 14B if you have the VRAM headroom.

Tool calling is supported natively on all Qwen 3 sizes via the standard Ollama API. No custom Modelfile or special prompt template is required.

ollama run qwen3:8b

Which Qwen 3 Size to Pick

The right Qwen 3 size depends entirely on available VRAM. For most users on a mid-range GPU (6–8 GB VRAM), the 7B model at Q4_K_M is the practical choice — it needs ~6 GB and runs at ~20 tok/s.

The 14B model at Q4 is the recommended coding tier: it outperforms the 7B on code generation and fits comfortably in 10–12 GB VRAM. For a full comparison of Qwen 3 coding performance versus other local models, see the guide to running Qwen locally in 2026.

VRAMQwen 3 SizeBest For
< 4 GB0.6B / 1.5BEdge devices, testing, CPU-only
4–6 GB3BBudget GPU or low-RAM CPU
6–12 GB7B / 14BGeneral use and coding
12–24 GB14B / 32BHigh-quality coding and reasoning
40+ GB72BNear-frontier local quality

Quick Answers About Qwen 3 on Ollama

How do I install Qwen 3 on Ollama?
Run ollama run qwen3:8b in a terminal. Ollama downloads the model automatically on first run. Replace 8b with your target size: 0.6b, 1.5b, 3b, 14b, 32b, or 72b.
Is Qwen 3 better than Llama 3 for coding?
For coding: yes, Qwen 3 14B outperforms Llama 3 8B on HumanEval benchmarks. For general conversation at the 8B tier: Llama 3 8B remains competitive. For the current top Ollama picks across all tasks, see the best Ollama models right now.
Does Qwen 3 support tool calling on Ollama?
Yes. Qwen 3 supports function and tool calling natively via the standard Ollama API. No custom Modelfile or special configuration is required — it works with any client that supports the Ollama tool-use format.
Can I run Qwen 3 72B on consumer hardware?
Technically yes, but it requires ~40 GB of VRAM at Q4 — meaning a dual-GPU setup (two RTX 3090s) or an Apple M-series Mac with 64+ GB unified memory. Most consumer setups max out at the 32B tier.