Can You Run Qwen 3 on Ollama?

Read in:

🇺🇸en 🇩🇪de 🇫🇷fr 🇯🇵ja 🇨🇳zh 🇪🇸es 🇧🇷pt 🇸🇦ar 🇰🇷ko

Quick Answer

Yes — Ollama supports all Qwen 3 model sizes from 0.6B to 72B with native tool calling via the standard API, needing only a single command like ollama run qwen3:8b. The 8B model needs ~6 GB VRAM at Q4.

▸ollama run qwen3:0.6b — fits in 1 GB VRAM
▸ollama run qwen3:8b — needs ~6 GB VRAM
▸ollama run qwen3:72b — needs ~40 GB VRAM

Updated: 2026-05

OllamaIntermediate

Key Takeaways

✓Ollama supports all Qwen 3 sizes: 0.6B, 1.5B, 3B, 7B, 14B, 32B, and 72B
✓Pull any size with <code>ollama run qwen3:8b</code> — replace the tag with your target size
✓The 7B model needs ~6 GB VRAM at Q4 and runs at ~20 tok/s on a mid-range GPU
✓Qwen 3 supports tool calling natively via the standard Ollama API — no custom Modelfile required

Yes — Here's What's Available

As of May 2026, Ollama supports all major Qwen 3 model sizes from 0.6B to 72B. Pull any size with a single command: ollama run qwen3:8b. Replace 8b with 0.6b, 1.5b, 3b, 14b, 32b, or 72b for other sizes.

Each size is available in multiple quantizations. Q4_K_M is the default and recommended starting point — it delivers the best quality-to-file-size ratio. Q8_0 is available for 7B and 14B if you have the VRAM headroom.

Tool calling is supported natively on all Qwen 3 sizes via the standard Ollama API. No custom Modelfile or special prompt template is required.

ollama run qwen3:8b

Which Qwen 3 Size to Pick

The right Qwen 3 size depends entirely on available VRAM. For most users on a mid-range GPU (6–8 GB VRAM), the 7B model at Q4_K_M is the practical choice — it needs ~6 GB and runs at ~20 tok/s.

The 14B model at Q4 is the recommended coding tier: it outperforms the 7B on code generation and fits comfortably in 10–12 GB VRAM. For a full comparison of Qwen 3 coding performance versus other local models, see the guide to running Qwen locally in 2026.

VRAM	Qwen 3 Size	Best For
< 4 GB	0.6B / 1.5B	Edge devices, testing, CPU-only
4–6 GB	3B	Budget GPU or low-RAM CPU
6–12 GB	7B / 14B	General use and coding
12–24 GB	14B / 32B	High-quality coding and reasoning
40+ GB	72B	Near-frontier local quality

Quick Answers About Qwen 3 on Ollama

How do I install Qwen 3 on Ollama?▾

Run ollama run qwen3:8b in a terminal. Ollama downloads the model automatically on first run. Replace 8b with your target size: 0.6b, 1.5b, 3b, 14b, 32b, or 72b.

Is Qwen 3 better than Llama 3 for coding?▾

For coding: yes, Qwen 3 14B outperforms Llama 3 8B on HumanEval benchmarks. For general conversation at the 8B tier: Llama 3 8B remains competitive. For the current top Ollama picks across all tasks, see the best Ollama models right now.

Does Qwen 3 support tool calling on Ollama?▾

Yes. Qwen 3 supports function and tool calling natively via the standard Ollama API. No custom Modelfile or special configuration is required — it works with any client that supports the Ollama tool-use format.

Can I run Qwen 3 72B on consumer hardware?▾

Technically yes, but it requires ~40 GB of VRAM at Q4 — meaning a dual-GPU setup (two RTX 3090s) or an Apple M-series Mac with 64+ GB unified memory. Most consumer setups max out at the 32B tier.

Want the full breakdown?

Read the complete guide →

Related Prompt Bites

← Back to Prompt Bites