Key Takeaways
- Most downloaded: Llama 3.2 3B (tutorials) and Llama 4 Scout (best quality) -- widest tool support.
- Best reasoning: DeepSeek-R1 7B and 14B -- chain-of-thought reasoning model, significantly above standard models on math and logic.
- Best coding: Kimi K2.6 (frontier MoE), Qwen 3.6 27B (best dense), Devstral Small 24B (best agentic) -- highest benchmarks at their sizes.
- Best image understanding: Gemma 4 9B (vision + tool calling) and Llama 3.2 Vision 11B -- both support image input locally.
- As of May 2026, the Ollama library contains 4,500+ models. All are available via `ollama pull <name>`.
What's New in Ollama β June 2026 Update
Current Ollama Version: v0.22.1 (released May 3, 2026). This is the latest stable release available via ollama.com/download.
Latest Release (May 3, 2026): Ollama v0.22.1 added full Gemma 4 support with thinking and tool calling capabilities. The release included improved quantization handling and model inference optimizations. Check GitHub for detailed release notes at github.com/ollama/ollama/releases.
New Models Added (MayβJune 2026):
- Kimi K2.6 (Moonshot AI, May 2026) β First non-Western model to reach Tier A in coding benchmarks (87/100 real-world). MoE architecture (42B active / 1T total). MIT license. Pull: `ollama pull kimi-k2.6`
- Qwen 3.6 27B (Alibaba, May 2026) β Best dense coding model with 77.2% SWE-bench. 22 GB VRAM required. Pull: `ollama pull qwen3.6:27b`
- GLM-5.1 (Zhipu AI, May 2026) β Structured code generation leader on SWE-Bench Pro. Pull: `ollama pull glm-5.1`
- Gemma 4 (Google, April 2, 2026) β First vision + tool calling combination. Vision support for image understanding. 6 GB VRAM. Pull: `ollama pull gemma4:9b`
# Update Ollama to latest version
curl https://ollama.ai/install.sh | sh
# Or on Mac: brew upgrade ollama
# Check your current version
ollama --version # outputs: ollama version 0.22.1
# Pull the latest new models
ollama pull kimi-k2.6
ollama pull qwen3.6:27b
ollama pull glm-5.1Which Models Are Most Popular on Ollama in 2026?
Popularity on Ollama is measured by download counts visible on each model's library page. As of May 2026, the top downloaded models are still dominated by Meta's Llama family -- Llama 3.2 3B is the most pulled model overall, largely due to its use as a first-install test model. However, Llama 4 Scout has climbed rapidly since its April 2026 release.
Qwen3 is the fastest-growing model family in the Ollama library, with Qwen3 and the new Qwen 3.6 dense variant quickly displacing Qwen2.5. DeepSeek-R1 and the new DeepSeek-R2 saw major spikes following releases and remain highly downloaded for reasoning tasks.
Meta released Llama 4 in April 2026 with Scout (17B active, 109B total, MoE) and Maverick (17B active, 400B total) variants. Llama 4 Scout is now stable in the Ollama library (`ollama pull llama4:scout`). The Llama 4 family uses Mixture-of-Experts (MoE) architecture β only 17B parameters are active per token, making Scout runnable on ~10 GB VRAM despite having 109B total parameters. For lightweight setups (8 GB RAM), Llama 3.2 3B remains the easiest first model. The Ollama ecosystem expanded significantly in late April / early May 2026. Kimi K2.6 (Moonshot AI, MIT license, 42B active / 1T total MoE) became the first non-Western model to reach Tier A in coding benchmarks (87/100). Qwen 3.6 27B achieved 77.2% SWE-bench as the best dense coding model. Ollama v0.22.1 added Gemma 4 support with thinking and tool calling improvements. The Ollama library now references 4,500+ models.
Which Ollama Models Work Best for Your Use Case?
The quality of a model's output depends heavily on how you prompt it. For structured techniques that work across all local models β including chain-of-thought, few-shot examples, and output formatting β see the prompt engineering guide. For reasoning tasks, chain-of-thought prompting significantly improves DeepSeek-R1 and Qwen3 output quality. To understand quantization tradeoffs for these models, see the quantization guide β. For determining how much VRAM each model needs, see the VRAM requirements guide β. For agent workflows with Gemma 4, see Tree-of-Thought and ReAct. For hardware requirements to run these models, see the hardware guide β. Once a tool-calling model from this list is wired into a multi-step loop with file and database access, see Local AI Agents With MCP for the open-source orchestration pattern.
- General chat (beginner): `ollama run llama3.2:3b` -- most documentation, best-supported first model.
- General chat (quality): `ollama run llama4:scout` -- MoE architecture, ~10 GB VRAM. For 8 GB machines, keep `ollama run llama3.2:3b`.
- Coding on 8 GB: `ollama run qwen3:8b` -- Best local coding model for 8 GB VRAM machines. 76% HumanEval, 5 GB used, multilingual.
- General inference on 8 GB (if not coding): `ollama run mistral:7b` -- Fastest general-purpose model at 8 GB, 40-60 tok/sec.
- Coding (best agentic, 24B): `ollama run devstral-small:24b` -- Best agentic coding model (multi-file edits, debugging). 16 GB RAM. By Mistral AI.
- Coding (best dense, 27B): `ollama run qwen3.6:27b` -- 77.2% SWE-bench. Best dense coding model. 22 GB VRAM.
- Coding (frontier MoE): `ollama run kimi-k2.6` -- 87/100 real-world coding, top tier. MoE (42B active/1T total). MIT license. Needs quantization for consumer hardware.
- Agent tasks and tool calling: `ollama run gemma4:9b` -- Released April 2, 2026. Built-in tool calling + vision support. Recommended for local agents, function calling, and structured output. 6 GB RAM.
- Reasoning and math: `ollama run deepseek-r1:7b` -- chain-of-thought model, best local math performance at 7B.
- Multilingual: `ollama run qwen3:7b` -- 29+ native languages, strongest non-English support, 76% HumanEval.
- Image understanding: `ollama run gemma4:9b` -- vision + tool calling (May 2026). Or `ollama run llama3.2-vision:11b` for dedicated vision.
- Fast and lightweight: `ollama run gemma2:2b` -- fastest CPU inference, 1.7 GB RAM.
- High quality (16 GB RAM): `ollama run mistral-small3.1` -- near-70B quality at 14 GB RAM.
- Embedding generation: `ollama run nomic-embed-text` -- 137M parameter embedding model for RAG pipelines.
- Document Q&A (RAG): `ollama run llama3.2` with Open WebUI's RAG feature -- best-supported combination.
New Ollama Models β May 2026 Releases
Confirm availability with `ollama pull <model>` before building workflows. New models appear in the Ollama library within days of release at ollama.com/library.
| Model | Released | Best For | Ollama Command |
|---|---|---|---|
| kimi-k2.6 | May 2026 | Top-tier coding, MoE (42B/1T), MIT license | ollama run kimi-k2.6 |
| qwen3.6:27b | May 2026 | Best dense coding model, 77.2% SWE-bench | ollama run qwen3.6:27b |
| glm-5.1 | May 2026 | Structured code generation, SWE-Bench Pro leader | ollama run glm-5.1 |
| deepseek-v4-flash | April/May 2026 | Budget coding (78/100 real-world) | ollama run deepseek-v4-flash |
| gemma4:9b | April 2, 2026 | Agent tasks, tool calling, vision | ollama run gemma4:9b |
| qwen3:7b | May 2026 | HumanEval 76% at 7B, multilingual | ollama run qwen3:7b |
What Is DeepSeek-R1 and Why Is It Different?
DeepSeek-R1 is a reasoning model -- unlike standard chat models that generate answers directly, DeepSeek-R1 generates explicit chain-of-thought reasoning before its final answer. This significantly improves performance on math, logic puzzles, and step-by-step problem solving.
DeepSeek-R1 7B scores 52% on MATH (competition math) vs 28% for Mistral 7B at the same size. It is slower than standard models (more tokens per response) but significantly more accurate on tasks where reasoning matters.
# Pull and run DeepSeek-R1
ollama run deepseek-r1:7b
# Larger variants for better quality
ollama run deepseek-r1:14b # 10 GB RAM
ollama run deepseek-r1:32b # 20 GB RAMWhich Ollama Models Support Image Input?
As of May 2026, these models on Ollama support image input (multimodal): Gemma 4 supports both vision AND tool calling β unique among vision models on Ollama.
| Model | RAM | Image Support | Ollama Command |
|---|---|---|---|
| llama3.2-vision:11b | ~8 GB | Yes | ollama run llama3.2-vision:11b |
| llama3.2-vision:90b | ~55 GB | Yes | ollama run llama3.2-vision:90b |
| gemma3:9b (vision) | ~6 GB | Yes | ollama run gemma3:9b |
| minicpm-v:8b | ~5.5 GB | Yes | ollama run minicpm-v |
| gemma4:9b | ~6 GB | Yes + Tool Calling β | ollama run gemma4:9b |
What Are the Top 10 Open Source Models on Ollama?
Download counts still favor Llama 3.x and Qwen 2.5 due to tutorial prevalence. For new projects in May 2026, prefer Llama 4 Scout, Qwen3, and Gemma 4.
| # | Model | Best For | RAM | HumanEval |
|---|---|---|---|---|
| 1 | Llama 3.2 3B | First model, general chat | 2.5 GB | 60% |
| 2 | Llama 4 Scout 17B | Best overall quality, MoE | ~10 GB | 85% |
| 3 | Qwen3 8B | Updated, multilingual + coding | 5.5 GB | 76% |
| 4 | Devstral Small 24B | Agentic coding (multi-file) | 16 GB | 80% |
| 5 | deepseek-r1:7b | Reasoning, math | 5 GB | β |
| 6 | Mistral 7B v0.3 | EU use, efficient | 4.5 GB | 39% |
| 7 | mistral-small3.1 | Quality on 16 GB | 14 GB | 74% |
| 8 | gemma2:2b | Fast, low RAM | 1.7 GB | β |
| 9 | gemma4:9b | Vision + tool calling | 6 GB | β |
| 10 | phi4-mini | Reasoning, 4 GB RAM | 2.5 GB | 70% |
How Do You Browse the Ollama Model Library?
There are two ways to work with Ollama models. Switch installed models: In the Ollama Mac app, click the model dropdown button at the bottom of the chat input (shows the current model name, e.g. "gemma3:1b") to switch between any locally installed model. Find and download new models: Visit ollama.com/library to browse 4500+ models by category, then use the CLI commands below to pull and manage them.
# List all locally downloaded models
ollama list
# Search for a model and pull it
ollama pull qwen2.5-coder:32b
# See all available tags for a model
ollama show qwen2.5
# Remove a model to free disk space
ollama rm llama3.2:3bOpen Source Ollama Models: Regional Context
EU / GDPR + Licence Compliance. For EU organizations deploying Ollama models in production, licence choice matters as much as performance. Apache 2.0 (fully open, commercial use permitted): Mistral 7B, Mistral Small 3.1, Qwen3 7B, Qwen 3.6 27B, Devstral Small 24B, Gemma 2 2B. Meta Llama Community Licence (commercial use restricted above 700M monthly active users): Llama 3.1 8B, Llama 3.2 3B, Llama 3.2 Vision 11B. MIT (commercial use permitted): DeepSeek-R1 7B, DeepSeek-R1 14B, Kimi K2.6. For EU enterprises in regulated sectors, Mistral models (France, Apache 2.0) or Devstral Small 24B (best agentic coding) are the recommended default -- EU origin, clean licence, no restriction on commercial deployment. For GDPR compliance: all models run entirely on-premises via Ollama, meaning no personal data is transmitted to external servers regardless of model choice.
Japan (METI). For Japanese enterprise Ollama deployments, Qwen3 / Qwen 3.6 is the recommended model family -- native Japanese tokenization processes Japanese text 30-40% more token-efficiently than Llama or Mistral, directly reducing inference time and KV cache requirements. For Japanese coding workflows: Qwen 3.6 27B (77.2% SWE-bench) handles Japanese code comments natively and is the top dense coding model in 2026. METI AI governance documentation requires noting the exact model version. Use `ollama show <model>` to get the full model specification including parameter count, quantization level, and context length for compliance records.
China. Under China's CAC Generative AI Measures (2023), organizations providing AI services to end users must register the models used. Qwen3 / Qwen 3.6 (Alibaba, Apache 2.0) is the recommended choice for Chinese enterprise Ollama deployments -- Chinese model origin, Apache 2.0 licence, best performance on Chinese-language tasks, and top benchmarks. Kimi K2.6 (Moonshot AI, MIT license, 42B active/1T total MoE) is also available as a top-tier coding option with Chinese origin. Pull commands: `ollama run qwen3.6:27b` for best quality, `ollama run qwen3:7b` for speed. DeepSeek-R1 (DeepSeek, MIT licence) is appropriate for reasoning tasks. For data processed locally via Ollama, China's PIPL cross-border data transfer requirements do not apply -- inference stays on-premises.
What Are the Common Mistakes When Choosing Ollama Models?
Pulling the largest model tag by default without checking RAM
Running `ollama pull llama3.3` without specifying a tag downloads the default variant, which is typically the largest standard quantization. On a machine with 8 GB RAM, pulling llama3.3 (70B at ~40 GB) will fail or cause severe swap usage. Always specify the variant: `ollama pull llama3.2:3b` for 8 GB machines.
Using a general model when a task-specific model exists
For coding tasks, `qwen2.5-coder:7b` scores 72% HumanEval while the general `qwen2.5:7b` also scores 72% -- but `qwen2.5-coder` includes FIM support for code completion. For reasoning/math, `deepseek-r1:7b` scores 52% MATH vs 28% for `mistral:7b`. Task-specific models exist in the Ollama library for a reason.
Not verifying a model is available before building a workflow
The Ollama library changes over time -- models are added and occasionally removed. Before building a production pipeline around a specific model, confirm it is in the library (`ollama list` locally, or check ollama.com/library). Pin specific model versions in production workflows: `ollama pull llama3.1:8b-instruct-q4_K_M`.
Not specifying a quantization tag for large models
Running `ollama pull qwen2.5-coder:32b` without a quantization suffix downloads the default variant -- which may be larger than your VRAM can handle. For 16 GB VRAM, pull the explicit Q4_K_M variant: `ollama pull qwen2.5-coder:32b-instruct-q4_K_M`. Run `ollama show <model>` after pulling to confirm VRAM requirements match your hardware.
Expecting DeepSeek-R1 to be as fast as standard chat models
DeepSeek-R1 generates explicit chain-of-thought reasoning tokens before its final answer -- this is why it outperforms standard models on math and logic, but it produces 3-5x more tokens per response. For quick chat or one-line answers, use `llama3.1:8b`. Reserve DeepSeek-R1 for tasks where reasoning accuracy matters more than speed.
Common Questions About Open Source Models on Ollama
How many models are in the Ollama library?
As of May 2026, the Ollama library contains approximately 4,500+ models (curated + community contributions) with official support. Hugging Face hosts thousands of additional GGUF models that can be loaded via Ollama using custom Modelfiles.
Can I use models from Hugging Face directly in Ollama?
Yes. Download a GGUF file from Hugging Face and create a Modelfile: `FROM ./model.gguf`. Then run `ollama create mymodel -f Modelfile`. This works for any GGUF file including fine-tunes and models not in the official Ollama library.
Which Ollama model is best for building a local chatbot?
For a general-purpose local chatbot: `llama4:scout` on 12 GB VRAM (best quality, MoE), or `llama3.2:3b` on 8 GB RAM (easiest entry point). For higher-quality use: `mistral-small3.1` on 16 GB RAM. For a coding assistant chatbot: `qwen3.6:27b` (best coding model, 77.2% SWE-bench) or `devstral-small:24b` (agentic coding). Pair with Open WebUI for a web-based interface that connects to Ollama's API at localhost:11434.
Are all Ollama models truly open source?
Not all. The Ollama library includes models with varying licences. Llama 3.x/4.x use the Meta Llama Community Licence (not OSI-approved open source -- restricts commercial use above 700M monthly active users). Mistral 7B, Qwen3, Qwen 3.6, Devstral, and Gemma models are Apache 2.0 (fully open source). Kimi K2.6 is MIT licensed (fully commercial-friendly). Always check the licence before commercial deployment.
Which embedding model should I use with Ollama for RAG?
`nomic-embed-text` is the standard choice -- a 137M parameter model that generates 768-dimensional embeddings, runs at milliseconds per document, and is specifically designed for retrieval tasks. Pull it with `ollama pull nomic-embed-text`. Use with Open WebUI's built-in RAG, LangChain's OllamaEmbeddings, or LlamaIndex.
How often does the Ollama library get updated with new models?
The Ollama team adds new models within days to weeks of major releases. Kimi K2.6 and Qwen 3.6 appeared within days of their May 2026 releases. Ollama v0.22.1 (May 3, 2026) added Gemma 4 rendering improvements. Follow the Ollama GitHub repository (github.com/ollama/ollama) or the Ollama Twitter/X account for new model announcements.
What is the difference between `ollama pull` and `ollama run`?
`ollama pull` downloads the model file to local storage (one-time operation). `ollama run` starts an interactive session immediately after pulling, or reuses the already-pulled model if available. You can pull once and run multiple times without re-downloading.
Can I run multiple models simultaneously on the same machine?
Yes, if your hardware has sufficient VRAM. Use separate terminal windows or shell sessions -- one window runs `ollama run llama3.2` while another runs `ollama run qwen2.5:7b`. Ollama automatically manages VRAM sharing. Monitor `nvidia-smi` or system activity to avoid overload.
How do I update a model to the latest version?
`ollama pull [model-name]` checks for updates and downloads the latest version if available. To revert or use specific versions, use version tags: `ollama pull llama3.1:8b` or `ollama pull llama3.1:8b-instruct-q4_K_M`. Check available versions with `ollama show [model-name]`.
Are open source models on Ollama truly free to use commercially?
Most are, but not all. Llama 3.x (Meta Llama Community Licence) restricts commercial use above 700M monthly active users. Mistral 7B, Qwen2.5, and Gemma 3 use Apache 2.0 (fully commercial-friendly). Always verify the licence before enterprise deployment -- check the model's Hugging Face page or Ollama library entry.
Sources
- Meta AI. (2025). "Llama 4 Model Card." llama.meta.com -- Official specifications for Llama 4 Scout (17B active, 109B total, MoE) and Maverick variants.
- DeepSeek AI. (2025). "DeepSeek-R1 Technical Report." arxiv.org/abs/2501.12948 -- Chain-of-thought architecture and MATH benchmark (52%) for DeepSeek-R1.
- Qwen Team. (2026). "Qwen 3.6 Technical Report." arxiv.org/abs/2501.xxxxx -- 77.2% SWE-bench for best dense coding model.
- Moonshot AI. (2026). "Kimi K2.6 Model Card." moonshot.ai -- MIT-licensed MoE coding model (42B active/1T total), 87/100 real-world coding.
- Mistral AI. (2026). "Devstral Small 24B." mistral.ai -- Best agentic coding model for multi-file edits and debugging.
- Ollama. (2026). "Ollama Model Library." ollama.com/library -- Official model library with 4,500+ models, download counts, tags, and quantization options.
- Google DeepMind. (2026). "Gemma 4 Technical Report." -- Vision + tool calling capabilities released April 2026.