Key Takeaways
- `ollama pull <model>` — Download a model (e.g., `ollama pull llama3.2:3b`).
- `ollama run <model>` — Start a chat with a model.
- `ollama list` — Show all downloaded models and their sizes.
- `ollama rm <model>` — Delete a downloaded model.
- `ollama serve` — Start the Ollama API server (runs automatically on Mac/Windows).
- `ollama create <name> -f <modelfile>` — Build a custom model from a Modelfile.
- As of April 2026, these commands are stable and cover all common use cases.
What Are the Essential Ollama Commands?
- `ollama list` — Show downloaded models, disk usage, and modification date.
- `ollama pull <model>` — Download a model by name (e.g., `ollama pull mistral`).
- `ollama run <model>` — Start a chat session with a model.
- `ollama rm <model>` — Delete a model and free up disk space.
- `ollama serve` — Start the REST API server (typically runs automatically).
- `ollama help` — Show all available commands.
How Do You Manage Models in Ollama?
Model management in Ollama is entirely command-based:
```shell
# List all downloaded models
ollama list

# Download a model from the Ollama library
ollama pull llama3.2:3b        # default 4-bit quantization (~2 GB)
ollama pull llama3.2:3b-fp16   # full precision (~6.5 GB)

# Download a specific quantization
ollama pull qwen2.5:7b-instruct-q4_K_M   # 4-bit quantization
ollama pull qwen2.5:7b-instruct-q8_0     # 8-bit quantization

# See disk usage
du -sh ~/.ollama/models

# Delete a model
ollama rm llama3.2:3b

# Pull from a custom registry (advanced)
ollama pull localhost:5000/custom-model
```

How Do You Run and Serve Models?
There are two ways to use Ollama: the interactive CLI and the REST API.

```shell
# 1. Interactive chat (CLI)
ollama run llama3.2:3b
# Type your prompts and press Enter; /bye exits

# 2. Start the API server (runs in the background)
ollama serve
# OpenAI-compatible API listens at http://localhost:11434/v1

# Call the API from another terminal
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

How Do You Create Custom Models With Modelfiles?
A Modelfile is a configuration file (like a Dockerfile) that defines a custom model by starting from a base model and adding system prompts, parameters, and weights.
Create a file named `Modelfile`:

```
FROM llama3.2:3b

# Add a system prompt
SYSTEM """
You are a helpful expert in machine learning.
Always explain complex concepts in simple terms.
"""

# Adjust sampling parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```

Then build the custom model and use it:

```shell
ollama create ml-expert -f Modelfile
ollama run ml-expert
```

What Quantization Options Does Ollama Support?
Quantization reduces model size and VRAM usage by storing weights as lower-precision numbers. Ollama uses the GGUF format, which supports multiple quantization levels:
| Quantization | Size (7B) | VRAM | Quality | Speed |
|---|---|---|---|---|
| FP16 (full precision) | 14 GB | 16 GB | Best | Slowest |
| Q8_0 (8-bit) | 7 GB | 8 GB | Excellent | Fast |
| Q6_K (6-bit) | 5.5 GB | 6 GB | Very good | Fast |
| Q5_K_M (5-bit) | 5 GB | 5.5 GB | Good | Very fast |
| Q4_K_M (4-bit) | 4.7 GB | 5 GB | Good | Very fast |
| Q3_K_M (3-bit) | 3.3 GB | 4 GB | Fair | Fastest |
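The sizes in the table follow roughly from parameter count times bits per weight. A quick back-of-the-envelope estimator (a rough sketch using nominal bit widths; real GGUF files run somewhat larger because some layers stay at higher precision, and the KV cache adds VRAM on top):

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-disk weight size in GB: parameters * bits / 8."""
    return params_billion * bits_per_weight / 8

# A 7B model at a few nominal quantization levels
for name, bits in [("FP16", 16), ("Q8_0", 8), ("Q4", 4)]:
    print(f"{name}: ~{model_size_gb(7, bits):.1f} GB")
```

This reproduces the table's pattern: halving the bit width roughly halves the footprint, which is why a 4-bit 7B model fits on a GPU that FP16 cannot.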
How Do You Generate Embeddings With Ollama?
Embeddings are numerical representations of text, useful for RAG (Retrieval-Augmented Generation) and semantic search.
```shell
# Pull an embedding model
ollama pull nomic-embed-text   # English-focused, 137M parameters

# Generate embeddings (OpenAI-compatible endpoint)
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": "The quick brown fox jumps"
  }'
# The response contains a 768-dimensional embedding vector
```

What Environment Variables Control Ollama?
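Once you have embedding vectors, semantic search ranks texts by cosine similarity. A minimal pure-Python sketch (the 4-dimensional toy vectors below stand in for real 768-dimensional model output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedding output
query = [0.1, 0.9, 0.2, 0.4]
doc_a = [0.1, 0.8, 0.3, 0.5]   # similar direction -> high score
doc_b = [0.9, 0.1, 0.0, 0.1]   # different direction -> low score
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # True
```

In a real RAG pipeline you would embed each document once, store the vectors, and at query time return the documents with the highest cosine similarity to the query embedding.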
Key environment variables:
- `OLLAMA_HOST` — Listen address (default: `127.0.0.1:11434`). Set to `0.0.0.0:11434` for network access.
- `OLLAMA_MODELS` — Where models are stored (default: `~/.ollama/models`).
- `OLLAMA_DEBUG` — Set to `1` for detailed logs.
- `OLLAMA_KEEP_ALIVE` — How long a model stays loaded in memory after use (default: 5 minutes).
- GPU selection — Ollama auto-detects GPUs; restrict which devices it uses with `CUDA_VISIBLE_DEVICES` (NVIDIA) or `ROCR_VISIBLE_DEVICES` (AMD).
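Put together, a typical server setup looks like this (a sketch; the path and keep-alive value are example choices, not defaults):

```shell
# Expose the API on the local network and keep models loaded for an hour
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_MODELS=/data/ollama/models   # example storage path
export OLLAMA_KEEP_ALIVE=1h
ollama serve
```

On Linux systemd installs, the same variables go in the service unit via `systemctl edit ollama` rather than a shell session.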
Common Mistakes With Ollama Commands
- Forgetting model tags. `ollama pull llama3.2` pulls whatever the `latest` tag points to, which may not be the size you want; `ollama pull llama3.2:3b` pins the 3B version.
- Not realizing `ollama serve` runs automatically. On Mac and Windows, Ollama starts the API automatically when you launch the app. On Linux, you may need to start it manually.
- Pulling the wrong quantization. Always specify the exact model tag (e.g., `qwen2.5:7b-instruct-q4_K_M`) to control VRAM usage.
- Expecting Ollama to work offline after pulling. Ollama itself works offline, but models must be pulled while connected to the internet.
Common Questions About Ollama Commands
Where are Ollama models stored?
Default: `~/.ollama/models` on macOS/Linux or `%USERPROFILE%\.ollama\models` on Windows. Set `OLLAMA_MODELS` to change the location.
Can I move models between computers?
Yes. Copy the model files from `~/.ollama/models` to the other computer's `~/.ollama/models` (include both the `blobs` and `manifests` subdirectories), and `ollama list` will recognize them.
How do I see active model memory usage?
Use `ollama ps` to list currently-loaded models. Models are unloaded after 5 minutes of inactivity by default.
Can I run multiple models simultaneously?
Yes, but they share VRAM. Running two 8B models requires 16 GB VRAM. Each additional model increases memory usage.
Sources
- Ollama GitHub — github.com/ollama/ollama
- Ollama Documentation — github.com/ollama/ollama/blob/main/docs
- Ollama Model Library — ollama.ai/library