PromptQuorumPromptQuorum
Home/Local LLMs/Install Ollama: 2-Minute Setup for macOS, Windows & Linux
Getting Started

Install Ollama: 2-Minute Setup for macOS, Windows & Linux

Β·8 min readΒ·By Hans Kuepper Β· Founder of PromptQuorum, multi-model AI dispatch tool Β· PromptQuorum

Ollama installs in under 2 minutes on macOS, Windows, and Linux. After installation, one command downloads and runs any model from the Ollama library -- no Python environment, no configuration files, and no GPU required to get started.

Ollama installs in under 2 minutes on macOS, Windows, and Linux. After installation, one command downloads and runs any model from the Ollama library -- no Python environment, no configuration files, and no GPU required to get started. As of April 2026, Ollama supports 200+ models including Meta Llama 3.3, Qwen2.5, and Mistral.

Key Takeaways

  • macOS: download the .dmg from ollama.com or run `brew install ollama` -- then `ollama run llama3.2` to start chatting.
  • Windows: download the installer from ollama.com/download. Ollama runs as a background service in the system tray.
  • Linux: one curl command installs everything -- `curl -fsSL https://ollama.com/install.sh | sh`.
  • Minimum requirements: 4 GB RAM for a 3B model, 8 GB RAM for a 7B model. No GPU needed to start.
  • Ollama exposes an OpenAI-compatible REST API at `http://localhost:11434` -- any OpenAI SDK app can use it without code changes.
  • πŸ‘‰ Before installing, confirm local is right for your use case β€” see Local LLM vs Cloud API for when cloud outperforms local inference.

Before You Install: Is Local LLM Right for Your Use Case?

Installing Ollama takes 5 minutes, but running your first model well can take 20–40 minutes if you hit GPU detection issues, driver mismatches, or RAM constraints.

If you're unsure whether local inference is the right choice for you, **compare the full local vs cloud trade-offs first** β€” you may find that starting with a cloud API (ready in 5 minutes, no troubleshooting) is the smarter path. Many users discover this after installation; better to decide now.

For users committed to local, continue below. For users evaluating cloud first, see the full comparison.

What Is Ollama and Why Use It?

Ollama is an open-source inference engine that runs large language models locally. It packages model management, the llama.cpp inference backend, and an OpenAI-compatible REST API into a single lightweight application. No Python, no conda environment, and no CUDA setup is required.

Ollama maintains a curated model library (ollama.com/library) with one-command downloads for Meta Llama 3.1, Microsoft Phi-3, Google Gemma 2, Mistral, Qwen2.5, and 100+ other models. A model is downloaded once and cached on disk -- subsequent runs start in under 5 seconds.

For alternatives to Ollama, see Local LLM One-Click Installers. To compare Ollama with LM Studio, see How to Install LM Studio.

How Do You Install Ollama on macOS?

There are two methods. The installer download is faster; Homebrew is better if you manage software with brew.

  1. 1
    Go to ollama.com/download and click "Download for macOS".
  2. 2
    Open the downloaded Ollama.dmg file and drag Ollama to your Applications folder.
  3. 3
    Launch Ollama from Applications. A llama icon appears in your menu bar -- Ollama is now running as a background service.
  4. 4
    Open Terminal and run your first model: `ollama run llama3.2`
  5. 5
    The model downloads (~2 GB for llama3.2:3b) and a chat prompt appears. Type a message and press Enter.

Install Ollama on macOS with Homebrew

bash
brew install ollama

# Start the Ollama service
ollama serve &

# Pull and run a model
ollama run llama3.2

How Do You Install Ollama on Windows?

  1. 1
    Go to ollama.com/download and click "Download for Windows".
  2. 2
    Run the downloaded OllamaSetup.exe installer. Ollama installs to %LOCALAPPDATA%\Programs\Ollama.
  3. 3
    Ollama starts automatically and appears as a system tray icon.
  4. 4
    Open PowerShell or Command Prompt and run: `ollama run llama3.2`
  5. 5
    The model downloads on first run. Subsequent runs use the cached model.

GPU Support on Windows

Ollama on Windows automatically detects and uses NVIDIA GPUs (CUDA 11.3+) and AMD GPUs (ROCm 6+). If you have an NVIDIA RTX card, Ollama will offload model layers to VRAM automatically -- no manual configuration needed. To verify GPU is being used, run `ollama run llama3.2` and check Task Manager β†’ GPU for activity.

How Do You Install Ollama on Linux?

A single command installs Ollama on any Linux distribution:

bash
curl -fsSL https://ollama.com/install.sh | sh

Run Ollama as a systemd Service on Linux

The install script automatically registers Ollama as a systemd service. To manage it:

bash
# Check service status
systemctl status ollama

# Start / stop / restart
systemctl start ollama
systemctl stop ollama
systemctl restart ollama

# View logs
journalctl -u ollama -f

How Do You Pull and Run Your First Model in Ollama?

After installing Ollama, run this command to download and start a model:

bash
# Pull a model (downloads to ~/.ollama/models)
ollama pull llama3.2

# Run it interactively
ollama run llama3.2

# Or pull and run in one step
ollama run llama3.2

Which Model Should You Start With?

For a first run, these three models cover different hardware profiles:

ModelDownload SizeRAM RequiredBest For
Llama 3.2 3B~2 GB4 GBFirst test -- any machine
Llama 3.1 8B~4.7 GB8 GBGeneral use on most laptops
phi4-mini~2.3 GB4 GBFast responses, low RAM

How Do You Verify Ollama Is Working?

Test the REST API directly to confirm Ollama is running and accessible:

bash
# Check Ollama is running
curl http://localhost:11434
# Expected: "Ollama is running"

# List downloaded models
ollama list

# Send a prompt via API (OpenAI-compatible)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is 2+2?",
  "stream": false
}'

Useful Ollama Commands

CommandWhat It Does
ollama listShow all downloaded models and their sizes
ollama pull <model>Download a model without running it
ollama rm <model>Delete a model from disk
ollama psShow models currently loaded in memory
ollama show <model>Show model details (parameters, template, licence)
ollama serveStart the Ollama server manually (if not running as service)

Troubleshooting Common Ollama Installation Issues

Ollama says "could not connect to ollama app, is it running?"

Ollama is not running as a background service. On macOS: open the Ollama app from Applications. On Linux: run `systemctl start ollama` or `ollama serve` in a terminal. On Windows: launch Ollama from the Start menu.

The model download is very slow or stalled

Model downloads are large (2-47 GB). If the download stalls, press Ctrl+C and re-run `ollama pull <model>` -- Ollama resumes partial downloads. For faster downloads, use a wired connection instead of Wi-Fi.

I get "error: model requires more system memory" when running a model

The model is too large for your available RAM. Try a smaller quantization: `ollama run llama3.2-instruct-q4_0` instead of the default Q4_K_M. Or switch to a smaller model like `llama3.2:3b`. See Best Beginner Local LLM Models for RAM-matched recommendations.

Ollama is running but my GPU is not being used

On Windows, verify your NVIDIA driver is version 452.39 or higher. On Linux, confirm the NVIDIA container toolkit is installed (`nvidia-smi` should return GPU info). Ollama offloads layers to GPU automatically when VRAM is available -- run `ollama ps` after starting a model to see GPU utilization.

Where are Ollama model files stored?

Models are stored at ~/.ollama/models on macOS and Linux. On Windows, the default path is C:\Users\<username>\.ollama\models. You can change the storage location by setting the OLLAMA_MODELS environment variable before starting the service.

What to Do After Installing Ollama?

Once Ollama is running, the next step is Run Your First Local LLM to understand prompting, context length, and what to expect from local inference speed. To pick the best model for your hardware, see Best Beginner Local LLM Models. If you prefer a graphical chat interface over the terminal, How to Install LM Studio covers the desktop app alternative.

Sources

  • Ollama Official Website -- Installation downloads and official documentation
  • Ollama GitHub Repository -- Source code, issues, and community discussions
  • Ollama Model Library -- Curated collection of available models with download links

Common Mistakes When Installing Ollama

  • Not checking that Ollama is running as a background service before expecting the API to respond.
  • Attempting to run models larger than available RAM without checking memory requirements first.
  • Ignoring GPU detection -- Ollama supports NVIDIA and AMD but requires up-to-date drivers.

A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each provider's official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.

Compare your local LLM against 25+ cloud models simultaneously with PromptQuorum.

Join the PromptQuorum Waitlist β†’

← Back to Local LLMs

Install Ollama: 2-Minute Setup for macOS, Windows & Linux