Getting Started

How Do You Install Ollama: Complete Setup Guide for macOS, Windows, and Linux

8 min read · By Hans Kuepper, founder of PromptQuorum, a multi-model AI dispatch tool · PromptQuorum

Ollama installs in under 2 minutes on macOS, Windows, and Linux. After installation, one command downloads and runs any model from the Ollama library — no Python environment, no configuration files, and no GPU required to get started. As of April 2026, Ollama supports 200+ models including Meta Llama 3.3, Qwen2.5, and Mistral.

Key Takeaways

  • macOS: download the .dmg from ollama.com or run `brew install ollama` — then `ollama run llama3.2` to start chatting.
  • Windows: download the installer from ollama.com/download. Ollama runs as a background service in the system tray.
  • Linux: one curl command installs everything — `curl -fsSL https://ollama.com/install.sh | sh`.
  • Minimum requirements: 4 GB RAM for a 3B model, 8 GB RAM for a 7B model. No GPU needed to start.
  • Ollama serves a REST API at `http://localhost:11434`, including an OpenAI-compatible endpoint at `/v1` — apps built on OpenAI SDKs can point at it by changing only the base URL (and supplying any placeholder API key).
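As a minimal sketch of the OpenAI-compatible endpoint mentioned above, the standard library alone is enough to build a chat request — no SDK required. The prompt text is made up for illustration, and the actual network call is left commented out since it assumes Ollama is already running with llama3.2 pulled:

```python
import json
import urllib.request

# Ollama's OpenAI-compatible endpoint lives under /v1
BASE_URL = "http://localhost:11434/v1"

# Request body following the OpenAI chat-completions schema
payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once Ollama is running locally:
# with urllib.request.urlopen(request) as response:
#     reply = json.loads(response.read())
#     print(reply["choices"][0]["message"]["content"])
```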

What Is Ollama and Why Use It?

Ollama is an open-source inference engine that runs large language models locally. It packages model management, the llama.cpp inference backend, and an OpenAI-compatible REST API into a single lightweight application. No Python, no conda environment, and no CUDA setup is required.

Ollama maintains a curated model library (ollama.com/library) with one-command downloads for Meta Llama 3.1, Microsoft Phi-3, Google Gemma 2, Mistral, Qwen2.5, and 100+ other models. A model is downloaded once and cached on disk — subsequent runs start in under 5 seconds.

For alternatives to Ollama, see Local LLM One-Click Installers. To compare Ollama with LM Studio, see How to Install LM Studio.

How Do You Install Ollama on macOS

There are two methods. The installer download is faster; Homebrew is better if you manage software with brew.

  1. Go to ollama.com/download and click "Download for macOS".
  2. Open the downloaded Ollama.dmg file and drag Ollama to your Applications folder.
  3. Launch Ollama from Applications. A llama icon appears in your menu bar — Ollama is now running as a background service.
  4. Open Terminal and run your first model: `ollama run llama3.2`
  5. The model downloads (~2 GB for llama3.2:3b) and a chat prompt appears. Type a message and press Enter.
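Step 4 above also works non-interactively: `ollama run` accepts the prompt as a trailing argument and prints the completion to stdout, which makes it easy to script. A sketch (the prompt is an example; the subprocess call is commented out since it assumes Ollama is installed):

```python
import subprocess

# `ollama run <model> "<prompt>"` runs one completion and exits,
# instead of opening the interactive chat prompt.
cmd = ["ollama", "run", "llama3.2", "Summarize what Ollama is in one sentence."]

# Uncomment after installing Ollama:
# result = subprocess.run(cmd, capture_output=True, text=True)
# print(result.stdout)
```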

How Do You Install Ollama on macOS with Homebrew?

bash
brew install ollama

# Start the Ollama service in the foreground (background it with &)
ollama serve &

# Or let Homebrew manage it as a background service:
# brew services start ollama

# Pull and run a model
ollama run llama3.2

How Do You Install Ollama on Windows

  1. Go to ollama.com/download and click "Download for Windows".
  2. Run the downloaded OllamaSetup.exe installer. Ollama installs to %LOCALAPPDATA%\Programs\Ollama.
  3. Ollama starts automatically and appears as a system tray icon.
  4. Open PowerShell or Command Prompt and run: `ollama run llama3.2`
  5. The model downloads on first run. Subsequent runs use the cached model.

How Do You Enable GPU Support on Windows?

Ollama on Windows automatically detects and uses NVIDIA GPUs (CUDA 11.3+) and AMD GPUs (ROCm 6+). If you have an NVIDIA RTX card, Ollama will offload model layers to VRAM automatically — no manual configuration needed. To verify GPU is being used, run `ollama run llama3.2` and check Task Manager → GPU for activity.

How Do You Install Ollama on Linux

A single command installs Ollama on most Linux distributions:

bash
curl -fsSL https://ollama.com/install.sh | sh

How Do You Run Ollama as a systemd Service on Linux?

The install script automatically registers Ollama as a systemd service. To manage it:

bash
# Check service status
systemctl status ollama

# Start / stop / restart
systemctl start ollama
systemctl stop ollama
systemctl restart ollama

# View logs
journalctl -u ollama -f

How Do You Pull and Run Your First Model in Ollama

After installing Ollama, download and start a model:

bash
# Pull a model (downloads to ~/.ollama/models)
ollama pull llama3.2

# Run it interactively — `ollama run` pulls the model automatically if it
# is not already cached, so the separate pull step is optional
ollama run llama3.2

Which Model Should You Start With?

For a first run, these three models cover different hardware profiles:

Model       | Download Size | RAM Required | Best For
llama3.2:3b | ~2 GB         | 4 GB         | First test — any machine
llama3.1:8b | ~4.7 GB       | 8 GB         | General use on most laptops
phi3:mini   | ~2.3 GB       | 4 GB         | Fast responses, low RAM
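The RAM column above follows a rough rule of thumb: a 4-bit quantized model needs on the order of 0.6 GB per billion parameters, plus 1–2 GB for the KV cache and runtime overhead. The function and constants below are an illustrative back-of-the-envelope estimate, not Ollama's actual memory accounting:

```python
def approx_ram_gb(params_billion: float) -> float:
    """Rough RAM estimate for a 4-bit quantized model:
    ~0.6 GB per billion parameters plus ~1.5 GB of overhead.
    Rule of thumb only, not Ollama's real accounting."""
    return params_billion * 0.6 + 1.5

# Matches the table: 3B fits in 4 GB, 8B fits in 8 GB, 70B needs a workstation
for name, size_b in [("llama3.2:3b", 3), ("llama3.1:8b", 8), ("llama3.1:70b", 70)]:
    print(f"{name}: ~{approx_ram_gb(size_b):.1f} GB RAM")
```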

How Do You Verify Ollama Is Working

Test the REST API directly to confirm Ollama is running and accessible:

bash
# Check Ollama is running
curl http://localhost:11434
# Expected: "Ollama is running"

# List downloaded models
ollama list

# Send a prompt via the native REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is 2+2?",
  "stream": false
}'
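The `/api/generate` response includes timing fields you can use to measure throughput: `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The sample response below is fabricated for illustration, but the field names follow Ollama's documented schema:

```python
import json

# Trimmed example of what /api/generate returns with "stream": false.
# The values are made up; the field names follow Ollama's response schema.
sample = json.loads("""
{
  "model": "llama3.2",
  "response": "2 + 2 = 4.",
  "done": true,
  "eval_count": 12,
  "eval_duration": 400000000
}
""")

# eval_duration is in nanoseconds, so tokens per second is:
tokens_per_second = sample["eval_count"] / (sample["eval_duration"] / 1e9)
print(f"{sample['response']}  ({tokens_per_second:.0f} tok/s)")
```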

Which Ollama Commands Are Most Useful?

Command             | What It Does
ollama list         | Show all downloaded models and their sizes
ollama pull <model> | Download a model without running it
ollama rm <model>   | Delete a model from disk
ollama ps           | Show models currently loaded in memory
ollama show <model> | Show model details (parameters, template, license)
ollama serve        | Start the Ollama server manually (if not running as a service)
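These commands are also easy to consume from scripts. The sketch below parses `ollama list` output into model names; the sample text mimics the table Ollama prints (NAME, ID, SIZE, MODIFIED), but exact columns and spacing may vary between versions, so treat the format as an assumption:

```python
# Sample output in the shape `ollama list` prints; the rows are made up.
sample_output = """\
NAME               ID              SIZE      MODIFIED
llama3.2:latest    a80c4f17acd5    2.0 GB    2 days ago
phi3:mini          4f2222927938    2.2 GB    5 weeks ago
"""

# Skip the header row, take the first whitespace-separated column
models = [line.split()[0] for line in sample_output.splitlines()[1:] if line.strip()]
print(models)
```

In a real script you would capture the output with `subprocess.run(["ollama", "list"], capture_output=True, text=True)` and parse `.stdout` the same way.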

How Do You Troubleshoot Common Ollama Installation Issues?

Ollama says "could not connect to ollama app, is it running?"

Ollama is not running as a background service. On macOS: open the Ollama app from Applications. On Linux: run `systemctl start ollama` or `ollama serve` in a terminal. On Windows: launch Ollama from the Start menu.

The model download is very slow or stalled

Model downloads are large (2–47 GB). If the download stalls, press Ctrl+C and re-run `ollama pull <model>` — Ollama resumes partial downloads. For faster downloads, use a wired connection instead of Wi-Fi.

I get "error: model requires more system memory" when running a model

The model is too large for your available RAM. Try a smaller quantization: `ollama run llama3.1:8b-instruct-q4_0` instead of the default Q4_K_M. Or switch to a smaller model like `llama3.2:3b`. See Best Beginner Local LLM Models for RAM-matched recommendations.

Ollama is running but my GPU is not being used

On Windows, verify your NVIDIA driver is version 452.39 or higher. On Linux, confirm the NVIDIA driver is installed and working (`nvidia-smi` should return GPU info). Ollama offloads layers to the GPU automatically when VRAM is available — run `ollama ps` after starting a model to see how much of it is loaded on the GPU.

Where are Ollama model files stored?

Models are stored at ~/.ollama/models on macOS and Linux. On Windows, the default path is C:\Users\<username>\.ollama\models. You can change the storage location by setting the OLLAMA_MODELS environment variable before starting the service.
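A sketch of relocating model storage by setting OLLAMA_MODELS in the environment the server starts from. The directory path is an example — use any location with enough free space — and the launch itself is commented out since it assumes Ollama is installed:

```python
import os

# Example custom model directory (any path with enough free space works)
custom_dir = os.path.expanduser("~/llm-models")

# Copy the current environment and override the model storage location
env = dict(os.environ, OLLAMA_MODELS=custom_dir)

# The server would then be launched with this environment, e.g.:
# import subprocess
# subprocess.Popen(["ollama", "serve"], env=env)
print(env["OLLAMA_MODELS"])
```

On Linux with the systemd service, the equivalent is adding `Environment="OLLAMA_MODELS=..."` via `systemctl edit ollama` rather than exporting the variable in a shell.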

What Should You Do After Installing Ollama?

Once Ollama is running, the next step is Run Your First Local LLM to understand prompting, context length, and what to expect from local inference speed. To pick the best model for your hardware, see Best Beginner Local LLM Models. If you prefer a graphical chat interface over the terminal, How to Install LM Studio covers the desktop app alternative.

Sources

  • Ollama Official Website — Installation downloads and official documentation
  • Ollama GitHub Repository — Source code, issues, and community discussions
  • Ollama Model Library — Curated collection of available models with download links

What Are Common Mistakes When Installing Ollama?

  • Not checking that Ollama is running as a background service before expecting the API to respond.
  • Attempting to run models larger than available RAM without checking memory requirements first.
  • Ignoring GPU detection — Ollama supports NVIDIA and AMD but requires up-to-date drivers.

Compare local LLMs side-by-side with 25+ cloud models in PromptQuorum.

Try PromptQuorum for free →
