PromptQuorum

Best Mini PC for an Always-On Ollama Server (2026)

Hardware & Performance · Intermediate

Key Takeaways

  • Mini PCs draw 15–45 W vs 200–350 W for desktop GPUs; on a machine that runs 24/7, the difference adds up
  • UM890 Pro runs 7B models CPU-only at 12–18 tok/s, which is fine for API server use
  • AOOSTAR GEM12 Pro + OCuLink eGPU unlocks GPU acceleration without a desktop PC
  • Mac Mini M4 Pro: 48 GB unified memory runs 32B models, making it the best macOS option
  • Beelink SER8 is the sub-$400 starting point: 32 GB RAM for 7B and 13B models

Best Mini PCs for Always-On Ollama Server — Ranked

Always-On Electricity Cost Comparison

At $0.15/kWh (US average), running 24/7 for 30 days:
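
The arithmetic behind this comparison is simple enough to sketch. The snippet below is a minimal illustration, assuming a constant average draw at the wattage bounds quoted in the key takeaways (15–45 W for mini PCs, 200–350 W for desktop GPUs); the function name and labels are made up for this example, and real machines idle well below their peak draw.

```python
# Monthly electricity cost for an always-on machine at a constant average draw.
RATE_USD_PER_KWH = 0.15  # rate quoted in the article
HOURS = 24 * 30          # running 24/7 for 30 days


def monthly_cost_usd(watts: float,
                     rate: float = RATE_USD_PER_KWH,
                     hours: float = HOURS) -> float:
    """Convert an average draw in watts to a cost in USD for the period."""
    kwh = watts * hours / 1000  # watt-hours -> kilowatt-hours
    return round(kwh * rate, 2)


# Wattage bounds taken from the article; the labels are illustrative.
for label, watts in [("mini PC (low)", 15), ("mini PC (high)", 45),
                     ("desktop GPU (low)", 200), ("desktop GPU (high)", 350)]:
    print(f"{label:18s} {watts:4d} W  ${monthly_cost_usd(watts):.2f}/month")
```

At these assumptions, 15 W works out to about $1.62 per month and 350 W to about $37.80 per month.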

Quick Answers

Can a mini PC run 13B or larger models at useful speed?
Yes, with enough RAM. The Minisforum UM890 Pro with 64 GB runs a 13B model at Q8 entirely in RAM at ~8–12 tok/s CPU-only. With the Radeon 780M iGPU accelerating, Q4 models run at 10–18 tok/s, which is usable for background summarization or API calls; interactive chat benefits from at least 12–15 tok/s. For 30B+ models, the Mac Mini M4 Pro (48 GB unified memory) is the only mini PC option under $1500.
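
To judge whether a model fits in a given amount of RAM, a rough rule of thumb is parameters times bits per weight. The sketch below assumes nominal bit widths (8 for Q8, 4 for Q4); real quantization formats carry some per-block overhead, and the KV cache and runtime need additional memory on top, so treat the results as lower bounds. The function name is made up for this example.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


# Model sizes mentioned in the article, at nominal Q4 and Q8 bit widths.
for params, bits in [(7, 4), (13, 4), (13, 8), (32, 4)]:
    print(f"{params}B at {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```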
Does Ollama work well as a network server on a mini PC?
Yes. Set OLLAMA_HOST=0.0.0.0 in the environment of the Ollama server process, and Ollama accepts requests from any device on your LAN (default port 11434). Pair it with Open WebUI, run as a Docker container, for a browser-based interface accessible from phones, tablets, and PCs. The mini PC draws little power, runs silently, and handles one request at a time without issue.
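
Once the server listens on the LAN, any device can call Ollama's HTTP API. The following is a minimal client sketch using only the Python standard library. It assumes the default port 11434 and the `/api/generate` endpoint with streaming disabled; the IP address and model tag in the usage comment are placeholders, not values from the article.

```python
import json
import urllib.request


def build_generate_request(host: str, model: str, prompt: str,
                           port: int = 11434) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"http://{host}:{port}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the server and return the generated text."""
    req = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Usage, with a placeholder LAN address and model tag:
#   print(generate("192.168.1.50", "llama3.1:8b", "Say hello in five words."))
```

Separating request construction from sending keeps the payload easy to inspect without a running server.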
What about eGPU setups — are they worth it?
For Ollama specifically, an OCuLink eGPU (AOOSTAR GEM12 Pro + RTX 3090 enclosure) is the best of both worlds: desktop GPU speed with mini PC power draw when idle. OCuLink (PCIe 4.0 x4) provides about a quarter of the raw bandwidth of a PCIe 4.0 x16 slot, but that is enough for LLM inference with minimal bottleneck, because once the model is loaded into VRAM little data crosses the link per generated token. Thunderbolt eGPUs tunnel PCIe at a lower rate still and are not recommended for GPU-intensive inference.
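
As a back-of-the-envelope check on the link itself, theoretical PCIe throughput follows from the lane count and the per-lane signalling rate. The sketch below assumes the standard PCIe 4.0 figures of 16 GT/s per lane with 128b/130b encoding (not taken from the article) and ignores protocol overhead, so real transfers are somewhat slower. Raw link bandwidth mainly affects how long a model takes to load into VRAM, not the token rate afterwards.

```python
def pcie_gb_per_s(lanes: int, gt_per_s: float = 16.0,
                  encoding: float = 128 / 130) -> float:
    """Theoretical one-direction PCIe throughput in GB/s (protocol overhead ignored)."""
    return lanes * gt_per_s * encoding / 8  # bits -> bytes


x4 = pcie_gb_per_s(4)    # OCuLink, PCIe 4.0 x4
x16 = pcie_gb_per_s(16)  # desktop slot, PCIe 4.0 x16
print(f"x4: {x4:.1f} GB/s, x16: {x16:.1f} GB/s, ratio: {x4 / x16:.0%}")

# Lower bound on the time to fill an RTX 3090's 24 GB of VRAM over the x4 link.
vram_gb = 24
print(f"Loading {vram_gb} GB over x4 takes at least {vram_gb / x4:.1f} s")
```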
