Best Mini PC for Local LLM?

Quick Answer

Three mini PCs stand out for local LLM inference: Mac Mini M4 delivers ~18 tok/s with unified memory and zero VRAM bottleneck, Minisforum UM790 Pro scales to 64 GB DDR5 for larger models, and Beelink SER8 offers value at ~8 tok/s with Ryzen 9 8845HS. All three run 7–13B Q4 models without a discrete GPU.

▸Mac Mini M4: fastest for LLMs, ~18 tok/s on Llama 3 8B, power-efficient
▸Minisforum UM790 Pro: AMD Radeon 780M iGPU, up to 64 GB of DDR5 shared between CPU and iGPU
▸Beelink SER8: Ryzen 9 8845HS, ~8 tok/s, budget-friendly alternative

Updated: 2026-05

Key Takeaways

✓Mac Mini M4 starts at ~599 USD, uses Apple Metal for GPU acceleration, and reaches ~18 tok/s on a 7B Q4 model using only ~30 W under load
✓Minisforum UM790 Pro (AMD Ryzen 9 7940HS) supports up to 64 GB DDR5 RAM and ~8 tok/s on a 7B model via ROCm on Linux
✓Apple Silicon's unified memory architecture is the key advantage — the M4's RAM is shared between CPU and GPU with no VRAM bottleneck
✓Beelink SER8 (Ryzen 9 8845HS) is the budget pick: the same ~8 tok/s as the UM790 Pro using CPU-based inference, with lower power draw and no Linux ROCm setup required

Mac Mini M4 Leads on Speed and Efficiency

The Mac Mini M4 achieves ~18 tokens per second on a 7B Q4 model, draws ~30 W under load, and starts at approximately 599 USD, which makes it the fastest of these three mini PCs for local LLM inference. The M4 uses a unified memory architecture: the CPU and GPU share the same physical RAM, so model weights never have to be copied into separate VRAM. For users who prioritize speed, the M4 is the top choice.

The Minisforum UM790 Pro is the scaling option: an AMD Ryzen 9 7940HS with a Radeon 780M iGPU, up to 64 GB of DDR5 that the iGPU shares with the CPU, and ~8 tok/s on Linux with ROCm. The Beelink SER8 (Ryzen 9 8845HS) matches the UM790 Pro on throughput but runs inference on the CPU alone, with no GPU driver stack to configure, which makes it the budget-friendly choice for Windows or Linux users who want to avoid ROCm setup.

The table below compares the three mini PCs across CPU/GPU, best memory configuration, and measured LLM speed.

Mini PC	CPU/GPU	Best Config	LLM Speed (7B Q4)
Mac Mini M4	Apple M4	16 GB unified	~18 tok/s
Minisforum UM790 Pro	Ryzen 9 7940HS	64 GB DDR5	~8 tok/s
Beelink SER8	Ryzen 9 8845HS	64 GB DDR5	~8 tok/s
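
To check figures like these on your own hardware, here is a minimal sketch, assuming a local Ollama server on its default port 11434 and a model you have already pulled (`llama3:8b` is only an example tag). Ollama's `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds); the helper names are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(resp: dict) -> float:
    """Decode speed from an /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in
    nanoseconds, so tokens/s = eval_count / eval_duration * 1e9.
    """
    return resp["eval_count"] / resp["eval_duration"] * 1e9


def generate(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return the parsed JSON."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)
```

Usage: `print(tokens_per_second(generate("llama3:8b", "Explain unified memory.")))`. Because `eval_duration` excludes model loading and prompt processing, this measures decode speed only.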

Unified Memory Is the Key Differentiator for LLM Performance

Mini PCs with a built-in discrete GPU are poorly suited to LLM inference because the GPU's VRAM is fixed at the factory, typically 4–8 GB, and cannot be expanded. The Mac Mini M4 and UM790 Pro avoid this limit by running inference on an integrated GPU that shares system memory. The Beelink SER8 takes a different approach: although its Ryzen 9 8845HS includes the same Radeon 780M iGPU, it is used here for CPU-only inference, which is slower than the M4 but requires no GPU setup.
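
If the SER8 runs Ollama, one way to keep inference on the CPU is to set Ollama's `num_gpu` option (the number of layers offloaded to a GPU) to 0. A minimal sketch that only builds the request body; `cpu_only_request` is a hypothetical helper, and the 8-thread default assumes the 8845HS's 8 physical cores.

```python
import json


def cpu_only_request(model: str, prompt: str, threads: int = 8) -> dict:
    """Build an Ollama /api/generate body that keeps every layer on the CPU.

    num_gpu = 0 offloads no layers to the GPU; num_thread sets the number of
    CPU worker threads (8 is an assumption matching the 8845HS's cores).
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_gpu": 0, "num_thread": threads},
    }


# Serialized body, ready to POST to http://localhost:11434/api/generate.
payload = json.dumps(cpu_only_request("llama3:8b", "Hello"))
```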

The Mac Mini M4 with 16 GB of unified memory outperforms the UM790 Pro with 32 GB of DDR5 on raw inference speed because the M4 pairs higher memory bandwidth (120 GB/s) with Metal GPU acceleration, which together outpace the Radeon 780M iGPU. The UM790 Pro's advantage is that it can expand to 64 GB, enough for larger models such as 30B Q4 (~18 GB) that do not fit in 16 GB, and with more headroom for 13B models.
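
A back-of-the-envelope check of the bandwidth argument: during decoding, each new token has to stream roughly all model weights from memory, so bandwidth divided by model size gives an upper bound on tok/s. A minimal sketch, assuming LPDDR5X-7500 on a 128-bit bus for the M4 and dual-channel DDR5-5600 (128 bits total) for the UM790 Pro, and using the ~8 GB size this guide quotes for a 13B Q4 model:

```python
def peak_bandwidth_gbs(mega_transfers: float, bus_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (transfers/s x bytes per transfer)."""
    return mega_transfers * 1e6 * bus_bits / 8 / 1e9


def decode_ceiling_tok_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Rough upper bound on decode speed: each new token reads all weights once."""
    return bandwidth_gbs / model_gb


# Assumptions: M4 = LPDDR5X-7500 on a 128-bit bus;
# UM790 Pro = dual-channel DDR5-5600 (2 x 64-bit). 13B Q4 weights ~8 GB.
m4 = peak_bandwidth_gbs(7500, 128)      # ~120 GB/s
um790 = peak_bandwidth_gbs(5600, 128)   # ~89.6 GB/s
print(f"M4 ceiling:    {decode_ceiling_tok_s(m4, 8):.1f} tok/s")
print(f"UM790 ceiling: {decode_ceiling_tok_s(um790, 8):.1f} tok/s")
```

Real throughput lands below these ceilings (this guide cites ~10 tok/s on the M4 and ~6 tok/s on the UM790 Pro for 13B Q4) because the estimate ignores compute, KV-cache reads, and software overhead.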

For the software side of a local LLM setup, see the best Ollama frontend overview.

For Japan-specific mini PC recommendations with Amazon.co.jp links and JPY prices, see our best mini PC for local LLMs in Japan guide.

Related Guides

▸Best SSD for Fast Model Loading
▸Strix Halo + Ollama + Vulkan: Performance Guide

Quick Answers About Mini PCs for Local LLMs

Can the Mac Mini M4 run a 13B model locally?

Yes. With the 16 GB version at Q4 quantization, the model fits with ~1 GB to spare, and inference speed drops to ~10 tok/s for 13B Q4 on the base 16 GB M4. A Mac Mini with 32 GB or more (the M4 can be configured with 32 GB, the M4 Pro with more) can comfortably run 13B and 30B Q4 models.
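
Whether a model fits depends on the KV cache as well as the weights. A rough estimate, assuming about 4.5 bits per weight for a Q4_K-style quantization and a Llama-2-13B-style layout (40 layers, 40 KV heads, head dimension 128) with an FP16 cache; models that use grouped-query attention have fewer KV heads and a much smaller cache.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9


# Assumptions: ~4.5 bits/weight and a Llama-2-13B-style layout.
w = weights_gb(13, 4.5)               # ~7.3 GB
kv = kv_cache_gb(40, 40, 128, 4096)   # ~3.4 GB at a 4096-token context
print(f"weights {w:.1f} GB + KV cache {kv:.1f} GB = {w + kv:.1f} GB")
```

Shorter contexts shrink the cache linearly, which is one lever for squeezing a 13B model onto a 16 GB machine alongside macOS and other apps.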

Does the Minisforum UM790 Pro need ROCm for GPU acceleration?

On Linux, yes: Ollama and llama.cpp can use the Radeon 780M iGPU through ROCm, although the 780M is not on ROCm's official support list and usually needs a GFX version override. On Windows, GPU acceleration for this iGPU is less mature and typically slower than ROCm on Linux. For the fastest inference on the UM790 Pro, use Linux with ROCm.
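
A minimal sketch of that override, assuming Ollama is installed and on `PATH`. `HSA_OVERRIDE_GFX_VERSION` is a ROCm environment variable; the value `11.0.0` (treat the 780M's gfx1103 as gfx1100) is a common community workaround rather than an officially supported setting, and both helper names are hypothetical.

```python
import os
import subprocess


def rocm_env_for_780m() -> dict:
    """Copy of the current environment with a gfx1100 override for the 780M.

    The Radeon 780M reports gfx1103, which ROCm does not officially list;
    HSA_OVERRIDE_GFX_VERSION=11.0.0 makes the runtime treat it as gfx1100.
    """
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"
    return env


def start_ollama_server() -> subprocess.Popen:
    """Launch `ollama serve` with the override (assumes ollama is on PATH)."""
    return subprocess.Popen(["ollama", "serve"], env=rocm_env_for_780m())
```

If Ollama runs as a systemd service instead, the variable has to be set in the service's environment rather than in a child process like this.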

Is the Mac Mini M4 good enough for coding with a 7B model?

Yes. At ~18 tok/s with a 7B Q4 model, the Mac Mini M4 is fast enough for chat-style coding help. A 200-token completion takes approximately 11 seconds of generation (200 / 18), plus prompt-processing time, which is practical for non-real-time coding assistance but slow for inline autocomplete.
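
The latency figure is simple arithmetic; a small sketch that also adds prompt processing (prefill). The 1,000-token prompt and 200 tok/s prefill rate in the second example are hypothetical numbers, not measurements from this guide.

```python
def completion_seconds(gen_tokens: int, decode_tok_s: float,
                       prompt_tokens: int = 0,
                       prefill_tok_s: float = float("inf")) -> float:
    """Prefill time plus decode time in seconds (model load time is ignored)."""
    return prompt_tokens / prefill_tok_s + gen_tokens / decode_tok_s


print(f"{completion_seconds(200, 18):.1f} s")              # decode only
print(f"{completion_seconds(200, 18, 1000, 200):.1f} s")   # hypothetical prefill
```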

What is the largest model the UM790 Pro can run with GPU acceleration?

With 64 GB DDR5 configured as unified memory on Linux with ROCm, the UM790 Pro can run a 30B Q4 model (~18 GB) at approximately 3–4 tok/s. A 13B Q4 model (~8 GB) runs at ~6 tok/s. See the Ollama frontend guide for software setup to run these models.

When should I pick the Beelink SER8 over the Mac Mini M4 or UM790 Pro?

Choose the Beelink SER8 if you: (1) want to avoid GPU drivers and ROCm on Linux; (2) prioritize budget over speed (it's cheaper than both); (3) run Windows and don't want to depend on iGPU acceleration; (4) run occasional inference at ~8 tok/s and prefer the simplicity of CPU-based inference. It won't beat the Mac Mini M4 on speed or the UM790 Pro on scalability, but it's the easiest CPU-only option.
