Quick Answer
Three mini PCs stand out for local LLM inference: the Mac Mini M4 (fastest, ~18 tok/s), the Minisforum UM790 Pro (most RAM, 64 GB DDR5), and the Beelink SER8 (best value, Ryzen 9 8845HS CPU). All three run 7–13B Q4 models without a discrete GPU.
Updated: 2026-05
Key Takeaways
The Mac Mini M4 achieves ~18 tokens per second on a 7B Q4 model, draws ~30 W under load, and starts at approximately 599 USD, making it the fastest of the three mini PCs for local LLM inference. The M4 chip uses a unified memory architecture: the CPU and GPU share the same physical RAM, so model weights never have to be copied between separate memory pools.
The Minisforum UM790 Pro is the scaling option: an AMD Ryzen 9 7940HS with a Radeon 780M iGPU, up to 64 GB of DDR5 shared between CPU and iGPU, and ~8 tok/s on Linux with ROCm. The Beelink SER8 (Ryzen 9 8845HS) matches the UM790 Pro on throughput but runs inference on the CPU alone, with no GPU runtime to configure. That makes it the budget-friendly choice for Windows or Linux users who want to avoid ROCm setup.
The table below compares the three mini PCs across CPU/GPU, best memory configuration, and measured LLM speed.
| Mini PC | CPU/GPU | Best Memory Config | LLM Speed (7B Q4) |
|---|---|---|---|
| Mac Mini M4 | Apple M4 | 16 GB unified | ~18 tok/s |
| Minisforum UM790 Pro | Ryzen 9 7940HS | 64 GB DDR5 | ~8 tok/s |
| Beelink SER8 | Ryzen 9 8845HS | 64 GB DDR5 | ~8 tok/s |
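To check tok/s figures like these on your own hardware, one option is to compute throughput from the timing fields that an Ollama `/api/generate` response reports: `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The sketch below shows only the arithmetic. The sample values are hypothetical, not measurements, and this is not necessarily how the table's figures were produced.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert a generated-token count and a duration in nanoseconds to tok/s."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical response fragment (illustrative values, not a real measurement).
sample = {"eval_count": 180, "eval_duration": 10_000_000_000}

rate = tokens_per_second(sample["eval_count"], sample["eval_duration"])
print(f"{rate:.1f} tok/s")
```

Averaging over several prompts of similar length gives a steadier number than a single run, since the first request often includes model load time.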
Mini PCs built around a discrete GPU are a poor fit for LLM inference because the GPU's VRAM is fixed at the factory, typically 4–8 GB, and cannot be expanded. The Mac Mini M4 and UM790 Pro avoid this limit by running GPU inference out of memory shared with the CPU. The Beelink SER8 takes a different approach: inference runs on its Ryzen 9 8845HS CPU alone. CPU-only inference is usually the slower path, although in the figures above it keeps pace with the UM790 Pro at ~8 tok/s, and it requires no GPU setup.
The Mac Mini M4 with 16 GB unified memory outperforms the UM790 Pro with 32 GB DDR5 on raw inference speed because Apple's memory bandwidth (~120 GB/s) and Metal GPU acceleration outpace the Radeon 780M iGPU. The UM790 Pro's advantage is the ability to expand to 64 GB, which allows running larger models such as 30B Q4 that do not fit in 16 GB, and leaves more headroom for 13B models with long contexts.
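A rough way to see which model sizes fit in a given amount of memory is to estimate the size of the quantized weights. The sketch below assumes about 4.5 bits per weight for a Q4 quantization (4-bit values plus per-block scale data) and reserves a hypothetical 4 GB for the operating system, KV cache, and runtime. Both figures are assumptions for illustration, and real usage varies with quantization variant and context length.

```python
BITS_PER_WEIGHT_Q4 = 4.5   # assumption: 4-bit weights plus per-block scales
RESERVED_GB = 4.0          # assumption: OS, KV cache, and runtime overhead


def q4_weights_gb(params_billions: float,
                  bits_per_weight: float = BITS_PER_WEIGHT_Q4) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, ram_gb: float,
         reserved_gb: float = RESERVED_GB) -> bool:
    """True if the weights plus the reserved overhead fit in ram_gb."""
    return q4_weights_gb(params_billions) + reserved_gb <= ram_gb


for size in (7, 13, 30):
    print(f"{size}B Q4: ~{q4_weights_gb(size):.1f} GB weights, "
          f"fits in 16 GB: {fits(size, 16)}, fits in 64 GB: {fits(size, 64)}")
```

Under these assumptions a 30B Q4 model needs roughly 17 GB for weights alone, which is why it calls for a 32 GB or 64 GB configuration rather than 16 GB.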
For the software side of a local LLM setup, see the best Ollama frontend overview.