Quick Answer
Llama 3 8B Q4_K_M is the best local LLM for a 16 GB RAM laptop without a dedicated GPU. It uses ~5 GB RAM and runs at ~3–5 tok/s on a modern CPU. Mistral 7B Q4_K_M is a slightly faster alternative. Both work on all major laptop CPUs.
Updated: 2026-05
Key Takeaways
With 16 GB of system RAM and no dedicated GPU, Llama 3 8B Q4_K_M is the practical ceiling — it uses approximately 5 GB RAM and runs at 3–5 tokens per second on a modern x86 laptop CPU. After the OS and other processes, a 16 GB laptop typically has 10–12 GB free, leaving room for the model and a generous context window.
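The budget above can be checked with a little arithmetic. The sketch below uses the published Llama 3 8B architecture (32 layers, 8 key/value heads, head dimension 128) to estimate how much RAM the context window adds on top of the ~5 GB of weights. The fp16 cache and the 10 GB free-RAM figure are assumptions for illustration, and runtime scratch buffers are ignored, so treat the result as a rough lower bound.

```python
# Rough memory budget for Llama 3 8B Q4_K_M on a 16 GB laptop.
# Architecture constants are the published Llama 3 8B values; the fp16 KV
# cache and the free-RAM figure used below are illustrative assumptions.

N_LAYERS = 32        # transformer blocks
N_KV_HEADS = 8       # grouped-query attention key/value heads
HEAD_DIM = 128       # dimension per attention head
BYTES_PER_VALUE = 2  # assumes an fp16 KV cache

GIB = 1024 ** 3


def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # K and V
    return per_token * context_tokens


def fits(free_gb: float, model_gb: float, context_tokens: int) -> bool:
    """True if model weights plus KV cache fit in the free RAM."""
    needed = model_gb * GIB + kv_cache_bytes(context_tokens)
    return needed <= free_gb * GIB


if __name__ == "__main__":
    for ctx in (2048, 8192):
        size_gib = kv_cache_bytes(ctx) / GIB
        print(f"{ctx} tokens: {size_gib:.2f} GiB KV cache, "
              f"fits in 10 GB free: {fits(10, 5, ctx)}")
```

Under these assumptions an 8,192-token context costs about 1 GiB, which is why a 5 GB model still leaves comfortable headroom in 10–12 GB of free RAM.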
Mistral 7B Q4_K_M uses a similar 5 GB of RAM and typically runs 10–20% faster than Llama 3 8B on the same hardware, reaching ~4–6 tok/s. The difference most likely comes from Mistral's smaller size: roughly 7.2 billion parameters versus 8.0 billion, plus a much smaller vocabulary, so fewer weights are read from memory for each generated token. For instruction-following and coding tasks, both models perform comparably at this quantization level.
Intel Core Ultra and AMD Ryzen 7000 series CPUs run slightly faster than older laptop CPUs thanks to higher memory bandwidth and newer vector instructions (AVX-512 on Ryzen 7000 series; AVX2 with VNNI on Core Ultra, which does not support AVX-512). On these platforms, 5–6 tok/s is achievable on Llama 3 8B Q4_K_M.
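Memory bandwidth matters because generating each token requires reading roughly the whole set of quantized weights from RAM. The sketch below is a back-of-the-envelope model under that assumption, not a benchmark: it converts the speeds quoted here into the effective bandwidth they imply, and the reverse. The function names are illustrative.

```python
# Back-of-the-envelope model: each generated token streams the full set of
# weights from RAM once, so speed is bounded by bandwidth / model size.
# This ignores caches, compute limits, and KV-cache reads (assumption).


def implied_bandwidth_gb_s(model_gb: float, tokens_per_s: float) -> float:
    """Effective memory bandwidth implied by an observed generation speed."""
    return model_gb * tokens_per_s


def max_tokens_per_s(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed for a given effective bandwidth."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    for speed in (3, 5, 6):
        bw = implied_bandwidth_gb_s(5, speed)
        print(f"{speed} tok/s on a 5 GB model implies about {bw:.0f} GB/s")
```

Read this way, 3–5 tok/s on a 5 GB model corresponds to roughly 15–25 GB/s of effective bandwidth, and 5–6 tok/s to about 25–30 GB/s.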
| Model | Hardware | RAM Used | Speed |
|---|---|---|---|
| Llama 3 8B Q4_K_M | x86 CPU | ~5 GB | ~3–5 tok/s |
| Mistral 7B Q4_K_M | x86 CPU | ~5 GB | ~4–6 tok/s |
| Llama 3 8B Q4_K_M | Apple M3 (Metal) | ~5 GB | ~15–20 tok/s |
Apple M-series laptops treat the 16 GB as unified memory shared between CPU and GPU, enabling Metal-accelerated inference at 15–20 tok/s on Llama 3 8B Q4_K_M — three to five times faster than x86 CPU-only inference. This makes interactive chat viable on Apple Silicon where it is not on x86 at the same RAM level.
On x86 laptops, CPU inference at 3–5 tok/s is best suited for two tasks: overnight batch processing such as summarizing or classifying large document sets, and single-query lookups where the user can wait 15–30 seconds for a high-quality response.
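To see what those speeds mean in practice, the sketch below estimates how many tokens fit in a wait window and how many documents an overnight run could cover. The prompt-processing rate, document length, summary length, and 8-hour window are hypothetical placeholders; measure your own hardware before relying on the numbers.

```python
# Throughput planning for CPU-only inference.
# All workload figures passed in below are hypothetical examples.


def tokens_in_window(tokens_per_s: float, seconds: float) -> float:
    """Tokens generated while the user waits `seconds`."""
    return tokens_per_s * seconds


def docs_per_run(hours: float, prompt_tokens: int, output_tokens: int,
                 prefill_tok_s: float, decode_tok_s: float) -> int:
    """Documents processed in a batch run of `hours`.

    Per-document time = prompt processing time + generation time.
    """
    seconds_per_doc = prompt_tokens / prefill_tok_s + output_tokens / decode_tok_s
    return int(hours * 3600 // seconds_per_doc)


if __name__ == "__main__":
    # A 15-30 second wait at 3-5 tok/s yields roughly 45-150 tokens.
    print(tokens_in_window(3, 15), tokens_in_window(5, 30))
    # Hypothetical overnight job: 1,000-token documents, 200-token summaries,
    # an assumed 20 tok/s prompt-processing rate, and 4 tok/s generation.
    print(docs_per_run(8, 1000, 200, 20.0, 4.0))
```

With these placeholder figures each document takes about 100 seconds, so an 8-hour run covers a few hundred documents. That suits batch work but is far too slow for back-and-forth chat.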
To get started, install Ollama, run `ollama pull llama3:8b` to download the model, then `ollama run llama3:8b` to start it. No configuration is required.
For the full comparison of laptop configurations and runtime optimization tips, see the local LLM on laptop guide.
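To check the speed on your own laptop, you can query Ollama's local HTTP API and read the timing fields it returns. The sketch below assumes the Ollama server is running on its default port (11434) and that the `llama3:8b` model has already been pulled. The helper names `generate` and `tokens_per_second` are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(result: dict) -> float:
    """Generation speed from Ollama's eval_count and eval_duration fields.

    eval_duration is reported in nanoseconds.
    """
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def generate(prompt: str, model: str = "llama3:8b") -> dict:
    """Send one non-streaming request and return the parsed JSON reply."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `result = generate("Explain quantization in one sentence.")` returns the reply text in `result["response"]`, and `tokens_per_second(result)` gives the measured generation speed to compare against the figures in the table.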