Quick Answer
Qwen 2.5 Coder 14B is the top 14B coding model for local use. It fits in 10 GB VRAM at Q4_K_M and scores highest on HumanEval among 14B models. DeepSeek Coder 14B is a strong alternative with similar VRAM requirements.
Updated: 2026-05
Key Takeaways
As of May 2026, Qwen 2.5 Coder 14B at Q4_K_M quantization scores 78.4% on HumanEval, the highest of any 14B model available through Ollama or llama.cpp. The model was fine-tuned on over 5 trillion tokens of code-focused data, which distinguishes its performance on multi-step completion and test-case generation.
DeepSeek Coder 14B scores 75.1% on HumanEval under identical Q4_K_M conditions. The gap is small enough that DeepSeek Coder is a valid choice, particularly if you already have it cached or are familiar with its output style.
StarCoder2 15B is the third pick for open-source code-focused work. Trained on The Stack v2, it scores approximately 73% on HumanEval at ~10 GB VRAM Q4_K_M. Its strengths are open-source contribution tasks, code search across large repositories, and structured refactoring: use cases where its training corpus gives it an edge over general instruction-tuned models.
| Model | HumanEval | VRAM (Q4_K_M) |
|---|---|---|
| Qwen 2.5 Coder 14B | 78.4% | ~10 GB |
| DeepSeek Coder 14B | 75.1% | ~10 GB |
| StarCoder2 15B | ~73% | ~10 GB |
Both Qwen 2.5 Coder 14B and DeepSeek Coder 14B require approximately 10 GB VRAM at Q4_K_M, leaving only about 2 GB of headroom on a 12 GB card. This margin is tight for long-context sessions: at 8k context, VRAM usage climbs to ~11.5 GB, which leaves roughly 0.5 GB free. If your workflow involves large files, prefer a card with 16 GB or more.
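To make the headroom arithmetic concrete, here is a rough sketch in Python. It uses only the two figures quoted above (~10 GB for the model and ~11.5 GB at 8k context) and assumes, purely for illustration, that usage grows linearly with context length, that "8k" means 8192 tokens, and that the 10 GB figure is the footprint before any context is loaded. Real usage depends on the runtime and its cache settings, so treat the output as an estimate, not a measurement.

```python
# Rough VRAM headroom estimate built from the article's two data points.
# ASSUMPTIONS (not from the article): linear growth with context length,
# "8k" == 8192 tokens, and ~10 GB treated as the zero-context footprint.

BASE_VRAM_GB = 10.0        # approx. footprint at Q4_K_M (from the article)
VRAM_AT_8K_GB = 11.5       # approx. usage at 8k context (from the article)
CONTEXT_8K_TOKENS = 8192   # assumed meaning of "8k"


def estimated_vram_gb(context_tokens: int) -> float:
    """Linearly interpolate VRAM usage for a given context length."""
    per_token_gb = (VRAM_AT_8K_GB - BASE_VRAM_GB) / CONTEXT_8K_TOKENS
    return BASE_VRAM_GB + per_token_gb * context_tokens


def headroom_gb(context_tokens: int, card_vram_gb: float = 12.0) -> float:
    """Free VRAM left on the card; a negative value means it will not fit."""
    return card_vram_gb - estimated_vram_gb(context_tokens)


if __name__ == "__main__":
    for tokens in (2048, 4096, 8192):
        print(f"{tokens:>5} tokens: ~{estimated_vram_gb(tokens):.2f} GB used, "
              f"~{headroom_gb(tokens):.2f} GB free on a 12 GB card")
```

Passing `card_vram_gb=16.0` to `headroom_gb` shows how much extra margin a 16 GB card would give under the same assumptions.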
For context windows below 4k tokens (the common case for single-file code completion), all three models run comfortably on an RTX 3060 12 GB or RTX 3080 Ti 12 GB. Speed is approximately 14 to 18 tok/s for Qwen and DeepSeek Coder; StarCoder2 15B runs at similar throughput given its comparable VRAM footprint. Prefer StarCoder2 when your workflow centers on repository-scale search or open-source contribution patterns.
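As an illustration of keeping the context window at or below 4k tokens, the sketch below builds a request for a locally running Ollama server and caps the context with the `num_ctx` option. The model tag `qwen2.5-coder:14b`, the default port 11434, and the helper names `build_payload` and `complete` are assumptions chosen for this example; check the tag against what `ollama list` shows on your machine.

```python
import json
import urllib.request

# ASSUMPTIONS: Ollama is running locally on its default port, and the model
# has been pulled under this tag. Adjust both to match your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "qwen2.5-coder:14b"


def build_payload(prompt: str, num_ctx: int = 4096) -> dict:
    """Build a non-streaming generate request with a capped context window."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not chunks
        "options": {"num_ctx": num_ctx},  # context window size in tokens
    }


def complete(prompt: str, num_ctx: int = 4096) -> str:
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_payload(prompt, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `complete("Write a Python function that reverses a string.")` would return the model's completion as a string. Raising `num_ctx` toward 8192 is where the tighter VRAM margin on a 12 GB card starts to matter.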
For a broader comparison of coding models at other sizes and VRAM tiers, see the best coding LLM for 12 GB VRAM guide.