Can You Run RAG on 2 GB RAM?
Quick Answer
Yes, but only for small personal document sets. Running Llama 3.2 1B (~750 MB) with MiniLM-L6-v2 embeddings (~80 MB) and an in-memory vector store takes about 1.3–1.5 GB in total, which fits on a 2 GB device. Larger models (7B+) and larger document sets (200+ pages) need at least 8 GB.
- Llama 3.2 1B Q4_K_M (~750 MB) + MiniLM-L6-v2 embeddings (~80 MB) fits in 2 GB
- Document set must be under ~200 pages to stay within RAM
- 7B+ models or larger corpora need at least 8 GB RAM
Updated: 2026-05
Yes — But Only Tiny Setups Work
At 2 GB RAM, the only viable RAG pipeline pairs a 1B-class LLM such as Llama 3.2 1B (Phi-3 Mini, at 3.8B parameters, is too large) with a lightweight embedding model (MiniLM-L6-v2 at ~80 MB) and a flat-file or in-memory vector store. As of May 2026, this works, but only for small personal document sets (under ~200 pages).
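A flat in-memory store can be as simple as a list of normalized vectors searched by brute force. The sketch below is a hypothetical stand-in, not Chroma's API: `InMemoryVectorStore` and its methods are illustrative names, and the toy 3-dimensional vectors replace real 384-dimensional MiniLM-L6-v2 embeddings. Brute-force search costs O(n) per query, which is tolerable at the small chunk counts this setup targets.

```python
import math
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Hypothetical brute-force cosine-similarity store; all vectors live in Python lists."""

    ids: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    @staticmethod
    def _normalize(vec: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against all-zero vectors
        return [x / norm for x in vec]

    def add(self, doc_id: str, vector: list[float]) -> None:
        # Normalize once at insert time so search reduces to a dot product.
        self.ids.append(doc_id)
        self.vectors.append(self._normalize(vector))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        q = self._normalize(query)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        ranked = sorted(zip(self.ids, scores), key=lambda pair: pair[1], reverse=True)
        return ranked[:k]

    def clear(self) -> None:
        # Drop everything, e.g. to prune the store at the end of a session.
        self.ids.clear()
        self.vectors.clear()


# Toy 3-dimensional vectors stand in for real MiniLM-L6-v2 embeddings.
store = InMemoryVectorStore()
store.add("ram-limits", [1.0, 0.0, 0.0])
store.add("chunk-sizes", [0.0, 1.0, 0.0])
store.add("mixed", [0.7, 0.7, 0.0])
print([doc_id for doc_id, _ in store.search([1.0, 0.1, 0.0], k=2)])  # → ['ram-limits', 'mixed']
```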
The table below shows the RAM footprint of each RAG component at minimum viable settings.
| Component | Memory Use | Notes |
|---|---|---|
| LLM (Llama 3.2 1B Q4_K_M) | ~750 MB | Smallest usable instruction-tuned model |
| Embedding model (MiniLM-L6-v2) | ~80 MB | Runs on CPU; no GPU required |
| Vector store (Chroma in-memory) | ~150 MB | Scales with corpus size |
| Python runtime + framework overhead | ~300 MB | LangChain or a minimal LlamaIndex setup |
| Total minimum | ~1.3–1.5 GB | Leaves ~500–700 MB for the OS on a 2 GB device |
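The table's arithmetic can be checked in a few lines. This sketch treats 2 GB as 2048 MB and reserves 500 MB for the OS (both assumptions); the per-component figures come from the table, and the two failure scenarios reuse the ~5 GB (Llama 3 8B) and 400–600 MB (500-page Chroma store, midpoint 500 MB) figures quoted below.

```python
# Per-component RAM estimates in MB, taken from the table above.
BASELINE_MB = {
    "llm": 750,           # Llama 3.2 1B Q4_K_M
    "embeddings": 80,     # MiniLM-L6-v2
    "vector_store": 150,  # Chroma in-memory, small corpus
    "runtime": 300,       # Python + LangChain/LlamaIndex
}

DEVICE_MB = 2048      # assumption: 2 GB device
OS_RESERVE_MB = 500   # assumption: RAM kept free for the OS


def headroom_mb(components: dict[str, int]) -> int:
    """MB left after the OS reserve and all components; negative means over budget."""
    return DEVICE_MB - OS_RESERVE_MB - sum(components.values())


baseline = headroom_mb(BASELINE_MB)
big_corpus = headroom_mb({**BASELINE_MB, "vector_store": 500})  # ~500 PDF pages
eight_b = headroom_mb({**BASELINE_MB, "llm": 5000})             # Llama 3 8B Q4_K_M

for label, mb in [("baseline", baseline), ("500-page corpus", big_corpus), ("8B model", eight_b)]:
    status = "OK" if mb >= 0 else "OVER BUDGET"
    print(f"{label:>15}: {mb:+} MB ({status})")
```

Under these assumptions the baseline leaves about 268 MB of slack, while the 500-page store (-82 MB) and the 8B model (-3982 MB) both go over budget.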
What Breaks at 2 GB
The most common failure is the LLM exceeding available RAM as the context window grows. At 2 GB, a 1B model's context is effectively capped at roughly 2k tokens before the OS starts swapping. Loading a 7B or larger model fails outright: Llama 3 8B Q4_K_M alone requires ~5 GB.
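Context-dependent memory is driven largely by the KV cache, which grows linearly with the number of tokens. The sketch below estimates it for Llama 3.2 1B, assuming 16 layers, 8 key/value heads of dimension 64 (grouped-query attention), and an fp16 cache; verify these values against your model file's metadata. It counts the KV cache only, so it is a lower bound: weights, inference scratch buffers, and the rest of the pipeline come on top.

```python
# Assumed Llama 3.2 1B architecture; check against your model's config/metadata.
N_LAYERS = 16
N_KV_HEADS = 8      # grouped-query attention: fewer KV heads than query heads
HEAD_DIM = 64       # hidden size 2048 / 32 attention heads
BYTES_PER_ELEM = 2  # fp16 cache; a quantized KV cache would use less


def kv_cache_mib(n_ctx: int) -> float:
    """Keys + values across all layers for n_ctx tokens, in MiB."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM  # 2 = K and V
    return per_token * n_ctx / (1024 * 1024)


for n_ctx in (2048, 8192, 131072):
    print(f"n_ctx={n_ctx:>6}: {kv_cache_mib(n_ctx):6.0f} MiB")
```

At the model's full 128k-token context the cache alone would reach about 4 GiB, which is why the context length has to be capped far below the model's maximum on a 2 GB device.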
The second failure mode is vector store growth. A Chroma database for 500 PDF pages uses approximately 400–600 MB depending on chunk size. Combined with the LLM and embedding model, total RAM exceeds 2 GB. The fix: limit ingestion to under 150 pages, use 256-token chunks, and prune the store after each session.
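The ingestion limits above (under 150 pages, 256-token chunks) can be enforced at load time. This sketch approximates tokens as whitespace-separated words, which undercounts real tokenizer output; `chunk_pages` and `raw_vector_mb` are hypothetical helpers. The size estimate covers only float32 vectors at MiniLM-L6-v2's 384 dimensions, so it is a lower bound that excludes the index, stored chunk text, metadata, and library overhead.

```python
MAX_PAGES = 150      # ingestion cap recommended above
CHUNK_TOKENS = 256   # chunk size recommended above
EMBED_DIM = 384      # MiniLM-L6-v2 output dimension


def chunk_pages(pages: list[str], chunk_tokens: int = CHUNK_TOKENS,
                max_pages: int = MAX_PAGES) -> list[str]:
    """Split pages into chunks of at most chunk_tokens whitespace-delimited words."""
    if len(pages) > max_pages:
        raise ValueError(f"{len(pages)} pages exceeds the {max_pages}-page cap")
    words = " ".join(pages).split()
    return [" ".join(words[i:i + chunk_tokens]) for i in range(0, len(words), chunk_tokens)]


def raw_vector_mb(n_chunks: int, dim: int = EMBED_DIM) -> float:
    """Raw float32 embedding bytes only (4 bytes per dimension), in MB."""
    return n_chunks * dim * 4 / 1_000_000


# 150 synthetic pages of 500 words each.
pages = ["word " * 500] * 150
chunks = chunk_pages(pages)
print(len(chunks), f"{raw_vector_mb(len(chunks)):.2f} MB of raw vectors")
```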
Quick Answers About RAG on 2 GB RAM
What's the smallest LLM that works for RAG?
Can I use Ollama on 2 GB RAM?
Will Raspberry Pi 5 (8 GB) run proper RAG?
Is local RAG worth doing on 2 GB RAM?