# What Is RAG?
RAG (retrieval-augmented generation): retrieve relevant documents, inject them into the prompt, then generate the response. This keeps the model factual without fine-tuning.
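The retrieve-then-inject flow can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the corpus is a toy list, retrieval is naive word overlap standing in for a real retriever, and the final prompt would be sent to an LLM.

```python
# Minimal RAG sketch: retrieve the most relevant document by word
# overlap, then inject it into the prompt. The docs list and the
# prompt template are illustrative assumptions.

def retrieve(query: str, docs: list[str]) -> str:
    """Return the doc sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str, context: str) -> str:
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The refund window is 30 days from the date of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
]
query = "How long is the refund window?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```

In a real system the `retrieve` step would call a vector store or search index, but the prompt-assembly shape stays the same.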
## When Prompt Engineering Alone Fails
- Knowledge-heavy tasks (company docs, product Q&A)
- Up-to-date information (recent news, current prices)
- Specific facts (customer history, technical specs)
- Multi-source synthesis (combining docs, data)
## Prompt Engineering vs RAG
| Task | Prompt Engineering | RAG |
|---|---|---|
| General reasoning | Sufficient | Not needed |
| Factual accuracy | Limited | Essential |
| Up-to-date info | No (training cutoff) | Yes |
| Cost per call | Lower | Higher (retrieval + LLM) |
| Latency | Faster | Slower (retrieval delay) |
## When to Add RAG
- Need 90%+ factual accuracy
- Knowledge changes frequently
- Multi-source synthesis
- Company-specific information
## RAG Implementation Steps
1. Choose a retriever (dense embedding, keyword, hybrid)
2. Build the knowledge base (documents, chunks)
3. Embed documents into a vector store
4. At runtime: retrieve + inject into the prompt
5. Evaluate accuracy on a gold standard
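Steps 3 and 4 above can be sketched with an in-memory "vector store". A bag-of-words embedding and cosine similarity stand in for a real embedding model here; the chunks and query are illustrative assumptions.

```python
import math
from collections import Counter

# Toy stand-in for an embedding model: bag-of-words counts.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 3: embed chunks into the "vector store".
chunks = ["Plan A costs $10 per month.", "Plan B costs $25 per month."]
store = [(c, embed(c)) for c in chunks]

# Step 4: at runtime, embed the query and retrieve the top-k chunks.
def retrieve(query: str, k: int = 1) -> list[str]:
    qv = embed(query)
    return [c for c, v in sorted(store, key=lambda cv: -cosine(qv, cv[1]))[:k]]

print(retrieve("How much does Plan B cost?"))
```

Swapping `embed` for a real embedding model and `store` for a vector database changes the components but not the shape of the pipeline.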
## Common RAG Patterns
- Simple retrieval: Search docs, inject context
- Multi-hop: Retrieve, reason, retrieve again
- Hierarchical: Summary retrieval, then detail retrieval
- Hybrid: Keyword + semantic search
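The hybrid pattern can be sketched as a weighted blend of two scores. Character-trigram overlap stands in for semantic (embedding) similarity here, and the 0.5 blend weight is an illustrative assumption; real systems blend BM25 with embedding scores.

```python
# Hybrid retrieval sketch: blend keyword overlap with a "semantic"
# score. Trigram overlap is a toy stand-in for embeddings.

def keyword_score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def trigrams(text: str) -> set[str]:
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def semantic_score(query: str, doc: str) -> float:
    q, d = trigrams(query), trigrams(doc)
    return len(q & d) / len(q | d) if q | d else 0.0

def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    # alpha weights keyword vs semantic evidence (assumed value).
    return alpha * keyword_score(query, doc) + (1 - alpha) * semantic_score(query, doc)

docs = ["Reset your password from the account settings page.",
        "Invoices are emailed on the first of each month."]
best = max(docs, key=lambda d: hybrid_score("how do I reset my password", d))
print(best)
```

Keyword scoring catches exact terms (IDs, product names) that embeddings can miss, while the semantic score catches paraphrases, which is why the blend often beats either alone.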
## Common Mistakes
- Adding RAG without baseline prompting
- Poor chunk size (too small = fragmentation, too large = noise)
- Not evaluating retrieval quality separately from generation
- Over-relying on retrieval (garbage in → garbage out)
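Evaluating retrieval separately from generation can be as simple as measuring recall@k on a gold set of query→document pairs. The gold pairs and the word-overlap retriever below are toy assumptions; the point is that this metric needs no LLM call at all.

```python
# Sketch: measure retrieval quality alone with recall@k, i.e. the
# fraction of queries whose gold document appears in the top-k results.

def retrieve(query: str, docs: list[str], k: int) -> list[str]:
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return ranked[:k]

def recall_at_k(gold: list[tuple[str, str]], docs: list[str], k: int) -> float:
    hits = sum(1 for query, relevant in gold
               if relevant in retrieve(query, docs, k))
    return hits / len(gold)

docs = ["Shipping takes 3-5 business days.",
        "Returns require an RMA number.",
        "Gift cards never expire."]
gold = [("how long does shipping take", docs[0]),
        ("do gift cards expire", docs[2])]
print(recall_at_k(gold, docs, k=1))
```

If recall@k is low, no amount of prompt tuning on the generation side will fix answers, which is why the retriever and the generator should be evaluated independently.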