A comprehensive checklist to optimize every stage of your RAG pipeline.
Remove duplicates, fix encoding issues, and normalize text formatting.
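As a rough sketch of this cleaning step, the snippet below normalizes Unicode and whitespace and drops exact duplicates by hashing. It uses only the Python standard library; the function names are illustrative, and real pipelines often add near-duplicate detection on top.

```python
import hashlib
import unicodedata

def clean_document(text: str) -> str:
    # Normalize Unicode (NFKC) to repair common encoding quirks, then
    # collapse runs of whitespace.
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())

def deduplicate(docs: list[str]) -> list[str]:
    # Drop exact duplicates by hashing the cleaned text.
    seen, unique = set(), []
    for doc in docs:
        cleaned = clean_document(doc)
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(cleaned)
    return unique
```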
Include source, date, category, and other relevant metadata for filtering.
Build a golden dataset of query-document pairs for measuring retrieval quality.
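One possible shape for that golden dataset: a small JSONL file mapping each test query to the IDs of the documents that should be retrieved. The file name and IDs below are made up for illustration.

```python
import json

# Each record pairs a realistic user query with the IDs of the chunks or
# documents a good retriever should return for it.
golden_set = [
    {"query": "How do I rotate an API key?", "relevant_ids": ["kb-114", "kb-207"]},
    {"query": "What is the refund policy?", "relevant_ids": ["kb-032"]},
]

with open("golden_set.jsonl", "w") as f:
    for record in golden_set:
        f.write(json.dumps(record) + "\n")
```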
Test 256-, 512-, and 1024-token chunks; smaller chunks favor precision, larger ones preserve more context.
Use 10-20% overlap to preserve context across chunk boundaries.
Split on paragraph/section boundaries rather than fixed token counts.
Store parent document ID for context expansion when needed.
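To illustrate the chunking items above, here is a minimal sketch that splits on paragraph boundaries, caps chunk size, carries a small overlap forward, and stores the parent document ID with each chunk. It approximates tokens with whitespace-separated words; swap in your actual tokenizer in practice.

```python
def chunk_document(doc_id: str, text: str, max_tokens: int = 512, overlap: int = 64):
    # Token counts are approximated by word counts here; use a real tokenizer
    # (e.g. your embedding model's) for production.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        words = para.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append({"parent_id": doc_id, "text": " ".join(current)})
            current = current[-overlap:]  # carry ~overlap words into the next chunk
        current.extend(words)
    if current:
        chunks.append({"parent_id": doc_id, "text": " ".join(current)})
    return chunks
```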
Compare OpenAI, Cohere, BGE, and E5 embedding models on your specific domain data.
Fine-tune embeddings on your corpus for specialized terminology.
Balance quality (higher dimensions) against cost and speed (lower dimensions).
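A simple way to compare embedding models on your own data is to score each one against the golden dataset. The harness below assumes a hypothetical embed_fn that wraps whichever provider you are testing (OpenAI, Cohere, BGE, E5) and returns L2-normalized numpy vectors; it reports the fraction of queries whose top-k results contain at least one relevant document.

```python
import numpy as np

def hit_rate_at_k(embed_fn, golden_set, corpus, k=10):
    # corpus: {doc_id: text}. embed_fn: hypothetical callable that takes a list
    # of strings and returns an (n, dim) array of L2-normalized embeddings.
    doc_ids = list(corpus)
    doc_vecs = embed_fn([corpus[d] for d in doc_ids])
    hits = 0
    for item in golden_set:
        q_vec = embed_fn([item["query"]])[0]
        scores = doc_vecs @ q_vec                      # cosine similarity
        top_k = {doc_ids[i] for i in np.argsort(scores)[::-1][:k]}
        hits += bool(top_k & set(item["relevant_ids"]))
    return hits / len(golden_set)
```

Run it once per candidate model, then pick the best quality-per-dimension tradeoff you can afford.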
Start with k=5-10, increase if recall is low, decrease if precision suffers.
Filter out low-relevance results below a minimum similarity score.
Combine vector search with BM25/keyword search for better coverage.
Use metadata to narrow search scope (date range, category, source).
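One common way to combine vector and keyword results is reciprocal rank fusion: run both searches (after applying any metadata filters and similarity thresholds), then merge the two ranked ID lists as sketched below.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=10):
    # ranked_lists: e.g. [vector_ids, bm25_ids], each a list of document IDs
    # ordered best-first. Standard RRF weight: 1 / (k + rank).
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```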
Test whether reranking improves relevance enough to justify the added latency.
Test Cohere Rerank, BGE Reranker, and cross-encoders on your data.
Retrieve more candidates (k=20-50) then rerank to top 5-10.
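A retrieve-then-rerank sketch using a cross-encoder from the sentence-transformers library; the model name is one commonly used option, not a recommendation specific to your data.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[dict], top_n: int = 8) -> list[dict]:
    # candidates: the top 20-50 chunks from first-stage retrieval, each with a
    # "text" field. The cross-encoder scores every (query, chunk) pair.
    scores = reranker.predict([(query, c["text"]) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ranked[:top_n]]
```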
Clear instructions, context placement, and output format specification.
Instruct LLM to cite sources and indicate uncertainty.
Lower temperature for factual responses, appropriate length limits.
Graceful responses when retrieval returns no relevant results.
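Putting the generation items together: a sketch of a prompt with explicit context placement, citation and uncertainty instructions, a low temperature, and a graceful fallback when nothing relevant was retrieved. llm_complete is a hypothetical wrapper around whatever LLM client you use.

```python
PROMPT = """Answer the question using only the context below.
Cite the bracketed source ID for each claim. If the context does not contain
the answer, say you don't know rather than guessing.

Context:
{context}

Question: {question}
"""

def answer(question, chunks, llm_complete):
    # Graceful fallback when retrieval (after thresholds and filters) returns nothing.
    if not chunks:
        return "I couldn't find anything relevant to that question in the knowledge base."
    context = "\n\n".join(f"[{c['parent_id']}] {c['text']}" for c in chunks)
    # A low temperature and a length cap keep factual answers tight.
    return llm_complete(PROMPT.format(context=context, question=question),
                        temperature=0.1, max_tokens=400)
```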
Monitor precision@k, recall@k, MRR, and NDCG over time.
Evaluate how well responses are grounded in retrieved context.
Categorize and track retrieval failures for systematic improvement.
Automated tests against golden dataset on every change.
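These checks are straightforward to automate against the golden dataset. The sketch below computes precision@k, recall@k, and MRR for any retrieve function that maps a query to a best-first list of document IDs (NDCG is left out for brevity); wire it into CI so every change to chunking, embeddings, or retrieval gets scored.

```python
def evaluate(retrieve, golden_set, k=10):
    # retrieve: callable mapping a query string to a best-first list of doc IDs.
    precision = recall = mrr = 0.0
    for item in golden_set:
        relevant = set(item["relevant_ids"])
        results = retrieve(item["query"])[:k]
        hits = [doc_id for doc_id in results if doc_id in relevant]
        precision += len(hits) / k
        recall += len(hits) / len(relevant)
        ranks = [i for i, doc_id in enumerate(results, start=1) if doc_id in relevant]
        mrr += 1.0 / ranks[0] if ranks else 0.0
    n = len(golden_set)
    return {"precision@k": precision / n, "recall@k": recall / n, "mrr": mrr / n}
```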
Use RAGDebugger to visualize, diagnose, and improve your pipeline.