
Phase 29 — Retrieval-Augmented Generation (RAG)

Requires: 13 — Embeddings & Representation Spaces · 28 — Fine-Tuning, LoRA, QLoRA

Teaches: rag · chunking · bm25 · hnsw · hybrid-search · reranking

Jump to any chapter from the phase reference index.

Chapter map

Pre-written per A12. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.

RAG = retrieve first, generate second. Rather than baking the grammar rules into the model's weights, we keep them in a vector store and fetch them on the fly. For queries like "what's the past participle of eat?", MiniGPT plus a good retriever is enough. This phase builds the pipeline by hand: chunker, embedding store, BM25, reranker, reader.

Goal

Build a minimal RAG pipeline grounded in a hand-curated knowledge base of English-verb grammar rules (covering §A13's scope: 5 tenses, 3 persons, 20 verbs, plus Spanish pairs). Given a query like "what's the past participle of eat?" or "how do you say 'I will work' in Spanish?", the pipeline retrieves the relevant rule chunks, optionally reranks them, and asks MiniGPT (with the LoRA adapter from Phase 28) to produce a grounded answer with citations.

The artefact is a CLI: verb-tutor ask "what's the past participle of eat?" prints the answer followed by the list of (chunk_id, score) pairs it was grounded on.
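
A minimal sketch of that CLI surface, using only the standard library. The Answer dataclass, answer_query, format_answer, and the placeholder chunk id are hypothetical names for illustration; the real contract is whatever src/minirag/BLUEPRINT.md specifies, and the stub below does not retrieve or generate anything.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Answer:
    """Hypothetical output shape: answer text plus its citations."""
    text: str
    citations: List[Tuple[str, float]]  # (chunk_id, score), best first


def answer_query(query: str) -> Answer:
    """Stand-in for retrieve -> rerank -> generate (assumption, not the real API)."""
    # Placeholder values only; a real pipeline would fill these in.
    return Answer(text=f"(no reader wired up for: {query})",
                  citations=[("kb-000", 0.0)])


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="verb-tutor")
    sub = parser.add_subparsers(dest="command", required=True)
    ask = sub.add_parser("ask", help="answer a grammar question with citations")
    ask.add_argument("query", help="the question, quoted as one argument")
    return parser


def format_answer(answer: Answer) -> str:
    """Render the answer, then one indented line per cited chunk."""
    lines = [answer.text]
    lines += [f"  [{chunk_id}] score={score:.3f}"
              for chunk_id, score in answer.citations]
    return "\n".join(lines)


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    print(format_answer(answer_query(args.query)))
    return 0
```

Calling main(["ask", "what's the past participle of eat?"]) exercises the same path as the shell command. Keeping formatting separate from retrieval makes the citation output easy to test without a model loaded.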

Module placement

The RAG toolkit lives in src/minirag/ (new this phase): chunk.py, embed.py, index.py, bm25.py, retrieve.py, rerank.py, generate.py, cli.py. The knowledge base lives in data/kb/grammar-rules/. See src/minirag/BLUEPRINT.md for the API contract.

Read order

theory/00-motivation.md — why RAG; the closed-book vs open-book LLM tradeoff.
theory/01-embeddings-and-biencoders.md — bi-encoders, cross-encoders, training objectives.
theory/02-chunking-and-indexes.md — chunking strategies; flat, IVF, and HNSW index structures.
theory/03-hybrid-search-and-reranking.md — BM25, dense, RRF, reranker pipelines.
theory/04-evaluation.md — hit-rate, MRR, recall@k, faithfulness, end-to-end metrics.
lab/00-kb-curation.md — curate the 50-chunk grammar-rule KB by hand.
lab/01-bi-encoder-baseline.md — implement dense retrieval; measure hit-rate@k.
lab/02-bm25-and-hybrid.md — implement BM25; combine with dense via RRF; show hybrid wins.
lab/03-end-to-end-rag.md — wire reader + retriever + CLI; measure faithfulness.
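
The BM25 and RRF steps named in theory/03 and lab/02 can be sketched as follows. This is an illustration under stated assumptions, not the src/minirag/ implementation: whitespace tokenization, the Lucene-style smoothed IDF, and the defaults k1=1.5, b=0.75, and k=60 are common choices rather than values taken from the phase plan, and the function names are hypothetical. The KB is assumed to be non-empty.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace split is enough for a tiny KB.
    return text.lower().split()


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 score of every document in docs for the query."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for doc_id, doc_tokens in tokens.items():
        tf = Counter(doc_tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF, always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores[doc_id] = score
    return scores


def rank(scores: dict[str, float]) -> list[str]:
    """Doc ids by descending score; ties broken by id for determinism."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over rankings, rank from 1."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

A hybrid result is then rrf([rank(bm25_scores(query, docs)), dense_ranking]), where dense_ranking is the id list from the bi-encoder. RRF uses only ranks, so the BM25 and cosine scores never need to be put on a common scale.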

solutions/ is populated at phase open, once the src/minirag/ API stabilizes.

Definition of Done

See PHASE_29_PLAN.md §6. Briefly:

KB curated (~50 chunks, §A13 verb-tense-person matrix + Spanish).
src/minirag/{chunk,embed,index,bm25,retrieve,rerank,generate,cli}.py implemented.
Hit-rate@5 ≥ 0.80 and MRR ≥ 0.60 on the synthetic eval set.
Hybrid beats both dense-alone and BM25-alone by ≥ 5 percentage points of hit-rate@5.
Faithfulness ≥ 70% on a 30-query qualitative eval.
verb-tutor ask CLI works end-to-end with citations.
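
The retrieval thresholds above can be checked with a few lines of Python. This sketch assumes each eval query has exactly one gold chunk id, which the plan does not state; the function names are hypothetical.

```python
def hit_rate_at_k(results: list[list[str]], gold: list[str], k: int = 5) -> float:
    """Fraction of queries whose gold chunk appears in the top k results."""
    hits = sum(1 for ranked, g in zip(results, gold) if g in ranked[:k])
    return hits / len(gold)


def mean_reciprocal_rank(results: list[list[str]], gold: list[str]) -> float:
    """Mean of 1 / rank of the gold chunk; a miss contributes 0."""
    total = 0.0
    for ranked, g in zip(results, gold):
        if g in ranked:
            total += 1.0 / (ranked.index(g) + 1)
    return total / len(gold)


def beats_by_pp(hybrid: float, dense: float, bm25: float,
                margin_pp: float = 5.0) -> bool:
    """True if hybrid exceeds the better baseline by at least margin_pp points."""
    # The small tolerance absorbs float error: 0.85 - 0.80 is slightly below 0.05.
    return (hybrid - max(dense, bm25)) * 100 >= margin_pp - 1e-9


# Three toy queries: gold at rank 1, gold at rank 2, gold missing.
results = [["c1", "c2", "c3"], ["c4", "c2", "c9"], ["c7", "c8", "c5"]]
gold = ["c1", "c2", "c6"]
print(hit_rate_at_k(results, gold, k=2))    # 2 of 3 queries hit
print(mean_reciprocal_rank(results, gold))  # (1 + 0.5 + 0) / 3 = 0.5
```

Hit-rate@k only asks whether the gold chunk made the cut, while MRR rewards placing it near the top, which is why the two thresholds are stated separately.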

What this phase intentionally does NOT cover

langchain / llama-index / haystack. Anti-goal §10. We hand-roll everything.
Production-scale vector DBs. No Pinecone, Weaviate, or Milvus. FAISS-flat is enough for a 50-doc KB.
Multimodal retrieval. Text-only.
Query rewriting (HyDE, multi-query). Mentioned in theory; not implemented. Revisited in Phase 32 if useful.
Streaming / serving infrastructure. Phase 33.
Fine-tuning the embedding model. Use a pretrained sentence-transformers model as-is.
Reranker training. Use a pretrained cross-encoder as-is.
Tool-calling for retrieval. Phase 31 (MCP) territory.
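
To see why a flat index is enough at this scale: a flat index is exhaustive exact search, and over roughly 50 vectors that is a short loop. The sketch below shows the idea in pure Python with cosine similarity. It is not the FAISS API, and flat_search, normalize, and the toy 2-d vectors are hypothetical.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def flat_search(query_vec: list[float], index: dict[str, list[float]],
                top_k: int = 5) -> list[tuple[str, float]]:
    """Score the query against every stored vector and keep the best top_k."""
    q = normalize(query_vec)
    scored = [
        (chunk_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for chunk_id, vec in index.items()
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Toy 2-d "embeddings"; real ones come from the sentence-transformers model.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = flat_search([1.0, 0.1], index, top_k=2)
print([chunk_id for chunk_id, _ in top])  # ['a', 'c']
```

Approximate structures such as IVF and HNSW trade a little recall for speed on large collections; with a KB this small there is nothing to gain from that trade.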

Phase 29's scope is a small, hand-built RAG pipeline grounded in verb-grammar rules, with measurable retrieval and faithfulness metrics. Nothing more.