
# Phase 29: Quizzes

Readable mirror of `data/quizzes/phase-29-rag.yaml`, which remains the source of truth. Answers are hidden behind `<details>` blocks.

## q-29-01: Why RAG instead of fine-tuning the facts into the model (free)

A teammate proposes "just fine-tune Mini-GPT on the irregular-verb table until it memorizes it; we don't need RAG." List two reasons this is the wrong choice at scale, even though it might work on the §A13 KB.

Answer

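(a) **Updates**: every KB change requires another fine-tuning run, while RAG only needs the changed chunks re-embedded and re-indexed. (b) **Citations**: a fine-tuned model can't say *which* fact it used, whereas RAG can point to the retrieved chunk. At scale (millions of docs, weekly updates), RAG dominates; fine-tuning is for behaviour, not facts.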
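A minimal sketch of both points, assuming a toy store that matches whole words instead of embeddings (`Chunk`, `ToyKB`, `upsert`, and `lookup` are hypothetical names, not the Phase 29 API). Correcting a fact is one write to the store, and each hit carries an id the answer can cite:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str  # citation handle returned alongside the text
    text: str


@dataclass
class ToyKB:
    """Hypothetical keyword-matched KB standing in for a real vector store."""
    chunks: dict = field(default_factory=dict)

    def upsert(self, chunk: Chunk) -> None:
        # Updating a fact is one write; no model weights are retrained.
        self.chunks[chunk.chunk_id] = chunk

    def lookup(self, word: str) -> list:
        # Return (citation, text) pairs so the answer can name its source.
        return [(c.chunk_id, c.text) for c in self.chunks.values()
                if word in c.text.split()]


kb = ToyKB()
kb.upsert(Chunk("verbs-eat", "eat -> eated"))  # wrong entry
kb.upsert(Chunk("verbs-eat", "eat -> ate"))    # the fix is live immediately
print(kb.lookup("eat"))  # [('verbs-eat', 'eat -> ate')]
```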

## q-29-02: Why FlatVectorStore over HNSW for Phase 29

The §A13 KB has ~50 chunks. We use brute-force cosine search (FlatVectorStore), not HNSW. At roughly what collection size is the crossover above which a graph-based approximate index such as HNSW becomes faster than the linear scan?

1. ≈ 100 vectors
2. ≈ 1 000 vectors
3. ≈ 10⁴ vectors
4. ≈ 10⁶ vectors

Answer

**Choice 3 (≈ 10⁴).** HNSW's roughly logarithmic query cost only outweighs its graph-traversal and per-node overhead above ~10⁴ vectors. Below that, brute-force search over contiguous numpy arrays wins on raw throughput and is much easier to debug.
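For intuition about why the flat scan is cheap at ~50 chunks, here is a minimal brute-force cosine index in numpy. It is a sketch in the spirit of FlatVectorStore, not its actual code; the `BruteForceCosineIndex` class and its `add`/`search` methods are hypothetical. Each query is a single matrix-vector product over all N stored vectors:

```python
import numpy as np


class BruteForceCosineIndex:
    """Hypothetical flat index: exact cosine search by a full linear scan."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)  # one row per chunk
        self.ids: list = []

    def add(self, doc_id: str, vector) -> None:
        v = np.asarray(vector, dtype=np.float32)
        v = v / np.linalg.norm(v)  # store unit vectors so a dot product is a cosine
        self.vectors = np.vstack([self.vectors, v[None, :]])
        self.ids.append(doc_id)

    def search(self, query, k: int = 3) -> list:
        q = np.asarray(query, dtype=np.float32)
        q = q / np.linalg.norm(q)
        sims = self.vectors @ q      # O(N * dim): scores every stored vector
        top = np.argsort(-sims)[:k]  # exact top-k, no approximation
        return [(self.ids[i], float(sims[i])) for i in top]


index = BruteForceCosineIndex(dim=2)
index.add("a", [1.0, 0.0])
index.add("b", [0.0, 1.0])
index.add("c", [1.0, 1.0])
print(index.search([1.0, 0.1], k=2))  # 'a' first, then 'c'
```

Re-stacking the array on every `add` costs O(N) per insert, which is fine for a KB this small; a larger store would batch its inserts.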

## q-29-03: Reciprocal Rank Fusion constant (free)

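RRF combines two ranked lists by scoring each candidate c as RRF(c) = Σᵢ 1/(60 + rankᵢ(c)), where rankᵢ(c) is c's 1-based rank in list i and the sum runs over the lists that contain c. Why 60 specifically, and what would change with a much smaller constant such as 5?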

Answer

**60** is Cormack et al.'s empirically robust default that damps the gap between top-1 and top-5 ranks. **5** would make rank-1 contributions dominate, amplifying single-source mistakes. Much larger constants flatten the fusion, making it less discriminative.
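A minimal RRF sketch (the `reciprocal_rank_fusion` and `rank1_vs_rank5` helpers and the two rankings are hypothetical). The second helper makes the constant's effect concrete: with k = 60 a rank-1 hit is worth 65/61 ≈ 1.07 times a rank-5 hit, while with k = 5 it is worth 10/6 ≈ 1.67 times as much:

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Return (doc_id, score) pairs sorted by fused RRF score, best first.

    Ranks are 1-based; a document absent from a list gets nothing from it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties broken by doc id so the output order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def rank1_vs_rank5(k):
    """How many times more a rank-1 hit is worth than a rank-5 hit."""
    return (1.0 / (k + 1)) / (1.0 / (k + 5))


dense = ["c1", "c2", "c3"]   # hypothetical dense-retriever ranking
sparse = ["c3", "c1", "c4"]  # hypothetical keyword (BM25-style) ranking
print([doc for doc, _ in reciprocal_rank_fusion([dense, sparse])])
print(round(rank1_vs_rank5(60), 3), round(rank1_vs_rank5(5), 3))  # 1.066 1.667
```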

## q-29-04: What skipping retrieval costs you

You ablate the retriever — rag_answer sends the bare query to Mini-GPT. Which symptoms should you observe on the §A13 lookup eval set?

1. Accuracy drops by ≥ 30 percentage points.
2. Faithfulness metric drops to ~0 (nothing to cite).
3. Mini-GPT regularizes irregular verbs (e.g., 'writed' for 'wrote').
4. Latency increases by ≥ 10× (no retrieval to short-circuit).

Answer

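**Choices 1, 2, 3.** Without retrieved context, Mini-GPT must rely on parametric memory, which is insufficient for the irregular forms: accuracy collapses, faithfulness falls to ~0 because there is nothing to cite, and the model falls back on the regular *-ed* pattern (choice 3). Latency actually *decreases* (no retrieval step), which is the perverse incentive that makes skipping retrieval tempting until you check correctness.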
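The accuracy and regularization symptoms can be reproduced with a toy ablation harness. Everything here is a hypothetical stand-in: `KB`, `retrieve`, `toy_generate`, and `evaluate` are not Phase 29 code, and `toy_generate` simply hard-codes the regular *-ed* fallback from choice 3, so the numbers illustrate the shape of the experiment rather than Mini-GPT's real scores:

```python
# Hypothetical toy KB standing in for the §A13 irregular-verb table.
KB = {"write": "wrote", "eat": "ate", "go": "went"}


def retrieve(verb: str) -> str:
    """Toy retriever: the KB line for the verb, or an empty context."""
    return f"{verb} -> {KB[verb]}" if verb in KB else ""


def toy_generate(verb: str, context: str) -> str:
    """Stand-in for Mini-GPT: copy the form from context, else regularize."""
    if context:
        return context.split(" -> ")[1]
    stem = verb[:-1] if verb.endswith("e") else verb
    return stem + "ed"  # parametric fallback: 'write' -> 'writed'


def evaluate(use_retrieval: bool) -> float:
    """Exact-match accuracy over the toy eval set, with or without retrieval."""
    correct = 0
    for verb, gold in KB.items():
        context = retrieve(verb) if use_retrieval else ""  # the ablation switch
        correct += toy_generate(verb, context) == gold
    return correct / len(KB)


print(evaluate(True), evaluate(False))  # 1.0 0.0
print(toy_generate("write", ""))        # writed
```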

## q-29-05: Faithfulness ≠ accuracy (free)

Define both metrics in one sentence each, then give one example scenario where you'd have high faithfulness AND low accuracy.

Answer

**Faithfulness**: every claim in the answer is supported by the retrieved context. **Accuracy**: the answer is correct against ground truth. Example: a chunk in your KB says "the past simple of eat is `eated`" (a typo); the model faithfully reports `eated` — high faithfulness, low accuracy. RAG is only as good as its KB.
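A crude sketch of the two metrics makes the distinction concrete. Production faithfulness metrics usually check each claim in the answer against the context, often with an LLM judge; here `faithfulness` is a hypothetical whole-token string match and `accuracy` an exact match against the gold form:

```python
def faithfulness(answer: str, context: str) -> float:
    """Crude proxy: 1.0 if the answer appears as a whole token of the context."""
    return 1.0 if answer in context.split() else 0.0


def accuracy(answer: str, gold: str) -> float:
    """1.0 if the answer matches the ground-truth form exactly."""
    return 1.0 if answer == gold else 0.0


context = "the past simple of eat is eated"  # corrupted KB chunk
answer = "eated"                             # the model faithfully copies it
print(faithfulness(answer, context), accuracy(answer, gold="ate"))  # 1.0 0.0
```

Matching whole tokens rather than substrings matters here: `"ate"` is a substring of `"eated"`, so a naive `in context` check would wrongly credit the correct answer as supported by this chunk.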