English · Español
02 — Decision survival: which architectural choices held up?¶
🇪🇸 Cuarenta fases significan cientos de pequeñas decisiones. ¿Cuáles aguantaron sin revisión? ¿Cuáles se revisaron una vez? ¿Cuáles murieron? Es una medida objetiva de la calidad del juicio temprano.
The hypothesis¶
A project of 40 phases generates roughly 50-200 architectural decisions — file structure, library choice, algorithm pick, abstraction boundary. Some held without revision; some were quietly revised when the next phase needed them to be; some were explicitly torn down.
Survival rate = (decisions held without revision) / (total decisions).
A high survival rate is suspicious — either you weren't taking enough risks, or you weren't being honest about revising things. A low survival rate is fine if you can articulate why — it means you were learning. The shape of the survival curve over time tells you about your judgment trajectory.
The data source¶
PROPOSAL_REVIEW.md at the project root is the ledger. Every architectural proposal that needed a "Claude should review this" tag landed there. So did every "Borja decided to revise X" entry.
Plus: the Revisions log (§8) at the bottom of every PHASE_NN_PLAN.md.
Plus: the journal entries in learners/borja/journal/ where the format includes a decisions: field.
The four states¶
Classify each decision into one of four states:
- Held (H) — was made, never revised, still in force at Phase 40.
- Revised once (R1) — was made, revised exactly once at a later phase.
- Revised many (RN) — revised 2+ times across the project.
- Discarded (D) — explicitly rolled back; the system at Phase 40 looks as if this decision was never made.
For each decision, also note:
- When made — the phase.
- Scope — file-level, module-level, project-level.
- What forced the revision (if applicable) — a later constraint, a bug, a learning, a corpus change.
Example classifications¶
| Decision | Made in | State | Why |
|---|---|---|---|
Use uv instead of pip |
Phase 00 | H | Stayed in force; no friction. |
| 9 C-string functions as corpus | Phase 12 (orig) | D | Discarded at A13 amendment for the verb-grammar corpus. |
| Pre-LN over Post-LN transformer | Phase 17 | H | Held; gradient stability matched expectations. |
| Hand-built autograd in Phase 07-08 before PyTorch | Phase 00 | H | The decision that defined the whole curriculum. |
| Sampling without KV cache in Phase 21 | Phase 21 | R1 | Replaced in Phase 22; the gap was intentional (pedagogy). |
| Single-process FastAPI for serving | Phase 33 | H | Multi-process was discussed and rejected; held. |
| Use Pydantic v1 (early phases) | Phase 06 | R1 | Migrated to Pydantic v2 in Phase 30 (for JSONSchema features). |
| In-memory learner memory only | Phase 32 | H | Held; persistence punted to Phase 39 capstone (out of scope). |
What the survival curve might look like¶
If you plot survival rate vs phase of original decision:
- Decisions made early (Phases 0-5) often have higher survival because they're foundational (tooling, language choice).
- Decisions made in the middle (Phases 10-25) have the lowest survival — this is when the curriculum bends to fit reality.
- Decisions made late (Phases 30-39) tend to have artificially high survival because there hasn't been time to revise them.
A "healthy" pattern: ~70-90% survival on tooling, ~40-60% on architecture, ~20-50% on algorithm choices.
Reading the contributing factors¶
For each revised decision, write 1-2 sentences:
- What new information forced the revision? ("Phase 22's KV cache made the Phase 21 sampler obsolete as designed.")
- Was the revision in the original plan? ("Yes — Phase 21 explicitly noted 'no cache here; Phase 22 fixes it.'")
- Surprise level (1-5): 1 = expected; 5 = jarring.
The surprise distribution is more useful than the survival rate. High-surprise revisions are where your model of the system was wrong. Those are the lessons to extract for future projects.
The bias to watch¶
People tend to over-classify decisions as Held because:
- They forget the small revisions.
- They prefer to think they were right.
- They confuse "the spirit of the decision held" with "the literal decision held."
Counter-bias: scan the git history. If a file has been changed more than 5 times across phases, the decisions that file embodies are not Held.
The opposite bias to watch¶
People tend to over-classify decisions as Revised because:
- Every tiny tweak feels like a revision in the moment.
- Hindsight makes early decisions look naive.
Counter-bias: only count revisions that changed the contract of a module or the interface of an API. Renaming a variable doesn't count.
The output¶
For Phase 40's lab 02, this analysis goes in the postmortem (§3 contributing factors) as a small table or waterfall chart. Don't overdo it — 10-20 decisions classified is enough. The point is to see the pattern, not to be exhaustive.
What this analysis is not good for¶
- Predicting what the next project will need to revise. Each project has its own pattern.
- Judging whether the curriculum was well-designed. A high revision rate could mean the design was bad or that the journey was honest about learning.
- Comparing Borja's decision quality to anyone else's. It's a within-project measure only.
What this file does NOT cover¶
- The specific decisions to classify. That's the lab's content.
- The shape of the residual risk acceptance. Next file.