

02 — Decision survival: which architectural choices held up?

Forty phases mean hundreds of small decisions. Which held up without revision? Which were revised once? Which died? It is an objective measure of the quality of early judgment.

The hypothesis

A project of 40 phases generates roughly 50-200 architectural decisions — file structure, library choice, algorithm pick, abstraction boundary. Some held without revision; some were quietly revised when the next phase needed them to be; some were explicitly torn down.

Survival rate = (decisions held without revision) / (total decisions).

A high survival rate is suspicious — either you weren't taking enough risks, or you weren't being honest about revising things. A low survival rate is fine if you can articulate why — it means you were learning. The shape of the survival curve over time tells you about your judgment trajectory.

The data source

PROPOSAL_REVIEW.md at the project root is the ledger. Every architectural proposal that needed a "Claude should review this" tag landed there. So did every "Borja decided to revise X" entry.

Plus: the Revisions log (§8) at the bottom of every PHASE_NN_PLAN.md.

Plus: the journal entries in learners/borja/journal/ where the format includes a decisions: field.

The four states

Classify each decision into one of four states:

  1. Held (H) — was made, never revised, still in force at Phase 40.
  2. Revised once (R1) — was made, revised exactly once at a later phase.
  3. Revised many (RN) — revised 2+ times across the project.
  4. Discarded (D) — explicitly rolled back; the system at Phase 40 looks as if this decision was never made.

For each decision, also note:

  • When made — the phase.
  • Scope — file-level, module-level, project-level.
  • What forced the revision (if applicable) — a later constraint, a bug, a learning, a corpus change.
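The four states and the per-decision notes above can be captured in a small record type. This is a minimal sketch, not the lab's prescribed format: the names `Decision`, `State`, `Scope`, `revision_count`, and `forced_by` are hypothetical, and the rule that a rollback takes precedence over the revision count is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    HELD = "H"
    REVISED_ONCE = "R1"
    REVISED_MANY = "RN"
    DISCARDED = "D"


class Scope(Enum):
    FILE = "file"
    MODULE = "module"
    PROJECT = "project"


@dataclass
class Decision:
    title: str
    made_in_phase: int                 # when made
    scope: Scope                       # file-, module-, or project-level
    revision_count: int = 0            # revisions that changed the decision
    discarded: bool = False            # explicitly rolled back
    forced_by: Optional[str] = None    # what forced the revision, if any

    @property
    def state(self) -> State:
        # Assumption: an explicit rollback wins over any revision count.
        if self.discarded:
            return State.DISCARDED
        if self.revision_count == 0:
            return State.HELD
        if self.revision_count == 1:
            return State.REVISED_ONCE
        return State.REVISED_MANY


kv_sampling = Decision(
    title="Sampling without KV cache",
    made_in_phase=21,
    scope=Scope.MODULE,  # assumed scope, chosen for illustration
    revision_count=1,
    forced_by="Phase 22 introduced the KV cache",
)
print(kv_sampling.title, kv_sampling.state.value)
```

Deriving the state from the counts, rather than storing it, keeps the classification consistent when a later pass through the ledger finds one more revision.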

Example classifications

| Decision | Made in | State | Why |
|---|---|---|---|
| Use uv instead of pip | Phase 00 | H | Stayed in force; no friction. |
| 9 C-string functions as corpus | Phase 12 (orig) | D | Discarded at A13 amendment for the verb-grammar corpus. |
| Pre-LN over Post-LN transformer | Phase 17 | H | Held; gradient stability matched expectations. |
| Hand-built autograd in Phase 07-08 before PyTorch | Phase 00 | H | The decision that defined the whole curriculum. |
| Sampling without KV cache in Phase 21 | Phase 21 | R1 | Replaced in Phase 22; the gap was intentional (pedagogy). |
| Single-process FastAPI for serving | Phase 33 | H | Multi-process was discussed and rejected; held. |
| Use Pydantic v1 (early phases) | Phase 06 | R1 | Migrated to Pydantic v2 in Phase 30 (for JSONSchema features). |
| In-memory learner memory only | Phase 32 | H | Held; persistence punted to Phase 39 capstone (out of scope). |
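As a worked instance of the survival-rate formula, the eight example classifications can be tallied directly. This is a sketch over the example rows only, so the resulting number says nothing about the real ledger; the variable names are hypothetical.

```python
from collections import Counter

# States of the eight example decisions, in table order.
EXAMPLE_STATES = ["H", "D", "H", "H", "R1", "H", "R1", "H"]

counts = Counter(EXAMPLE_STATES)

# Survival rate = (decisions held without revision) / (total decisions).
survival = counts["H"] / len(EXAMPLE_STATES)

print(dict(counts))
print(f"survival rate: {survival:.3f}")
```

Five of the eight examples are Held, which gives 5 / 8 = 0.625.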

What the survival curve might look like

If you plot survival rate vs phase of original decision:

  • Decisions made early (Phases 0-5) often have higher survival because they're foundational (tooling, language choice).
  • Decisions made in the middle (Phases 10-25) have the lowest survival — this is when the curriculum bends to fit reality.
  • Decisions made late (Phases 30-39) tend to have artificially high survival because there hasn't been time to revise them.

A "healthy" pattern, as a rough rule of thumb by decision type rather than by phase: ~70-90% survival on tooling, ~40-60% on architecture, ~20-50% on algorithm choices.
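One way to get the curve is to bucket decisions by the phase in which they were made and compute the survival rate per bucket. The sketch below uses the three phase ranges named above; phases that fall between them go into an "other" bucket, which is an assumption, as are the names `BUCKETS`, `bucket_for`, and `survival_by_bucket`. The input rows are the example classifications, with Phase 00 written as 0.

```python
# Phase ranges taken from the text; the gaps between them are unspecified.
BUCKETS = {
    "early": range(0, 6),     # Phases 0-5
    "middle": range(10, 26),  # Phases 10-25
    "late": range(30, 40),    # Phases 30-39
}


def bucket_for(phase: int) -> str:
    for label, phases in BUCKETS.items():
        if phase in phases:
            return label
    return "other"  # assumption: anything outside the named ranges


def survival_by_bucket(rows):
    """rows: iterable of (phase_made, state) pairs, state in {"H", "R1", "RN", "D"}."""
    totals = {}
    for phase, state in rows:
        label = bucket_for(phase)
        held, total = totals.get(label, (0, 0))
        totals[label] = (held + (1 if state == "H" else 0), total + 1)
    return {label: held / total for label, (held, total) in totals.items()}


example_rows = [
    (0, "H"), (12, "D"), (17, "H"), (0, "H"),
    (21, "R1"), (33, "H"), (6, "R1"), (32, "H"),
]
curve = survival_by_bucket(example_rows)
for label, rate in curve.items():
    print(f"{label:>6}: {rate:.2f}")
```

With only eight rows each bucket holds one to three decisions, so the per-bucket rates are illustrative; the 10-20 classified decisions of the real analysis will still give a coarse curve.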

Reading the contributing factors

For each revised decision, write 1-2 sentences:

  • What new information forced the revision? ("Phase 22's KV cache made the Phase 21 sampler obsolete as designed.")
  • Was the revision in the original plan? ("Yes — Phase 21 explicitly noted 'no cache here; Phase 22 fixes it.'")
  • Surprise level (1-5): 1 = expected; 5 = jarring.

The surprise distribution is more useful than the survival rate. High-surprise revisions are where your model of the system was wrong. Those are the lessons to extract for future projects.
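The per-revision notes can be kept as small records and summarized as a histogram of surprise levels. This is a minimal sketch: `RevisionNote`, `surprise_histogram`, and `lessons` are hypothetical names, the cutoff of 4 for "high surprise" is an assumption, and the surprise scores below are illustrative rather than taken from the ledger.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RevisionNote:
    decision: str
    new_information: str  # what forced the revision
    planned: bool         # was the revision in the original plan?
    surprise: int         # 1 = expected, 5 = jarring

    def __post_init__(self):
        if not 1 <= self.surprise <= 5:
            raise ValueError("surprise must be between 1 and 5")


def surprise_histogram(notes):
    """Count of revisions at each surprise level, including empty levels."""
    counts = Counter(note.surprise for note in notes)
    return {level: counts.get(level, 0) for level in range(1, 6)}


def lessons(notes, threshold=4):
    """Revisions at or above the threshold (assumed cutoff for 'high surprise')."""
    return [note for note in notes if note.surprise >= threshold]


notes = [
    RevisionNote(
        decision="Sampling without KV cache",
        new_information="Phase 22's KV cache made the Phase 21 sampler obsolete as designed.",
        planned=True,
        surprise=1,  # illustrative score; the gap was intentional
    ),
    RevisionNote(
        decision="(hypothetical decision)",
        new_information="(hypothetical unplanned constraint)",
        planned=False,
        surprise=5,  # made-up score, for illustration only
    ),
]

print(surprise_histogram(notes))
print([note.decision for note in lessons(notes)])
```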

The bias to watch

People tend to over-classify decisions as Held because:

  • They forget the small revisions.
  • They prefer to think they were right.
  • They confuse "the spirit of the decision held" with "the literal decision held."

Counter-bias: scan the git history. If a file has been changed more than 5 times across phases, the decisions that file embodies are probably not Held; read those changes before you classify them as such.
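The git scan can be sketched with `git log --name-only`, counting how many commits touched each path. Commit count is only a proxy for "changed across phases", since the text does not say how changes should be counted; the function names and the sample paths below are hypothetical, and the threshold of 5 comes from the heuristic above.

```python
import subprocess
from collections import Counter


def count_changes(log_output: str) -> Counter:
    """Count commits per path from `git log --name-only --pretty=format:` output.

    With an empty pretty format, git prints only the touched paths,
    one per line, with blank lines between commits.
    """
    return Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )


def git_change_counts(repo_dir: str = ".") -> Counter:
    """Run git in repo_dir and return commit counts per path."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return count_changes(result.stdout)


def not_held_candidates(counts: Counter, threshold: int = 5):
    """Paths changed more than `threshold` times; review these before marking Held."""
    return sorted(path for path, n in counts.items() if n > threshold)


# Usage on canned output (hypothetical paths), so the sketch runs without a repo.
sample_log = "\n\n".join(
    ["src/model.py"] * 6 + ["src/tokenizer.py"] * 5 + ["README.md"]
)
sample_counts = count_changes(sample_log)
print(not_held_candidates(sample_counts))
```

On a real checkout, `not_held_candidates(git_change_counts("."))` gives the list to review. A flagged file is a prompt to reread its history, not a verdict, because the count also includes changes that left the decision intact.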

The opposite bias to watch

People tend to over-classify decisions as Revised because:

  • Every tiny tweak feels like a revision in the moment.
  • Hindsight makes early decisions look naive.

Counter-bias: only count revisions that changed the contract of a module or the interface of an API. Renaming a variable doesn't count.

The output

For Phase 40's lab 02, this analysis goes in the postmortem (§3 contributing factors) as a small table or waterfall chart. Don't overdo it — 10-20 decisions classified is enough. The point is to see the pattern, not to be exhaustive.

What this analysis is not good for

  • Predicting what the next project will need to revise. Each project has its own pattern.
  • Judging whether the curriculum was well-designed. A high revision rate could mean the design was bad or that the journey was honest about learning.
  • Comparing Borja's decision quality to anyone else's. It's a within-project measure only.

What this file does NOT cover

  • The specific decisions to classify. That's the lab's content.
  • The shape of the residual risk acceptance. Next file.

Next: 03-residual-risk-and-offramps.md