
01 — Postmortem structure: the canonical five sections

A postmortem is neither a diary nor a summary. It is a structured document that turns a story into lessons. It has five sections, each with its own purpose; skipping one loses value.

The five sections

Adapted from Google SRE's postmortem template and from John Allspaw's "blameless postmortem" writing:

1. Summary: the headline. One paragraph. What happened, what the impact was, what the resolution was.
2. Timeline: when each event occurred. Times in UTC, anchored to objective signals (commit SHA, log line, dashboard screenshot).
3. Contributing factors: why it happened. Usually several (3-5), including process, tooling, design, and human factors.
4. Lessons learned: what general principles can be drawn out. These are project-independent and survive the postmortem.
5. Action items: concrete, owned, dated. "Fix the alert threshold by 2026-06-15 (owner: Borja)."

For the journey postmortem (Phase 40 lab 01), we adapt this:

1. Summary: what the project was, what shipped, what didn't.
2. Timeline: milestones across the 40 phases, with the major bumps annotated.
3. Contributing factors: what made the journey hard, what made it work.
4. Lessons: generalisable principles for future projects.
5. Action items: what to apply next time. Owner: future-Borja.
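
Because skipping a section loses value, a draft can be checked for completeness before anyone reads it. The following is a minimal sketch, assuming the draft's headings have already been extracted into a list of strings; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, not part of the curriculum's tooling.

```python
# Section names from the list above. "Lessons" is used as the key so that
# both "Lessons learned" and the journey variant "Lessons" match.
REQUIRED_SECTIONS = (
    "Summary",
    "Timeline",
    "Contributing factors",
    "Lessons",
    "Action items",
)


def missing_sections(headings: list[str]) -> list[str]:
    """Return the required sections that no heading mentions.

    Matching is case-insensitive and by substring, so a heading such as
    "Section 2: Timeline" counts as the Timeline section.
    """
    lowered = [heading.lower() for heading in headings]
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(name.lower() in heading for heading in lowered)
    ]
```

An empty result means all five sections are present; anything else is the list of sections still to write.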

Section 1 — Summary

Length: 100-200 words. Read-out-loud test: if you read just this section to a stranger, they should know what happened.

For a journey postmortem:

- What. "Lynx Cortex: a 40-phase first-principles AI systems curriculum, ending in a grammar tutor service for English verb conjugation, with paired Spanish translations."
- Shipped. "All 40 phases completed. Working tutor service on CPU. Pre-LN transformer trained on 600-form verb corpus. Continuous batching scheduler. Observability dashboard. Postmortem and reading list."
- Didn't ship. "GPU training (deferred per CLAUDE.md §6). Beam search. Speculative decoding. Multi-learner memory persistence (stayed scoped to in-memory)."
- Headline lesson. One sentence. The single most important takeaway. Example: "Building NumPy-first before reaching for PyTorch was the highest-leverage decision of the curriculum."
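
The 100-200 word target is easy to check mechanically. A minimal sketch, assuming the summary is assembled from the four parts listed above; the `JourneySummary` class and its field names are hypothetical, and whitespace-separated tokens stand in for words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JourneySummary:
    """The four parts of a journey-postmortem summary (hypothetical container)."""

    what: str
    shipped: str
    did_not_ship: str
    headline_lesson: str

    def text(self) -> str:
        # Join the parts in the order they are meant to be read aloud.
        return " ".join(
            [self.what, self.shipped, self.did_not_ship, self.headline_lesson]
        )

    def word_count(self) -> int:
        # Whitespace-separated tokens are a rough but adequate word count.
        return len(self.text().split())

    def length_ok(self, low: int = 100, high: int = 200) -> bool:
        # Defaults follow the 100-200 word guideline for this section.
        return low <= self.word_count() <= high
```

The word count is only a guard rail; the read-out-loud test still has to be done by a person.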

Section 2 — Timeline

For a journey postmortem, a coarse timeline is enough: a few dated entries per month rather than minute-by-minute timestamps. For each major milestone, give the date and the signal:

2026-01-15 — Phase 00 kickoff. uv + just + pre-commit working.
2026-01-28 — Phase 05 complete. log-sum-exp drilled in via Bernoulli vs Gaussian comparison.
2026-02-10 — Phase 12 corpus design complete. 600-form §A13 corpus in `data/`.
2026-02-22 — Phase 17 Mini-GPT assembled. 103,680 params; first end-to-end forward green.
2026-03-04 — Phase 18 training loop. Discovered axis bug on day 2 (commit abc1234); fixed by day 4.
2026-03-15 — Phase 22 KV cache. Continuous-batching prerequisite landed.
...
2026-05-22 — Phase 39 capstone integrated. Grammar tutor service shipping verb corrections.
2026-05-23 — Phase 40 closure.

For each line, link to the relevant commit, journal entry, or PHASE_NN_REPORT.md.
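
The date-plus-signal shape of each entry is regular enough to check mechanically. A minimal sketch, assuming entries are written one per line as an ISO date, a dash separator, and then the signal text, as in the examples above; `parse_timeline` and `is_chronological` are hypothetical helper names, not part of the curriculum's tooling.

```python
import re
from datetime import date

# "\u2014" is the dash the example entries use between date and signal.
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+\u2014\s+(.+)$")


def parse_timeline(lines: list[str]) -> list[tuple[date, str]]:
    """Parse timeline lines into (date, signal) pairs.

    Blank lines and "..." elisions are skipped; anything else that does not
    match the expected shape raises ValueError so gaps are noticed early.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line == "...":
            continue
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"not a timeline entry: {line!r}")
        entries.append((date.fromisoformat(match.group(1)), match.group(2)))
    return entries


def is_chronological(entries: list[tuple[date, str]]) -> bool:
    """True when the entries are already in non-decreasing date order."""
    dates = [when for when, _ in entries]
    return dates == sorted(dates)
```

Checking for a link in each signal (commit, journal entry, or report) is left out here because the link format is not specified in this file.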

Section 3 — Contributing factors

This is the honest section. What made it hard, what saved you. Mix structural factors with one-off events:

Structural factors to consider:

- The choice to go NumPy-first.
- The strict per-phase ritual (plan → approve → build → teach → run → report → reflect).
- The bilingual policy (forced clarity in two languages).
- The CPU-only constraint, which forced a small-corpus design.
- Subagents (math-reviewer, numerical-stability-checker) catching issues that would otherwise have propagated.

Honest one-off events:

- "Phase 13 was rewritten mid-stream when the A13 verb-grammar pivot superseded the C-strings scope."
- "Phase 18's training was broken for 2 days due to an axis bug." (Structural cause: insufficient shape-assertion coverage in Phase 17's tests. Not "Borja was sloppy.")
- "Phase 26 quantization required redoing Phase 02's numerical-representation labs to internalise fp16 vs bf16 properly."

For each factor, name it and explain it in 1-3 sentences. Resist the temptation to claim everything went smoothly.

Section 4 — Lessons learned

These are the lines that survive past the project. They go into MEMORY.md, into future CLAUDE.md files, into the way you start the next thing.

Format each as: rule + why + how to apply.

Example:

Build the corpus before the model. The §A13 verb corpus was designed in Phase 12 before the Mini-GPT architecture was locked in. This meant downstream phases (17-21) could validate against an unambiguous ground truth. Why: Without a designed corpus, you tune the model against noise. How to apply next project: Spend the first 10% of project time on the eval set, no matter how small. The eval comes first.

Three to seven of these is the right number. Fewer means you under-extracted. More means you padded.
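
The rule + why + how-to-apply format maps naturally onto a small record. The sketch below is illustrative only: the `Lesson` class and `lesson_count_feedback` are hypothetical names, and the thresholds simply encode the three-to-seven guideline above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    """One lesson in rule + why + how-to-apply form."""

    rule: str
    why: str
    how_to_apply: str

    def is_complete(self) -> bool:
        # A lesson missing any of its three parts is not ready to keep.
        return all(part.strip() for part in (self.rule, self.why, self.how_to_apply))

    def render(self) -> str:
        return f"{self.rule} Why: {self.why} How to apply: {self.how_to_apply}"


def lesson_count_feedback(lessons: list[Lesson]) -> str:
    """Apply the three-to-seven guideline to a list of lessons."""
    if len(lessons) < 3:
        return "under-extracted"
    if len(lessons) > 7:
        return "padded"
    return "ok"
```

Keeping the three parts as separate fields makes it obvious when a "lesson" is really just an anecdote with no rule attached.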

Section 5 — Action items

Concrete. Owned. Dated. Each one fits the format:

[Action] by [date]. Owner: [person]. Success criterion: [observable signal].

For a journey postmortem, the actions are for future projects:

Adopt the "BLUEPRINT before code" rule from Phase 00 on the next project, before writing any production module. Owner: future-Borja. Success criterion: a BLUEPRINT.md exists in every src module within 1 week of starting it.

Re-read the postmortem at month 6 of the next project. Owner: future-Borja. Success criterion: a calendar event exists.

Three to five action items. Each one realistic — if you write 20, you'll do zero.
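
The action-item format is strict enough to parse, which makes vague items easy to reject. A minimal sketch, assuming the date is written as an ISO `YYYY-MM-DD` string and the item follows the template literally; `ActionItem` and `parse_action_item` are hypothetical names, and the success criterion in the usage example is invented for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date

# Mirrors "[Action] by [date]. Owner: [person]. Success criterion: [signal]."
PATTERN = re.compile(
    r"^(?P<action>.+?) by (?P<due>\d{4}-\d{2}-\d{2})\. "
    r"Owner: (?P<owner>[^.]+)\. "
    r"Success criterion: (?P<criterion>.+?)\.?$"
)


@dataclass(frozen=True)
class ActionItem:
    action: str
    due: date
    owner: str
    criterion: str


def parse_action_item(text: str) -> ActionItem:
    """Parse one action item, raising ValueError if any part is missing."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in action-item format: {text!r}")
    return ActionItem(
        action=match["action"],
        due=date.fromisoformat(match["due"]),
        owner=match["owner"],
        criterion=match["criterion"],
    )
```

For example, `parse_action_item("Fix the alert threshold by 2026-06-15. Owner: Borja. Success criterion: the alert fires in a staging drill.")` yields an item owned by `Borja` and due on 2026-06-15, while "We'll do better next time." raises `ValueError`.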

What separates a good postmortem from a bad one

Bad postmortem: "We learned a lot, it was a great journey, here are some things that went well and some that didn't. We'll do better next time."

Good postmortem: A reader who did not do this project can extract specific, actionable lessons. Names of incidents (not blame; specifics). Times. Commits. Structural causes. Concrete action items they could apply to their own work.

The test: if Borja's future colleague reads this in a year, do they learn something they couldn't have figured out from the code alone?

A note on tone

Postmortems are not victory laps. They're not self-flagellation. They're technical documents with a narrative arc. Aim for the tone of a well-written conference paper: assertive about the facts, careful about the claims, honest about the limitations.

What this file does NOT cover

- How to measure which architectural decisions survived. Next file.
- The reading list method. Lab 02.
- The specific incidents to include. That's the lab's content; you fill it in.

Next: 02-decision-survival.md