Lab 04 — The demo script: finalize and record

The last lab of the curriculum. Here scripts/demo/run.py is closed out with its seven blocks, the asciinema cast is recorded, and the final "report card" is published: the DE-001…DE-020 table with every check green. If that table ends up green, the curriculum is finished.

Goal

Finalize scripts/demo/run.py (Theory 05 §anatomy). Record an asciinema cast of one clean run. Commit docs/demo-recording.md pointing at the cast. Verify the final acceptance table — all ≤ 20 DE checks green.

Why this lab is last

Labs 00–03 built and verified each piece. Lab 04 stitches the narration together, records the artifact a stranger sees, and signs off the phase.

This is the lab where the curriculum becomes show-able. After this lab, Borja can hand someone the repo URL, say "run just demo," and walk away.

Deliverables

  • scripts/demo/run.py — the seven-block script from Theory 05, complete and tested.
  • scripts/demo/recorder.py — wraps the run with asciinema; emits experiments/39-demo-script/cast-YYYY-MM-DD.cast.
  • experiments/39-demo-script/transcript.jsonl — structured transcript of the recorded run (each [Phase NN] line as a JSON event).
  • experiments/39-demo-script/cast-YYYY-MM-DD.cast — the asciinema recording.
  • docs/demo-recording.md — markdown page with the embedded asciinema player + a textual summary for accessibility.
  • docs/README.md — the quickstart, refreshed: git clone && uv sync --frozen && just demo, plus one screenshot of the final acceptance table.
  • tests/integration/test_demo_end_to_end.py — final integration test; runs just demo-cold, asserts exit 0, all DE-checks pass, transcript well-formed.

Step 1 — Finalize scripts/demo/run.py

The seven-block skeleton from Theory 05 §anatomy is fleshed out. Each block has a contract:

| Block | Function | Contract |
|---|---|---|
| 1 — Preflight | assert_environment_ready() | Aborts with clear message if uv, docker, ports, or lockfile aren't ready |
| 2 — Stack bring-up | bring_up_stack() | Delegates to just demo-cold-up; polls health for 30 s |
| 3 — Narration | narrate_loaded_components() | Prints the [Phase NN] startup lines from a config table |
| 4 — Happy path | send_and_verify() × 3 | Three canonical requests; each prints stage-by-stage timings |
| 5 — Security | replay_injection(), replay_oversized_body(), replay_mcp_sandbox() | Three replays from Lab 03 |
| 6 — Acceptance | run_acceptance_checks() | Reads docs/DONE_ENOUGH.md, runs each check, prints the table |
| 7 — Wrap-up | print_summary(), emit_eval_report() | Total cost, p95, accuracy summary; writes eval-YYYY-MM-DD.json |

The whole script is < 400 lines. Most of the work is delegated to helpers from previous labs; this script is the conductor.
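Block 1's contract can be sketched as a function that collects every unmet prerequisite before aborting, so the user sees all problems at once. This is a minimal sketch, not the repo's implementation: the helper names port_is_free and find_preflight_problems, the lockfile name uv.lock, and the port 8080 (borrowed from DE-002) are assumptions.

```python
import shutil
import socket
from pathlib import Path


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def find_preflight_problems(
    tools=("uv", "docker"),
    ports=(8080,),              # assumption: the miniserve port from DE-002
    lockfile=Path("uv.lock"),   # assumption: uv's default lockfile name
) -> list:
    """Collect human-readable problems instead of stopping at the first one."""
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    for port in ports:
        if not port_is_free(port):
            problems.append(f"port {port} is already in use")
    if not Path(lockfile).is_file():
        problems.append(f"lockfile {lockfile} is missing")
    return problems


def assert_environment_ready() -> None:
    """Abort with a clear message listing every unmet prerequisite."""
    problems = find_preflight_problems()
    if problems:
        raise SystemExit("Preflight failed:\n" + "\n".join(f"  - {p}" for p in problems))
```

Returning a list from find_preflight_problems() keeps the checks unit-testable; only the thin assert_environment_ready() wrapper exits the process.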

Narration as a structured event stream

Every printed line is also a JSON event written to experiments/39-demo-script/transcript.jsonl. Format:

{"t": 0.5, "phase": 12, "event": "load_corpus", "msg": "Loading verb corpus from DVC..."}
{"t": 1.1, "phase": 11, "event": "tokenizer_ready", "msg": "BPE tokenizer ready (vocab=2048).", "vocab_size": 2048}
{"t": 2.3, "phase": 28, "event": "model_loaded", "msg": "Loading Mini-GPT base + LoRA grammar adapter (rev=sha:a1b2c3).", "lora_rev": "sha:a1b2c3"}
...

Each narration call produces two outputs: the human-readable text on the terminal and the machine-readable event in the JSONL. This lets the CI test parse the transcript without scraping the terminal.

A small narrate() helper does both:

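import time

def narrate(phase: int, event: str, msg: str, **fields) -> None:
    """Print one narration line and mirror it to transcript.jsonl."""
    t = time.time() - DEMO_START_T  # seconds since the demo started (module-level constant)
    print(f"[t={t:.1f}s] [Phase {phase:02d}] {msg}")
    # `transcript` is the module-level JSONL writer; extra keyword fields ride along.
    transcript.write_event({"t": t, "phase": phase, "event": event, "msg": msg, **fields})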
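narrate() relies on a transcript object that exposes write_event(). One possible shape for it is sketched below; the class name TranscriptWriter, the truncate-on-open behaviour, and the flush after every event are assumptions, not taken from the repo.

```python
import json
from pathlib import Path


class TranscriptWriter:
    """Append-only JSONL writer: one JSON object per line."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Opening with "w" truncates, so each demo run starts with an empty transcript.
        self._fh = self.path.open("w", encoding="utf-8")

    def write_event(self, event: dict) -> None:
        # ensure_ascii=False keeps non-ASCII characters in messages readable.
        self._fh.write(json.dumps(event, ensure_ascii=False) + "\n")
        # Flush per event so a crash mid-demo still leaves a parseable transcript.
        self._fh.flush()

    def close(self) -> None:
        self._fh.close()
```

Under these assumptions, run.py would create the writer once at startup, for example `transcript = TranscriptWriter("experiments/39-demo-script/transcript.jsonl")`, and close it in the wrap-up block.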

Step 2 — Acceptance table renderer

Block 6 reads docs/DONE_ENOUGH.md's 20 rows, runs each automation, prints the table:

=================================================================
                  Phase 39 — Acceptance Checks
=================================================================
| ID     | Check                                           | Pass |
|--------|-------------------------------------------------|------|
| DE-001 | Stack starts within 30 s                        |  ✓   |
| DE-002 | miniserve responds on :8080 within 5 s          |  ✓   |
| DE-003 | First request completes within 10 s             |  ✓   |
| DE-004 | p95 latency over 3-sentence battery < 5 s       |  ✓   |
| ...    | ...                                             | ...  |
| DE-020 | Demo exits with status 0                        |  ✓   |
=================================================================
Result: 20/20 passed.
=================================================================

If any check fails, its row shows a failure marker instead of ✓, the table is followed by a # Failures: section enumerating each failed check, and the script exits with status 1.
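The rendering and exit-code rule can be sketched as a pure function that takes check results and returns the table text plus the exit status. This is an illustration under assumptions: CheckResult, render_acceptance, the column widths, and the ✗ glyph for failures are hypothetical, and the real renderer reads its rows from docs/DONE_ENOUGH.md.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    check_id: str       # e.g. "DE-001"
    description: str
    passed: bool
    detail: str = ""    # shown only in the Failures section


def render_acceptance(results):
    """Return (table text, exit code): 0 if every check passed, else 1."""
    header = f"| {'ID':<6} | {'Check':<47} | Pass |"
    rule = "=" * len(header)
    lines = [rule, "Acceptance Checks".center(len(header)).rstrip(), rule, header]
    lines.append(f"|{'-' * 8}|{'-' * 49}|------|")
    for r in results:
        mark = "✓" if r.passed else "✗"  # assumption: ✗ as the failure glyph
        lines.append(f"| {r.check_id:<6} | {r.description:<47} |  {mark}   |")
    failures = [r for r in results if not r.passed]
    passed = len(results) - len(failures)
    lines += [rule, f"Result: {passed}/{len(results)} passed.", rule]
    if failures:
        lines.append("# Failures:")
        lines += [f"- {r.check_id}: {r.detail or r.description}" for r in failures]
    return "\n".join(lines), (1 if failures else 0)
```

Block 6 would then print the text and pass the code to sys.exit(), which keeps the formatting testable without running any real checks.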

Step 3 — Recording with asciinema

scripts/demo/recorder.py:

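import datetime
import subprocess
from pathlib import Path

def record_demo():
    """Record one `just demo-cold` run to a dated asciinema cast."""
    today = datetime.date.today().isoformat()
    out_dir = Path("experiments/39-demo-script")
    out_dir.mkdir(parents=True, exist_ok=True)
    cast = out_dir / f"cast-{today}.cast"
    cmd = [
        "asciinema", "rec",
        "--command", "just demo-cold",
        "--idle-time-limit", "2",
        "--title", f"Lynx Cortex — capstone demo {today}",
        str(cast),
    ]
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the recording fails
    print(f"Recorded: {cast}")

# Without this guard, running the file as a script would define record_demo() and do nothing.
if __name__ == "__main__":
    record_demo()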

The --idle-time-limit of 2 s caps every idle gap at 2 s on playback, so long pauses (model loading, etc.) are not replayed in real time and the playback feels paced.
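To see what the cap does to playback length, the sketch below sums the gaps between event timestamps with and without an idle limit. The function name playback_duration and the sample timestamps are illustrative assumptions, not measurements from a real cast.

```python
from typing import List, Optional


def playback_duration(event_times: List[float], idle_limit: Optional[float] = None) -> float:
    """Sum the gaps between consecutive events, capping each gap at idle_limit."""
    total = 0.0
    previous = 0.0
    for t in event_times:
        gap = t - previous
        if idle_limit is not None:
            gap = min(gap, idle_limit)
        total += gap
        previous = t
    return total


# Illustrative timestamps (seconds): a 40 s model load sits between 1.0 and 41.0.
times = [0.5, 1.0, 41.0, 41.5]
print(playback_duration(times))        # → 41.5 (real time)
print(playback_duration(times, 2.0))   # → 3.5  (40 s gap capped at 2 s)
```

Only the long gap shrinks; bursts of output that arrive less than 2 s apart keep their original pacing.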

Run:

$ uv run python scripts/demo/recorder.py

The cast file is < 100 KB for a 90-second recording; small enough to commit to the repo.

Fallback: plain transcript

If asciinema isn't available, the textual transcript.jsonl is the fallback. The docs/demo-recording.md page links both: the asciinema player (preferred) and a markdown rendering of the JSONL (accessible).

Step 4 — docs/demo-recording.md

# Demo recording

This is the canonical recording of the Phase 39 capstone demo.

## Watch (asciinema)

<script id="asciicast-2026-06-XX" src="../experiments/39-demo-script/cast-2026-06-XX.cast" async></script>

## Read (transcript)

(Below is the human-readable rendering of `transcript.jsonl`. Times are wall-clock from the start of the recording.)

| t (s) | Phase | Event |
|---|---|---|
| 0.0 | 39 | demo_start |
| 0.5 | 12 | load_corpus |
| 1.1 | 11 | tokenizer_ready |
| ... | ... | ... |
| 87.3 | 39 | acceptance_table_printed (20/20 passed) |
| 88.1 | 39 | eval_report_emitted (experiments/39-end-to-end/eval-2026-06-XX.json) |
| 89.4 | 39 | demo_complete (exit=0) |

## Reproduce

    git clone https://github.com/borjatarraso/lynx-cortex
    cd lynx-cortex
    uv sync --frozen
    just demo

For details, see the [Phase 39 capstone overview](phase-39-capstone/README.md).

The transcript table is auto-generated from the JSONL by scripts/demo/render_transcript.py and committed in docs/demo-recording.md, next to the cast embed.
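A minimal sketch of what scripts/demo/render_transcript.py might do is shown below. The function name render_transcript is an assumption, and the sketch emits only the three columns from the sample table; annotations such as "(20/20 passed)" would need extra per-event formatting that is not shown here.

```python
import json


def render_transcript(jsonl_text: str) -> str:
    """Turn transcript JSONL into a three-column markdown table."""
    rows = ["| t (s) | Phase | Event |", "|---|---|---|"]
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the transcript
        event = json.loads(line)
        rows.append(f"| {event['t']:.1f} | {event['phase']} | {event['event']} |")
    return "\n".join(rows)
```

Called on the contents of experiments/39-demo-script/transcript.jsonl, it returns markdown that can be pasted or templated into the "Read (transcript)" section.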

Step 5 — Refresh docs/README.md

The visitor-facing quickstart. A one-paragraph introduction, then four sections:

# Lynx Cortex

A first-principles AI systems curriculum: 40 phases from a transistor to a deployed
grammar tutor that corrects English verb conjugations and provides Spanish
translations for 20 verbs × 5 tenses × 3 persons.

## 90-second demo

```
git clone https://github.com/borjatarraso/lynx-cortex
cd lynx-cortex
uv sync --frozen
just demo
```

You will see: the stack come up, three grammar corrections, three security defenses
fire, and a 20-row acceptance table all green. Total: ~90 seconds.

Recording: [docs/demo-recording.md](demo-recording.md)

## Curriculum

[40-phase roadmap](../ROADMAP.md). Each phase has a `theory/` and `lab/` directory
under `docs/phase-NN-*/`.

## Architecture

[docs/ARCHITECTURE.md](ARCHITECTURE.md) — C4 context + container + sequence diagrams.

## Definition of "done enough"

[docs/DONE_ENOUGH.md](DONE_ENOUGH.md) — the 20 binary checks the demo verifies on
every run.

The README's first action is just demo — the demo is the entrypoint.

Step 6 — Final integration test

tests/integration/test_demo_end_to_end.py:

import json
import subprocess
from pathlib import Path

def test_demo_runs_clean():
    """`just demo-cold` exits 0; all DE checks pass; transcript well-formed."""
    result = subprocess.run(["just", "demo-cold"], capture_output=True, text=True, timeout=180)
    assert result.returncode == 0, f"demo failed: {result.stdout[-2000:]}\n{result.stderr[-2000:]}"

    transcript = Path("experiments/39-demo-script/transcript.jsonl")
    assert transcript.exists(), "transcript not written"
    # Every non-blank line must parse as JSON; a malformed line fails the test here.
    events = [json.loads(line) for line in transcript.read_text().splitlines() if line.strip()]
    assert events, "transcript is empty"
    assert events[-1]["event"] == "demo_complete", "demo didn't reach completion event"
    assert events[-1]["exit"] == 0, "demo exited non-zero"

    # Verify acceptance table. Default to None so a missing event fails with a clear message.
    acceptance = next((e for e in events if e["event"] == "acceptance_table_printed"), None)
    assert acceptance is not None, "acceptance table event missing"
    assert acceptance["passed"] == acceptance["total"], \
        f"acceptance failed: {acceptance['passed']}/{acceptance['total']}"

This is the final guarantor. CI runs it on every PR, and the PR must pass it to merge.

Step 7 — One clean recording

The recording for docs/demo-recording.md must be from a clean run:

  1. Bring down any prior stack: just demo-cold-down.
  2. Clear cached artifacts that would falsely accelerate the recorded run: rm -rf ~/.cache/lynx-cortex-warmup/.
  3. Record: uv run python scripts/demo/recorder.py.
  4. Verify the recording: asciinema play experiments/39-demo-script/cast-YYYY-MM-DD.cast.
  5. If a glitch is visible (terminal resize, accidental keypress), re-record. The cast is a product; ship a clean one.
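Steps 1 to 3 above can be wrapped in a small helper so the clean-state preparation is never skipped by accident. This is a sketch under assumptions: the function name record_clean and the injectable run parameter (there only to make the helper testable) are hypothetical, and steps 4 and 5 stay manual because judging a recording needs a human.

```python
import shutil
import subprocess
from pathlib import Path


def record_clean(run=subprocess.run, cache_dir=None):
    """Tear down the stack, clear the warm-up cache, then record."""
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "lynx-cortex-warmup"
    # Step 1: bring down any prior stack.
    run(["just", "demo-cold-down"], check=True)
    # Step 2: ignore_errors=True because a missing cache directory is already clean.
    shutil.rmtree(cache_dir, ignore_errors=True)
    # Step 3: record through the recorder script.
    run(["uv", "run", "python", "scripts/demo/recorder.py"], check=True)
```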

What "done" looks like

  • scripts/demo/run.py complete; seven blocks present and contract-tested.
  • scripts/demo/recorder.py complete; produces a cast on demand.
  • experiments/39-demo-script/transcript.jsonl written cleanly during demo runs.
  • experiments/39-demo-script/cast-YYYY-MM-DD.cast committed.
  • docs/demo-recording.md written with both the cast embed and the textual transcript.
  • docs/README.md refreshed with the 90-second quickstart and the recording link.
  • tests/integration/test_demo_end_to_end.py passes in CI.
  • The acceptance table prints 20/20 green on just demo.

Common pitfalls

  1. "It works locally; the recording was clean." The recording must be from just demo-cold, not just demo (which reuses whatever state a prior run left behind). Clean state = repeatable result.
  2. Embedding the cast inline in README.md. README is rendered by many tools; only mkdocs handles <script> tags well. Put the embed in docs/demo-recording.md (which is markdown rendered by mkdocs); link from README.
  3. Forgetting the textual transcript. The cast is great for visual learners; some viewers (screen readers, search indexers, future-you grepping git log) need text. Both.
  4. Printing narration without going through narrate(). A bare debug print in the middle of the demo shows up on the terminal but not in the JSONL; the CI parses the JSONL only and may not catch a regression. All narration goes through the helper.
  5. Treating the "loud failure" verification from Lab 03 as part of the recording. It is optional here: the clean demo is for the show, and the loud-failure verification is already captured in Lab 03's experiment log. Keep them separate.
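Pitfall 4 can be guarded against mechanically. The sketch below flags terminal lines that look like narration but have no matching msg in the transcript. The names NARRATION_LINE and unlogged_lines are assumptions, and the check only catches prints that imitate the narrate() prefix; a bare print with no prefix would need a stricter rule.

```python
import json
import re

# Matches the terminal prefix produced by narrate(): "[t=1.1s] [Phase 11] ..."
NARRATION_LINE = re.compile(r"^\[t=\d+\.\d+s\] \[Phase \d{2,}\] ")


def unlogged_lines(stdout: str, transcript_jsonl: str) -> list:
    """Return narration-looking stdout lines whose message has no transcript event."""
    logged = {
        json.loads(line)["msg"]
        for line in transcript_jsonl.splitlines()
        if line.strip()
    }
    missing = []
    for line in stdout.splitlines():
        match = NARRATION_LINE.match(line)
        # Compare the text after the prefix with the messages in the JSONL.
        if match and line[match.end():] not in logged:
            missing.append(line)
    return missing
```

An integration test could assert that unlogged_lines(result.stdout, transcript.read_text()) is empty after a demo run.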

End of Phase 39 lab sequence. Next:

  • Open Phase 39 via /phase-start 39.
  • Walk Labs 00 → 04 in order.
  • Write PHASE_39_REPORT.md (the capstone reflection; structured per LYNX_CORTEX.md §7.6).
  • Then /phase-start 40 — the postmortem.