Lab 03: Train past convergence; see the train/val gap open
Goal: extend training past convergence on the tiny corpus and observe the train/val gap in the dashboard. Overfitting is a feature here, not a bug.
Estimated time: 1-2 hours of hands-on work (setup + analysis), plus one long training run; see Block B for the wall-clock expectation.
Prereq: Lab 02 committed (dashboard validated against engineered failures).
What you produce
A directory experiments/19-overfit/ containing:
- train.py: modified Phase-18 driver with T = 4 × Phase_18_T, no early stopping.
- config.yaml: same as healthy, except T is much larger.
- inspector.log.jsonl.
- dashboard.html: long-running dashboard showing overfit progression.
- gap-analysis.md: your written analysis of where and how overfitting manifests.
- manifest.json.
TODOs
Block A: set up the long training run
In experiments/19-overfit/config.yaml:
training:
  batch_size: 32
  T: 8000            # 4× Phase 18's 2000
  warmup: 100
  lr_max: 3e-3
  lr_min: 3e-4
  weight_decay: 0.1
  grad_clip: 1.0
  betas: [0.9, 0.95]
  eps: 1.0e-8
  eval_every: 50
  ckpt_every: 500
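The schedule still cosine-decays over the full T=8000. LR at step 2000 is therefore no longer the minimum (as it was in Phase 18); it is only about a quarter of the way through the decay, so it is still close to lr_max. The cosine midpoint now falls near step 4000. That's part of the experiment.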
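To see how far the schedule has moved, the sketch below evaluates the LR at step 2000 for both values of T. It assumes a linear warmup followed by a cosine decay from lr_max to lr_min, clamped at lr_min once the step passes T. The function name `lr_at` and the exact warmup shape are assumptions for illustration; the Phase-18 driver may differ in detail.

```python
import math


def lr_at(step, total_steps, warmup=100, lr_max=3e-3, lr_min=3e-4):
    """Assumed schedule: linear warmup to lr_max, then cosine decay to lr_min."""
    if step < warmup:
        return lr_max * (step + 1) / warmup
    # Clamp progress at 1.0 so steps past total_steps stay at lr_min.
    progress = min(1.0, (step - warmup) / max(1, total_steps - warmup))
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Compare the Phase-18 schedule (T=2000) with this lab's schedule (T=8000).
for total in (2000, 8000):
    print(f"T={total}: lr at step 2000 = {lr_at(2000, total):.2e}")
```

Under these assumptions, step 2000 sits at lr_min when T=2000 but at roughly 2.6e-3 when T=8000, so the model is still taking large steps at the point where Phase 18 had finished decaying.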
Block B: run training and produce the dashboard
Wall-clock expectation: ~2-3 hours on Borja's i5-8250U. Use a session manager (tmux, screen) so it survives terminal close.
After training, render dashboard.html with the standard dashboard.render(...).
Block C: find the overfit onset
Open dashboard.html. In Panel 1 (loss curves):
- Where does val loss bottom out? Record t_val_min and val_loss(t_val_min).
- Where does train loss bottom out? Record t_train_min and train_loss(t_train_min).
- The overfit onset \(t^*\) is approximately t_val_min. After \(t^*\), val loss climbs (or plateaus while train keeps falling).
Verify with the gap_t computed by the dashboard's overfit-onset detector (from theory/02's formula). It should match your visual reading within ±100 steps.
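As a cross-check on the visual reading, the minima and the raw train/val gap can be computed directly from the logged losses. The sketch below is not the dashboard's detector and does not implement theory/02's formula; it is a plain argmin over parallel lists, and the helper names and the numbers are made up for illustration. How you load the lists from inspector.log.jsonl depends on that file's schema, which is not assumed here.

```python
def find_minimum(steps, losses):
    """Return (step, loss) at the lowest recorded loss; ties go to the earliest step."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return steps[idx], losses[idx]


def gap_series(train_losses, val_losses):
    """Per-eval gap, defined here simply as val loss minus train loss."""
    return [v - t for t, v in zip(train_losses, val_losses)]


# Illustrative values only, not results from this lab.
steps = [0, 50, 100, 150, 200, 250]
train = [4.0, 3.0, 2.4, 2.0, 1.7, 1.5]
val = [4.1, 3.2, 2.8, 2.7, 2.75, 2.9]

t_val_min, val_at_min = find_minimum(steps, val)
t_train_min, train_at_min = find_minimum(steps, train)
gaps = gap_series(train, val)

print("t_val_min:", t_val_min, "val_loss:", val_at_min)
print("t_train_min:", t_train_min, "train_loss:", train_at_min)
print("final gap:", gaps[-1])
```

On a real run the val curve is noisy, so a raw argmin can land on a single lucky evaluation; treat it as a sanity check next to the detector's gap_t rather than a replacement for it.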
Block D: write gap-analysis.md
In experiments/19-overfit/gap-analysis.md, write five short paragraphs, one per item:
- The numbers. State t_val_min, val_loss(t_val_min), train_loss(t_train_min), and the final gap at T=8000.
- What the dashboard tells you. Beyond Panel 1, what changes? Do dead-neuron counts increase (overfitting often "forgets" rare patterns, killing the heads that handle them)? Do per-layer activation magnitudes drift?
- Implications for Phase 20. When the eval harness arrives (next phase), the "best checkpoint" is going to be at t_val_min, not at T. Phase 18 saved checkpoints every 500 steps; confirm that experiments/19-overfit/ has a checkpoint at or near t_val_min available for Phase 20's eval comparisons.
- Why this is healthy. Overfitting on a tiny corpus is a probe, not a failure mode. State, in one sentence, what the train/val gap reveals about model capacity.
- What would prevent overfitting? Three options: more data (Phase 12 expansion), more regularization (weight decay, dropout), or earlier stopping (which we'd configure with the Phase-20 eval). Don't implement any of them; just state which you'd reach for first and why.
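For the Phase 20 item, it helps to know which saved checkpoint sits closest to t_val_min. The sketch below assumes checkpoints are written at every multiple of ckpt_every up to T (so 500, 1000, ..., 8000 with this config); that save pattern and the function name `nearest_checkpoint` are assumptions, so check them against what the driver actually wrote to disk.

```python
def nearest_checkpoint(t_val_min, ckpt_every=500, total_steps=8000):
    """Return the assumed checkpoint step closest to t_val_min (earlier step wins ties)."""
    saved = range(ckpt_every, total_steps + 1, ckpt_every)
    return min(saved, key=lambda s: abs(s - t_val_min))


# 2300 is a made-up t_val_min, used only to show the call.
print(nearest_checkpoint(2300))
```

With ckpt_every=500 the nearest checkpoint is at most 250 steps away from t_val_min, which is five evaluations at eval_every=50. Note the distance in gap-analysis.md so Phase 20 knows how far the stored weights are from the true minimum.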
Constraints
- No early stopping. The whole point is to train through and past the overfit onset.
- No regularization tuning. Same weight_decay as Phase 18.
- Single config. Don't sweep over T. One number, one experiment.
Stop conditions
Done when:
- experiments/19-overfit/dashboard.html shows train loss continuing to drop while val loss has bottomed out or risen.
- gap-analysis.md answers all five paragraphs above.
- The best-val checkpoint (at or near t_val_min) is preserved (not overwritten by later checkpoints).
Pitfalls
- The dashboard sometimes shows val loss going down very slowly past \(t^*\). That's not a contradiction — slower than train means a widening gap. Use the gap detector, not eyeball.
- Forgetting to extend the schedule. If you keep T = 2000 in the schedule but run for 8000 steps, the schedule clamps at lr_min for the last 6000 steps. The experiment is still valid but the dashboard's LR panel will look weird. Either way, document what you did.
- Memory pressure on a long run. The Inspector's log file grows linearly with steps. For T=8000, expect ~50-100 MB of JSONL. Acceptable on Borja's 62 GiB box.
When to consult solutions/
After gap-analysis.md is committed. The solution at solutions/03-overfit-on-purpose-ref.md (written at phase open) compares your gap numbers to the reference and discusses what Phase 20 will do with this data.
End of Phase 19 labs. Next phase: docs/phase-20-evaluation-harness/.