Lab 02: Train a scalar MLP on a tiny tense-identity task using only `minigrad.scalar`
Goal: build a 2-layer MLP from `Value` neurons, train it on a microscopic grammar-tense task, and see a loss curve descend. This is about the smallest end-to-end ML training run you can write, anchored in the §A13 verb-grammar domain.
Estimated time: 90–120 minutes.
Prereqs: lab 00, lab 01 (all ops implemented and tested).
The task (the §A13 anchor)
Pick one verb — let's say work. Its 5 tenses are:
| index | tense | English form | Spanish |
|---|---|---|---|
| 0 | infinitive | (to) work | trabajar |
| 1 | present (3rd sg) | works | trabaja |
| 2 | past simple | worked | trabajó |
| 3 | past participle | worked | trabajado |
| 4 | future (will) | will work | trabajará |
The task is the 5-way tense-identity mapping: given a 5-dim one-hot input encoding "which tense is this", produce a 5-dim output whose argmax equals the input's argmax. It is artificially simple — a perfect autoencoder for a one-hot — but that is exactly the point: every nontrivial component of the model (autograd, parameters, loss, training loop) must be correct for the network to learn this. Any failure is a Phase-7 bug, not a hard-problem bug.
What you produce
A directory `experiments/07-train-tense-logits/` containing:
- `model.py`: `Neuron`, `Layer`, `MLP` classes built from `Value`. ~60 lines.
- `train.py`: the training loop. ~40 lines.
- `loss.png`: loss curve over training.
- `predictions.json`: the trained MLP's outputs on the 5 inputs.
- `manifest.json`: standard schema.
- `README.md`: what you trained, what loss you reached, how long it took, what you'd improve.

Plus a separate directory `experiments/07-visualize-graph/`:
- `viz.py`: builds a small expression, renders the DAG via graphviz, saves as SVG.
- `graph.svg`: the rendered graph, nodes labeled with forward data and backward grad.
- `manifest.json`.
TODOs (experiment 1: tense identity)
Block A: Neuron, Layer, MLP
In `model.py`:
- `Neuron`: takes `n_in` inputs. Owns `w: list[Value]` of length `n_in` (randomly initialised to small values, e.g., from `random.uniform(-1, 1)`) and `b: Value` (initialised to 0). `__call__(self, xs: list[Value])` returns `(sum(wᵢ · xᵢ) + b).tanh()`.
- `Layer`: takes `n_in, n_out`. Owns `neurons: list[Neuron]`. `__call__(self, xs)` returns the list of each neuron's output.
- `MLP`: takes `n_in, layer_sizes: list[int]`. Owns `layers: list[Layer]`. `__call__(self, xs)` chains them. A multi-output last layer (the case here, with 5 outputs) returns the list, not a single `Value`.
- `parameters(self)` method on each (returns a list of `Value` for all weights and biases). Phase 9 introduces `Parameter`; for Phase 7 just collect `Value`s manually.
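The class layout above can be sketched as follows. This is a minimal sketch rather than the reference solution: the `Value` class here is a forward-only stand-in (no autograd) so that the block runs without `minigrad`. In the lab you would use the real class from `minigrad.scalar` instead, and its exact import path and constructor are assumptions on my part.

```python
import math
import random


class Value:
    """Forward-only stand-in for the lab's Value (no autograd), so this runs alone."""

    def __init__(self, data):
        self.data = float(data)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data)

    __radd__ = __add__  # lets sum() start from the int 0

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data)

    def tanh(self):
        return Value(math.tanh(self.data))


class Neuron:
    def __init__(self, n_in):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(n_in)]
        self.b = Value(0.0)

    def __call__(self, xs):
        # Weighted sum of the inputs plus bias, squashed by tanh.
        return (sum(wi * xi for wi, xi in zip(self.w, xs)) + self.b).tanh()

    def parameters(self):
        return self.w + [self.b]


class Layer:
    def __init__(self, n_in, n_out):
        self.neurons = [Neuron(n_in) for _ in range(n_out)]

    def __call__(self, xs):
        return [n(xs) for n in self.neurons]

    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]


class MLP:
    def __init__(self, n_in, layer_sizes):
        sizes = [n_in] + layer_sizes
        self.layers = [Layer(a, b) for a, b in zip(sizes, sizes[1:])]

    def __call__(self, xs):
        for layer in self.layers:
            xs = layer(xs)
        return xs  # a list of outputs, one per neuron in the last layer

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]


random.seed(42)
model = MLP(5, [4, 5])
out = model([Value(1.0), Value(0.0), Value(0.0), Value(0.0), Value(0.0)])
n_params = len(model.parameters())  # 4*(5+1) + 5*(4+1) = 49
```

Counting parameters is a cheap sanity check: a missing bias or a mis-sized layer shows up as a number other than 49 for `MLP(5, [4, 5])`.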
Block B: Tense dataset
The 5 input/target pairs are the 5 one-hot tense vectors for work:
input target
(1, 0, 0, 0, 0) → (1, -1, -1, -1, -1) # infinitive / to work
(0, 1, 0, 0, 0) → (-1, 1, -1, -1, -1) # present 3sg / works
(0, 0, 1, 0, 0) → (-1, -1, 1, -1, -1) # past simple / worked
(0, 0, 0, 1, 0) → (-1, -1, -1, 1, -1) # participle / worked
(0, 0, 0, 0, 1) → (-1, -1, -1, -1, 1) # future / will work
- Encode as `xs: list[list[Value]]` (5 inputs, each a 5-vector of `Value`s) and `ys: list[list[Value]]` (5 targets).
- Use `tanh` activations in the model. With `tanh`, the "0 label" is best encoded as -1 (since `tanh` outputs are in (-1, 1)). Use {-1, 1} encoding for the targets.
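One way to generate the five pairs instead of typing them by hand is sketched below. It uses plain floats so that it runs on its own; the helper name `one_hot` and the `_raw` variable names are illustrative, not part of the lab's API.

```python
TENSES = ["infinitive", "present 3sg", "past simple", "past participle", "future"]


def one_hot(index, size, on=1.0, off=0.0):
    """Return a list with `on` at position `index` and `off` everywhere else."""
    return [on if i == index else off for i in range(size)]


n = len(TENSES)
xs_raw = [one_hot(i, n) for i in range(n)]                     # inputs use {0, 1}
ys_raw = [one_hot(i, n, on=1.0, off=-1.0) for i in range(n)]   # targets use {-1, 1}

# In the lab, wrap every float in a Value, for example:
#   xs = [[Value(v) for v in row] for row in xs_raw]
```

Generating both lists from the same index keeps inputs and targets aligned by construction.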
The task is deliberately trivial: given a one-hot of "which tense", return that same one-hot. The point is not to learn grammar (Phase 9 does that with the full grid) but to confirm that your autograd, your loss and your training loop work end to end.
Block C: training loop
In `train.py`:
- Instantiate `model = MLP(5, [4, 5])`: 5 inputs (one-hot tense), one hidden layer of 4 neurons, 5 outputs.
- Hyperparams: `lr = 0.05`, `n_epochs = 300`.
- For each epoch:
  - Compute predictions: `preds = [model(x) for x in xs]`. Each `pred` is a list of 5 `Value`s.
  - Compute loss: `loss = sum((p - y)**2 for pred, target in zip(preds, ys) for p, y in zip(pred, target))`. (Sum of squared errors over the 25 logits = 5 inputs × 5 outputs.) Loss is a `Value`.
  - Zero gradients on all parameters: `for p in model.parameters(): p.grad = 0.0`.
  - `loss.backward()`.
  - Update parameters: `for p in model.parameters(): p.data -= lr * p.grad`.
  - Log `epoch, loss.data` to a structured logger (using Phase 6's `get_logger`).
- Plot loss vs epoch with matplotlib. Save as `loss.png`.
- After training, run `model(x)` for each tense one-hot. Save the outputs as `predictions.json`.
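The order of operations inside the epoch loop (forward, loss, zero grads, backward, update) can be sketched end to end as below. Everything outside the loop is scaffolding I have assumed so the block runs alone: `Value` is a small stand-in for the lab's class, `make_layer` and `forward` are compact hypothetical replacements for `Layer` and `MLP`, `random.seed` replaces `seed_everything`, targets are plain floats, and logging and plotting are omitted.

```python
import math
import random


class Value:
    """Stand-in for the lab's Value so this sketch runs without minigrad."""

    def __init__(self, data, _prev=()):
        self.data = float(data)
        self.grad = 0.0
        self._prev = _prev
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    __radd__ = __add__  # lets sum() start from the int 0

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def __sub__(self, other):
        return self + other * -1.0

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        topo, seen = [], set()

        def build(v):
            if v not in seen:
                seen.add(v)
                for parent in v._prev:
                    build(parent)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()


def make_layer(n_in, n_out):
    """A layer as a list of (weights, bias) pairs; compact stand-in for Layer."""
    return [([Value(random.uniform(-1, 1)) for _ in range(n_in)], Value(0.0))
            for _ in range(n_out)]


def forward(layers, xs):
    for layer in layers:
        xs = [(sum(w * x for w, x in zip(ws, xs)) + b).tanh() for ws, b in layer]
    return xs


random.seed(42)  # the lab calls seed_everything(42) instead
layers = [make_layer(5, 4), make_layer(4, 5)]  # same shape as MLP(5, [4, 5])
params = [p for layer in layers for ws, b in layer for p in ws + [b]]
xs = [[Value(1.0 if i == j else 0.0) for j in range(5)] for i in range(5)]
ys = [[1.0 if i == j else -1.0 for j in range(5)] for i in range(5)]

lr, n_epochs = 0.05, 300
losses = []
for epoch in range(n_epochs):
    preds = [forward(layers, x) for x in xs]
    loss = sum((p - y) * (p - y)              # squared error, summed over both axes
               for pred, target in zip(preds, ys)
               for p, y in zip(pred, target))
    for p in params:
        p.grad = 0.0                          # zero first, or gradients accumulate
    loss.backward()
    for p in params:
        p.data -= lr * p.grad
    losses.append(loss.data)
```

The `losses` list is what you would hand to matplotlib. Whether this stand-in reaches the lab's loss threshold is something to check by running it, not something the sketch guarantees.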
Block D: assert success
In your `train.py` (or a separate `verify.py`):
- Assert final loss < 0.5 (5 outputs × 5 examples = 25 logits; loose per-logit target ~0.02).
- Assert `argmax(model(x))` equals the input's argmax for all 5 inputs.
If either assertion fails, your training didn't converge. Diagnose:
- Loss not decreasing? Likely a backward bug. Re-run unit tests.
- Loss decreasing then exploding? lr too high. Try 0.01.
- Loss decreasing slowly? lr too low or hidden layer too small.
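The two assertions could be packaged as a small helper like the one below. The names `argmax` and `check_training` are hypothetical, and the numbers at the bottom are made up for illustration only; they are not real training output. With the lab's model you would pass `[v.data for v in model(x)]` as each output row.

```python
def argmax(values):
    """Index of the largest element (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def check_training(final_loss, inputs, outputs, max_loss=0.5):
    """Raise AssertionError if either of the lab's success criteria fails."""
    assert final_loss < max_loss, f"final loss {final_loss:.3f} >= {max_loss}"
    for x, y in zip(inputs, outputs):
        assert argmax(y) == argmax(x), f"wrong tense for input {x}: got {y}"


# Made-up 3-class numbers, purely to exercise the helper.
inputs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
outputs = [[0.9, -0.8, -0.9], [-0.7, 0.8, -0.9], [-0.9, -0.8, 0.7]]
check_training(0.2, inputs, outputs)  # passes silently
```

Keeping the failure message in the assertion makes it easier to tell a loss problem from an argmax problem when diagnosing.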
TODOs (experiment 2: visualize)
Block E: graph visualization
In `experiments/07-visualize-graph/viz.py`:
- Pick a small expression: e.g., the diamond example from `theory/03`: `L = (a*b + c)*(a - c)` with `a=2, b=3, c=4`.
- Build it with `minigrad.scalar`. Call `L.backward()`.
- Use `graphviz` (Python binding) to construct a `Digraph`. For each `Value` node, add a node labeled with `{op | data | grad}`. Add edges from each `_prev` to the node.
- Render as SVG: `dot.render('graph', format='svg', cleanup=True)`.
- Save `graph.svg`.
- Print the path. Open in browser. Confirm:
  - Nodes show data and grad.
  - Diamond shape visible: `a` has two outgoing edges.
This is the visualization the spec calls for in §4 PHASE 7.
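A possible shape for `viz.py` is sketched below. The `Value` class is again a stand-in with only `+`, `-`, `*` and `backward()`, and its `_op` and `_local` attributes are assumptions; the real `minigrad.scalar.Value` may name things differently, so adapt `trace` and the label accordingly. The `draw` function uses the `graphviz` Python package and is not called here, so the block runs even without Graphviz installed.

```python
class Value:
    """Minimal stand-in for the lab's Value, with only +, -, * and backward()."""

    def __init__(self, data, _prev=(), _op="", _local=()):
        self.data = float(data)
        self.grad = 0.0
        self._prev = _prev      # parent nodes this value was computed from
        self._op = _op          # operator label, "" for leaves
        self._local = _local    # d(self)/d(parent), one entry per parent

    def __add__(self, o):
        return Value(self.data + o.data, (self, o), "+", (1.0, 1.0))

    def __sub__(self, o):
        return Value(self.data - o.data, (self, o), "-", (1.0, -1.0))

    def __mul__(self, o):
        return Value(self.data * o.data, (self, o), "*", (o.data, self.data))

    def backward(self):
        topo, seen = [], set()

        def build(v):
            if v not in seen:
                seen.add(v)
                for parent in v._prev:
                    build(parent)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for parent, d in zip(v._prev, v._local):
                parent.grad += d * v.grad


def trace(root):
    """Collect every node and every (parent, child) edge reachable from root."""
    nodes, edges, seen = [], [], set()

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        nodes.append(v)
        for parent in v._prev:
            edges.append((parent, v))
            visit(parent)

    visit(root)
    return nodes, edges


def draw(root, filename="graph"):
    """Render to <filename>.svg; needs the graphviz package and the `dot` binary."""
    import graphviz

    dot = graphviz.Digraph(format="svg", graph_attr={"rankdir": "LR"})
    nodes, edges = trace(root)
    for v in nodes:
        label = f"{{ {v._op or 'leaf'} | data {v.data:.2f} | grad {v.grad:.2f} }}"
        dot.node(str(id(v)), label=label, shape="record")
    for parent, child in edges:
        dot.edge(str(id(parent)), str(id(child)))
    return dot.render(filename, cleanup=True)


a, b, c = Value(2), Value(3), Value(4)
L = (a * b + c) * (a - c)
L.backward()
nodes, edges = trace(L)
out_degree_a = sum(1 for parent, _ in edges if parent is a)
# path = draw(L)  # uncomment once graphviz is installed; returns the SVG path
```

For this expression the labels to expect are `L = -20` with gradients `a: 4`, `b: -4`, `c: -12`. The graph has 7 nodes and 8 edges, and both `a` and `c` feed two downstream nodes.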
Block F: manifest for both
Standard schema per Phase 6 lab 00. Include in config:
- For tense identity: hyperparams (lr, epochs, layer_sizes, seed, init range, verb chosen).
- For viz: which expression was rendered, graphviz version.
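As a rough illustration, the two `config` payloads might look like the dictionaries below. The key names are hypothetical; the authoritative field names and the rest of the manifest come from the Phase 6 lab 00 schema, which is not reproduced here.

```python
import json

# Hypothetical key names; align them with the Phase 6 manifest schema.
tense_config = {
    "lr": 0.05,
    "n_epochs": 300,
    "layer_sizes": [4, 5],
    "seed": 42,
    "init_range": [-1.0, 1.0],
    "verb": "work",
}

viz_config = {
    "expression": "L = (a*b + c)*(a - c)",
    "inputs": {"a": 2, "b": 3, "c": 4},
    "graphviz_version": None,  # fill in by hand, e.g. from `dot -V`
}

# Only the "config" fragment is shown; the other manifest fields are omitted.
tense_fragment = json.dumps({"config": tense_config}, indent=2)
viz_fragment = json.dumps({"config": viz_config}, indent=2)
```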
Constraints
- Only `minigrad.scalar`. No NumPy in the model or training loop. Lists of `Value` only.
- Use Phase 6 utilities. `seed_everything(42)`, `get_logger(__name__)`; no `print`.
- `graphviz` must be installed. On Fedora: `dnf install graphviz` for the system package + `pip install graphviz` for the Python binding.
- Reproducible. Same seed should produce the same final loss (within ~1% for floating-point noise).
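The reproducibility constraint can be checked mechanically. The sketch below uses `random.seed` as a stand-in for `seed_everything` (whose behaviour I am assuming includes seeding Python's `random`), and the helper names are hypothetical.

```python
import math
import random


def init_weights(seed, n=49, low=-1.0, high=1.0):
    """Draw n initial weights after seeding Python's RNG."""
    random.seed(seed)  # stand-in for the lab's seed_everything(seed)
    return [random.uniform(low, high) for _ in range(n)]


def same_within(loss_a, loss_b, rel_tol=0.01):
    """True if two final losses agree to within rel_tol (1% by default)."""
    return math.isclose(loss_a, loss_b, rel_tol=rel_tol)


first = init_weights(42)
second = init_weights(42)
identical_init = first == second  # same seed, same draws
```

If two runs with the same seed start from different weights, the seed call is happening after the model is constructed, or not at all.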
Expected results
- Final loss should reach ~0.05–0.5 within 300 epochs (25 logits, so ~0.002–0.02 per logit).
- All 5 predictions should have `argmax` equal to the input's argmax.
- `graph.svg` should clearly show the diamond pattern (`a` having two outgoing arrows).
Stop conditions
Done when:
- `loss.png` shows monotone-decreasing loss to below 0.5.
- All 5 tense-identity predictions correct (argmax matches).
- `graph.svg` rendered, opens in a browser, visually correct.
- Both `manifest.json` files exist with expected schema.
Pitfalls
- Forgot to zero gradients. Loss explodes after the first epoch. Add `p.grad = 0.0` before each `backward()`.
- Initialised weights to zero. All neurons compute the same output; no symmetry breaking; model doesn't learn. Initialise to `random.uniform(-1, 1)` or similar.
- Used `0/1` labels with `tanh` outputs. `tanh` outputs in `(-1, 1)`; with `(0, 1)` targets, the model wants to push outputs to 0 (mid-range), gradients are tiny, training is sluggish. Use `(-1, 1)` labels.
- No seeding. Run-to-run variance is huge for tiny models. Call `seed_everything(42)` at top of `train.py`.
- `tanh` saturation. `tanh(very_large)` is fine (saturates at ±1), but the `(1 - tanh²)` gradient at saturation is ≈0, which means vanishing gradients. With a 4-neuron hidden layer and `lr=0.05` this rarely bites; if it does, lower the init range to `random.uniform(-0.5, 0.5)`.
- Graphviz not installed. Two layers: system package (`dot` command) and Python binding (`pip install graphviz`). Both must be present. Test with `dot -V` from shell.
- Treating the 5 outputs as one scalar. Each `model(x)` returns a list of 5 `Value`s, not a `Value`. Sum over both example and output axis when computing the loss; one forgotten loop here is the most common bug in this lab.
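To get a feel for the saturation pitfall, the local derivative of `tanh` can be tabulated with plain floats. This is a small numerical illustration, independent of `minigrad`; the helper name `tanh_grad` is mine.

```python
import math


def tanh_grad(x):
    """Local derivative of tanh at x, i.e. 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t


# (pre-activation, local gradient) pairs, from the linear regime to deep saturation
table = [(x, tanh_grad(x)) for x in (0.0, 1.0, 3.0, 6.0)]
```

The gradient is exactly 1 at `x = 0` and falls below `1e-4` by `x = 6`, so a neuron whose pre-activation drifts that far passes almost no gradient back.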
When to consult solutions/
After your tense experiment converges and graph.svg exists. Then solutions/02-train-tense-logits-ref.md (at phase open) provides the reference loss curve and visualization comparison.
End of Phase 7 labs. Next: write PHASE_07_REPORT.md and learners/borja/phase-07/reflections.md.