
Phase 09: MLP, Modules, and Optimizers

Requires: Phase 08 (Tensor Autograd from Scratch)
Teaches: module-abstraction · parameter-registration · linear · sequential · optimizers
Jump to any chapter from the phase reference index.

Chapter map

This is the phase where the framework emerges. Phases 7 and 8 built two autograd engines: the scalar Value and the tensor Tensor. Phase 9 wraps Tensor in a PyTorch-shaped API (Module, Parameter, Linear, Sequential, SGD, Adam). Composing layers then becomes ergonomic, and porting a small PyTorch script to minimodel takes thirty minutes at most.

Module and Parameter are not magic. They are a convention about where the weights live and how the optimizer finds them. We copy the shape of the API from PyTorch because PyTorch's convention solves a real problem, not for the sake of imitation. That problem is discovering parameters by reflection across a nested graph of submodules.
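The registration convention can be sketched with the standard library alone. This is a minimal sketch that assumes a stand-in for the Phase 8 Tensor. A Parameter here is just a flat list of floats with a same-sized grad list, and Linear stores its weight row-major in one flat Parameter. These classes are illustrative, not the minimodel code from the labs.

The mechanism is what matters. Assigning a Parameter or Module to an attribute records it in an ordered registry, and parameters() walks that registry recursively. Sequential registers its children under the names "0", "1", and so on, which is also how torch.nn.Sequential names them.

```python
import math
import random
from typing import Iterator

_rng = random.Random(0)  # fixed seed so the sketch is reproducible


class Parameter:
    """Marks a value as trainable. Sketch: a flat list of floats plus a same-sized grad."""

    def __init__(self, data: list[float]) -> None:
        self.data = list(data)
        self.grad = [0.0] * len(self.data)


class Module:
    _parameters: dict[str, Parameter]
    _modules: dict[str, "Module"]

    def __init__(self) -> None:
        # Create the registries with object.__setattr__ so they bypass our own
        # __setattr__, which needs them to exist already.
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name: str, value: object) -> None:
        # Registration happens on assignment: `self.weight = Parameter(...)` or
        # `self.fc = Linear(...)` is recorded in insertion order. Subclasses must
        # call super().__init__() before assigning, or the registries are missing.
        if isinstance(value, Parameter):
            self._parameters[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def parameters(self) -> Iterator[Parameter]:
        # Own parameters first, then each submodule's, recursively, in registration order.
        yield from self._parameters.values()
        for child in self._modules.values():
            yield from child.parameters()

    def forward(self, x: list[float]) -> list[float]:
        raise NotImplementedError

    def __call__(self, x: list[float]) -> list[float]:
        return self.forward(x)


class Linear(Module):
    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.in_features = in_features  # plain ints are not registered
        self.out_features = out_features
        bound = 1.0 / math.sqrt(in_features)  # simple placeholder init
        # Row-major flat weight: weight.data[j * in_features + i] maps input i to output j.
        self.weight = Parameter([_rng.uniform(-bound, bound) for _ in range(out_features * in_features)])
        self.bias = Parameter([0.0] * out_features)

    def forward(self, x: list[float]) -> list[float]:
        n = self.in_features
        w, b = self.weight.data, self.bias.data
        return [sum(w[j * n + i] * x[i] for i in range(n)) + b[j] for j in range(self.out_features)]


class ReLU(Module):
    def forward(self, x: list[float]) -> list[float]:
        return [max(0.0, v) for v in x]


class Sequential(Module):
    def __init__(self, *layers: Module) -> None:
        super().__init__()
        for i, layer in enumerate(layers):
            # Goes through Module.__setattr__, so each layer is registered as "0", "1", ...
            setattr(self, str(i), layer)

    def forward(self, x: list[float]) -> list[float]:
        for layer in self._modules.values():
            x = layer(x)
        return x


model = Sequential(Linear(2, 3), ReLU(), Linear(3, 1))
params = list(model.parameters())
print(len(params))                    # 4
print([len(p.data) for p in params])  # [6, 3, 3, 1]
print(len(model([1.0, -1.0])))        # 1
```

The optimizer never needs to know the model's structure. It only consumes the iterator from model.parameters(), which yields every weight and bias in a stable order.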

Anchors: LYNX_CORTEX.md §4 / PHASE 9, LYNX_CORTEX_ADDENDUM.md §A12, §A13. PHASE_09_PLAN.md.

What you build

A new package, src/minimodel/ (~250 LOC, written by Borja), containing:

nn/module.py — Parameter, Module base class with __setattr__ registration.
nn/linear.py — Linear(in, out).
nn/activations.py — ReLU, Tanh, Sigmoid, GELU.
nn/container.py — Sequential(*layers), taking layers as positional arguments, e.g. Sequential(Linear(2, 3), ReLU(), Linear(3, 1)).
nn/init.py — kaiming_uniform_, xavier_normal_ (minimal versions; Phase 10 expands).
nn/losses.py — CrossEntropyLoss, MSELoss.
optim.py — Optimizer base, SGD, Adam.
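The two initializers listed for nn/init.py can be sketched on plain nested lists instead of tensors. The function names kaiming_uniform and xavier_normal are hypothetical. PyTorch's torch.nn.init.kaiming_uniform_ and xavier_normal_ fill a tensor in place (hence the trailing underscore), whereas these sketches return a new fan_out x fan_in matrix.

- The Kaiming version hard-codes the ReLU gain sqrt(2), giving the bound sqrt(6 / fan_in).
- The Xavier version uses std = gain * sqrt(2 / (fan_in + fan_out)).

Phase 10 derives both.

```python
import math
import random


def kaiming_uniform(fan_out: int, fan_in: int, rng: random.Random) -> list[list[float]]:
    """Kaiming (He) uniform init for a ReLU layer: U(-b, b), b = sqrt(2) * sqrt(3 / fan_in)."""
    gain = math.sqrt(2.0)  # gain for ReLU
    bound = gain * math.sqrt(3.0 / fan_in)  # = sqrt(6 / fan_in); variance b^2 / 3 = 2 / fan_in
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


def xavier_normal(
    fan_out: int, fan_in: int, rng: random.Random, gain: float = 1.0
) -> list[list[float]]:
    """Xavier (Glorot) normal init: N(0, std^2), std = gain * sqrt(2 / (fan_in + fan_out))."""
    std = gain * math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def empirical_variance(w: list[list[float]]) -> float:
    flat = [v for row in w for v in row]
    mean = sum(flat) / len(flat)
    return sum((v - mean) ** 2 for v in flat) / len(flat)


rng = random.Random(0)
w1 = kaiming_uniform(fan_out=500, fan_in=100, rng=rng)
w2 = xavier_normal(fan_out=200, fan_in=300, rng=rng)
print(empirical_variance(w1), 2 / 100)          # should be close to 0.02
print(empirical_variance(w2), 2 / (300 + 200))  # should be close to 0.004
```

Each scheme targets a specific weight variance: 2 / fan_in for Kaiming with ReLU, and 2 / (fan_in + fan_out) for Xavier with gain 1. That is why the sketch only needs the layer's fan-in and fan-out.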

And one experiment: experiments/09-tense-mlp/ — train a 2-layer MLP on the §A13 grammar grid (input = one-hot verb ⊕ one-hot person, 23-dim; output = logits over the 5 tenses; 250 train / 50 val from the 300 conjugation triples). Goal: >85% validation accuracy.
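The input encoding and the loss can be sketched in a few lines. This section does not state how the 23 input dimensions split between verbs and persons. The 20 + 3 split in the usage lines is therefore a hypothetical example, and encode and cross_entropy are illustrative names, not minimodel's API. Like torch.nn.CrossEntropyLoss, cross_entropy takes raw logits. It subtracts the largest logit before exponentiating so that large logits cannot overflow.

```python
import math


def encode(verb: int, person: int, n_verbs: int, n_persons: int) -> list[float]:
    """One-hot(verb) concatenated with one-hot(person): length n_verbs + n_persons."""
    if not (0 <= verb < n_verbs and 0 <= person < n_persons):
        raise ValueError("index out of range")
    x = [0.0] * (n_verbs + n_persons)
    x[verb] = 1.0
    x[n_verbs + person] = 1.0  # the person block starts right after the verb block
    return x


def cross_entropy(logits: list[float], target: int) -> float:
    """-log softmax(logits)[target], using the log-sum-exp shift for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


# Hypothetical split: 20 verbs + 3 persons = 23 input features; 5 tense classes.
x = encode(verb=7, person=2, n_verbs=20, n_persons=3)
print(len(x), sum(x))                                         # 23 2.0
print(cross_entropy([0.0] * 5, target=1))                     # log(5), about 1.609
print(cross_entropy([1000.0, 0.0, 0.0, 0.0, 0.0], target=0))  # 0.0, no overflow
```

Averaging the per-example values over a batch gives the mean reduction that torch.nn.CrossEntropyLoss uses by default. The predicted tense is the argmax of the logits, and the validation-accuracy target counts those predictions.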

Plus the port drill: experiments/09-pytorch-port-drill/ — a 50-line PyTorch tense-MLP script ported line-by-line to minimodel in ≤30 minutes. The point of the drill is that the API should be close enough to PyTorch that you don't have to rethink anything.

Reading order

Theory (in theory/):
00-motivation.md — why a "Module" exists at all; what Parameter solves.
01-parameter-and-module.md — the registration mechanic; why __setattr__ not __init__.
02-linear-and-sequential.md — the simplest layer + composition; init.
03-optimizers.md — SGD, momentum, Adam with bias correction. Engineering recap of Phase 4's math.
Labs (in lab/):
00-parameter-and-module-skeleton.md — Parameter + Module registration. ~30 LOC, all the framework cleverness.
01-linear-and-activations.md — Linear, ReLU, Tanh, Sequential. ~50 LOC.
02-optimizers.md — SGD (+momentum) and Adam. ~70 LOC. Cross-check Adam against torch.optim.Adam.
03-train-tense-mlp.md — close the phase by training the grammar-grid MLP.
Solutions appear after the labs are attempted; never copy before trying.
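The update rules recapped in 03-optimizers.md can be sketched on plain lists. Param, Optimizer, SGD and Adam below are illustrative stand-ins, not minimodel's classes.

- SGD keeps a velocity buffer, v <- momentum * v + g, and steps theta <- theta - lr * v.
- Adam keeps first- and second-moment estimates and divides each by its bias-correction factor 1 - beta^t. It then steps theta <- theta - lr * m_hat / (sqrt(v_hat) + eps), the form given in the torch.optim.Adam documentation.

The usage lines minimize a quadratic, the same kind of problem the Definition of Done uses for the Adam cross-check.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Param:
    data: list[float]
    grad: list[float] = field(default_factory=list)


class Optimizer:
    def __init__(self, params: list[Param]) -> None:
        self.params = params

    def zero_grad(self) -> None:
        # Reset accumulated gradients before the next backward pass.
        for p in self.params:
            p.grad = [0.0] * len(p.data)

    def step(self) -> None:
        raise NotImplementedError


class SGD(Optimizer):
    def __init__(self, params: list[Param], lr: float, momentum: float = 0.0) -> None:
        super().__init__(params)
        self.lr, self.momentum = lr, momentum
        self.velocity = [[0.0] * len(p.data) for p in params]

    def step(self) -> None:
        for p, v in zip(self.params, self.velocity):
            for i, g in enumerate(p.grad):
                v[i] = self.momentum * v[i] + g  # momentum=0 reduces to plain SGD
                p.data[i] -= self.lr * v[i]


class Adam(Optimizer):
    def __init__(self, params: list[Param], lr: float = 1e-3,
                 betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-8) -> None:
        super().__init__(params)
        self.lr, self.betas, self.eps = lr, betas, eps
        self.m = [[0.0] * len(p.data) for p in params]  # first moment (running mean of g)
        self.v = [[0.0] * len(p.data) for p in params]  # second moment (running mean of g^2)
        self.t = 0

    def step(self) -> None:
        self.t += 1
        b1, b2 = self.betas
        c1, c2 = 1.0 - b1 ** self.t, 1.0 - b2 ** self.t  # bias-correction factors
        for p, m, v in zip(self.params, self.m, self.v):
            for i, g in enumerate(p.grad):
                m[i] = b1 * m[i] + (1.0 - b1) * g
                v[i] = b2 * v[i] + (1.0 - b2) * g * g
                m_hat, v_hat = m[i] / c1, v[i] / c2
                p.data[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def minimize(make_opt: Callable[[list[Param]], Optimizer], steps: int) -> list[float]:
    # f(x) = sum_i (x_i - c_i)^2, whose gradient is 2 * (x_i - c_i).
    target = [3.0, -1.0]
    p = Param(data=[0.0, 0.0])
    opt = make_opt([p])
    for _ in range(steps):
        opt.zero_grad()
        for i, (x, c) in enumerate(zip(p.data, target)):
            p.grad[i] += 2.0 * (x - c)  # accumulate, as a backward pass would
        opt.step()
    return p.data


print(minimize(lambda ps: SGD(ps, lr=0.1, momentum=0.9), steps=200))  # close to [3.0, -1.0]
print(minimize(lambda ps: Adam(ps, lr=0.1), steps=500))               # close to [3.0, -1.0]
```

Swapping SGD for Adam only changes the constructor line. The zero_grad, accumulate, step loop stays the same, which is the PyTorch training-loop shape the port drill depends on.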

What this phase does not cover

BatchNorm / LayerNorm. Phase 10.
Residual connections. Phase 10; the Transformer block comes in Phase 17.
Embedding layers. Phase 13 covers embeddings in depth; Phase 17 uses them.
Dropout. Phase 18 (training tricks).
Distributed training. Phase 35.
PyTorch's actual implementation. Phase 25 inspects PyTorch internals.

Definition of done (quick reference)

See PHASE_09_PLAN.md §6 for the full DoD. Highlights:

mypy --strict clean across src/minimodel/.
Sequential(Linear(2, 3), ReLU(), Linear(3, 1)).parameters() yields exactly 4 Parameter objects in order.
Adam matches torch.optim.Adam to 1e-5 over 100 steps on a quadratic.
experiments/09-tense-mlp/ reaches >85% val accuracy.
experiments/09-pytorch-port-drill/ completes in ≤30 minutes (timed).
/quiz 09 ≥ 70%.

Next phase

Phase 10 — Initialization, normalization, regularization — fills in the parts of an MLP that Phase 9 skipped (Kaiming/Xavier derivation, BatchNorm/LayerNorm, weight decay).