Phase 09: MLP, Modules, and Optimizers
Requires: 08 (Tensor Autograd from Scratch)
Teaches: module-abstraction · parameter-registration · linear · sequential · optimizers
Jump to any chapter from the phase reference index.
Chapter map
This is the phase where the framework emerges. Phases 7 and 8 built two autograd engines: a scalar `Value` and a tensor `Tensor`. Phase 9 wraps `Tensor` in a PyTorch-shaped API (`Module`, `Parameter`, `Linear`, `Sequential`, `SGD`, `Adam`). Composing layers becomes ergonomic, and porting a small PyTorch script to `minimodel` should take at most thirty minutes.

`Module` and `Parameter` are not magic. They are a convention about where the weights live and how the optimizer finds them. We copy the shape of the API from PyTorch. The reason is not imitation: PyTorch's convention solves a real problem, which is discovering parameters by reflection through a nested graph of submodules.

Anchors: `LYNX_CORTEX.md` §4 / PHASE 9; `LYNX_CORTEX_ADDENDUM.md` §A12, §A13; `PHASE_09_PLAN.md`.
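Below is a minimal sketch of that convention in plain Python. It assumes flat `list[float]` buffers stand in for Phase 8's `Tensor`, so it has no autograd and no forward passes.

- `Module.__setattr__` intercepts every attribute assignment. It files each `Parameter` and each child `Module` into one of two ordered dicts.
- `parameters()` then walks that tree recursively.

The `_params` / `_modules` names, the zero initialization and the varargs `Sequential` signature are illustrative assumptions, not minimodel's actual code. The last lines reproduce the definition-of-done check listed below: four parameters, in order.

```python
from __future__ import annotations

from typing import Iterator


class Parameter:
    """A trainable value; a flat list of floats stands in for Phase 8's Tensor."""

    def __init__(self, data: list[float]) -> None:
        self.data = data
        self.grad = [0.0] * len(data)


class Module:
    # Declared here so type checkers know about the two registries.
    _params: dict[str, Parameter]
    _modules: dict[str, Module]

    def __init__(self) -> None:
        # object.__setattr__ skips our hook while the registries themselves are created.
        object.__setattr__(self, "_params", {})
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name: str, value: object) -> None:
        # Every assignment passes through here, so `self.w = Parameter(...)`
        # registers the parameter without any extra call.
        if isinstance(value, Parameter):
            self._params[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def parameters(self) -> Iterator[Parameter]:
        # Own parameters first, then each child's, all in assignment order.
        yield from self._params.values()
        for child in self._modules.values():
            yield from child.parameters()


class Linear(Module):
    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()  # must run before any Parameter is assigned
        # Zeros keep the sketch deterministic; real initialization is nn/init.py's job.
        self.weight = Parameter([0.0] * (out_features * in_features))
        self.bias = Parameter([0.0] * out_features)


class ReLU(Module):
    """No parameters; the forward pass is omitted in this sketch."""


class Sequential(Module):
    def __init__(self, *layers: Module) -> None:
        super().__init__()
        for i, layer in enumerate(layers):
            # Children are named "0", "1", ... as in torch.nn.Sequential.
            setattr(self, str(i), layer)


model = Sequential(Linear(2, 3), ReLU(), Linear(3, 1))
params = list(model.parameters())
print(len(params), [len(p.data) for p in params])  # 4 [6, 3, 3, 1]
```

Because registration lives in `__setattr__`, a subclass only has to call `super().__init__()` before its first assignment. If it forgets, the sketch fails loudly with an `AttributeError` instead of silently dropping parameters.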
What you build
A new module src/minimodel/ (~250 LOC by Borja) with:
- `nn/module.py`: `Parameter`, and the `Module` base class with `__setattr__` registration.
- `nn/linear.py`: `Linear(in, out)`.
- `nn/activations.py`: `ReLU`, `Tanh`, `Sigmoid`, `GELU`.
- `nn/container.py`: `Sequential([...])`.
- `nn/init.py`: `kaiming_uniform_`, `xavier_normal_` (minimal versions; Phase 10 expands them).
- `nn/losses.py`: `CrossEntropyLoss`, `MSELoss`.
- `optim.py`: the `Optimizer` base, `SGD`, `Adam`.
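Below is a plain-Python sketch of what the two losses compute for a single example. The function names are hypothetical, and there is no `Tensor` or autograd. Subtracting the largest logit before exponentiating is the log-sum-exp trick. It leaves the loss unchanged and keeps `math.exp` from overflowing on large logits. `cross_entropy_grad` is the gradient the backward pass needs: `softmax(z) - onehot(target)`.

```python
import math


def cross_entropy(logits: list[float], target: int) -> float:
    """-log(softmax(logits)[target]), computed stably (hypothetical helper)."""
    m = max(logits)
    # log(sum(exp(z))) = m + log(sum(exp(z - m))); every exponent is now <= 0.
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def cross_entropy_grad(logits: list[float], target: int) -> list[float]:
    """d loss / d logits = softmax(logits) - onehot(target)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total - (1.0 if i == target else 0.0) for i, e in enumerate(exps)]


def mse(pred: list[float], target: list[float]) -> float:
    """Mean of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


print(cross_entropy([0.0] * 5, target=2))       # log(5) ≈ 1.6094: uniform over 5 tenses
print(cross_entropy([1000.0, 0.0], target=0))   # 0.0, and no OverflowError
```

For a batch, average these per-example values; that matches the `reduction='mean'` default of PyTorch's `CrossEntropyLoss` and `MSELoss`.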
And one experiment: experiments/09-tense-mlp/ — train a 2-layer MLP on the §A13 grammar grid (input = one-hot verb ⊕ one-hot person, 23-dim; output = logits over the 5 tenses; 250 train / 50 val from the 300 conjugation triples). Goal: >85% validation accuracy.
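Below is a sketch of the input encoding and the 250 / 50 split. It assumes each example is a `(verb, person, tense)` index triple.

- The text fixes only the total input width (23). The 20-verb / 3-person split below is a hypothetical choice that sums to 23; replace `N_VERBS` and `N_PERSONS` with the real §A13 counts.
- Shuffling a copy with a fixed seed before slicing keeps the split reproducible across runs.

```python
import random

N_VERBS = 20    # hypothetical: take the real count from the §A13 grid
N_PERSONS = 3   # hypothetical: chosen so that N_VERBS + N_PERSONS == 23
N_TENSES = 5

Triple = tuple[int, int, int]  # (verb, person, tense) indices, an assumed layout


def encode(verb: int, person: int) -> list[float]:
    """One-hot verb concatenated with one-hot person (the ⊕ in the spec)."""
    x = [0.0] * (N_VERBS + N_PERSONS)
    x[verb] = 1.0
    x[N_VERBS + person] = 1.0
    return x


def split(
    triples: list[Triple], n_train: int = 250, seed: int = 0
) -> tuple[list[Triple], list[Triple]]:
    """Shuffle a copy with a fixed seed, then slice it into train / val."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder triples with the right shape only, NOT the real §A13 data.
triples = [
    (v, p, t) for v in range(N_VERBS) for p in range(N_PERSONS) for t in range(N_TENSES)
]
train, val = split(triples)
print(len(encode(0, 0)), len(train), len(val))  # 23 250 50
```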
Plus the port drill: experiments/09-pytorch-port-drill/ — a 50-line PyTorch tense-MLP script ported line-by-line to minimodel in ≤30 minutes. The point of the drill is that the API should be close enough to PyTorch that you don't have to rethink anything.
Reading order
- Theory (in `theory/`):
  - `00-motivation.md`: why a "Module" exists at all; what `Parameter` solves.
  - `01-parameter-and-module.md`: the registration mechanic; why `__setattr__` and not `__init__`.
  - `02-linear-and-sequential.md`: the simplest layer plus composition; init.
  - `03-optimizers.md`: SGD, momentum, `Adam` with bias correction. An engineering recap of Phase 4's math.
- Labs (in `lab/`):
  - `00-parameter-and-module-skeleton.md`: `Parameter` + `Module` registration. ~30 LOC, all the framework cleverness.
  - `01-linear-and-activations.md`: `Linear`, `ReLU`, `Tanh`, `Sequential`. ~50 LOC.
  - `02-optimizers.md`: `SGD` (+ momentum) and `Adam`. ~70 LOC. Cross-check Adam against `torch.optim.Adam`.
  - `03-train-tense-mlp.md`: close the phase by training the grammar-grid MLP.
Solutions appear after the labs are attempted; never copy before trying.
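The update rules from `03-optimizers.md` are sketched below on plain float lists rather than `Parameter`s, so the block runs on its own. The class layout and the `step(params, grads)` signature are hypothetical; a real minimodel optimizer would read `.grad` off the parameters it was given.

- SGD momentum follows PyTorch's convention: `v ← μv + g`, then `θ ← θ − ηv`.
- Adam keeps running first and second moments of the gradient and divides each by `1 − βᵗ`. This undoes their bias toward zero in the early steps.

The quadratic at the end has the same shape as the definition-of-done check (100 steps on a quadratic), without the comparison against `torch.optim.Adam`.

```python
from __future__ import annotations

import math


class SGD:
    """SGD with optional momentum (PyTorch convention): v = mu*v + g; theta -= lr*v."""

    def __init__(self, n: int, lr: float, momentum: float = 0.0) -> None:
        self.lr = lr
        self.momentum = momentum
        self.velocity = [0.0] * n

    def step(self, params: list[float], grads: list[float]) -> None:
        for i, g in enumerate(grads):
            self.velocity[i] = self.momentum * self.velocity[i] + g
            params[i] -= self.lr * self.velocity[i]


class Adam:
    """Adam with bias correction; default hyperparameters match torch.optim.Adam."""

    def __init__(
        self,
        n: int,
        lr: float = 1e-3,
        betas: tuple[float, float] = (0.9, 0.999),
        eps: float = 1e-8,
    ) -> None:
        self.lr = lr
        self.beta1, self.beta2 = betas
        self.eps = eps
        self.m = [0.0] * n  # running mean of g (first moment)
        self.v = [0.0] * n  # running mean of g**2 (second moment)
        self.t = 0          # step counter used by the bias correction

    def step(self, params: list[float], grads: list[float]) -> None:
        self.t += 1
        # Both moments start at 0, so early averages are biased toward 0;
        # dividing by (1 - beta**t) removes that bias.
        bc1 = 1.0 - self.beta1 ** self.t
        bc2 = 1.0 - self.beta2 ** self.t
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / bc1
            v_hat = self.v[i] / bc2
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def quadratic_grad(x: list[float]) -> list[float]:
    """Gradient of f(x) = sum(x_i**2), which is 2x."""
    return [2.0 * xi for xi in x]


def minimize(opt: SGD | Adam, steps: int = 100) -> list[float]:
    x = [3.0, -2.0]
    for _ in range(steps):
        opt.step(x, quadratic_grad(x))
    return x


sgd_x = minimize(SGD(2, lr=0.1, momentum=0.9))
adam_x = minimize(Adam(2, lr=0.1))
print(sgd_x, adam_x)  # both end up near the minimum at [0, 0]
```

Both optimizers keep their state (`velocity`, `m`, `v`, `t`) on the optimizer rather than on the parameters, which is the same split PyTorch uses.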
What this phase does not cover
- BatchNorm / LayerNorm. Phase 10.
- Residual connections. Phase 10. Transformer block: Phase 17.
- Embedding layers. Phase 13 covers embeddings in depth; Phase 17 uses them.
- Dropout. Phase 18 (training tricks).
- Distributed training. Phase 35.
- PyTorch's actual implementation. Phase 25 inspects PyTorch internals.
Definition of done (quick reference)
See PHASE_09_PLAN.md §6 for the full DoD. Highlights:
- `mypy --strict` clean across `src/minimodel/`.
- `Sequential(Linear(2, 3), ReLU(), Linear(3, 1)).parameters()` yields exactly 4 `Parameter` objects, in order.
- `Adam` matches `torch.optim.Adam` to 1e-5 over 100 steps on a quadratic.
- `experiments/09-tense-mlp/` reaches >85% validation accuracy.
- `experiments/09-pytorch-port-drill/` completes in ≤30 minutes (timed).
- `/quiz 09` ≥ 70%.
Next phase
Phase 10 — Initialization, normalization, regularization — fills in the parts of an MLP that Phase 9 skipped (Kaiming/Xavier derivation, BatchNorm/LayerNorm, weight decay).
Further reading
Optional — enrichment, not required to pass the phase.
- ✍️ A Recipe for Training Neural Networks, Karpathy · 2019. How practitioners actually compose and debug modules.