

Phase 09 — MLP, Modules, and Optimizers

Requires: 08 (Tensor Autograd from Scratch)
Teaches: module-abstraction · parameter-registration · linear · sequential · optimizers


The phase where the framework emerges. Phases 7 and 8 built two autograd engines (scalar Value and tensor Tensor). Phase 9 wraps Tensor in a PyTorch-shaped API (Module, Parameter, Linear, Sequential, SGD, Adam) so that composing layers is ergonomic and porting a small PyTorch script to minimodel takes no more than thirty minutes.

Module and Parameter are not magic: they are a convention about where the weights live and how the optimizer finds them. We copy the shape of the API from PyTorch not out of imitation, but because PyTorch's convention solves a real problem: discovering parameters by reflection across a nested graph of submodules.

Anchors: LYNX_CORTEX.md §4 / PHASE 9, LYNX_CORTEX_ADDENDUM.md §A12, §A13. PHASE_09_PLAN.md.


What you build

A new package, src/minimodel/ (~250 LOC, written by Borja), containing:

  • nn/module.py: Parameter, and the Module base class with __setattr__ registration.
  • nn/linear.py: Linear(in, out).
  • nn/activations.py: ReLU, Tanh, Sigmoid, GELU.
  • nn/container.py: Sequential(*layers), taking layers as positional arguments.
  • nn/init.py: kaiming_uniform_, xavier_normal_ (minimal versions; Phase 10 expands them).
  • nn/losses.py: CrossEntropyLoss, MSELoss.
  • optim.py: the Optimizer base class, SGD, Adam.
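The registration convention behind nn/module.py, together with Linear, ReLU and Sequential, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not minimodel's actual code: Parameter here wraps a flat row-major list of floats instead of a Tensor, the bias is zero-initialized, and the registry names `_params`/`_modules` and the helper `_kaiming_uniform` are hypothetical. What it illustrates is that Module.__setattr__ files every Parameter or sub-Module into an ordered dict at assignment time, so parameters() can walk a nested tree in definition order with no per-layer bookkeeping.

```python
import math
import random
from collections.abc import Iterator


class Parameter:
    """Trainable leaf (hypothetical stand-in for a Tensor): flat row-major data + grad buffer."""

    def __init__(self, data: list[float], shape: tuple[int, ...]) -> None:
        self.data = data
        self.shape = shape
        self.grad = [0.0] * len(data)


class Module:
    _params: dict[str, Parameter]
    _modules: dict[str, "Module"]

    def __init__(self) -> None:
        # Create the registries with object.__setattr__ so the hook below is not triggered.
        object.__setattr__(self, "_params", {})
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name: str, value: object) -> None:
        # Registration on assignment: `self.weight = Parameter(...)`, `self.fc = Linear(...)`.
        if isinstance(value, Parameter):
            self._params[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def parameters(self) -> Iterator[Parameter]:
        # Own parameters first, then each submodule's, all in assignment order.
        yield from self._params.values()
        for child in self._modules.values():
            yield from child.parameters()

    def forward(self, x: list[float]) -> list[float]:
        raise NotImplementedError

    def __call__(self, x: list[float]) -> list[float]:
        return self.forward(x)


def _kaiming_uniform(n: int, fan_in: int) -> list[float]:
    # Kaiming/He uniform for ReLU: U(-b, b) with b = sqrt(6 / fan_in).
    bound = math.sqrt(6.0 / fan_in)
    return [random.uniform(-bound, bound) for _ in range(n)]


class Linear(Module):
    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Weight stored row-major with shape (out, in); zero bias is an assumption of this sketch.
        self.weight = Parameter(_kaiming_uniform(out_features * in_features, in_features),
                                (out_features, in_features))
        self.bias = Parameter([0.0] * out_features, (out_features,))

    def forward(self, x: list[float]) -> list[float]:
        # y = W x + b for a single input vector.
        w, n = self.weight.data, self.in_features
        return [sum(w[o * n + i] * x[i] for i in range(n)) + self.bias.data[o]
                for o in range(self.out_features)]


class ReLU(Module):
    def forward(self, x: list[float]) -> list[float]:
        return [max(0.0, v) for v in x]


class Sequential(Module):
    def __init__(self, *layers: Module) -> None:
        super().__init__()
        for i, layer in enumerate(layers):
            setattr(self, str(i), layer)  # registered as "0", "1", ... through __setattr__

    def forward(self, x: list[float]) -> list[float]:
        for layer in self._modules.values():
            x = layer(x)
        return x


model = Sequential(Linear(2, 3), ReLU(), Linear(3, 1))
params = list(model.parameters())
print(len(params), [p.shape for p in params])  # 4 [(3, 2), (3,), (1, 3), (1,)]
```

Because registration hangs off assignment, a subclass only has to assign attributes. The one way to break it is forgetting super().__init__(): in this sketch that raises AttributeError, and PyTorch likewise refuses to assign parameters before Module.__init__() has run.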

And one experiment: experiments/09-tense-mlp/ — train a 2-layer MLP on the §A13 grammar grid (input = one-hot verb ⊕ one-hot person, 23-dim; output = logits over the 5 tenses; 250 train / 50 val from the 300 conjugation triples). Goal: >85% validation accuracy.
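The data side of this experiment (input encoding, a train/val split, the loss, and the accuracy metric) can be sketched in plain Python. This is a sketch under assumptions: the page does not say how the 23 input dimensions divide between verbs and persons, so `n_verbs` and `n_persons` are parameters, and the function names are hypothetical rather than minimodel's CrossEntropyLoss API. The loss subtracts the max logit before exponentiating (log-sum-exp), which keeps it finite for large logits.

```python
import math
import random


def encode(verb: int, person: int, n_verbs: int, n_persons: int) -> list[float]:
    """One-hot verb concatenated (⊕) with one-hot person: length n_verbs + n_persons."""
    x = [0.0] * (n_verbs + n_persons)
    x[verb] = 1.0
    x[n_verbs + person] = 1.0
    return x


def split_train_val(items: list, n_train: int, seed: int = 0) -> tuple[list, list]:
    """Deterministic shuffle, then cut into train/val (e.g. 250/50 of 300 triples)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def cross_entropy(logits: list[float], target: int) -> float:
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def accuracy(all_logits: list[list[float]], targets: list[int]) -> float:
    """Fraction of examples whose argmax logit equals the target tense."""
    hits = sum(max(range(len(z)), key=z.__getitem__) == t
               for z, t in zip(all_logits, targets))
    return hits / len(targets)
```

For instance, with a hypothetical split of 20 verbs and 3 persons, `encode(2, 1, n_verbs=20, n_persons=3)` returns a 23-dim vector with ones at indices 2 and 21, and `cross_entropy([0.0] * 5, t)` equals log 5 ≈ 1.609 for any target t: the loss of a model that has learned nothing about the 5 tenses.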

Plus the port drill: experiments/09-pytorch-port-drill/ — a 50-line PyTorch tense-MLP script ported line-by-line to minimodel in ≤30 minutes. The point of the drill is that the API should be close enough to PyTorch that you don't have to rethink anything.

Reading order

  1. Theory (in theory/):
     • 00-motivation.md: why a "Module" exists at all, and what Parameter solves.
     • 01-parameter-and-module.md: the registration mechanic, and why it lives in __setattr__ rather than __init__.
     • 02-linear-and-sequential.md: the simplest layer, composition, and initialization.
     • 03-optimizers.md: SGD, momentum, and Adam with bias correction; an engineering recap of Phase 4's math.

  2. Labs (in lab/):
     • 00-parameter-and-module-skeleton.md: Parameter + Module registration. ~30 LOC, and all of the framework's cleverness.
     • 01-linear-and-activations.md: Linear, ReLU, Tanh, Sequential. ~50 LOC.
     • 02-optimizers.md: SGD (+momentum) and Adam. ~70 LOC. Cross-check Adam against torch.optim.Adam.
     • 03-train-tense-mlp.md: close the phase by training the grammar-grid MLP.

  3. Solutions appear after the labs are attempted; never copy before trying.
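The updates described in 03-optimizers.md and implemented in lab 02 can be sketched on plain lists of floats. This is a minimal sketch, not minimodel's optim.py: each parameter is assumed to be a flat `list[float]`, gradients are passed to `step()` explicitly instead of being read from a `.grad` attribute, and the quadratic test problem, `run` and `torch_adam_trajectory` are hypothetical helpers. Adam uses the bias-corrected form lr · m̂ / (√v̂ + eps), which is what torch.optim.Adam computes with its defaults (no weight decay, no AMSGrad); the PyTorch cross-check only runs if torch is installed.

```python
import math
from typing import Optional


class SGD:
    """SGD with optional heavy-ball momentum: buf = momentum * buf + g; p -= lr * buf."""

    def __init__(self, params: list[list[float]], lr: float, momentum: float = 0.0) -> None:
        self.params, self.lr, self.momentum = params, lr, momentum
        self.bufs = [[0.0] * len(p) for p in params]

    def step(self, grads: list[list[float]]) -> None:
        for p, g, buf in zip(self.params, grads, self.bufs):
            for i in range(len(p)):
                buf[i] = self.momentum * buf[i] + g[i]
                p[i] -= self.lr * buf[i]


class Adam:
    """Adam with bias-corrected moments; no weight decay, no AMSGrad."""

    def __init__(self, params: list[list[float]], lr: float = 1e-3,
                 betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-8) -> None:
        self.params, self.lr, self.eps = params, lr, eps
        self.beta1, self.beta2 = betas
        self.m = [[0.0] * len(p) for p in params]  # first-moment EMA
        self.v = [[0.0] * len(p) for p in params]  # second-moment EMA
        self.t = 0

    def step(self, grads: list[list[float]]) -> None:
        self.t += 1
        bc1 = 1.0 - self.beta1 ** self.t  # undoes the bias from zero-initialized m
        bc2 = 1.0 - self.beta2 ** self.t  # undoes the bias from zero-initialized v
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            for i in range(len(p)):
                m[i] = self.beta1 * m[i] + (1.0 - self.beta1) * g[i]
                v[i] = self.beta2 * v[i] + (1.0 - self.beta2) * g[i] * g[i]
                m_hat, v_hat = m[i] / bc1, v[i] / bc2
                p[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


TARGET = [3.0, -2.0]


def quadratic_grad(x: list[float]) -> list[float]:
    # Gradient of f(x) = sum((x_i - c_i)^2) with c = TARGET.
    return [2.0 * (xi - ci) for xi, ci in zip(x, TARGET)]


def run(opt_cls: type, steps: int = 100, **kw: float) -> list[list[float]]:
    # Minimize the quadratic from x = 0 and record x after every step.
    x = [0.0, 0.0]
    opt = opt_cls([x], **kw)
    traj = []
    for _ in range(steps):
        opt.step([quadratic_grad(x)])
        traj.append(x[:])
    return traj


def torch_adam_trajectory(steps: int = 100, lr: float = 0.1) -> Optional[list[list[float]]]:
    """Same problem through torch.optim.Adam (default betas/eps); None if PyTorch is absent."""
    try:
        import torch
    except ImportError:
        return None
    x = torch.zeros(2, dtype=torch.float64, requires_grad=True)
    c = torch.tensor(TARGET, dtype=torch.float64)
    opt = torch.optim.Adam([x], lr=lr)
    traj = []
    for _ in range(steps):
        opt.zero_grad()
        ((x - c) ** 2).sum().backward()
        opt.step()
        traj.append(x.detach().tolist())
    return traj
```

With both sides in float64 and the same bias-corrected update, `run(Adam, lr=0.1)` and `torch_adam_trajectory()` should agree well inside the 1e-5 tolerance; common culprits when they do not are a missing bias correction or eps added inside the square root rather than after it.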

What this phase does not cover

  • BatchNorm / LayerNorm. Phase 10.
  • Residual connections. Phase 10. Transformer block: Phase 17.
  • Embedding layers. Phase 13 (embeddings) covers them in depth; Phase 17 uses them.
  • Dropout. Phase 18 (training tricks).
  • Distributed training. Phase 35.
  • PyTorch's actual implementation. Phase 25 inspects PyTorch internals.

Definition of done (quick reference)

See PHASE_09_PLAN.md §6 for the full DoD. Highlights:

  • mypy --strict clean across src/minimodel/.
  • Sequential(Linear(2, 3), ReLU(), Linear(3, 1)).parameters() yields exactly 4 Parameter objects in order.
  • Adam matches torch.optim.Adam to within 1e-5 over 100 steps on a quadratic.
  • experiments/09-tense-mlp/ reaches >85% val accuracy.
  • experiments/09-pytorch-port-drill/ completes in ≤30 minutes (timed).
  • /quiz 09 ≥ 70%.

Next phase

Phase 10 — Initialization, normalization, regularization — fills in the parts of an MLP that Phase 9 skipped (Kaiming/Xavier derivation, BatchNorm/LayerNorm, weight decay).

Further reading

Optional — enrichment, not required to pass the phase.