
Phase 09 — Quiz (human-readable mirror)¶

Human-readable mirror of the canonical file data/quizzes/phase-09-mlp-modules.yaml. The portal (Phase 41) consumes the YAML; this .md is for quick review outside the portal. Do not edit here: edit the YAML, from which this file can be regenerated.

Source: data/quizzes/phase-09-mlp-modules.yaml. Schema in src/miniportal/BLUEPRINT.md §1.

q-09-01 — Why does the `Module` class register `Parameter`s explicitly? (single)¶

In the Module class you built, why must _parameters be registered explicitly instead of relying on __dict__ introspection?

Because Python's __dict__ is unordered before 3.7
To allow nested modules and lazy device-move semantics ✓
Because Parameter objects don't have __hash__
It's a PyTorch convention copied for familiarity

Explicit registration lets a parent Module walk its children deterministically so .parameters() and .to(device) work without runtime surprises. PyTorch made the same choice for the same reason.
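As an illustration of the point above, here is a minimal sketch of explicit registration. It is not the Module built in this phase: the `Parameter` stand-in, the `_modules` registry, `named_parameters`, and the `MLP` class with its `cache` attribute are all assumptions made for the example. Only `_parameters` is named in the question.

```python
class Parameter:
    """Hypothetical stand-in for a trainable tensor: it only wraps a value."""

    def __init__(self, data):
        self.data = data


class Module:
    """Sketch: explicit registries instead of scanning __dict__."""

    def __init__(self):
        # Bypass our own __setattr__ hook so the registries exist first.
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_modules", {})  # assumed name

    def __setattr__(self, name, value):
        # Registration happens at assignment time, in assignment order.
        if isinstance(value, Parameter):
            self._parameters[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def named_parameters(self, prefix=""):
        # Own parameters first, then recurse into registered children.
        for name, param in self._parameters.items():
            yield prefix + name, param
        for name, child in self._modules.items():
            yield from child.named_parameters(prefix + name + ".")


class Linear(Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.weight = Parameter([[0.0] * n_in for _ in range(n_out)])
        self.bias = Parameter([0.0] * n_out)


class MLP(Module):
    def __init__(self):
        super().__init__()
        self.fc1 = Linear(23, 16)
        self.fc2 = Linear(16, 5)
        self.cache = [1, 2, 3]  # plain attribute: never registered


names = [name for name, _ in MLP().named_parameters()]
print(names)  # ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']
```

The traversal touches only what was registered, so the plain `cache` attribute is ignored and nested modules are visited in a fixed order. A device move could reuse the same walk.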

q-09-02 — Which initializations are commonly used for an MLP's hidden weights? (multi)¶

Select every initialization scheme that is appropriate as a default for the hidden linear layers of a ReLU-activated MLP.

Zeros for every weight
Kaiming (He) normal ✓
Kaiming (He) uniform ✓
Constant 1.0 for every weight

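Kaiming normal/uniform preserve activation variance under ReLU. All-zero or constant initializations fail to break symmetry: every hidden unit in a layer computes the same output and receives the same gradient, so the units never differentiate. All-zero weights also pass no gradient back through the layer, and all-ones weights can blow up pre-activations from step 1.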
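The two Kaiming variants can be sketched in plain Python to show that both target the same weight variance, 2 / fan_in. This sketch assumes fan-in mode with the ReLU gain. The function names, the 256 × 256 layer size, and the seed are illustrative choices, not taken from the course code.

```python
import math
import random


def kaiming_normal(fan_in, fan_out, rng):
    # He normal: zero-mean Gaussian with std = sqrt(2 / fan_in).
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def kaiming_uniform(fan_in, fan_out, rng):
    # He uniform: U(-b, b) with b = sqrt(6 / fan_in), so Var = b^2 / 3 = 2 / fan_in.
    bound = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


def variance(weights):
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    return sum((w - mean) ** 2 for w in flat) / len(flat)


rng = random.Random(0)
fan_in, fan_out = 256, 256
target = 2.0 / fan_in
var_normal = variance(kaiming_normal(fan_in, fan_out, rng))
var_uniform = variance(kaiming_uniform(fan_in, fan_out, rng))

# A constant init gives every hidden unit the same weight row, hence the
# same pre-activation for any input: the units are indistinguishable.
x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
constant = [[1.0] * fan_in for _ in range(4)]
pre_acts = [sum(w * xi for w, xi in zip(row, x)) for row in constant]
all_identical = len(set(pre_acts)) == 1

print(f"target={target:.5f} normal={var_normal:.5f} uniform={var_uniform:.5f}")
print("constant-init units identical:", all_identical)
```

Both empirical variances should land close to the target, while the constant-initialized units produce one shared value.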

q-09-03 — What does `Module.train(False)` typically toggle? (free)¶

In one sentence, what does calling module.train(False) (eval mode) change in modules like Dropout and BatchNorm?

Expected to contain: eval, dropout.

Eval mode disables stochastic dropout and switches BatchNorm to use running statistics so inference is deterministic.
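A minimal sketch of how a `training` flag can gate dropout, assuming inverted dropout (survivors are rescaled during training so that eval needs no correction). The class layout, the list-based `forward`, and the seeded generator are hypothetical simplifications for the demo. BatchNorm is not shown.

```python
import random


class Dropout:
    """Inverted dropout sketch with a `training` flag toggled by train()."""

    def __init__(self, p=0.5, seed=0):
        self.p = p
        self.training = True
        self._rng = random.Random(seed)  # hypothetical: seeded for the demo

    def train(self, mode=True):
        self.training = mode
        return self

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return list(x)  # eval mode: identity, no randomness
        keep = 1.0 - self.p
        # Zero each element with probability p; rescale survivors by
        # 1 / keep so the expected value of each element is unchanged.
        return [xi / keep if self._rng.random() < keep else 0.0 for xi in x]


layer = Dropout(p=0.5)
x = [1.0] * 8
train_out = layer.forward(x)   # random mix of 0.0 and 2.0
layer.train(False)             # switch to eval mode
eval_out_a = layer.forward(x)
eval_out_b = layer.forward(x)
print(train_out)
print(eval_out_a == eval_out_b == x)
```

In eval mode the two calls return the input unchanged, which is the determinism the answer refers to.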

q-09-04 — Backward shape for `Linear`'s weight gradient (single)¶

In a Linear(in=23, out=16) whose forward is Z = X @ W.T + b with X.shape == (4, 23), what shape does the weight gradient ∇_W L have?

(4, 23)
(16, 23) ✓
(23, 16)
(16, 4)

∇_W = (∇_Z)^T @ X has shape (16, 4) @ (4, 23) = (16, 23), exactly W's shape. Phase 9 theory §04 warns that on square shapes (batch, in, and out all equal) the buggy ∇_W = ∇_Z @ X^T also fits, so a shape check alone will not catch it.
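The shape bookkeeping can be checked with a dependency-free sketch. The list-of-lists helpers (`shape`, `transpose`, `matmul`) are hypothetical stand-ins for whatever array library the course uses, and the upstream gradient of all ones corresponds to the assumed loss L = sum(Z).

```python
def shape(m):
    return (len(m), len(m[0]))


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    if shape(a)[1] != shape(b)[0]:
        raise ValueError(f"shape mismatch: {shape(a)} @ {shape(b)}")
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


batch, n_in, n_out = 4, 23, 16
X = [[0.1 * (i + j) for j in range(n_in)] for i in range(batch)]
W = [[0.01 * (i - j) for j in range(n_in)] for i in range(n_out)]

Z = matmul(X, transpose(W))  # (4, 23) @ (23, 16) -> (4, 16); bias omitted
grad_Z = [[1.0] * n_out for _ in range(batch)]  # same shape as Z

# Correct rule: (16, 4) @ (4, 23) -> (16, 23), which matches W.
grad_W = matmul(transpose(grad_Z), X)

# Buggy rule: (4, 16) @ (23, 4) does not even multiply on these shapes.
try:
    matmul(grad_Z, transpose(X))
    buggy_fits = True
except ValueError:
    buggy_fits = False

print(shape(Z), shape(grad_W), shape(W), buggy_fits)
```

On these non-square shapes the buggy product fails immediately, which is why a test layer with distinct batch, input, and output sizes makes a useful guard.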

q-09-05 — Find the bug: two `Linear`s with no activation (free)¶

A learner writes Sequential(Linear(23, 16), Linear(16, 5)) for the §A13 tense classifier. Validation accuracy is acceptable but the model generalizes slightly worse than a GELU-equipped twin. What single property of the composite map explains this?

Expected to contain: linear.

Composition of two affine maps is affine: the stack collapses to a single affine map whose 5 × 23 weight matrix has rank ≤ 5. The §A13 task is linearly separable so accuracy survives, but capacity to fit subtle agreement is lost. Cross-link: break/00-break-gelu-as-identity.md.
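The collapse can be verified numerically with a small sketch. Folding `Linear(23, 16)` into `Linear(16, 5)` gives W = W2 W1 and b = W2 b1 + b2. The helper names, the random weights, and the seed are assumptions for illustration only.

```python
import random


def affine(W, b, x):
    # y = W x + b for a single input vector x.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W1, b1 = rand_matrix(16, 23, rng), [rng.uniform(-1, 1) for _ in range(16)]
W2, b2 = rand_matrix(5, 16, rng), [rng.uniform(-1, 1) for _ in range(5)]

# Fold the two layers into one: W = W2 W1 (5 x 23), b = W2 b1 + b2.
W1_cols = list(zip(*W1))
W = [[sum(a * c for a, c in zip(row, col)) for col in W1_cols] for row in W2]
b = [sum(a * c for a, c in zip(row, b1)) + b2i for row, b2i in zip(W2, b2)]

x = [rng.uniform(-1, 1) for _ in range(23)]
stacked = affine(W2, b2, affine(W1, b1, x))  # two layers, no activation
folded = affine(W, b, x)                     # one equivalent layer
max_diff = max(abs(s - f) for s, f in zip(stacked, folded))
print(len(W), len(W[0]), max_diff)
```

The two outputs agree up to floating-point rounding, so the 16-unit hidden layer adds parameters but no expressive power until a nonlinearity sits between the two layers.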
