Phase 09 — Quiz (human-readable mirror)
Human-readable mirror of the canonical file data/quizzes/phase-09-mlp-modules.yaml. The portal (Phase 41) consumes the YAML; this .md is for quick review outside the portal. Do not edit here: edit the YAML instead, because this file can be regenerated from it.
Source: data/quizzes/phase-09-mlp-modules.yaml. Schema in src/miniportal/BLUEPRINT.md §1.
q-09-01 — Why does the Module class register Parameters explicitly? (single)
In the Module class you built, why must _parameters be registered explicitly instead of relying on __dict__ introspection?
- Because Python's __dict__ is unordered before 3.7
- To allow nested modules and lazy device-move semantics ✓
- Because Parameter objects don't have __hash__
- It's a PyTorch convention copied for familiarity
Explicit registration lets a parent Module walk its children deterministically, so .parameters() and .to(device) visit exactly the registered parameters, in a fixed order, instead of guessing from arbitrary attributes at runtime. PyTorch made the same choice for the same reason.
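A minimal pure-Python sketch of explicit registration, assuming a hypothetical Parameter wrapper with a device tag and a __setattr__ hook. It is loosely in the spirit of PyTorch's nn.Module but much simpler; the class and method names are illustrative, not the phase's actual implementation.

```python
from collections import OrderedDict


class Parameter:
    """Hypothetical minimal Parameter: a value plus a device tag."""

    def __init__(self, value):
        self.value = value
        self.device = "cpu"


class Module:
    def __init__(self):
        # Create the registries with object.__setattr__ so the hook below
        # does not try to use them before they exist.
        object.__setattr__(self, "_parameters", OrderedDict())
        object.__setattr__(self, "_modules", OrderedDict())

    def __setattr__(self, name, value):
        # Explicit registration: Parameters and sub-Modules go into ordered
        # registries; everything else stays a plain attribute.
        if isinstance(value, Parameter):
            self._parameters[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def named_parameters(self, prefix=""):
        # Own parameters first, then recurse into children in registration order.
        for name, param in self._parameters.items():
            yield prefix + name, param
        for child_name, child in self._modules.items():
            yield from child.named_parameters(prefix + child_name + ".")

    def parameters(self):
        return [param for _, param in self.named_parameters()]

    def to(self, device):
        # Stand-in for a device move: only registered Parameters are touched.
        for param in self.parameters():
            param.device = device
        return self


class Linear(Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Weight stored as (out, in) to match Z = X @ W.T + b.
        # Zeros are placeholders here; a real layer would use Kaiming init.
        self.weight = Parameter([[0.0] * in_features for _ in range(out_features)])
        self.bias = Parameter([0.0] * out_features)


class MLP(Module):
    def __init__(self):
        super().__init__()
        self.fc1 = Linear(23, 16)
        self.fc2 = Linear(16, 5)
        self.scale = 2.0  # plain float: never registered


model = MLP().to("gpu")
print([name for name, _ in model.named_parameters()])
# ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']
```

Because scale is a plain float, both parameters() and to() skip it; only what was registered is walked, and always in the same order.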
q-09-02 — Which initializations are commonly used for an MLP's hidden weights? (multi)
Select every initialization scheme that is appropriate as a default for the hidden linear layers of a ReLU-activated MLP.
- Zeros for every weight
- Kaiming (He) normal ✓
- Kaiming (He) uniform ✓
- Constant 1.0 for every weight
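Kaiming normal and uniform draw weights with variance 2 / fan_in, which preserves activation variance through ReLU layers. All-zero or all-ones initializations fail to break symmetry: every hidden unit in a layer computes the same output and receives the same gradient, so the units never differentiate. Constant 1.0 additionally lets activations grow with fan-in, so the network can blow up on step 1.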
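The two Kaiming schemes can be sketched in pure Python, assuming the ReLU gain √2 so that Var(W) = 2 / fan_in. The normal variant uses std = √(2 / fan_in); the uniform variant samples U(−b, b) with b = √(6 / fan_in), because a uniform on (−b, b) has variance b²/3. The helper names are hypothetical, and the empirical variance check is only approximate.

```python
import math
import random


def kaiming_normal(fan_out, fan_in, rng):
    # He init for ReLU: W ~ N(0, 2 / fan_in), i.e. std = sqrt(2 / fan_in).
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def kaiming_uniform(fan_out, fan_in, rng):
    # U(-b, b) has variance b**2 / 3, so b = sqrt(6 / fan_in) also gives 2 / fan_in.
    bound = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


def variance(matrix):
    # Population variance over all entries of a list-of-lists matrix.
    flat = [w for row in matrix for w in row]
    mean = sum(flat) / len(flat)
    return sum((w - mean) ** 2 for w in flat) / len(flat)


rng = random.Random(0)
fan_in = 256
target = 2.0 / fan_in
W_normal = kaiming_normal(512, fan_in, rng)
W_uniform = kaiming_uniform(512, fan_in, rng)
# Both ratios should land close to 1.0 for a matrix this large.
print(variance(W_normal) / target, variance(W_uniform) / target)
```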
q-09-03 — What does Module.train(False) typically toggle? (free)
In one sentence, what does calling module.train(False) (eval mode) change in modules like Dropout and BatchNorm?
Expected to contain: eval, dropout.
Eval mode disables stochastic dropout (it becomes the identity) and switches BatchNorm from per-batch statistics to its running statistics, so inference is deterministic.
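Both effects can be illustrated with two hypothetical layers in pure Python: an inverted Dropout that is the identity in eval mode, and a single-feature BatchNorm without learnable scale and shift that normalizes with batch statistics in training but with exponential running averages in eval. As a simplification it uses the biased batch variance for the running estimate; momentum 0.1 and eps 1e-5 are assumed values, not taken from this phase.

```python
import math
import random


class Dropout:
    """Hypothetical inverted-dropout layer with a train/eval flag."""

    def __init__(self, p=0.5, seed=0):
        self.p = p
        self.training = True
        self.rng = random.Random(seed)

    def train(self, mode=True):
        self.training = mode
        return self

    def __call__(self, xs):
        if not self.training:
            return list(xs)  # eval: identity, no randomness
        keep = 1.0 - self.p
        # Zero each element with probability p, rescale survivors by 1 / keep.
        return [x / keep if self.rng.random() < keep else 0.0 for x in xs]


class BatchNorm1dScalar:
    """Hypothetical single-feature BatchNorm (no affine params) over a batch of floats."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def __call__(self, xs):
        if self.training:
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            # Exponential moving averages, consumed later in eval mode.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in xs]


drop = Dropout(p=0.5).train(False)
print(drop([1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]

bn = BatchNorm1dScalar()
bn([1.0, 2.0, 3.0, 4.0])  # training step: updates running statistics
bn.train(False)
print(bn([10.0]) == bn([10.0]))  # True: eval uses the frozen running statistics
```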
q-09-04 — Backward shape for Linear's weight gradient (single)
In a Linear(in=23, out=16) whose forward is Z = X @ W.T + b with X.shape == (4, 23), what shape does the weight gradient ∇_W L have?
- (4, 23)
- (16, 23) ✓
- (23, 16)
- (16, 4)
∇_W = (∇_Z)^T @ X has shape (16, 4) @ (4, 23) = (16, 23), exactly W's shape. Phase 9 theory §04 warns that on square shapes the buggy ∇_W = ∇_Z @ X^T also passes a shape check, so shape assertions alone will not catch it.
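A dependency-free check of this shape argument, using hypothetical helpers (matmul, transpose, linear_forward) in pure Python. It builds the (4, 23) input and (16, 23) weight from the question, picks the toy loss L = sum(Z * G) so that ∇_Z L equals a chosen upstream gradient G, and compares (∇_Z)^T @ X against a central finite difference on one weight entry. The buggy ∇_Z @ X^T is rejected here only because 16 ≠ 23.

```python
import random


def matmul(a, b):
    # (n, k) @ (k, m) -> (n, m); refuse mismatched inner dimensions.
    if len(a[0]) != len(b):
        raise ValueError(f"cannot multiply {shape(a)} @ {shape(b)}")
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def shape(m):
    return (len(m), len(m[0]))


def linear_forward(X, W, b):
    # Z = X @ W.T + b, with b broadcast over the batch rows.
    Z = matmul(X, transpose(W))
    return [[z + bj for z, bj in zip(row, b)] for row in Z]


rng = random.Random(0)
batch, d_in, d_out = 4, 23, 16
X = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(batch)]
W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
b = [0.0] * d_out
G = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(batch)]  # chosen ∇_Z L


def loss(W_):
    # Toy loss L = sum(Z * G), so that ∇_Z L is exactly G.
    Z = linear_forward(X, W_, b)
    return sum(z * g for z_row, g_row in zip(Z, G) for z, g in zip(z_row, g_row))


grad_W = matmul(transpose(G), X)  # (16, 4) @ (4, 23)
print(shape(grad_W))  # (16, 23)

# Central finite difference on one weight entry should match the analytic value.
i, j, h = 3, 7, 1e-6
W_plus = [row[:] for row in W]
W_plus[i][j] += h
W_minus = [row[:] for row in W]
W_minus[i][j] -= h
numeric = (loss(W_plus) - loss(W_minus)) / (2 * h)
print(abs(numeric - grad_W[i][j]))

try:
    matmul(G, transpose(X))  # buggy ∇_Z @ X^T: (4, 16) @ (23, 4)
except ValueError as err:
    print("buggy formula rejected:", err)
```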
q-09-05 — Find the bug: two Linears with no activation (free)
A learner writes Sequential(Linear(23, 16), Linear(16, 5)) for the §A13 tense classifier. Validation accuracy is acceptable but the model generalizes slightly worse than a GELU-equipped twin. What single property of the composite map explains this?
Expected to contain: linear.
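Composition of two affine maps is affine: the stack collapses to the single map Z = X @ (W2 @ W1).T + (b1 @ W2.T + b2), whose weight W2 @ W1 has shape (5, 23) and therefore rank at most 5. The §A13 task is linearly separable, so accuracy survives, but the nonlinear capacity needed to fit subtler agreement patterns is lost. Cross-link: break/00-break-gelu-as-identity.md.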
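The collapse can be checked numerically with the pure-Python sketch below. It uses hypothetical random weights for Linear(23, 16) and Linear(16, 5), stored as (out, in) matrices to match Z = X @ W.T + b, and one input vector. Merging W = W2 @ W1 and b = W2 @ b1 + b2 reproduces the two-layer output up to rounding, while inserting an exact GELU between the layers breaks the equality.

```python
import math
import random


def affine(W, b, x):
    # y = W x + b for a single example x (per-row form of Z = X @ W.T + b).
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def gelu(v):
    # Exact GELU: v * Phi(v), with Phi the standard normal CDF.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


rng = random.Random(1)
d_in, d_hidden, d_out = 23, 16, 5
W1 = [[rng.gauss(0, 0.3) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [rng.gauss(0, 0.1) for _ in range(d_hidden)]
W2 = [[rng.gauss(0, 0.3) for _ in range(d_hidden)] for _ in range(d_out)]
b2 = [rng.gauss(0, 0.1) for _ in range(d_out)]

# Merge into one affine map: W = W2 @ W1 (shape 5 x 23), b = W2 @ b1 + b2.
W = [[sum(W2[i][k] * W1[k][j] for k in range(d_hidden)) for j in range(d_in)]
     for i in range(d_out)]
b = affine(W2, b2, b1)

x = [rng.gauss(0, 1) for _ in range(d_in)]
stacked = affine(W2, b2, affine(W1, b1, x))
merged = affine(W, b, x)
collapse_error = max(abs(s - m) for s, m in zip(stacked, merged))

# With a GELU in between, the merged affine map no longer matches.
with_gelu = affine(W2, b2, [gelu(h) for h in affine(W1, b1, x)])
gelu_gap = max(abs(g - m) for g, m in zip(with_gelu, merged))
print(collapse_error, gelu_gap)
```

The merged weight has only 5 rows, which is where the rank bound of at most 5 comes from.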