Phase 36 — Quizzes
Readable mirror of the question bank; the canonical source is
data/quizzes/phase-36-frontier-architectures.yaml.
q-36-01 — Why does Switch-style MoE add an auxiliary loss?
Free response. Acceptable answers contain the keyword "collapse".
Router collapse: without the auxiliary load-balancing loss, the router can learn to send nearly every token to one expert, leaving the other experts as dead weight.
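A minimal NumPy sketch of a Switch-style load-balancing loss may make the answer concrete. It assumes top-1 routing and the form alpha · E · Σ f_i · P_i, where f_i is the fraction of tokens dispatched to expert i and P_i is the mean router probability for expert i. The function name, the toy logits, and the default `alpha=0.01` are illustrative assumptions, not taken from the question bank.

```python
import numpy as np

def switch_aux_loss(router_logits, alpha=0.01):
    """Load-balancing loss for top-1 routing (illustrative sketch).

    router_logits: array of shape (num_tokens, num_experts).
    """
    num_tokens, num_experts = router_logits.shape
    # Numerically stable softmax over the expert axis.
    z = router_logits - router_logits.max(axis=1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)
    # f_i: fraction of tokens whose top-1 choice is expert i.
    top1 = probs.argmax(axis=1)
    f = np.bincount(top1, minlength=num_experts) / num_tokens
    # P_i: mean router probability assigned to expert i.
    P = probs.mean(axis=0)
    return alpha * num_experts * float(np.sum(f * P))

# Balanced routing: token t prefers expert t % 4.
balanced = 10.0 * np.eye(4)[np.arange(8) % 4]
# Collapsed routing: every token prefers expert 0.
collapsed = np.tile(np.array([10.0, 0.0, 0.0, 0.0]), (8, 1))

print(switch_aux_loss(balanced), switch_aux_loss(collapsed))
```

Under these assumptions the balanced batch gives a loss close to `alpha`, while the collapsed batch approaches `alpha * num_experts`, so the gradient pushes the router away from the single-expert solution.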
q-36-02 — Pick every property of Mamba / SSM relative to a transformer.
- Memory per step is constant in sequence length. ← correct
- Random access to arbitrary past tokens is preserved exactly.
- Computation can be parallelized via an associative scan. ← correct
- It is universally better than attention on long-context tasks.
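The associative-scan option can be illustrated on a scalar toy recurrence h_t = a_t · h_{t-1} + b_t with h_0 = 0. Each step is an affine map, and composing affine maps is associative, which is what allows a log-depth scan. The sketch below is a hypothetical NumPy illustration (Hillis-Steele doubling), not Mamba's actual kernel.

```python
import numpy as np

def scan_sequential(a, b):
    """Reference recurrence: constant state per step, O(n) sequential steps."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)

def scan_doubling(a, b):
    """Same result in O(log n) vectorized rounds.

    Element t holds the affine map h -> a[t] * h + b[t]. Composing an
    earlier map (a1, b1) with a later map (a2, b2) gives
    (a1 * a2, a2 * b1 + b2), which is associative.
    """
    a = np.asarray(a, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    n, shift = len(a), 1
    while shift < n:
        # Pair each element with the one `shift` positions earlier;
        # positions without a predecessor get the identity map (1, 0).
        a_prev = np.concatenate([np.ones(shift), a[:-shift]])
        b_prev = np.concatenate([np.zeros(shift), b[:-shift]])
        b = a * b_prev + b
        a = a * a_prev
        shift *= 2
    return b

a = np.array([0.5, 0.5, 0.5, 0.5])
b = np.array([1.0, 1.0, 1.0, 1.0])
print(scan_sequential(a, b))  # [1.    1.5   1.75  1.875]
print(scan_doubling(a, b))    # same values
```

Every round of the doubling loop is elementwise, so it maps onto parallel hardware; the sequential version shows why inference needs only a fixed-size state per step.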
q-36-03 — FLOP advantage of MoE vs dense?
- k / E (the fraction of dense FLOPs used when k of E experts are active per token) ← correct
- E / k
- 1
- k · E
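The k / E ratio follows from simple arithmetic, sketched below under stated assumptions: each expert is a two-matmul FFN, the dense baseline is a single FFN with the same total parameter count as all E experts, and router cost is ignored. The function names and the default dimensions are hypothetical.

```python
def ffn_flops_per_token(d_model, d_ff):
    # Up and down projections, counting 2 FLOPs per multiply-add.
    return 2 * 2 * d_model * d_ff

def moe_flop_fraction(num_experts, top_k, d_model=1024, d_ff=4096):
    expert = ffn_flops_per_token(d_model, d_ff)
    dense_equivalent = num_experts * expert  # dense FFN with the same total parameters
    moe_active = top_k * expert              # only the k routed experts run per token
    return moe_active / dense_equivalent

print(moe_flop_fraction(num_experts=8, top_k=2))  # 0.25
```

The model dimensions cancel, so the fraction depends only on k and E.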
q-36-04 — Why is router collapse hidden from the main training loss?
Free response. Acceptable answers contain the keyword "expert".
The single active expert still learns a reasonable FFN, so the main training loss keeps falling. Only validation loss or per-expert token counts surface the collapse.
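Per-expert token counts are cheap to log. A minimal sketch, assuming top-1 expert indices are available per batch; the helper names and the 0.9 threshold are illustrative choices, not values from the question bank.

```python
import numpy as np

def expert_load(top1_assignments, num_experts):
    """Fraction of tokens routed to each expert in one batch."""
    counts = np.bincount(top1_assignments, minlength=num_experts)
    return counts / counts.sum()

def looks_collapsed(load, threshold=0.9):
    """Flag a batch where one expert receives almost all tokens (assumed threshold)."""
    return bool(load.max() >= threshold)

healthy = expert_load(np.array([0, 1, 2, 3, 0, 1, 2, 3]), num_experts=4)
collapsed = expert_load(np.zeros(8, dtype=int), num_experts=4)
print(healthy, looks_collapsed(healthy))
print(collapsed, looks_collapsed(collapsed))
```

Logging the load vector alongside the training loss exposes collapse even while the loss curve still looks healthy.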