Phase 36 — Quizzes

Readable mirror of the question bank; the canonical source is data/quizzes/phase-36-frontier-architectures.yaml.

q-36-01 — Why does Switch-style MoE add an auxiliary loss?

Free response. Acceptable answers contain "collapse".

Router collapse: without the auxiliary load-balancing loss, the router can learn to send nearly every token to a single expert, leaving the other experts as dead weight.
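To make the answer concrete, here is a minimal sketch of a Switch-style load-balancing term, `alpha * E * sum_i(f_i * P_i)`, where `f_i` is the fraction of tokens whose top-1 expert is `i` and `P_i` is the mean router probability for expert `i`. The function name, the toy probabilities, and the default `alpha` are assumptions made for illustration; they are not taken from the course code.

```python
def switch_aux_loss(router_probs, alpha=0.01):
    """Load-balancing loss for top-1 routing.

    router_probs: one list of E softmax probabilities per token.
    alpha: loss weight (illustrative default, not a recommendation).
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    f = [0.0] * num_experts  # fraction of tokens dispatched to each expert
    p = [0.0] * num_experts  # mean router probability for each expert
    for probs in router_probs:
        top1 = max(range(num_experts), key=lambda i: probs[i])
        f[top1] += 1.0 / num_tokens
        for i, prob in enumerate(probs):
            p[i] += prob / num_tokens
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, p))


# Toy batch where each of 4 tokens prefers a different expert.
balanced = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
# Toy batch where every token prefers expert 0 (collapsed routing).
collapsed = [[0.7, 0.1, 0.1, 0.1] for _ in range(4)]

balanced_loss = switch_aux_loss(balanced, alpha=1.0)
collapsed_loss = switch_aux_loss(collapsed, alpha=1.0)
```

With `alpha=1.0`, the balanced batch sits at the minimum for uniform routing, while the collapsed batch scores higher, so the gradient pushes the router away from a single favourite expert.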

q-36-02 — Select every property of Mamba / SSMs relative to a transformer.

  • Memory per step is constant in sequence length. ← correct
  • Random access to arbitrary past tokens is preserved exactly.
  • Computation can be parallelized via an associative scan. ← correct
  • It is universally better than attention on long-context tasks.
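The two correct options can be illustrated with a scalar linear recurrence `h_t = a_t * h_{t-1} + b_t`. This is a simplified sketch, not Mamba itself: real SSM layers use structured, input-dependent state matrices, and the function names here are hypothetical. The sequential version keeps only one running state, while the doubling version relies on the fact that composing two steps is associative, so each round could run in parallel across positions.

```python
def combine(left, right):
    """Compose two affine steps h -> a*h + b; this operator is associative."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(a, b):
    """Reference recurrence: the state is a single number at every step."""
    h = 0.0
    out = []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def doubling_scan(a, b):
    """Inclusive scan in O(log n) rounds (Hillis-Steele style)."""
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        # Within one round, every position is independent of the others,
        # so on parallel hardware the whole round is a single step.
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(n)
        ]
        step *= 2
    # With h_0 = 0, the hidden state is the offset of the composed step.
    return [b_t for _, b_t in elems]


a = [0.5, 0.9, 0.8, 1.0, 0.3]
b = [1.0, 2.0, 3.0, 4.0, 5.0]
seq = sequential_scan(a, b)
par = doubling_scan(a, b)
```

Both functions produce the same hidden states. The sketch also hints at why exact random access is lost: everything the model knows about the past is compressed into the single running state.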

q-36-03 — With top-k routing over E experts, what fraction of the FLOPs of an equally sized dense model does an MoE layer spend per token?

  • k / E (a fraction of dense FLOPs) ← correct
  • E / k
  • 1
  • k · E
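The k / E ratio follows from simple counting, sketched below under stated assumptions: every expert is a two-matmul FFN of the same size, the dense baseline has the same total FFN parameter count as all E experts combined, and router cost is ignored. The function names and the example sizes are illustrative only.

```python
def ffn_flops_per_token(d_model, d_ff):
    """Approximate forward FLOPs for one FFN: up- and down-projection,
    counting 2 FLOPs per multiply-add."""
    return 2 * (2 * d_model * d_ff)


def moe_vs_dense_ratio(d_model, d_ff, num_experts, top_k):
    per_expert = ffn_flops_per_token(d_model, d_ff)
    moe_active = top_k * per_expert            # only k experts run per token
    dense_equivalent = num_experts * per_expert  # dense layer with all E experts' parameters
    return moe_active / dense_equivalent


ratio = moe_vs_dense_ratio(d_model=1024, d_ff=4096, num_experts=8, top_k=2)
```

For these example sizes the ratio is 2 / 8, and the model dimensions cancel out, which is why the answer depends only on k and E.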

q-36-04 — Why is router collapse hidden from the main training loss?

Free response. Acceptable answers contain "expert".

The single active expert still learns a reasonable FFN, so the main training loss keeps falling. Only validation loss or per-expert token counts reveal the collapse.
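A per-expert token count is cheap to log. The sketch below assumes the training loop can expose the top-1 expert index chosen for each token in a batch; the helper names and the 0.9 threshold are hypothetical choices for illustration, not values from the course material.

```python
from collections import Counter


def expert_load(assignments, num_experts):
    """Share of tokens routed to each expert in one batch."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(i, 0) / total for i in range(num_experts)]


def looks_collapsed(assignments, num_experts, max_share=0.9):
    """Flag a batch where one expert receives at least max_share of tokens.
    The threshold is an arbitrary illustrative value."""
    return max(expert_load(assignments, num_experts)) >= max_share


healthy_batch = [0, 1, 2, 3, 0, 1, 2, 3]
collapsed_batch = [2, 2, 2, 2, 2, 2, 2, 2]

healthy_load = expert_load(healthy_batch, num_experts=4)
collapsed_load = expert_load(collapsed_batch, num_experts=4)
```

Logging these shares alongside the training loss makes collapse visible even while the loss curve still looks healthy.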


See theory/05-moe-routing-math-and-mamba-intuition.md.