Phase 35 — Quizzes
Readable mirror of the question bank; the canonical source is
data/quizzes/phase-35-distributed.yaml.
q-35-01 — Per-worker bandwidth cost of one ring all-reduce?
- S / N
- S
- 2S ← correct
- N · S
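
The 2S answer can be checked with a small simulation. The sketch below assumes S is the buffer size each worker holds and N is the number of workers; the function name `ring_allreduce` and the list-based "network" are illustrative, not part of any library. It runs the usual two phases (reduce-scatter, then all-gather) and counts how many elements each worker sends.

```python
def ring_allreduce(buffers):
    """Simulate a sum ring all-reduce over equal-length per-worker lists.

    Returns (reduced_buffers, elements_sent_per_worker).
    Assumption for this sketch: buffer length is divisible by the worker count.
    """
    n = len(buffers)
    size = len(buffers[0])
    assert all(len(b) == size for b in buffers), "buffers must match in size"
    assert size % n == 0, "sketch assumes size divisible by worker count"
    chunk = size // n
    data = [list(b) for b in buffers]
    sent = [0] * n

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    def run_phase(first_chunk_offset, accumulate):
        for step in range(n - 1):
            # Snapshot every outgoing message first so sends are simultaneous.
            msgs = []
            for r in range(n):
                c = (r + first_chunk_offset - step) % n
                msgs.append((c, [data[r][i] for i in span(c)]))
            for r, (c, payload) in enumerate(msgs):
                dst = (r + 1) % n
                for i, v in zip(span(c), payload):
                    data[dst][i] = data[dst][i] + v if accumulate else v
                sent[r] += chunk

    # Phase 1, reduce-scatter: worker r ends up owning reduced chunk (r + 1) % n.
    run_phase(first_chunk_offset=0, accumulate=True)
    # Phase 2, all-gather: the reduced chunks circulate around the ring.
    run_phase(first_chunk_offset=1, accumulate=False)
    return data, sent


if __name__ == "__main__":
    n_workers, s = 4, 8
    result, sent = ring_allreduce([[r + 1] * s for r in range(n_workers)])
    print(result[0], sent)
```

Each worker sends 2 · (N − 1) chunks of size S / N, so the exact cost is 2S(N − 1)/N, which approaches 2S as N grows and does not scale with the number of workers.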
q-35-02 — When is ZeRO-3 / FSDP-full the right choice?
- The full model parameters do not fit on a single worker. ← correct
- Optimizer states alone exceed worker memory. ← correct
- You want the lowest possible communication cost per step.
- Each forward pass tolerates extra all-gathers for parameter prefetch. ← correct
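
A rough memory estimate shows why full sharding matters once the model state outgrows one worker. The sketch below assumes mixed-precision Adam with the commonly cited accounting of 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state (master weights, momentum, variance). Activations and temporary buffers are ignored, and `zero_bytes_per_worker` is a hypothetical helper, not a library function.

```python
def zero_bytes_per_worker(num_params, num_workers, stage):
    """Estimate model-state bytes per worker for ZeRO stages 0 to 3.

    Assumes mixed-precision Adam: 2 + 2 + 12 bytes per parameter.
    Activations and communication buffers are not counted.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2 * num_params   # fp16 weights
    grads = 2 * num_params    # fp16 gradients
    optim = 12 * num_params   # fp32 master weights + Adam moments
    if stage >= 1:
        optim /= num_workers  # stage 1 shards optimizer state
    if stage >= 2:
        grads /= num_workers  # stage 2 also shards gradients
    if stage >= 3:
        params /= num_workers  # stage 3 also shards the parameters
    return params + grads + optim


if __name__ == "__main__":
    for stage in range(4):
        gb = zero_bytes_per_worker(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB per worker")
```

Only stage 3 makes the parameter term shrink with the worker count, which is why it is the option to reach for when the parameters themselves do not fit; the price is the extra all-gathers in each forward and backward pass.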
q-35-03 — Why does a misaligned shard split break all-reduce?
Free response. Acceptable answers contain the keyword "hang".
Collective ops are size-strict; mismatched buffers either hang (no peer for the missing chunk) or produce a wrong sum.
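
One way to avoid the mismatch is to pad the buffer so every worker holds a shard of the same length, and to check that before any collective runs. The helpers below are a hypothetical, framework-free sketch of that idea; they only compute and validate shard sizes and do not call a real communication library.

```python
def naive_shard_sizes(total, workers):
    """Ceil-sized shards with the remainder on the last worker; sizes can differ."""
    per = -(-total // workers)  # ceiling division
    return [max(0, min(per, total - r * per)) for r in range(workers)]


def padded_shard_sizes(total, workers):
    """Equal-size shards; returns (sizes, number of padding elements needed)."""
    per = -(-total // workers)
    return [per] * workers, per * workers - total


def check_aligned(sizes):
    """Raise before the collective instead of hanging inside it."""
    if len(set(sizes)) != 1:
        raise ValueError(f"misaligned shards: {sizes}")
    return True


if __name__ == "__main__":
    print(naive_shard_sizes(10, 4))   # last worker is short
    print(padded_shard_sizes(10, 4))  # every worker matches, with padding
```

Failing fast with an explicit error is usually easier to debug than a hang, since a stalled collective gives no indication of which rank held the odd-sized buffer.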
q-35-04 — When to reach for tensor parallelism?
Free response. Acceptable answers contain the keyword "matmul".
Reach for it when a single matmul (typically a linear layer with a large d_model) no longer fits on one worker; tensor parallelism (TP) slices the matmul itself across workers.
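
As a minimal illustration of slicing the matmul itself, the sketch below splits a weight matrix by columns, lets each simulated worker multiply against its own slice, and concatenates the partial outputs. This is the column-parallel variant only; the function names are illustrative, and plain Python lists stand in for device tensors and for the final gather.

```python
def matmul(x, w):
    """Plain list-of-lists matrix multiply: (rows x k) @ (k x cols)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def column_shards(w, workers):
    """Split the weight matrix into equal column blocks, one per worker."""
    cols = len(w[0])
    assert cols % workers == 0, "sketch assumes columns divide evenly"
    step = cols // workers
    return [[row[k * step:(k + 1) * step] for row in w] for k in range(workers)]


def column_parallel_matmul(x, w, workers):
    """Each worker computes x @ its column block; outputs are concatenated."""
    partials = [matmul(x, shard) for shard in column_shards(w, workers)]
    # Stand-in for an all-gather along the output dimension.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(column_parallel_matmul(x, w, workers=2))
    print(matmul(x, w))  # same result without sharding
```

Each worker stores only its column block of the weights, so the per-worker weight memory drops by the worker count while the input is replicated.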
See theory/05-ring-allreduce-derivation-and-strategy-choice.md.