Skip to content

English · Español

Phase 35 — Quizzes

🇪🇸 Espejo legible del banco de preguntas; la fuente canónica es data/quizzes/phase-35-distributed.yaml.

q-35-01 — Per-worker bandwidth cost of one ring all-reduce?

  • S / N
  • S
  • 2S ← correct
  • N · S

q-35-02 — When is ZeRO-3 / FSDP-full the right choice?

  • The full model parameters do not fit on a single worker. ← correct
  • Optimizer states alone exceed worker memory. ← correct
  • You want lowest possible comm cost per step.
  • Each forward pass tolerates extra all-gathers for parameter prefetch. ← correct

q-35-03 — Why does a misaligned shard split break all-reduce?

Free response. Acceptable answers contain hang.

Collective ops are size-strict; mismatched buffers either hang (no peer for the missing chunk) or produce a wrong sum.

q-35-04 — When to reach for tensor parallelism?

Free response. Acceptable answers contain matmul.

A single matmul (typically a large d_model linear) no longer fits on one worker; TP slices the matmul itself.


See theory/05-ring-allreduce-derivation-and-strategy-choice.md.