Phase 8 — Quizzes¶
Source of truth: data/quizzes/phase-08-tensor-autograd.yaml. This page is a readable mirror of that file, focused on the two new problems in this phase: gradients under broadcasting, and the derivatives of matmul and batched softmax-CE.
q-08-01 — Broadcasting in backward¶
a + b with a.shape = (3,), b.shape = (4, 3). Upstream gradient is (4, 3). How does a receive a (3,)-shaped gradient?
- Reshape `(4, 3)` to `(3,)` by dropping the first axis.
- Sum the upstream gradient along the broadcast (replicated) axes (here axis 0) → `(3,)`.
- Multiply by a one-hot of the broadcast axis.
- NumPy handles it automatically.
Answer
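A minimal NumPy sketch of this case. The helper name `unbroadcast` is hypothetical (it is not taken from the quiz file); it reduces an upstream gradient back to an operand's original shape by summing over the axes that broadcasting replicated.

```python
import numpy as np

def unbroadcast(grad, shape):
    """Sum `grad` down to `shape`, undoing NumPy-style broadcasting.

    Hypothetical helper for illustration only.
    """
    # Sum away leading axes that broadcasting prepended.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Sum over axes that had length 1 in the original operand.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

a = np.arange(3.0)           # shape (3,)
b = np.ones((4, 3))          # shape (4, 3)
upstream = np.ones((4, 3))   # gradient flowing into a + b

grad_a = unbroadcast(upstream, a.shape)  # summed over axis 0
grad_b = unbroadcast(upstream, b.shape)  # already the right shape
```

Here `grad_a` has shape `(3,)`: each entry accumulates the contributions of the four replicated rows, while `grad_b` passes through unchanged.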
**Choice 2.** Forward broadcasting is virtual replication along missing or length-1 axes. The gradient w.r.t. `a` is the sum of the contributions from each replicated copy, so the backward pass must call `grad.sum(axis=...)` explicitly. NumPy does **not** do this for you.
q-08-02 — Matmul gradients (multi-choice)¶
C = A @ B, A: (M, K), B: (K, N), upstream dC: (M, N).
- `dA = dC @ B.T`
- `dB = A.T @ dC`
- `dA = B @ dC.T`
- `dB = dC @ A`
- `dA = dC * B.T` (elementwise)
Answer
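A short NumPy sketch that checks the shapes of the two matmul gradients and spot-checks one entry of `dA` against a central difference. The sizes, the random seed, and the scalar `loss` function are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
M, K, N = 2, 3, 4                # arbitrary small sizes
A = rng.standard_normal((M, K))
B = rng.standard_normal((K, N))
dC = rng.standard_normal((M, N))  # upstream gradient

dA = dC @ B.T   # (M, N) @ (N, K) -> (M, K), same shape as A
dB = A.T @ dC   # (K, M) @ (M, N) -> (K, N), same shape as B

def loss(A_, B_):
    # Scalar whose gradient w.r.t. C = A_ @ B_ is exactly dC.
    return np.sum((A_ @ B_) * dC)

# Central-difference estimate of d(loss)/dA[0, 1].
eps = 1e-6
E = np.zeros_like(A)
E[0, 1] = eps
numeric = (loss(A + E, B) - loss(A - E, B)) / (2 * eps)
```

Under these assumptions `numeric` should agree with `dA[0, 1]` up to floating-point round-off, since the loss is linear in `A`.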
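**Choices 1 and 2.** The matmul gradient is itself a matmul: `dA = dC B^T`, `dB = A^T dC`. The shapes confirm it: `(M, N) @ (N, K)` gives `(M, K)`, matching `A`, and `(K, M) @ (M, N)` gives `(K, N)`, matching `B`.
q-08-03 — Gradient of s = x.sum() (free)¶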
x.shape = (3, 5), ds = 1. What is dx?
Answer
`dx` has shape `(3, 5)` and every entry is `1` (i.e., `np.ones_like(x) * ds`). Reason: `∂s/∂x_ij = 1` for every `(i, j)`.
q-08-04 — Why central differences in gradcheck?¶
- Cheaper computationally.
- Truncation error `O(ε²)` (even-order Taylor terms cancel) vs `O(ε)` for forward differences.
- Forward differences need boundary derivative knowledge.
- Central is the only differentiable choice.
Answer
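A minimal sketch comparing the two difference schemes. The test function `sin`, the point `x = 1.0`, and the step `eps = 1e-4` are assumptions chosen purely for illustration; the exact derivative is `cos(x)`.

```python
import math

f = math.sin
x = 1.0
eps = 1e-4
exact = math.cos(x)  # analytic derivative of sin

# Forward difference: truncation error shrinks like eps.
forward = (f(x + eps) - f(x)) / eps
# Central difference: truncation error shrinks like eps**2.
central = (f(x + eps) - f(x - eps)) / (2 * eps)

err_forward = abs(forward - exact)
err_central = abs(central - exact)
```

In double precision, `err_central` comes out several orders of magnitude smaller than `err_forward` at this step size, which is why a gradcheck can use a much tighter tolerance with central differences.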
**Choice 2.** Central differences cancel the leading `O(ε)` error term; you get roughly 8 good digits at ε ≈ 1e-4 versus about 4 for forward differences.
q-08-05 — Batched softmax CE gradient (free)¶
z: (B, K), y: (B,). State dL/dz.