
Phase 8 — Quizzes

Human-readable mirror of data/quizzes/phase-08-tensor-autograd.yaml. It focuses on the two problems that are new in this phase: gradients under broadcasting, and the derivatives of matmul and of batched softmax cross-entropy.

Source of truth: data/quizzes/phase-08-tensor-autograd.yaml.

q-08-01 — Broadcasting in backward

Given `a + b` with `a.shape = (3,)` and `b.shape = (4, 3)`, the upstream gradient has shape `(4, 3)`. How does `a` receive a gradient of shape `(3,)`?

1. Reshape `(4, 3)` to `(3,)` by dropping the first axis.
2. Sum the upstream gradient along the broadcast (replicated) axes (here axis 0) → `(3,)`.
3. Multiply by a one-hot of the broadcast axis.
4. NumPy handles it automatically.

Answer

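**Choice 2.** Forward broadcasting virtually replicates `a` along the axes it lacks (or has with length 1). Every replicated copy contributes to the output, so the gradient w.r.t. `a` is the upstream gradient summed over those axes: here `grad.sum(axis=0)`. For an axis that had length 1 rather than being missing, sum with `keepdims=True` so the original shape is preserved. Backward must perform this reduction explicitly; NumPy does **not** do it for you.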
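A minimal NumPy sketch of that reduction. The helper name `unbroadcast` is hypothetical (it is not a NumPy function), and it assumes `grad` came from broadcasting an array of shape `shape` under NumPy's standard rules.

```python
import numpy as np

def unbroadcast(grad, shape):
    """Hypothetical helper: reduce an upstream gradient back to `shape`."""
    # Sum away the leading axes that broadcasting prepended.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Where the original axis had length 1 but grad does not, sum and keep the axis.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

a = np.arange(3.0)           # shape (3,)
b = np.ones((4, 3))          # shape (4, 3)
upstream = np.ones((4, 3))   # dL/d(a + b)

da = unbroadcast(upstream, a.shape)
db = unbroadcast(upstream, b.shape)
print(da.shape, da)          # (3,) [4. 4. 4.]
print(db.shape)              # (4, 3)

# Length-1 case: an operand of shape (4, 1) was stretched along axis 1.
print(unbroadcast(np.ones((4, 3)), (4, 1)).ravel())  # [3. 3. 3. 3.]
```

The two loops mirror the two ways NumPy broadcasts: by prepending axes and by stretching size-1 axes.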

q-08-02 — Matmul gradients (multi-choice)

Let `C = A @ B` with `A: (M, K)`, `B: (K, N)`, and upstream gradient `dC: (M, N)`. Which of the following are correct?

1. `dA = dC @ B.T`
2. `dB = A.T @ dC`
3. `dA = B @ dC.T`
4. `dB = dC @ A`
5. `dA = dC * B.T` (elementwise)

Answer

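**Choices 1 and 2.** The matmul gradient is itself a matmul: `dA = dC B^T` and `dB = A^T dC`. Shapes confirm this: `(M, N) @ (N, K)` gives `(M, K)`, matching `A`, and `(K, M) @ (M, N)` gives `(K, N)`, matching `B`. Choice 3 yields `(K, M)` and equals `dA.T`, not `dA`; choice 4 is not even defined unless `N = M`; choice 5 multiplies `(M, N)` by `(N, K)` elementwise, which does not broadcast in general.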
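One way to convince yourself of choices 1 and 2 is a central-difference check. The sketch assumes the scalar loss `L = sum(C * dC)`, chosen only because its gradient w.r.t. `C` is exactly the given `dC`; `numeric_grad` is a hypothetical helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N = 2, 3, 4
A = rng.standard_normal((M, K))
B = rng.standard_normal((K, N))
dC = rng.standard_normal((M, N))  # arbitrary upstream gradient dL/dC

def loss():
    # Assumed scalar loss L = sum(C * dC); its gradient w.r.t. C is exactly dC.
    return np.sum((A @ B) * dC)

def numeric_grad(f, X, eps=1e-5):
    # Hypothetical helper: central differences, perturbing X in place one entry at a time.
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        orig = X[idx]
        X[idx] = orig + eps
        plus = f()
        X[idx] = orig - eps
        minus = f()
        X[idx] = orig  # restore the entry
        G[idx] = (plus - minus) / (2 * eps)
    return G

dA = dC @ B.T  # choice 1: (M, N) @ (N, K) -> (M, K)
dB = A.T @ dC  # choice 2: (K, M) @ (M, N) -> (K, N)

num_dA = numeric_grad(loss, A)
num_dB = numeric_grad(loss, B)
print(np.allclose(dA, num_dA), np.allclose(dB, num_dB))  # True True

# Choice 3 has shape (K, M): it is dA transposed, not dA.
print((B @ dC.T).shape)  # (3, 2)
```

Because this loss is linear in `A` and in `B`, central differences are exact up to floating-point rounding, so the comparison isolates the formulas themselves.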

q-08-03 — Gradient of `s = x.sum()` (free)

`x.shape = (3, 5)` and the upstream gradient is `ds = 1`. What is `dx`?

Answer

`dx` has shape `(3, 5)` and every entry is `1` (i.e., `np.ones_like(x) * ds`). Reason: `∂s/∂x_ij = 1` for every `(i, j)`.
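The same rule in NumPy, plus a variant for a sum over a single axis (the upstream values there are hypothetical). It shows that the backward of a sum is a broadcast, the mirror image of q-08-01, where the backward of a broadcast is a sum.

```python
import numpy as np

x = np.arange(15.0).reshape(3, 5)
ds = 1.0  # upstream gradient of the scalar s = x.sum()

# Backward of a full sum: copy the scalar upstream gradient into every position.
dx = np.ones_like(x) * ds
print(dx.shape, np.all(dx == 1.0))  # (3, 5) True

# Sum over one axis: s_row = x.sum(axis=1) has shape (3,), and its gradient
# is broadcast back along the summed axis.
ds_row = np.array([1.0, 2.0, 3.0])  # hypothetical upstream gradient for s_row
dx_row = np.broadcast_to(ds_row[:, None], x.shape)
print(dx_row[:, 0], dx_row.shape)   # [1. 2. 3.] (3, 5)
```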

q-08-04 — Why central differences in gradcheck?

1. Cheaper computationally.
2. Truncation error O(ε²) (even-order Taylor terms cancel) vs O(ε) for forward differences.
3. Forward differences need boundary derivative knowledge.
4. Central is the only differentiable choice.

Answer

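**Choice 2.** By Taylor expansion, `f(x+ε) - f(x-ε) = 2ε f'(x) + (ε³/3) f'''(x) + ...`, so the central quotient `(f(x+ε) - f(x-ε)) / (2ε)` has error `O(ε²)`, whereas the forward quotient `(f(x+ε) - f(x)) / ε` keeps an `O(ε)` term. In float64 with ε ≈ 1e-4 and well-scaled derivatives, that is roughly 8 correct digits for central versus about 4 for forward.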
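A quick numerical illustration, assuming `f = sin` as a stand-in test function because its derivatives are known in closed form.

```python
import numpy as np

# Test function with a known derivative: f = sin, f' = cos.
f, df = np.sin, np.cos
x, eps = 1.0, 1e-4

forward = (f(x + eps) - f(x)) / eps               # error ~ (eps / 2) * |f''(x)|
central = (f(x + eps) - f(x - eps)) / (2 * eps)   # error ~ (eps**2 / 6) * |f'''(x)|

forward_err = abs(forward - df(x))
central_err = abs(central - df(x))
print(f"forward error: {forward_err:.1e}")
print(f"central error: {central_err:.1e}")
```

From the Taylor terms, the forward error should sit near ε/2 · sin(1) ≈ 4e-5 and the central error near ε²/6 · cos(1) ≈ 9e-10, consistent with about 4 versus about 8 correct digits.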

q-08-05 — Batched softmax CE gradient (free)

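Logits `z` have shape `(B, K)` and integer labels `y` have shape `(B,)`. With `L` the cross-entropy averaged over the batch, state `dL/dz`.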

Answer

`dL/dz = (p - one_hot(y, K)) / B`, where `p = softmax(z)` is taken row-wise. This is the single-example formula `p - y` (with `y` one-hot) applied to every row, divided by `B` because the loss is the mean over the batch.
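A small NumPy sketch that implements this gradient and checks it against central differences, in the spirit of q-08-04. It assumes `L` is the mean cross-entropy over the batch and `y` holds integer class indices; `softmax`, `ce_loss`, and `ce_grad` are hypothetical helper names.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row max avoids overflow in exp.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(z, y):
    # Mean cross-entropy over the batch.
    p = softmax(z)
    B = z.shape[0]
    return -np.log(p[np.arange(B), y]).mean()

def ce_grad(z, y):
    # dL/dz = (p - one_hot(y, K)) / B
    B, K = z.shape
    one_hot = np.eye(K)[y]  # (B, K)
    return (softmax(z) - one_hot) / B

rng = np.random.default_rng(0)
B, K = 4, 3
z = rng.standard_normal((B, K))
y = rng.integers(0, K, size=B)

analytic = ce_grad(z, y)

# Central-difference check, one logit at a time.
eps = 1e-5
numeric = np.zeros_like(z)
for i in range(B):
    for k in range(K):
        zp, zm = z.copy(), z.copy()
        zp[i, k] += eps
        zm[i, k] -= eps
        numeric[i, k] = (ce_loss(zp, y) - ce_loss(zm, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))
```

Each row of the gradient sums to zero, since both the softmax row and the one-hot row sum to one.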
