Lab 03: Finite-difference gradcheck for the scalar autograd
Pre-req: lab 01 (Value and ops implemented). Optional but recommended: lab 02 (the trained MLP gives nontrivial expressions to gradcheck).
Goal: build a 30-line gradcheck utility that compares your minigrad analytic gradients against central finite differences on a battery of expressions. The deliverable is a CLI that fails loudly when any op's backward is off.
Estimated time: 60–90 minutes. Runs in seconds on the i5-8250U.
§1 Why this lab exists
Lab 01 tested every op against a hand-derived expected gradient. Lab 02 trained an MLP and trusted that the loss going down means the gradients are correct. Both are useful but neither catches a subtle off-by-sign bug in an op you didn't hand-test.
A gradcheck is the cheapest, most general defense against that class of bug: compare the analytic gradient df/dx (your Value._backward) against a numerical estimate (f(x + ε) - f(x - ε)) / (2ε). They should agree to within a small tolerance. If they don't, your op's backward is wrong — or your forward, or your topological sort, or your += accumulation.
PyTorch ships exactly this utility (torch.autograd.gradcheck). You will build a tiny scalar version of it.
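As a warm-up, the comparison described above can be tried on plain floats before involving Value at all. The sketch below is illustrative only: the helper name central_difference, the choice of tanh, and the evaluation point 0.7 are assumptions made for this example, not part of the lab's API.

```python
import math
from collections.abc import Callable


def central_difference(f: Callable[[float], float], x: float, eps: float = 1e-5) -> float:
    """Estimate f'(x) as (f(x + eps) - f(x - eps)) / (2 * eps)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


x = 0.7
numerical = central_difference(math.tanh, x)

# Closed-form derivative of tanh, standing in for a correct backward pass.
analytic = 1.0 - math.tanh(x) ** 2

# The same derivative with a flipped sign, standing in for a buggy backward pass.
buggy = -analytic

print(f"numerical={numerical:.10f} analytic={analytic:.10f}")
print("correct backward agrees:", abs(numerical - analytic) <= 1e-4)
print("buggy backward agrees:  ", abs(numerical - buggy) <= 1e-4)
```

The correct derivative matches the estimate closely, while the sign-flipped one is off by roughly twice the gradient's magnitude, which is exactly the kind of gap a gradcheck is meant to surface.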
§2 What you produce
A single file src/minigrad/gradcheck.py (~50 lines), with one public function:
def gradcheck(
    f: Callable[..., Value],
    inputs: tuple[Value, ...],
    *,
    eps: float = 1e-5,
    atol: float = 1e-4,
    rtol: float = 1e-3,
) -> bool:
    """Return True iff every input's analytic gradient matches the
    central-difference numerical gradient within tolerance.

    Raises AssertionError with a diagnostic message on failure.
    """
Plus tests/test_gradcheck.py exercising every op from lab 01 plus three composite expressions:
- The diamond from theory/03: L = (a*b + c) * (a - c).
- A chain: L = relu(a * b + c) (uses relu if you implemented it; else use tanh).
- A division: L = a / (b + 1.0) to catch sign bugs in __truediv__.
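Before writing the Value-based tests, it can help to know what gradients the diamond expression should produce. The sketch below works on plain floats only: it derives the three partial derivatives by hand (with u = a*b + c and v = a - c, so L = u*v) and compares them with central differences. The function names and the evaluation point (2.0, -3.0, 0.5) are assumptions chosen for illustration.

```python
from collections.abc import Callable


def diamond(a: float, b: float, c: float) -> float:
    """L = (a*b + c) * (a - c), the diamond expression on plain floats."""
    return (a * b + c) * (a - c)


def diamond_grads(a: float, b: float, c: float) -> tuple[float, float, float]:
    """Hand-derived partials via the product rule on L = u * v."""
    u = a * b + c
    v = a - c
    dL_da = b * v + u  # a reaches L through both u and v
    dL_db = a * v      # b reaches L through u only
    dL_dc = v - u      # c reaches L through u (+1) and v (-1)
    return dL_da, dL_db, dL_dc


def numerical_grads(
    f: Callable[..., float], args: tuple[float, ...], eps: float = 1e-5
) -> tuple[float, ...]:
    """Central-difference estimate of each partial derivative of f at args."""
    grads = []
    for i in range(len(args)):
        plus = list(args)
        minus = list(args)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(*plus) - f(*minus)) / (2 * eps))
    return tuple(grads)


point = (2.0, -3.0, 0.5)
expected = diamond_grads(*point)
estimated = numerical_grads(diamond, point)
for name, e, n in zip("abc", expected, estimated):
    print(f"dL/d{name}: hand-derived={e:+.6f} numerical={n:+.6f}")
```

Because a and c each feed two branches, their gradients are sums of two contributions. An autograd that overwrites instead of accumulating gets dL/db right but dL/da and dL/dc wrong, which is why this expression is a useful test.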
§3 Steps
- Sketch the algorithm on paper first. For each input x_i:
  - Save x_i.data. Set it to x_i.data + ε, run f(inputs), record f_plus. Restore.
  - Set x_i.data = x_i.data - ε, run f(inputs), record f_minus. Restore.
  - Numerical gradient: (f_plus - f_minus) / (2ε).
  - Zero all .grad, run f(inputs).backward(). Analytic gradient: x_i.grad.
  - Compare with abs(num - analytic) <= atol + rtol * abs(num).
- On failure, print: input index, expected (numerical), got (analytic), abs diff, rel diff.
- Implement. Type-annotate. ~50 lines.
- Write the test cases. Each test creates fresh Values, defines a small f, calls gradcheck, asserts True. For each op in your library, write at least one gradcheck test.
- Break and confirm. Intentionally introduce a sign bug in __sub__._backward (e.g., accumulate += out.grad into the right operand's gradient, dropping the -1 factor). Run gradcheck; confirm it fails with a clear error message. Then revert.
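The steps above can be sketched end to end as follows. This is one possible shape, not a reference solution. The Value class here is a hypothetical stand-in that supports only +, -, and * so the example runs on its own; your lab 01 Value will differ. Calling f as f(*inputs) is also an assumption about the calling convention.

```python
from collections.abc import Callable


class Value:
    """Hypothetical stand-in for the lab 01 Value; supports only +, -, *."""

    def __init__(self, data: float, parents: tuple["Value", ...] = ()) -> None:
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward: Callable[[], None] = lambda: None

    def __add__(self, other: "Value") -> "Value":
        out = Value(self.data + other.data, (self, other))

        def _backward() -> None:
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __sub__(self, other: "Value") -> "Value":
        out = Value(self.data - other.data, (self, other))

        def _backward() -> None:
            self.grad += out.grad
            other.grad -= out.grad  # -1 factor for the right operand

        out._backward = _backward
        return out

    def __mul__(self, other: "Value") -> "Value":
        out = Value(self.data * other.data, (self, other))

        def _backward() -> None:
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self) -> None:
        order: list[Value] = []
        seen: set[int] = set()

        def visit(node: "Value") -> None:
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # output first, inputs last
            node._backward()


def gradcheck(
    f: Callable[..., Value],
    inputs: tuple[Value, ...],
    *,
    eps: float = 1e-5,
    atol: float = 1e-4,
    rtol: float = 1e-3,
) -> bool:
    for i, x in enumerate(inputs):
        saved = x.data
        x.data = saved + eps
        f_plus = f(*inputs).data
        x.data = saved - eps
        f_minus = f(*inputs).data
        x.data = saved  # restore before the analytic pass
        num = (f_plus - f_minus) / (2 * eps)

        for v in inputs:  # zero stale gradients
            v.grad = 0.0
        f(*inputs).backward()
        analytic = x.grad

        assert abs(num - analytic) <= atol + rtol * abs(num), (
            f"input {i}: expected (numerical) {num:.6e}, got (analytic) {analytic:.6e}"
        )
    return True


a, b, c = Value(2.0), Value(-3.0), Value(0.5)
print(gradcheck(lambda a, b, c: (a * b + c) * (a - c), (a, b, c)))
```

Re-running the full analytic pass for every input is wasteful but keeps the loop easy to reason about; a single backward pass before the loop would work equally well as long as the data has been restored.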
§4 Stop conditions
- uv run pytest tests/test_gradcheck.py passes (all op gradchecks + the three composite expressions).
- mypy --strict src/minigrad/gradcheck.py passes.
- ruff check src/minigrad/gradcheck.py passes.
- You ran the "intentional sign bug" experiment, confirmed gradcheck caught it, and reverted. Document the bug message you saw in learners/borja/phase-07/notes/gradcheck.md.
- You wrote a short paragraph (4-6 lines) explaining why eps = 1e-5 is a reasonable default and what happens if you pick 1e-12 or 1e-2 (cancellation vs truncation).
- Commit: lab: phase-07 add scalar gradcheck CLI and tests.
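For the paragraph on the choice of eps, a quick experiment on plain floats may help. The sketch below measures the central-difference error for exp at x = 1.0 (where the exact derivative is known) at the three step sizes mentioned above. The function and the evaluation point are assumptions chosen for illustration, and the exact error values will depend on your platform's floating-point behaviour.

```python
import math
from collections.abc import Callable


def central_difference(f: Callable[[float], float], x: float, eps: float) -> float:
    """Estimate f'(x) as (f(x + eps) - f(x - eps)) / (2 * eps)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


x = 1.0
exact = math.exp(x)  # d/dx exp(x) = exp(x)

errors: dict[float, float] = {}
for eps in (1e-2, 1e-5, 1e-12):
    errors[eps] = abs(central_difference(math.exp, x, eps) - exact)
    print(f"eps={eps:.0e}  abs error={errors[eps]:.3e}")
```

With a large step the truncation term, which scales like eps squared, dominates. With a tiny step, f(x + eps) and f(x - eps) agree in almost all their digits, so the subtraction cancels most of the precision and the division by 2 * eps amplifies what remains. The middle value should give by far the smallest error of the three.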
§5 Hints
- Restore .data after each ε perturbation. If you forget, subsequent ops see the perturbed value and the numerical estimate is wrong.
- Zero all gradients before each analytic pass. Persistent .grad from a previous test poisons the next.
- For inputs that the expression doesn't depend on, gradcheck should report 0.0 analytic and 0.0 numerical. Make sure your test does not include "phantom" inputs by accident.
- Tolerance picking: atol = 1e-4, rtol = 1e-3 works for eps = 1e-5 on typical scalar expressions with Value magnitudes ~O(1). If you need a tighter tolerance, lower atol and rtol rather than moving ε far from 1e-5: with float64 central differences, a larger ε increases truncation error and a much smaller ε increases cancellation error. If your expression has magnitudes ~O(1e6), bump atol proportionally.
- Pretty error message: include the input index, expected (numerical), got (analytic), abs diff, and rel diff, as listed in §3.
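One possible layout for the failure message, together with the tolerance test from §3, is sketched below. The helper names format_failure and within_tolerance, the field order, and the small floor used to avoid dividing by zero in the relative difference are all assumptions for illustration; the lab does not prescribe them.

```python
def within_tolerance(
    numerical: float, analytic: float, atol: float = 1e-4, rtol: float = 1e-3
) -> bool:
    """Mixed absolute/relative check: |num - analytic| <= atol + rtol * |num|."""
    return abs(numerical - analytic) <= atol + rtol * abs(numerical)


def format_failure(index: int, numerical: float, analytic: float) -> str:
    """Build a multi-line diagnostic with the fields listed in §3."""
    abs_diff = abs(numerical - analytic)
    # Floor the denominator so a zero numerical gradient does not divide by zero.
    rel_diff = abs_diff / max(abs(numerical), 1e-12)
    return (
        f"gradcheck failed for input {index}\n"
        f"  expected (numerical): {numerical:+.8e}\n"
        f"  got (analytic):       {analytic:+.8e}\n"
        f"  abs diff:             {abs_diff:.3e}\n"
        f"  rel diff:             {rel_diff:.3e}"
    )


# A sign bug typically shows up as equal magnitude, opposite sign.
num, ana = -1.0, 1.0
if not within_tolerance(num, ana):
    print(format_failure(1, num, ana))
```

Printing the signed values in aligned columns makes a flipped sign obvious at a glance, and showing both differences tells you whether atol or rtol was the binding term.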
§6 What you'll have learned
- The gradcheck pattern: numerical vs analytic, central differences, tolerance design.
- Why finite differences are a sanity check, not a primary tool (the eps sweet spot is narrow).
- A reusable utility that you will reach for again in Phase 8 (vectorized gradcheck on tensors).
- The mindset of "test the gradient, not the loss" — Phase 16's training loop will reuse it as a CI gate before every long run.
§7 References
- The PyTorch torch.autograd.gradcheck source (~200 lines) is the production analogue; reading it after this lab is a worthwhile 15 minutes.
- Bishop, Pattern Recognition and Machine Learning, §5.3.5: derivation of finite-difference gradient checks and their failure modes.