Skip to content

English · Español

Lab 03 — Finite-difference gradcheck for the scalar autograd

Pre-req: lab 01 (Value and ops implemented). Optional but recommended: lab 02 (the trained MLP gives nontrivial expressions to gradcheck). Goal: build a 30-line gradcheck utility that compares your minigrad analytic gradients against central finite differences on a battery of expressions. The deliverable is a CLI that fails loudly when any op's backward is off.

Estimated time: 60–90 minutes. Runs in seconds on the i5-8250U.


§1 Why this lab exists

Lab 01 tested every op against a hand-derived expected gradient. Lab 02 trained an MLP and trusted that the loss going down means the gradients are correct. Both are useful but neither catches a subtle off-by-sign bug in an op you didn't hand-test.

A gradcheck is the cheapest, most general defense against that class of bug: compare the analytic gradient df/dx (your Value._backward) against a numerical estimate (f(x + ε) - f(x - ε)) / (2ε). They should agree to within a small tolerance. If they don't, your op's backward is wrong — or your forward, or your topological sort, or your += accumulation.

PyTorch ships exactly this utility (torch.autograd.gradcheck). You will build a tiny scalar version of it.

§2 What you produce

A single file src/minigrad/gradcheck.py (~50 lines), with one public function:

def gradcheck(
    f: Callable[..., Value],
    inputs: tuple[Value, ...],
    *,
    eps: float = 1e-5,
    atol: float = 1e-4,
    rtol: float = 1e-3,
) -> bool:
    """Return True iff every input's analytic gradient matches the
    central-difference numerical gradient within tolerance.

    Raises AssertionError with a diagnostic message on failure.
    """

Plus tests/test_gradcheck.py exercising every op from lab 01 plus three composite expressions: - The diamond from theory/03: L = (a*b + c) * (a - c). - A chain: L = relu(a * b + c) (uses relu if you implemented it; else use tanh). - A division: L = a / (b + 1.0) to catch sign bugs in __truediv__.

§3 Steps

  1. Sketch the algorithm on paper first. For each input x_i:
  2. Save x_i.data. Set it to x_i.data + ε, run f(inputs), record f_plus. Restore.
  3. Set x_i.data = x_i.data - ε, run f(inputs), record f_minus. Restore.
  4. Numerical gradient: (f_plus - f_minus) / (2ε).
  5. Zero all .grad, run f(inputs).backward(). Analytic gradient: x_i.grad.
  6. Compare with abs(num - analytic) <= atol + rtol * abs(num).
  7. On failure, print: input index, expected (numerical), got (analytic), abs diff, rel diff.

  8. Implement. Type-annotate. ~50 lines.

  9. Write the test cases. Each test creates fresh Values, defines a small f, calls gradcheck, asserts True. For each op in your library, write at least one gradcheck test.

  10. Break and confirm. Intentionally introduce a sign bug in __sub__._backward (e.g., a.grad += out.grad, missing the -1 for the right operand). Run gradcheck; confirm it fails with a clear error message. Then revert.

§4 Stop conditions

  • uv run pytest tests/test_gradcheck.py passes (all op gradchecks + the three composite expressions).
  • mypy --strict src/minigrad/gradcheck.py passes.
  • ruff check src/minigrad/gradcheck.py passes.
  • You ran the "intentional sign bug" experiment, confirmed gradcheck caught it, and reverted. Document the bug message you saw in learners/borja/phase-07/notes/gradcheck.md.
  • You wrote a short paragraph (4-6 lines) explaining why eps = 1e-5 is a reasonable default and what happens if you pick 1e-12 or 1e-2 (cancellation vs truncation).
  • Commit: lab: phase-07 add scalar gradcheck CLI and tests.

§5 Hints

  1. Restore .data after each ε perturbation — if you forget, subsequent ops see the perturbed value and the numerical estimate is wrong.
  2. Zero all gradients before each analytic pass. Persistent .grad from a previous test poisons the next.
  3. For inputs that the expression doesn't depend on, gradcheck should report 0.0 analytic and 0.0 numerical. Make sure your test does not include "phantom" inputs by accident.
  4. Tolerance picking: atol = 1e-4, rtol = 1e-3 works for eps = 1e-5 on typical scalar expressions with Value magnitudes ~O(1). If you need tighter tolerance, pick a larger ε; if your expression has magnitudes ~O(1e6), bump atol proportionally.
  5. Pretty error message:
    gradcheck FAILED at input index 1:
      numerical = 4.000003
      analytic  = -6.000000
      abs diff  = 10.000003
      rel diff  = 2.5
    

§6 What you'll have learned

  • The gradcheck pattern: numerical vs analytic, central differences, tolerance design.
  • Why finite differences are a sanity check, not a primary tool (the eps sweet spot is narrow).
  • A reusable utility that you will reach for again in Phase 8 (vectorized gradcheck on tensors).
  • The mindset of "test the gradient, not the loss" — Phase 16's training loop will reuse it as a CI gate before every long run.

§7 References

  • The PyTorch torch.autograd.gradcheck source (~200 lines) is the production analogue; reading it after this lab is a worthwhile 15 minutes.
  • Bishop, Pattern Recognition and Machine Learning, §5.3.5 — derivation of finite-difference gradient checks and their failure modes.