Lab 01 — Implement the ops¶
Goal: flesh out Value with + - * / ** exp log relu tanh. Each op gets a forward (creates the node + records parents) and a _backward closure (contributes to parents' grads via the local derivative).
Estimated time: 3–4 hours.
Prereqs: lab 00. Read theory/02-op-derivatives.md until you can recite the table from memory.
What you produce¶
- Extended src/minigrad/scalar.py with all ops implemented.
- tests/test_scalar_autograd.py: per-op tests cross-checking against PyTorch FP64 at tolerance 1e-9.
- tests/test_scalar_graph.py: diamond-dependency test from theory/03.
TODOs¶
Block A — binary ops via dunders¶
Implement these on Value:
- __add__(self, other). Wrap other in Value(other) if it's a number.
- __radd__(self, other), so 3 + Value(2) works.
- __mul__(self, other), __rmul__(self, other).
- __neg__(self): defined as self * -1, or natively.
- __sub__(self, other): self + (-other).
- __rsub__(self, other): (-self) + other.
- __truediv__(self, other): self * other ** -1. Requires __pow__.
- __rtruediv__(self, other).
- __pow__(self, n): requires n to be a Python int or float, NOT a Value. Raise TypeError otherwise.
Each op:
- Computes the forward data.
- Creates out = Value(data, _prev=(self, other), _op=symbol).
- Defines a _backward closure capturing the parents and the local derivative table from theory/02.
- Sets out._backward = _backward.
- Returns out.
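The five steps above can be sketched as follows. This is a minimal illustration, not the reference solution: it assumes a Value constructor of the form Value(data, _prev, _op) as in the step list, and it includes a small backward() that is only assumed to resemble the one from lab 00. It has not been checked against mypy --strict.

```python
from __future__ import annotations

from typing import Callable, Union

Number = Union[int, float]


class Value:
    """Scalar autograd node; an illustrative sketch, not the reference solution."""

    def __init__(self, data: Number, _prev: tuple[Value, ...] = (), _op: str = "") -> None:
        self.data = float(data)
        self.grad = 0.0
        self._prev = _prev
        self._op = _op
        self._backward: Callable[[], None] = lambda: None

    def __add__(self, other: Union[Value, Number]) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, _prev=(self, other), _op="+")

        def _backward() -> None:
            # d(a + b)/da = 1 and d(a + b)/db = 1; accumulate with +=.
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other: Union[Value, Number]) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, _prev=(self, other), _op="*")

        def _backward() -> None:
            # d(a * b)/da = b and d(a * b)/db = a.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def __pow__(self, n: Number) -> Value:
        if not isinstance(n, (int, float)):
            raise TypeError("exponent must be a Python int or float, not Value")
        out = Value(self.data**n, _prev=(self,), _op="**")

        def _backward() -> None:
            # d(a ** n)/da = n * a ** (n - 1).
            self.grad += n * self.data ** (n - 1) * out.grad

        out._backward = _backward
        return out

    def __radd__(self, other: Number) -> Value:
        return self + other

    def __rmul__(self, other: Number) -> Value:
        return self * other

    def __truediv__(self, other: Union[Value, Number]) -> Value:
        return self * other**-1

    def __rtruediv__(self, other: Number) -> Value:
        return other * self**-1  # other is the LEFT operand here

    def backward(self) -> None:
        """Reverse sweep over a topological order (assumed to resemble lab 00)."""
        order: list[Value] = []
        visited: set[Value] = set()

        def visit(node: Value) -> None:
            if node not in visited:
                visited.add(node)
                for parent in node._prev:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


a, b = Value(2.0), Value(3.0)
c = a * b + a          # a feeds two paths into c
c.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Division is built entirely from __mul__ and __pow__ here, so its gradient comes for free from the two closures it composes; writing a native closure for it is an equally valid choice.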
Block B — unary ops as methods¶
- exp(self) -> Value: out.data = math.exp(self.data), _backward adds out.data * out.grad to self.grad.
- log(self) -> Value: out.data = math.log(self.data), _backward adds (1/self.data) * out.grad. Raise on self.data <= 0.
- relu(self) -> Value: out.data = max(0.0, self.data), _backward adds (1.0 if self.data > 0 else 0.0) * out.grad. Document the sub-gradient choice in the docstring.
- tanh(self) -> Value: out.data = math.tanh(self.data), _backward adds (1 - out.data**2) * out.grad.
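As a rough illustration of the unary pattern, the sketch below implements log, tanh, and relu on a stripped-down Value. The class is a stand-in, not the lab's scalar.py, and raising ValueError from log is an assumption, since the list above does not name an exception type.

```python
from __future__ import annotations

import math
from typing import Callable


class Value:
    """Stripped-down node with only what the unary ops need (illustrative)."""

    def __init__(self, data: float, _prev: tuple[Value, ...] = (), _op: str = "") -> None:
        self.data = float(data)
        self.grad = 0.0
        self._prev = _prev
        self._op = _op
        self._backward: Callable[[], None] = lambda: None

    def log(self) -> Value:
        # Exception type is an assumption; the lab only says "raise".
        if self.data <= 0:
            raise ValueError(f"log is undefined for data={self.data}")
        out = Value(math.log(self.data), _prev=(self,), _op="log")

        def _backward() -> None:
            # d log(x)/dx = 1 / x.
            self.grad += (1.0 / self.data) * out.grad

        out._backward = _backward
        return out

    def tanh(self) -> Value:
        out = Value(math.tanh(self.data), _prev=(self,), _op="tanh")

        def _backward() -> None:
            # d tanh(x)/dx = 1 - tanh(x)^2, reusing the forward result.
            self.grad += (1.0 - out.data**2) * out.grad

        out._backward = _backward
        return out

    def relu(self) -> Value:
        """ReLU; the sub-gradient at x == 0 is taken to be 0."""
        out = Value(max(0.0, self.data), _prev=(self,), _op="relu")

        def _backward() -> None:
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad

        out._backward = _backward
        return out


# Single-op graph: seed out.grad by hand and call the closure directly.
x = Value(0.5)
y = x.tanh()
y.grad = 1.0
y._backward()
print(math.isclose(x.grad, 1.0 - math.tanh(0.5) ** 2))  # True
```

Note that tanh reads out.data inside the closure rather than recomputing math.tanh, which is why out must exist before _backward is defined.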
Block C — per-op tests¶
In tests/test_scalar_autograd.py, for each op write a test of this shape (pseudocode — Borja fills bodies):
def test_op_NAME():
    # Set up inputs.
    a = Value(2.5)
    b = Value(-1.3)
    # Forward in minigrad.
    c = a OP b  # or a.OP() for unary ops (then drop b and tb)
    c.backward()
    # Forward + backward in PyTorch FP64 as oracle.
    ta = torch.tensor(2.5, dtype=torch.float64, requires_grad=True)
    tb = torch.tensor(-1.3, dtype=torch.float64, requires_grad=True)
    tc = ta OP tb
    tc.backward()
    # Compare.
    assert abs(c.data - tc.item()) < 1e-9
    assert abs(a.grad - ta.grad.item()) < 1e-9
    assert abs(b.grad - tb.grad.item()) < 1e-9
Claude has provided the test list as comments in test_scalar_autograd.py. Fill in the bodies. One test per op, plus a few edge cases:
- ReLU at a.data == 0 (sub-gradient convention).
- 1 / Value(2.0): the __rtruediv__ path.
- Value(2.0) ** 3 and Value(2.0) ** 0.5: power op with int and float exponents.
- log(Value(2.0)) and exp(Value(0.5)).
Block D — diamond test¶
In tests/test_scalar_graph.py:
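import math

from minigrad.scalar import Value  # assumes src/minigrad is importable as minigrad


def test_diamond_accumulation():
    a = Value(2.0)
    b = Value(3.0)
    c = Value(4.0)
    # a and c each reach L through both factors.
    L = (a*b + c) * (a - c)
    L.backward()
    assert math.isclose(L.data, -20.0)
    assert math.isclose(a.grad, 4.0)
    assert math.isclose(b.grad, -4.0)
    assert math.isclose(c.grad, -12.0)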
This is the worked example from theory/03. If this test passes, your += accumulation works on diamond patterns. If it fails, you almost certainly used = somewhere in _backward.
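To see where those expected numbers come from, the sketch below traces the same expression by hand with plain floats. The variable names u, v, and the a_via_* helpers are illustrative only and do not appear in theory/03.

```python
import math

a, b, c = 2.0, 3.0, 4.0
u = a * b + c   # left factor: 10.0
v = a - c       # right factor: -2.0
L = u * v       # -20.0

# Local derivatives of L = u * v.
dL_du, dL_dv = v, u

# a and c each reach L through both u and v, so each gets two contributions.
a_via_u = dL_du * b      # -6.0
a_via_v = dL_dv * 1.0    # 10.0
c_via_u = dL_du * 1.0    # -2.0
c_via_v = dL_dv * -1.0   # -10.0

# Correct: accumulate every path with +=.
grad_a = 0.0
grad_a += a_via_u
grad_a += a_via_v
grad_b = dL_du * a       # b has a single path
grad_c = c_via_u + c_via_v

# Buggy: plain assignment keeps only whichever path ran last.
overwritten_a = a_via_u
overwritten_a = a_via_v

print(L, grad_a, grad_b, grad_c)  # -20.0 4.0 -4.0 -12.0
print(overwritten_a)              # 10.0
```

With = in place of +=, a.grad would come out as 10.0 or -6.0 depending on traversal order, never 4.0, which is the signature to look for when the diamond test fails.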
Block E — closure trap test¶
To catch the common "captured the wrong variable in a loop" bug:
def test_closure_captures_correctly():
    values = [Value(float(i)) for i in range(5)]
    total = values[0]
    for v in values[1:]:
        total = total + v
    total.backward()
    # Each value contributed equally to the sum.
    for v in values:
        assert math.isclose(v.grad, 1.0)
If you wrote _backward as a lambda capturing the loop variable carelessly, this test fails (all but the last v.grad will be 0 or wrong). The fix is to capture parents at op-creation time inside a non-loop function — which the per-op method form already does correctly. But test it.
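The underlying Python behavior is late binding: a closure looks up a captured variable when it is called, not when it is defined. The sketch below shows the bug and the factory-function fix in isolation; make_getter is a hypothetical helper introduced only for this illustration.

```python
from __future__ import annotations

from typing import Callable

# Buggy: every lambda shares the same variable i and reads it after the loop ends.
late: list[Callable[[], int]] = [lambda: i for i in range(3)]
print([f() for f in late])  # [2, 2, 2]


def make_getter(i: int) -> Callable[[], int]:
    """Each call creates a fresh local i, so each closure keeps its own value."""

    def getter() -> int:
        return i

    return getter


bound = [make_getter(i) for i in range(3)]
print([f() for f in bound])  # [0, 1, 2]
```

An op method such as __add__ plays the role of make_getter: self, other, and out are fresh locals on every call, so each _backward closure sees its own parents.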
Block F — cross-check property¶
Optional but recommended: use hypothesis to generate random small expressions and cross-check against PyTorch. This is mostly a Phase 8 concern; for Phase 7, the per-op tests + diamond test are enough.
Constraints¶
- PyTorch tolerance 1e-9. FP64 should agree to ~1e-12; 1e-9 leaves headroom.
- One test per op minimum. Don't bundle "test all ops" into one mega-test.
- No numpy import. Use math for exp, log, tanh.
- Type hints required on all new methods and closures' free variables.
- All tests must pass under pytest -x (stop on first failure, which surfaces bugs faster).
Stop conditions¶
Done when:
- All ten ops implemented in scalar.py.
- Per-op tests green for all ten ops.
- Diamond test green.
- Closure-capture test green.
- mypy --strict and ruff clean.
Pitfalls¶
- __radd__ argument order. __radd__(self, other) is called when other + self is evaluated and other.__add__(self) returned NotImplemented. So other is the left operand. For commutative ops (add, mul) this doesn't matter; for sub and div it does.
- Value(0.0) ** 0. 0**0 in Python is 1. Mathematically debatable. Our __pow__ should follow Python's convention; PyTorch does the same.
- log(Value(0)). Should raise. Note that math.log(0) does not return -inf: it raises ValueError, and 1/0 in backward would raise ZeroDivisionError. PyTorch, by contrast, returns -inf in forward and inf in the gradient. Decide: raise an explicit error in forward, or special-case the input so that inf/nan propagate as in PyTorch. Phase 7 default: raise. Document.
- relu(Value(0)). Test specifically that grad == 0.0 at this point.
- Value equality. Don't override __eq__. Value-based equality would make Value(2) == Value(2) true, so two distinct nodes holding the same data would be treated as one, and defining __eq__ without __hash__ makes the class unhashable, which breaks things like if v in some_set. Use the default identity equality.
- __hash__ is fine. The default object hash is identity-based and works for any class that does not override __eq__. We need it for the visited set in topo sort.
- PyTorch import in tests is slow. ~1s. Fine for a test suite; just don't be surprised.
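Several of these pitfalls rest on plain Python behavior that can be checked in isolation. In the sketch below, Probe, WithEq, and Plain are hypothetical stand-ins for Value, introduced only to show operand dispatch and hashing. Note in particular that stdlib math.log(0) raises instead of returning -inf.

```python
from __future__ import annotations

import math

# Python's convention for 0 ** 0, which __pow__ inherits in the forward pass.
zero_pow = 0.0**0
print(zero_pow)  # 1.0

# stdlib math.log raises on 0 instead of returning -inf.
try:
    math.log(0)
    log_raised = False
except ValueError:
    log_raised = True
print(log_raised)  # True


class Probe:
    """Hypothetical stand-in for Value that reports which dunder ran."""

    def __sub__(self, other: float) -> str:
        return f"__sub__: self - {other}"

    def __rsub__(self, other: float) -> str:
        # Reached for `other - self`: other is the LEFT operand.
        return f"__rsub__: {other} - self"


print(Probe() - 10)  # __sub__: self - 10
print(10 - Probe())  # __rsub__: 10 - self


class WithEq:
    """Defining __eq__ without __hash__ sets __hash__ to None."""

    def __init__(self, data: float) -> None:
        self.data = data

    def __eq__(self, other: object) -> bool:
        return isinstance(other, WithEq) and self.data == other.data


class Plain:
    """Keeps the default identity-based __eq__ and __hash__."""

    def __init__(self, data: float) -> None:
        self.data = data


print(WithEq.__hash__ is None)  # True: instances cannot go in a set
p, q = Plain(2.0), Plain(2.0)
print(p == q, len({p, q}))      # False 2: distinct nodes stay distinct
```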
When to consult solutions/¶
Consult solutions/ only after all listed tests pass. At that point, solutions/01-implement-ops-ref.md (at phase open) shows the canonical structure of each op for comparison.
Next lab: lab/02-train-xor.md.