00 — Motivation

Don't skip this phase. 80% of the AI bugs you'll see in the next six months live here: views that look like copies, silent broadcasting between (N,) and (N,1), global seeds that undermine reproducibility. If they aren't internalized now, they come back again and again disguised as "the model doesn't converge".


Why Phase 6 exists at all

There's a strong temptation to skip "Python engineering" and dive directly into autograd. Don't.

Every single later phase in this curriculum will be written on top of NumPy. The way NumPy treats memory — when it copies, when it views, when it broadcasts silently — is the substrate Borja's autograd will sit on. The way Python treats references — what a = b actually does when b is a list, an array, a Tensor — is the substrate the autograd graph will sit on. The way the GIL interacts with NumPy's release-on-C-call semantics is what makes parallel data loading possible in Phase 18.

You can't see these substrates until you go looking for them. They are invisible by design: Python is a friendly language and NumPy a friendly library, and both hide their cost models. The price of that friendliness is that bugs also hide.
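A minimal sketch of the reference semantics mentioned above, using only a list and an array (the variable names are illustrative). It shows that plain assignment never copies, that an in-place operator mutates the shared object, and that rebinding a name creates a new one.

```python
import numpy as np

b_list = [1, 2, 3]
a_list = b_list            # no copy: two names bound to one list
a_list.append(4)
print(b_list)              # [1, 2, 3, 4]

b_arr = np.zeros(3)
a_arr = b_arr              # no copy either: same ndarray object
a_arr += 1.0               # in-place add, visible through b_arr
print(b_arr)               # [1. 1. 1.]

c_arr = b_arr
c_arr = c_arr + 1.0        # builds a new array and rebinds c_arr
print(b_arr)               # [1. 1. 1.]  (unchanged)
print(c_arr is b_arr)      # False
```

The difference between `a_arr += 1.0` and `c_arr = c_arr + 1.0` is the kind of detail that later decides whether a weight update touches the original buffer or a fresh one.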

The three kinds of bug this phase prevents

1. Silent shape bugs

The single most common AI bug, by far, is the shape mismatch that doesn't error. Consider:

import numpy as np

y_true = np.array([1.0, 2.0, 3.0])         # shape (3,)
y_pred = np.array([[1.1], [2.1], [3.1]])   # shape (3, 1)
loss = ((y_true - y_pred) ** 2).mean()     # runs without error

The user's intent: elementwise squared difference, with the mean as a scalar. What NumPy does: broadcasts (3,) and (3,1) to (3,3). Yes, really: broadcasting aligns shapes from the right, so (3,) is treated as (1,3) against (3,1), and the result is (3,3). The .mean() returns a scalar all right, but it's the wrong scalar: averaged over 9 entries instead of 3, including 6 nonsensical cross terms.

The code runs. The number is wrong. The model trains on garbage.

The only defense is knowing the broadcasting rule cold. Lab 02 of this phase forces Borja to produce this bug intentionally so the pattern is burned in.
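A minimal sketch of the bug and one possible guard. The `mse` helper is hypothetical (it is not one of this curriculum's utilities); it simply refuses to compute a loss when the two shapes differ, so the broadcast can't happen silently.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])          # shape (3,)
y_pred = np.array([[1.1], [2.1], [3.1]])    # shape (3, 1)

# Ask NumPy what the broadcast shape will be before trusting the result.
print(np.broadcast_shapes(y_true.shape, y_pred.shape))   # (3, 3)

diff = y_true - y_pred
print(diff.shape)                                         # (3, 3)
buggy_loss = float((diff ** 2).mean())                    # mean over 9 entries


def mse(y_true, y_pred):
    """Hypothetical helper: mean squared error that refuses to broadcast."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    return float(((y_true - y_pred) ** 2).mean())


# Flatten the (3, 1) column to (3,) so both operands really have the same shape.
fixed_loss = mse(y_true, y_pred.reshape(-1))
print(buggy_loss, fixed_loss)
```

Calling `mse(y_true, y_pred)` without the reshape raises a `ValueError`, which is the behaviour you want from a loss function: a loud failure instead of a plausible-looking number.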

2. Aliasing bugs

import numpy as np

a = np.arange(10)
b = a[::2]       # basic slice: a view onto a's buffer
b[0] = 999       # writes through to a
print(a[0])      # 999

b is a view, not a copy. b[0] = 999 writes through to a. If a was a model weight and b was supposed to be a temporary normalized version, the model is now corrupted in-place, silently.

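The only defense is knowing when NumPy copies and when it views. The working rule: basic indexing (slices, plus integer indices that still leave an array behind) returns a view; advanced indexing with arrays of indices or boolean arrays returns a copy, as does arr.copy() / np.copy. np.ascontiguousarray copies only when its input isn't already C-contiguous. arr.flags['OWNDATA'] tells you whether an array owns its buffer, and np.shares_memory(a, b) checks a specific pair directly.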
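The rule above can be checked rather than memorized. A minimal sketch, assuming a small 1-D integer array; the variable names are illustrative.

```python
import numpy as np

a = np.arange(10)

sliced = a[::2]             # basic slicing
fancy = a[[0, 2, 4]]        # integer-array (advanced) indexing
masked = a[a % 2 == 0]      # boolean-mask (advanced) indexing
explicit = a[::2].copy()    # explicit copy of a view

print(np.shares_memory(a, sliced))     # True  -> view
print(np.shares_memory(a, fancy))      # False -> copy
print(np.shares_memory(a, masked))     # False -> copy
print(np.shares_memory(a, explicit))   # False -> copy

# A view does not own its data and remembers where it came from.
print(sliced.flags["OWNDATA"], sliced.base is a)   # False True

# Writing to the explicit copy leaves the original untouched.
explicit[0] = 999
print(a[0])                            # 0
```

When a temporary is going to be modified in place, taking `.copy()` at the point of creation is usually cheaper than debugging a corrupted weight later.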

3. Cost-model surprises

total = 0.0
for x in arr:           # arr is np.ndarray of 10 million fp32
    total += x ** 2

This is the right math. It's the wrong cost. A Python loop over a NumPy array boxes each of the 10 million elements into a NumPy scalar object, and every one goes through object allocation, the ** dispatch, and the += dispatch. Expect seconds to tens of seconds, depending on the machine and NumPy version.

total = (arr ** 2).sum()

Same math, up to floating-point rounding. ~10 ms. Two to three orders of magnitude.

The only defense is internalizing that Python-level iteration over a NumPy array is malpractice, and recognizing it on sight in code review. Lab 03 makes Borja measure the ratio across array sizes so the rule has a number on it.
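A rough way to put a number on the ratio, as a sketch rather than a proper benchmark. It assumes a smaller array than the 10 million elements above so the loop finishes quickly, and it takes a single wall-clock sample with time.perf_counter; a real measurement would repeat the timing (for example with timeit) and sweep the array size.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
arr = rng.random(200_000, dtype=np.float32)   # assumed size, kept small on purpose

# Python-level loop: one NumPy scalar object per element.
start = time.perf_counter()
loop_total = 0.0
for x in arr:
    loop_total += x ** 2
loop_seconds = time.perf_counter() - start

# Vectorized: the same sum of squares, computed inside NumPy's C loops.
start = time.perf_counter()
vec_total = (arr ** 2).sum()
vec_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vec_seconds:.6f} s")
print(f"ratio:      {loop_seconds / vec_seconds:.0f}x")
```

The two totals agree only up to rounding, because the order and precision of the additions differ between the two versions.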

What "Python for AI engineering" actually means here

It does not mean teaching Python syntax — Borja is an experienced developer.

It means treating NumPy as a piece of systems software with its own memory model, its own performance contract, and its own surprising behaviors, and learning that systems software the way a security engineer learns a CPU: from the bottom up, with measurements, with intentional bug-production.

The three theory pages that follow this one walk that ladder:

  • 01-references-mutation-gil.md — Python's object model, reference semantics, GIL myths.
  • 02-strides-and-broadcasting.md — NumPy's (data, shape, strides, dtype, flags) quintuple; the broadcasting algorithm, formalized.
  • 03-vectorization-and-profiling.md — the cost model + four profilers, when each one earns its keep.

The labs make Borja produce, by hand, the bugs the theory warned about. The point is not to avoid bugs in lab — the point is to make the bugs in a controlled setting so they don't reappear in Phase 18 when the training-loop log is 80 lines and you can't tell which line is lying to you.

What this phase produces

Two small utility files that every later phase imports:

  • src/utils/seeding.py: a seed_everything(seed) that seeds Python's random module, seeds a NumPy default_rng generator, and sets PYTHONHASHSEED (which only takes effect in interpreter processes started afterwards, not in the one already running). Returns the seed for logging.
  • src/utils/logging.py: a get_logger(name) returning a structlog-backed logger that writes JSON with a phase field. Every later experiment uses this; no print statements past Phase 6.

Both are <40 LOC. The point isn't the code — the point is that from this phase forward, reproducibility and structured logging are infrastructure, not afterthoughts.
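One possible shape for the seeding utility, as a sketch under stated assumptions rather than the file the phase actually produces. The module-level `_rng` and the `get_rng()` accessor are hypothetical: the description above does not say how the seeded generator is handed to callers, so this version stores it and exposes it through a function.

```python
import os
import random

import numpy as np

_rng = None  # hypothetical module-level Generator, created by seed_everything


def seed_everything(seed: int) -> int:
    """Seed Python's random, build a NumPy Generator, export PYTHONHASHSEED."""
    global _rng
    random.seed(seed)
    _rng = np.random.default_rng(seed)
    # Affects only child interpreters: the running process fixed its hash seed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    return seed


def get_rng() -> np.random.Generator:
    """Hypothetical accessor for the Generator created by seed_everything."""
    if _rng is None:
        raise RuntimeError("call seed_everything(seed) first")
    return _rng


seed = seed_everything(1234)
first_draw = get_rng().random(3)

seed_everything(seed)
second_draw = get_rng().random(3)
print(np.array_equal(first_draw, second_draw))   # True
```

Passing an explicit Generator around, instead of relying on NumPy's global random state, keeps each experiment's randomness isolated from whatever other code happens to draw random numbers.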

One-paragraph recap

Phase 6 is the Python+NumPy engineering substrate that everything from Phase 7 onward sits on. The substrate is full of silent traps — broadcasting bugs, aliasing bugs, Python-loop overhead — and the only way to see them is to produce them intentionally with measurement. By the end of this phase, Borja owns two small utility modules (seeding, logging) and an internalized cost-and-shape model for any NumPy expression he'll write in the next 33 phases.


Next: 01-references-mutation-gil.md