Break: Silent broadcasting trap with a column vector
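The subtlest NumPy trap: adding a vector of shape (3,) to a (3, 4) matrix does not do what you think. The row-wise shape is (3, 1). Without the [:, None], broadcasting lines the vector up with the last axis instead. A (3, 4) matrix then raises an error, but a square (3, 3) matrix silently gets the bias column-wise: the numbers come out wrong and there is no error.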
Target: any §A13 logits batch + bias addition. Typical setup: logits has shape (3 persons, 4 features); a per-person bias has shape (3,). You want to add the bias to each row.
Hypothesis
The learner predicts: "logits + bias with logits.shape = (3, 4) and bias.shape = (3,) will not add the bias row-wise. NumPy aligns (3,) with the trailing axis (length 4); those lengths are incompatible, so the add raises an error. If the matrix is square, (3, 3), the trailing axis also has length 3, so the add succeeds silently but applies the bias to each column instead of each row."
The break
In a function that should add a per-row bias:
def add_row_bias(logits: np.ndarray, bias: np.ndarray) -> np.ndarray:
-    # logits: (P, F); bias: (P,) -> reshape to (P, 1) so broadcasting works
-    return logits + bias[:, None]
+    return logits + bias  # /break: relies on accidental shape alignment
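A defensive variant of the correct function can validate shapes before adding, so a bias of the wrong length fails loudly instead of broadcasting onto some other axis. A minimal sketch, assuming raising on bad input is acceptable; the name add_row_bias_checked is hypothetical and not part of the lab code:

```python
import numpy as np


def add_row_bias_checked(logits: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical helper: add bias[i] to every element of row i of logits."""
    if logits.ndim != 2 or bias.ndim != 1:
        raise ValueError(f"expected 2-D logits and 1-D bias, got {logits.shape=} {bias.shape=}")
    if bias.shape[0] != logits.shape[0]:
        # Compare against the row axis explicitly, never the trailing axis.
        raise ValueError(f"bias length {bias.shape[0]} != number of rows {logits.shape[0]}")
    # (P,) -> (P, 1): the length-1 trailing axis stretches across the F columns.
    return logits + bias[:, None]


logits = np.arange(12.0).reshape(3, 4)
bias = np.array([100.0, 200.0, 300.0])
print(add_row_bias_checked(logits, bias))
```

The length check also rejects a bias sized for the feature axis (for example (4,) against (3, 4)), which plain broadcasting would happily accept.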
Run procedure
uv run python -c "
import numpy as np
# Case A: P=3 persons, F=4 features. bias is per-person.
logits_A = np.array([[1., 2., 3., 4.],
[5., 6., 7., 8.],
[9.,10.,11.,12.]])
bias_A = np.array([100., 200., 300.]) # one per person
# Case B: P=3, F=3 (square). Same bias.
logits_B = np.array([[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.]])
bias_B = np.array([100., 200., 300.])
print('--- Case A (3x4) + (3,) ---')
try:
    print(logits_A + bias_A)
except Exception as e:
    print('CRASH:', e)
print('--- Case B (3x3) + (3,) ---')
print(logits_B + bias_B)
print('--- correct: row-wise add ---')
print(logits_B + bias_B[:, None])
"
Expected failure mode
--- Case A (3x4) + (3,) ---
CRASH: operands could not be broadcast together with shapes (3,4) (3,)
--- Case B (3x3) + (3,) ---
[[101. 202. 303.]
[104. 205. 306.]
[107. 208. 309.]] <-- biases applied COLUMN-WISE, not row-wise!
--- correct: row-wise add ---
[[101. 102. 103.]
[204. 205. 206.]
[307. 308. 309.]] <-- bias 100 to row 0, 200 to row 1, 300 to row 2
Case A crashes loudly (good: easy to catch). Case B is the trap: the shapes happen to line up because the matrix is square, so the length-3 bias matches the length-3 trailing axis and is applied column-wise instead of row-wise. No error, wrong answer. This is the bug that ships silently.
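To confirm what Case B actually computed, the broken add can be compared against explicit loops that add the bias per column and per row. A small sketch reusing the Case B numbers; the loop variables are illustrative only:

```python
import numpy as np

logits = np.array([[1., 2., 3.],
                   [4., 5., 6.],
                   [7., 8., 9.]])
bias = np.array([100., 200., 300.])

broken = logits + bias  # what the /break version returns

# Explicit column-wise add: bias[j] goes to every element of column j.
colwise = logits.copy()
for j in range(logits.shape[1]):
    colwise[:, j] += bias[j]

# Explicit row-wise add: bias[i] goes to every element of row i (the intent).
rowwise = logits.copy()
for i in range(logits.shape[0]):
    rowwise[i, :] += bias[i]

print(np.array_equal(broken, colwise))                  # True
print(np.array_equal(broken, rowwise))                  # False
print(np.array_equal(logits + bias[:, None], rowwise))  # True
```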
Diagnostic
From logs alone:
- Print the shapes of operands before every elementwise op, at least in dev: print(f'{logits.shape=} {bias.shape=}'). Catches the trap in 5 seconds.
- Write a known-answer test with a non-square shape. Square matrices hide many broadcasting bugs; rectangular ones expose them.
- Use np.broadcast_shapes(a.shape, b.shape) to see what shape NumPy will produce. If it disagrees with your mental model, fix the alignment. Note that it only reports the result shape: in the square case it returns (3, 3) whether the bias lands on rows or columns.
- Add a property test: for random (P, F) with P != F, check that add_row_bias(logits, bias).shape == (P, F) and that (add_row_bias(logits, bias) - logits)[i, :] == bias[i] for every row i (up to floating-point tolerance, e.g. with np.allclose).
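The known-answer and property checks from the list can be sketched with plain NumPy and assert, without a property-testing library. This assumes add_row_bias is the corrected side of the diff (copied below); the checker name, trial count, and size range are arbitrary choices:

```python
import numpy as np


def add_row_bias(logits: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # Corrected version from the diff: (P,) -> (P, 1), added along each row.
    return logits + bias[:, None]


def check_add_row_bias(trials: int = 100, seed: int = 0) -> None:
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        p, f = map(int, rng.integers(1, 8, size=2))
        if p == f:
            f += 1  # force a non-square shape so a misaligned bias cannot hide
        logits = rng.normal(size=(p, f))
        bias = rng.normal(size=p)
        out = add_row_bias(logits, bias)
        assert out.shape == (p, f)
        for i in range(p):
            # Every element of row i should have moved by bias[i] (float tolerance).
            assert np.allclose(out[i, :] - logits[i, :], bias[i])


# Known-answer test on a rectangular input (P=2, F=3).
known = add_row_bias(np.zeros((2, 3)), np.array([1.0, 2.0]))
assert np.array_equal(known, np.array([[1., 1., 1.], [2., 2., 2.]]))
check_add_row_bias()
```

Swapping in the /break version of add_row_bias should make this check fail, typically with NumPy's broadcast ValueError (or, when F = 1, on the shape assertion).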
Lesson
NumPy broadcasting aligns shapes from the rightmost axis, so a (3,) operand is matched against the last axis of the other operand. If that axis has length 4, the add raises a ValueError. If it has length 3 (the square case), the math "works" but the bias is added column-wise, not row-wise.
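The right-to-left alignment rule is small enough to write out. The sketch below is a hypothetical re-implementation (broadcast_shape is not a NumPy function) compared against NumPy's own np.broadcast_shapes, which assumes NumPy 1.20 or later:

```python
import numpy as np


def broadcast_shape(a: tuple, b: tuple) -> tuple:
    """Hypothetical re-implementation of NumPy's rule: align from the right."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same number of axes.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"incompatible axis lengths {da} and {db}")
    return tuple(out)


for a, b in [((3, 3), (3,)), ((3, 4), (3, 1)), ((3, 4), (3,))]:
    try:
        mine = broadcast_shape(a, b)
    except ValueError:
        mine = "error"
    try:
        numpys = np.broadcast_shapes(a, b)
    except ValueError:
        numpys = "error"
    print(a, b, mine, numpys)
```

The first pair is the trap in miniature: (3, 3) with (3,) is accepted because the rule only ever compares trailing axes, so nothing in the shapes says whether the bias was meant for rows or columns.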
The fix is a single indexing change: bias[:, None] (or bias.reshape(-1, 1), or bias[:, np.newaxis]). It reshapes (3,) to (3, 1), which broadcasts unambiguously against (3, 4) to give a (3, 4) result with bias[i] added across row i.
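The three spellings of the fix are interchangeable. A quick check on an arbitrary (3, 4) logits array:

```python
import numpy as np

logits = np.arange(12.0).reshape(3, 4)
bias = np.array([100.0, 200.0, 300.0])

# Each spelling turns the (3,) vector into a (3, 1) column.
col_none = bias[:, None]
col_reshape = bias.reshape(-1, 1)
col_newaxis = bias[:, np.newaxis]  # np.newaxis is an alias for None

for name, col in [("bias[:, None]", col_none),
                  ("bias.reshape(-1, 1)", col_reshape),
                  ("bias[:, np.newaxis]", col_newaxis)]:
    out = logits + col
    # Row i of out is row i of logits shifted by bias[i].
    print(f"{name:<20} {col.shape} -> {out.shape}")
```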
This is the same trap as Phase 8's tensor broadcasting (where backward gradients have to sum along the broadcast axis). Learn it here at the cost of a debugging session; in Phase 8 it would cost a 4-hour gradcheck failure to diagnose.
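The Phase 8 connection can be previewed with a finite-difference check. For out = logits + bias[:, None], each bias[i] was stretched across the F columns of row i, so the gradient of a scalar loss with respect to bias[i] is the upstream gradient summed along axis 1. A minimal sketch, assuming a toy loss sum(W * out) with a fixed random W, so the upstream gradient is exactly W:

```python
import numpy as np

rng = np.random.default_rng(0)
P, F = 3, 4
logits = rng.normal(size=(P, F))
bias = rng.normal(size=P)
W = rng.normal(size=(P, F))  # toy upstream gradient dLoss/dout


def loss(b: np.ndarray) -> float:
    out = logits + b[:, None]  # row-wise broadcast, shape (P, F)
    return float(np.sum(W * out))


# Analytic gradient: sum the upstream gradient over the broadcast (column) axis.
grad_analytic = W.sum(axis=1)

# Central finite differences, one bias entry at a time.
eps = 1e-4
grad_numeric = np.zeros(P)
for i in range(P):
    step = np.zeros(P)
    step[i] = eps
    grad_numeric[i] = (loss(bias + step) - loss(bias - step)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric))  # True
```

With the /break version on a square input the bias would instead be broadcast down the columns, and its gradient would be W.sum(axis=0): a different, equally plausible-looking vector.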
References
- NumPy broadcasting docs: https://numpy.org/doc/stable/user/basics.broadcasting.html — Figure 4 shows the alignment rule visually.
- The lab file lab/02-broadcasting-trap.md in this phase is built around exactly this failure mode.