

Break — Silent broadcasting trap with a column vector

The subtlest NumPy trap: adding a vector of shape (3,) to a (3, 4) matrix does not do what you think. The compatible shape is (3, 1). Without the [:, None], broadcasting aligns the bias with the wrong axis. A (3, 4) matrix raises an error, but a square (3, 3) matrix gives wrong numbers with no error at all.

Target: any §A13 code that adds a bias to a batch of logits. Typical setup: logits has shape (3 persons, 4 features); a per-person bias has shape (3,). You want to add bias[i] to every entry of row i.

Hypothesis

The learner predicts: "logits + bias where logits.shape = (3, 4) and bias.shape = (3,) will not add the bias row-wise. NumPy aligns (3,) with the trailing axis (length 4); since 3 and 4 differ and neither is 1, the shapes are incompatible and the add crashes. If the trailing axis also happens to have length 3 (square logits of shape (3, 3)), the add succeeds but adds bias[j] to column j: a wrong sum and no error."

The break

In a function that should add a per-row bias:

 def add_row_bias(logits: np.ndarray, bias: np.ndarray) -> np.ndarray:
-    # logits: (P, F); bias: (P,) -> reshape to (P, 1) so broadcasting works
-    return logits + bias[:, None]
+    return logits + bias    # /break: relies on accidental shape alignment
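For contrast, a minimal sketch of a defensive version of the fixed function, assuming logits is always 2-D (P, F) and bias always 1-D (P,). The explicit shape checks are an illustrative addition, not part of the original function: they make a mis-shaped bias fail with a clear message instead of relying on broadcasting to object.

```python
import numpy as np


def add_row_bias(logits: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Add bias[i] to every entry of row i of logits.

    Assumes logits has shape (P, F) and bias has shape (P,). The shape
    checks are illustrative: they make a mis-shaped bias fail loudly
    instead of broadcasting along the wrong axis.
    """
    if logits.ndim != 2 or bias.ndim != 1:
        raise ValueError(f"expected (P, F) logits and (P,) bias, got {logits.shape} and {bias.shape}")
    if bias.shape[0] != logits.shape[0]:
        raise ValueError(f"bias has {bias.shape[0]} entries but logits has {logits.shape[0]} rows")
    # (P,) -> (P, 1): the size-1 trailing axis stretches across the F columns.
    return logits + bias[:, None]


# Square input: the case where the unreshaped version silently goes column-wise.
logits = np.array([[1., 2., 3.],
                   [4., 5., 6.],
                   [7., 8., 9.]])
print(add_row_bias(logits, np.array([100., 200., 300.])))
# [[101. 102. 103.]
#  [204. 205. 206.]
#  [307. 308. 309.]]
```

The length check also rejects a (4,) bias passed to (3, 4) logits by mistake, which plain `logits + bias` would accept and add per column.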

Run procedure

uv run python -c "
import numpy as np

# Case A: P=3 persons, F=4 features. bias is per-person.
logits_A = np.array([[1., 2., 3., 4.],
                     [5., 6., 7., 8.],
                     [9.,10.,11.,12.]])
bias_A = np.array([100., 200., 300.])  # one per person

# Case B: P=3, F=3 (square). Same bias.
logits_B = np.array([[1., 2., 3.],
                     [4., 5., 6.],
                     [7., 8., 9.]])
bias_B = np.array([100., 200., 300.])

print('--- Case A (3x4) + (3,) ---')
try:
    print(logits_A + bias_A)
except Exception as e:
    print('CRASH:', e)

print('--- Case B (3x3) + (3,) ---')
print(logits_B + bias_B)
print('--- correct: row-wise add ---')
print(logits_B + bias_B[:, None])
"

Expected failure mode

--- Case A (3x4) + (3,) ---
CRASH: operands could not be broadcast together with shapes (3,4) (3,)

--- Case B (3x3) + (3,) ---
[[101. 202. 303.]
 [104. 205. 306.]
 [107. 208. 309.]]      <-- biases applied COLUMN-WISE, not row-wise!

--- correct: row-wise add ---
[[101. 102. 103.]
 [204. 205. 206.]
 [307. 308. 309.]]      <-- bias 100 to row 0, 200 to row 1, 300 to row 2

Case A crashes loudly (good: easy to catch). Case B is the trap: the shapes line up only because the matrix happens to be square, so bias[j] is added to column j instead of bias[i] to row i. No error, wrong answer. This is the bug that silently ships.
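To confirm this reading of Case B, a short check (reusing the Case B arrays from the run procedure) that compares the unreshaped add against explicit column-wise and row-wise versions:

```python
import numpy as np

logits_B = np.array([[1., 2., 3.],
                     [4., 5., 6.],
                     [7., 8., 9.]])
bias_B = np.array([100., 200., 300.])

broken = logits_B + bias_B                # (3,) is aligned with the LAST axis
column_wise = logits_B + bias_B[None, :]  # explicit (1, 3): bias[j] goes to column j
row_wise = logits_B + bias_B[:, None]     # explicit (3, 1): bias[i] goes to row i

print(np.array_equal(broken, column_wise))  # True: the silent result is a column-wise add
print(np.array_equal(broken, row_wise))     # False: not what add_row_bias should return
print(broken.shape == row_wise.shape)       # True: shapes alone cannot reveal the bug
```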

Diagnostic

From logs alone:

  1. Print the shapes of the operands before every elementwise op, at least during development: print(f'{logits.shape=} {bias.shape=}'). A 1-D bias next to 2-D logits stands out immediately; this catches the trap in seconds.
  2. Write a known-answer test with a non-square shape. Square matrices hide many broadcasting bugs; rectangular ones expose them.
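  3. Use np.broadcast_shapes(a.shape, b.shape) to see the shape NumPy will produce. If it disagrees with your mental model, fix the alignment. It cannot flag the square case, though: np.broadcast_shapes((3, 3), (3,)) is (3, 3), exactly the shape you expected, which is why item 2 matters.
  4. Add a property test: for random (P, F) with P != F, check that add_row_bias(logits, bias).shape == (P, F) and that every entry of row i of add_row_bias(logits, bias) - logits equals bias[i]. Compare with np.allclose rather than ==, since (x + b) - x can differ from b by floating-point rounding.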
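Items 2 and 4 can be sketched as plain pytest-style test functions. This assumes no property-testing library; random shapes come from NumPy's Generator, and the seed, trial count, and dimension range (1 to 8) are arbitrary choices.

```python
import numpy as np


def add_row_bias(logits: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # The fixed version from the diff: reshape (P,) to (P, 1) before adding.
    return logits + bias[:, None]


def test_known_answer() -> None:
    # Rectangular (2, 3): a (2,) bias cannot silently line up with 3 columns.
    logits = np.array([[0., 1., 2.],
                       [3., 4., 5.]])
    bias = np.array([10., 20.])
    expected = np.array([[10., 11., 12.],
                         [23., 24., 25.]])
    np.testing.assert_array_equal(add_row_bias(logits, bias), expected)


def test_row_bias_property(trials: int = 200, seed: int = 0) -> None:
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        P, F = (int(n) for n in rng.integers(1, 8, size=2))
        if P == F:
            F += 1  # force a rectangular shape; square shapes hide the bug
        logits = rng.normal(size=(P, F))
        bias = rng.normal(size=P)
        out = add_row_bias(logits, bias)
        assert out.shape == (P, F)
        diff = out - logits
        for i in range(P):
            # allclose rather than ==: (x + b) - x may differ from b by rounding
            assert np.allclose(diff[i, :], bias[i])


if __name__ == "__main__":
    test_known_answer()
    test_row_bias_property()
    print("all checks passed")
```

Swapping in the broken `logits + bias` makes the known-answer test raise, because shapes (2, 3) and (2,) are incompatible. That is the point of avoiding square fixtures.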

Lesson

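NumPy broadcasting aligns shapes from the rightmost axis: the shorter shape is padded with 1s on the left, and two aligned dimensions are compatible only if they are equal or one of them is 1. So a (3,) aligns with the last axis of the other operand. If that last axis has length 3 (for example, a square 3x3 matrix), the math "works" but is column-wise: bias[j] lands on column j, not on row j.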
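The rule fits in a few lines. Below is a hand-rolled sketch; the helper name `broadcast_shape` is ours, for illustration only, and NumPy's real function is `np.broadcast_shapes`, which the loop uses as a cross-check.

```python
import numpy as np


def broadcast_shape(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """Hypothetical helper: the broadcasting shape rule written out by hand."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both are aligned from the right.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"incompatible dimensions {x} and {y} in {a} vs {b}")
    return tuple(out)


for shapes in [((3, 3), (3,)), ((3, 4), (3, 1)), ((3, 1), (3,))]:
    # Cross-check against NumPy's own np.broadcast_shapes.
    assert broadcast_shape(*shapes) == np.broadcast_shapes(*shapes)
    print(shapes, "->", broadcast_shape(*shapes))

try:
    broadcast_shape((3, 4), (3,))  # trailing 4 vs 3: neither equal nor 1
except ValueError as e:
    print("error:", e)
```

The last loop case is a related trap worth knowing: a (3, 1) array plus a (3,) array broadcasts to (3, 3), not (3, 1).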

The fix is a single indexing change: bias[:, None] (or bias.reshape(-1, 1), or bias[:, np.newaxis]). It reshapes (3,) to (3, 1), whose size-1 trailing axis stretches across the 4 columns, so it broadcasts unambiguously against (3, 4) row-wise.
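Reusing the Case A arrays, which crashed without the reshape, a quick check that the three spellings agree and that the add now runs row-wise:

```python
import numpy as np

logits = np.array([[1., 2., 3., 4.],
                   [5., 6., 7., 8.],
                   [9., 10., 11., 12.]])
bias = np.array([100., 200., 300.])

# Three spellings of the same (3,) -> (3, 1) reshape; np.newaxis is an alias for None.
columns = [bias[:, None], bias.reshape(-1, 1), bias[:, np.newaxis]]
assert all(c.shape == (3, 1) for c in columns)

results = [logits + c for c in columns]
assert all(np.array_equal(results[0], r) for r in results)

print(results[0])
# [[101. 102. 103. 104.]
#  [205. 206. 207. 208.]
#  [309. 310. 311. 312.]]
```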

This is the same trap as Phase 8's tensor broadcasting (where backward gradients have to sum along the broadcast axis). Learn it here at the cost of a debugging session; in Phase 8 it would cost a 4-hour gradcheck failure to diagnose.
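The backward-pass side of the trap, in miniature. This is a sketch using a hypothetical `unbroadcast` helper (our name, not Phase 8's code), assuming the operand's shape broadcasts validly to the gradient's shape. Because each bias entry was reused across the axis it was stretched along, its gradient is the upstream gradient summed over that axis.

```python
import numpy as np


def unbroadcast(grad: np.ndarray, shape: tuple[int, ...]) -> np.ndarray:
    """Sum grad over the axes broadcasting stretched, so it matches `shape`.

    Hypothetical helper: assumes `shape` broadcasts validly to grad.shape.
    """
    # Axes the smaller operand never had (added on the left) are summed away.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Axes where the operand had size 1 but the output did not are summed, keeping the axis.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad


# Forward pass: out = logits + bias[:, None], with out of shape (3, 4).
grad_out = np.arange(12.).reshape(3, 4)        # stand-in for dL/d(out)
grad_bias = unbroadcast(grad_out, (3, 1)).reshape(-1)
print(grad_bias)  # row sums of grad_out: 6, 22, 38

# Square case: the broken (3,) bias was aligned with the last axis, so its
# gradient is summed over rows (column sums), not over features (row sums).
grad_sq = np.arange(9.).reshape(3, 3)
print(unbroadcast(grad_sq, (3,)))                # column sums: 9, 12, 15 (wrong quantity)
print(unbroadcast(grad_sq, (3, 1)).reshape(-1))  # row sums: 3, 12, 21 (what the bias needs)
```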

References