English · Español
Lab 01 — Strides and Views¶
Goal: see that
arr.Tis O(1) and thatnp.ascontiguousarray(arr.T)is O(n), by measurement. Produce, intentionally, an aliasing bug. Build muscle memory forflags.OWNDATA.Estimated time: 45–60 minutes.
Prereqs: lab 00.
What you produce¶
A directory experiments/06-strides-and-views/ containing:
measure.py— your benchmark script.aliasing.py— a 20-line script that produces the aliasing bug intentionally and prints proof.results.json— measured times forarr.Tvsnp.ascontiguousarray(arr.T)vsarr.copy()across sizes.transpose-cost.png— log-log plot of time vs array size for all three operations.manifest.json—{seed, versions, config, hardware}.README.md(3–4 paragraphs) — what you measured, what the aliasing bug demonstrated, three sentences of interpretation.
TODOs¶
Block A — measure transpose vs contiguous-transpose¶
- Allocate
a = np.random.default_rng(42).standard_normal((N, N), dtype=np.float32)forN ∈ {64, 128, 256, 512, 1024, 2048, 4096}(7 sizes). - For each
N, time three operations (repeat each ≥10× after one warm-up): b = a.T(just relabel)c = np.ascontiguousarray(a.T)(copy with new layout)d = a.copy()(copy without transposing — should match (2) asymptotically)- Record
N, n_bytes (= 4·N²), op, time_ns_per_call. Save toresults.json. - Assert in your script:
b.base is a,c.base is not a,c.flags.C_CONTIGUOUS is True,b.flags.C_CONTIGUOUS is False.
Block B — plot¶
- matplotlib. x-axis:
n_bytesin MiB (log scale). y-axis: time in ms (log scale). - Three lines:
a.T,ascontiguousarray(a.T),a.copy(). Markers + lines. -
a.Tshould be flat near 1 μs (independent ofN). The other two should grow linearly withn_bytes. - Save as
transpose-cost.png.
Block C — aliasing bug, on purpose¶
In aliasing.py:
- Allocate
a = np.arange(20, dtype=np.int32). - Take
b = a[::2](every other element). - Assert
b.flags.OWNDATA is Falseandb.base is a. - Mutate:
b[0] = -999. - Print
a— show thata[0]is now-999. - Then show the safe pattern:
c = a[::2].copy(). Mutatec. Show thatais unaffected. - Write a 1-sentence comment at the top of
aliasing.pydescribing the bug.
Block D — as_strided demo (optional, exploratory)¶
If you have time, use numpy.lib.stride_tricks.sliding_window_view to create a length-3 rolling-window view of a 10-element array. Print it. Confirm via np.shares_memory that it shares the base buffer. Do not write into it.
Block E — manifest¶
Standard schema (see lab 00 for the template). Include in config:
"sizes_N": [64, 128, 256, 512, 1024, 2048, 4096],
"ops": ["transpose", "ascontiguousarray_T", "copy"],
"warmup_iters": 1,
"min_repeats": 10
Constraints¶
- fp32, not fp64. We want to match later phases' tensor dtype.
- Same RNG seed for all sizes. Reproducibility.
- CPU governor
performance. As in Phase 1 lab. - No threading. Single-threaded measurement.
Expected results (rough, for the i5-8250U)¶
| Op | N=1024 | N=4096 |
|---|---|---|
a.T |
~1 μs | ~1 μs |
np.ascontiguousarray(a.T) |
~3 ms | ~70 ms |
a.copy() |
~2 ms | ~30 ms |
The first row should be flat. The other two should scale roughly with N². ascontiguousarray(a.T) is slower than plain copy() because the transpose walks the buffer in non-unit-stride order (poor cache behavior).
Stop conditions¶
Done when:
transpose-cost.pngshows the flat-vs-linear contrast clearly.aliasing.pyruns end-to-end and the printedaarray reflects the write throughb.README.mdinterprets the curve shape (one sentence per line: "a.Tis flat because…", "ascontiguousarray(a.T)scales linearly because…", "copy()is slightly faster because…").
Pitfalls¶
- JIT warmup. First call to
np.ascontiguousarraymay include library load time. Run one warm-up before the timing loop. - GC during timing. Wrap timed blocks with
gc.disable()/gc.enable()for repeatability. - Misleading large-N numbers from swap. A 4096×4096 fp32 is 64 MiB — fine on 62 GiB RAM. But if you push to N=16384 (1 GiB), make sure you're not hitting swap. Watch
free -h. b = a.T; b[0, 0] = 99should mutatea. If it doesn't, you accidentally took a copy somewhere. Checkb.base.- Forgetting OWNDATA proof. The whole point of the aliasing block is to make the bug visible. Print
b.flagsbefore the mutation.
When to consult solutions/¶
After your transpose-cost.png exists, your aliasing.py runs, and your README.md has the three interpretation sentences. Then solutions/01-strides-and-views-ref.md (written at phase open) shows the reference timings and the canonical aliasing-bug walkthrough.
Next lab: lab/02-broadcasting-trap.md.