Lab 01 — Sinusoidal Positional Encoding¶
Goal: implement sinusoidal PE; visualize it as a heatmap; plot how the dot product between encodings decays with distance.
Estimated time: 45–60 minutes.
Prereq: lab 00 committed; theory/01-sinusoidal.md read.
What you produce¶
A directory experiments/16-sinusoidal/ containing:
- sinusoidal.py: your implementation, in src/minimodel/positional/sinusoidal.py.
- visualize.py: script that builds PE and plots it.
- pe_heatmap.png: heatmap of the PE matrix.
- dot_product_decay.png: \(\text{PE}(0) \cdot \text{PE}(p)\) vs \(p\).
- manifest.json.
- README.md.
TODOs¶
Block A — implementation¶
- In src/minimodel/positional/sinusoidal.py, implement sinusoidal_pe(T: int, d: int) -> np.ndarray per the spec in src/minimodel/positional/BLUEPRINT.md.
- Six lines of body. Vectorized over T and d/2.
- Assert d % 2 == 0.
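The sketch below shows one possible shape for Block A. It assumes the interleaved sin/cos layout and the frequency formula listed under Pitfalls; BLUEPRINT.md remains the authoritative spec, and details such as the float64 dtype are assumptions made here for illustration.

```python
import numpy as np


def sinusoidal_pe(T: int, d: int) -> np.ndarray:
    """Return a (T, d) sinusoidal positional-encoding matrix (illustrative sketch)."""
    assert d % 2 == 0, "d must be even so dimensions pair up as (sin, cos)"
    positions = np.arange(T, dtype=np.float64)[:, None]  # (T, 1)
    k = np.arange(d // 2, dtype=np.float64)               # (d/2,)
    omega = 1.0 / 10000.0 ** (2.0 * k / d)                # (d/2,) frequencies
    angles = positions * omega                            # (T, d/2) via broadcasting
    pe = np.empty((T, d), dtype=np.float64)               # dtype is an assumption
    pe[:, 0::2] = np.sin(angles)                          # even columns: sin
    pe[:, 1::2] = np.cos(angles)                          # odd columns: cos
    return pe
```

A quick sanity check is that row 0 alternates 0, 1, 0, 1, ... because every angle is zero at position 0.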
Block B — heatmap¶
- Build pe = sinusoidal_pe(T=64, d=32).
- Plot as a heatmap: rows = positions, columns = dimensions, color = PE value.
- Use cmap='RdBu_r', range [-1, 1].
- Annotate the high-frequency columns (left) and low-frequency columns (right).
- Save as pe_heatmap.png.
Expected: the leftmost columns oscillate fast (clear stripes across the 64 plotted positions); the rightmost columns are nearly constant over the 64 positions.
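One way the Block B steps could look in visualize.py is sketched below. The function name plot_pe_heatmap, the figure size, and the text placement are illustrative choices, not part of the lab spec; the compact sinusoidal_pe is included only so the snippet runs on its own, and you would use your own implementation instead.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np


def sinusoidal_pe(T, d):
    # Compact stand-in for your Block A implementation.
    angles = np.arange(T)[:, None] / 10000.0 ** (2.0 * np.arange(d // 2) / d)
    pe = np.empty((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def plot_pe_heatmap(pe, path="pe_heatmap.png"):
    """Save a heatmap with rows = positions, columns = dimensions."""
    fig, ax = plt.subplots(figsize=(8, 6))
    im = ax.imshow(pe, cmap="RdBu_r", vmin=-1.0, vmax=1.0,
                   aspect="auto", interpolation="nearest")
    ax.set_xlabel("dimension")
    ax.set_ylabel("position")
    fig.colorbar(im, ax=ax, label="PE value")
    # Labels above the plot, in axes coordinates (0 = left edge, 1 = right edge).
    ax.text(0.02, 1.02, "high frequency", transform=ax.transAxes, ha="left")
    ax.text(0.98, 1.02, "low frequency", transform=ax.transAxes, ha="right")
    out = Path(path)
    fig.savefig(out, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return out


if __name__ == "__main__":
    plot_pe_heatmap(sinusoidal_pe(T=64, d=32))
```

Fixing vmin and vmax at -1 and 1 keeps the colormap centered on zero, so white cells correspond to PE values near 0.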
Block C — dot-product decay¶
- Build pe = sinusoidal_pe(T=128, d=32).
- Compute decay[p] = pe[0] @ pe[p] for \(p = 0, 1, \ldots, 127\).
- Plot decay vs \(p\), linear scales.
- Annotate the peak at \(p = 0\) (value \(d/2\) exactly, since each sin/cos pair contributes \(\sin^2 + \cos^2 = 1\); here that is 16), and the oscillating decay.
- Save as dot_product_decay.png.
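Expected: a high peak at \(p = 0\), then an oscillating decay that levels off well below the peak by about \(p = 100\). The curve need not reach zero: the lowest-frequency pairs barely rotate over 128 positions, so each of them still contributes close to 1 to the dot product. The shape illustrates the "nearby positions look similar, distant positions look less related" property.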
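A sketch of the Block C computation follows. Because pe[0] is 0 in every sin slot and 1 in every cos slot, decay[p] reduces to \(\sum_k \cos(\omega_k p)\), which the code uses as a cross-check. The helper names dot_product_decay, first_zero_crossing, and plot_decay are hypothetical, and the compact sinusoidal_pe stands in for your own implementation.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np


def sinusoidal_pe(T, d):
    # Compact stand-in for your Block A implementation.
    angles = np.arange(T)[:, None] / 10000.0 ** (2.0 * np.arange(d // 2) / d)
    pe = np.empty((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def dot_product_decay(pe):
    """decay[p] = pe[0] @ pe[p] for every position p, in one matrix-vector product."""
    return pe @ pe[0]


def first_zero_crossing(decay):
    """Index of the first non-positive value, or None if the curve stays positive."""
    idx = np.nonzero(decay <= 0.0)[0]
    return int(idx[0]) if idx.size else None


def plot_decay(decay, path="dot_product_decay.png"):
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(np.arange(decay.size), decay)
    ax.axhline(0.0, color="gray", linewidth=0.8)  # zero reference line
    ax.annotate(f"peak = {decay[0]:.0f}", xy=(0, decay[0]),
                xytext=(15, decay[0]), arrowprops={"arrowstyle": "->"})
    ax.set_xlabel("p")
    ax.set_ylabel("PE(0) . PE(p)")
    out = Path(path)
    fig.savefig(out, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return out


if __name__ == "__main__":
    decay = dot_product_decay(sinusoidal_pe(T=128, d=32))
    plot_decay(decay)
    print("first zero crossing:", first_zero_crossing(decay))
```

Returning None from first_zero_crossing maps directly onto a null entry in manifest.json when the curve never reaches zero.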
Block D — verify the linear-shift property¶
The theory file proves that \(\text{PE}(p + \Delta)\) is a fixed rotation of \(\text{PE}(p)\): within each (sin, cos) pair, the rotation depends only on \(\Delta\), not on \(p\). Verify numerically:
- Build pe = sinusoidal_pe(T=10, d=8).
- For each pair (k = 0, 1, 2, 3) of dimensions \((2k, 2k+1)\), compute the rotation matrix from PE(0) to PE(1), and call it \(R_k\).
- Verify that the same \(R_k\) takes PE(1) → PE(2), PE(2) → PE(3), etc., for all positions.
- Assert max diff < 1e-5.
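The check above can be sketched as follows, assuming the interleaved layout, so that pair \(k\) at position \(p\) is \((\sin \omega_k p, \cos \omega_k p)\). Since PE(0) is \((0, 1)\) in every pair, PE(1) directly gives \(s = \sin \omega_k\) and \(c = \cos \omega_k\), and the angle-addition formulas give \(R_k = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}\). The function name max_linear_shift_diff is hypothetical, and the compact sinusoidal_pe stands in for your own implementation.

```python
import numpy as np


def sinusoidal_pe(T, d):
    # Compact stand-in for your Block A implementation.
    angles = np.arange(T)[:, None] / 10000.0 ** (2.0 * np.arange(d // 2) / d)
    pe = np.empty((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def max_linear_shift_diff(pe):
    """Largest error when R_k (built from PE(1)) maps PE(p) to PE(p + 1)."""
    T, d = pe.shape
    pairs = pe.reshape(T, d // 2, 2)        # [..., 0] = sin, [..., 1] = cos
    s, c = pairs[1, :, 0], pairs[1, :, 1]   # sin(omega_k), cos(omega_k)
    # R[k] = [[c, s], [-s, c]], stacked into shape (d/2, 2, 2).
    R = np.stack([np.stack([c, s], axis=-1),
                  np.stack([-s, c], axis=-1)], axis=-2)
    # predicted[p, k] = R[k] @ pairs[p, k] for p = 0 .. T-2.
    predicted = np.einsum("kij,pkj->pki", R, pairs[:-1])
    return float(np.abs(predicted - pairs[1:]).max())


if __name__ == "__main__":
    diff = max_linear_shift_diff(sinusoidal_pe(T=10, d=8))
    assert diff < 1e-5
    print("linear-shift max diff:", diff)
```

The printed value is the number Block E asks you to report and the manifest stores as linear_shift_max_diff.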
Block E — write up¶
In README.md:
- Describe the heatmap. What pattern do you see in the high-frequency vs low-frequency columns?
- Describe the dot-product decay. Does it cross zero within the plotted range, and if so, where? Where does it stabilize?
- Confirm the linear-shift property. State the max diff from Block D.
Block F — manifest¶
{
  "experiment": "16-sinusoidal",
  "date": "YYYY-MM-DD",
  "seed": 0,
  "versions": { "python": "3.11.x", "numpy": "X.Y.Z", "matplotlib": "X.Y.Z" },
  "config": {
    "T_heatmap": 64,
    "d_heatmap": 32,
    "T_decay": 128,
    "d_decay": 32
  },
  "results_summary": {
    "linear_shift_max_diff": null,
    "decay_zero_crossing_at_p": null
  }
}
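If you would rather generate the manifest than fill it in by hand, a sketch like the one below could work. The helper names build_manifest and write_manifest are hypothetical, and reading versions through importlib.metadata is one option among several; the field names and config values are copied from the template above.

```python
import json
import platform
from datetime import date
from importlib import metadata


def _installed_version(package):
    """Version string of an installed package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def build_manifest(linear_shift_max_diff=None, decay_zero_crossing_at_p=None):
    """Assemble the manifest dict; None values serialize to JSON null."""
    return {
        "experiment": "16-sinusoidal",
        "date": date.today().isoformat(),  # YYYY-MM-DD
        "seed": 0,
        "versions": {
            "python": platform.python_version(),
            "numpy": _installed_version("numpy"),
            "matplotlib": _installed_version("matplotlib"),
        },
        "config": {"T_heatmap": 64, "d_heatmap": 32, "T_decay": 128, "d_decay": 32},
        "results_summary": {
            "linear_shift_max_diff": linear_shift_max_diff,
            "decay_zero_crossing_at_p": decay_zero_crossing_at_p,
        },
    }


def write_manifest(manifest, path="manifest.json"):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
        f.write("\n")
```

Passing the measured values from Blocks C and D into build_manifest keeps the results_summary in sync with what the scripts actually computed.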
Constraints¶
- No PyTorch.
- Vectorized. No for p in range(T) loop in the PE generator.
Stop conditions¶
Done when:
- All six files committed.
- Linear-shift assertion passes.
- Both plots saved.
- README.md answers all three Block E questions.
Pitfalls¶
- Frequency formula. omega_k = 1 / 10000**(2*k/d) for k = 0, ..., d/2 - 1. Note that the exponent depends on \(d\) (the total model dimension), not on \(d_\text{head}\). Easy to get wrong.
- Sin/cos interleaving. pe[:, 0::2] = sin(...), pe[:, 1::2] = cos(...). Some implementations split pe[:, :d//2] = sin, pe[:, d//2:] = cos instead. The two layouts are equivalent up to a permutation of dimensions, but you need to be consistent with how the model consumes it.
- omega_k shape. (d/2,), broadcast with positions (T, 1) to get angles (T, d/2).
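The shape and layout pitfalls can be checked in a few lines. The snippet below is an illustrative sketch with small, arbitrary sizes (T = 6, d = 8); it confirms the broadcast shapes and shows that the interleaved and split layouts differ only by a column permutation, which leaves all pairwise dot products unchanged.

```python
import numpy as np

T, d = 6, 8  # arbitrary small sizes for illustration
positions = np.arange(T, dtype=np.float64)[:, None]       # (T, 1)
omega = 1.0 / 10000.0 ** (2.0 * np.arange(d // 2) / d)    # (d/2,)
angles = positions * omega                                 # (T, d/2) via broadcasting
assert positions.shape == (T, 1)
assert omega.shape == (d // 2,)
assert angles.shape == (T, d // 2)

# Interleaved layout: sin in even columns, cos in odd columns.
interleaved = np.empty((T, d))
interleaved[:, 0::2] = np.sin(angles)
interleaved[:, 1::2] = np.cos(angles)

# Split layout: all sin columns first, then all cos columns.
split = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Reordering columns (evens first, then odds) turns one layout into the other.
perm = np.concatenate([np.arange(0, d, 2), np.arange(1, d, 2)])
assert np.array_equal(interleaved[:, perm], split)

# A column permutation does not change dot products between positions.
assert np.allclose(interleaved @ interleaved.T, split @ split.T)
```

The layouts stop being interchangeable as soon as downstream code indexes specific columns, for example when Block D pairs dimensions \((2k, 2k+1)\).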
When to consult solutions/¶
After all six files are committed and the linear-shift assertion passes. Solution at solutions/01-sinusoidal-ref.md.
Next lab: 02-rope-implementation.md.