Lab 01 — Sinusoidal Positional Encoding

Goal: implement sinusoidal PE, visualize it as a heatmap, and plot how the dot product between encodings decays with distance.

Estimated time: 45–60 minutes.

Prereq: lab 00 committed; theory/01-sinusoidal.md read.

What you produce

A directory experiments/16-sinusoidal/ containing:

sinusoidal.py — your implementation, in src/minimodel/positional/sinusoidal.py.
visualize.py — script that builds PE and plots it.
pe_heatmap.png — heatmap of the PE matrix.
dot_product_decay.png — \(\text{PE}(0) \cdot \text{PE}(p)\) vs \(p\).
manifest.json.
README.md.

TODOs

Block A — implementation

In src/minimodel/positional/sinusoidal.py, implement sinusoidal_pe(T: int, d: int) -> np.ndarray per the spec in src/minimodel/positional/BLUEPRINT.md.
The body should take about six lines, vectorized over both T and d/2.
Assert d % 2 == 0.

Block B — heatmap

Build pe = sinusoidal_pe(T=64, d=32).
Plot as a heatmap: rows = positions, columns = dimensions, color = PE value.
Use cmap='RdBu_r', range [-1, 1].
Annotate the high-frequency columns (left) and low-frequency columns (right).
Save as pe_heatmap.png.

Expected: the leftmost columns oscillate fast, showing several full stripes within the 64 positions; the rightmost columns are nearly constant over the same range.
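The plotting steps above can be sketched as follows. This is a minimal sketch, not the reference visualize.py: the helper name plot_pe_heatmap is hypothetical, the headless Agg backend is an assumption about where the script runs, and the PE matrix is inlined as a stand-in for your own sinusoidal_pe(T=64, d=32) so the snippet runs on its own.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_pe_heatmap(pe: np.ndarray, path: Path) -> None:
    """Save a positions-by-dimensions heatmap of pe (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(8, 6))
    # Rows are positions, columns are dimensions; colors are pinned to [-1, 1].
    im = ax.imshow(pe, cmap="RdBu_r", vmin=-1.0, vmax=1.0, aspect="auto")
    ax.set_xlabel("dimension")
    ax.set_ylabel("position")
    ax.set_title("high frequency (left)  ->  low frequency (right)")
    fig.colorbar(im, ax=ax, label="PE value")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)


# Stand-in for sinusoidal_pe(T=64, d=32), inlined so the snippet runs alone.
T, d = 64, 32
angles = np.arange(T)[:, None] / 10000 ** (2 * np.arange(d // 2) / d)
pe = np.empty((T, d))
pe[:, 0::2] = np.sin(angles)
pe[:, 1::2] = np.cos(angles)

out_path = Path("pe_heatmap.png")
plot_pe_heatmap(pe, out_path)
```

Pinning vmin and vmax keeps white at exactly zero, so the near-constant right-hand columns are easy to tell apart from the striped left-hand ones.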

Block C — dot-product decay

Build pe = sinusoidal_pe(T=128, d=32).
Compute decay[p] = pe[0] @ pe[p] for \(p = 0, 1, \ldots, 127\).
Plot decay vs \(p\), linear scales.
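Annotate the peak at \(p = 0\), where the value is exactly \(d/2\) (16 here) because each sin/cos pair contributes \(\sin^2 0 + \cos^2 0 = 1\), and the oscillating decay that follows.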
Save as dot_product_decay.png.

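Expected: a high peak at \(p = 0\), then an oscillating decline as the high-frequency pairs drift out of phase. The curve does not have to reach zero within 128 positions: with \(d = 32\), the lowest-frequency pairs rotate by well under a radian over that range, so each still contributes close to 1. The shape illustrates the "nearby positions look similar, distant positions look less similar" property.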
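As a sanity check on the curve before plotting it, the dot product can be compared with its closed form: because PE(0) is 0 in every sin column and 1 in every cos column, decay[p] reduces to the sum over pairs of cos(omega_k * p). The sketch below assumes the interleaved layout and frequency formula from the Pitfalls section, and inlines the PE matrix as a stand-in for your own sinusoidal_pe(T=128, d=32).

```python
import numpy as np

T, d = 128, 32
omega = 1.0 / 10000 ** (2 * np.arange(d // 2) / d)  # shape (d/2,)
angles = np.arange(T)[:, None] * omega              # shape (T, d/2)

# Stand-in for sinusoidal_pe(T=128, d=32), inlined so the snippet runs alone.
pe = np.empty((T, d))
pe[:, 0::2] = np.sin(angles)
pe[:, 1::2] = np.cos(angles)

decay = pe @ pe[0]                        # decay[p] = pe[0] . pe[p], shape (T,)
closed_form = np.cos(angles).sum(axis=1)  # sum over k of cos(omega_k * p)

assert np.allclose(decay, closed_form)
assert np.isclose(decay[0], d / 2)        # each pair contributes exactly 1 at p = 0
```

Plotting decay against np.arange(T) then gives the figure for dot_product_decay.png.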

Block D — verify the linear-shift property

The theory file proves that \(\text{PE}(p + \Delta)\) is a fixed rotation of \(\text{PE}(p)\), one that depends on \(\Delta\) but not on \(p\). Verify numerically:

Build pe = sinusoidal_pe(T=10, d=8).
For each pair index \(k = 0, 1, 2, 3\), take dimensions \((2k, 2k+1)\) and compute the 2×2 rotation matrix that maps that pair at PE(0) to the same pair at PE(1). Call it \(R_k\).
Verify that the same \(R_k\) takes PE(1) → PE(2), PE(2) → PE(3), etc., for all positions.
Assert max diff < 1e-5.
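One way to carry out this check is sketched below. It is an illustration under assumptions, not the reference solution: the PE matrix is inlined as a stand-in for your own sinusoidal_pe(T=10, d=8), and each \(R_k\) is recovered from the dot product and 2-D cross product of the unit vectors at PE(0) and PE(1), which is one of several valid ways to obtain it.

```python
import numpy as np

# Stand-in for sinusoidal_pe(T=10, d=8), inlined so the snippet runs alone.
T, d = 10, 8
omega = 1.0 / 10000 ** (2 * np.arange(d // 2) / d)
angles = np.arange(T)[:, None] * omega
pe = np.empty((T, d))
pe[:, 0::2] = np.sin(angles)
pe[:, 1::2] = np.cos(angles)

max_diff = 0.0
for k in range(d // 2):
    pair = pe[:, 2 * k : 2 * k + 2]  # row p is (sin, cos) of omega_k * p
    # Every row is a unit vector, so the rotation from PE(0) to PE(1) is
    # fixed by their dot product (cosine) and 2-D cross product (sine).
    (x0, y0), (x1, y1) = pair[0], pair[1]
    c = x0 * x1 + y0 * y1
    s = x0 * y1 - y0 * x1
    R_k = np.array([[c, -s], [s, c]])
    # Apply the same R_k to PE(p) for p = 0..T-2 and compare with PE(p + 1).
    predicted = pair[:-1] @ R_k.T
    max_diff = max(max_diff, float(np.abs(predicted - pair[1:]).max()))

assert max_diff < 1e-5
```

With sin in the even column and cos in the odd one, this works out to \(R_k = \begin{pmatrix} \cos\omega_k & \sin\omega_k \\ -\sin\omega_k & \cos\omega_k \end{pmatrix}\), which follows from the angle-addition identities.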

Block E — write up

In README.md:

Describe the heatmap. What pattern do you see in the high-frequency vs low-frequency columns?
Describe the dot-product decay. Does it cross zero for \(p < 128\), and if so where? Around what value does it stabilize?
Confirm the linear-shift property. State the max diff from Block D.

Block F — manifest

{
  "experiment": "16-sinusoidal",
  "date": "YYYY-MM-DD",
  "seed": 0,
  "versions": { "python": "3.11.x", "numpy": "X.Y.Z", "matplotlib": "X.Y.Z" },
  "config": {
    "T_heatmap": 64,
    "d_heatmap": 32,
    "T_decay": 128,
    "d_decay": 32
  },
  "results_summary": {
    "linear_shift_max_diff": null,
    "decay_zero_crossing_at_p": null
  }
}
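The version fields can be filled in programmatically rather than by hand. The snippet below is a sketch that assumes the script runs from inside experiments/16-sinusoidal/; the two results stay None (JSON null) until you copy in the numbers from Blocks C and D.

```python
import json
import platform
from datetime import date
from pathlib import Path

import matplotlib
import numpy as np

manifest = {
    "experiment": "16-sinusoidal",
    "date": date.today().isoformat(),
    "seed": 0,
    "versions": {
        "python": platform.python_version(),
        "numpy": np.__version__,
        "matplotlib": matplotlib.__version__,
    },
    "config": {"T_heatmap": 64, "d_heatmap": 32, "T_decay": 128, "d_decay": 32},
    "results_summary": {
        "linear_shift_max_diff": None,     # fill in from Block D
        "decay_zero_crossing_at_p": None,  # fill in from Block C
    },
}

# Assumption: the current working directory is the experiment directory.
Path("manifest.json").write_text(json.dumps(manifest, indent=2) + "\n")
```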

Constraints

No PyTorch.
Vectorized. No for p in range(T) loop in the PE generator.

Stop conditions

Done when:

All six files committed.
Linear-shift assertion passes.
Both plots saved.
README.md answers all three Block E questions.

Pitfalls

Frequency formula. omega_k = 1 / 10000**(2*k/d), where k = 0, ..., d/2 - 1 indexes the sin/cos pair rather than the column. Note that the exponent depends on \(d\) (the total model dimension), not on \(d_\text{head}\). Easy to get wrong.
Sin/cos interleaving. pe[:, 0::2] = sin(...), pe[:, 1::2] = cos(...). Some implementations instead split pe[:, :d//2] = sin, pe[:, d//2:] = cos. The two layouts are equivalent up to a permutation of dimensions, but you need to be consistent with how the model consumes it.
omega_k shape. omega has shape (d/2,); broadcast it against positions of shape (T, 1) to get angles of shape (T, d/2).
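Put together, the three pitfalls above suggest an implementation along the following lines. Treat it as a minimal sketch that assumes the spec in BLUEPRINT.md matches the formula and interleaved layout described here; where the two disagree, the spec takes precedence.

```python
import numpy as np


def sinusoidal_pe(T: int, d: int) -> np.ndarray:
    """Return a (T, d) matrix with sin in even columns and cos in odd columns."""
    assert d % 2 == 0, "d must be even so every sin column has a cos partner"
    k = np.arange(d // 2)               # pair index, shape (d/2,)
    omega = 1.0 / 10000 ** (2 * k / d)  # exponent uses the full model dimension d
    positions = np.arange(T)[:, None]   # shape (T, 1)
    angles = positions * omega          # broadcasts to (T, d/2)
    pe = np.empty((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```

Reshaping positions to (T, 1) is what lets a single multiplication cover every position and frequency at once, which satisfies the no-loop constraint.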

When to consult `solutions/`

Only after all six files are committed and the linear-shift assertion passes. The solution is at solutions/01-sinusoidal-ref.md.

Next lab: 02-rope-implementation.md.