Lab 01 — Sinusoidal Positional Encoding

Goal: implement sinusoidal PE; visualize it as a heatmap; plot the dot-product-decay-with-distance property.

Estimated time: 45–60 minutes.

Prereq: lab 00 committed; theory/01-sinusoidal.md read.


What you produce

A directory experiments/16-sinusoidal/ containing:

  • sinusoidal.py — your implementation, in src/minimodel/positional/sinusoidal.py.
  • visualize.py — script that builds PE and plots it.
  • pe_heatmap.png — heatmap of the PE matrix.
  • dot_product_decay.png — plot of \(\text{PE}(0) \cdot \text{PE}(p)\) vs \(p\).
  • manifest.json.
  • README.md.

TODOs

Block A — implementation

  • In src/minimodel/positional/sinusoidal.py, implement sinusoidal_pe(T: int, d: int) -> np.ndarray per the spec in src/minimodel/positional/BLUEPRINT.md.
  • Six lines of body. Vectorized over T and d/2.
  • Assert d % 2 == 0.
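A minimal sketch of what such a function could look like. The authoritative spec is BLUEPRINT.md, which is not reproduced here, so this assumes the interleaved sin/cos layout and the base-10000 frequency formula listed under Pitfalls.

```python
import numpy as np


def sinusoidal_pe(T: int, d: int) -> np.ndarray:
    """Return a (T, d) matrix: sin in even columns, cos in odd columns."""
    assert d % 2 == 0, "d must be even"
    positions = np.arange(T)[:, None]        # (T, 1)
    k = np.arange(d // 2)                    # (d/2,)
    omega = 1.0 / 10000 ** (2 * k / d)       # (d/2,), one frequency per pair
    angles = positions * omega               # (T, d/2) via broadcasting
    pe = np.empty((T, d))
    pe[:, 0::2] = np.sin(angles)             # even columns
    pe[:, 1::2] = np.cos(angles)             # odd columns
    return pe
```

Position 0 makes a quick sanity check: every sine column is 0 and every cosine column is 1 there.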

Block B — heatmap

  • Build pe = sinusoidal_pe(T=64, d=32).
  • Plot as a heatmap: rows = positions, columns = dimensions, color = PE value.
  • Use cmap='RdBu_r', range [-1, 1].
  • Annotate the high-frequency columns (left) and low-frequency columns (right).
  • Save as pe_heatmap.png.

Expected: the leftmost columns oscillate fast (several full stripes within the 64 positions); the rightmost columns are nearly constant over the 64 positions (sine columns near 0, cosine columns near 1).
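One possible shape for the plotting half of visualize.py, as a sketch only. The helper name plot_pe_heatmap, the figure size, and the annotation coordinates are illustrative choices, and sinusoidal_pe is inlined here so the snippet runs on its own; in the lab you would import it from your own module instead.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def sinusoidal_pe(T, d):
    # Inlined stand-in for the lab's implementation (assumed interleaved layout).
    angles = np.arange(T)[:, None] / 10000 ** (2 * np.arange(d // 2) / d)
    pe = np.empty((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def plot_pe_heatmap(pe, path):
    """Save a heatmap with rows = positions, columns = dimensions."""
    T, d = pe.shape
    fig, ax = plt.subplots(figsize=(8, 6))
    im = ax.imshow(pe, cmap="RdBu_r", vmin=-1.0, vmax=1.0, aspect="auto")
    fig.colorbar(im, ax=ax, label="PE value")
    ax.set_xlabel("dimension")
    ax.set_ylabel("position")
    ax.set_title(f"Sinusoidal PE (T={T}, d={d})")
    # Arrows point at the first and last columns; text positions are arbitrary.
    ax.annotate("high frequency", xy=(0.5, T - 4), xytext=(3, T - 10),
                arrowprops=dict(arrowstyle="->"))
    ax.annotate("low frequency", xy=(d - 1.5, T - 4), xytext=(d - 14, T - 10),
                arrowprops=dict(arrowstyle="->"))
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```

Calling plot_pe_heatmap(sinusoidal_pe(64, 32), "pe_heatmap.png") would produce the file this block asks for.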

Block C — dot-product decay

  • Build pe = sinusoidal_pe(T=128, d=32).
  • Compute decay[p] = pe[0] @ pe[p] for \(p = 0, 1, \ldots, 127\).
  • Plot decay vs \(p\), linear scales.
  • Annotate the peak at \(p = 0\) (value exactly \(d/2 = 16\), because each sin/cos pair contributes \(\sin^2 + \cos^2 = 1\)), and the oscillating decay.
  • Save as dot_product_decay.png.

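Expected: a high peak at \(p = 0\), followed by an oscillating decay. With \(d = 32\), the lowest-frequency pairs barely rotate over 128 positions, so their cosine terms stay close to 1 and the curve oscillates around a slowly falling positive baseline instead of dropping to zero. The shape illustrates the "nearby positions look similar, distant positions look less similar" property.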
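The decay curve can be cross-checked against a closed form. Because PE(0) has every sine entry equal to 0 and every cosine entry equal to 1, the dot product reduces to \(\sum_k \cos(\omega_k p)\). The sketch below assumes the interleaved layout and inlines sinusoidal_pe so it runs standalone; the zero-crossing lookup is one hypothetical way to fill the manifest field.

```python
import numpy as np


def sinusoidal_pe(T, d):
    # Inlined stand-in for the lab's implementation (assumed interleaved layout).
    angles = np.arange(T)[:, None] / 10000 ** (2 * np.arange(d // 2) / d)
    pe = np.empty((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


T, d = 128, 32
pe = sinusoidal_pe(T, d)
decay = pe @ pe[0]                      # decay[p] = pe[0] . pe[p], shape (T,)

# Closed form: sum over frequencies of cos(omega_k * p).
omega = 1.0 / 10000 ** (2 * np.arange(d // 2) / d)
closed_form = np.cos(np.arange(T)[:, None] * omega).sum(axis=1)
assert np.allclose(decay, closed_form)

# First position where the curve changes sign, or None if it never does.
sign_changes = np.flatnonzero(np.diff(np.sign(decay)) != 0)
first_crossing = int(sign_changes[0]) + 1 if sign_changes.size else None
print("peak:", decay[0], "first zero crossing:", first_crossing)
```

If first_crossing comes back as None, recording null in the manifest is consistent with the template.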

Block D — verify the linear-shift property

The theory file proves that \(\text{PE}(p + \Delta)\) is a fixed rotation of \(\text{PE}(p)\): within each (sin, cos) dimension pair, the rotation depends only on \(\Delta\), not on \(p\). Verify numerically:

  • Build pe = sinusoidal_pe(T=10, d=8).
  • For each \(k = 0, 1, 2, 3\), take the dimension pair \((2k, 2k+1)\) and compute the \(2 \times 2\) rotation matrix that maps PE(0) to PE(1) on that pair; call it \(R_k\).
  • Verify that the same \(R_k\) takes PE(1) → PE(2), PE(2) → PE(3), etc., for all positions.
  • Assert max diff < 1e-5.
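The check above can be sketched as follows, assuming the interleaved layout so that each pair is (sin, cos). Since PE(0) restricted to a pair is (0, 1), the rotation that sends it to PE(1) can be read directly off PE(1). The loop over k is only in the verification script, not in the PE generator.

```python
import numpy as np


def sinusoidal_pe(T, d):
    # Inlined stand-in for the lab's implementation (assumed interleaved layout).
    angles = np.arange(T)[:, None] / 10000 ** (2 * np.arange(d // 2) / d)
    pe = np.empty((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


T, d = 10, 8
pe = sinusoidal_pe(T, d)

max_diff = 0.0
for k in range(d // 2):
    s, c = pe[1, 2 * k], pe[1, 2 * k + 1]    # sin(omega_k), cos(omega_k)
    R_k = np.array([[c, s], [-s, c]])        # maps (0, 1) to (s, c)
    pair = pe[:, 2 * k : 2 * k + 2]          # (T, 2) slice for this pair
    predicted = pair[:-1] @ R_k.T            # apply R_k to PE(0)..PE(T-2)
    max_diff = max(max_diff, float(np.abs(predicted - pair[1:]).max()))

assert max_diff < 1e-5
print("linear-shift max diff:", max_diff)
```

The printed value is the number to report in README.md and in the manifest.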

Block E — write up

In README.md:

  1. Describe the heatmap. What pattern do you see in the high-frequency vs low-frequency columns?
  2. Describe the dot-product decay. Does it cross zero within the plotted range, and if so at which \(p\)? Where does it stabilize?
  3. Confirm the linear-shift property. State the max diff from Block D.

Block F — manifest

{
  "experiment": "16-sinusoidal",
  "date": "YYYY-MM-DD",
  "seed": 0,
  "versions": { "python": "3.11.x", "numpy": "X.Y.Z", "matplotlib": "X.Y.Z" },
  "config": {
    "T_heatmap": 64,
    "d_heatmap": 32,
    "T_decay": 128,
    "d_decay": 32
  },
  "results_summary": {
    "linear_shift_max_diff": null,
    "decay_zero_crossing_at_p": null
  }
}
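Rather than typing the version strings by hand, the manifest could be generated. The sketch below is one way to do it with the standard library; build_manifest and installed_version are hypothetical helper names, not part of the lab's codebase.

```python
import json
import platform
from datetime import date
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def build_manifest(linear_shift_max_diff=None, decay_zero_crossing_at_p=None):
    """Assemble the manifest dict using the template's keys."""
    return {
        "experiment": "16-sinusoidal",
        "date": date.today().isoformat(),
        "seed": 0,
        "versions": {
            "python": platform.python_version(),
            "numpy": installed_version("numpy"),
            "matplotlib": installed_version("matplotlib"),
        },
        "config": {"T_heatmap": 64, "d_heatmap": 32, "T_decay": 128, "d_decay": 32},
        "results_summary": {
            "linear_shift_max_diff": linear_shift_max_diff,
            "decay_zero_crossing_at_p": decay_zero_crossing_at_p,
        },
    }


manifest_text = json.dumps(build_manifest(), indent=2)
print(manifest_text)
```

Convert NumPy scalars with float(...) or int(...) before passing them in, since json.dumps does not serialize NumPy types.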

Constraints

  • No PyTorch.
  • Vectorized. The PE generator must not loop over positions (no for p in range(T)).

Stop conditions

Done when:

  1. All six files committed.
  2. Linear-shift assertion passes.
  3. Both plots saved.
  4. README.md answers all three Block E questions.

Pitfalls

  • Frequency formula. omega_k = 1 / 10000**(2*k/d) — note the exponent depends on \(d\) (the total model dimension), not on \(d_\text{head}\). Easy to get wrong.
  • Sin/cos interleaving. pe[:, 0::2] = sin(...), pe[:, 1::2] = cos(...). Some implementations instead split pe[:, :d//2] = sin, pe[:, d//2:] = cos. The two layouts are equivalent up to a permutation of dimensions, but you need to be consistent with how the model consumes it.
  • omega_k shape. (d/2,), broadcast with positions (T, 1) to get angles (T, d/2).
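The shape and layout pitfalls above can be checked in a few lines. This sketch builds both layouts from the same angles and confirms that they differ only by a column permutation, so pairwise dot products between positions are unchanged. The values of T and d are arbitrary.

```python
import numpy as np

T, d = 16, 8
positions = np.arange(T)[:, None]                     # (T, 1)
omega = 1.0 / 10000 ** (2 * np.arange(d // 2) / d)    # (d/2,)
angles = positions * omega                            # (T, d/2) via broadcasting

# Interleaved layout: sin in even columns, cos in odd columns.
interleaved = np.empty((T, d))
interleaved[:, 0::2] = np.sin(angles)
interleaved[:, 1::2] = np.cos(angles)

# Split layout: all sines first, then all cosines.
split = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Reordering columns (evens first, then odds) turns one layout into the other.
perm = np.concatenate([np.arange(0, d, 2), np.arange(1, d, 2)])
assert np.array_equal(interleaved[:, perm], split)

# Dot products between positions do not depend on the layout.
assert np.allclose(interleaved @ interleaved.T, split @ split.T)
```

The equivalence holds for dot products only; any code that indexes specific columns, such as the pairwise rotation check, still has to know which layout it was given.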

When to consult solutions/

After all six files are committed and the linear-shift assertion passes. The solution is at solutions/01-sinusoidal-ref.md.


Next lab: 02-rope-implementation.md.