Lab 02 — RoPE Implementation and the Relative-Position Property

Goal: implement RoPE in NumPy and verify numerically that \(\langle R_m q, R_n k \rangle = \langle q, R_{n-m} k \rangle\) holds to within 1e-5.

Estimated time: 90–120 minutes.

Prereq: lab 01 committed; theory/03-rope.md read.


What you produce

A directory experiments/16-rope/ containing:

  • rope.py — your implementation, in src/minimodel/positional/rope.py.
  • verify.py — verification script.
  • verify_output.txt — captured printout.
  • relative_position_demo.png — plot showing that the relative-position property holds across offsets \(n - m\).
  • manifest.json.
  • README.md.

TODOs

Block A — implementation

Per src/minimodel/positional/BLUEPRINT.md:

  • rope_frequencies(d_head: int, base: float = 10000.0) -> np.ndarray. Shape (d_head/2,).
  • precompute_rope(T: int, d_head: int) -> tuple[np.ndarray, np.ndarray]. Returns (cos_pe, sin_pe) of shape (T, d_head/2).
  • apply_rope(q, k, cos_pe, sin_pe) -> tuple[np.ndarray, np.ndarray]. Each of shape (T, d_head).
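
The first two signatures could be filled in roughly as follows. This is a minimal sketch that assumes the usual RoPE frequency schedule \(\theta_i = \text{base}^{-2i/d_{head}}\); the lab does not restate the formula, so check it against theory/03-rope.md and BLUEPRINT.md before relying on it.

```python
import numpy as np


def rope_frequencies(d_head: int, base: float = 10000.0) -> np.ndarray:
    """Per-pair angular frequencies, shape (d_head // 2,).

    Assumption: theta_i = base ** (-2 * i / d_head).
    """
    if d_head % 2 != 0:
        raise ValueError("d_head must be even so channels can be paired")
    pair_index = np.arange(d_head // 2, dtype=np.float64)
    return base ** (-2.0 * pair_index / d_head)


def precompute_rope(T: int, d_head: int) -> tuple[np.ndarray, np.ndarray]:
    """Cosine and sine tables, each of shape (T, d_head // 2)."""
    positions = np.arange(T, dtype=np.float64)
    # angles[p, i] = p * theta_i
    angles = np.outer(positions, rope_frequencies(d_head))
    return np.cos(angles), np.sin(angles)


cos_pe, sin_pe = precompute_rope(T=32, d_head=16)
```

Row 0 of the tables corresponds to the identity rotation (cosine 1, sine 0), which is a quick check that positions start at zero.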

Implement using the "interleaved pairs" convention (see theory/03-rope.md): pair \((2k, 2k+1)\) rotated together.

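import numpy as np

def apply_rope(q, k, cos_pe, sin_pe):
    # q, k: (T, d_head)
    # cos_pe, sin_pe: (T, d_head // 2)
    # Rotate each interleaved pair (2i, 2i+1) of q by its position's angle.
    q_even = q[:, 0::2]
    q_odd  = q[:, 1::2]
    q_rope_even = q_even * cos_pe - q_odd * sin_pe
    q_rope_odd  = q_even * sin_pe + q_odd * cos_pe
    q_rope = np.empty_like(q)
    q_rope[:, 0::2] = q_rope_even
    q_rope[:, 1::2] = q_rope_odd
    # Same rotation for k.
    k_even = k[:, 0::2]
    k_odd  = k[:, 1::2]
    k_rope = np.empty_like(k)
    k_rope[:, 0::2] = k_even * cos_pe - k_odd * sin_pe
    k_rope[:, 1::2] = k_even * sin_pe + k_odd * cos_pe
    return q_rope, k_rope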

About 15 LOC total for apply_rope. Hand-built; no shortcuts.

Block B — verify the relative-position property

This is the key test. The property: \(\langle R_m q, R_n k \rangle = \langle q, R_{n-m} k \rangle\) for any \(q, k, m, n\).
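
A short derivation shows why the identity should hold. It assumes, following the convention this lab attributes to theory/03-rope.md, that \(R_p\) is block-diagonal with one 2×2 rotation by angle \(p\,\theta_i\) per channel pair, so that \(R_p^\top = R_{-p}\) (orthogonality) and \(R_a R_b = R_{a+b}\) (angles add):

\[
\langle R_m q, R_n k \rangle
= (R_m q)^\top (R_n k)
= q^\top R_m^\top R_n k
= q^\top R_{-m} R_n k
= q^\top R_{n-m} k
= \langle q, R_{n-m} k \rangle .
\]

The numerical test below checks exactly this chain, up to floating-point rounding.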

  • Pick d_head = 16, \(T = 32\).
  • Generate random q, k of shape (d_head,) with seed 0.
  • Pre-compute cos_pe, sin_pe = precompute_rope(T=32, d_head=16).
  • For each pair \((m, n) \in \{(0, 1), (3, 7), (5, 5), (10, 12), (20, 8)\}\):
      • Compute q_m = apply_rope_single(q, cos_pe[m], sin_pe[m]) and k_n = apply_rope_single(k, cos_pe[n], sin_pe[n]).
      • Compute lhs = q_m @ k_n.
      • Compute k_diff = apply_rope_single(k, cos_pe[n - m], sin_pe[n - m]) if \(n \geq m\); otherwise use cos_pe[m - n] and -sin_pe[m - n], since \(\cos\) is even and \(\sin\) is odd. Do not index with a negative offset: cos_pe[-(m - n)] wraps around to row \(T - (m - n)\).
      • Compute rhs = q @ k_diff.
      • Assert abs(lhs - rhs) < 1e-5.
  • Print all five pairs in a table.

(apply_rope_single is a single-position version of apply_rope for convenience. Or implement the full apply_rope and slice.)
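
One possible shape for verify.py is sketched below. The body of apply_rope_single, the use of np.random.default_rng(0) for "seed 0", and the inline frequency schedule are assumptions made so the example runs on its own; in the lab you would import your own rope.py instead.

```python
import numpy as np


def apply_rope_single(x, cos_row, sin_row):
    """Rotate one vector x of shape (d_head,) with one table row of shape (d_head // 2,)."""
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos_row - x[1::2] * sin_row
    out[1::2] = x[0::2] * sin_row + x[1::2] * cos_row
    return out


d_head, T, base = 16, 32, 10000.0
freqs = base ** (-2.0 * np.arange(d_head // 2) / d_head)  # assumed schedule
angles = np.outer(np.arange(T), freqs)
cos_pe, sin_pe = np.cos(angles), np.sin(angles)

rng = np.random.default_rng(0)  # assumed RNG API for "seed 0"
q = rng.standard_normal(d_head)
k = rng.standard_normal(d_head)

max_diff = 0.0
print(f"{'m':>3} {'n':>3} {'lhs':>12} {'rhs':>12} {'diff':>10}")
for m, n in [(0, 1), (3, 7), (5, 5), (10, 12), (20, 8)]:
    q_m = apply_rope_single(q, cos_pe[m], sin_pe[m])
    k_n = apply_rope_single(k, cos_pe[n], sin_pe[n])
    lhs = q_m @ k_n

    # Rotation by a negative offset: same cosine row, negated sine row.
    offset = abs(n - m)
    sign = 1.0 if n >= m else -1.0
    k_diff = apply_rope_single(k, cos_pe[offset], sign * sin_pe[offset])
    rhs = q @ k_diff

    diff = abs(lhs - rhs)
    assert diff < 1e-5, (m, n, diff)
    max_diff = max(max_diff, diff)
    print(f"{m:>3} {n:>3} {lhs:>12.6f} {rhs:>12.6f} {diff:>10.2e}")
```

Redirecting the printed table to verify_output.txt gives the captured printout, and max_diff is the number Block E asks you to report.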

Block C — plot the relative-position property

  • For fixed \(m = 0\), vary \(n\) from 0 to 31.
  • At each \(n\), compute \(\langle R_m q, R_n k \rangle\).
  • Plot as a function of \(n - m\).
  • Overlay \(\langle q, R_{n-m} k \rangle\) on the same axes.
  • The two curves must coincide to within 1e-5 at every offset.
  • Save as relative_position_demo.png.
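
The plotting steps above can be sketched as follows. The helper names (rotate, relative_position_curves, save_demo) are hypothetical, the frequency schedule is the same assumption as elsewhere, and matplotlib is an assumed plotting dependency that the lab does not name. Angles are computed directly from the offset rather than looked up in a table, so negative offsets need no special case.

```python
import numpy as np


def rotate(x, angles):
    """Interleaved-pair rotation of x (d_head,) by per-pair angles (d_head // 2,)."""
    c, s = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * c - x[1::2] * s
    out[1::2] = x[0::2] * s + x[1::2] * c
    return out


def relative_position_curves(q, k, freqs, m=0, T=32):
    """Return (offsets, lhs, rhs) for n = 0..T-1 at fixed m."""
    ns = np.arange(T)
    q_m = rotate(q, m * freqs)
    lhs = np.array([q_m @ rotate(k, n * freqs) for n in ns])
    rhs = np.array([q @ rotate(k, (n - m) * freqs) for n in ns])
    return ns - m, lhs, rhs


def save_demo(offsets, lhs, rhs, path="relative_position_demo.png"):
    """Overlay both curves on one set of axes and save the figure."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(offsets, lhs, linewidth=3, label="<R_m q, R_n k>")
    ax.plot(offsets, rhs, "--", label="<q, R_{n-m} k>")
    ax.set_xlabel("n - m")
    ax.set_ylabel("dot product")
    ax.legend()
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)


d_head = 16
freqs = 10000.0 ** (-2.0 * np.arange(d_head // 2) / d_head)  # assumed schedule
rng = np.random.default_rng(0)
q, k = rng.standard_normal(d_head), rng.standard_normal(d_head)
offsets, lhs, rhs = relative_position_curves(q, k, freqs)
```

Calling save_demo(offsets, lhs, rhs) writes the PNG. Because \(R_0\) is the identity, \(m = 0\) makes both curves the same computation; passing a nonzero m to relative_position_curves gives a stricter check that also covers negative offsets.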

Block D — sanity: RoPE preserves norm

Rotations are orthogonal: \(\|R q\| = \|q\|\).

  • Random q of shape (d_head,).
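  • For each position \(p \in \{0, 1, 5, 100\}\), compute the rotated version. Position 100 lies outside the \(T = 32\) tables from Block B, so precompute with \(T \geq 101\) for this block.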
  • Assert np.allclose(np.linalg.norm(q_rotated), np.linalg.norm(q), atol=1e-6).
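
The norm check can be sketched like this. The helper rotate_at is hypothetical and computes the angle for a position directly, which sidesteps the table-size question for position 100; the frequency schedule is again an assumption.

```python
import numpy as np


def rotate_at(x, position, freqs):
    """Interleaved-pair rotation of x (d_head,) at an integer position."""
    c, s = np.cos(position * freqs), np.sin(position * freqs)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * c - x[1::2] * s
    out[1::2] = x[0::2] * s + x[1::2] * c
    return out


d_head = 16
freqs = 10000.0 ** (-2.0 * np.arange(d_head // 2) / d_head)  # assumed schedule
rng = np.random.default_rng(0)
q = rng.standard_normal(d_head)

norm_diffs = []
for p in (0, 1, 5, 100):
    q_rotated = rotate_at(q, p, freqs)
    assert np.allclose(np.linalg.norm(q_rotated), np.linalg.norm(q), atol=1e-6)
    norm_diffs.append(abs(np.linalg.norm(q_rotated) - np.linalg.norm(q)))

norm_preservation_max_diff = max(norm_diffs)
```

The final value is what the manifest's norm_preservation_max_diff field expects.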

Block E — write up

In README.md:

  1. State the relative-position property in one sentence.
  2. Report the max diff across the five test cases in Block B. It should be below 1e-5.
  3. Confirm norm preservation (Block D).
  4. Why does V not get rotated? From theory/03-rope.md, in your own words.

Block F — manifest

{
  "experiment": "16-rope",
  "date": "YYYY-MM-DD",
  "seed": 0,
  "versions": { "python": "3.11.x", "numpy": "X.Y.Z" },
  "config": {
    "d_head": 16,
    "T": 32,
    "rope_base": 10000.0
  },
  "results_summary": {
    "relative_position_max_diff": null,
    "norm_preservation_max_diff": null
  }
}
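
Rather than editing the template by hand, the manifest could be generated from the script that produced the numbers. The sketch below is one way to do that; build_manifest is a hypothetical helper, and the two 0.0 arguments are placeholders, not real results.

```python
import json
import platform
from datetime import date

import numpy as np


def build_manifest(relative_position_max_diff, norm_preservation_max_diff):
    """Fill the lab's manifest template with measured values and live versions."""
    return {
        "experiment": "16-rope",
        "date": date.today().isoformat(),  # YYYY-MM-DD
        "seed": 0,
        "versions": {
            "python": platform.python_version(),
            "numpy": np.__version__,
        },
        "config": {"d_head": 16, "T": 32, "rope_base": 10000.0},
        "results_summary": {
            # Cast to plain float so any NumPy scalar type serialises cleanly.
            "relative_position_max_diff": float(relative_position_max_diff),
            "norm_preservation_max_diff": float(norm_preservation_max_diff),
        },
    }


# Placeholder inputs; pass the values measured in Blocks B and D instead.
manifest = build_manifest(0.0, 0.0)
manifest_text = json.dumps(manifest, indent=2)
```

Writing manifest_text to experiments/16-rope/manifest.json completes the block.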

Constraints

  • No PyTorch.
  • Interleaved pairs convention. Document this clearly in README.md. Other implementations use the "split-half" convention, which is also valid, but the two lay out channels differently, so their outputs do not match element for element.
  • No fancy einsums. Use explicit indexing and concatenation. Vectorized but readable.
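
To make the convention constraint concrete, the sketch below compares the two layouts on one vector. It assumes that "split-half" means pairing channel \(i\) with channel \(i + d_{head}/2\), which is the common meaning but is not defined in this lab. Both function names are hypothetical.

```python
import numpy as np


def rope_interleaved(x, angles):
    """Rotate pairs (2i, 2i+1)."""
    c, s = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * c - x[1::2] * s
    out[1::2] = x[0::2] * s + x[1::2] * c
    return out


def rope_split_half(x, angles):
    """Rotate pairs (i, i + d_head // 2)."""
    half = x.shape[0] // 2
    c, s = np.cos(angles), np.sin(angles)
    first, second = x[:half], x[half:]
    return np.concatenate([first * c - second * s, first * s + second * c])


d_head = 8
freqs = 10000.0 ** (-2.0 * np.arange(d_head // 2) / d_head)  # assumed schedule
angles = 3 * freqs  # position 3
x = np.random.default_rng(0).standard_normal(d_head)

inter = rope_interleaved(x, angles)
split = rope_split_half(x, angles)

# Permutation that moves even channels to the first half, odd to the second.
perm = np.concatenate([np.arange(0, d_head, 2), np.arange(1, d_head, 2)])

same_layout = np.allclose(inter, split)
same_after_permutation = np.allclose(inter[perm], rope_split_half(x[perm], angles))
```

Under that assumption the two conventions differ only by a fixed channel permutation, which is why mixing them silently (for example, loading weights trained with one and applying the other) gives wrong results even though each is valid on its own.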

Stop conditions

Done when:

  1. All six files committed.
  2. Block B assertions all pass (max diff < 1e-5).
  3. Block D norm-preservation passes.
  4. relative_position_demo.png shows two coinciding curves.
  5. README.md answers all four Block E questions.

Pitfalls

  • Sign convention. The rotation matrix in theory/03-rope.md uses \(\begin{pmatrix} \cos & -\sin \\ \sin & \cos \end{pmatrix}\). This is "counter-clockwise". Some references use clockwise (signs flipped). Stick with the theory file's convention.
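  • Negative offsets in apply_rope_single. When \(m > n\), the relative position \(n - m\) is negative. Rotation by \(-\theta\) has \(\cos(-\theta) = \cos(\theta)\) and \(\sin(-\theta) = -\sin(\theta)\). Handle the sign correctly when computing the rhs.
  • cos_pe[n - m] requires \(n - m \geq 0\). NumPy accepts negative indices and wraps them around silently, so for the test cases with \(n < m\) do not index with the negative offset; use row \(m - n\) and negate the sine, or compute the rotation manually.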
  • Float precision. With float32 and large positions, the trigonometric values lose precision. For Phase 16's small T this is fine; flag for Phase 22+ if Borja extends to long contexts.
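
The float-precision pitfall can be observed directly. This sketch measures how far float32 cosines drift from a float64 reference at a given position; cos_table_error is a hypothetical helper, the frequency schedule is assumed, and position 100000 is only an illustrative "long context" value.

```python
import numpy as np


def cos_table_error(position: int, d_head: int = 16, base: float = 10000.0) -> float:
    """Max abs difference between float32 and float64 cosines at one position."""
    freqs64 = base ** (-2.0 * np.arange(d_head // 2, dtype=np.float64) / d_head)
    reference = np.cos(position * freqs64)

    # Same computation with angles formed and evaluated in float32.
    angles32 = np.float32(position) * freqs64.astype(np.float32)
    approx = np.cos(angles32)

    return float(np.max(np.abs(approx.astype(np.float64) - reference)))


err_small = cos_table_error(31)        # last position for T = 32
err_large = cos_table_error(100_000)   # illustrative long-context position
print(f"position 31:     {err_small:.2e}")
print(f"position 100000: {err_large:.2e}")
```

The error grows with position because the angle is a product of position and frequency, and float32 has a fixed number of significant bits to spend on it. Building the tables in float64 and casting at the end is one common mitigation.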

When to consult solutions/

Only after all six files are committed and the assertions pass. The solution is at solutions/02-rope-ref.md.


Next lab: 03-extrapolation-compare.md.