Lynx Cortex

Phase 16 — Quiz (human-readable mirror)

Human-readable mirror of the canonical data/quizzes/phase-16-positional-encodings.yaml.

Source: data/quizzes/phase-16-positional-encodings.yaml.

q-16-01 — Why is attention permutation-equivariant without PE? (single)

softmax(P Q K^T P^T / sqrt(d_k)) P V = P softmax(Q K^T / sqrt(d_k)) V ✓
softmax(Q K^T / sqrt(d_k)) V = softmax(Q^T K / sqrt(d_k)) V
softmax(Q K^T) V = V softmax(Q K^T)
softmax is invariant under any unitary transformation

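The permutation P passes through the entire attention block: the row-wise softmax satisfies softmax(P A P^T) = P softmax(A) P^T, and the trailing P^T cancels against P V because P^T P = I. No operation depends on absolute position.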
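The identity in the correct option can be checked numerically. The following is a minimal sketch in plain Python, assuming single-head attention with no mask and no positional encoding; the helper names (`matmul`, `softmax_rows`, `attention`, `permute_rows`) and the toy sizes are illustrative and do not come from the quiz source.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, with no mask and no positional encoding."""
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)

def permute_rows(m, perm):
    """Apply a permutation matrix P to the rows of m (row i becomes m[perm[i]])."""
    return [m[i] for i in perm]

rng = random.Random(0)
T, d = 5, 4  # toy sequence length and head dimension (assumed values)
q = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
k = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
v = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
perm = [3, 0, 4, 1, 2]

# Left side: attention over permuted inputs. Right side: permuted attention output.
lhs = attention(permute_rows(q, perm), permute_rows(k, perm), permute_rows(v, perm))
rhs = permute_rows(attention(q, k, v), perm)
max_diff = max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb))
print("max |lhs - rhs| =", max_diff)
```

The two sides agree up to floating-point rounding, since permuting the keys only reorders the terms inside each softmax sum.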

q-16-02 — What does RoPE encode that sinusoidal does not? (multi)

RoPE encodes relative position via a dot-product identity ✓
RoPE is applied inside the attention computation, not at the input ✓
RoPE has learnable parameters; sinusoidal does not
RoPE extrapolates better to lengths beyond the training distribution ✓

Both RoPE and vanilla sinusoidal PE have zero learnable parameters, so the third option is false.

q-16-03 — Find the bug: `cos(PE[t], PE[t+1]) ≈ 0` (free)

Expected to contain: smooth.

Sinusoidal PE changes phase smoothly from one position to the next, so adjacent positions should have high cosine similarity. Near-orthogonal adjacent rows mean the PE table was shuffled or randomized.
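A quick diagnostic along these lines compares the mean cosine similarity of adjacent rows in a correct table against a randomized one. This is a sketch under assumptions: the table uses the standard sin/cos layout with base 10000, the sizes (50 positions, 64 dimensions) are arbitrary, and the function names are hypothetical rather than taken from the project.

```python
import math
import random

def sinusoidal_pe(num_positions, d_model, base=10000.0):
    """Vanilla sinusoidal table: sin/cos pairs at geometrically spaced frequencies."""
    table = []
    for t in range(num_positions):
        row = []
        for i in range(0, d_model, 2):
            angle = t / base ** (i / d_model)
            row += [math.sin(angle), math.cos(angle)]
        table.append(row)
    return table

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_adjacent_cosine(table):
    """Average cosine similarity between each row and the next one."""
    sims = [cosine(table[t], table[t + 1]) for t in range(len(table) - 1)]
    return sum(sims) / len(sims)

pe = sinusoidal_pe(num_positions=50, d_model=64)

# Stand-in for the buggy case: rows drawn at random instead of from sin/cos.
rng = random.Random(0)
random_table = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(50)]

healthy = mean_adjacent_cosine(pe)
buggy = mean_adjacent_cosine(random_table)
print(f"sinusoidal: {healthy:.3f}, randomized: {buggy:.3f}")
```

For the correct table the adjacent similarity equals the mean of cos(ω_j) over the frequencies ω_j, which is close to 1 because most frequencies are small. A value near 0 is the symptom described in the question.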

q-16-04 — When does RoPE win on §A13? (single)

Sinusoidal (it's the original)
RoPE (relative position extrapolates better) ✓
They are identical in extrapolation
Learned PE with weight decay

RoPE's relative-position identity makes attention scores a function of offsets only, which stay in-distribution as the length grows. Sinusoidal PE at T=20 is out of distribution (OOD).

q-16-05 — RoPE: relative-position identity (single)

For RoPE, the dot product (R_θ(t) q) · (R_θ(s) k) depends only on which quantity?

t + s
t - s ✓
t × s
max(t, s)

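R_θ(t)^T R_θ(s) = R_θ(s - t), so (R_θ(t) q) · (R_θ(s) k) = q^T R_θ(s - t) k. The dot product depends only on the relative offset between the two positions (s - t and t - s are the same offset with opposite sign). See Su et al. 2021, §3.4.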
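The identity is easy to verify on a small example. The sketch below assumes the interleaved-pair convention (dimensions 2j and 2j+1 are rotated together) with base 10000; the function names and the sample vectors are illustrative only and not taken from the quiz source.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate each consecutive pair (x[i], x[i+1]) by the angle pos * base**(-i/d)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    """Plain dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same offset (s - t = 4) at very different absolute positions.
score_a = dot(rope(q, 3), rope(k, 7))
score_b = dot(rope(q, 103), rope(k, 107))

# Different offset (s - t = 5) from the same query position.
score_c = dot(rope(q, 3), rope(k, 8))

print("offset 4 near the start:", score_a)
print("offset 4 shifted by 100:", score_b)
print("offset 5:", score_c)
```

Shifting both positions by the same amount leaves the score unchanged up to rounding, while changing the offset changes it.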