Phase 16 — Positional Encodings

Requires: Phase 15 (Attention from Scratch)
Teaches: positional-encoding · rope · sinusoidal · extrapolation

Pre-written per A12; English verb grammar scope per A13. This phase entry exists before Borja begins study. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.

Attention without positions is permutation-equivariant: to the model, I work he and he work I are the same set of tokens in a different order. Yet the correct answer to ... he ___ (works) depends crucially on what comes before the slot. This phase adds positional information as a separate piece on top of the embedding (or, in RoPE, as a rotation of Q and K). We implement three variants and compare extrapolation beyond the training length.
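To make the permutation-equivariance claim concrete, here is a minimal NumPy sketch. It is not the lab solution. The embeddings and weight matrices are random stand-ins, the helper names (`softmax`, `self_attention`) are hypothetical, and the attention is single-head and unmasked (a causal mask would break the symmetry). Reordering the three input tokens reorders the output rows in exactly the same way and changes nothing else.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head, unmasked self-attention with no positional information."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d = 8
# Hypothetical random embeddings standing in for the tokens "I", "work", "he".
X = rng.normal(size=(3, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = np.array([2, 1, 0])  # "I work he" -> "he work I"
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)

# Equivariance: permuting the input permutes the output rows identically.
equivariant = np.allclose(out[perm], out_perm)
print(equivariant)
```

Because each token's output vector is the same in both orderings, nothing downstream can tell which token came first unless position is injected somewhere.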


Goal

Implement three positional encoding schemes — sinusoidal, learned, RoPE — and compare their behavior. By the end of Phase 16:

  1. Borja can demonstrate permutation-equivariance of attention with a 3-token counter-example (on a verb-grammar fragment like I work he).
  2. All three PE schemes exist in src/minimodel/positional/.
  3. The RoPE relative-position property is verified numerically.
  4. Borja has a written argument for why Phase 17's Mini-GPT should use RoPE (preferred default) or sinusoidal (fallback).
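Goal 3 can be checked along the following lines. This is a sketch under assumptions, not the layout of src/minimodel/positional/: the function name `rope`, the base of 10000, and the convention of rotating adjacent dimension pairs are illustrative choices. The property under test is that the dot product of a rotated query and a rotated key depends only on the offset between their positions.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by the angle pos * theta_i."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # one frequency per pair, shape (d/2,)
    angle = pos * theta
    cos, sin = np.cos(angle), np.sin(angle)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
# Hypothetical random query and key vectors.
q, k = rng.normal(size=16), rng.normal(size=16)

def score(m, n):
    # Attention logit between a query at position m and a key at position n.
    return rope(q, m) @ rope(k, n)

s1 = score(3, 7)        # offset 4
s2 = score(100, 104)    # offset 4, different absolute positions
s_other = score(3, 8)   # offset 5

print(np.isclose(s1, s2), np.isclose(s1, s_other))
```

The first comparison should hold up to floating-point error, while the second generally should not, since the offset differs. The rotation also preserves vector norms, which is a cheap extra sanity check.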

Phase 16 is a tool-shed phase: three implementations, one comparison experiment, and a decision about which scheme to use downstream. Plan for 8–15 study hours.

Read order

  1. theory/00-motivation.md — permutation-equivariance, the problem we're solving.
  2. theory/01-sinusoidal.md — Vaswani's original PE. Derive the formula.
  3. theory/02-learned-vs-sinusoidal.md — when each wins; why both fail at extrapolation; the gap RoPE/ALiBi fill.
  4. theory/03-rope.md — RoPE derivation, the relative-position property, the implementation trick.
  5. lab/00-permutation-equivariance.md — prove attention without PE doesn't distinguish word order.
  6. lab/01-sinusoidal-pe.md — implement and visualize sinusoidal PE.
  7. lab/02-rope-implementation.md — implement RoPE; verify the relative-position property numerically.
  8. lab/03-extrapolation-compare.md — head-to-head comparison beyond training length.
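For orientation before lab/01, the sinusoidal scheme from Vaswani et al. fits in a few lines. The sketch below assumes an even `d_model` and the usual base of 10000. The function name `sinusoidal_pe` is hypothetical and is not a prescribed interface for the lab.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model, base=10000.0):
    """PE[pos, 2i] = sin(pos / base^(2i/d_model)); PE[pos, 2i+1] = cos(same angle)."""
    assert d_model % 2 == 0, "this sketch assumes an even model dimension"
    pos = np.arange(seq_len)[:, None]         # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]     # (1, d_model/2), holds the values 2i
    angle = pos / base ** (i / d_model)       # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = sinusoidal_pe(seq_len=32, d_model=16)
print(pe.shape)
```

Since the table is a closed-form function of position, it can be evaluated at any sequence length, including lengths never seen in training. Whether a model behaves well there is a separate question, which the extrapolation lab examines.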

solutions/ is empty during pre-write.

Definition of Done

See PHASE_16_PLAN.md §6. Briefly:

  • Three PE schemes implemented.
  • Permutation-equivariance counter-example committed.
  • RoPE numerical verification passes.
  • Extrapolation comparison plot committed; winner picked for Phase 17.

What this phase intentionally does NOT cover

  • T5-style relative position biases. One-paragraph mention in theory/02. Not implemented.
  • ALiBi. One-paragraph mention. Not implemented (optional Borja extension).
  • xPos, NoPE, other 2023+ variants. Out of scope.
  • 2D positions (vision transformers). Phase 22 territory if at all.
  • Position-aware training. We don't train in Phase 16. Forward-pass and pattern-comparison only.

Phase 16's scope is the three canonical positional encodings and their extrapolation behavior. Nothing more.

Further reading

Optional — enrichment, not required to pass the phase.