Phase 28 — Fine-Tuning, LoRA, QLoRA

Requires: Phase 26 (Quantization Deep Dive) · Phase 27 (Modern Attention Optimizations)

Teaches: fine-tuning · sft · lora · qlora · catastrophic-forgetting

Jump to any chapter from the phase reference index.

Chapter map

Pre-written per A12. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.

LoRA = cheap adaptation of pretrained models. Two small matrices are trained in place of billions of frozen parameters during fine-tuning. QLoRA = LoRA with the base model stored in NF4. This is how the open-source ecosystem fine-tunes models on modest hardware. In this phase: specialize MiniGPT to correct irregular-verb conjugations (goed → went).


Goal

Take the MiniGPT model from Phase 17 and fine-tune it with LoRA on the irregular-verb subset of the Phase 12 corpus. The goal is twofold: (1) show that LoRA's low-rank update is enough to raise irregular-verb conjugation accuracy by at least 10 percentage points, meaning the model reliably scores went above goed, ate above eated, and so on for the 8 irregular verbs (be, have, do, go, come, see, eat, write); and (2) show that LoRA preserves base-model knowledge on an orthogonal control task: the 12 regular verbs (work, play, walk, talk, listen, watch, study, finish, start, look, want, like) should still conjugate correctly with -ed, i.e. no catastrophic forgetting.
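One way to read "scores went above goed" is as a pairwise preference metric: for each (correct, overregularized) pair, the model wins if it assigns the correct form a strictly higher score. The sketch below assumes that reading. `pairwise_accuracy`, the `score` callable, and the toy score tables are hypothetical stand-ins for whatever the real evaluation uses, and only the two pairs quoted in the text are included.

```python
from typing import Callable, Sequence, Tuple

Pair = Tuple[str, str]  # (correct form, overregularized form)

# Only the two pairs quoted in the text; the real eval set covers all 8 verbs.
PAIRS: Sequence[Pair] = [("went", "goed"), ("ate", "eated")]


def pairwise_accuracy(score: Callable[[str], float], pairs: Sequence[Pair]) -> float:
    """Fraction of pairs where the correct form gets the strictly higher score."""
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for good, bad in pairs if score(good) > score(bad))
    return wins / len(pairs)


# Hypothetical log-probabilities standing in for model scores (not measured data).
base_scores = {"went": -4.0, "goed": -3.0, "ate": -2.5, "eated": -3.5}
tuned_scores = {"went": -1.0, "goed": -5.0, "ate": -1.5, "eated": -6.0}

base_acc = pairwise_accuracy(base_scores.__getitem__, PAIRS)
tuned_acc = pairwise_accuracy(tuned_scores.__getitem__, PAIRS)

# The target is stated in percentage points, so compare on a 0-100 scale.
gain_points = 100.0 * (tuned_acc - base_acc)
print(f"base={base_acc:.2f} tuned={tuned_acc:.2f} gain={gain_points:.0f} points")
```

Running the same function on the 12 regular verbs before and after fine-tuning would give the matching control number.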

This phase also previews QLoRA: combining Phase 26's NF4 quantizer (src/miniquant/) with this phase's LoRA gives the standard recipe for fine-tuning 7B+ models on a single 24-GB consumer GPU. For MiniGPT the absolute memory savings are modest — what matters is internalizing the recipe and the ratios.
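The ratios can be made concrete with back-of-the-envelope arithmetic for base-model weight storage only. This is a rough sketch: it ignores activations, gradients, optimizer state, and the adapter itself. The helper name `weight_gb` is hypothetical, and the NF4 overhead assumes one fp32 absmax scale per block of 64 weights, which may differ from what src/miniquant/ actually does.

```python
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # the "7B" in the text

# fp16 base model: 16 bits per weight.
fp16_gb = weight_gb(N_PARAMS, 16)

# NF4 base model: 4 bits per weight plus quantization constants.
# Assumption: one fp32 scale per 64-weight block adds 32 / 64 = 0.5 bits per weight.
nf4_bits = 4 + 32 / 64
nf4_gb = weight_gb(N_PARAMS, nf4_bits)

ratio = fp16_gb / nf4_gb
print(f"fp16: {fp16_gb:.2f} GB, NF4: {nf4_gb:.2f} GB, ratio: {ratio:.2f}x")
```

The ratio depends only on bits per weight, not on model size, which is why it carries over from a 7B model to MiniGPT even though the absolute savings do not.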

Module placement

The fine-tuning toolkit lives in src/minituner/ (new in this phase): lora.py, qlora.py, train.py, data.py. The QLoRA composer reads from src/miniquant/ (Phase 26). See src/minituner/BLUEPRINT.md for the API contract.

Read order

  1. theory/00-motivation.md — why fine-tuning needs reform; the over-parameterization observation.
  2. theory/01-sft-and-forgetting.md — supervised fine-tuning and catastrophic forgetting.
  3. theory/02-parameter-count.md — derive LoRA's parameter count savings.
  4. theory/03-memory-footprint.md — derive LoRA and QLoRA memory savings in detail.
  5. theory/04-alignment-survey.md — concept survey: DPO, ORPO, SimPO, RLHF, RLAIF. Reading-only.
  6. lab/00-lora-by-hand.md — implement LoRALinear from scratch and verify the identity property at init.
  7. lab/01-lora-counts.md — derive and measure parameter / memory counts for MiniGPT under full FT vs LoRA vs QLoRA.
  8. lab/02-lora-finetune.md — train MiniGPT + LoRA on the irregular-verb subset; measure task accuracy + regular-verb PPL drift; sweep rank.
  9. lab/03-qlora-preview.md — combine NF4 base with LoRA; measure memory; verify outputs agree with LoRA-only within quantization error.
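As a preview of the identity property that lab/00 asks you to verify, here is a dependency-free sketch of a LoRA-augmented linear layer. It is not the src/minituner/ API (see BLUEPRINT.md for that); the class layout, the `alpha / r` scaling, and the init scale of 0.02 are assumptions made for illustration. The key point is that the up-projection B starts at zero, so the layer reproduces the frozen base output exactly until training moves B.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows: int, cols: int, rng: random.Random, std: float = 0.02) -> Matrix:
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class LoRALinear:
    """y = x W + (alpha / r) * (x A) B, with W frozen and only A, B trainable."""

    def __init__(self, d_in: int, d_out: int, r: int, alpha: float, rng: random.Random):
        self.W = rand_matrix(d_in, d_out, rng)       # stands in for a pretrained weight
        self.A = rand_matrix(d_in, r, rng)           # down-projection, random init
        self.B = [[0.0] * d_out for _ in range(r)]   # up-projection, zero init
        self.scale = alpha / r

    def base(self, x: Matrix) -> Matrix:
        return matmul(x, self.W)

    def forward(self, x: Matrix) -> Matrix:
        base = matmul(x, self.W)
        delta = matmul(matmul(x, self.A), self.B)    # low-rank update path
        return [[b + self.scale * d for b, d in zip(brow, drow)]
                for brow, drow in zip(base, delta)]


rng = random.Random(0)
layer = LoRALinear(d_in=64, d_out=64, r=4, alpha=8.0, rng=rng)
x = rand_matrix(2, 64, rng, std=1.0)

# Identity at init: B is all zeros, so the adapter contributes nothing.
identity_at_init = layer.forward(x) == layer.base(x)

# Once B is nonzero (as it would be after training), the output changes.
layer.B = rand_matrix(4, 64, rng)
changed_after_update = layer.forward(x) != layer.base(x)
print(identity_at_init, changed_after_update)
```

Zero-initializing B rather than A keeps the gradient with respect to B nonzero at the first step, since that gradient depends on x A, which is nonzero when A is random.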

solutions/ is populated at phase open, once the src/minituner/ API stabilizes.

Definition of Done

See PHASE_28_PLAN.md §6. Briefly:

  • src/minituner/lora.py + src/minituner/qlora.py implemented (Borja).
  • Fine-tuning raises irregular-verb task accuracy by ≥ 10 percentage points; regular-verb control PPL drift ≤ 5%.
  • Rank ablation plot committed.
  • QLoRA preview run completes with measured memory savings.
  • Adapter-only checkpoint < 1% of base checkpoint size.
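The numeric criteria above can be expressed as a small acceptance check. This is a sketch under assumptions, not the contents of PHASE_28_PLAN.md §6: the function names are hypothetical, PPL drift is taken to be the signed relative change (so an improvement passes), and the inputs in the example are made-up numbers rather than measured results.

```python
from typing import Dict


def ppl_drift(ppl_base: float, ppl_tuned: float) -> float:
    """Signed relative change in control-set perplexity (0.05 means +5%)."""
    return (ppl_tuned - ppl_base) / ppl_base


def check_dod(acc_base: float, acc_tuned: float,
              ppl_base: float, ppl_tuned: float,
              adapter_bytes: int, base_bytes: int) -> Dict[str, bool]:
    """Evaluate the three numeric Definition-of-Done thresholds."""
    return {
        # Accuracies are fractions in [0, 1]; the target is in percentage points.
        "accuracy_gain": 100.0 * (acc_tuned - acc_base) >= 10.0,
        "ppl_drift": ppl_drift(ppl_base, ppl_tuned) <= 0.05,
        "adapter_size": adapter_bytes / base_bytes < 0.01,
    }


# Hypothetical inputs, for illustration only.
result = check_dod(acc_base=0.50, acc_tuned=0.75,
                   ppl_base=12.0, ppl_tuned=12.3,
                   adapter_bytes=8_192, base_bytes=1_000_000)
print(result)
```

If the plan defines drift as an absolute value instead, wrap the `ppl_drift` result in `abs()` before comparing.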

What this phase intentionally does NOT cover

  • Implementing RLHF / DPO / PPO. Theory 04 is a concept-only survey. Alignment training is its own phase; for now we want SFT mechanics solid.
  • Adapter methods other than LoRA (prefix tuning, prompt tuning, IA³). Mentioned briefly; LoRA is the workhorse and the only one we implement.
  • Multi-adapter management (loading/unloading adapters at serve time). Phase 31 territory.
  • Fine-tuning a 7B+ model. Our model is MiniGPT (n_layer=4, n_head=4, d_model=64). The math scales; the experiment doesn't need to.
  • Hyperparameter search at scale. One reasonable LR + one reasonable batch size + one decay schedule. Rank is the only swept knob.
  • Reward model training. Out of scope.
  • Plural pronouns or tenses beyond §A13's 5 (infinitive, present simple, past simple, past participle, simple future). The corpus is what it is.
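The remark that the math scales while the experiment does not can be checked per weight matrix. For a d_in × d_out linear layer, a rank-r adapter adds r · (d_in + d_out) trainable parameters. The sketch below compares MiniGPT's d_model=64 with a width of 4096, used here as an illustrative 7B-class width; the rank of 4 and the helper names are assumptions, and which matrices actually get adapters is left to the labs.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one rank-r adapter: A is d_in x r, B is r x d_out."""
    return r * (d_in + d_out)


def lora_ratio(d_in: int, d_out: int, r: int) -> float:
    """Adapter parameters as a fraction of the frozen d_in x d_out weight."""
    return lora_params(d_in, d_out, r) / (d_in * d_out)


RANK = 4  # hypothetical; rank is the knob the phase sweeps

for d_model in (64, 4096):
    n = lora_params(d_model, d_model, RANK)
    frac = lora_ratio(d_model, d_model, RANK)
    print(f"d_model={d_model}: {n} adapter params, {frac:.4%} of the base matrix")
```

For square matrices the fraction simplifies to 2r / d_model, so the relative saving grows with model width even though the formula is unchanged.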

Phase 28's scope is LoRA mechanics, applied to MiniGPT, with QLoRA as a memory-saving variant, specialized on irregular-verb conjugation. Nothing more.

Further reading

Optional — enrichment, not required to pass the phase.