Phase 28 — Fine-Tuning, LoRA, QLoRA
Requires: 26 — Quantization Deep Dive · 27 — Modern Attention Optimizations
Teaches: fine-tuning · sft · lora · qlora · catastrophic-forgetting
Jump to any chapter from the phase reference index.
Chapter map
Pre-written per A12. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.
LoRA is cheap adaptation of pretrained models. Two small trainable matrices stand in for updates to billions of frozen parameters during fine-tuning. QLoRA is LoRA with the base model stored in NF4. This is how the open-source ecosystem fine-tunes models on modest hardware. In this phase: specialize MiniGPT to correct irregular-verb conjugations (goed → went).
Goal
Take the MiniGPT model from Phase 17 and fine-tune it with LoRA on the irregular-verb subset of the Phase 12 corpus. The goal is twofold: (1) show that LoRA's low-rank update is enough to raise irregular-verb conjugation accuracy by at least 10 percentage points, meaning the model reliably scores went above goed, ate above eated, and so on for the 8 irregular verbs (be, have, do, go, come, see, eat, write); and (2) show that LoRA preserves base-model knowledge on an orthogonal control task: the 12 regular verbs (work, play, walk, talk, listen, watch, study, finish, start, look, want, like) should still conjugate correctly with -ed, i.e. no catastrophic forgetting.
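One way to read "scores went above goed" is as a pairwise comparison: for each verb, the model assigns a score (for example a log-probability) to the correct form and to the overregularized form, and the pair counts as a hit when the correct form wins. The sketch below assumes that reading. `pairwise_accuracy`, `gain_in_points`, and the score values are illustrative placeholders, not the phase's evaluation harness and not measured results.

```python
from typing import Dict, Tuple

# Hypothetical layout: (correct form, wrong form) -> (score_correct, score_wrong).
# Scores are assumed to be log-probabilities, so higher means more likely.
PairScores = Dict[Tuple[str, str], Tuple[float, float]]


def pairwise_accuracy(scores: PairScores) -> float:
    """Fraction of pairs where the correct form outscores the wrong one."""
    if not scores:
        raise ValueError("need at least one scored pair")
    wins = sum(1 for good, bad in scores.values() if good > bad)
    return wins / len(scores)


def gain_in_points(before: PairScores, after: PairScores) -> float:
    """Accuracy change in percentage points (after minus before)."""
    return 100.0 * (pairwise_accuracy(after) - pairwise_accuracy(before))


# Made-up numbers for illustration only.
base = {("went", "goed"): (-3.2, -2.9), ("ate", "eated"): (-2.5, -2.7)}
tuned = {("went", "goed"): (-1.1, -4.0), ("ate", "eated"): (-1.3, -3.8)}

print(pairwise_accuracy(base), pairwise_accuracy(tuned), gain_in_points(base, tuned))
```

The same function could be run on the 12 regular verbs (for example walked vs. a wrong form) to track the control task, although the phase itself measures the control via perplexity drift.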
This phase also previews QLoRA: combining Phase 26's NF4 quantizer (src/miniquant/) with this phase's LoRA gives the standard recipe for fine-tuning 7B+ models on a single 24-GB consumer GPU. For MiniGPT the absolute memory savings are modest — what matters is internalizing the recipe and the ratios.
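To get a feel for the ratios, here is a back-of-the-envelope sketch of weight storage alone for a 7B-parameter model. It assumes 16 bits per weight for a half-precision baseline and 4 bits per weight for NF4, and it deliberately ignores quantization constants, activations, the LoRA adapter, and optimizer state, so real footprints are larger. `weight_memory_gb` is a hypothetical helper, not part of src/miniquant/.

```python
GB = 1e9  # decimal gigabytes


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage for the weights alone, in decimal GB."""
    return n_params * bits_per_param / 8 / GB


N_PARAMS = 7e9                              # a "7B" model
fp16_gb = weight_memory_gb(N_PARAMS, 16)    # half-precision base
nf4_gb = weight_memory_gb(N_PARAMS, 4)      # NF4 base, quantization constants ignored
ratio = fp16_gb / nf4_gb

print(f"fp16: {fp16_gb:.1f} GB, NF4: {nf4_gb:.1f} GB, ratio: {ratio:.1f}x")
# prints: fp16: 14.0 GB, NF4: 3.5 GB, ratio: 4.0x
```

Under these assumptions the frozen base shrinks by 4x, which leaves headroom on a 24-GB card for activations and the adapter's training state. The same 4x ratio applies to MiniGPT's weights, even though the absolute numbers are tiny.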
Module placement
The fine-tuning toolkit lives in src/minituner/ (new in this phase): lora.py, qlora.py, train.py, data.py. The QLoRA composer reads from src/miniquant/ (Phase 26). See src/minituner/BLUEPRINT.md for the API contract.
Read order
- theory/00-motivation.md: why fine-tuning needs reform; the over-parameterization observation.
- theory/01-sft-and-forgetting.md: supervised fine-tuning and catastrophic forgetting.
- theory/02-parameter-count.md: derive LoRA's parameter count savings.
- theory/03-memory-footprint.md: derive LoRA and QLoRA memory savings in detail.
- theory/04-alignment-survey.md: concept survey of DPO, ORPO, SimPO, RLHF, RLAIF. Reading-only.
- lab/00-lora-by-hand.md: implement LoRALinear from scratch and verify the identity property at init.
- lab/01-lora-counts.md: derive and measure parameter / memory counts for MiniGPT under full FT vs LoRA vs QLoRA.
- lab/02-lora-finetune.md: train MiniGPT + LoRA on the irregular-verb subset; measure task accuracy + regular-verb PPL drift; sweep rank.
- lab/03-qlora-preview.md: combine NF4 base with LoRA; measure memory; verify outputs agree with LoRA-only within quantization error.
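As a preview of lab/00, here is a minimal pure-Python sketch of the identity property, assuming the initialization from the LoRA paper (A drawn from a small Gaussian, B all zeros) and the usual alpha / r scaling. The class name `LoRALinearSketch`, the rank r=4, and alpha=8 are assumptions chosen for illustration; the real LoRALinear API is whatever src/minituner/BLUEPRINT.md specifies and may differ.

```python
import random
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (m x k) @ (k x n) matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinearSketch:
    """y = x W^T + (alpha / r) * x A^T B^T, with W frozen and only A, B trainable."""

    def __init__(self, weight: Matrix, r: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                                   # frozen, (d_out, d_in)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]  # trainable, (r, d_in)
                  for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]             # trainable, (d_out, r), zeros
        self.scale = alpha / r

    def forward(self, x: Matrix) -> Matrix:
        base = matmul(x, transpose(self.weight))
        delta = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return [[b + self.scale * d for b, d in zip(b_row, d_row)]
                for b_row, d_row in zip(base, delta)]

    def trainable_params(self) -> int:
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


# d_model = 64 matches MiniGPT; the weights and inputs are random stand-ins.
rng = random.Random(1)
W = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(64)]
x = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(2)]
layer = LoRALinearSketch(W, r=4, alpha=8.0)
frozen_out = matmul(x, transpose(W))

print(layer.forward(x) == frozen_out)   # B is zero, so the adapter adds nothing at init
print(layer.trainable_params(), 64 * 64)
```

Because B starts at zero, the low-rank term vanishes and the wrapped layer reproduces the frozen layer exactly at init. For this 64 x 64 layer the adapter has r * (d_in + d_out) = 512 trainable parameters against 4096 frozen ones; the relative savings grow with d_model, since the adapter scales linearly in the layer width while the full matrix scales quadratically.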
solutions/ is populated at phase open, after the src/minituner/ API stabilizes.
Definition of Done
See PHASE_28_PLAN.md §6. Briefly:
- src/minituner/lora.py + src/minituner/qlora.py implemented (Borja).
- Fine-tune raises irregular-verb task accuracy ≥ 10 points; regular-verb control PPL drift ≤ 5%.
- Rank ablation plot committed.
- QLoRA preview run completes with measured memory savings.
- Adapter-only checkpoint < 1% of base checkpoint size.
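The numeric thresholds above can be expressed as a small checker. This is a sketch under stated assumptions: PPL drift is taken as the relative change in magnitude against the base model, accuracy is a fraction in [0, 1], and every name here (`PhaseMetrics`, `definition_of_done`, the field names) is hypothetical rather than part of src/minituner/ or PHASE_28_PLAN.md.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PhaseMetrics:
    # Hypothetical container; field names are illustrative only.
    acc_before: float     # irregular-verb accuracy of the base model, in [0, 1]
    acc_after: float      # same metric after LoRA fine-tuning
    ppl_before: float     # regular-verb control perplexity, base model
    ppl_after: float      # same metric after fine-tuning
    adapter_bytes: int    # size of the adapter-only checkpoint
    base_bytes: int       # size of the base checkpoint


def definition_of_done(m: PhaseMetrics) -> Dict[str, bool]:
    gain_points = 100.0 * (m.acc_after - m.acc_before)
    # Assumption: "drift" means relative change in either direction.
    ppl_drift = abs(m.ppl_after - m.ppl_before) / m.ppl_before
    size_ratio = m.adapter_bytes / m.base_bytes
    return {
        "accuracy_gain_ok": gain_points >= 10.0,   # at least 10 points
        "ppl_drift_ok": ppl_drift <= 0.05,         # at most 5%
        "adapter_size_ok": size_ratio < 0.01,      # under 1% of base
    }


# Made-up numbers for illustration only.
passing = PhaseMetrics(0.50, 0.75, 10.0, 10.3, 8_000, 1_000_000)
forgetting = PhaseMetrics(0.50, 0.75, 10.0, 11.0, 8_000, 1_000_000)

print(definition_of_done(passing))
print(definition_of_done(forgetting))
```

In the second case the accuracy gain is met but the control perplexity moves by 10%, which is the signature of catastrophic forgetting that the control task is meant to catch. Results that land exactly on a threshold may need a small floating-point tolerance.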
What this phase intentionally does NOT cover
- An actual implementation of RLHF / DPO / PPO. Theory 04 is a concept-only survey. Implementing alignment training is its own phase; for now we want SFT mechanics solid.
- Adapter methods other than LoRA (prefix tuning, prompt tuning, IA³). Mentioned briefly; LoRA is the workhorse and the only one we implement.
- Multi-adapter management (loading/unloading adapters at serve time). Phase 31 territory.
- Fine-tuning a 7B+ model. Our model is MiniGPT (n_layer=4, n_head=4, d_model=64). The math scales; the experiment doesn't need to.
- Hyperparameter search at scale. One reasonable LR + one reasonable batch size + one decay schedule. Rank is the only swept knob.
- Reward model training. Out of scope.
- Plural pronouns or tenses beyond §A13's 5 (infinitive, present simple, past simple, past participle, simple future). The corpus is what it is.
Phase 28's scope is LoRA mechanics, applied to MiniGPT, with QLoRA as a memory-saving variant, specialized on irregular-verb conjugation. Nothing more.
Further reading
Optional — enrichment, not required to pass the phase.
- 📄 LoRA: Low-Rank Adaptation of Large Language Models — Hu et al. · 2021. The adapter you implement in this phase.
- 📄 QLoRA: Efficient Finetuning of Quantized LLMs — Dettmers et al. · 2023. LoRA on a 4-bit frozen base — NF4 and double quant.