
Phase 38 — MLOps

Requires: Phase 37 (Security & Safety of AI Systems)
Teaches: mlops · model-registry · ab-testing · canary-deploys · drift-detection

Chapter map

Pre-written per A12. Topic per A13: English verb grammar tutor. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.

From demo to service. A SHA-keyed registry (wrapped around MLflow), lineage from the DVC corpus through to deployment, A/B vs shadow vs canary, drift detection (KL, PSI) on the conjugation distribution, basic FinOps (cost per unit of quality), and CI gates that decide what reaches production. Without this, the previous 37 phases are one-off code.


Goal

Build the MLOps spine so that every grammar-tutor artifact is traceable, comparable, and replaceable, without introducing a new src/ module. Phase 38 composes existing components (MLflow run tracking from Phase 18, DVC corpus versioning from Phase 12, the Mini-GPT + LoRA adapter, src/miniserve/, and src/miniobserve/) and adds the operational glue: a registry wrapper, traffic routing, drift detection, FinOps, and a CI deploy gate. By the end, Borja can:

  1. List, by SHA, every grammar-tutor checkpoint that ever served traffic, with its conjugation-accuracy score and cost-per-1k-tokens.
  2. Mirror 10% of live English-sentence traffic to a shadow LoRA variant (users still receive the production response) and produce a side-by-side correction-quality report.
  3. Roll back to a previous registry entry with a single just command (just rollback <sha>).
  4. Compute KL/PSI drift on the live verb-token distribution week over week and decide whether retraining is warranted.
  5. Prove that the CI workflow refuses to promote a deliberately-regressed checkpoint.
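Goal 2 depends on a deterministic way to choose which requests are also sent to the shadow variant. The following is a minimal sketch, assuming each request carries a stable ID and that the split is made by hashing that ID; in_shadow_sample, handle, and the salt value are hypothetical names, not the actual interface of src/miniserve/traffic.py. It also pins down what makes shadow traffic "shadow": the user always receives the primary model's response, and the shadow output is only logged.

```python
import hashlib
from typing import Callable, List, Tuple

BUCKETS = 10_000  # split resolution: each bucket is 0.01% of traffic


def in_shadow_sample(request_id: str, fraction: float = 0.10, salt: str = "shadow-lora") -> bool:
    """Deterministically decide whether a request is mirrored to the shadow variant.

    Hashing salt + request_id gives a stable bucket, so the same request ID always
    lands in the same bucket; changing the salt reshuffles the sample.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    digest = hashlib.sha256(f"{salt}:{request_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    return bucket < round(fraction * BUCKETS)


def handle(
    request_id: str,
    sentence: str,
    primary: Callable[[str], str],
    shadow: Callable[[str], str],
    shadow_log: List[Tuple[str, str, str]],
    fraction: float = 0.10,
) -> str:
    """Serve the primary model; record the shadow output for offline comparison only."""
    response = primary(sentence)
    if in_shadow_sample(request_id, fraction):
        # A real server would run this call asynchronously so it cannot add latency.
        shadow_log.append((request_id, response, shadow(sentence)))
    return response  # the user never sees the shadow output
```

The tuples in shadow_log are the raw material for the side-by-side correction-quality report. An A/B test would differ in one line: the sampled requests would return the variant's response instead of the primary's.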

Phase 38 is engineering hygiene plus topic-aware integration, not new mathematics. The math (KL, PSI, z-test, CpQU, canonical SHA) is light and lives in theory/01, theory/03, and theory/04. Most of the effort goes into scripts/mlops/, the extensions to src/miniserve/ and src/miniobserve/, and the GitHub Actions workflow.
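The two drift statistics are short enough to state as code. This is a sketch under stated assumptions: KL divergence is taken as D_KL(live || baseline) over verb-token counts, PSI as the sum over bins of (a_i - e_i) * ln(a_i / e_i) with a = live and e = baseline proportions, and empty bins are handled with additive smoothing using a hypothetical eps = 1e-6. The function names are illustrative, not the interface of scripts/mlops/drift.py.

```python
import bisect
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping, Sequence


def _proportions(counts: Mapping[Hashable, int], keys: Sequence[Hashable], eps: float = 1e-6) -> Dict[Hashable, float]:
    """Turn counts into proportions with additive smoothing, so empty bins never give log(0)."""
    total = sum(counts.get(k, 0) for k in keys) + eps * len(keys)
    return {k: (counts.get(k, 0) + eps) / total for k in keys}


def kl_divergence(live: Mapping[str, int], baseline: Mapping[str, int], eps: float = 1e-6) -> float:
    """D_KL(live || baseline) over a categorical histogram such as verb-token counts."""
    keys = sorted(set(live) | set(baseline))
    p = _proportions(live, keys, eps)
    q = _proportions(baseline, keys, eps)
    return sum(p[k] * math.log(p[k] / q[k]) for k in keys)


def _bin_counts(values: Iterable[float], edges: Sequence[float]) -> Counter:
    """Count values per bin; sorted interior cut points give len(edges) + 1 bins."""
    return Counter(bisect.bisect_right(edges, v) for v in values)


def psi(baseline: Iterable[float], live: Iterable[float], edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index of a scalar feature, binned on edges fixed from the baseline."""
    keys = list(range(len(edges) + 1))
    e = _proportions(_bin_counts(baseline, edges), keys, eps)
    a = _proportions(_bin_counts(live, edges), keys, eps)
    return sum((a[k] - e[k]) * math.log(a[k] / e[k]) for k in keys)
```

A commonly quoted rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as significant; the thresholds this phase actually uses belong to theory/03. Bin edges should stay fixed from the baseline week, otherwise week-over-week PSI values are not comparable. Smoothing keeps an empty bin finite, but that bin then dominates the sum and makes the result depend heavily on eps, which is one of the pitfalls worth checking.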

Read order

  1. theory/00-motivation.md — why this phase exists; what specifically distinguishes a "demo" from a "service" for a grammar tutor.
  2. theory/01-registry-and-lineage.md — content-addressable storage layered on MLflow + DVC; canonical SHA; lineage walks back to corpus DVC hashes.
  3. theory/02-traffic-strategies.md — A/B vs shadow vs canary for the grammar-correction endpoint; the precise question each one answers.
  4. theory/03-drift-detection.md — KL divergence over verb-token histograms; PSI over scalar features. Thresholds and pitfalls.
  5. theory/04-traffic-and-finops.md — two-proportion z-test for A/B significance on conjugation pass-rate; CpQU derivation.
  6. theory/05-capacity-and-scaling.md — CI workflow as the only path to production (deploy gates, baseline comparison, audit trail); vocabulary tour of HPA / MIG / MPS / spot. There are no capacity experiments, because Borja's laptop has no GPU.
  7. lab/00-registry-roundtrip.md — register the Phase 18 checkpoint + the Phase 26 INT8 variant + the Phase 28 LoRA grammar tutor; verify SHA stability.
  8. lab/01-shadow-ab.md — wire shadow routing into src/miniserve/; route 10% to the LoRA variant.
  9. lab/02-drift-detection.md — apply a synthetic shift to the Phase 12 verb corpus and confirm that KL and PSI cross their thresholds.
  10. lab/03-finops-table.md — compute CpQU for every registry entry; commit docs/COSTS.md.
  11. lab/04-ci-deploy-gate.md — wire .github/workflows/deploy-grammar-tutor.yml; push a deliberately-regressed model; confirm CI blocks promotion.
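The A/B significance check in theory/04 can be sketched with the standard pooled two-proportion z-test, where each arm's pass rate is its fraction of passing conjugation evals. A minimal sketch, assuming pass/fail counts per arm are already aggregated; two_proportion_z_test is a hypothetical name.

```python
import math
from typing import Tuple


def two_proportion_z_test(pass_a: int, n_a: int, pass_b: int, n_b: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test: does arm B's pass rate differ from arm A's?

    Returns (z, two_sided_p_value); z > 0 means B has the higher pass rate.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both arms need at least one evaluated sentence")
    if not (0 <= pass_a <= n_a and 0 <= pass_b <= n_b):
        raise ValueError("pass counts must lie in [0, n]")
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)  # common pass rate under H0 (arms are equal)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0.0:
        return 0.0, 1.0  # both arms all-pass or all-fail: no evidence of a difference
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

With illustrative counts of 820/1000 passes for control and 860/1000 for the LoRA variant, two_proportion_z_test(820, 1000, 860, 1000) gives z ≈ 2.44 and a two-sided p ≈ 0.015. The test assumes independent sentences and a sample size fixed in advance; repeatedly peeking at a running A/B and stopping at the first p < 0.05 inflates the false-positive rate.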

solutions/ is empty during pre-write — populated at phase open after Borja's MLflow run-tracking conventions and DVC remote are finalized.

Definition of Done

See PHASE_38_PLAN.md §6. Briefly:

  • scripts/mlops/{registry,lineage,drift,cpqu}.py implemented and tested.
  • src/miniserve/traffic.py and src/miniobserve/cost_emitter.py added (no new src/<module>/).
  • .github/workflows/deploy-grammar-tutor.yml enforces the eval-pass-rate gate.
  • All five experiments run end-to-end with manifests.
  • just rollback <sha> works against the Phase 33 serving stack.
  • docs/COSTS.md row per registry entry.
  • Borja can articulate the precise distinction between A/B, shadow, canary, and "soft launch" (the anti-pattern).
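The docs/COSTS.md deliverable needs one CpQU number per registry entry. The exact derivation lives in theory/04; this sketch assumes the simplest reading of "cost per unit of quality", namely the hand-recorded cost per 1k tokens divided by a quality score in (0, 1] such as conjugation accuracy. CostRecord, cpqu, and costs_table are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CostRecord:
    sha: str                   # canonical registry SHA
    label: str                 # human-readable name (hypothetical, e.g. "phase28-lora")
    cost_per_1k_tokens: float  # hand-recorded per run, in USD
    quality: float             # e.g. conjugation accuracy, in (0, 1]


def cpqu(rec: CostRecord) -> float:
    """Cost per unit of quality: dollars per 1k tokens divided by the quality score."""
    if not 0.0 < rec.quality <= 1.0:
        raise ValueError("quality must be in (0, 1]")
    return rec.cost_per_1k_tokens / rec.quality


def costs_table(records: Iterable[CostRecord]) -> str:
    """Render one Markdown table row per registry entry, lowest CpQU first."""
    rows = [
        "| SHA | Model | $/1k tokens | Quality | CpQU |",
        "|---|---|---:|---:|---:|",
    ]
    for r in sorted(records, key=cpqu):
        rows.append(
            f"| {r.sha[:12]} | {r.label} | {r.cost_per_1k_tokens:.4f} | {r.quality:.3f} | {cpqu(r):.4f} |"
        )
    return "\n".join(rows)
```

Sorting by CpQU puts the entry that buys a unit of quality most cheaply first. An entry can be cheaper per token yet rank lower, if its accuracy falls proportionally more than its cost.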

What this phase intentionally does NOT cover

  • A new src/<module>/. §A5 BLUEPRINT convention applies only to new modules; Phase 38 extends src/miniserve/ and src/miniobserve/, both blueprinted in Phases 33 and 34.
  • Airflow / Kubeflow / Argo / Prefect. §10 anti-goal. Orchestration is the Justfile + GitHub Actions.
  • MLflow as the sole registry truth. The canonical SHA lives in our scripts/mlops/registry.py; MLflow's registry is the storage backend, not the identity authority.
  • GPU-sharing experiments. MIG/MPS appear in theory/05 for vocabulary only, since Borja has no CUDA-capable GPU.
  • Autoscaling experiments. HPA on tokens/sec is described in theory/05 but not run live; it would require a cloud-deployed serving stack, which only the Phase 39 capstone provides, and then only minimally.
  • Real cloud cost API integration. Cost-per-1k-tokens is hand-recorded per run. Parsing AWS/GCP billing is future work.
  • langchain / llama-index / langfuse-orchestrator. Langfuse is used for traces only, as introduced in Phase 34. No agent-orchestration imports.
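The canonical-SHA bullet above says identity is computed by scripts/mlops/registry.py rather than taken from MLflow. One way to make that identity content-addressed is to hash every file's bytes together with its relative path, in sorted path order. A minimal sketch, assuming a checkpoint is a directory of files; canonical_sha and _file_sha256 are hypothetical names, and theory/01 defines the scheme actually used.

```python
import hashlib
from pathlib import Path


def _file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB checkpoints never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def canonical_sha(artifact_dir: str) -> str:
    """Content-addressed ID for a checkpoint directory.

    Depends only on relative paths and file bytes, not on mtimes, MLflow run IDs,
    or the order in which the filesystem lists files.
    """
    root = Path(artifact_dir)
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    outer = hashlib.sha256()
    for rel, path in files:
        outer.update(rel.encode("utf-8") + b"\0")                 # relative path, NUL-terminated
        outer.update(_file_sha256(path).encode("ascii") + b"\n")  # per-file digest
    return outer.hexdigest()
```

Because the hash ignores mtimes and run IDs, registering the same checkpoint twice yields the same SHA, which is the stability lab/00 asks to verify. The digest could then be attached to the MLflow run, for example with mlflow.set_tag, keeping MLflow as the storage backend and the script as the identity authority.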

Phase 38's scope is the minimum MLOps surface that makes the Phase 39 capstone reproducible by a third party and refuses to deploy a regression. Anything beyond that is over-build.
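Refusing to deploy a regression ultimately reduces to a comparison that can fail a CI step. A minimal sketch, assuming the workflow has already written baseline and candidate eval pass rates to JSON files; the file layout, the eval_pass_rate key, and the thresholds (max_drop = 0.01, floor = 0.80) are hypothetical, not the actual settings of deploy-grammar-tutor.yml.

```python
import json
import sys
from typing import List, Sequence, Tuple


def deploy_gate(baseline: float, candidate: float, max_drop: float = 0.01, floor: float = 0.80) -> Tuple[bool, List[str]]:
    """Decide whether a candidate may be promoted, given eval pass rates in [0, 1]."""
    reasons: List[str] = []
    if candidate < floor:
        reasons.append(f"pass rate {candidate:.3f} is below the floor {floor:.3f}")
    if candidate < baseline - max_drop:
        reasons.append(
            f"pass rate fell {baseline - candidate:.3f} below baseline {baseline:.3f} (allowed: {max_drop:.3f})"
        )
    return (not reasons, reasons)


def main(argv: Sequence[str]) -> int:
    """argv = [baseline.json, candidate.json], each holding {"eval_pass_rate": <float>}."""
    if len(argv) != 2:
        print("usage: gate BASELINE_JSON CANDIDATE_JSON", file=sys.stderr)
        return 2
    rates = []
    for path in argv:
        with open(path, encoding="utf-8") as f:
            rates.append(float(json.load(f)["eval_pass_rate"]))
    ok, reasons = deploy_gate(rates[0], rates[1])
    for reason in reasons:
        print(f"BLOCKED: {reason}", file=sys.stderr)
    return 0 if ok else 1  # a non-zero exit code fails the CI step
```

GitHub Actions marks a step as failed when its command exits non-zero, so a workflow step that ends in sys.exit(main(sys.argv[1:])) is enough to block promotion of lab/04's deliberately-regressed checkpoint. A stricter gate could replace the fixed max_drop with the two-proportion z-test from theory/04, so that noise on a small eval set is not mistaken for a regression.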

Further reading

Optional — enrichment, not required to pass the phase.