Phase 38: MLOps
Requires: 37 (Security & Safety of AI Systems)
Teaches: mlops · model-registry · ab-testing · canary-deploys · drift-detection
Jump to any chapter from the phase reference index.
Chapter map
Pre-written per A12. Topic per A13: English verb grammar tutor. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.
From demo to service. A SHA-keyed registry (wrapped around MLflow), lineage from the DVC corpus through to deploy, A/B vs shadow vs canary, drift detection (KL, PSI) over the conjugation distribution, basic FinOps (cost per unit of quality), and CI gates that decide what reaches production. Without this, the previous 37 phases are one-shot code.
Goal
Build the MLOps spine so every grammar-tutor artifact is traceable, comparable, and replaceable — without introducing a new src/ module. Phase 38 composes existing components (MLflow run tracking from Phase 18, DVC corpus versioning from Phase 12, Mini-GPT + LoRA adapter, src/miniserve/, src/miniobserve/) and adds the operational glue: a registry wrapper, traffic routing, drift detection, FinOps, and a CI deploy gate. By the end Borja can:
- List, by SHA, every grammar-tutor checkpoint that ever served traffic, with its conjugation-accuracy score and cost-per-1k-tokens.
- Route 10% of live English-sentence traffic through a shadow LoRA variant and produce a side-by-side correction-quality report.
- Roll back to a previous registry entry with one just command (just rollback <sha>).
- Compute KL/PSI drift on the live verb-token distribution week over week and decide whether retraining is warranted.
- Prove that the CI workflow refuses to promote a deliberately-regressed checkpoint.
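The 10% shadow route in the list above can be sketched as follows. This is a minimal illustration, not the contents of src/miniserve/traffic.py: the names `in_shadow_sample`, `handle`, and `report`, and the choice of hashing the request id, are assumptions made for the example. The key property of shadow routing is that the user always receives the primary model's answer, while the shadow answer is only recorded for the side-by-side report.

```python
import hashlib

SHADOW_FRACTION = 0.10  # share of requests mirrored to the shadow variant


def in_shadow_sample(request_id: str, fraction: float = SHADOW_FRACTION) -> bool:
    """Deterministically select about `fraction` of requests by hashing the id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < fraction


def handle(request_id, sentence, primary, shadow, report):
    """Serve the primary answer; mirror sampled requests to the shadow model.

    `primary` and `shadow` are hypothetical callables (sentence -> correction);
    `report` is a plain list that collects side-by-side rows.
    """
    answer = primary(sentence)
    if in_shadow_sample(request_id):
        try:
            shadow_answer = shadow(sentence)
        except Exception as exc:  # a shadow failure must never reach the user
            shadow_answer = f"<shadow error: {exc}>"
        report.append({"id": request_id, "input": sentence,
                       "primary": answer, "shadow": shadow_answer})
    return answer
```

Hashing the request id, rather than calling a random number generator, keeps the sample reproducible: replaying the same traffic log selects the same requests for the report.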
Phase 38 is engineering hygiene + topic-aware integration, not new mathematics. The math (KL, PSI, z-test, CpQU, canonical SHA) is light and lands in theory/01, theory/03, theory/04. The bulk of the time is in scripts/mlops/, the extensions to src/miniserve/ and src/miniobserve/, and the GitHub Actions workflow.
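As a rough illustration of how light the drift math is, the sketch below computes KL divergence and PSI over two verb-token histograms using only the standard library. The function names, the smoothing constant `eps`, and the sample tokens are assumptions for the example, not the API of scripts/mlops/drift.py. PSI over a scalar feature would first need the values binned; here both metrics run over the same token histogram for brevity.

```python
import math
from collections import Counter


def histogram(tokens, vocab, eps=1e-6):
    """Relative frequency of each vocab item, smoothed so no bin is zero."""
    counts = Counter(tokens)
    total = sum(counts[v] for v in vocab) + eps * len(vocab)
    return [(counts[v] + eps) / total for v in vocab]


def kl_divergence(p, q):
    """KL(p || q) in nats over two aligned probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def psi(expected, actual):
    """Population Stability Index over matching bins of two distributions."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical verb-token samples, for illustration only.
vocab = ["go", "goes", "went", "gone"]
last_week = ["go"] * 40 + ["goes"] * 30 + ["went"] * 20 + ["gone"] * 10
this_week = ["go"] * 20 + ["goes"] * 20 + ["went"] * 40 + ["gone"] * 20

ref = histogram(last_week, vocab)
live = histogram(this_week, vocab)
print(f"KL(live || ref) = {kl_divergence(live, ref):.4f}")
print(f"PSI             = {psi(ref, live):.4f}")
```

PSI is the symmetrised form of KL, that is, KL(live || ref) + KL(ref || live), so it does not depend on which week is treated as the reference. The smoothing term matters in practice: a verb form that is absent from one week would otherwise produce a division by zero or an infinite divergence.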
Read order
- theory/00-motivation.md: why this phase exists; what specifically distinguishes a "demo" from a "service" for a grammar tutor.
- theory/01-registry-and-lineage.md: content-addressable storage layered on MLflow + DVC; canonical SHA; lineage walks back to corpus DVC hashes.
- theory/02-traffic-strategies.md: A/B vs shadow vs canary for the grammar-correction endpoint; the precise question each one answers.
- theory/03-drift-detection.md: KL divergence over verb-token histograms; PSI over scalar features. Thresholds and pitfalls.
- theory/04-traffic-and-finops.md: two-proportion z-test for A/B significance on conjugation pass-rate; CpQU derivation.
- theory/05-capacity-and-scaling.md: CI workflow as the only path to production (deploy gates, baseline comparison, audit trail); vocabulary tour of HPA / MIG / MPS / spot. No capacity experiments, because Borja has no GPU on the laptop.
- lab/00-registry-roundtrip.md: register the Phase 18 checkpoint + the Phase 26 INT8 variant + the Phase 28 LoRA grammar tutor; verify SHA stability.
- lab/01-shadow-ab.md: wire shadow routing into src/miniserve/; route 10% to the LoRA variant.
- lab/02-drift-detection.md: synthetic shift on the Phase 12 verb-corpus; KL + PSI cross thresholds.
- lab/03-finops-table.md: compute CpQU for every registry entry; commit docs/COSTS.md.
- lab/04-ci-deploy-gate.md: wire .github/workflows/deploy-grammar-tutor.yml; push a deliberately-regressed model; confirm CI blocks promotion.
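The "canonical SHA" and "SHA stability" ideas from theory/01 and lab/00 can be sketched as below. This is one possible scheme under stated assumptions, not the scheme scripts/mlops/registry.py necessarily uses: the function name `canonical_sha` and the metadata keys are hypothetical. The point is that the identity is derived from the artifact's bytes plus a canonically serialised metadata record, so the same inputs always yield the same SHA regardless of dict key order or the machine that computes it.

```python
import hashlib
import json


def canonical_sha(weights: bytes, metadata: dict) -> str:
    """SHA-256 over canonical JSON metadata followed by the raw weight bytes."""
    meta_bytes = json.dumps(
        metadata,
        sort_keys=True,          # key order must not change the identity
        separators=(",", ":"),   # no whitespace differences between writers
        ensure_ascii=True,
    ).encode("utf-8")
    h = hashlib.sha256()
    # Length prefix keeps the metadata/weights boundary unambiguous.
    h.update(len(meta_bytes).to_bytes(8, "big"))
    h.update(meta_bytes)
    h.update(weights)
    return h.hexdigest()


# Hypothetical metadata keys, for illustration only.
meta = {"corpus_dvc_hash": "abc123", "variant": "lora-grammar-tutor"}
sha = canonical_sha(b"\x00\x01\x02", meta)
```

Including the corpus hash in the metadata is what lets a lineage walk go from a deployed SHA back to the data it was trained on.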
solutions/ is empty during pre-write — populated at phase open after Borja's MLflow run-tracking conventions and DVC remote are finalized.
Definition of Done
See PHASE_38_PLAN.md §6. Briefly:
- scripts/mlops/{registry,lineage,drift,cpqu}.py implemented and tested.
- src/miniserve/traffic.py and src/miniobserve/cost_emitter.py added (no new src/<module>/).
- .github/workflows/deploy-grammar-tutor.yml enforces the eval-pass-rate gate.
- All five experiments run end-to-end with manifests.
- just rollback <sha> works against the Phase 33 serving stack.
- docs/COSTS.md has one row per registry entry.
- Borja can articulate the precise distinction between A/B, shadow, canary, and "soft launch" (the anti-pattern).
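The eval-pass-rate gate in the list above could be backed by a small script that the workflow runs as a step, failing the job on a non-zero exit code. The sketch below assumes a hypothetical report format (a JSON file with a `pass_rate` field) and hypothetical names `load_pass_rate`, `gate`, and `main`; the real workflow and report layout are defined in PHASE_38_PLAN.md and lab/04, not here.

```python
import json
import sys
from pathlib import Path


def load_pass_rate(path) -> float:
    """Read the eval pass rate from a JSON report (hypothetical format)."""
    return float(json.loads(Path(path).read_text())["pass_rate"])


def gate(baseline: float, candidate: float, max_drop: float = 0.0) -> bool:
    """True when the candidate may be promoted over the current baseline."""
    return candidate >= baseline - max_drop


def main(argv) -> int:
    """argv = [baseline_report, candidate_report]; returns a process exit code."""
    baseline = load_pass_rate(argv[0])
    candidate = load_pass_rate(argv[1])
    if gate(baseline, candidate):
        print(f"PROMOTE: candidate {candidate:.3f} >= baseline {baseline:.3f}")
        return 0
    print(f"BLOCK: candidate {candidate:.3f} < baseline {baseline:.3f}",
          file=sys.stderr)
    return 1

# In a CI step this would be invoked as: sys.exit(main(sys.argv[1:]))
```

Comparing against the stored baseline, rather than a fixed threshold, is what makes a deliberately-regressed checkpoint fail even when its absolute score still looks acceptable.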
What this phase intentionally does NOT cover
- A new src/<module>/. §A5 BLUEPRINT convention applies only to new modules; Phase 38 extends src/miniserve/ and src/miniobserve/, both blueprinted in Phases 33 and 34.
- Airflow / Kubeflow / Argo / Prefect. §10 anti-goal. Orchestration is the Justfile + GitHub Actions.
- MLflow as the sole registry truth. The canonical SHA lives in our scripts/mlops/registry.py; MLflow's registry is the storage backend, not the identity authority.
- GPU-sharing experiments. MIG/MPS appear in theory/05 for vocabulary only, because Borja has no CUDA.
- Autoscaling experiments. HPA on tokens/sec is described in theory/05; there is no live experiment, since it would require a cloud-deployed serving stack, which the capstone Phase 39 does minimally.
- Real cloud cost API integration. Cost-per-1k-tokens is hand-recorded per run. Parsing AWS/GCP billing is future work.
- langchain / llama-index / langfuse-orchestrator. Langfuse is used for traces only, from Phase 34. No agent-orchestration imports.
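Because cost-per-1k-tokens is hand-recorded rather than pulled from a billing API, the FinOps arithmetic stays small. The sketch below takes one plausible reading of CpQU, cost divided by a quality score in (0, 1]; this formula is an assumption for illustration, and the actual derivation belongs to theory/04. The function names and the row layout for docs/COSTS.md are likewise hypothetical.

```python
def cpqu(cost_per_1k_tokens: float, quality: float) -> float:
    """Cost per unit of quality, ASSUMED here to be cost / quality.

    `quality` is a score in (0, 1], for example conjugation accuracy.
    """
    if not 0.0 < quality <= 1.0:
        raise ValueError("quality must be in (0, 1]")
    return cost_per_1k_tokens / quality


def costs_row(sha: str, cost_per_1k_tokens: float, quality: float) -> str:
    """Format one table row: short SHA, cost, quality, CpQU (hypothetical layout)."""
    value = cpqu(cost_per_1k_tokens, quality)
    return f"| {sha[:12]} | {cost_per_1k_tokens:.4f} | {quality:.3f} | {value:.4f} |"
```

Under this reading, a cheaper but less accurate variant only wins if its cost falls faster than its quality, which is the comparison the table is meant to make visible.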
Phase 38's scope is the minimum MLOps surface that makes the Phase 39 capstone reproducible by a third party and refuses to deploy a regression. Anything beyond that is over-build.
Further reading
Optional — enrichment, not required to pass the phase.
- 📄 Hidden Technical Debt in Machine Learning Systems, Sculley et al. · 2015. Where ML systems rot, and why.
- 📕 Designing Machine Learning Systems, Chip Huyen · 2022. Registries, drift, and deployment in practice.