Phase 33 — Quizzes
This page is a human-readable mirror of the question bank in data/quizzes/phase-33-inference-serving.yaml. The YAML file is the canonical source of truth; the Phase 41 portal seeder loads that file.
q-33-01 — Which term dominates the §A13 grammar tutor's latency on i5-8250U?
For a typical single-client §A13 grammar-tutor request on Borja's i5-8250U (NumPy backend, K≈20 decode tokens), which component of the latency budget dominates p50?
- JSON parsing and Pydantic validation
- BPE tokenization of the input sentence
- The auto-regressive decode loop (K · t_decode_step) ← correct
- TLS handshake on the inbound connection
Why: Theory 05's budget puts ~130 ms of the ~150 ms p50 in the decode loop.
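The arithmetic behind that figure can be checked in a few lines. This sketch takes K≈20 and the ~150 ms p50 from this question, and assumes the ~6.5 ms per-step decode time quoted in q-33-03; the function name is hypothetical.

```python
def decode_share(k_tokens: int, t_decode_step_ms: float, p50_ms: float) -> tuple[float, float]:
    """Return (decode-loop time in ms, fraction of the p50 budget it takes)."""
    decode_ms = k_tokens * t_decode_step_ms  # K * t_decode_step
    return decode_ms, decode_ms / p50_ms


# Values quoted on this page; 6.5 ms is the per-step time from q-33-03.
decode_ms, share = decode_share(k_tokens=20, t_decode_step_ms=6.5, p50_ms=150.0)
print(f"decode loop: {decode_ms:.0f} ms ({share:.0%} of p50)")
```

With these inputs the decode loop accounts for 130 ms, which leaves roughly 20 ms for everything else in the request path.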
q-33-02 — Why does throughput collapse without batching at C=8?
Select every reason why the single-request handler that calls model.forward() directly fails to sustain 8 concurrent clients on a 4C/8T CPU.
- Each request gets its own matmul; BLAS overhead is paid per-request, not amortized. ← correct
- Requests serialize on the GIL-bound NumPy thread; concurrency doesn't increase parallelism. ← correct
- FastAPI's event loop becomes the bottleneck, not the model.
- TCP backlog overflows before the model is even reached.
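A toy cost model makes the amortization argument concrete. The sketch below assumes each model.forward() call pays a fixed overhead plus a per-request compute cost, and that unbatched requests run one after another, as the GIL option describes. Both millisecond values are hypothetical placeholders, not measurements, and the model assumes batched compute scales linearly with batch size.

```python
def unbatched_ms(clients: int, overhead_ms: float, compute_ms: float) -> float:
    """Serialized handling: every request pays the fixed call overhead itself."""
    return clients * (overhead_ms + compute_ms)


def batched_ms(clients: int, overhead_ms: float, compute_ms: float) -> float:
    """One batched call: the fixed overhead is paid once for the whole batch."""
    return overhead_ms + clients * compute_ms


C = 8              # concurrent clients, from the question
OVERHEAD_MS = 4.0  # hypothetical fixed cost per forward() call
COMPUTE_MS = 2.5   # hypothetical per-request compute cost

serial_total = unbatched_ms(C, OVERHEAD_MS, COMPUTE_MS)
batch_total = batched_ms(C, OVERHEAD_MS, COMPUTE_MS)
print(f"unbatched: {serial_total} ms, batched: {batch_total} ms")
```

The gap between the two totals is exactly (C - 1) times the fixed overhead, so it widens as concurrency grows.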
q-33-03 — What does the KV-cache buy at the §A13 grammar-tutor scale?
Free response. Acceptable answers contain the keyword "decode".
The cache avoids re-computing attention over the prefix on every decode step, dropping t_decode_step from ~18 ms to ~6.5 ms — about half the total.
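A token count shows where the saving comes from. This sketch counts how many token positions pass through the model across one generation, with and without a cache. The prompt length of 12 is a hypothetical value and K = 20 comes from q-33-01. It counts work only and does not try to reproduce the millisecond figures above.

```python
def positions_without_cache(prompt_len: int, k_tokens: int) -> int:
    """No cache: every decode step re-processes the prompt plus all tokens emitted so far."""
    return sum(prompt_len + step for step in range(k_tokens))


def positions_with_cache(prompt_len: int, k_tokens: int) -> int:
    """With cache: the prompt is processed once, then one new position per later step."""
    return prompt_len + (k_tokens - 1)


PROMPT_LEN = 12  # hypothetical prompt length in tokens
K = 20           # decode tokens, as in q-33-01

no_cache = positions_without_cache(PROMPT_LEN, K)
cached = positions_with_cache(PROMPT_LEN, K)
print(f"without cache: {no_cache} positions, with cache: {cached} positions")
```

Without the cache the count grows quadratically in K, while the cached count grows linearly.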
q-33-04 — Which health-check endpoint should the load balancer poll?
- /healthz (liveness)
- /readyz (readiness) ← correct
- /metrics
- /correct
Why: /readyz signals "ready to take traffic" and returns 503 under backpressure so the LB shifts traffic to other replicas. /healthz is for orchestrator-level restart decisions.
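The split between the two probes can be sketched as a pure function that maps a probe path and server state to a status code. The model_loaded flag, the queue-depth threshold, and the function name are assumptions made for illustration; the page itself only states that /readyz returns 503 under backpressure.

```python
from http import HTTPStatus


def probe_status(path: str, model_loaded: bool, queue_depth: int, max_queue: int) -> HTTPStatus:
    """Return the status a probe would see for the given (hypothetical) server state."""
    if path == "/healthz":
        # Liveness: the process is up and answering, regardless of load.
        return HTTPStatus.OK
    if path == "/readyz":
        # Readiness: only accept traffic when the model is loaded and the queue has room.
        if model_loaded and queue_depth < max_queue:
            return HTTPStatus.OK
        return HTTPStatus.SERVICE_UNAVAILABLE
    return HTTPStatus.NOT_FOUND


# A saturated replica stays alive but reports not ready, so the LB routes elsewhere.
print(probe_status("/healthz", True, 8, 8), probe_status("/readyz", True, 8, 8))
```

Keeping liveness independent of load avoids restarting a replica that is merely busy.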
See theory/05-latency-budget-i5-8250u.md and the break/ exercises for the practical grounding.