
Lab 01 — Sync vs async: the blocking-handler pitfall

Load the lab 00 service with 50 concurrent clients and measure latencies. Change def to async def without offloading and measure again: catastrophe. Apply to_thread and re-measure: fixed.

Objective

Demonstrate the async def + blocking-call pitfall described in theory/01-async-and-the-event-loop.md. Load-test three handler variants and produce a side-by-side latency CDF.

Setup

Lab 00's src/miniserve/app.py.
uv add httpx (or aiohttp) for the load generator.
A list of 100 verb-correction prompts from the §A13 corpus.

Tasks

Write the load generator at scripts/loadtest.py:

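import asyncio
import time

import httpx


async def one_request(client, payload, results):
    # Time a single POST and record (latency in seconds, status code).
    t0 = time.perf_counter()
    try:
        r = await client.post("/correct", json=payload)
        status = r.status_code
    except httpx.HTTPError:
        # Timeouts and connection errors are recorded as status 0 so that
        # one failed request does not abort the whole gather().
        status = 0
    results.append((time.perf_counter() - t0, status))


async def run_load(concurrency: int, total: int, payloads: list[dict]):
    results = []
    async with httpx.AsyncClient(base_url="http://127.0.0.1:8000", timeout=30) as c:
        # The semaphore caps the number of requests in flight at `concurrency`.
        sem = asyncio.Semaphore(concurrency)

        async def bounded(p):
            async with sem:
                await one_request(c, p, results)

        # Cycle through the payload list until `total` requests have been sent.
        await asyncio.gather(*(bounded(payloads[i % len(payloads)]) for i in range(total)))
    return results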

Variant A — sync handler (lab 00 baseline). Keep def correct(req): .... Start server with uv run uvicorn miniserve.app:app --workers 1. Run loadtest with concurrency=50, total=200.
Variant B — async handler with blocking call. Change to:

@app.post("/correct")
async def correct(req: CorrectRequest) -> CorrectResponse:
    result = agent.correct(req.sentence, learner_id=req.learner_id)  # blocking!
    return ...

Restart the server and re-run the load test. Expect a catastrophe: p95 should be roughly 5-10× worse than in variant A.

Variant C — async handler with to_thread. Change to:

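import functools

import anyio

@app.post("/correct")
async def correct(req: CorrectRequest) -> CorrectResponse:
    # run_sync forwards positional arguments only, so bind learner_id as a
    # keyword with functools.partial to match the call used in variant B.
    result = await anyio.to_thread.run_sync(
        functools.partial(agent.correct, req.sentence, learner_id=req.learner_id)
    )
    return ...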

Restart server. Re-run loadtest. Expect recovery — similar to variant A.

Plot a latency CDF with all three variants on the same axes (scripts/plot_cdf.py). x-axis: latency (ms), log scale; y-axis: cumulative fraction.

Annotate p50, p95, p99 on each curve.
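The CDF and the percentile annotations can both be derived from the raw latency list. The following is a minimal sketch using only the standard library; the function names `ecdf` and `percentile` are illustrative, and the nearest-rank percentile definition is an assumption (other definitions interpolate and give slightly different values).

```python
import math


def ecdf(latencies_s):
    """Return (sorted latencies in ms, cumulative fractions) for a CDF plot."""
    xs = sorted(latency * 1000.0 for latency in latencies_s)
    n = len(xs)
    # The i-th smallest sample (1-based) sits at cumulative fraction i / n.
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


def percentile(latencies_s, q):
    """Nearest-rank percentile in ms, with q in (0, 100]."""
    xs = sorted(latency * 1000.0 for latency in latencies_s)
    # Multiply before dividing to keep the rank exact for integer q.
    rank = max(1, math.ceil(q * len(xs) / 100))
    return xs[rank - 1]
```

The `xs` and `ys` lists can be handed to any plotting library as a step curve with a logarithmic x-axis, and `percentile(latencies, 95)` gives the x-position for the p95 annotation.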

Write a short note (5-10 lines, in lab notes) explaining:
Why variant B is so much worse than A and C.
Why A and C are roughly equivalent.
When you'd prefer C over A (hint: when the handler also does await on async I/O for other reasons — e.g., logging to a remote sink, calling a database).
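The effect the note should explain can be reproduced without a server. The sketch below is a toy model, not the lab's handler: `blocking_work` is a hypothetical stand-in for `agent.correct`, the 50 ms duration is an assumed value, and the standard library's `asyncio.to_thread` is used in place of `anyio.to_thread.run_sync`.

```python
import asyncio
import time

WORK_S = 0.05  # assumed duration of the blocking call


def blocking_work():
    time.sleep(WORK_S)  # blocks whichever thread runs it


async def handler_blocking():
    # Runs on the event-loop thread, so no other coroutine can progress.
    blocking_work()


async def handler_offloaded():
    # Runs in a worker thread; the event loop stays free to start other calls.
    await asyncio.to_thread(blocking_work)


async def time_concurrent(handler, n):
    t0 = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - t0


def compare(n=5):
    """Return (seconds with blocking handlers, seconds with offloaded handlers)."""
    blocked = asyncio.run(time_concurrent(handler_blocking, n))
    offloaded = asyncio.run(time_concurrent(handler_offloaded, n))
    return blocked, offloaded
```

With `n` concurrent calls, the blocking version takes about `n * WORK_S` in total because the calls run one after another, while the offloaded version overlaps them across worker threads.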

Measurements

Save to experiments/<date>-phase-33-lab-01/:

latencies_sync.json, latencies_async_blocking.json, latencies_async_tothread.json — arrays of [latency in seconds, status_code] pairs.
latency_cdf.png — the side-by-side CDF.
summary.md — your written observations.
manifest.json — seeds, versions, concurrency, total requests.
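One way to write these files is sketched below. The helper names `save_run` and `build_manifest` and the manifest keys are illustrative assumptions; extend the manifest with whatever package versions the run depends on.

```python
import json
import platform
from pathlib import Path


def build_manifest(concurrency, total, seed):
    """Collect the run parameters listed above (key names are assumptions)."""
    return {
        "seed": seed,
        "python": platform.python_version(),
        "concurrency": concurrency,
        "total_requests": total,
    }


def save_run(out_dir, name, results, manifest):
    """Write latencies_<name>.json and manifest.json into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # JSON has no tuple type, so each (latency, status_code) pair becomes a list.
    pairs = [list(r) for r in results]
    (out / f"latencies_{name}.json").write_text(json.dumps(pairs))
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))
```

Calling `save_run(out_dir, "sync", results, build_manifest(50, 200, seed=0))` once per variant, with `name` set to `sync`, `async_blocking`, and `async_tothread`, produces the file names listed above.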

Acceptance

All three variants achieve ≥ 99% HTTP 200 under the load test (no timeouts).
Variant B's p95 is at least 3× worse than variants A and C.
Variants A and C have p95 within 20% of each other.
The CDF plot clearly shows three distinct curves.
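The three numeric criteria can be checked mechanically. This is a sketch under stated assumptions: p95 uses the nearest-rank definition, "within 20%" is interpreted relative to the smaller of the two p95 values, and "3× worse than A and C" is interpreted as 3× the larger of the two. The fourth criterion is visual and is not checked. Each input is a list of (latency in seconds, status_code) pairs.

```python
import math


def p95_ms(results):
    """Nearest-rank p95 latency in milliseconds."""
    lat = sorted(latency * 1000.0 for latency, _ in results)
    return lat[max(1, math.ceil(95 * len(lat) / 100)) - 1]


def ok_rate(results):
    """Fraction of requests that returned HTTP 200."""
    return sum(1 for _, status in results if status == 200) / len(results)


def check_acceptance(sync, blocking, tothread):
    a, b, c = p95_ms(sync), p95_ms(blocking), p95_ms(tothread)
    return {
        "ok_rate": all(ok_rate(r) >= 0.99 for r in (sync, blocking, tothread)),
        "b_3x_worse": b >= 3 * max(a, c),
        "a_c_within_20pct": abs(a - c) <= 0.20 * min(a, c),
    }
```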

Pitfalls

Forgetting --workers 1. With multiple uvicorn workers, the blocking-handler issue is partially masked (each worker has its own event loop). We're studying the per-process behavior; pin workers to 1.
Cold start contaminating the measurement. Send 10 warmup requests before recording. The first request is always slow (lazy imports, JIT, cache misses).
Network overhead. Run the loadtest on 127.0.0.1 to avoid network jitter. We're measuring server behavior, not TCP.
httpx default timeout = 5s — too short. Set to 30s, otherwise variant B will report timeout errors that look like server errors.
Not enough samples. With total=200 and concurrency=50, the run amounts to only about 4 waves of 50 requests, and p99 is determined by the 2 slowest samples. For p99 stability, push total to 500 or more.

Stretch

Repeat the experiment with --workers 4. How does the picture change?
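Add a time.sleep(0.05) inside the handler (to simulate an additional blocking I/O delay) and re-run. The hypothesis to test is that the sync threadpool variant degrades more than async + to_thread because the threadpool size is bounded. Check it against your measurements: in FastAPI, sync handlers and anyio.to_thread.run_sync both draw on the same anyio worker-thread limiter (40 threads by default), and a time.sleep placed directly in the async def body, rather than inside the offloaded function, blocks the event loop exactly as in variant B.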
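The queueing effect of a bounded thread pool can be illustrated in isolation. The sketch below uses the standard library's `ThreadPoolExecutor` as a stand-in; the server's worker threads are managed by anyio, not by this class, and `simulated_handler` and `drain_time` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY_S = 0.05  # same delay as the time.sleep(0.05) in the stretch task


def simulated_handler():
    time.sleep(DELAY_S)


def drain_time(n_requests, pool_size):
    """Seconds needed to finish n_requests blocking calls on pool_size threads."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(simulated_handler) for _ in range(n_requests)]
        for f in futures:
            f.result()  # wait for completion and surface any exception
    return time.perf_counter() - t0
```

In this toy model, `drain_time(50, 40)` needs roughly two delays rather than one, because the last 10 calls wait for a free thread. Any handler whose blocking work runs in a pool of that size sees the same queueing.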

Next: 02-static-batching.md