Lab 01 — Sync vs async: the blocking-handler pitfall

Summary: load the lab 00 service with 50 concurrent clients and measure latencies. Change def to async def without offloading and measure again: catastrophe. Apply to_thread and re-measure: fixed.

Objective

Demonstrate the async def + blocking-call pitfall described in theory/01-async-and-the-event-loop.md. Load-test three handler variants and produce a side-by-side latency CDF.

Setup

  • Lab 00's src/miniserve/app.py.
  • uv add httpx (or aiohttp) for the load generator.
  • A list of 100 verb-correction prompts from the §A13 corpus.

Tasks

  1. Write the load generator at scripts/loadtest.py:
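import asyncio
import time

import httpx

async def one_request(client, payload, results):
    """Send one POST and record (latency_seconds, status_code)."""
    t0 = time.perf_counter()
    try:
        r = await client.post("/correct", json=payload)
        status = r.status_code
    except httpx.HTTPError:
        # Timeout or connection failure. Recording 0 is a convention chosen
        # here so that one failed request cannot abort the whole run.
        status = 0
    results.append((time.perf_counter() - t0, status))

async def run_load(concurrency: int, total: int, payloads: list[dict]):
    results = []
    async with httpx.AsyncClient(base_url="http://127.0.0.1:8000", timeout=30) as c:
        sem = asyncio.Semaphore(concurrency)  # caps the number of in-flight requests

        async def bounded(p):
            async with sem:
                await one_request(c, p, results)

        # Cycle through the payloads until `total` requests have been sent.
        await asyncio.gather(*(bounded(payloads[i % len(payloads)]) for i in range(total)))
    return results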
  2. Variant A: sync handler (lab 00 baseline). Keep def correct(req): ... unchanged. Start the server with uv run uvicorn miniserve.app:app --workers 1. Run the load test with concurrency=50, total=200.

  3. Variant B: async handler with a blocking call. Change to:

@app.post("/correct")
async def correct(req: CorrectRequest) -> CorrectResponse:
    result = agent.correct(req.sentence, learner_id=req.learner_id)  # blocking!
    return ...

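Restart the server and re-run the load test. Expect a catastrophe: p95 should be roughly 5-10× worse than variant A, because the blocking call now runs on the event loop thread and requests are served one at a time.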

  4. Variant C: async handler with to_thread. Change to:
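from functools import partial

import anyio

@app.post("/correct")
async def correct(req: CorrectRequest) -> CorrectResponse:
    # run_sync forwards positional arguments only, so learner_id is bound
    # by keyword with partial, matching the call in variant B.
    result = await anyio.to_thread.run_sync(
        partial(agent.correct, req.sentence, learner_id=req.learner_id)
    )
    return ...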

Restart the server and re-run the load test. Expect latencies to recover to roughly variant A's level.

  5. Plot a latency CDF with all three variants on the same axes (scripts/plot_cdf.py). x-axis: latency (ms), log scale; y-axis: cumulative fraction.

Annotate p50, p95, p99 on each curve.

  6. Write a short note (5-10 lines, in your lab notes) explaining:
       • Why variant B is so much worse than A and C.
       • Why A and C are roughly equivalent.
       • When you'd prefer C over A (hint: when the handler also awaits async I/O for other reasons, e.g., logging to a remote sink or calling a database).
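The first two questions in the note can be explored without the server at all. The following is a minimal sketch, not the lab's handler: it uses the standard library's asyncio.to_thread in place of anyio.to_thread.run_sync, and blocking_work with its 50 ms duration is a made-up stand-in for agent.correct.

```python
import asyncio
import time

WORK_SECONDS = 0.05  # assumed duration of the blocking call (illustrative only)

def blocking_work() -> None:
    """Stand-in for agent.correct: blocks whichever thread calls it."""
    time.sleep(WORK_SECONDS)

async def handler_blocking() -> None:
    # Variant B: the blocking call runs on the event-loop thread, so no other
    # coroutine can make progress until it returns.
    blocking_work()

async def handler_offloaded() -> None:
    # Variant C: the blocking call runs on a worker thread while the event
    # loop stays free to start other requests.
    await asyncio.to_thread(blocking_work)

async def timed(handler, n: int) -> float:
    """Run n handler calls concurrently and return the wall-clock time."""
    t0 = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - t0

async def main() -> tuple[float, float]:
    serial = await timed(handler_blocking, 8)
    offloaded = await timed(handler_offloaded, 8)
    return serial, offloaded

if __name__ == "__main__":
    serial, offloaded = asyncio.run(main())
    print(f"blocking in async def: {serial:.2f}s, offloaded: {offloaded:.2f}s")
```

With the blocking variant, the eight calls run back to back, so the total is about eight times the single-call duration. With offloading, the calls overlap up to the size of the worker pool.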

Measurements

Save to experiments/<date>-phase-33-lab-01/:

  • latencies_sync.json, latencies_async_blocking.json, latencies_async_tothread.json: arrays of [latency_seconds, status_code] pairs, one per request.
  • latency_cdf.png — the side-by-side CDF.
  • summary.md — your written observations.
  • manifest.json — seeds, versions, concurrency, total requests.
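One possible way to turn the saved latency files into the CDF and its p50/p95/p99 annotations is sketched below. It assumes each file holds [latency_seconds, status_code] pairs as produced by the load generator; the function names are illustrative, the percentile uses the nearest-rank definition (other definitions give slightly different values), and the plotting helper assumes matplotlib is installed.

```python
import json
import math
from pathlib import Path

def load_latencies_ms(path: Path) -> list[float]:
    """Read [latency_seconds, status_code] pairs; return sorted latencies in ms."""
    pairs = json.loads(path.read_text())
    # All requests are kept here; filtering out non-200 responses is a choice
    # you may want to make and document in summary.md.
    return sorted(latency * 1000.0 for latency, _status in pairs)

def ecdf(sorted_ms: list[float]) -> list[tuple[float, float]]:
    """Empirical CDF points: (latency_ms, fraction of samples <= latency_ms)."""
    n = len(sorted_ms)
    return [(x, (i + 1) / n) for i, x in enumerate(sorted_ms)]

def percentile(sorted_ms: list[float], q: float) -> float:
    """Nearest-rank percentile of an already sorted list, q in (0, 100]."""
    rank = max(1, math.ceil(q / 100 * len(sorted_ms)))
    return sorted_ms[rank - 1]

def plot_cdfs(curves: dict[str, list[float]], out_path: str) -> None:
    """Draw one step curve per variant on a log-scaled latency axis."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, sorted_ms in curves.items():
        xs, ys = zip(*ecdf(sorted_ms))
        ax.step(xs, ys, where="post", label=label)
    ax.set_xscale("log")
    ax.set_xlabel("latency (ms)")
    ax.set_ylabel("cumulative fraction")
    ax.legend()
    fig.savefig(out_path)
```

Calling percentile(latencies, 95) on each variant's sorted list gives the values to annotate on the curves.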

Acceptance

  • All three variants achieve ≥ 99% HTTP 200 under the load test (no timeouts).
  • Variant B's p95 is at least 3× worse than variants A and C.
  • Variants A and C have p95 within 20% of each other.
  • The CDF plot clearly shows three distinct curves.
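The numeric criteria above can be checked mechanically. This is a sketch under two assumptions that the lab text does not pin down: p95 is computed with the nearest-rank method, and "within 20%" is measured relative to the smaller of the two p95 values. The function names are illustrative.

```python
import math

def p95(latencies: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    s = sorted(latencies)
    return s[max(1, math.ceil(0.95 * len(s))) - 1]

def success_rate(results: list[tuple[float, int]]) -> float:
    """Fraction of (latency, status_code) results with HTTP 200."""
    return sum(1 for _lat, status in results if status == 200) / len(results)

def check_acceptance(a, b, c) -> dict[str, bool]:
    """a, b, c are the (latency, status_code) lists for variants A, B, C."""
    p = {name: p95([lat for lat, _status in res])
         for name, res in (("A", a), ("B", b), ("C", c))}
    return {
        "success_rate_ok": all(success_rate(r) >= 0.99 for r in (a, b, c)),
        "b_at_least_3x_worse": p["B"] >= 3 * max(p["A"], p["C"]),
        # Assumption: 20% is taken relative to the smaller p95.
        "a_c_within_20pct": abs(p["A"] - p["C"]) <= 0.20 * min(p["A"], p["C"]),
    }
```

The last criterion (three visually distinct curves) still needs a look at the plot itself.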

Pitfalls

  • Forgetting --workers 1. With multiple uvicorn workers, the blocking-handler issue is partially masked (each worker has its own event loop). We're studying the per-process behavior; pin workers to 1.
  • Cold start contaminating the measurement. Send 10 warmup requests before recording and discard their timings. The first requests are typically slow (lazy imports, cold caches, model or JIT warm-up where applicable).
  • Network overhead. Run the loadtest on 127.0.0.1 to avoid network jitter. We're measuring server behavior, not TCP.
  • httpx's default timeout is 5 s, which is too short here. Set it to 30 s; otherwise variant B's slow responses surface as client-side timeouts that look like server errors.
  • Not enough samples. With total=200 and concurrency=50, each client slot sends only about 4 requests, and p99 rests on the 2 slowest samples. For p99 stability, push total to 500+.
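The warmup and sample-size advice can be combined in one helper. The sketch below is written against a hypothetical send coroutine rather than httpx directly; in this lab it would wrap something like client.post("/correct", json=payload). The name warm_then_measure and its defaults are illustrative.

```python
import asyncio
import time

async def warm_then_measure(send, payloads, warmup=10, total=500, concurrency=50):
    """Send `warmup` unrecorded requests, then time `total` bounded-concurrency ones.

    `send` is any coroutine function that takes one payload (hypothetical
    stand-in for the real HTTP call).
    """
    # Warmup phase: sequential, timings deliberately thrown away.
    for i in range(warmup):
        await send(payloads[i % len(payloads)])

    sem = asyncio.Semaphore(concurrency)
    latencies: list[float] = []

    async def bounded(p):
        async with sem:
            t0 = time.perf_counter()
            await send(p)
            latencies.append(time.perf_counter() - t0)

    # Measured phase: same cycling scheme as the load generator.
    await asyncio.gather(*(bounded(payloads[i % len(payloads)]) for i in range(total)))
    return latencies
```

Keeping the warmup sequential avoids piling 50 concurrent requests onto a cold server, which would itself distort the first measured samples.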

Stretch

  • Repeat the experiment with --workers 4. How does the picture change?
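  • Add a 50 ms delay inside the handler to simulate an additional I/O wait and re-run: time.sleep(0.05) in the sync variant, await asyncio.sleep(0.05) in the async + to_thread variant (a time.sleep there would block the event loop again, as in variant B). The sync variant should degrade more, because its delay holds a slot in the bounded threadpool for the whole wait while the awaited delay does not.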
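The effect of a bounded threadpool can be seen in isolation with the standard library. This sketch is not the server's pool: it uses concurrent.futures.ThreadPoolExecutor with deliberately small, made-up sizes so that the queueing is visible with only 8 tasks.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_batch(n_tasks: int, pool_size: int, hold_seconds: float = 0.05) -> float:
    """Submit n_tasks sleeps to a pool of pool_size threads; return wall-clock time."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(time.sleep, hold_seconds) for _ in range(n_tasks)]
        for f in futures:
            f.result()  # wait for completion and surface any exception
    return time.perf_counter() - t0

if __name__ == "__main__":
    # 8 tasks on 4 threads need two rounds; 8 tasks on 8 threads need one.
    print(f"pool=4: {run_batch(8, 4):.3f}s  pool=8: {run_batch(8, 8):.3f}s")
```

Once the number of concurrent requests exceeds the pool size, every extra millisecond a handler spends holding a thread turns into queueing delay for the requests behind it.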

Next: 02-static-batching.md