English · Español

Lab 02 — Distributed tracing through the request path¶

Goal: a complete OTel span tree for a single LLM request, visible in Tempo via Grafana.

Estimated time: 2-3 hours.

Prereq: Lab 01 metrics flowing; OTel-Collector running on localhost:4317 from Lab 00.

What you produce¶

src/observability/tracing.py — OTel tracer provider setup + span attribute conventions + a trace_request decorator/context manager.
src/observability/logging.py — structlog configuration that injects trace_id + span_id into every log line.
Phase 33 server modified to start a root span per request and child spans for tokenize / retrieve / prefill / decode / format.
A single sample trace JSON exported from Tempo, committed to experiments/34-traces/sample-trace.json.

The span tree contract¶

Every LLM-generation request emits exactly this hierarchy (extra child spans below llm.decode are allowed if you implement per-step sampling):

http.handler
├── tokenize.input
├── rag.retrieve         (optional — only if request triggered RAG)
│   ├── embed.query
│   ├── vector.search
│   └── rerank
├── llm.prefill
├── llm.decode
└── format.output

Span attributes contract (set these consistently, the dashboard depends on them):

Span	Attributes
`http.handler`	`http.method`, `http.route`, `http.status_code`, `llm.model`, `llm.request.id`
`tokenize.input`	`llm.tokens.in`, `tokenizer.name`
`rag.retrieve`	`rag.docs.retrieved`, `rag.docs.reranked`, `rag.index.name`
`llm.prefill`	`llm.tokens.in`, `llm.cache.hits`, `llm.cache.misses`
`llm.decode`	`llm.tokens.out`, `llm.decode.steps`, `llm.decode.mean_step_ms`, `llm.stop_reason`
`format.output`	`llm.tokens.out`, `output.format`

TODOs¶

Block A — tracer provider setup (`tracing.py`)¶

Use opentelemetry-sdk + opentelemetry-exporter-otlp-proto-grpc.
Configure a TracerProvider with:
Resource attributes: service.name=miniserve, service.version=<git-sha>, deployment.environment=dev.
Span processor: BatchSpanProcessor exporting to OTLPSpanExporter(endpoint="localhost:4317", insecure=True).
Set as the global provider.
Export tracer = trace.get_tracer("lynx-cortex.miniserve").

Block B — instrument the server's request handler¶

Wrap the top-level handler in with tracer.start_as_current_span("http.handler") as root_span: and set the attributes from the contract.
At each pipeline stage, open a child span: with tracer.start_as_current_span("tokenize.input") as span: ....
Set per-span attributes from the contract.
On exception, span.record_exception(e); span.set_status(StatusCode.ERROR).

Block C — context propagation across asyncio¶

The hard part. The Phase 33 streaming generator uses asyncio.create_task to fan out work. Bare create_task does NOT propagate contextvars correctly for OTel. Either:
Use loop.create_task with context=contextvars.copy_context() (Python 3.11+), OR
Use opentelemetry.context.attach() / detach() manually inside the task.
Write a small helper create_task_with_context(coro) that does this. Use it everywhere.

Block D — structured logging with trace context (`logging.py`)¶

Configure structlog with the processor chain from theory file 03:

import structlog
from opentelemetry import trace

def add_trace_context(logger, method_name, event_dict):
    span = trace.get_current_span()
    if span and span.get_span_context().is_valid:
        ctx = span.get_span_context()
        event_dict["trace_id"] = f"{ctx.trace_id:032x}"
        event_dict["span_id"] = f"{ctx.span_id:016x}"
    return event_dict

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        add_trace_context,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer(),
    ],
)

Replace all existing logging.info(...) and print(...) calls in src/miniserve/ with structlog's log.info("event_name", k=v, ...).
Log redactor: drop any key whose name matches prompt|completion|messages|text (regex). Default-deny.

Block E — verify in Tempo¶

Run the server. Send a single curl /v1/completions request.
In Grafana → Explore → Tempo. Search for traces from service.name=miniserve.
Open the most recent trace. Verify the span tree matches the contract above.
Click a span. Verify the attributes are set.
Export the trace as JSON (Tempo UI button) and commit to experiments/34-traces/sample-trace.json.

Block F — verify log/trace join¶

Send another request. Capture its trace_id from the response header (add an X-Trace-Id response header in the middleware).
grep <trace_id> logs/miniserve.log (or stdout — wherever you're writing logs).
Should see 5-10 log lines all carrying the same trace_id. They tell the request's story.

Constraints¶

Don't emit per-step decode spans by default. Aggregate via attributes on llm.decode. Per-step is the optional extension.
Don't log prompt/completion bodies. Lab 03's cost tracker observes token counts, not content.
No logging.basicConfig. Use structlog's configuration exclusively. Mixing the two is the most common observability bug.

Stop conditions¶

Done when:

A single request produces a 5-7-span trace tree visible in Tempo.
Each span has the attributes from the contract above.
Every log line emitted during a request carries the same trace_id and span_id.
experiments/34-traces/sample-trace.json committed.
The Phase 37 prompt-injection lab can later filter traces by service.name="miniserve" AND llm.tokens.out=0 to find anomalies (forward compat check — confirm the schema supports the query).

Pitfalls (read before debugging)¶

Orphan spans in Tempo. Symptom: child spans show as roots. Cause: context propagation broken across create_task or thread executor. Apply the Block C fix.
No spans in Tempo at all. Check docker compose logs otel-collector — common issue is the SDK failing to connect to localhost:4317 (firewall, wrong port). Try OTEL_EXPORTER_OTLP_INSECURE=true env var.
Span attributes truncated. OTel SDK has a default 128-char limit on string attributes. Increase via OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT=2048 if needed, but ideally don't put large strings in attributes — link to a separate store.
Log lines without trace_id. The log was emitted outside an active span (e.g., during app startup). That's OK; just confirm the request-lifecycle logs all have it.

When to consult `solutions/`¶

After all five Stop conditions pass. Solution at solutions/02-tracing-ref.md.

Next lab: lab/03-cost-and-loadtest.md.