English · Español
Lab 02 — Distributed tracing through the request path¶
Goal: a complete OTel span tree for a single LLM request, visible in Tempo via Grafana.
Estimated time: 2-3 hours.
Prereq: Lab 01 metrics flowing; OTel-Collector running on
localhost:4317from Lab 00.
What you produce¶
src/observability/tracing.py— OTel tracer provider setup + span attribute conventions + atrace_requestdecorator/context manager.src/observability/logging.py—structlogconfiguration that injectstrace_id+span_idinto every log line.- Phase 33 server modified to start a root span per request and child spans for tokenize / retrieve / prefill / decode / format.
- A single sample trace JSON exported from Tempo, committed to
experiments/34-traces/sample-trace.json.
The span tree contract¶
Every LLM-generation request emits exactly this hierarchy (extra child spans below llm.decode are allowed if you implement per-step sampling):
http.handler
├── tokenize.input
├── rag.retrieve (optional — only if request triggered RAG)
│ ├── embed.query
│ ├── vector.search
│ └── rerank
├── llm.prefill
├── llm.decode
└── format.output
Span attributes contract (set these consistently, the dashboard depends on them):
| Span | Attributes |
|---|---|
http.handler |
http.method, http.route, http.status_code, llm.model, llm.request.id |
tokenize.input |
llm.tokens.in, tokenizer.name |
rag.retrieve |
rag.docs.retrieved, rag.docs.reranked, rag.index.name |
llm.prefill |
llm.tokens.in, llm.cache.hits, llm.cache.misses |
llm.decode |
llm.tokens.out, llm.decode.steps, llm.decode.mean_step_ms, llm.stop_reason |
format.output |
llm.tokens.out, output.format |
TODOs¶
Block A — tracer provider setup (tracing.py)¶
- Use
opentelemetry-sdk+opentelemetry-exporter-otlp-proto-grpc. - Configure a
TracerProviderwith: - Resource attributes:
service.name=miniserve,service.version=<git-sha>,deployment.environment=dev. - Span processor:
BatchSpanProcessorexporting toOTLPSpanExporter(endpoint="localhost:4317", insecure=True). - Set as the global provider.
- Export
tracer = trace.get_tracer("lynx-cortex.miniserve").
Block B — instrument the server's request handler¶
- Wrap the top-level handler in
with tracer.start_as_current_span("http.handler") as root_span:and set the attributes from the contract. - At each pipeline stage, open a child span:
with tracer.start_as_current_span("tokenize.input") as span: .... - Set per-span attributes from the contract.
- On exception,
span.record_exception(e); span.set_status(StatusCode.ERROR).
Block C — context propagation across asyncio¶
- The hard part. The Phase 33 streaming generator uses
asyncio.create_taskto fan out work. Barecreate_taskdoes NOT propagatecontextvarscorrectly for OTel. Either: - Use
loop.create_taskwithcontext=contextvars.copy_context()(Python 3.11+), OR - Use
opentelemetry.context.attach()/detach()manually inside the task. - Write a small helper
create_task_with_context(coro)that does this. Use it everywhere.
Block D — structured logging with trace context (logging.py)¶
- Configure
structlogwith the processor chain from theory file 03:
import structlog
from opentelemetry import trace
def add_trace_context(logger, method_name, event_dict):
span = trace.get_current_span()
if span and span.get_span_context().is_valid:
ctx = span.get_span_context()
event_dict["trace_id"] = f"{ctx.trace_id:032x}"
event_dict["span_id"] = f"{ctx.span_id:016x}"
return event_dict
structlog.configure(
processors=[
structlog.contextvars.merge_contextvars,
add_trace_context,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.add_log_level,
structlog.processors.JSONRenderer(),
],
)
- Replace all existing
logging.info(...)andprint(...)calls insrc/miniserve/withstructlog'slog.info("event_name", k=v, ...). - Log redactor: drop any key whose name matches
prompt|completion|messages|text(regex). Default-deny.
Block E — verify in Tempo¶
- Run the server. Send a single
curl /v1/completionsrequest. - In Grafana → Explore → Tempo. Search for traces from
service.name=miniserve. - Open the most recent trace. Verify the span tree matches the contract above.
- Click a span. Verify the attributes are set.
- Export the trace as JSON (Tempo UI button) and commit to
experiments/34-traces/sample-trace.json.
Block F — verify log/trace join¶
- Send another request. Capture its
trace_idfrom the response header (add anX-Trace-Idresponse header in the middleware). -
grep <trace_id> logs/miniserve.log(or stdout — wherever you're writing logs). - Should see 5-10 log lines all carrying the same
trace_id. They tell the request's story.
Constraints¶
- Don't emit per-step decode spans by default. Aggregate via attributes on
llm.decode. Per-step is the optional extension. - Don't log prompt/completion bodies. Lab 03's cost tracker observes token counts, not content.
- No
logging.basicConfig. Usestructlog's configuration exclusively. Mixing the two is the most common observability bug.
Stop conditions¶
Done when:
- A single request produces a 5-7-span trace tree visible in Tempo.
- Each span has the attributes from the contract above.
- Every log line emitted during a request carries the same
trace_idandspan_id. experiments/34-traces/sample-trace.jsoncommitted.- The Phase 37 prompt-injection lab can later filter traces by
service.name="miniserve"ANDllm.tokens.out=0to find anomalies (forward compat check — confirm the schema supports the query).
Pitfalls (read before debugging)¶
- Orphan spans in Tempo. Symptom: child spans show as roots. Cause: context propagation broken across
create_taskor thread executor. Apply the Block C fix. - No spans in Tempo at all. Check
docker compose logs otel-collector— common issue is the SDK failing to connect tolocalhost:4317(firewall, wrong port). TryOTEL_EXPORTER_OTLP_INSECURE=trueenv var. - Span attributes truncated. OTel SDK has a default 128-char limit on string attributes. Increase via
OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT=2048if needed, but ideally don't put large strings in attributes — link to a separate store. - Log lines without
trace_id. The log was emitted outside an active span (e.g., during app startup). That's OK; just confirm the request-lifecycle logs all have it.
When to consult solutions/¶
After all five Stop conditions pass. Solution at solutions/02-tracing-ref.md.
Next lab: lab/03-cost-and-loadtest.md.