Skip to content

English · Español

Lab 02 — Distributed tracing through the request path

Goal: a complete OTel span tree for a single LLM request, visible in Tempo via Grafana.

Estimated time: 2-3 hours.

Prereq: Lab 01 metrics flowing; OTel-Collector running on localhost:4317 from Lab 00.


What you produce

  • src/observability/tracing.py — OTel tracer provider setup + span attribute conventions + a trace_request decorator/context manager.
  • src/observability/logging.pystructlog configuration that injects trace_id + span_id into every log line.
  • Phase 33 server modified to start a root span per request and child spans for tokenize / retrieve / prefill / decode / format.
  • A single sample trace JSON exported from Tempo, committed to experiments/34-traces/sample-trace.json.

The span tree contract

Every LLM-generation request emits exactly this hierarchy (extra child spans below llm.decode are allowed if you implement per-step sampling):

http.handler
├── tokenize.input
├── rag.retrieve         (optional — only if request triggered RAG)
│   ├── embed.query
│   ├── vector.search
│   └── rerank
├── llm.prefill
├── llm.decode
└── format.output

Span attributes contract (set these consistently, the dashboard depends on them):

Span Attributes
http.handler http.method, http.route, http.status_code, llm.model, llm.request.id
tokenize.input llm.tokens.in, tokenizer.name
rag.retrieve rag.docs.retrieved, rag.docs.reranked, rag.index.name
llm.prefill llm.tokens.in, llm.cache.hits, llm.cache.misses
llm.decode llm.tokens.out, llm.decode.steps, llm.decode.mean_step_ms, llm.stop_reason
format.output llm.tokens.out, output.format

TODOs

Block A — tracer provider setup (tracing.py)

  • Use opentelemetry-sdk + opentelemetry-exporter-otlp-proto-grpc.
  • Configure a TracerProvider with:
  • Resource attributes: service.name=miniserve, service.version=<git-sha>, deployment.environment=dev.
  • Span processor: BatchSpanProcessor exporting to OTLPSpanExporter(endpoint="localhost:4317", insecure=True).
  • Set as the global provider.
  • Export tracer = trace.get_tracer("lynx-cortex.miniserve").

Block B — instrument the server's request handler

  • Wrap the top-level handler in with tracer.start_as_current_span("http.handler") as root_span: and set the attributes from the contract.
  • At each pipeline stage, open a child span: with tracer.start_as_current_span("tokenize.input") as span: ....
  • Set per-span attributes from the contract.
  • On exception, span.record_exception(e); span.set_status(StatusCode.ERROR).

Block C — context propagation across asyncio

  • The hard part. The Phase 33 streaming generator uses asyncio.create_task to fan out work. Bare create_task does NOT propagate contextvars correctly for OTel. Either:
  • Use loop.create_task with context=contextvars.copy_context() (Python 3.11+), OR
  • Use opentelemetry.context.attach() / detach() manually inside the task.
  • Write a small helper create_task_with_context(coro) that does this. Use it everywhere.

Block D — structured logging with trace context (logging.py)

  • Configure structlog with the processor chain from theory file 03:
import structlog
from opentelemetry import trace

def add_trace_context(logger, method_name, event_dict):
    span = trace.get_current_span()
    if span and span.get_span_context().is_valid:
        ctx = span.get_span_context()
        event_dict["trace_id"] = f"{ctx.trace_id:032x}"
        event_dict["span_id"] = f"{ctx.span_id:016x}"
    return event_dict

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        add_trace_context,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer(),
    ],
)
  • Replace all existing logging.info(...) and print(...) calls in src/miniserve/ with structlog's log.info("event_name", k=v, ...).
  • Log redactor: drop any key whose name matches prompt|completion|messages|text (regex). Default-deny.

Block E — verify in Tempo

  • Run the server. Send a single curl /v1/completions request.
  • In Grafana → Explore → Tempo. Search for traces from service.name=miniserve.
  • Open the most recent trace. Verify the span tree matches the contract above.
  • Click a span. Verify the attributes are set.
  • Export the trace as JSON (Tempo UI button) and commit to experiments/34-traces/sample-trace.json.

Block F — verify log/trace join

  • Send another request. Capture its trace_id from the response header (add an X-Trace-Id response header in the middleware).
  • grep <trace_id> logs/miniserve.log (or stdout — wherever you're writing logs).
  • Should see 5-10 log lines all carrying the same trace_id. They tell the request's story.

Constraints

  • Don't emit per-step decode spans by default. Aggregate via attributes on llm.decode. Per-step is the optional extension.
  • Don't log prompt/completion bodies. Lab 03's cost tracker observes token counts, not content.
  • No logging.basicConfig. Use structlog's configuration exclusively. Mixing the two is the most common observability bug.

Stop conditions

Done when:

  1. A single request produces a 5-7-span trace tree visible in Tempo.
  2. Each span has the attributes from the contract above.
  3. Every log line emitted during a request carries the same trace_id and span_id.
  4. experiments/34-traces/sample-trace.json committed.
  5. The Phase 37 prompt-injection lab can later filter traces by service.name="miniserve" AND llm.tokens.out=0 to find anomalies (forward compat check — confirm the schema supports the query).

Pitfalls (read before debugging)

  • Orphan spans in Tempo. Symptom: child spans show as roots. Cause: context propagation broken across create_task or thread executor. Apply the Block C fix.
  • No spans in Tempo at all. Check docker compose logs otel-collector — common issue is the SDK failing to connect to localhost:4317 (firewall, wrong port). Try OTEL_EXPORTER_OTLP_INSECURE=true env var.
  • Span attributes truncated. OTel SDK has a default 128-char limit on string attributes. Increase via OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT=2048 if needed, but ideally don't put large strings in attributes — link to a separate store.
  • Log lines without trace_id. The log was emitted outside an active span (e.g., during app startup). That's OK; just confirm the request-lifecycle logs all have it.

When to consult solutions/

After all five Stop conditions pass. Solution at solutions/02-tracing-ref.md.


Next lab: lab/03-cost-and-loadtest.md.