
Lab 00: Bring Prometheus + Grafana up locally

Goal: bootstrap, in one shot, the observability stack that subsequent labs send metrics and traces to.

Estimated time: 45–60 minutes (mostly waiting for image pulls).

Prereq: Docker + docker-compose installed on Fedora 43 (sudo dnf install docker docker-compose). User added to docker group. Phase 33 server runs locally on a known port (default 8000).

What you produce

A working infra/compose/observability.yml and a verified stack reachable at:

Prometheus UI: http://localhost:9090
Grafana UI: http://localhost:3000 (admin/admin on first start)
OTel-Collector OTLP/gRPC endpoint: localhost:4317
OTel-Collector OTLP/HTTP endpoint: localhost:4318
Tempo (no UI of its own; queried through the Grafana data source): HTTP API on localhost:3200

Plus:

infra/compose/prometheus.yml: scrape config for Borja's local Phase 33 server on port 8000.
infra/compose/otel-collector.yaml: pipeline of OTLP receiver → batch processor → exporters to Tempo and debug (console logging).
infra/grafana/provisioning/datasources/all.yaml: auto-provisions Prometheus and Tempo as data sources.
Justfile recipe: just serve-obs brings the stack up; just stop-obs tears it down.

TODOs

Block A: write the compose file

Services to include:
prometheus (prom/prometheus:v2.54.0 or current LTS)
grafana (grafana/grafana-oss:11.x)
tempo (grafana/tempo:2.x)
otel-collector (otel/opentelemetry-collector-contrib:0.x)
node-exporter (prom/node-exporter:v1.x): exposes the host's USE (utilization, saturation, errors) metrics for Prometheus to scrape
All services on the same docker network (obs-net).
Prometheus mount ./prometheus.yml:/etc/prometheus/prometheus.yml.
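Grafana mount ../grafana/provisioning:/etc/grafana/provisioning (the path is relative to infra/compose/, where the compose file lives, so it resolves to infra/grafana/provisioning).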
Tempo with the minimal local-storage config.
node-exporter with --path.rootfs=/host and the host root bind-mounted (read-only).

Write the file for the Compose plugin (docker compose), not legacy docker-compose v1: omit the version: line at the top of the file.
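The service list above could be sketched as the compose file below. This is a minimal sketch under stated assumptions, not the reference solution: the image tags are the lab's own placeholders (11.x, 2.x, 0.x, v1.x are not real tags and must be replaced with concrete tags or digests), tempo.yaml is a hypothetical file name for the Tempo config, the in-container config paths are choices made here, and ports are bound to 127.0.0.1 to match the localhost-only constraint.

```yaml
# infra/compose/observability.yml (sketch; no top-level "version:" line)
services:
  prometheus:
    image: prom/prometheus:v2.54.0
    volumes:
      # :Z relabels the file for SELinux on Fedora
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro,Z
    ports:
      - "127.0.0.1:9090:9090"
    extra_hosts:
      # lets Prometheus reach a Phase 33 server running on the host
      - "host.docker.internal:host-gateway"
    networks: [obs-net]

  grafana:
    image: grafana/grafana-oss:11.x   # placeholder tag, replace before use
    volumes:
      # relative to infra/compose/, resolves to infra/grafana/provisioning
      - ../grafana/provisioning:/etc/grafana/provisioning:ro,Z
    ports:
      - "127.0.0.1:3000:3000"
    networks: [obs-net]

  tempo:
    image: grafana/tempo:2.x          # placeholder tag, replace before use
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      # tempo.yaml is an assumed name for the minimal local-storage config
      - ./tempo.yaml:/etc/tempo.yaml:ro,Z
    ports:
      - "127.0.0.1:3200:3200"
    networks: [obs-net]

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.x   # placeholder tag
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otel-collector.yaml:/etc/otelcol/config.yaml:ro,Z
    ports:
      - "127.0.0.1:4317:4317"   # OTLP/gRPC
      - "127.0.0.1:4318:4318"   # OTLP/HTTP
    networks: [obs-net]

  node-exporter:
    image: prom/node-exporter:v1.x    # placeholder tag, replace before use
    command: ["--path.rootfs=/host"]
    volumes:
      # read-only host root; deliberately no :Z here, relabeling / is unsafe
      - /:/host:ro,rslave
    networks: [obs-net]

networks:
  obs-net: {}
```

Running docker compose -f infra/compose/observability.yml config validates the file and prints the resolved configuration without starting anything.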

Block B: write prometheus.yml

Scrape jobs:

prometheus itself (localhost:9090).
node-exporter (node-exporter:9100).
tempo (tempo:3200), Tempo's own metrics; the smoke test in Block F counts this as the fourth job.
miniserve (the Phase 33 server). Address depends on whether the server runs in docker or on the host:
If host: use host.docker.internal:8000 (Linux: add extra_hosts: host.docker.internal:host-gateway to the Prometheus service).
If docker: add miniserve to the same network and scrape miniserve:8000.

Scrape interval: 5 s. The 15 s used in Prometheus's example config is fine for production (the built-in default is 1 m); 5 s gives faster feedback during development.
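A minimal sketch of the scrape config described above, assuming the Phase 33 server runs on the host and exposes metrics at the default /metrics path (the lab does not state the path). The tempo job is included because the smoke test in Block F lists it among the scrape jobs.

```yaml
# infra/compose/prometheus.yml (sketch)
global:
  scrape_interval: 5s   # fast feedback for development

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: node-exporter
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: tempo
    static_configs:
      - targets: ["tempo:3200"]

  - job_name: miniserve
    # metrics_path defaults to /metrics; assumed to match the Phase 33 server
    static_configs:
      # host variant; use "miniserve:8000" if the server runs in docker
      - targets: ["host.docker.internal:8000"]
```

If promtool is available, promtool check config infra/compose/prometheus.yml catches syntax errors before the container starts.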

Block C: write otel-collector.yaml

Pipeline:

receivers:  [otlp]      # gRPC :4317, HTTP :4318
processors: [batch]     # buffer to reduce export load
exporters:  [otlp/tempo, debug]

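The otlp/tempo exporter sends to Tempo's OTLP gRPC endpoint at tempo:4317, in plaintext since this stack has no TLS. The debug exporter logs spans to the collector's console output, visible in docker compose logs otel-collector (useful for the next lab).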
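The pipeline summary above expands to roughly the following collector config. This is a sketch: the debug verbosity level is a choice made here, and it assumes Tempo's own config enables an OTLP gRPC receiver that listens on port 4317 on all interfaces (some Tempo versions bind it to localhost unless an endpoint is set explicitly).

```yaml
# infra/compose/otel-collector.yaml (sketch)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}   # defaults; buffers spans to reduce export load

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true   # plaintext; this stack is localhost-only, no TLS
  debug:
    verbosity: basic   # assumed level; "detailed" prints full span contents

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo, debug]
```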

Block D: provision Grafana data sources

In infra/grafana/provisioning/datasources/all.yaml:

Prometheus data source: URL http://prometheus:9090.
Tempo data source: URL http://tempo:3200.
Set Prometheus as the default.

Both will appear under Connections → Data sources on first Grafana start.
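A minimal sketch of that provisioning file, using Grafana's data source provisioning format. The display names (Prometheus, Tempo) are choices made here, not something the lab prescribes.

```yaml
# infra/grafana/provisioning/datasources/all.yaml (sketch)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy              # Grafana's backend makes the requests
    url: http://prometheus:9090
    isDefault: true

  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
```

The URLs use compose service names because Grafana resolves them over obs-net, not through the ports published on localhost.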

Block E: Justfile recipes

serve-obs:
    docker compose -f infra/compose/observability.yml up -d
    @echo "Prometheus: http://localhost:9090"
    @echo "Grafana:    http://localhost:3000 (admin/admin)"
    @echo "Tempo:      via Grafana"

stop-obs:
    docker compose -f infra/compose/observability.yml down

Block F: smoke test

just serve-obs.
Open http://localhost:9090/targets. All four scrape jobs (prometheus, node-exporter, miniserve, tempo) should be UP (or only miniserve DOWN if the Phase 33 server isn't running yet — that's fine for this lab).
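Open http://localhost:3000. Log in as admin/admin. When Grafana prompts for a password change, set something local (e.g. localdev).
Navigate to Connections → Data sources. Prometheus and Tempo should both be listed, and each should report success when tested.
Run a trivial Prom query: up. It should return one series per scrape target; a target that is DOWN still appears, with value 0.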
just stop-obs. Verify clean shutdown.

Constraints

No production-grade config. No TLS, no auth on Prometheus, no Grafana SMTP. The stack is localhost-only.
No persistent volumes for Prometheus/Tempo data. Every just stop-obs (docker compose down) removes the containers and wipes their data. Fine for a learning environment; the lab notes how to add volumes if Borja wants persistence across restarts.
No external dependencies pulled at runtime. All images pinned by digest in the compose file. Re-resolve digests with docker pull if needed; commit the digests.
No Loki yet. Structured logs land in stdout for now; Loki integration is a Phase 38 nice-to-have.
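Two of the constraints above translate into compose syntax as in the fragment below. This is a sketch: <digest> is a placeholder to be filled from a local docker pull, and the named volume is the optional persistence route, left commented out because the lab itself runs without it. /prometheus is the data directory used by the official prom/prometheus image.

```yaml
services:
  prometheus:
    # pinned by digest instead of a tag; <digest> is a placeholder
    image: "prom/prometheus@sha256:<digest>"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro,Z
      # optional persistence across restarts (not used in this lab):
      # - prometheus-data:/prometheus

# uncomment together with the volume line above
# volumes:
#   prometheus-data: {}
```

One way to read the digest of a pulled image is docker image inspect --format '{{index .RepoDigests 0}}' followed by the image name and tag.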

Stop conditions

Done when:

infra/compose/observability.yml brings up all five services from Block A cleanly.
All four scrape jobs (minus miniserve if not running) are UP.
Grafana login works, both data sources are listed, and the query up returns data.
just stop-obs cleanly removes everything.

Pitfalls (read before debugging)

host.docker.internal on Linux is not automatic. You need extra_hosts: ["host.docker.internal:host-gateway"] on the service that needs to reach the host.
Grafana volume permissions. Grafana's container runs as UID 472. If a bind-mounted directory has the wrong owner, Grafana refuses to start. Either chown the bind-mounted directory to UID 472, or use a named volume.
Prometheus refuses to start if prometheus.yml has a syntax error. Errors land in docker compose logs prometheus. A common cause is tabs used for indentation, which YAML does not allow.
Tempo's storage config drift. Tempo 2.x's storage config differs from 1.x — copy from the current official single-binary example, not from old blog posts.
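SELinux on Fedora blocks containers from reading unlabeled bind mounts. Either add the :Z option to each config bind mount (but not to the host-root mount for node-exporter, since relabeling / can break the host), or run sudo setenforce 0 for the dev session (and document it).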
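The Grafana and SELinux pitfalls above can be handled together in the Grafana service. The fragment below is a sketch, assuming the compose file lives in infra/compose/; the grafana-data volume name is hypothetical, and /var/lib/grafana is Grafana's default data directory in the official image.

```yaml
services:
  grafana:
    volumes:
      # read-only provisioning files; :Z gives them a container-private SELinux label
      - ../grafana/provisioning:/etc/grafana/provisioning:ro,Z
      # named volume for Grafana's state: Docker manages ownership,
      # so no manual chown to UID 472 is needed
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data: {}
```

Note that docker compose down leaves named volumes in place; add -v to the command when a fully clean teardown is wanted.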

When to consult `solutions/`

After your compose stack is up. The solution at solutions/00-prom-grafana-up-ref.md (written at phase open) shows a working compose file and the standard provisioning layout.

Next lab: lab/01-instrument-server.md.