Worked Example: Scraping and Dashboarding the Platform¶
What this demonstrates¶
A working Prometheus scrape configuration for the two metric surfaces — the
gateway's Python-tier /metrics endpoint (CONCEPT:OS-5.23) and the engine
shards' native listeners — plus ready-to-paste Grafana panel queries over the
headline series.
Deep dives: gateway_scaling.md and engine_sharding.md. Full metric reference: metrics.md.
Prerequisites (ladder rung)¶
Any rung of the deployment ladder
that runs the gateway. Python-tier metrics need the optional metrics extra:
Without it the middleware degrades to a no-op and GET /metrics returns a
self-describing placeholder (still HTTP 200). The endpoint is controlled by
GATEWAY_METRICS (default on) and is exempt from the identity middleware
— scrapers cannot mint JWTs (CONCEPT:OS-5.23).
1. Scrape targets¶
Two distinct surfaces:
- Gateway —
GET /metricson the gateway HTTP port, serving everyagent_utilities_*series registered inagent_utilities/observability/gateway_metrics.py. Multi-worker note: metrics live in the per-process default registry; withGATEWAY_WORKERS>1a scrape samples ONE worker — run one worker per container or aggregate across replicas in Prometheus. - Engine shards — each
epistemic-graph-serverprocess exposes its own nativeepistemic_graph_*series on its--metrics-addrlistener. In the shippeddocker/engine-shards.compose.ymlthe RPC ports are 9101–9103 and the metrics listeners are 9111–9113 (one scrape target per shard) — see the sharding walkthrough.
# prometheus.yml
scrape_configs:
- job_name: agent-utilities-gateway
metrics_path: /metrics
static_configs:
- targets: ["gateway.example.com:8000"] # your gateway host:port
- job_name: epistemic-graph-shards
metrics_path: /metrics
static_configs:
- targets:
- "shards.example.com:9111" # engine-shard-1 --metrics-addr
- "shards.example.com:9112" # engine-shard-2
- "shards.example.com:9113" # engine-shard-3
labels:
tier: engine
2. Headline series (compact)¶
The full table, including labels and producers, lives in the metrics reference. The ones most dashboards start with:
| Series | Type | What it tells you |
|---|---|---|
agent_utilities_gateway_requests_total{route,method,status} |
counter | Traffic + error ratio per route template |
agent_utilities_gateway_request_duration_seconds{route} |
histogram | Latency distribution per route |
agent_utilities_gateway_in_flight_requests |
gauge | Concurrency right now |
agent_utilities_gateway_rate_limited_total{tenant} |
counter | 429s from the per-tenant token bucket |
agent_utilities_gateway_engine_requests_total{op,outcome} |
counter | Engine client calls + failure mix |
agent_utilities_gateway_engine_breaker_state{endpoint} |
gauge | Engine circuit breaker (0=closed, 1=half-open, 2=open) |
agent_utilities_engine_shard_up{endpoint} |
gauge | Per-shard reachability (1/0) |
agent_utilities_engine_shard_requests_total{endpoint,outcome} |
counter | Engine calls split per shard |
agent_utilities_kg_ingest_queue_depth{backend} |
gauge | Pending KG ingest tasks (autoscaler signal queue_depth) |
agent_utilities_kg_ingest_consumer_lag{topic,group} |
gauge | Kafka kg-ingest lag (autoscaler signal consumer_lag) |
agent_utilities_dispatch_queue_depth{backend} |
gauge | Unclaimed dispatched agent turns |
agent_utilities_dispatch_turns_total{outcome} |
counter | Processed agent turns by outcome |
agent_utilities_dispatch_workers |
gauge | Live dispatch workers (fresh heartbeats) |
agent_utilities_mcp_child_calls_total{server,outcome} |
counter | Multiplexer tool calls per child |
agent_utilities_mcp_child_breaker_state{server} |
gauge | Per-child breaker (0/1/2) |
agent_utilities_mcp_child_restarts_total{server} |
counter | Child crash-restarts |
agent_utilities_mcp_child_queue_depth{server} |
gauge | Calls queued behind a child's concurrency limit |
Cardinality discipline worth knowing: route is always a route TEMPLATE
(/api/graph/{name}); unmatched scanner traffic collapses into a single
unmatched bucket.
3. Grafana panel queries¶
Request rate by route (req/s):
p95 latency per route — agent_utilities_gateway_request_duration_seconds
is a Prometheus Histogram (verified in
observability/gateway_metrics.py; buckets 5 ms – 60 s), so the standard
quantile recipe applies to its _bucket series:
histogram_quantile(
0.95,
sum by (le, route) (rate(agent_utilities_gateway_request_duration_seconds_bucket[5m]))
)
Error ratio (5xx share of all requests):
sum(rate(agent_utilities_gateway_requests_total{status=~"5.."}[5m]))
/
sum(rate(agent_utilities_gateway_requests_total[5m]))
Ingest backlog and Kafka consumer lag (the same series the OS-5.29 autoscaler target-tracks — see the autoscaling example):
Agent-dispatch queue depth vs live workers (two queries, one panel):
Shard availability and breaker states (alert when a shard is down or any breaker is open, i.e. value 2):
Multiplexer child restarts over the last hour (a climbing line = a crash-looping child MCP server):
4. Alert sketches¶
groups:
- name: agent-utilities
rules:
- alert: EngineShardDown
expr: agent_utilities_engine_shard_up == 0
for: 2m
labels: {severity: critical}
annotations:
summary: "Engine shard {{ $labels.endpoint }} unreachable"
- alert: EngineBreakerOpen
expr: agent_utilities_gateway_engine_breaker_state == 2
for: 1m
labels: {severity: critical}
annotations:
summary: "Engine circuit breaker open for {{ $labels.endpoint }}"
Point the Alertmanager receiver for these at the fleet-events ingress and the platform will triage its own incidents — that loop is the fleet events example.
Verification¶
curl -sS http://localhost:8000/metrics | grep -E '^agent_utilities_' | head
python3 -m pytest tests/unit/test_gateway_metrics.py -q
Smoke-run against this tree (2026-06-11): every agent_utilities_* series
name and label set above was verified against
agent_utilities/observability/gateway_metrics.py, and python3 -m pytest
tests/unit/test_gateway_metrics.py -q passed as part of a 99-test green run.
The prometheus.yml/Grafana/alert snippets target external tooling and were
reviewed against code only (no Prometheus was deployed for this doc).