Tenant-Partitioned Engine Sharding¶
CONCEPT:KG-2.58 (sharding) · CONCEPT:OS-5.28 (topology visibility)
Stage-2 scaling for the epistemic-graph compute tier: run N independent engine processes ("shards") and let every client route to the right one — no proxy hop, no coordinator, no engine changes.
The partition model in one line¶
- The named graph is the partition unit. Each engine process keeps its own string-keyed named-graph registry; a graph lives wholly on exactly one shard, so single-graph operations never need cross-shard coordination.
- Tenancy enters only by choosing the graph name. When a caller does not
target an explicit graph and the ambient
ActorContext(CONCEPT:OS-5.14) carries a tenant, the default graph is mapped totenant__<tenant>__<base>bytenant_graph_name()(knowledge_graph/core/shard_topology.py, also exported fromagent_utilities.knowledge_graphand asKnowledgeGraph.tenant_graph()). - Shard choice is a pure function of the graph name. The sync client path
(
GraphComputeEngine) delegates to the exact HRW implementation inepistemic_graph.pool.ShardRouter, so sync and async callers can never disagree on placement.
Configuration¶
Flag (on AgentConfig) |
Default | Meaning |
|---|---|---|
GRAPH_SERVICE_ENDPOINTS |
unset | Comma-separated or JSON list of shard endpoints (unix:// / tcp://). Unset or one entry = today's single-engine behaviour (zero-infra preserved); 2+ entries enable sharding. |
KG_DEFAULT_GRAPH |
__bus__ |
The default named graph; the ambient tenant maps onto tenant__<t>__<default> in sharded mode only. |
Routing-key resolution (resolve_routing_graph):
- explicit, non-default graph name → used verbatim;
- ambient
ActorContexttenant →tenant_graph_name(tenant, default); - otherwise → the configured default graph.
Endpoint strings are hashed verbatim — configure every client with the identical list (order does not matter; HRW is order-independent) and with explicit schemes.
Operational semantics¶
- Autostart is local-only.
EPISTEMIC_GRAPH_AUTOSTART=1may spawn an engine only for a local (unix://) endpoint. In sharded mode an unreachable remote (tcp://) shard is a fail-loudConnectionErrornaming the shard, the graph it owns, and the remediation — the same hard-contract convention as the CONCEPT:KG-2.55 task queue. Auto-starting a local stand-in would silently split that shard's graphs into invisible islands. - The flock host role is per-host.
host_lock.pyelects ONE daemon owner per host for the local engine; remote shards are reported by the status surfaces, never managed. - Auth is fleet-wide. All shards and all clients must share ONE
GRAPH_SERVICE_AUTH_SECRET(CONCEPT:OS-5.14). Set it explicitly in multi-host deployments — the auto-generated per-install secret only covers one host.
Rebalancing (out of scope — the honest caveat)¶
HRW keeps key movement minimal when a shard is added or removed (~1/N of graphs change owner), but no data moves automatically. A graph whose HRW winner changed re-creates empty on its new shard until you migrate it manually with the existing snapshot tooling:
- quiesce writers for that graph;
- export from the old shard (
lifecycle.to_msgpack/GraphComputeEngine.to_msgpack(), or copy its--persist-dircheckpoint); - import on the new shard (
from_msgpack) after the endpoint list changed everywhere; - delete the stale copy from the old shard.
Durable tiers (pg-age L3) are unaffected — they are not partitioned by this mechanism.
Topology visibility (CONCEPT:OS-5.28)¶
shard_topology_status()→ shard mode, per-endpoint transport-level reachability probe, locality, and circuit-breaker state. Surfaced on:- the unified daemon status (
unified_daemon_status()["shards"], i.e.GET /daemon/statusandpython -m agent_utilities.gateway.daemon --status); - the gateway dashboard route
GET /daemon/shards; - graph-os
GET /health(cheap config-only summary:shard_mode,shard_count— no probe on the liveness path). - Prometheus (OS-5.23 registry,
agent_utilities/observability/gateway_metrics.py): agent_utilities_engine_shard_up{endpoint}— 1/0, refreshed on every real client connect and by the status probe;agent_utilities_engine_shard_requests_total{endpoint,outcome}— the engine-call outcomes (ok | connection_error | error | short_circuited) split per shard;- the existing
agent_utilities_gateway_engine_breaker_state{endpoint}is already per-endpoint, so each shard gets its own circuit breaker for free. - Each engine process can additionally expose its own native metrics with
--metrics-addr(epistemic_graph_*series, one scrape target per shard).
Worked example — 3 shards on one host¶
See docker/engine-shards.compose.yml
for the runnable compose file (3 engines, distinct ports + persist dirs +
metrics listeners, one shared secret), or by hand:
export GRAPH_SERVICE_AUTH_SECRET="$(openssl rand -hex 32)"
for i in 1 2 3; do
epistemic-graph-server \
--tcp-addr "127.0.0.1:910${i}" \
--persist-dir "/var/lib/epistemic-graph/shard-${i}" \
--metrics-addr "127.0.0.1:911${i}" &
done
# Every agent-utilities client / gateway / ingest worker:
export GRAPH_SERVICE_ENDPOINTS="tcp://127.0.0.1:9101,tcp://127.0.0.1:9102,tcp://127.0.0.1:9103"
Multi-host is the same picture with one engine (or a few, on big hosts) per
machine and hostnames in the endpoint list. Capacity planning for shard
counts lives in docs/scaling/capacity_model.md
(RESIDENTS_PER_L0_SHARD).