Deploying epistemic-graph as a database
epistemic-graph is a durable, Rust-native graph database engine. Agents in the
agent-packages fleet can either embed it (it ships transitively with
agent-utilities[agent]) or connect to a standalone, centralized database container
shared across many agents. This guide covers the standalone deployment: container recipes,
connection configuration, the configuration surface, and the database architecture.
Embedded vs centralized. The slim MCP-server install (<pkg>[mcp]) does not include the engine. The full agent install (<pkg>[agent]) embeds it for single-process use. Run the standalone server (below) when you want one knowledge graph that is shared by multiple agents, durable independently of any agent process, or replicated for high availability.
Deployment tiers (cargo feature flags)
The server binary is built for one tier; pick the smallest that fits. Build with
--build-arg EG_FEATURES=... (see docker/Dockerfile).
| Tier | EG_FEATURES | Includes | Use when |
|---|---|---|---|
| pi | pi,ast-extended | redb-authoritative store + cypher (no DataFusion/pgwire/raft) | Raspberry Pi / edge, lean single node |
| node | node,ast-extended | pi + DataFusion SQL + ANN vectors + tsdb | Single-node server |
| cluster (default) | cluster,ast-extended | node + Raft replication + pgwire (Postgres wire SQL) | HA / multi-node / SQL clients |
All tiers include the redb feature, so the persist dir is the authoritative source of
truth and a committed write survives kill -9 (commit-before-ack).
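As a rough illustration of the tier table, the sketch below maps a tier name to its EG_FEATURES value and assembles a matching docker build command. The feature strings, the EG_FEATURES build argument, and docker/Dockerfile come from this guide; the helper names, the image tag, and the use of the current directory as build context are assumptions made for the example.

```python
import shlex

# Tier -> EG_FEATURES value, copied from the tier table.
TIER_FEATURES = {
    "pi": "pi,ast-extended",
    "node": "node,ast-extended",
    "cluster": "cluster,ast-extended",
}


def build_command(tier: str, image_tag: str = "epistemic-graph:local") -> list[str]:
    """Return a `docker build` argv for the given tier.

    `image_tag` is a placeholder chosen for this sketch, not a published image name.
    """
    if tier not in TIER_FEATURES:
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(TIER_FEATURES)}")
    return [
        "docker", "build",
        "-f", "docker/Dockerfile",
        "--build-arg", f"EG_FEATURES={TIER_FEATURES[tier]}",
        "-t", image_tag,
        ".",  # assumed build context: the repository root
    ]


if __name__ == "__main__":
    # Print the command for the smallest tier so it can be reviewed before running.
    print(shlex.join(build_command("pi")))
```

Returning an argv list rather than a shell string keeps the command safe to pass to subprocess.run without shell quoting concerns.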
Single-node (durable, recommended start)
Docker Compose
export GRAPH_SERVICE_AUTH_SECRET="$(openssl rand -hex 32)"
docker compose -f docker/compose.yml up -d server
The bundled docker/compose.yml builds the image, binds RPC 9100
and metrics 9101 to 127.0.0.1, and persists to the named volume eg-data.
Plain docker run
docker volume create eg-data
docker run -d --name epistemic-graph \
-e GRAPH_SERVICE_AUTH_SECRET="$(openssl rand -hex 32)" \
-p 127.0.0.1:9100:9100 -p 127.0.0.1:9101:9101 \
-v eg-data:/var/lib/epistemic-graph/data \
<registry>/epistemic-graph:<tag>
The server refuses to start without GRAPH_SERVICE_AUTH_SECRET (HMAC-SHA256 auth on the RPC transport). For development only, you may pass --allow-insecure / EPISTEMIC_GRAPH_ALLOW_INSECURE=1 to run unauthenticated.
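Where openssl is not available, a secret of the same shape can be produced from Python's standard library. This is a minimal sketch: secrets.token_hex(32) returns 32 random bytes as 64 hex characters, matching the output format of openssl rand -hex 32. The function name is illustrative only.

```python
import secrets


def generate_auth_secret(num_bytes: int = 32) -> str:
    """Return `num_bytes` of cryptographically secure random data as lowercase hex."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print an export line that can be pasted into a shell session.
    print(f'export GRAPH_SERVICE_AUTH_SECRET="{generate_auth_secret()}"')
```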
High availability (Raft, cluster tier)
The cluster tier replicates the authoritative redb store across nodes via in-engine openraft. Run one container per node, giving each its own node id and the same shared peer list:
SECRET="$(openssl rand -hex 32)"
docker run -d --name eg-node-1 \
-e GRAPH_SERVICE_AUTH_SECRET="$SECRET" \
-e EPISTEMIC_GRAPH_RAFT_NODE_ID=1 \
-e EPISTEMIC_GRAPH_RAFT_PEERS="1@eg-node-1:9200,2@eg-node-2:9200,3@eg-node-3:9200" \
-p 9100:9100 -p 9101:9101 \
-v eg-data-1:/var/lib/epistemic-graph/data \
<registry>/epistemic-graph:<tag>
# repeat for eg-node-2 (NODE_ID=2) and eg-node-3 (NODE_ID=3)
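Because every node needs the identical peer string, it can help to generate and validate it in one place. The sketch below assumes the id@host:port, comma-separated format seen in the example above; the server's own parser may accept or reject other forms, and the duplicate-id check is an assumption of this example rather than documented behavior.

```python
from typing import NamedTuple


class Peer(NamedTuple):
    node_id: int
    host: str
    port: int


def format_peers(peers: list[Peer]) -> str:
    """Render peers as a comma-separated `id@host:port` list."""
    return ",".join(f"{p.node_id}@{p.host}:{p.port}" for p in peers)


def parse_peers(value: str) -> list[Peer]:
    """Parse an `id@host:port,...` string and reject malformed or duplicate entries."""
    peers = []
    for entry in value.split(","):
        node_id, _, addr = entry.strip().partition("@")
        host, _, port = addr.rpartition(":")
        if not (node_id and host and port):
            raise ValueError(f"malformed peer entry: {entry!r}")
        peers.append(Peer(int(node_id), host, int(port)))
    ids = [p.node_id for p in peers]
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate node ids in peer list: {ids}")
    return peers


if __name__ == "__main__":
    # Build the three-node list used in the docker run example.
    cluster = [Peer(i, f"eg-node-{i}", 9200) for i in (1, 2, 3)]
    print(format_peers(cluster))
```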
Connecting an agent
Local clients use the per-platform UDS; remote clients use TCP (9100). Configure the connection through
environment variables; agent-utilities (and any fleet agent) picks them up automatically.
# Remote TCP (the standalone container)
export GRAPH_SERVICE_TCP_ADDR=epistemic-graph:9100
export GRAPH_SERVICE_AUTH_SECRET=<same secret as the server>
# OR local UDS (same host)
export GRAPH_SERVICE_SOCKET=/run/epistemic-graph/epistemic-graph.sock
export GRAPH_SERVICE_AUTH_SECRET=<secret>
import asyncio

from epistemic_graph import EpistemicGraphClient


async def main():
    # connect() reads the GRAPH_SERVICE_* environment variables
    client = await EpistemicGraphClient.connect()
    await client.nodes.add("agent:planner", {"type": "Agent"})
    print(await client.nodes.has("agent:planner"))  # True
    await client.close()


asyncio.run(main())
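When a remote connection fails, it is useful to separate network problems from authentication problems. The following sketch only checks that the address in GRAPH_SERVICE_TCP_ADDR accepts TCP connections; it does not speak the RPC protocol or verify the secret. Both function names are hypothetical helpers, not part of the client library.

```python
import os
import socket


def split_host_port(addr: str) -> tuple[str, int]:
    """Split `host:port` on the last colon and return (host, port)."""
    host, sep, port = addr.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {addr!r}")
    return host, int(port)


def tcp_reachable(addr: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to `addr` can be opened within `timeout` seconds."""
    host, port = split_host_port(addr)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    addr = os.environ.get("GRAPH_SERVICE_TCP_ADDR")
    if addr is None:
        print("GRAPH_SERVICE_TCP_ADDR is not set")
    else:
        print(addr, "reachable" if tcp_reachable(addr) else "unreachable")
```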
Configuration reference
| Argument | Env var | Default | Description |
|---|---|---|---|
| --socket-path | GRAPH_SERVICE_SOCKET | /tmp/epistemic-graph.sock | UDS socket path (local clients) |
| --tcp-addr | GRAPH_SERVICE_TCP_ADDR | 0.0.0.0:9100 (image) | TCP RPC listener (remote clients) |
| --auth-secret | GRAPH_SERVICE_AUTH_SECRET | (none, required) | HMAC-SHA256 secret; empty refuses to start |
| --allow-insecure | EPISTEMIC_GRAPH_ALLOW_INSECURE | off | Dev-only: start unauthenticated |
| --persist-dir | GRAPH_SERVICE_PERSIST_DIR | /var/lib/epistemic-graph/data (image) | Durable redb-authoritative store (mount a volume) |
| --checkpoint-interval | (none) | 300 | Auto-checkpoint interval (seconds) |
| --metrics-addr | GRAPH_SERVICE_METRICS_ADDR | 0.0.0.0:9101 (image) | Prometheus /metrics listener |
| (none) | EPISTEMIC_GRAPH_PERSIST_BACKEND | redb | Persist backend (redb authoritative; snapshot legacy) |
| (none) | EPISTEMIC_GRAPH_RAFT_NODE_ID | (none) | Raft node id (cluster tier) |
| (none) | EPISTEMIC_GRAPH_RAFT_PEERS | (none) | id@host:port peer list (cluster tier) |
Ports: 9100 RPC (clients), 9101 Prometheus metrics. Volume: /var/lib/epistemic-graph/data.
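For deployment tooling that needs to validate an environment before starting a container, the table can be mirrored in a small loader. This sketch reflects the documented variables and image defaults only; it is not the server's own parsing logic, and treating exactly "1" as the enabling value for EPISTEMIC_GRAPH_ALLOW_INSECURE is an assumption based on the example earlier in this guide.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerEnv:
    socket_path: str
    tcp_addr: str
    auth_secret: Optional[str]
    allow_insecure: bool
    persist_dir: str
    metrics_addr: str
    persist_backend: str


def load_server_env(env: Optional[Mapping[str, str]] = None) -> ServerEnv:
    """Read the documented variables, applying the image defaults from the table."""
    env = os.environ if env is None else env
    secret = env.get("GRAPH_SERVICE_AUTH_SECRET") or None  # empty string counts as unset
    allow_insecure = env.get("EPISTEMIC_GRAPH_ALLOW_INSECURE") == "1"
    if secret is None and not allow_insecure:
        raise ValueError(
            "GRAPH_SERVICE_AUTH_SECRET is required unless EPISTEMIC_GRAPH_ALLOW_INSECURE=1"
        )
    return ServerEnv(
        socket_path=env.get("GRAPH_SERVICE_SOCKET", "/tmp/epistemic-graph.sock"),
        tcp_addr=env.get("GRAPH_SERVICE_TCP_ADDR", "0.0.0.0:9100"),
        auth_secret=secret,
        allow_insecure=allow_insecure,
        persist_dir=env.get("GRAPH_SERVICE_PERSIST_DIR", "/var/lib/epistemic-graph/data"),
        metrics_addr=env.get("GRAPH_SERVICE_METRICS_ADDR", "0.0.0.0:9101"),
        persist_backend=env.get("EPISTEMIC_GRAPH_PERSIST_BACKEND", "redb"),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests.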
Database architecture
Engine components
The engine is a Cargo workspace: a layered crate stack under one server process that opens the RPC transports and owns the durable store.
flowchart TD
subgraph Client["Client (any agent / agent-utilities)"]
PY[epistemic_graph.EpistemicGraphClient]
end
subgraph Server["epistemic-graph-server (single Rust process)"]
T["Transport — length-prefixed MessagePack over UDS / TCP, HMAC-SHA256"]
ADM["Admission control (try-acquire permits, sheds BUSY)"]
CORE["GraphCore — petgraph StableDiGraph + node/edge property maps"]
subgraph Query["Query surfaces"]
CY[Cypher]
SQL["eg-query — DataFusion SQL"]
PGW["pgwire — Postgres wire SQL"]
ANN["eg-ann — IVF-PQ vector ANN"]
TS[eg-tsdb]
RDF[eg-rdf]
TXT[eg-text]
end
subgraph Durability["Durability"]
WAL[wal.rs — write-ahead log]
WC[write_coalescer.rs]
REDB[(redb_store — authoritative on disk)]
RAFT[raft — openraft replication, cluster tier]
end
end
PY -->|RPC| T --> ADM --> CORE
CORE --> CY & SQL & PGW & ANN & TS & RDF & TXT
CORE --> WAL --> WC --> REDB
REDB <--> RAFT
subgraph Crates["Crate dependency stack"]
direction LR
EGT[eg-types] --> EGC[eg-core] --> EGCO[eg-compute] --> EGTOP[epistemic-graph]
end
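The admission-control box in the diagram (try-acquire permits, sheds BUSY) follows a common load-shedding pattern. The sketch below illustrates that pattern in Python with a non-blocking semaphore acquire; it is a generic illustration, not the engine's Rust implementation, and the class name, method names, and the plain "BUSY" return value are all hypothetical.

```python
import threading


class AdmissionControl:
    """Admit at most `max_in_flight` concurrent requests; shed the rest immediately."""

    def __init__(self, max_in_flight: int):
        self._permits = threading.BoundedSemaphore(max_in_flight)

    def handle(self, request, handler):
        # Try-acquire: never wait for a permit, so overload turns into a fast
        # rejection instead of an ever-growing queue.
        if not self._permits.acquire(blocking=False):
            return "BUSY"
        try:
            return handler(request)
        finally:
            self._permits.release()


if __name__ == "__main__":
    control = AdmissionControl(max_in_flight=1)

    def slow_handler(request):
        # While this request holds the only permit, a second request is shed.
        return control.handle("second", lambda r: "ok")

    print(control.handle("first", slow_handler))
```

Shedding instead of queueing keeps latency bounded for admitted requests and lets clients decide whether to retry.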
Deployment topologies
flowchart LR
subgraph Single["Single-node"]
A1[agent / MCP container] -->|TCP 9100| S1[(epistemic-graph<br/>redb @ eg-data)]
GOS[graph-os front-end] -->|UDS / TCP| S1
end
subgraph HA["Cluster (Raft HA)"]
A2[agents] --> N1[(eg-node-1)]
N1 <-->|raft :9200| N2[(eg-node-2)]
N2 <-->|raft :9200| N3[(eg-node-3)]
N1 <-->|raft :9200| N3
end
Write path & data model
Writes are durable before the client is acked (commit-before-ack); reads are served from RAM with a redb read-through for evicted nodes.
flowchart LR
W[Client write] --> WAL[Write-ahead log]
WAL --> WC[Write coalescer<br/>group commit]
WC --> REDB[(redb authoritative store)]
REDB -->|commit| ACK[ACK to client]
REDB -.read-through.-> CACHE[In-RAM GraphCore<br/>LRU eviction]
subgraph Model["Stored entities"]
direction LR
N["Node (id, JSON props, labels)"]
E["Edge (src, dst, JSON props)"]
IDX[Label / property indexes]
TH["Thread / Message (chat persistence)"]
end
REDB --- Model
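The commit-before-ack and group-commit ideas in the diagram can be illustrated with a toy append-only log: several writes are queued, flushed with a single fsync, and only then acknowledged. This is a simplified stand-in written for this guide; it does not reproduce the engine's WAL format, its write coalescer, or redb, and every name in it is hypothetical.

```python
import os


class GroupCommitLog:
    """Toy append-only log that acknowledges writes only after one fsync per batch."""

    def __init__(self, path: str):
        self._file = open(path, "ab")
        self._pending: list[bytes] = []

    def submit(self, record: bytes) -> None:
        """Queue a record; nothing is acknowledged yet."""
        self._pending.append(record)

    def commit(self) -> int:
        """Write all queued records, fsync once, and return how many may now be acked."""
        for record in self._pending:
            # A 4-byte length prefix lets replay split the records apart again.
            self._file.write(len(record).to_bytes(4, "big") + record)
        self._file.flush()
        os.fsync(self._file.fileno())  # durability point: acks happen only after this
        acked = len(self._pending)
        self._pending.clear()
        return acked

    def close(self) -> None:
        self._file.close()


def replay(path: str) -> list[bytes]:
    """Read back every committed record, as a restart would."""
    records = []
    with open(path, "rb") as f:
        while header := f.read(4):
            records.append(f.read(int.from_bytes(header, "big")))
    return records
```

Batching several writes behind one fsync is what lets a commit-before-ack design keep throughput reasonable: the cost of the sync is shared by every write in the group.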
Durability & backup
- The persist dir (/var/lib/epistemic-graph/data) is the authoritative store; back it up by snapshotting the volume. A committed write survives kill -9.
- Auto-checkpoints run every --checkpoint-interval seconds and on SIGTERM.
- In the cluster tier, openraft replicates the authoritative store across nodes.
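Where volume-level snapshots are not available, one fallback is to archive the persist directory into a tarball. The sketch below does that with the standard library. It makes a conservative assumption that this guide does not state: that the directory is archived while the server is stopped, or from an already-taken filesystem snapshot, since copying files under a running writer may not produce a consistent copy. The function name and paths are illustrative.

```python
import tarfile
from pathlib import Path


def archive_persist_dir(persist_dir: str, dest: str) -> Path:
    """Write a gzip-compressed tarball of `persist_dir` to `dest` and return its path.

    Assumes the server is stopped (or `persist_dir` is a snapshot) while this runs.
    """
    src = Path(persist_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"persist dir not found: {src}")
    out = Path(dest)
    with tarfile.open(str(out), "w:gz") as tar:
        # Store entries under the directory's own name so extraction recreates it.
        tar.add(str(src), arcname=src.name)
    return out


if __name__ == "__main__":
    # Example paths only; the destination must live outside the persist dir.
    print(archive_persist_dir("/var/lib/epistemic-graph/data", "/tmp/eg-data-backup.tar.gz"))
```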
Observability
With --metrics-addr set (default 0.0.0.0:9101 in the image), the server exposes
Prometheus text-format metrics: request counts and latency, in-flight permits, BUSY rejections,
per-graph node/edge gauges, checkpoint timing, and auth/ACL failures. See
service_mode.md for the full metric list.
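For a quick look at the metrics without a Prometheus server, the text format can be fetched and parsed with the standard library. This is a minimal sketch that handles plain sample lines and skips comments; it is not a complete exposition-format parser. The default URL assumes the loopback binding used by the bundled compose file, and the metric names in the usage comment are made up for illustration (see service_mode.md for the real ones).

```python
from urllib.request import urlopen


def parse_metrics(text: str) -> dict[str, float]:
    """Map each series (name plus optional {labels}) to its sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            series, rest = line.rsplit("}", 1)
            series += "}"
        else:
            series, rest = line.split(None, 1)
        # The first token after the series is the value; a timestamp may follow.
        samples[series] = float(rest.split()[0])
    return samples


def fetch_metrics(url: str = "http://127.0.0.1:9101/metrics") -> dict[str, float]:
    """Fetch and parse the metrics endpoint (URL is an assumption, adjust as needed)."""
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Usage with hypothetical metric names:
#   parse_metrics('demo_requests_total{method="nodes.add"} 12\ndemo_busy_total 3')
```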