KG-Native Orchestration Architecture¶
CONCEPT:ORCH-1.1 through ORCH-1.19 — Dynamic, Knowledge-Graph-Driven Agent Orchestration
Overview¶
KG-Native Orchestration transforms agent-utilities from a KG-aware system (where the Knowledge Graph is optionally consulted) into a KG-driven system where the KG is the primary control surface for all orchestration decisions.
Before: Query → Static Graph Topology → (optionally consult KG) → Execute
After: Query → KG resolves topology → Dynamic Graph Materialization → Execute → KG learns
Architecture¶
flowchart TD
Q[ORCH-1.0: User Query] --> TC[ORCH-1.0: KGTeamComposer]
TC -->|"1. Search proven teams"| KG[(KG-2.0: Knowledge Graph)]
TC -->|"2. Select topology"| KG
TC -->|"3. Populate specialists"| KG
TC --> TEAM[AHE-3.3: TeamComposition]
TEAM --> TENG[ORCH-1.4: TopologyEngine]
TENG -->|"Materialize"| PLAN[ORCH-1.21: Execution Plan]
PLAN --> SEQ[ORCH-1.21: Sequential Steps]
PLAN --> PAR[ORCH-1.21: Parallel Groups]
PLAN --> MIX[ORCH-1.21: Mixed DAG]
SEQ & PAR & MIX --> EXEC[ORCH-1.2: Execute Specialists]
EXEC --> CP[ORCH-1.3: StateCheckpointer]
CP -->|"Checkpoint"| KG
EXEC -->|"Success?"| TC2[ORCH-1.2: Promote to TeamConfig]
TC2 --> KG
Core Components¶
1. KG-Driven Team Composer (graph/team_composer.py)¶
CONCEPT:ORCH-1.1 — Replaces static discover_agents() registration with dynamic KG-topology-driven team assembly.
Composition Flow:
1. Reuse: Search KG for proven TeamConfigNode matching the query (AHE-3.3)
2. Select: Choose best TopologyTemplateNode by domain + complexity
3. Populate: Walk KG edges (PROVIDES, HAS_CAPABILITY) to assign tools
4. Promote: On success, save the team as a new TeamConfigNode
Default Topologies:
| Template | Mode | Complexity | Specialists |
|---|---|---|---|
| Single Agent | Sequential | 1 | 1 executor |
| Simple Q&A | Sequential | 1-2 | router → expert → verifier |
| Multi-Source Research | Mixed | 3-4 | router → planner → [researchers] → synthesizer → verifier |
| Expert Team | Mixed | 4-5 | router → planner → architect → [implementer, reviewer] → synthesizer |
| Finance Pipeline | Sequential | 3-5 | router → alpha → risk → execution → attribution |
2. Dynamic Topology Engine (graph/topology_engine.py)¶
CONCEPT:ORCH-1.2 — Materializes KG-stored topology templates into executable graphs.
Supported Execution Modes:
- Sequential:
A → B → C— Simple pipeline - Parallel:
[A, B, C]— All execute concurrently - Fan-out:
A → [B₁, B₂, ..., Bₙ]— Scatter - Fan-in:
[B₁, B₂, ..., Bₙ] → C— Gather - Mixed:
A → [B, C] → D → [E, F] → G— Arbitrary DAG
Each materialized specialist gets:
- System Prompt: Role-specific or KG-loaded via PromptNode
- MCP Tools: Only the tools needed for that role
- Model: Per-specialist model selection
- Memory Channels: Shared KG channels for P2P communication
3. Execution State Checkpointing (core/checkpoint/manager.py)¶
CONCEPT:ORCH-1.1 — Bridges ephemeral GraphState with persistent KG. The
former graph/state_checkpoint.StateCheckpointer was consolidated into the
core/checkpoint/ package (KGBackend + CheckpointManager).
from agent_utilities.core.checkpoint.manager import KGBackend
backend = KGBackend(engine)
checkpoint_id = backend.checkpoint(state, session_id="sess:abc")
restored = backend.restore("sess:abc")
Capabilities: - Checkpoint at HSM transition boundaries - Session resume after crashes - Cross-session learning - Multi-agent coordination (other agents can query active state)
4. Topological Routing Policy (graph/routing/strategies/policy.py)¶
CONCEPT:ORCH-1.4 — Routes using KG-derived topological signals instead of keyword TF-IDF.
Scoring Dimensions:
1. PageRank centrality — Highly-connected specialists preferred
2. Historical success rate — Weighted by outcome evaluations
3. Tool affinity — Specialists with relevant PROVIDES edges score higher
Falls back to RuleBasedPolicy when no KG is available (cold start).
5. Persistent Background Agents (graph/persistent_agents.py)¶
CONCEPT:ORCH-1.4 — Long-running KG-coordinated agents.
mgr = PersistentAgentManager(engine)
mgr.register_agent("bg:monitor", "System Monitor",
subscriptions=["system.alert"],
schedule_cron="*/5 * * * *")
Lifecycle: registered → idle → running → idle → ... → terminated
Agent Types: - Monitor: Watches KG for conditions - Scheduler: Runs periodic tasks - Rebalancer: Continuously adjusts configurations - Background: General-purpose
6. Shareable Team Compositions¶
CONCEPT:ORCH-1.1 Extension — Export/import proven team configurations.
# Export
bundle = engine.export_team_config("tc:proven-team")
# Import on another deployment
new_id = engine.import_team_config(bundle)
Pydantic Models¶
| Model | Type | Purpose |
|---|---|---|
TopologyTemplateNode |
RegistryNode |
KG-stored execution topology template |
SessionCheckpointNode |
RegistryNode |
Persisted execution state |
PersistentAgentNode |
RegistryNode |
Long-running background agent |
TeamComposition |
BaseModel |
Result of team composition (not persisted) |
KG Node/Edge Types¶
New Node Types:
- TOPOLOGY_TEMPLATE — Execution topology templates
- SESSION_CHECKPOINT — Execution state checkpoints
- PERSISTENT_AGENT — Background agent registrations
- TOPOLOGY_TRANSITION — Transition records
New Edge Types:
- TRANSITIONS_TO — Topology transitions between roles
- CHECKPOINTED_STATE — Links sessions to checkpoints
- SUBSCRIBED_TO — Agent event subscriptions
- MATERIALIZED_FROM — Links executions to templates
- COMPOSED_TEAM — Links compositions to team configs
Invoker to Spawned-Agent Handoff and Native Channels¶
CONCEPT:ORCH-1.37, ORCH-1.39, ORCH-1.40 — when one agent spawns another via
graph_orchestrate(action="execute_agent"), three additive capabilities let the invoker shape, observe, and converse with the spawned run. All are backward-compatible: omit the new inputs and behaviour is unchanged.
ORCH-1.37 — Execution-flow diagram surfacing¶
The ORCH-1.8 WorkflowVisualizer already generates a Mermaid diagram of the routed graph; ORCH-1.37
surfaces it in the graph_orchestrate responses instead of only logging it. swarm,
compile_workflow, and execute_workflow add an additive mermaid JSON key (null when
unavailable); execute_agent returns a JSON object {"output", "mermaid"} when a diagram was
produced (otherwise the bare output string, preserving the old contract).
ORCH-1.39 — Curated context, budget, tool-scope & credential handoff¶
The invoking agent can hand the spawned agent a curated working set so it starts informed and bounded, without leaking secrets:
Input (execute_agent) |
Effect on the spawned run | Mechanism |
|---|---|---|
context |
Injected as an ### INVOKER CONTEXT system-prompt block |
Budgeted to the target model window (invoker_context_section) |
context_ref |
Same, but the content is fetched from a persisted ContextBlob by id |
Cross-process handoff; the run's RunTrace links the consumed blob for provenance |
budget_tokens |
Hard UsageLimits.total_tokens_limit on the spawned run |
spawn_usage_limits |
allowed_tools |
Least-privilege allow-list; tools/toolsets are intersected with it | apply_tool_scope |
cred_ref |
A reference (secret key) resolved to the raw token on the transient AgentDeps.auth_token at spawn |
_resolve_invoker_cred — the raw secret is never written to a graph node, GraphState, or logs |
Security invariant: only a reference to a credential ever travels through the graph or the context. The raw token is resolved from the secrets backend onto the ephemeral
AgentDepsat spawn time and is never persisted or logged.
context/context_ref are stored and fetched with the graph_context MCP tool (put/get/list).
ORCH-1.40 — Session-anchored collections & native message channels¶
The epistemic-graph engine is a pure id-addressed store with no property/label index — it is reliable at id-lookup and traversal-from-a-known-id, but unreliable at property scans. ORCH-1.40 builds on that strength rather than fighting it:
-
Session anchor. Each session has an id-addressable
Sessionnode (session:{sid}). Its collections hang off single-hop edges —HAS_CONTEXT → ContextBlob,HAS_MESSAGE → AgentMessage,HAS_RUN → RunTrace. "List by session" is then a reliable anchored traversal (MATCH (s {id:$snode})-[:HAS_CONTEXT]->(c:ContextBlob) RETURN c), not a property scan. This also hardened a latent bug: an unparsedWHEREno longer silently returns the whole graph (opt-inKG_ALLOW_FULL_SCAN). -
Native channels. The invoker and the spawned agent exchange ordered, cross-process messages over the engine's native Communication Channels (KG-2.0, ~sub-ms/op), via the
graph_messageMCP tool and themessaging/agent_channel.pyhelper. The channel id is deterministic —orch:{session_id}:{run_id}.graph_orchestrate(execute_agent, open_channel=True)opens it and returns thechannel_id; the spawned agent receives it onAgentDeps.message_channel_id.
Channels are the Group type (members may join after creation, unlike PeerToPeer which locks
membership), and send auto-joins the sender so any sender label works.
-
Durable backstop. Live channel messages are in-RAM.
send(durable=True)additionally dual-writes each message as aSession -[:HAS_MESSAGE]-> AgentMessagenode, so the dialogue is replayable viagraph_message(action="history")and survives an engine restart. -
Elicitation bridge. A spawned agent can ask its invoker (→ user) a question with
send_elicitation; the invoker forwards it to its in-processelicitation_queue/ApprovalManagerwithdrain_to_elicitation_queue— a clean cross-process → in-process bridge with no UI change.
sequenceDiagram
participant I as Invoker
participant E as epistemic-graph (channels + Session anchor)
participant S as Spawned agent
I->>E: graph_orchestrate(execute_agent, context_ref, cred_ref, open_channel=True)
Note over E,S: spawn with budgeted context, scoped tools,<br/>resolved auth_token, channel_id on AgentDeps
I->>E: graph_message(send, "proceed", durable=True)
S->>E: graph_message(receive) → ["proceed"]
S->>E: send_elicitation("May I write to /etc?")
I->>E: graph_message(receive) → forwarded to elicitation_queue
S-->>I: {"output", "mermaid", "channel_id"}
Note over E: durable messages replayable via graph_message(history)
See docs/examples/graph-os-mcp-examples.md for graph_context
and graph_message tool call examples.
Integration with Existing Systems¶
- SubagentPatternRouter (ORCH-1.5): Now uses KG backend for O(1) historical lookups instead of O(N) NX scans; persists decisions via tiered architecture
- CognitiveScheduler (OS-5.2): Unified scheduler for both ephemeral and persistent agents
- EventStreamIngester (Company Brain): Routes events to persistent agent subscribers
- TeamConfigNode (AHE-3.3): Extended with export/import for cross-deployment sharing