Skip to content

KG-Native Orchestration Architecture

CONCEPT:ORCH-1.1 through ORCH-1.19 — Dynamic, Knowledge-Graph-Driven Agent Orchestration

Overview

KG-Native Orchestration transforms agent-utilities from a KG-aware system (where the Knowledge Graph is optionally consulted) into a KG-driven system where the KG is the primary control surface for all orchestration decisions.

Before:  Query → Static Graph Topology → (optionally consult KG) → Execute
After:   Query → KG resolves topology → Dynamic Graph Materialization → Execute → KG learns

Architecture

flowchart TD
    Q[ORCH-1.0: User Query] --> TC[ORCH-1.0: KGTeamComposer]
    TC -->|"1. Search proven teams"| KG[(KG-2.0: Knowledge Graph)]
    TC -->|"2. Select topology"| KG
    TC -->|"3. Populate specialists"| KG
    TC --> TEAM[AHE-3.3: TeamComposition]
    TEAM --> TENG[ORCH-1.4: TopologyEngine]
    TENG -->|"Materialize"| PLAN[ORCH-1.21: Execution Plan]
    PLAN --> SEQ[ORCH-1.21: Sequential Steps]
    PLAN --> PAR[ORCH-1.21: Parallel Groups]
    PLAN --> MIX[ORCH-1.21: Mixed DAG]
    SEQ & PAR & MIX --> EXEC[ORCH-1.2: Execute Specialists]
    EXEC --> CP[ORCH-1.3: StateCheckpointer]
    CP -->|"Checkpoint"| KG
    EXEC -->|"Success?"| TC2[ORCH-1.2: Promote to TeamConfig]
    TC2 --> KG

Core Components

1. KG-Driven Team Composer (graph/team_composer.py)

CONCEPT:ORCH-1.1 — Replaces static discover_agents() registration with dynamic KG-topology-driven team assembly.

Composition Flow: 1. Reuse: Search KG for proven TeamConfigNode matching the query (AHE-3.3) 2. Select: Choose best TopologyTemplateNode by domain + complexity 3. Populate: Walk KG edges (PROVIDES, HAS_CAPABILITY) to assign tools 4. Promote: On success, save the team as a new TeamConfigNode

Default Topologies:

Template Mode Complexity Specialists
Single Agent Sequential 1 1 executor
Simple Q&A Sequential 1-2 router → expert → verifier
Multi-Source Research Mixed 3-4 router → planner → [researchers] → synthesizer → verifier
Expert Team Mixed 4-5 router → planner → architect → [implementer, reviewer] → synthesizer
Finance Pipeline Sequential 3-5 router → alpha → risk → execution → attribution

2. Dynamic Topology Engine (graph/topology_engine.py)

CONCEPT:ORCH-1.2 — Materializes KG-stored topology templates into executable graphs.

Supported Execution Modes:

  • Sequential: A → B → C — Simple pipeline
  • Parallel: [A, B, C] — All execute concurrently
  • Fan-out: A → [B₁, B₂, ..., Bₙ] — Scatter
  • Fan-in: [B₁, B₂, ..., Bₙ] → C — Gather
  • Mixed: A → [B, C] → D → [E, F] → G — Arbitrary DAG

Each materialized specialist gets: - System Prompt: Role-specific or KG-loaded via PromptNode - MCP Tools: Only the tools needed for that role - Model: Per-specialist model selection - Memory Channels: Shared KG channels for P2P communication

3. Execution State Checkpointing (core/checkpoint/manager.py)

CONCEPT:ORCH-1.1 — Bridges ephemeral GraphState with persistent KG. The former graph/state_checkpoint.StateCheckpointer was consolidated into the core/checkpoint/ package (KGBackend + CheckpointManager).

from agent_utilities.core.checkpoint.manager import KGBackend

backend = KGBackend(engine)
checkpoint_id = backend.checkpoint(state, session_id="sess:abc")
restored = backend.restore("sess:abc")

Capabilities: - Checkpoint at HSM transition boundaries - Session resume after crashes - Cross-session learning - Multi-agent coordination (other agents can query active state)

4. Topological Routing Policy (graph/routing/strategies/policy.py)

CONCEPT:ORCH-1.4 — Routes using KG-derived topological signals instead of keyword TF-IDF.

Scoring Dimensions: 1. PageRank centrality — Highly-connected specialists preferred 2. Historical success rate — Weighted by outcome evaluations 3. Tool affinity — Specialists with relevant PROVIDES edges score higher

Falls back to RuleBasedPolicy when no KG is available (cold start).

5. Persistent Background Agents (graph/persistent_agents.py)

CONCEPT:ORCH-1.4 — Long-running KG-coordinated agents.

mgr = PersistentAgentManager(engine)
mgr.register_agent("bg:monitor", "System Monitor",
                     subscriptions=["system.alert"],
                     schedule_cron="*/5 * * * *")

Lifecycle: registered → idle → running → idle → ... → terminated

Agent Types: - Monitor: Watches KG for conditions - Scheduler: Runs periodic tasks - Rebalancer: Continuously adjusts configurations - Background: General-purpose

6. Shareable Team Compositions

CONCEPT:ORCH-1.1 Extension — Export/import proven team configurations.

# Export
bundle = engine.export_team_config("tc:proven-team")

# Import on another deployment
new_id = engine.import_team_config(bundle)

Pydantic Models

Model Type Purpose
TopologyTemplateNode RegistryNode KG-stored execution topology template
SessionCheckpointNode RegistryNode Persisted execution state
PersistentAgentNode RegistryNode Long-running background agent
TeamComposition BaseModel Result of team composition (not persisted)

KG Node/Edge Types

New Node Types: - TOPOLOGY_TEMPLATE — Execution topology templates - SESSION_CHECKPOINT — Execution state checkpoints - PERSISTENT_AGENT — Background agent registrations - TOPOLOGY_TRANSITION — Transition records

New Edge Types: - TRANSITIONS_TO — Topology transitions between roles - CHECKPOINTED_STATE — Links sessions to checkpoints - SUBSCRIBED_TO — Agent event subscriptions - MATERIALIZED_FROM — Links executions to templates - COMPOSED_TEAM — Links compositions to team configs

Invoker to Spawned-Agent Handoff and Native Channels

CONCEPT:ORCH-1.37, ORCH-1.39, ORCH-1.40 — when one agent spawns another via graph_orchestrate(action="execute_agent"), three additive capabilities let the invoker shape, observe, and converse with the spawned run. All are backward-compatible: omit the new inputs and behaviour is unchanged.

ORCH-1.37 — Execution-flow diagram surfacing

The ORCH-1.8 WorkflowVisualizer already generates a Mermaid diagram of the routed graph; ORCH-1.37 surfaces it in the graph_orchestrate responses instead of only logging it. swarm, compile_workflow, and execute_workflow add an additive mermaid JSON key (null when unavailable); execute_agent returns a JSON object {"output", "mermaid"} when a diagram was produced (otherwise the bare output string, preserving the old contract).

ORCH-1.39 — Curated context, budget, tool-scope & credential handoff

The invoking agent can hand the spawned agent a curated working set so it starts informed and bounded, without leaking secrets:

Input (execute_agent) Effect on the spawned run Mechanism
context Injected as an ### INVOKER CONTEXT system-prompt block Budgeted to the target model window (invoker_context_section)
context_ref Same, but the content is fetched from a persisted ContextBlob by id Cross-process handoff; the run's RunTrace links the consumed blob for provenance
budget_tokens Hard UsageLimits.total_tokens_limit on the spawned run spawn_usage_limits
allowed_tools Least-privilege allow-list; tools/toolsets are intersected with it apply_tool_scope
cred_ref A reference (secret key) resolved to the raw token on the transient AgentDeps.auth_token at spawn _resolve_invoker_cred — the raw secret is never written to a graph node, GraphState, or logs

Security invariant: only a reference to a credential ever travels through the graph or the context. The raw token is resolved from the secrets backend onto the ephemeral AgentDeps at spawn time and is never persisted or logged.

context/context_ref are stored and fetched with the graph_context MCP tool (put/get/list).

ORCH-1.40 — Session-anchored collections & native message channels

The epistemic-graph engine is a pure id-addressed store with no property/label index — it is reliable at id-lookup and traversal-from-a-known-id, but unreliable at property scans. ORCH-1.40 builds on that strength rather than fighting it:

  • Session anchor. Each session has an id-addressable Session node (session:{sid}). Its collections hang off single-hop edges — HAS_CONTEXT → ContextBlob, HAS_MESSAGE → AgentMessage, HAS_RUN → RunTrace. "List by session" is then a reliable anchored traversal (MATCH (s {id:$snode})-[:HAS_CONTEXT]->(c:ContextBlob) RETURN c), not a property scan. This also hardened a latent bug: an unparsed WHERE no longer silently returns the whole graph (opt-in KG_ALLOW_FULL_SCAN).

  • Native channels. The invoker and the spawned agent exchange ordered, cross-process messages over the engine's native Communication Channels (KG-2.0, ~sub-ms/op), via the graph_message MCP tool and the messaging/agent_channel.py helper. The channel id is deterministic — orch:{session_id}:{run_id}. graph_orchestrate(execute_agent, open_channel=True) opens it and returns the channel_id; the spawned agent receives it on AgentDeps.message_channel_id.

Channels are the Group type (members may join after creation, unlike PeerToPeer which locks membership), and send auto-joins the sender so any sender label works.

  • Durable backstop. Live channel messages are in-RAM. send(durable=True) additionally dual-writes each message as a Session -[:HAS_MESSAGE]-> AgentMessage node, so the dialogue is replayable via graph_message(action="history") and survives an engine restart.

  • Elicitation bridge. A spawned agent can ask its invoker (→ user) a question with send_elicitation; the invoker forwards it to its in-process elicitation_queue/ApprovalManager with drain_to_elicitation_queue — a clean cross-process → in-process bridge with no UI change.

sequenceDiagram
    participant I as Invoker
    participant E as epistemic-graph (channels + Session anchor)
    participant S as Spawned agent
    I->>E: graph_orchestrate(execute_agent, context_ref, cred_ref, open_channel=True)
    Note over E,S: spawn with budgeted context, scoped tools,<br/>resolved auth_token, channel_id on AgentDeps
    I->>E: graph_message(send, "proceed", durable=True)
    S->>E: graph_message(receive) → ["proceed"]
    S->>E: send_elicitation("May I write to /etc?")
    I->>E: graph_message(receive) → forwarded to elicitation_queue
    S-->>I: {"output", "mermaid", "channel_id"}
    Note over E: durable messages replayable via graph_message(history)

See docs/examples/graph-os-mcp-examples.md for graph_context and graph_message tool call examples.

Integration with Existing Systems

  • SubagentPatternRouter (ORCH-1.5): Now uses KG backend for O(1) historical lookups instead of O(N) NX scans; persists decisions via tiered architecture
  • CognitiveScheduler (OS-5.2): Unified scheduler for both ephemeral and persistent agents
  • EventStreamIngester (Company Brain): Routes events to persistent agent subscribers
  • TeamConfigNode (AHE-3.3): Extended with export/import for cross-deployment sharing