Pillar 1: Graph Orchestration Engine

Overview

The Graph Orchestration Engine represents the foundational execution layer of the agent-utilities ecosystem. Moving away from rigid LLM chains and monolithic prompt contexts, this pillar implements a Hierarchical Task Network (HTN) backed by Pydantic Graph, transitioning linear execution into dynamic, topological routing.

Why We Built This (Rationale)

As our agent ecosystem scaled to include dozens of domain specialists (Python, TS, CI/CD, DB) and hundreds of MCP tools, we encountered three critical failure modes:

1. Prompt Bloat & Context Pollution: Injecting all available tools into a single prompt exceeded context limits and degraded LLM reasoning accuracy.
2. Sequential Bottlenecks: Large features were executed linearly, squandering the opportunity for parallel discovery and implementation.
3. Catastrophic Forgetting & Loop Cycles: Agents would forget successful tool combinations or fall into infinite retry loops without an enforced architectural guardrail.

How It Works (Implementation)

The architecture solves these bottlenecks through several interdependent primitives:

Registry Hot Cache & Unified Specialists (ORCH-1.2)

We collapsed the artificial split between `prompt` and `mcp` agents into a single `specialist` type. The Registry Hot Cache maintains an O(1) session-scoped index of these specialists. Instead of passing 50+ specialists to the orchestrator, the cache filters them down to the Top-7 most relevant specialists per query, reducing prompt token bloat by roughly 7x (50 / 7 ≈ 7).
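To make the O(1) lookup and Top-7 filter concrete, here is a minimal sketch assuming a plain dict as the session-scoped index and a naive tag-overlap relevance score. `Specialist`, `SpecialistCache`, and `top_k` are hypothetical names for illustration, not the agent-utilities API.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Specialist:
    # Hypothetical stand-in for a unified specialist record.
    name: str
    tags: frozenset[str]


@dataclass
class SpecialistCache:
    # Session-scoped index keyed by name: dict lookups are O(1) on average.
    _by_name: dict[str, Specialist] = field(default_factory=dict)

    def put(self, spec: Specialist) -> None:
        self._by_name[spec.name] = spec

    def invalidate(self, name: str) -> None:
        # Event-driven invalidation: drop one entry when its source changes.
        self._by_name.pop(name, None)

    def get(self, name: str) -> Specialist | None:
        return self._by_name.get(name)

    def top_k(self, query: str, k: int = 7) -> list[Specialist]:
        # Assumed relevance score: how many query words match a specialist tag.
        words = set(query.lower().split())
        ranked = heapq.nlargest(k, self._by_name.values(), key=lambda s: len(words & s.tags))
        return [s for s in ranked if words & s.tags]  # drop zero-score fillers
```

A real router would score relevance with embeddings or the KG rather than word overlap; the point is that only the top-ranked slice ever reaches the planner prompt.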

Spec-Driven Development Pipeline (ORCH-1.6)

The orchestrator implements a multi-stage SDD pipeline:

- Discovery & Requirements: Generates structured Spec models with measurable success criteria.
- Task Decomposition: Emits a Tasks dependency graph that identifies which subtasks can be executed in parallel (e.g., frontend and backend).
- Parallel Dispatch: Fans tasks out to specific specialist workers, using the Execution Visibility Graph to constrain context so that a backend specialist sees only backend-related prior steps.
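The decomposition and dispatch steps above can be sketched with a layered topological sort (Kahn's algorithm): each batch holds tasks whose dependencies all sit in earlier batches, so a whole batch can fan out in parallel. The `deps` dict shape and both function names are assumptions, and modelling the Execution Visibility Graph as "a task sees only its transitive dependencies" is a simplification for illustration.

```python
def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; every task's dependencies sit in earlier batches.

    `deps` maps a task id to the ids it depends on (hypothetical shape).
    """
    remaining = {task: set(d) for task, d in deps.items()}
    done: set[str] = set()
    batches: list[list[str]] = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError(f"cycle or missing dependency among: {sorted(remaining)}")
        batches.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return batches


def visible_history(task: str, deps: dict[str, set[str]], outputs: dict[str, str]) -> dict[str, str]:
    """Expose only the outputs of a task's transitive dependencies (its ancestors)."""
    seen: set[str] = set()
    stack = list(deps.get(task, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(deps.get(t, ()))
    return {t: outputs[t] for t in sorted(seen) if t in outputs}
```

With `backend` and `frontend` both depending only on `spec`, they land in the same batch, and a task that depends only on `backend` never sees the frontend's output.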

Learned Agent Routing & Execution Budgets (ORCH-1.7 & ORCH-1.3)

Routing isn't static. TraceLearnedPolicy uses softmax scoring over historical ExecutionTrace records with an exponential moving average (EMA) to actively down-weight specialists with low success rates. ExecutionBudget acts as an absolute cost governor, preempting infinite loops by enforcing USD/token constraints at the dispatcher step.
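A minimal sketch of the two mechanisms, assuming a per-specialist success rate tracked with an EMA and fed into a temperature-scaled softmax, plus a budget that refuses a step before it crosses a USD or token cap. `EmaSoftmaxPolicy`, `Budget`, and `BudgetExceeded` are hypothetical stand-ins, not the TraceLearnedPolicy or ExecutionBudget classes, and the default `alpha`, `temperature`, and `prior` values are arbitrary.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EmaSoftmaxPolicy:
    # Hypothetical stand-in for a learned routing policy (not TraceLearnedPolicy itself).
    alpha: float = 0.3        # EMA weight on the newest outcome
    temperature: float = 0.5  # lower values sharpen the softmax
    prior: float = 0.5        # success rate assumed for specialists with no history
    success: dict[str, float] = field(default_factory=dict)

    def record(self, specialist: str, succeeded: bool) -> None:
        # EMA update: failures pull the rate toward 0, successes toward 1.
        old = self.success.get(specialist, self.prior)
        self.success[specialist] = (1 - self.alpha) * old + self.alpha * float(succeeded)

    def probabilities(self, candidates: list[str]) -> dict[str, float]:
        logits = [self.success.get(c, self.prior) / self.temperature for c in candidates]
        top = max(logits)  # subtract the max before exp for numerical stability
        weights = [math.exp(x - top) for x in logits]
        total = sum(weights)
        return {c: w / total for c, w in zip(candidates, weights)}


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class Budget:
    # Hypothetical stand-in for a per-run cost governor checked at dispatch time.
    max_usd: float
    max_tokens: int
    spent_usd: float = 0.0
    spent_tokens: int = 0

    def charge(self, usd: float, tokens: int) -> None:
        # Refuse the step before it runs if it would cross either cap.
        if self.spent_usd + usd > self.max_usd or self.spent_tokens + tokens > self.max_tokens:
            raise BudgetExceeded(f"step would exceed budget (${self.max_usd}, {self.max_tokens} tokens)")
        self.spent_usd += usd
        self.spent_tokens += tokens
```

Because a failed step only shrinks a specialist's probability rather than zeroing it, a recovering specialist can still be sampled and earn its weight back.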

Benefits Introduced

Cost Efficiency: With Confidence-Gated Model Routing, trivial queries fall back to smaller models (gpt-4o-mini), saving reasoning tokens for complex HTN planning.
Architectural Safety: Subagent Lifecycle Patterns and recursive execution constraints ensure the system fails gracefully and retries contextually rather than spinning in infinite loops.
Test-Time Scaling: The system achieves zero-shot generalization by spawning parallel agent rollouts and selecting the optimal path via dynamic subgraph convergence and evolutionary aggregation.
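Confidence-gated routing reduces to a threshold check. In this sketch, the router's confidence that a query is trivial, the 0.8 threshold, and the `large-reasoning-model` placeholder id are all assumptions; only gpt-4o-mini comes from the text.

```python
def pick_model(trivial_confidence: float, *, threshold: float = 0.8,
               small: str = "gpt-4o-mini", large: str = "large-reasoning-model") -> str:
    """Send a query to the small model only when the router is confident it is trivial."""
    if not 0.0 <= trivial_confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    # Below the gate, keep the larger (placeholder) model for planning-heavy work.
    return small if trivial_confidence >= threshold else large
```

Raising the threshold trades some cost savings for fewer hard queries misrouted to the small model.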

Key Concepts Leveraged

ORCH-1.0: Orchestration Engine
ORCH-1.1: Agentic Planning Engine (Planning)
ORCH-1.2: Agentic Planning Engine (Routing)
ORCH-1.3: Execution Budgets & State Safety
ORCH-1.6: Spec-Driven Development
ORCH-1.7: Learned Agent Routing
ORCH-1.19: Subgraph Synthesis (Legacy Compat)
ORCH-1.20: KG-Driven Graph Factory — materializes pydantic-graph topologies from AgentTemplate nodes
ORCH-1.21: Agent Runner — KG-to-LLM execution bridge with dynamic tool binding and provenance tracking
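ORCH-1.22: RecursiveMAS Latent Orchestrator 🔬 (research design, not yet wired; the ID is now assigned to Workflow Persistence & Replay Relationships in docs/concepts.yaml) — continuous latent loop or simulated semantic collaboration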
ORCH-1.8: Parallel Engine — unified 1→300+ agent execution engine with semaphore-governed concurrency, DAG scheduling, and tiered synthesis
ORCH-1.8: RLM-Native Hierarchical Synthesis — flat/hierarchical/progressive/rlm output merging strategies
ORCH-1.9: Autonomous Department Orchestration — OWL-materialized company departments with reportsTo hierarchy
ORCH-1.10: Reactive Event Sourcing — reactive event-driven state and graph staging dispatcher
ORCH-1.11: WASM Micro-Agent Execution — isolated WebAssembly sandbox runner with gas/memory limits and Python emulation fallback (the RLM execution tier is now realized by ORCH-1.38)
ORCH-1.12: Structured Predict-RLM Runtime — standard Pydantic signatures and dynamic skill injection wrapper for sandboxed REPL, plus schema-constrained subagent contracts so RLM fan-out returns typed values (bool/model/list) instead of free-form prose
ORCH-1.38: Tiered RLM Code Sandbox + Capability Router — uniform Sandbox contract with four real backends (local / monty / wasm / docker) behind a deterministic ast capability router that escalates per-snippet (monty→wasm→docker→local); monty is the fast isolated default that still serves the RLM host helpers, Docker serves them over a UDS bridge under --network none, and the wasm tier is real CPython-WASI (replacing the ORCH-1.11 stub)
ORCH-1.13: GEPA Reflective Prompt Optimizer — Genetic-Pareto optimization loop with reflective mutation and structural crossover for prompt evolution
ORCH-1.37: Surfaces an orchestration execution-flow Mermaid diagram in graph_orchestrate responses (additive, backward-compatible)
ORCH-1.39: Invoker→spawned-agent handoff of curated context, token budget, tool scope, and credential reference (raw secret never persisted/logged) — see KG-Native Orchestration § Invoker to Spawned Handoff
ORCH-1.40: Session-anchored collections (Session node + HAS_CONTEXT/HAS_MESSAGE/HAS_RUN edges) and native cross-process invoker↔spawned message channels with a durable backstop and elicitation bridge (graph_context, graph_message MCP tools)
ORCH-1.41: Process Plan Compiler — graph_orchestrate(action="compile_process") lifts a descriptive BPMN process into an executable plan (see Ontology-to-Workflow Execution)
ORCH-1.42: Execution Ontology Gate — ontology validation on the execution path before a compiled process runs (knowledge_graph/core/workflow_gate.py)
ORCH-1.43: Workflow Lineage Close-Out — run lineage written back to the KG, closing the descriptive↔executable provenance loop (workflows/runner.py)
ORCH-1.44: Durable Goal Registry — goals persist across restarts; stranded runs rehydrate as orphaned instead of silently vanishing (see State Externalization)
ORCH-1.45: Queue-Driven Agent Dispatch — session-keyed agent_turns queue (AgentTurnEnvelope) consumed by a stateless agent-dispatch-worker fleet with fleet-visible placement (see Agent Dispatch)
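The ORCH-1.38 entry describes a deterministic ast capability router that picks the least-privileged sandbox tier per snippet. The sketch below shows the shape of such a router using only the standard `ast` module; the module lists that trigger each tier are invented for illustration and do not reflect the real monty/wasm/docker/local rules.

```python
import ast

# Illustrative rules only: which top-level imports push a snippet to which tier.
PURE_MODULES = {"math", "json", "re", "itertools", "functools", "collections"}
DOCKER_MODULES = {"subprocess", "socket", "urllib", "http", "requests"}
LOCAL_MODULES = {"ctypes"}


def imported_roots(source: str) -> set[str]:
    """Collect top-level module names imported anywhere in the snippet (no execution)."""
    roots: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots.add(node.module.split(".")[0])
    return roots


def route(source: str) -> str:
    """Pick the least-privileged tier, escalating monty -> wasm -> docker -> local."""
    roots = imported_roots(source)
    if roots & LOCAL_MODULES:
        return "local"
    if roots & DOCKER_MODULES:
        return "docker"
    if roots - PURE_MODULES:
        return "wasm"
    return "monty"
```

Because the decision depends only on the parsed syntax tree, the same snippet always lands on the same tier, which is what makes such a router deterministic.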

🧬 First Principles Architecture

The First Principles Architecture (CONCEPT:ORCH-1.2 through CONCEPT:ECO-4.0) rebuilds the routing, dispatch, and feedback layers from basic primitives. The capabilities in the table below address the key scalability and intelligence bottlenecks that emerge when managing dozens of specialists and hundreds of tools.

| Concept | Problem Solved | Solution |
| --- | --- | --- |
| CONCEPT:ORCH-1.2: Registry Hot Cache | O(N) specialist lookups on every routing call | Session-scoped cache with O(1) lookups, event-driven invalidation |
| CONCEPT:AHE-3.3: TeamConfig Promotion | LLM re-discovers same specialist teams for recurring patterns | Persist proven coalitions as reusable templates in the KG |
| CONCEPT:ORCH-1.2: AgentCapability System | Static tool bindings; no dynamic capability activation | First-class KG capability nodes with trigger conditions |
| CONCEPT:ECO-4.0: PlannerGraphSkill | A2A requests require full LLM round-trip | Direct graph-backed A2A routing, bypassing LLM overhead |
| CONCEPT:ECO-4.0: A2A Config File | No mechanism to discover/register external A2A agents | File-based auto-discovery with `secret://` auth & periodic refresh |
| CONCEPT:ORCH-1.2: Unified Specialist | Artificial `prompt`/`mcp` type split complicates dispatch | Single `specialist` type hosting any tools/skills combination |

graph LR
    subgraph Routing ["3-Stage Hybrid Routing"]
        Query(["ORCH-1.0: User Query"]) --> TC{"ORCH-1.2: TeamConfig\nMatch?"}
        TC -- "Hit" --> Dispatch["ORCH-1.2: Direct\nDispatch"]
        TC -- "Miss" --> SM{"AHE-3.3: Self-Model\nBias"}
        SM --> LLM["ORCH-1.1: LLM Planner\n(Top-7 Filtered)"]
        LLM --> Dispatch
    end

    subgraph Execution ["Execute & Learn"]
        Dispatch --> Exec["ORCH-1.2: Specialist\nExecution"]
        Exec --> Verify["Verify"]
        Verify --> Feedback["ORCH-1.2: Self-Model Update\n+ TeamConfig Reward"]
        Feedback -.-> TC
    end
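A compact sketch of the three stages and the feedback edge in the diagram: a promoted team is reused directly once its accumulated reward crosses a threshold; otherwise the planner proposes a team informed by a per-specialist bias, and each verified outcome updates both. `HybridRouter`, exact-string pattern keys, the ±0.5 reward step, and `promote_at` are assumptions, not the TeamConfig or self-model implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

# Planner signature (assumed): (query pattern, specialist bias) -> ordered team.
Planner = Callable[[str, dict[str, float]], list[str]]


@dataclass
class HybridRouter:
    plan: Planner  # stage 3: stand-in for the Top-7-filtered LLM planner
    bias: dict[str, float] = field(default_factory=dict)  # stage 2: self-model bias
    teams: dict[str, tuple[list[str], float]] = field(default_factory=dict)  # stage 1
    promote_at: float = 1.0  # reward a team needs before it is reused directly

    def route(self, pattern: str) -> tuple[str, list[str]]:
        team = self.teams.get(pattern)
        if team is not None and team[1] >= self.promote_at:
            return "teamconfig-hit", team[0]  # direct dispatch, no planner call
        return "llm-planned", self.plan(pattern, self.bias)

    def feedback(self, pattern: str, team: list[str], success: bool) -> None:
        # Verify -> update: reward or penalise both the specialists and the team.
        delta = 0.5 if success else -0.5
        for name in team:
            self.bias[name] = self.bias.get(name, 0.0) + delta
        _, reward = self.teams.get(pattern, (team, 0.0))
        self.teams[pattern] = (team, reward + delta)
```

After two verified successes on the same pattern, the team is promoted and later queries skip the planner entirely.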

→ Deep-dive: docs/first-principles.md · docs/registry-cache.md · docs/process-lifecycle.md

Architecture & Orchestration Overview

agent-utilities implements a multi-stage execution pipeline using pydantic-graph for maximum precision and resilience. Protocol adapters (AG-UI, ACP) leverage graph.iter() for direct, step-by-step graph execution — bypassing the outer LLM agent entirely when a graph is present.

Spec-Driven Development (SDD) Lifecycle

agent-utilities implements a rigorous SDD workflow so that complex feature requests are handled against an explicit specification with measurable success criteria.

Project Constitution (constitution-generator): Establishes the governing principles, tech stack standards, and quality gates for the entire agent workshop.
Requirement Specification (spec-generator): Decomposes user intent into a formal Spec including user scenarios, functional requirements, and measurable success metrics.
Technical Implementation Plan (task-planner): Generates a step-by-step architectural approach and a Tasks model with explicit dependencies and file-path affinity for collision-free parallel execution.
Baseline & Manual Testing: Integrates first_run_tests and run_manual_test into the implementation phase to ensure baseline stability and exploratory verification.
Parallel Execution (SDDManager): The dispatcher leverages the SDD analysis engine to identify safe parallel execution batches, fanning out implementation tasks to domain specialists (Python, TS, etc.).
Quality Verification & Documentation: Audits results via spec-verifier, then generates code-walkthrough and interactive-explain artifacts to document the final implementation.
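The file-path affinity in the task-planner step is what makes parallel batches collision-free. One way to sketch it, assuming each task declares the files it will touch, is a greedy batcher that admits a task only when its dependencies are done and its files do not overlap with anything already claimed in the current batch. `FileTask` and `collision_free_batches` are hypothetical; this is not the SDDManager algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileTask:
    # Hypothetical task shape: id, dependency ids, and the files it will touch.
    id: str
    deps: frozenset[str] = frozenset()
    files: frozenset[str] = frozenset()


def collision_free_batches(tasks: list[FileTask]) -> list[list[str]]:
    """Greedy batching: a task joins the current batch only if its deps are done
    and it touches no file already claimed by another task in that batch."""
    pending = {t.id: t for t in tasks}
    done: set[str] = set()
    batches: list[list[str]] = []
    while pending:
        batch: list[str] = []
        claimed: set[str] = set()
        for tid in sorted(pending):
            t = pending[tid]
            if t.deps <= done and not (t.files & claimed):
                batch.append(tid)
                claimed |= t.files
        if not batch:
            raise ValueError("unschedulable: cycle or missing dependency")
        for tid in batch:
            del pending[tid]
        done.update(batch)
        batches.append(batch)
    return batches
```

Two tasks that both edit `api.py` are serialized even though neither depends on the other, which avoids merge conflicts between parallel specialists.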

Execution Flow: Dynamic Multi-Layer Parallelism

agent-utilities implements a multi-stage execution pipeline with autonomous gap analysis and resilient feedback loops. The system can "fan out" research tasks in parallel before coalescing results. If implementation fails, it can automatically retry locally or loop back to research.

  graph TB
  Start(["ORCH-1.0: User Query + Images"]) --> ACPLayer["<b>ACP / AG-UI / SSE </b><br/><i>(Unified Protocol Layer)</i>"]
  ACPLayer --> UsageGuard["ORCH-1.3: Usage Guard: Rate Limiting"]
  UsageGuard -- "Allow" --> router_step["ORCH-1.2: Router: Topology Selection"]
  UsageGuard -- "Block" --> End(["ORCH-1.21: End Result"])

  router_step -- "Trivial Query" --> End
  router_step -- "Full Pipeline" --> dispatcher["ORCH-1.0: Dispatcher: Dynamic Routing"]
  dispatcher -- "First Entry" --> mem_step["KG-2.3: Memory: Context Retrieval"]
  mem_step --> dispatcher

  subgraph "ORCH-1.2: Discovery Phase"
    direction TB
    Researcher["<b>Researcher</b><br/>---<br/><i>u-skill:</i> web-search, web-crawler, web-fetch<br/><i>t-tool:</i> project_search, read_workspace_file"]
    Architect["<b>Architect</b><br/>---<br/><i>u-skill:</i> c4-architecture, spec-generator, product-strategy, user-research, brainstorming<br/><i>t-tool:</i> developer_tools"]
    KGDiscovery["<b>Unified Discovery</b><br/>---<br/><i>source:</i> Knowledge Graph<br/>"]
    res_joiner["ORCH-1.0: Research Joiner: Barrier Sync"]
  end

  dispatcher -- "Research First" --> Researcher
  dispatcher -- "Research First" --> Architect
  dispatcher -- "Research First" --> KGDiscovery
  Researcher --> res_joiner
  Architect --> res_joiner
  KGDiscovery --> res_joiner
  res_joiner -- "Coalesced Context" --> dispatcher

  subgraph "Execution Phase"
    direction TB

    subgraph "Programmers"
      direction LR
      PyP["<b>Python</b><br/>---<br/><i>u-skill:</i> agent-builder, tdd-methodology, mcp-builder, jupyter-notebook<br/><i>g-skill:</i> python-docs, fastapi-docs, pydantic-ai-docs<br/><i>t-tool:</i> developer_tools"]
      TSP["<b>TypeScript</b><br/>---<br/><i>u-skill:</i> react-development, web-artifacts, tdd-methodology, canvas-design<br/><i>g-skill:</i> nodejs-docs, react-docs, nextjs-docs, shadcn-docs<br/><i>t-tool:</i> developer_tools"]
      GoP["<b>Go</b><br/>---<br/><i>u-skill:</i> tdd-methodology<br/><i>g-skill:</i> go-docs<br/><i>t-tool:</i> developer_tools"]
      RustP["<b>Rust</b><br/>---<br/><i>u-skill:</i> tdd-methodology<br/><i>g-skill:</i> rust-docs<br/><i>t-tool:</i> developer_tools"]
      CSP["<b>C Programmer</b><br/>---<br/><i>u-skill:</i> developer-utilities<br/><i>g-skill:</i> c-docs<br/><i>t-tool:</i> developer_tools"]
      CPP["<b>C++ Programmer</b><br/>---<br/><i>u-skill:</i> developer-utilities<br/><i>t-tool:</i> developer_tools"]
      JSP["<b>JavaScript</b><br/>---<br/><i>u-skill:</i> web-artifacts, canvas-design, developer-utilities<br/><i>g-skill:</i> nodejs-docs, react-docs<br/><i>t-tool:</i> developer_tools"]
    end

    subgraph "Infrastructure"
      direction LR
      DevOps["<b>DevOps</b><br/>---<br/><i>u-skill:</i> cloudflare-deploy<br/><i>g-skill:</i> docker-docs, terraform-docs<br/><i>t-tool:</i> developer_tools"]
      Cloud["<b>Cloud</b><br/>---<br/><i>u-skill:</i> c4-architecture<br/><i>g-skill:</i> aws-docs, azure-docs, gcp-docs<br/><i>t-tool:</i> developer_tools"]
      DBA["<b>Database</b><br/>---<br/><i>u-skill:</i> database-tools<br/><i>g-skill:</i> postgres-docs, mongodb-docs, redis-docs<br/><i>t-tool:</i> developer_tools"]
    end

    subgraph Specialized ["Specialized & Quality"]
      direction LR
      Sec["<b>Security</b><br/>---<br/><i>u-skill:</i> security-tools<br/><i>g-skill:</i> linux-docs<br/><i>t-tool:</i> developer_tools"]
      QA["<b>QA</b><br/>---<br/><i>u-skill:</i> spec-verifier, tdd-methodology<br/><i>g-skill:</i> testing-library-docs<br/><i>t-tool:</i> developer_tools"]
      UIUX["<b>UI/UX</b><br/>---<br/><i>u-skill:</i> theme-factory, brand-guidelines, algorithmic-art<br/><i>g-skill:</i> shadcn-docs, framer-docs<br/><i>t-tool:</i> developer_tools"]
      Debug["<b>Debugger</b><br/>---<br/><i>u-skill:</i> developer-utilities, agent-builder<br/><i>t-tool:</i> developer_tools"]
    end
  Programmers --> exe_joiner["ORCH-1.0: Execution Joiner: Barrier Sync"]
  Infrastructure --> exe_joiner
  Specialized --> exe_joiner

  exe_joiner -- "Implementation Results" --> dispatcher

  dispatcher -- "Plan Complete" --> verifier["AHE-3.1: Verifier: Quality Gate"]
  verifier -- "Score ≥ 0.7" --> synthesizer["ORCH-1.0: Synthesizer: Response Composition"]
  verifier -- "Score 0.4-0.7" --> dispatcher
  verifier -- "Score < 0.4" --> planner_step["ORCH-1.1: Planner: Re-plan with Feedback"]
  planner_step --> dispatcher
  synthesizer -- "Final Response" --> End
  dispatcher -- "Terminal Failure" --> End

  %% Styling
  style Researcher fill:#e1d5e7,stroke:#9673a6,stroke-width:2px
  style Architect fill:#e1d5e7,stroke:#9673a6,stroke-width:2px
  style KGDiscovery fill:#e1d5e7,stroke:#9673a6,stroke-width:2px

  style Programmers fill:#dae8fe,stroke:#6c8ebf,stroke-width:2px
  style PyP fill:#dae8fe,stroke:#6c8ebf,stroke-width:1px
  style TSP fill:#dae8fe,stroke:#6c8ebf,stroke-width:1px
  style GoP fill:#dae8fe,stroke:#6c8ebf,stroke-width:1px
  style RustP fill:#dae8fe,stroke:#6c8ebf,stroke-width:1px
  style CSP fill:#dae8fe,stroke:#6c8ebf,stroke-width:1px
  style CPP fill:#dae8fe,stroke:#6c8ebf,stroke-width:1px
  style JSP fill:#dae8fe,stroke:#6c8ebf,stroke-width:1px

  style Infrastructure fill:#fad9b8,stroke:#d6b656,stroke-width:2px
  style DevOps fill:#fad9b8,stroke:#d6b656,stroke-width:1px
  style Cloud fill:#fad9b8,stroke:#d6b656,stroke-width:1px
  style DBA fill:#fad9b8,stroke:#d6b656,stroke-width:1px

  style Specialized fill:#e0d3f5,stroke:#82b366,stroke-width:2px
  style Sec fill:#e0d3f5,stroke:#82b366,stroke-width:1px
  style QA fill:#e0d3f5,stroke:#82b366,stroke-width:1px
  style UIUX fill:#e0d3f5,stroke:#82b366,stroke-width:1px
  style Debug fill:#e0d3f5,stroke:#82b366,stroke-width:1px

  style verifier fill:#fff2cc,stroke:#d6b656,stroke-width:2px
  style synthesizer fill: #d5e8d4,stroke:#82b366,stroke-width:2px
  style planner_step fill: #dae8fe,stroke:#6c8ebf,stroke-width:2px
  style End fill:#f8cecc,stroke:#b85450,stroke-width:2px
  style res_joiner fill:#f5f5f5,stroke:#666,stroke-dasharray: 5 5
  style exe_joiner fill:#f5f5f5,stroke:#666,stroke-dasharray: 5 5
  style dispatcher fill:#f5f5f5,stroke:#666,stroke-width:2px
  style Start color:#000000,fill:#38B6FF
    style subGraph0 color:#000000,fill:#f5ebd3
    style subGraph5 color:#000000,fill:#f5f1d3
    style dispatcher fill:#d5e8d4,stroke:#666,stroke-width:2px
  style ACPLayer color:#000000,fill:#38B6FF,stroke-width:2px
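Two pieces of this flow are easy to sketch in plain asyncio: the barrier join after parallel research (here `asyncio.gather` under a semaphore, echoing the semaphore-governed concurrency of the ORCH-1.8 Parallel Engine) and the verifier's three-way routing on the 0.4 and 0.7 score thresholds from the diagram. The function names and the default semaphore limit are assumptions.

```python
import asyncio
from typing import Awaitable, Callable

Worker = Callable[[str], Awaitable[str]]


async def fan_out(workers: dict[str, Worker], query: str, limit: int = 4) -> dict[str, str]:
    """Run workers concurrently (at most `limit` at once) and wait for all: a barrier join."""
    sem = asyncio.Semaphore(limit)

    async def run(name: str) -> str:
        async with sem:
            return await workers[name](query)

    names = list(workers)
    results = await asyncio.gather(*(run(n) for n in names))  # results keep input order
    return dict(zip(names, results))


def verifier_route(score: float) -> str:
    # Thresholds from the diagram: >= 0.7 synthesize, 0.4 to 0.7 redispatch, < 0.4 re-plan.
    if score >= 0.7:
        return "synthesizer"
    if score >= 0.4:
        return "dispatcher"
    return "planner"
```

The dispatcher only resumes once every research branch has returned, which is what "Barrier Sync" means in the diagram.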

🔬 RecursiveMAS Latent Orchestration (research / not yet wired)

Status: research design only — the native open-weights pipeline described below (RecursiveLink, a dedicated rlm/mas_local.py module) is not yet implemented in the codebase. The shipped RLM substrate lives in agent_utilities/rlm/ (RLMEnvironment in rlm/repl.py, rlm/specialist.py, rlm/predict_rlm.py). Note also that the concept ID ORCH-1.22 is now assigned to Workflow Persistence & Replay Relationships in docs/concepts.yaml; this section is retained as forward-looking design.

RecursiveMAS (Recursive Multi-Agent System) is a research-backed design that redefines multi-agent collaboration by passing continuous representations in latent space rather than raw text sequences.

Why RecursiveMAS?

In traditional multi-agent architectures, agents communicate via generated text. This forces the system through costly autoregressive decoding at every step, creating:

1. Severe Latency Bottlenecks: Each agent in a chain must wait for the preceding agent to finish generating its reasoning text.
2. Context Blowup & Token Bloat: Spelling out intermediate thoughts rapidly consumes the prompt context window and drives up costs.
3. Discrete Handoff Loss: Text sequences lose the continuous, high-dimensional semantic richness of the model's internal last-layer hidden states.

By routing communication through embedding projections (latent space), the RecursiveMAS research reports a 1.2x to 2.4x inference speedup and up to a 75.6% reduction in token usage by the third recursion round, while improving reasoning accuracy by 8.3% on complex mathematical and coding benchmarks. These figures are reported for the research system and have not been measured in agent-utilities.

Latent Collaboration Architecture

graph TD
    UserQuery["User Query / Task Input"] --> Agent1["Agent A (e.g. Planner) <br/> (Model Frozen)"]

    %% Agent A loops internally
    Agent1 -->|Hidden States| InnerLinkA["Inner RecursiveLink A <br/> (Self-thoughts loop)"]
    InnerLinkA -->|Self-Feed Embeddings| Agent1

    %% Connection between A and B
    Agent1 -->|Raw Output Activations| OuterLinkAB["Outer RecursiveLink AB <br/> (Projection & Dimension Matching)"]
    OuterLinkAB -->|Projected Embeddings| Agent2["Agent B (e.g. Specialist) <br/> (Model Frozen)"]

    %% Agent B loops internally
    Agent2 -->|Hidden States| InnerLinkB["Inner RecursiveLink B"]
    InnerLinkB -->|Self-Feed Embeddings| Agent2

    %% Loop back or Final Text Output
    Agent2 -->|Raw Output Activations| OuterLinkBA["Outer RecursiveLink BA <br/> (Recurrent Loop Link)"]
    OuterLinkBA -->|Round t+1 Latent Feed| Agent1

    Agent2 -->|Final Round| DecodedOutput["Final Text Decoder / Response"]

Key Components

RecursiveLink: A lightweight, multi-layer projection module that acts as the connective tissue between models, leaving original LLM weights completely frozen:
Inner RecursiveLink: Maps an agent's newly generated hidden states directly back into its own input embedding space, enabling continuous internal reasoning loops without token decoding.
Outer RecursiveLink: Maps between the embedding dimensions of differently shaped models (e.g. from Llama-3-8B's 4096-d hidden space to a 3072-d space such as Gemma-7B's) to bridge latent states between heterogeneous agents.
Shared Backbone Brain: Multiple agent roles (e.g., Planner, Coder, Critic) can reside on the exact same foundation model instance in VRAM, sharing base weights while utilizing lightweight individual RecursiveLink modules for role specialization.
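To illustrate what an Outer RecursiveLink does dimensionally, here is a toy two-layer projection in pure Python. Small sizes (16 to 12) stand in for 4096 to 3072; the hidden width, tanh activation, random initialisation, and the residual connection used for the inner link are assumptions rather than the RecursiveMAS architecture, and nothing here is trained.

```python
import math
import random


class Projection:
    """Toy two-layer projection d_in -> hidden -> d_out with a tanh nonlinearity.

    Only these small weights would ever be trained; the agents' LLM weights stay frozen.
    """

    def __init__(self, d_in: int, d_out: int, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so the toy weights are reproducible
        self.d_in, self.d_out = d_in, d_out
        self.w1 = [[rng.gauss(0.0, 1 / math.sqrt(d_in)) for _ in range(d_in)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 1 / math.sqrt(hidden)) for _ in range(hidden)] for _ in range(d_out)]

    def __call__(self, h: list[float]) -> list[float]:
        if len(h) != self.d_in:
            raise ValueError(f"expected a {self.d_in}-d input, got {len(h)}")
        z = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in self.w1]
        return [sum(w * x for w, x in zip(row, z)) for row in self.w2]


def inner_link(proj: Projection, h: list[float]) -> list[float]:
    """Inner link: same width in and out, added back onto the state (assumed residual)."""
    if proj.d_in != proj.d_out:
        raise ValueError("an inner link must preserve the hidden size")
    return [a + b for a, b in zip(h, proj(h))]
```

An outer link from agent A (16-d) feeds agent B (12-d), whose inner link then refines the 12-d state without any token decoding.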

Dual-Architecture Implementation Strategy

To implement RecursiveMAS without introducing performance overhead or heavy dependencies into the core framework, agent-utilities employs a decoupled Dual-Architecture Strategy:

graph TD
    TaskIn(["Incoming Task"]) --> ModelCheck{"Is Model Local & <br/> Open-Weights?"}

    ModelCheck -- "Yes (Local GPU)" --> LocalPath["<b>Native Open-Weights Pipeline</b> <br/> - Dynamic PyTorch/vLLM import <br/> - Last hidden state weight access <br/> - Direct projection tensors"]
    ModelCheck -- "No (Off-the-shelf API)" --> APIPath["<b>Universal API Semantic Simulator</b> <br/> - Zero extra VRAM/PyTorch imports <br/> - Local REPL variable state passing <br/> - Semantic thought vectors via embeddings"]

    LocalPath --> Execution["Orchestration Engine Execution"]
    APIPath --> Execution

1. Native Open-Weights Pipeline (Optional GPU Mode)

For specialized local runs executing open-source weights (via PyTorch, Hugging Face transformers, or custom vLLM adapters):

* The system would access the model's last_hidden_state activations during generation, run them through the lightweight PyTorch RecursiveLink projection layers, and inject them directly into the input embedding space of the next agent.
* Dependency Isolation: All neural modeling code would be isolated in a modular wrapper (planned agent_utilities/rlm/mas_local.py, not yet present), so that no heavy dependencies (like PyTorch) are imported during standard framework operations, maintaining a strict zero-overhead baseline.
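The dependency-isolation goal for the planned mas_local.py module can be met with a lazy import: check whether the heavy package is installed with `importlib.util.find_spec`, which locates a top-level package without importing it, and import it only on the native path. `choose_pipeline` and `load_native` are hypothetical helpers, and checking for `torch` is an assumption about the backend.

```python
import importlib
import importlib.util
from types import ModuleType


def backend_installed(module: str = "torch") -> bool:
    # find_spec locates a top-level package without importing (executing) it.
    return importlib.util.find_spec(module) is not None


def choose_pipeline(local_open_weights: bool, module: str = "torch") -> str:
    if local_open_weights and backend_installed(module):
        return "native"
    return "api-simulator"  # zero extra imports on this path


def load_native(module: str = "torch") -> ModuleType:
    # The heavy import happens only here, on the native path.
    return importlib.import_module(module)
```

Keeping `load_native` as the only place that imports the backend means a cloud-only deployment never pays the import cost, even if the package happens to be installed.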

2. Universal API Semantic Simulator (Off-the-Shelf Fallback)

For standard deployments using cloud-hosted models (Gemini, OpenAI, Claude), where hidden-layer activations are not exposed, the orchestrator approximates the core benefits of RecursiveMAS through API-level symbolic emulation:

State Containment via persistent REPL: Rather than passing raw chat logs back and forth via the API, the orchestrator keeps all intermediate calculations, databases, and heavy text dumps stored locally in variables inside the persistent RLM Python REPL (RLMEnvironment).
Metadata-Only Prompting: The API prompt is fed only constant-size metadata (e.g. variable names, types, and lengths) rather than raw variables. Agents interact by writing Python code that mutates REPL state, which targets token savings comparable to the paper's reported ~75% reduction and keeps bulky intermediate data out of the context window.
Semantic Embedding Vectors: Agents share intermediate thought states by passing lightweight high-dimensional embedding vectors (retrieved via cheap off-the-shelf embedding endpoints) representing the semantic "Thought Mementos". These vectors are used to programmatically query the local Knowledge Graph or rank context slices without generating raw text, mimicking the continuous latent state hand-off of the neural pipeline.
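A sketch of the two API-side techniques, assuming REPL variables live in an ordinary dict and embeddings are plain float lists (in practice they would come from an embedding endpoint). `describe_state`, `cosine`, and `rank_slices` are hypothetical helpers, not RLMEnvironment methods.

```python
import math
from typing import Any


def describe_state(variables: dict[str, Any]) -> str:
    """Render REPL variables as name/type/length metadata instead of their values."""
    lines = []
    for name, value in sorted(variables.items()):
        suffix = f", len={len(value)}" if hasattr(value, "__len__") else ""
        lines.append(f"{name}: {type(value).__name__}{suffix}")
    return "\n".join(lines)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_slices(thought: list[float], slices: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank context slices by similarity to a 'thought' embedding; no text is generated."""
    return sorted(slices, key=lambda s: cosine(thought, slices[s]), reverse=True)[:k]
```

A 100,000-character dump costs one short metadata line in the prompt, and the agent fetches only what it needs by writing code against the variable name.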

ORCH-1.27 — Role-Specialized Model Routing

Assimilated from Quarq Agent's three-specialized-model pattern (planner / generator / learner; agent-oss/agent.py:58-92), generalized so functional roles bind to model tiers + capability tags rather than hardcoded model ids. ModelRegistry.pick_for_role(role) resolves planner|generator|learner|judge through the existing pick_for_task tier-fallback, so the same configuration runs on any provider pool (local LM Studio, cloud frontier, mixed) and degrades gracefully. Overridable per-call, via ModelRegistry.role_routing, via AgentConfig.role_routing, or live through graph_configure(action="set_role_routing"). This is the routing substrate for the memory-first synergy pipeline: the HyDE planner (KG-2.12), the background learner (KG-2.13), and the LongMemEval judge all request their role here. Extends ORCH-1.2.
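A minimal sketch of role-to-tier resolution with tier fallback. The tier names, default role-to-tier table, fallback order (down the tiers first, then up), and `ToyRegistry` class are assumptions made for illustration; the real ModelRegistry.pick_for_role and pick_for_task may resolve differently. The per-call override taking precedence over the registry mapping is likewise assumed.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed tier order, most capable first; real tiers and models come from configuration.
TIERS = ["frontier", "mid", "small"]
DEFAULT_ROLE_TIER = {"planner": "frontier", "judge": "frontier", "generator": "mid", "learner": "small"}


@dataclass
class ToyRegistry:
    pool: dict[str, list[str]]  # tier -> model ids available in this provider pool
    role_routing: dict[str, str] = field(default_factory=dict)  # role -> tier override

    def pick_for_task(self, tier: str) -> str:
        # Tier fallback: try the requested tier, then lower tiers, then higher ones.
        start = TIERS.index(tier)
        for t in TIERS[start:] + TIERS[:start][::-1]:
            if self.pool.get(t):
                return self.pool[t][0]
        raise LookupError("no models configured")

    def pick_for_role(self, role: str, override: str | None = None) -> str:
        tier = override or self.role_routing.get(role) or DEFAULT_ROLE_TIER[role]
        return self.pick_for_task(tier)
```

The same role table works against a single local model (everything resolves to it) or a mixed pool, which is the graceful degradation the paragraph describes.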

ORCH-1.2 — Global Workspace Attention loop (revived + instrumented)

After each multi-agent wave, ParallelEngine drives a Global Workspace Theory loop: WorkspaceAttention scores specialist outputs (relevance·track-record·confidence), selects the top-K, and broadcasts the winners to the KG as ProposalNodes. get_attention_score(specialist) reads those back as each specialist's runtime standing, feeding routing/confidence in pick_specialist_model. The loop is instrumented with write/read counters and a suspected_engine_mismatch guard (surfaced in ExecutionResult.telemetry["workspace_attention"]; strict mode via AGENT_UTILITIES_GWT_STRICT). Winners are also recorded into the EvolvingMemoryStore INSIGHT bank. Full design: Global Workspace Attention.
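The select-and-broadcast step can be sketched as a product score with a top-K cut. Here a list stands in for the ProposalNodes written to the KG, and the read-back score is simply the share of broadcast slots a specialist has won; `Output`, `select_and_broadcast`, and `attention_score` are hypothetical names, not the WorkspaceAttention API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    specialist: str
    relevance: float     # e.g. similarity of the output to the task, in [0, 1]
    track_record: float  # e.g. historical success rate, in [0, 1]
    confidence: float    # self-reported confidence, in [0, 1]

    @property
    def score(self) -> float:
        # Multiplicative score: relevance * track-record * confidence.
        return self.relevance * self.track_record * self.confidence


def select_and_broadcast(outputs: list[Output], k: int, workspace: list[str]) -> list[Output]:
    winners = sorted(outputs, key=lambda o: o.score, reverse=True)[:k]
    workspace.extend(o.specialist for o in winners)  # stand-in for writing ProposalNodes
    return winners


def attention_score(specialist: str, workspace: list[str]) -> float:
    # Read-back: fraction of broadcast slots this specialist has won so far.
    return workspace.count(specialist) / len(workspace) if workspace else 0.0
```

A multiplicative score means a specialist cannot win on confidence alone: a near-zero track record or relevance sinks the product.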

The swarm is modeled as a social system S=(f,g,G): archetype-tagged agents over an explicit interaction graph (built from manifest depends_on edges), with local-neighborhood observability, a co-evolution edge-update loop, and a P1–P4 swarm-health snapshot (degree-partition heterogeneity, topology variance, neighbor co-evolution slope, Wasserstein-1 drift). ParallelEngine attaches the snapshot to ExecutionResult.telemetry["social_system"]. Full design: Multi-Agent Social System.
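Of the P1 to P4 metrics, the Wasserstein-1 drift is the easiest to pin down: for two one-dimensional samples of equal size, the optimal transport plan pairs sorted values, so W1 is the mean absolute difference of the sorted samples. The sketch applies it to agent degrees computed from depends_on-style edges; using degree as the tracked quantity is an assumption, not necessarily what the swarm-health snapshot measures.

```python
def degrees(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Undirected degree of each agent in an interaction graph given as edge pairs."""
    deg: dict[str, int] = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def wasserstein1(a: list[float], b: list[float]) -> float:
    """W1 distance between two 1-D empirical distributions with equal sample counts."""
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty samples of equal size")
    # Sorted pairing is the optimal transport plan in one dimension.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

Adding one edge to a three-agent chain moves the degree distribution from {1, 1, 2} to {2, 2, 2}, a W1 drift of 2/3.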

ORCH-1.41 / 1.42 / 1.43 — Ontology-to-Workflow Execution Path

Descriptive process knowledge in the KG is now executable, with the ontology in the loop at every step:

ORCH-1.41 — Process Plan Compiler (knowledge_graph/process_plan_compiler.py): graph_orchestrate(action="compile_process") (REST twin /api/graph/orchestrate/compile-process) lifts a descriptive BPMN process — ingested via the Camunda extractor and given step-level ontology shape by KG-2.53 — into an executable plan.
ORCH-1.42 — Execution Ontology Gate (knowledge_graph/core/workflow_gate.py): ontology validation sits on the execution path, so a compiled process is checked against the published ontology before it runs.
ORCH-1.43 — Lineage Close-Out (workflows/runner.py + core/owl_bridge.py): workflow runs write lineage back to the KG, closing the descriptive↔executable provenance loop — the process model, the compiled plan, and the run that executed it stay connected.

The authoritative TBox these steps validate against is published to Fuseki on a background tick (KG-2.52, knowledge_graph/core/ontology_publisher.py). Walkthrough: ontology-to-workflow example.
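The gate-then-run-then-record shape of ORCH-1.42 and ORCH-1.43 can be sketched without any KG or Fuseki access: reject a plan if any step claims a type outside the published ontology, otherwise execute and write lineage linking process, step, and run. `Step`, `Lineage`, `gate`, and `run_process` are hypothetical, the allowed-types set stands in for the TBox, and BPMN compilation (ORCH-1.41) is omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    id: str
    step_type: str  # ontology class the step claims to instantiate


@dataclass
class Lineage:
    # (process id, step id, run id) triples linking descriptive model, plan step, and run.
    records: list[tuple[str, str, str]] = field(default_factory=list)


def gate(plan: list[Step], allowed_types: set[str]) -> None:
    """Refuse to run a plan containing any step type the ontology does not define."""
    bad = [s.id for s in plan if s.step_type not in allowed_types]
    if bad:
        raise ValueError(f"ontology gate rejected steps: {bad}")


def run_process(process_id: str, run_id: str, plan: list[Step],
                allowed_types: set[str], lineage: Lineage) -> None:
    gate(plan, allowed_types)  # validation sits on the execution path, before any step runs
    for step in plan:
        # ... execute the step here (omitted in this sketch) ...
        lineage.records.append((process_id, step.id, run_id))  # lineage close-out
```

Because the gate runs before the first step, a rejected plan leaves no partial lineage behind.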

ORCH-1.44 — Durable Goal Registry

Goals are durable records in the externalized state store (OS-5.16), not in-process objects: they persist across gateway restarts, and a run stranded by a crashed host rehydrates as orphaned instead of silently vanishing (core/sessions.py, models/goal.py). See State Externalization.
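A minimal sketch of rehydration, using an in-memory SQLite database as a stand-in for the externalized state store. The `goals` schema, the heartbeat column, and the staleness rule are assumptions for illustration, not the core/sessions.py implementation.

```python
from __future__ import annotations

import sqlite3
import time


def init(db: sqlite3.Connection) -> None:
    db.execute(
        "CREATE TABLE IF NOT EXISTS goals (id TEXT PRIMARY KEY, status TEXT, host TEXT, heartbeat REAL)"
    )


def start_goal(db: sqlite3.Connection, goal_id: str, host: str, now: float | None = None) -> None:
    ts = time.time() if now is None else now
    db.execute("INSERT INTO goals VALUES (?, 'running', ?, ?)", (goal_id, host, ts))
    db.commit()


def rehydrate(db: sqlite3.Connection, stale_after: float, now: float | None = None) -> list[str]:
    """On startup, mark runs whose host stopped heartbeating as orphaned instead of dropping them."""
    cutoff = (time.time() if now is None else now) - stale_after
    rows = db.execute(
        "SELECT id FROM goals WHERE status = 'running' AND heartbeat < ? ORDER BY id", (cutoff,)
    ).fetchall()
    db.executemany("UPDATE goals SET status = 'orphaned' WHERE id = ?", rows)
    db.commit()
    return [row[0] for row in rows]
```

The orphaned rows stay queryable, so an operator or a later process can decide whether to resume or close them.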

ORCH-1.45 — Queue-Driven Agent Dispatch

Agent turns (goal-loop iterations and orchestrator jobs) can dispatch through a session-partitioned durable queue instead of the in-process scheduler. Setting AGENT_DISPATCH_BACKEND=queue makes graph_orchestrate(action="dispatch") and the goal machinery enqueue a typed AgentTurnEnvelope (orchestration/agent_dispatch.py; the job id serves as the idempotency key, and the payload stays a reference into the state store) onto the agent_turns queue. The queue runs on Kafka, Postgres SKIP LOCKED, or per-host SQLite, composing the KG-2.55 transport stack with a session:<id> partition key above the KG-2.56 tenant key.

A stateless agent-dispatch-worker fleet claims turns under per-session mutual exclusion, rehydrates from the shared state store, executes the existing goal/orchestration bodies, and writes back durably before acking. Delivery is at-least-once with idempotent re-claims, so a worker crash triggers recovery rather than data loss. Workers heartbeat into the dispatch_workers registry, /api/fleet/topology lists them, and graph_orchestrate job/{id} reports the executing worker.

The inline default is byte-for-byte the previous behavior. Full design: Queue-Driven Agent Dispatch; walkthrough: queue-dispatch example.
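An in-memory sketch of these queue semantics: job ids make enqueue idempotent, a session is claimed by at most one worker at a time, work is written back before the ack, and a crash (modelled as `nack`) redelivers the turn. `Envelope`, `SessionQueue`, and `process` are hypothetical and only mimic AgentTurnEnvelope and the agent_turns queue; there is no Kafka, Postgres, or persistence here.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    job_id: str      # doubles as the idempotency key
    session_id: str  # partition key: one in-flight turn per session
    state_ref: str   # reference into the state store, not the state itself


class SessionQueue:
    def __init__(self) -> None:
        self._pending: deque[Envelope] = deque()
        self._seen: set[str] = set()
        self._busy: set[str] = set()

    def enqueue(self, env: Envelope) -> bool:
        if env.job_id in self._seen:  # duplicate submit is a no-op
            return False
        self._seen.add(env.job_id)
        self._pending.append(env)
        return True

    def claim(self) -> Envelope | None:
        # Oldest turn whose session no other worker is executing.
        for env in list(self._pending):
            if env.session_id not in self._busy:
                self._pending.remove(env)
                self._busy.add(env.session_id)
                return env
        return None

    def ack(self, env: Envelope) -> None:
        self._busy.discard(env.session_id)

    def nack(self, env: Envelope) -> None:
        # Worker died before acking: redeliver at the front (at-least-once).
        self._busy.discard(env.session_id)
        self._pending.appendleft(env)


def process(q: SessionQueue, store: dict[str, list[str]], applied: set[str]) -> None:
    env = q.claim()
    if env is None:
        return
    if env.job_id not in applied:  # idempotent re-claim: skip work already written back
        store.setdefault(env.session_id, []).append(env.job_id)
        applied.add(env.job_id)
    q.ack(env)  # ack only after the durable write-back
```

The second turn for a session waits while the first is in flight, yet turns for other sessions proceed, which is the per-session mutual exclusion described above.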

ORCH-1.50 — Task-Management Ergonomics on SDD

The Spec-Driven Development pipeline (ORCH-1.6) already persists a durable, dependency-aware task list — Spec / Task / Tasks / ImplementationPlan (models/sdd.py) round-tripped to .specify/ by SDDManager (sdd/__init__.py). ORCH-1.50 adds the loop-driving ergonomics on top, so a long-horizon goal can be decomposed, scored, and worked one actionable task at a time:

parse_prd(prd_text, feature_id) — decompose a PRD into sequential, dependency-linked tasks (zero-infra structural parser by default; an LLM decomposer is injectable).
analyze_complexity(feature_id) — score each task 0–10 and recommend a subtask count (a deployable structural heuristic by default; an LLM scorer is injectable), persisting a report under .specify/reports/.
Tasks.next_task() — pick the next actionable task, preferring subtasks of an in-progress parent, then top-level tasks whose dependencies are satisfied, breaking ties by priority → fewer deps → id. detect_cycles() / validate_dependencies() reject an unschedulable graph before work starts.
scope_task(... "up"|"down") — renegotiate scope, preserving done/in-progress subtasks. Tagged contexts (via feature_id, with branch_tasks / list_task_contexts) give parallel task streams.

New fields on Task (priority, complexity_score, recommended_subtasks, test_strategy, expansion_prompt) round-trip through a full-fidelity tasks.json sidecar (the markdown mirror stays human-readable). Surfaced over the harness MCP server as task_parse_prd, task_analyze_complexity, task_next, task_set_status, and task_scope (mcp/harness_server.py).
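The next_task selection rule and the cycle check can be sketched as follows. The status strings, the high/medium/low priority scale, and the requirement that subtasks also have satisfied dependencies are assumptions; this `Task` is a simplified stand-in for the model in models/sdd.py, not its actual definition.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}  # assumed priority scale


@dataclass
class Task:
    id: str
    status: str = "pending"  # assumed states: pending | in-progress | done
    priority: str = "medium"
    deps: list[str] = field(default_factory=list)
    parent: str | None = None


def _order(t: Task) -> tuple[int, int, str]:
    # Tie-break order from the text: priority, then fewer deps, then id.
    return (PRIORITY_RANK.get(t.priority, 1), len(t.deps), t.id)


def next_task(tasks: dict[str, Task]) -> Task | None:
    done = {tid for tid, t in tasks.items() if t.status == "done"}
    ready = [t for t in tasks.values() if t.status == "pending" and set(t.deps) <= done]
    active = {tid for tid, t in tasks.items() if t.status == "in-progress"}
    # Prefer subtasks of an in-progress parent, then dependency-satisfied top-level tasks.
    for pool in ([t for t in ready if t.parent in active], [t for t in ready if t.parent is None]):
        if pool:
            return min(pool, key=_order)
    return None


def detect_cycles(tasks: dict[str, Task]) -> bool:
    """Depth-first search; reaching a task already on the current path means a cycle."""
    on_path: set[str] = set()
    finished: set[str] = set()

    def visit(tid: str) -> bool:
        if tid in finished or tid not in tasks:  # unknown ids are a validation issue, not a cycle
            return False
        if tid in on_path:
            return True
        on_path.add(tid)
        if any(visit(dep) for dep in tasks[tid].deps):
            return True
        on_path.discard(tid)
        finished.add(tid)
        return False

    return any(visit(tid) for tid in tasks)
```

Note that a low-priority subtask of the in-progress parent still wins over a high-priority top-level task, which keeps work focused on finishing what is already started.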

flowchart LR
    PRD[PRD / goal] -->|parse_prd| T[(Tasks\n.specify + tasks.json)]
    T -->|analyze_complexity| S[scores +\nrecommended subtasks]
    S -->|scope_task up/down| T
    T -->|validate_dependencies| G{cycle?}
    G -- yes --> X[reject]
    G -- no --> N[next_task]
    N -->|work + set_task_status| T