Skip to content

The Evolvable Surface — DSPy and the Self-Optimization Substrate (CONCEPT:AHE-3.1)

DSPy optimizes anything you can express as a Signature (typed inputs→outputs) + a metric + a trainset of demonstrations. This page maps the full surface DSPy (and the adjacent evolution machinery) optimizes across agent-utilities — prompts, sampling profiles, MCP tool descriptions, agent skills, knowledge-graph extraction, concept matching, and routing policies.

Status (CONCEPT:AHE-3.39–3.46): all six opportunities are wired as one unified optimization subsystem — a real graded metric (no longer exact-match), a pluggable optimizable-target registry, a shared compile+demo-refine driver, generalized KG persistence, two self-supervised optimizers, a single graph_orchestrate action=optimize_component surface (+ REST twin), and a scheduled, propose-only daemon tick (AHE-3.46) that closes the loop. See The unified optimization subsystem and Closing the loop — the scheduled sweep.

The mental model: optimizer · substrate · metric

DSPy is not a standalone feature; it is one optimizer plugged into a larger self-evolution loop. Three roles matter:

flowchart LR
    subgraph Metric["Metric source — is the new version better?"]
        EC["EvalCorpus / continuous_evaluation_engine"]
        TD["TraceDistiller → EvidenceCorpus"]
        EMA["CapabilityIndex reward EMA"]
        PP["PreferencePairs RAPPO/TI-DPO"]
    end
    subgraph Opt["Optimizers — propose better artifacts"]
        DSPY["DSPy BootstrapFewShot / MIPROv2"]
        GEPA["GEPA reflective Pareto"]
        MUT["VariantPool parametric mutation"]
    end
    subgraph Sub["Substrate — select / promote / remember"]
        VP["VariantPool tournament + promote"]
        RB["ReplayBuffer prioritized"]
        BAN["Explore/Exploit bandit"]
    end
    TD --> Opt
    EC --> Opt
    Opt --> Sub
    Sub --> APPLY["apply edit → KG + git"]
    APPLY --> RUN["next execution"]
    RUN --> Metric
    EMA --> Sub
    PP --> Opt
  • Optimizer proposes a better artifact. DSPy compiles a Signature and runs BootstrapFewShot/MIPROv2; GEPA (rlm/gepa.py) explores prompt candidates by reflective Pareto search; VariantPool.mutate_profile jitters numeric configs.
  • Substrate decides what survives: VariantPool tournament + promote_winner, the ReplayBuffer (decisive states resurface), the explore/exploit bandit (explore_exploit_router.py, used by decentralized_memory.py) — reuse a proven artifact or try a fresh candidate.
  • Metric answers "better?": the EvalCorpus/continuous_evaluation_engine, the TraceDistiller's EvidenceCorpus, the CapabilityIndex reward EMA, and the consolidated PreferencePairs (AHE-3.17). This is the load-bearing piece — see The metric problem.

What is wired today

flowchart TD
    TRACE["execution traces"] --> DIST["TraceDistiller.distill"]
    DIST --> EVID["EvidenceCorpus<br/>failure clusters + component_attribution"]
    EVID --> EV["EvolveAgent.evolve"]
    EV -->|"attribution == system_prompt"| DSPY["_dspy_optimize_cluster"]
    DSPY --> COMP["compile_json_to_signature → BootstrapFewShot/MIPROv2"]
    COMP --> EDIT["ComponentEdit: optimized prefix + few-shot demos"]
    COMP --> KGB["DSPyKGBridge.ingest_evolved_prompt"]
    KGB --> KG[("EvolvedPromptNode + DSPyTraceNode:OptimizationTrajectoryNode")]
    EDIT --> APPLY["apply to JSON blueprint + git"]

EvolveAgent._dspy_optimize_cluster (harness/evolve_agent.py) runs only when a failure cluster's component_attribution is system_prompt. It compiles the target JSON prompt blueprint to a dspy.Signature (prompting/dspy_compiler.py), draws a trainset of passing traces from the EvidenceCorpus, runs BootstrapFewShot (or MIPROv2/BootstrapFewShotWithRandomSearch), and persists the compiled state + few-shot demos back to the blueprint and to the KG via DSPyKGBridge (CONCEPT:ORCH-1.8: EvolvedPromptNode, OptimizationTrajectoryNode). Sampling-profile evolution (AHE-3.38) is wired in parallel via the VariantPool (see Sampling Profiles).

Two edit engines, one apply side. The ComponentType enum already names tool_description, tool_implementation, skill, middleware, … as attribution categories, and EvolveAgent already edits all of them — but via a one-shot LLM heuristic (the "fallback to LLM heuristic edits if DSPy isn't applicable" path), not DSPy's metric-driven bootstrap. The apply side is fully built for them too: PhysicalDistillationEngine (AHE-3.9, knowledge_graph/distillation/physical_distiller.py) has distill_skill, distill_mcp_tool (docstrings + input schemas) and distill_system_prompt, committing changes to files via GitOps (AHE-3.11). So the surface below is "swap the LLM-heuristic editor for DSPy optimization", not greenfield — the persistence, attribution, and apply spine already exist.

The evolvable surface

Surface Representation (file:symbol) Optimizer fit Metric source Status
System prompts SystemPromptNode; JSON blueprints; system_prompt target DSPy Signature (instruction prefix + demos) graded EvalCorpus score (AHE-3.39) Wired (registry target)
Sampling profiles SamplingProfile (agent/sampling_profile.py) parametric mutation (not DSPy) CapabilityIndex reward EMA Wired (AHE-3.38 evolve_profile)
Few-shot example sets compiled demos; refine_demos DSPy bootstrap + drop-one ablation held-out graded score Wired (AHE-3.43)
MCP tool descriptions tool_description target; distill_mcp_tool apply side DSPy Signature (description → selectability) graded score / record_outcome Wired (AHE-3.41)
Agent skills (SOP / trigger) skill target; SOP via mount_skill_unit (ORCH-1.28); distill_skill DSPy for SOP/trigger text graded score Wired (AHE-3.42)
KG fact extraction extraction_optimizer.py over FACT_EXTRACTION_PROMPT DSPy module wrapping extraction self-supervised dedup + canonical consistency Wired (AHE-3.44)
Concept matching policy_optimization.optimize_concept_matcher DSPy classifier (article × concept → relevant?) classification accuracy vs ADDRESSES edges Wired (AHE-3.45)
Routing / role policy policy_optimization.optimize_routing_policy DSPy policy (task → primitive) realized ExecutionTrace success Wired (AHE-3.45)

The unified optimization subsystem (now wired)

All six opportunities landed as one subsystem (harness/dspy_optimization.py), not six bolt-ons — the same metric, registry, driver, and persistence spine reused across targets.

flowchart TD
    subgraph Core["harness/dspy_optimization.py"]
        MET["make_optimization_metric<br/>AHE-3.39 graded, not exact-match"]
        REG["OPTIMIZABLE_TARGETS registry<br/>AHE-3.40 system_prompt · tool_description · skill"]
        DRV["run_dspy_optimization<br/>compile + refine_demos AHE-3.43"]
        DISP["run_component_optimization<br/>one dispatch for all targets"]
    end
    EV["EvolveAgent._dspy_optimize_cluster<br/>dispatches by component_attribution"] --> REG
    REG --> DRV
    DRV --> MET
    DRV --> KGB["DSPyKGBridge.ingest_evolved_component<br/>AHE-3.40 generalized persist"]
    SS1["extraction_optimizer<br/>AHE-3.44 self-supervised"] --> DISP
    SS2["policy_optimization<br/>AHE-3.45 concept-match + routing"] --> DISP
    EVSURF["graph_orchestrate action=optimize_component<br/>+ REST twin"] --> DISP
    DISP --> REG
  • Real metric (AHE-3.39). make_optimization_metric grades prediction.response against example.response via the existing EvalRunner semantic scorer (token-overlap fallback offline), optionally blending a reward EMA. This replaces the exact-match placeholder — the upgrade every text target inherits.
  • Target registry (AHE-3.40). OPTIMIZABLE_TARGETS holds one OptimizableTarget handler per ComponentType (system_prompt, tool_description, skill), each declaring how to read the artifact's text and name it. EvolveAgent._dspy_optimize_cluster is generalized to dispatch by attribution through the registry (the hardcoded system-prompt-only path is gone) and now persists every target via the bridge — closing a prior Wire-First gap where DSPyKGBridge.ingest_evolved_* had no caller.
  • Demo refinement (AHE-3.43). refine_demos runs a drop-one ablation on the bootstrapped demos against a held-out slice, so a noisy demo can't survive into the blueprint.
  • Self-supervised optimizers. extraction_optimizer (AHE-3.44) scores extractions by dedup rate + canonical consistency — no labels needed; policy_optimization (AHE-3.45) optimizes the concept matcher against ADDRESSES-edge labels and the routing policy against realized ExecutionTrace success.
  • One surface. graph_orchestrate action=optimize_component (task=<system_prompt|tool_description|skill|extraction|concept_match|routing>, dependencies=optional JSON data) dispatches through run_component_optimization; the REST twin is automatic (graph_orchestrate is already in ACTION_TOOL_ROUTES).

Sampling profiles (AHE-3.38) are evolved by parametric mutation, not DSPy — DSPy optimizes text, profiles are numbers — but share the same reward-EMA + tournament substrate. See Sampling Profiles.

The metric problem

Every optimization needs a metric, and the metric is where this gets real. The original _dspy_optimize_cluster used an exact-match placeholder; AHE-3.39 replaced it with a graded scorer. The system owns three signals an optimizer can be steered by, in increasing strength:

flowchart LR
    subgraph Signals["already in the system"]
        A["CapabilityIndex.record_outcome<br/>reward EMA 0..1"]
        B["PreferencePairs<br/>chosen ≻ rejected + margin"]
        C["EvalCorpus / TraceDistiller<br/>pass-rate, failure clusters"]
    end
    A --> M["DSPy metric / VariantPool fitness"]
    B --> M
    C --> M
    M --> PROMO["promote only if it beats the incumbent on the metric"]
  • CapabilityIndex.record_outcome — an EMA reward in [0,1] per entity/profile; the fitness signal VariantPool and evolve_profile already consume.
  • PreferencePairs (AHE-3.17, preference_pairs.py) — consolidates eval-corpus regressions, distilled success/fail episodes, and human corrections into (chosen ≻ rejected) pairs with RAPPO margins / TI-DPO token weights — a ready-made reward model for any text optimizer.
  • EvalCorpus / TraceDistiller — pass-rate on regression cases and failure-cluster attribution; the natural metric for "did this prompt lower the failure rate on cluster X?".

The synergy machinery

Mechanism File:symbol Role for DSPy
Variant pool (AHE-3.2) harness/variant_pool.py holds competing candidates; tournament + promote_winner is generic over any optimizer's output
Capability reward EMA (KG-2.6) retrieval/capability_index.py::record_outcome the feedback channel from execution back to optimization
Preference pairs (AHE-3.17) harness/preference_pairs.py reward-model substrate (DPO-family) for text targets
Replay buffer (AHE-3.0) harness/replay_buffer.py decisive states (plateau-breakers) resurface for curriculum
Explore/exploit bandit (KG-2.82) harness/{decentralized_memory,explore_exploit_router}.py per-agent UCB1/Thompson choice: reuse proven vs. try fresh candidate
Self-guided self-play (AHE-3.37) harness/self_guided_play.py generates harder task variants (a curriculum DSPy can optimize against)
GEPA (ORCH-1.13) rlm/gepa.py reflective Pareto prompt explorer — complements DSPy's local fine-tune
Trace distiller harness/continuous_evaluation_engine.py turns raw traces into the EvidenceCorpus that seeds trainsets + attribution

Where a DSPy pass hooks into a live loop

flowchart TD
    TICK["daemon evolution tick / agent-utilities-evolution skill"] --> LC["LoopController.run_one_cycle"]
    LC --> AEE["AgenticEvolutionEngine"]
    AEE --> EVA["EvolveAgent.evolve"]
    EVA -->|"per failure cluster, by component_attribution"| BR{"attribution?"}
    BR -->|"system_prompt / tool_description / skill"| D1["registry target → run_dspy_optimization ✓"]
    BR -->|"sampling"| D2["evolve_profile ✓"]
    BR -->|"extraction / concept_match / routing"| D3["self-supervised optimizer ✓"]
    D1 & D2 & D3 --> SUB["score vs incumbent → promote"]
    SUB --> KGB["persist to KG + git"]

The cycle is driven by the consolidated KG daemon tick and the agent-utilities-evolution skill, both routing through LoopController.run_one_cycle and the AgenticEvolutionEngine/EvolveAgent. New optimization targets are added as new component_attribution branches in EvolveAgent — each reusing the same distiller → optimizer → variant-pool → KG-bridge spine.

Status — all delivered (AHE-3.39–3.46)

# Opportunity Concept Where
1 Real metric (replaces exact-match) AHE-3.39 dspy_optimization.make_optimization_metric
2 Few-shot demo-set refinement AHE-3.43 dspy_optimization.refine_demos
3 MCP tool descriptions AHE-3.41 tool_description registry target
4 KG extraction prompt AHE-3.44 extraction_optimizer.optimize_extraction_prompt
5 Skill SOP/trigger AHE-3.42 skill registry target (SOP already reaches the model via ORCH-1.28)
6 Concept-matching + routing AHE-3.45 policy_optimization.optimize_concept_matcher / optimize_routing_policy
7 Scheduled sweep + promotion gate AHE-3.46 run_optimization_sweep · should_promote · daemon tick

Each was a registry target or a self-supervised optimizer reusing the one metric/driver/persist spine — not new infrastructure.

Closing the loop — the scheduled sweep (AHE-3.46)

The on-demand surface is now matched by a scheduled, propose-only daemon tick — the operational step that makes optimization continuous rather than manual.

flowchart LR
    TICK["KG daemon tick<br/>KG_DSPY_OPTIMIZATION (opt-in)"] --> SWEEP["run_optimization_sweep"]
    MCP["graph_orchestrate action=optimize_component task=all"] --> SWEEP
    SWEEP --> GD["gather_optimization_data<br/>query_cypher: Documents / ADDRESSED_BY / ExecutionTrace"]
    GD --> RUN["run_component_optimization per target"]
    RUN --> TRAJ["persist OptimizationTrajectory (propose-only)"]
    TRAJ --> GATE["should_promote(baseline, candidate, min_delta)"]
    GATE -.future auto-apply gate.-> APPLY["distill → files"]
  • Daemon tick_tick_optimize_components (knowledge_graph/core/engine_tasks.py), registered in the consolidated maintenance scheduler when KG_DSPY_OPTIMIZATION=True (off by default — each pass is an LLM-gated compile), on KG_DSPY_OPTIMIZATION_INTERVAL (default 3600s). The scheduled twin of the MCP action; both call run_optimization_sweep.
  • Sweep — runs the schedulable self-supervised targets (extraction / concept_match / routing), gathering live data via gather_optimization_data (engine.query_cypher, degrading to no_data rather than breaking the daemon).
  • Propose-only — like KG_GOLDEN_AUTO_MERGE, the sweep records optimization trajectories but never auto-applies. should_promote(baseline, candidate, min_delta) is the gate a candidate must clear on the held-out metric before a future auto-apply step lets it supersede the live artifact.

What remains is genuinely operational tuning: a reachable LLM for the compile, and populated graph data for the gatherers to draw on.

Code paths

  • agent_utilities/harness/dspy_optimization.pythe spine: make_optimization_metric (AHE-3.39), OPTIMIZABLE_TARGETS/OptimizableTarget (AHE-3.40), refine_demos (AHE-3.43), run_dspy_optimization, run_component_optimization.
  • agent_utilities/knowledge_graph/extraction/extraction_optimizer.py — AHE-3.44: extraction_quality (self-supervised metric), optimize_extraction_prompt.
  • agent_utilities/harness/policy_optimization.py — AHE-3.45: classification_accuracy, routing_success_rate, optimize_concept_matcher, optimize_routing_policy.
  • agent_utilities/mcp/tools/analysis_tools.pygraph_orchestrate action=optimize_component (the two-surface entry point; task=all runs the sweep).
  • agent_utilities/harness/dspy_optimization.py — AHE-3.46: run_optimization_sweep, gather_optimization_data, should_promote, SCHEDULABLE_TARGETS.
  • agent_utilities/knowledge_graph/core/engine_tasks.py_tick_optimize_components (the KG_DSPY_OPTIMIZATION maintenance-scheduler tick).
  • agent_utilities/prompting/dspy_compiler.pycompile_json_to_signature, AgentTaskModule.
  • agent_utilities/harness/evolve_agent.pyEvolveAgent._dspy_optimize_cluster (registry-dispatched; was system-prompt-only).
  • agent_utilities/knowledge_graph/dspy_kg_bridge.pyDSPyKGBridge.ingest_evolved_component (generalized; Evolved*Node + OptimizationTrajectoryNode).
  • agent_utilities/knowledge_graph/distillation/physical_distiller.py — the apply side (AHE-3.9/3.11): distill_system_prompt, distill_mcp_tool, distill_skill, commit_distilled_changes (KG-optimized artifacts → files under GitOps).
  • agent_utilities/harness/{variant_pool,preference_pairs,replay_buffer,decentralized_memory,explore_exploit_router,self_guided_play,continuous_evaluation_engine}.py — the substrate + metric sources.
  • agent_utilities/retrieval/capability_index.pyrecord_outcome reward EMA.
  • agent_utilities/rlm/gepa.py — reflective Pareto prompt optimizer.
  • agent_utilities/knowledge_graph/research/loop_controller.py, agent_utilities/harness/agentic_evolution_engine.py — the live loop drivers.

Relationship to other concepts

  • AHE-3.1 (mathematical prompt optimization) is the DSPy spine; AHE-3.2 (variant selection) the substrate; AHE-3.17 (preference corpus) the reward model.
  • Sampling-profile evolution (AHE-3.38) is the same loop applied to numeric configs — see Sampling Profiles.
  • The KG persistence (ORCH-1.8 DSPyKGBridge) makes every optimization a durable, queryable OptimizationTrajectory — optimization history is itself in the graph.