The Evolvable Surface — DSPy and the Self-Optimization Substrate (CONCEPT:AHE-3.1)¶
DSPy optimizes anything you can express as a Signature (typed inputs→outputs) + a metric + a trainset of demonstrations. This page maps the full surface DSPy (and the adjacent evolution machinery) optimizes across agent-utilities — prompts, sampling profiles, MCP tool descriptions, agent skills, knowledge-graph extraction, concept matching, and routing policies.
Status (CONCEPT:AHE-3.39–3.46): all six opportunities are wired as one unified optimization subsystem — a real graded metric (no longer exact-match), a pluggable optimizable-target registry, a shared compile+demo-refine driver, generalized KG persistence, two self-supervised optimizers, a single
graph_orchestrate action=optimize_componentsurface (+ REST twin), and a scheduled, propose-only daemon tick (AHE-3.46) that closes the loop. See The unified optimization subsystem and Closing the loop — the scheduled sweep.
The mental model: optimizer · substrate · metric¶
DSPy is not a standalone feature; it is one optimizer plugged into a larger self-evolution loop. Three roles matter:
flowchart LR
subgraph Metric["Metric source — is the new version better?"]
EC["EvalCorpus / continuous_evaluation_engine"]
TD["TraceDistiller → EvidenceCorpus"]
EMA["CapabilityIndex reward EMA"]
PP["PreferencePairs RAPPO/TI-DPO"]
end
subgraph Opt["Optimizers — propose better artifacts"]
DSPY["DSPy BootstrapFewShot / MIPROv2"]
GEPA["GEPA reflective Pareto"]
MUT["VariantPool parametric mutation"]
end
subgraph Sub["Substrate — select / promote / remember"]
VP["VariantPool tournament + promote"]
RB["ReplayBuffer prioritized"]
BAN["Explore/Exploit bandit"]
end
TD --> Opt
EC --> Opt
Opt --> Sub
Sub --> APPLY["apply edit → KG + git"]
APPLY --> RUN["next execution"]
RUN --> Metric
EMA --> Sub
PP --> Opt
- Optimizer proposes a better artifact. DSPy compiles a
Signatureand runsBootstrapFewShot/MIPROv2; GEPA (rlm/gepa.py) explores prompt candidates by reflective Pareto search;VariantPool.mutate_profilejitters numeric configs. - Substrate decides what survives:
VariantPooltournament +promote_winner, theReplayBuffer(decisive states resurface), the explore/exploit bandit (explore_exploit_router.py, used bydecentralized_memory.py) — reuse a proven artifact or try a fresh candidate. - Metric answers "better?": the
EvalCorpus/continuous_evaluation_engine, theTraceDistiller'sEvidenceCorpus, theCapabilityIndexreward EMA, and the consolidatedPreferencePairs(AHE-3.17). This is the load-bearing piece — see The metric problem.
What is wired today¶
flowchart TD
TRACE["execution traces"] --> DIST["TraceDistiller.distill"]
DIST --> EVID["EvidenceCorpus<br/>failure clusters + component_attribution"]
EVID --> EV["EvolveAgent.evolve"]
EV -->|"attribution == system_prompt"| DSPY["_dspy_optimize_cluster"]
DSPY --> COMP["compile_json_to_signature → BootstrapFewShot/MIPROv2"]
COMP --> EDIT["ComponentEdit: optimized prefix + few-shot demos"]
COMP --> KGB["DSPyKGBridge.ingest_evolved_prompt"]
KGB --> KG[("EvolvedPromptNode + DSPyTraceNode:OptimizationTrajectoryNode")]
EDIT --> APPLY["apply to JSON blueprint + git"]
EvolveAgent._dspy_optimize_cluster (harness/evolve_agent.py) runs only when a
failure cluster's component_attribution is system_prompt. It compiles the target
JSON prompt blueprint to a dspy.Signature (prompting/dspy_compiler.py), draws a
trainset of passing traces from the EvidenceCorpus, runs BootstrapFewShot (or
MIPROv2/BootstrapFewShotWithRandomSearch), and persists the compiled state +
few-shot demos back to the blueprint and to the KG via DSPyKGBridge (CONCEPT:ORCH-1.8:
EvolvedPromptNode, OptimizationTrajectoryNode). Sampling-profile evolution
(AHE-3.38) is wired in parallel via the VariantPool (see
Sampling Profiles).
Two edit engines, one apply side. The
ComponentTypeenum already namestool_description,tool_implementation,skill,middleware, … as attribution categories, andEvolveAgentalready edits all of them — but via a one-shot LLM heuristic (the "fallback to LLM heuristic edits if DSPy isn't applicable" path), not DSPy's metric-driven bootstrap. The apply side is fully built for them too:PhysicalDistillationEngine(AHE-3.9,knowledge_graph/distillation/physical_distiller.py) hasdistill_skill,distill_mcp_tool(docstrings + input schemas) anddistill_system_prompt, committing changes to files via GitOps (AHE-3.11). So the surface below is "swap the LLM-heuristic editor for DSPy optimization", not greenfield — the persistence, attribution, and apply spine already exist.
The evolvable surface¶
| Surface | Representation (file:symbol) | Optimizer fit | Metric source | Status |
|---|---|---|---|---|
| System prompts | SystemPromptNode; JSON blueprints; system_prompt target |
DSPy Signature (instruction prefix + demos) | graded EvalCorpus score (AHE-3.39) | Wired (registry target) |
| Sampling profiles | SamplingProfile (agent/sampling_profile.py) |
parametric mutation (not DSPy) | CapabilityIndex reward EMA |
Wired (AHE-3.38 evolve_profile) |
| Few-shot example sets | compiled demos; refine_demos |
DSPy bootstrap + drop-one ablation | held-out graded score | Wired (AHE-3.43) |
| MCP tool descriptions | tool_description target; distill_mcp_tool apply side |
DSPy Signature (description → selectability) | graded score / record_outcome |
Wired (AHE-3.41) |
| Agent skills (SOP / trigger) | skill target; SOP via mount_skill_unit (ORCH-1.28); distill_skill |
DSPy for SOP/trigger text | graded score | Wired (AHE-3.42) |
| KG fact extraction | extraction_optimizer.py over FACT_EXTRACTION_PROMPT |
DSPy module wrapping extraction | self-supervised dedup + canonical consistency | Wired (AHE-3.44) |
| Concept matching | policy_optimization.optimize_concept_matcher |
DSPy classifier (article × concept → relevant?) | classification accuracy vs ADDRESSES edges |
Wired (AHE-3.45) |
| Routing / role policy | policy_optimization.optimize_routing_policy |
DSPy policy (task → primitive) | realized ExecutionTrace success |
Wired (AHE-3.45) |
The unified optimization subsystem (now wired)¶
All six opportunities landed as one subsystem (harness/dspy_optimization.py), not
six bolt-ons — the same metric, registry, driver, and persistence spine reused across
targets.
flowchart TD
subgraph Core["harness/dspy_optimization.py"]
MET["make_optimization_metric<br/>AHE-3.39 graded, not exact-match"]
REG["OPTIMIZABLE_TARGETS registry<br/>AHE-3.40 system_prompt · tool_description · skill"]
DRV["run_dspy_optimization<br/>compile + refine_demos AHE-3.43"]
DISP["run_component_optimization<br/>one dispatch for all targets"]
end
EV["EvolveAgent._dspy_optimize_cluster<br/>dispatches by component_attribution"] --> REG
REG --> DRV
DRV --> MET
DRV --> KGB["DSPyKGBridge.ingest_evolved_component<br/>AHE-3.40 generalized persist"]
SS1["extraction_optimizer<br/>AHE-3.44 self-supervised"] --> DISP
SS2["policy_optimization<br/>AHE-3.45 concept-match + routing"] --> DISP
EVSURF["graph_orchestrate action=optimize_component<br/>+ REST twin"] --> DISP
DISP --> REG
- Real metric (AHE-3.39).
make_optimization_metricgradesprediction.responseagainstexample.responsevia the existingEvalRunnersemantic scorer (token-overlap fallback offline), optionally blending a reward EMA. This replaces the exact-match placeholder — the upgrade every text target inherits. - Target registry (AHE-3.40).
OPTIMIZABLE_TARGETSholds oneOptimizableTargethandler perComponentType(system_prompt, tool_description, skill), each declaring how to read the artifact's text and name it.EvolveAgent._dspy_optimize_clusteris generalized to dispatch by attribution through the registry (the hardcoded system-prompt-only path is gone) and now persists every target via the bridge — closing a prior Wire-First gap whereDSPyKGBridge.ingest_evolved_*had no caller. - Demo refinement (AHE-3.43).
refine_demosruns a drop-one ablation on the bootstrapped demos against a held-out slice, so a noisy demo can't survive into the blueprint. - Self-supervised optimizers.
extraction_optimizer(AHE-3.44) scores extractions by dedup rate + canonical consistency — no labels needed;policy_optimization(AHE-3.45) optimizes the concept matcher againstADDRESSES-edge labels and the routing policy against realizedExecutionTracesuccess. - One surface.
graph_orchestrate action=optimize_component(task=<system_prompt|tool_description|skill|extraction|concept_match|routing>,dependencies=optional JSON data) dispatches throughrun_component_optimization; the REST twin is automatic (graph_orchestrateis already inACTION_TOOL_ROUTES).
Sampling profiles (AHE-3.38) are evolved by parametric mutation, not DSPy — DSPy optimizes text, profiles are numbers — but share the same reward-EMA + tournament substrate. See Sampling Profiles.
The metric problem¶
Every optimization needs a metric, and the metric is where this gets real. The
original _dspy_optimize_cluster used an exact-match placeholder; AHE-3.39 replaced it
with a graded scorer. The system owns three signals an optimizer can be steered by, in
increasing strength:
flowchart LR
subgraph Signals["already in the system"]
A["CapabilityIndex.record_outcome<br/>reward EMA 0..1"]
B["PreferencePairs<br/>chosen ≻ rejected + margin"]
C["EvalCorpus / TraceDistiller<br/>pass-rate, failure clusters"]
end
A --> M["DSPy metric / VariantPool fitness"]
B --> M
C --> M
M --> PROMO["promote only if it beats the incumbent on the metric"]
CapabilityIndex.record_outcome— an EMA reward in [0,1] per entity/profile; the fitness signalVariantPoolandevolve_profilealready consume.PreferencePairs(AHE-3.17,preference_pairs.py) — consolidates eval-corpus regressions, distilled success/fail episodes, and human corrections into (chosen ≻ rejected) pairs with RAPPO margins / TI-DPO token weights — a ready-made reward model for any text optimizer.EvalCorpus/TraceDistiller— pass-rate on regression cases and failure-cluster attribution; the natural metric for "did this prompt lower the failure rate on cluster X?".
The synergy machinery¶
| Mechanism | File:symbol | Role for DSPy |
|---|---|---|
| Variant pool (AHE-3.2) | harness/variant_pool.py |
holds competing candidates; tournament + promote_winner is generic over any optimizer's output |
| Capability reward EMA (KG-2.6) | retrieval/capability_index.py::record_outcome |
the feedback channel from execution back to optimization |
| Preference pairs (AHE-3.17) | harness/preference_pairs.py |
reward-model substrate (DPO-family) for text targets |
| Replay buffer (AHE-3.0) | harness/replay_buffer.py |
decisive states (plateau-breakers) resurface for curriculum |
| Explore/exploit bandit (KG-2.82) | harness/{decentralized_memory,explore_exploit_router}.py |
per-agent UCB1/Thompson choice: reuse proven vs. try fresh candidate |
| Self-guided self-play (AHE-3.37) | harness/self_guided_play.py |
generates harder task variants (a curriculum DSPy can optimize against) |
| GEPA (ORCH-1.13) | rlm/gepa.py |
reflective Pareto prompt explorer — complements DSPy's local fine-tune |
| Trace distiller | harness/continuous_evaluation_engine.py |
turns raw traces into the EvidenceCorpus that seeds trainsets + attribution |
Where a DSPy pass hooks into a live loop¶
flowchart TD
TICK["daemon evolution tick / agent-utilities-evolution skill"] --> LC["LoopController.run_one_cycle"]
LC --> AEE["AgenticEvolutionEngine"]
AEE --> EVA["EvolveAgent.evolve"]
EVA -->|"per failure cluster, by component_attribution"| BR{"attribution?"}
BR -->|"system_prompt / tool_description / skill"| D1["registry target → run_dspy_optimization ✓"]
BR -->|"sampling"| D2["evolve_profile ✓"]
BR -->|"extraction / concept_match / routing"| D3["self-supervised optimizer ✓"]
D1 & D2 & D3 --> SUB["score vs incumbent → promote"]
SUB --> KGB["persist to KG + git"]
The cycle is driven by the consolidated KG daemon tick and the
agent-utilities-evolution skill, both routing through LoopController.run_one_cycle
and the AgenticEvolutionEngine/EvolveAgent. New optimization targets are added as
new component_attribution branches in EvolveAgent — each reusing the same
distiller → optimizer → variant-pool → KG-bridge spine.
Status — all delivered (AHE-3.39–3.46)¶
| # | Opportunity | Concept | Where |
|---|---|---|---|
| 1 | Real metric (replaces exact-match) | AHE-3.39 | dspy_optimization.make_optimization_metric |
| 2 | Few-shot demo-set refinement | AHE-3.43 | dspy_optimization.refine_demos |
| 3 | MCP tool descriptions | AHE-3.41 | tool_description registry target |
| 4 | KG extraction prompt | AHE-3.44 | extraction_optimizer.optimize_extraction_prompt |
| 5 | Skill SOP/trigger | AHE-3.42 | skill registry target (SOP already reaches the model via ORCH-1.28) |
| 6 | Concept-matching + routing | AHE-3.45 | policy_optimization.optimize_concept_matcher / optimize_routing_policy |
| 7 | Scheduled sweep + promotion gate | AHE-3.46 | run_optimization_sweep · should_promote · daemon tick |
Each was a registry target or a self-supervised optimizer reusing the one metric/driver/persist spine — not new infrastructure.
Closing the loop — the scheduled sweep (AHE-3.46)¶
The on-demand surface is now matched by a scheduled, propose-only daemon tick — the operational step that makes optimization continuous rather than manual.
flowchart LR
TICK["KG daemon tick<br/>KG_DSPY_OPTIMIZATION (opt-in)"] --> SWEEP["run_optimization_sweep"]
MCP["graph_orchestrate action=optimize_component task=all"] --> SWEEP
SWEEP --> GD["gather_optimization_data<br/>query_cypher: Documents / ADDRESSED_BY / ExecutionTrace"]
GD --> RUN["run_component_optimization per target"]
RUN --> TRAJ["persist OptimizationTrajectory (propose-only)"]
TRAJ --> GATE["should_promote(baseline, candidate, min_delta)"]
GATE -.future auto-apply gate.-> APPLY["distill → files"]
- Daemon tick —
_tick_optimize_components(knowledge_graph/core/engine_tasks.py), registered in the consolidated maintenance scheduler whenKG_DSPY_OPTIMIZATION=True(off by default — each pass is an LLM-gated compile), onKG_DSPY_OPTIMIZATION_INTERVAL(default 3600s). The scheduled twin of the MCP action; both callrun_optimization_sweep. - Sweep — runs the schedulable self-supervised targets (extraction / concept_match /
routing), gathering live data via
gather_optimization_data(engine.query_cypher, degrading tono_datarather than breaking the daemon). - Propose-only — like
KG_GOLDEN_AUTO_MERGE, the sweep records optimization trajectories but never auto-applies.should_promote(baseline, candidate, min_delta)is the gate a candidate must clear on the held-out metric before a future auto-apply step lets it supersede the live artifact.
What remains is genuinely operational tuning: a reachable LLM for the compile, and populated graph data for the gatherers to draw on.
Code paths¶
agent_utilities/harness/dspy_optimization.py— the spine:make_optimization_metric(AHE-3.39),OPTIMIZABLE_TARGETS/OptimizableTarget(AHE-3.40),refine_demos(AHE-3.43),run_dspy_optimization,run_component_optimization.agent_utilities/knowledge_graph/extraction/extraction_optimizer.py— AHE-3.44:extraction_quality(self-supervised metric),optimize_extraction_prompt.agent_utilities/harness/policy_optimization.py— AHE-3.45:classification_accuracy,routing_success_rate,optimize_concept_matcher,optimize_routing_policy.agent_utilities/mcp/tools/analysis_tools.py—graph_orchestrate action=optimize_component(the two-surface entry point;task=allruns the sweep).agent_utilities/harness/dspy_optimization.py— AHE-3.46:run_optimization_sweep,gather_optimization_data,should_promote,SCHEDULABLE_TARGETS.agent_utilities/knowledge_graph/core/engine_tasks.py—_tick_optimize_components(theKG_DSPY_OPTIMIZATIONmaintenance-scheduler tick).agent_utilities/prompting/dspy_compiler.py—compile_json_to_signature,AgentTaskModule.agent_utilities/harness/evolve_agent.py—EvolveAgent._dspy_optimize_cluster(registry-dispatched; was system-prompt-only).agent_utilities/knowledge_graph/dspy_kg_bridge.py—DSPyKGBridge.ingest_evolved_component(generalized;Evolved*Node+OptimizationTrajectoryNode).agent_utilities/knowledge_graph/distillation/physical_distiller.py— the apply side (AHE-3.9/3.11):distill_system_prompt,distill_mcp_tool,distill_skill,commit_distilled_changes(KG-optimized artifacts → files under GitOps).agent_utilities/harness/{variant_pool,preference_pairs,replay_buffer,decentralized_memory,explore_exploit_router,self_guided_play,continuous_evaluation_engine}.py— the substrate + metric sources.agent_utilities/retrieval/capability_index.py—record_outcomereward EMA.agent_utilities/rlm/gepa.py— reflective Pareto prompt optimizer.agent_utilities/knowledge_graph/research/loop_controller.py,agent_utilities/harness/agentic_evolution_engine.py— the live loop drivers.
Relationship to other concepts¶
- AHE-3.1 (mathematical prompt optimization) is the DSPy spine; AHE-3.2 (variant selection) the substrate; AHE-3.17 (preference corpus) the reward model.
- Sampling-profile evolution (AHE-3.38) is the same loop applied to numeric configs — see Sampling Profiles.
- The KG persistence (ORCH-1.8
DSPyKGBridge) makes every optimization a durable, queryableOptimizationTrajectory— optimization history is itself in the graph.