Enabling Autonomous Evolution
The platform's self-evolution arcs are fully wired but off by default: the
daemon ticks for the golden loop (KG-2.7) and failure-driven evolution
(AHE-3.18) are registered in the engine's maintenance scheduler, yet their
flags default to False in code. That is deliberate — turning a fleet
autonomous is a deployment decision, made in your .env, never a library
default.
This guide describes the safety chain you get when you turn the loops on, and exactly which flags do what.
The safety chain
Every autonomous change passes a chain of independent stages, each of which can stop it:
flowchart LR
A["Propose-only loops<br/>(golden loop, failure ingest,<br/>anomaly consumer, fleet events)"] --> B["Governed validation<br/>PromotionGovernanceValidator<br/>(AHE-3.20)"]
B --> C["Regression gate<br/>recorded RegressionGateResult<br/>(AHE-3.18)"]
C --> D["Merge<br/>human by default;<br/>auto only with KG_GOLDEN_AUTO_MERGE"]
D --> E["Promotion policy gate<br/>ActionPolicy merge_promotion<br/>(OS-5.24) — deny blocks the flip"]
E --> F["Publication<br/>same merge_promotion approval<br/>(AHE-3.21, approval by default)"]
F --> G["Reviewable branch<br/>change synthesis + RLM sandbox<br/>(AHE-3.21) — never pushed"]
G --> H["Human merges<br/>normal release flow"]
- Propose-only loops. The golden loop, the failure-evolution sweep, the PerformanceAnomaly consumer (AHE-3.19) and fleet-event triage (OS-5.15) only ever write proposals: failure_gap Concept topics, spec drafts under .specify/, and TeamSpec/AgentSpec proposal nodes. No code executes, nothing is promoted.
- Governed validation. GovernedAutoMerger now constructs the production PromotionGovernanceValidator by default (knowledge_graph/research/promotion_governance.py). A promotion candidate must clear all four rules: MergePolicy quality thresholds, the bundled SHACL governance shapes (shapes/governance.shapes.ttl), the recorded regression-gate verdict, and active constitution forbid rules in the KG.
- Regression gate. Failure remediations carry a live regression check bound to the failures they address; every verdict is also persisted as a RegressionGateResult node, and a recorded hold blocks promotion until a later gate run records a pass.
- Human merge. With KG_GOLDEN_AUTO_MERGE unset (the default), even a proposal that passes every gate stays proposal-only; promotion is a human act. Flipping it on delegates only the final step, and only to the governed, audited path above, with every decision logged through the golden_loop.auto_merge audit trail.
- Operational promotion gate (OS-5.24 adoption). Even with auto-merge on, the merger's own promotion decision consults the operational ActionPolicy under the reserved merge_promotion kind before the lifecycle flip (research/auto_merge.py). A deny (a forbidden tier, a rate-limit breach, or a policy-engine failure, since the gate fails closed) blocks the promotion and is recorded on the evaluation and the audit trail. The shipped approval_required tier queues the same ActionApproval the publication step consumes (deduped per kind+target), so the KG-internal lifecycle flip proceeds while the real-world change stays human-gated; auto/auto_notify tiers proceed (with the policy's own notification).
- Publication as a reviewable branch (AHE-3.21). Promotion used to end at a KG lifecycle flip. The evolution→branch bridge closes that gap: a merged proposal is materialized into a concrete change set and published as a local git branch, gated by the operational ActionPolicy's reserved merge_promotion action kind (OS-5.24), which ships approval_required. Nothing is ever pushed or merged to main; a human reviews the branch and takes it through the normal release flow.
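
The operational promotion gate in the list above can be pictured as a small fail-closed decision function. The following is a minimal sketch only: GateDecision, consult_promotion_gate, and the lookup_tier callable are hypothetical names invented for this example and are not the agent-utilities API. The tier names and the fail-closed behaviour come from the text; treating an unrecognised tier as a deny is an extra assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Tier names are taken from the text; everything else is hypothetical.
PROCEED_TIERS = {"auto", "auto_notify"}


@dataclass(frozen=True)
class GateDecision:
    allow_flip: bool      # may the KG-internal lifecycle flip proceed?
    queue_approval: bool  # should a merge_promotion approval be queued?
    reason: str


def consult_promotion_gate(
    lookup_tier: Callable[[str], str],
    rate_limited: bool = False,
) -> GateDecision:
    """Decide a merge_promotion action, failing closed on any error."""
    try:
        tier = lookup_tier("merge_promotion")
    except Exception as exc:  # policy-engine failure => deny
        return GateDecision(False, False, f"policy engine failed: {exc!r}")
    if rate_limited:
        return GateDecision(False, False, "rate limit breached")
    if tier == "forbidden":
        return GateDecision(False, False, "forbidden tier")
    if tier == "approval_required":
        # The flip proceeds; the real-world change waits for a human grant.
        return GateDecision(True, True, "approval queued")
    if tier in PROCEED_TIERS:
        return GateDecision(True, False, f"tier {tier}")
    # Assumption: an unknown tier is denied, in keeping with fail-closed.
    return GateDecision(False, False, f"unknown tier {tier!r}")
```

With the shipped approval_required tier, this sketch returns a decision that allows the lifecycle flip and queues an approval, which mirrors the "flip proceeds, real-world change stays human-gated" behaviour described above.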
The evolution→branch bridge (AHE-3.21)
Two modules under knowledge_graph/research/ implement the bridge:
- change_synthesis.py: materializes a promoted proposal into a ChangeSet, with no LLM calls (generation belongs to the golden loop's synthesize/distill stages):
    - a proposal that embeds explicit file artifacts (files/files_json=[{"path", "content"}, ...]) becomes a kind="code" change set, validated through the tiered RLM sandbox (ORCH-1.38): per-file syntax compile + best-effort import. Proposal-named tests (tests/tests_json) are run later in the publisher's worktree, where the full repository context exists; a snippet sandbox cannot run repo-relative pytest (an honest v1 limit). Sandbox-invalid change sets are never published.
    - a prose-only proposal (most SpecDrafts/TeamSpecs) becomes a kind="sdd_plan" change set: an SDD skeleton under .specify/specs/<topic>/ (spec.md + tasks.md). For prose, that skeleton is the reviewable artifact.
- change_publisher.py: the publication seam. ChangePublisher is a protocol (publish(change_set, metadata) -> PublishResult); the default LocalBranchPublisher uses plain git: it adds a fresh worktree off the target repo's default branch under EVOLUTION_WORKTREE_ROOT (default data_dir()/evolution_worktrees, never a checkout's working tree), applies the change set, runs proposal-named tests + the injected regression gate (make_regression_check, AHE-3.18), and commits citing the proposal + concept ids. The result (branch, sha, gate verdict) is recorded as a ProposalPublication node linked PUBLISHED_AS from the proposal, stamped onto the proposal node, and mirrored into ActionExecution + the golden_loop.publish_proposal audit trail.
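
As a rough illustration of the code / sdd_plan split, the sketch below picks a change-set kind from a proposal and runs a per-file syntax-compile step with Python's built-in compile(). The proposal dict layout and the helper names pick_kind and syntax_errors are assumptions made for this example; the real sandbox (ORCH-1.38) also attempts a best-effort import, which is left out here.

```python
def pick_kind(proposal: dict) -> str:
    """Return "code" when explicit file artifacts are embedded, else "sdd_plan"."""
    return "code" if proposal.get("files") else "sdd_plan"


def syntax_errors(files: list[dict]) -> dict[str, str]:
    """Compile each .py file without executing it; map path -> error message."""
    errors: dict[str, str] = {}
    for item in files:
        path, content = item["path"], item["content"]
        if not path.endswith(".py"):
            continue  # only Python sources are compiled in this sketch
        try:
            compile(content, path, "exec")
        except SyntaxError as exc:
            errors[path] = f"line {exc.lineno}: {exc.msg}"
    return errors


# Hypothetical proposal with one valid and one broken file.
proposal = {
    "files": [
        {"path": "pkg/ok.py", "content": "def f():\n    return 1\n"},
        {"path": "pkg/bad.py", "content": "def f(:\n    return 1\n"},
    ]
}
kind = pick_kind(proposal)
errors = syntax_errors(proposal["files"])
# Sandbox-invalid change sets are never published.
publishable = kind == "code" and not errors
```

Here the broken file makes the whole change set unpublishable, which is the behaviour the text describes for sandbox-invalid change sets.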
The human workflow (approve → publish → merge)
- A proposal merges through the governed chain (or you decide to publish a promoted one). With the shipped policy the merger's own promotion consult has already queued the merge_promotion approval; the bridge consults the ActionPolicy again and dedups to that same approval, visible in GET /api/fleet/approvals.
- A human grants it: POST /api/fleet/approvals/grant with the approval's job_id. (Granted merge_promotion approvals are deliberately not drained by the fleet reconciler; they belong to the bridge.)
- The human (or any agent surface) triggers the one-shot publication: graph_orchestrate(action="publish_proposal", task="<proposal node id>") over MCP, or REST POST /api/graph/orchestrate/publish-proposal with {"proposal_id": "..."}. The granted approval is consumed; the change set is synthesized, sandbox-validated, and published as a local branch.
- Review the branch (PublishResult.worktree_path / branch on the proposal node), then merge + release through the normal flow (workspace-validator phased auto_push). The bridge never pushes.
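
The grant and publish steps can be scripted over REST. The sketch below only builds the two requests with the standard library and does not send them. The base URL, the absence of auth headers, and the {"job_id": ...} body for the grant call are assumptions for illustration; the text states only the paths, that the grant carries the approval's job_id, and the {"proposal_id": "..."} body for publication.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: replace with your deployment


def _post(path: str, payload: dict) -> Request:
    """Build (but do not send) a JSON POST request."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def grant_request(job_id: str) -> Request:
    # Assumed body shape; the text only says the grant carries the job_id.
    return _post("/api/fleet/approvals/grant", {"job_id": job_id})


def publish_request(proposal_id: str) -> Request:
    # Body shape taken from the text: {"proposal_id": "..."}.
    return _post(
        "/api/graph/orchestrate/publish-proposal",
        {"proposal_id": proposal_id},
    )


# Placeholder identifiers; send with urllib.request.urlopen(req) once the
# URL and any authentication are settled for your deployment.
grant = grant_request("job-123")
publish = publish_request("proposal-abc")
```

Keeping request construction separate from sending makes it easy to log or review exactly what a human (or agent surface) is about to approve.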
A deployment that wants zero manual steps can relax the tier with a KG
override — governance_rule {scope: 'action_policy', kind: 'merge_promotion',
tier: 'auto'} — at which point a merged proposal publishes its branch
immediately (still local, still human-merged).
Wiring an MCP-backed publisher
agent-utilities takes no hard dependency on repository-manager. A deployment
that wants publication to flow through its repo tooling (e.g. the
rm_git/rm_worktree MCP tools, which can also open a hosted PR) registers
its own publisher at startup — same seam pattern as set_fleet_actuator:
from agent_utilities.knowledge_graph.research.change_publisher import (
    PublishResult,
    set_change_publisher,
)


class RepositoryManagerPublisher:
    name = "repository_manager"

    def __init__(self, mcp_call):  # e.g. a bound multiplexer client
        self._call = mcp_call

    def publish(self, change_set, metadata=None):
        # rm_worktree: create a worktree + branch; rm_git: apply/commit (and
        # optionally push + open a PR; that policy lives in the deployment,
        # not in agent-utilities).
        ...
        return PublishResult(ok=True, branch=..., commit_sha=..., repo_path=...)


set_change_publisher(RepositoryManagerPublisher(mcp_call))
Flags
All flags are typed AgentConfig fields (see
Configuration); set them in the deployment .env (see the
commented blocks in .env.example and docker/mcp.compose.yml).
| Flag | Default | Effect |
|---|---|---|
| KG_LOOP | false | Hourly propose-only self-evolution cycle (intake → acquire → resolve → distill/synthesize proposals). |
| KG_LOOP_INTERVAL / KG_LOOP_TOPICS | 3600 / 5 | Tick cadence and per-cycle topic budget. |
| KG_FAILURE_EVOLUTION | false | Pull Langfuse failures → failure_gap topics → regression-gated remediation cycle. |
| KG_FAILURE_EVOLUTION_INTERVAL / KG_FAILURE_EVOLUTION_WINDOW | 3600 / 86400 | Tick cadence and telemetry look-back. |
| KG_ANOMALY_CONSUMER | true | Consume unconsumed PerformanceAnomaly nodes into failure_gap topics (cheap, LLM-free, propose-only; on by default). |
| KG_GOLDEN_AUTO_MERGE | false | Allow governed proposal→active promotion. Keep false until you trust the proposal stream. |
| KG_GOLDEN_MERGE_THRESHOLD | 0.85 | Minimum proposal quality score for auto-merge eligibility. |
| EVOLUTION_WORKTREE_ROOT | data_dir()/evolution_worktrees | Where the AHE-3.21 bridge creates fresh git worktrees when publishing a promoted proposal as a local branch. |
| FLEET_EVENTS_TOKEN | unset | Shared secret for the POST /api/fleet/events monitoring-webhook ingress (OS-5.15). |
| FLEET_RECONCILER | false | Desired-state fleet reconciler tick: registry vs observed, converged through the OS-5.24 ActionPolicy gate (see Fleet Autonomy). |
| ACTION_POLICY_PATH | shipped default | Operational action policy (tiers / rate limits / maintenance windows / blast-radius caps); the shipped default keeps every mutating action approval-required (OS-5.24). |
Recommended rollout
- Enable KG_LOOP=true and KG_FAILURE_EVOLUTION=true and watch the proposal stream (EvolutionCycle nodes, failure_gap Concepts, audit log) for a few cycles. Nothing merges.
- Point Alertmanager / Uptime Kuma at POST /api/fleet/events (set FLEET_EVENTS_TOKEN) so production incidents also feed the loop. Critical events now dispatch the OS-5.26 remediation playbooks; with the shipped action policy every mutating step lands in GET /api/fleet/approvals instead of executing.
- Only once the proposals are consistently sane, consider KG_GOLDEN_AUTO_MERGE=true. Every promotion remains gated by the AHE-3.20 validator + regression gate and is fully audited; rejected proposals stay proposal-only for human review.
- With auto-merge on, merged proposals additionally queue a merge_promotion approval (AHE-3.21). Work the approve → publish → merge loop above; only relax the tier to auto once you trust the published branches. Even then nothing is pushed without a human.
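
One way to keep the rollout honest is to track the stages as cumulative flag sets and render the .env lines for the stage you are at. The sketch below is illustrative only: the flag names and values come from the table and steps above, while the stage numbering, the placeholder secret, and the helper names env_for_stage and render_env are assumptions. The final step (relaxing the merge_promotion tier) is a KG override rather than an environment flag, so it is not modelled here.

```python
# Flag names and values come from this guide; the staging helper is hypothetical.
ROLLOUT_STAGES = [
    {"KG_LOOP": "true", "KG_FAILURE_EVOLUTION": "true"},  # 1: watch proposals
    {"FLEET_EVENTS_TOKEN": "<shared-secret>"},            # 2: webhook ingress
    {"KG_GOLDEN_AUTO_MERGE": "true"},                     # 3: governed auto-merge
]


def env_for_stage(stage: int) -> dict[str, str]:
    """Cumulative flag set after completing rollout stages 1..stage."""
    if not 1 <= stage <= len(ROLLOUT_STAGES):
        raise ValueError(f"stage must be in 1..{len(ROLLOUT_STAGES)}")
    env: dict[str, str] = {}
    for flags in ROLLOUT_STAGES[:stage]:
        env.update(flags)
    return env


def render_env(env: dict[str, str]) -> str:
    """Render KEY=value lines suitable for pasting into a .env file."""
    return "\n".join(f"{key}={value}" for key, value in env.items())


print(render_env(env_for_stage(2)))
```

Because the sets are cumulative, KG_GOLDEN_AUTO_MERGE stays absent (and therefore at its false default) until stage 3 is chosen explicitly.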
Closing the loop: generate, verify, ratchet (AHE-3.22 / AHE-3.23 / AHE-3.24)
Through AHE-3.21 the loop could branch a code change, but nothing on the live
path ever generated the diff — every real proposal fell back to the prose SDD
skeleton and a human wrote the code. These three concepts close that gap; together
they turn "branch a change" into "branch a verified, capability-ratcheted
change". All three sit inside the existing governed_publish flow, so the OS-5.24
merge_promotion ActionPolicy gate (default: human approval queue) still fronts
everything — nothing here can auto-merge or push.
- AHE-3.22: autonomous code-synthesis (research/code_synthesis.py). Before synthesis, for a proposal that names a resolvable, existing, repo-relative .py target and carries no embedded files, a single-file generator reads that file and emits a {path, content} edit, fed into the unchanged synthesize_change_set → validate_in_sandbox → publisher pipeline via the new extra_files seam. Safety envelope: single attributed .py file only; un-attributed proposals fall through to the prose skeleton exactly as before; the generated file is sandbox-validated (a broken diff is never branched); the default generator self-degrades to "no edit" when no model is reachable. The LLM call lives in code_synthesis.py; change_synthesis.py stays generation-free.
- AHE-3.24: capability ratchet (research/capability_ratchet.py). After a branch is published, a standing capability suite is run in that worktree, producing a per-capability score vector compared against a persisted CapabilityScoreVector baseline node. Every tracked capability must stay at-or-above baseline (monotone ratchet); a passing run advances the baseline, the first run bootstraps it. A worktree with no probes present is not measured and never blocks. The recorded CapabilityRatchetResult is consulted by the AHE-3.20 promotion-governance gate as an additional predicate.
- AHE-3.23: verified apply→verify→rollback. The keep/abandon decision is the authoritative recommendation from the existing ManifestVerifier (confirm/partial_revert/full_revert, derived from the measured benchmark delta), fed the ratchet's before/after scores. On a *_revert recommendation, or any per-capability regression, governed_publish abandons the branch (git worktree remove + branch -D); since the branch was never pushed, the publication is fully undone. The probe set (DEFAULT_CAPABILITY_TARGETS) is tunable.
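
The monotone ratchet and the keep/abandon rule can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the contents of research/capability_ratchet.py: the function names ratchet and keep_branch, the verdict strings, and the plain-dict score vectors are hypothetical, and treating a tracked capability that is missing from a run as a regression is an assumption the text does not settle.

```python
from typing import Optional


def ratchet(
    baseline: Optional[dict[str, float]],
    scores: dict[str, float],
) -> tuple[str, Optional[dict[str, float]], list[str]]:
    """Return (verdict, next_baseline, regressed_capabilities).

    verdict is one of "not_measured", "bootstrap", "pass", "regress".
    """
    if not scores:
        # No probes ran in the worktree: not measured, never blocks.
        return "not_measured", baseline, []
    if baseline is None:
        # The first run bootstraps the baseline.
        return "bootstrap", dict(scores), []
    # Assumption: a tracked capability missing from this run is a regression.
    regressed = sorted(
        name
        for name, floor in baseline.items()
        if scores.get(name, float("-inf")) < floor
    )
    if regressed:
        return "regress", baseline, regressed
    # A passing run advances the baseline (every score is >= its old floor).
    return "pass", {**baseline, **scores}, []


def keep_branch(recommendation: str, regressed: list[str]) -> bool:
    """Abandon on any *_revert recommendation or any per-capability regression."""
    return not recommendation.endswith("_revert") and not regressed
```

For example, a run that improves one capability but dips another below its floor yields a "regress" verdict with the baseline left untouched, and keep_branch then returns False regardless of the verifier's recommendation.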