Skip to content

Self-Improving Reasoning Substrate

The unified loop that ties the AGI→ASI work (arXiv:2606.12683 gap analysis) into one native system: route → reason → measure → learn, wrapped in corrigible, cost-bounded, capability-ratcheted governance, and — at scale — a market of specialist reasoner-agents whose winning traces distil back into training data.

This document is the architecture view of the concepts implemented from the "From AGI to ASI" gap analysis (reports/agi-to-asi-gap-analysis-2026-06-13.md). The insight: these are not independent features — they compose into a single feedback loop where the system improves how it thinks, measures whether it improved, and bounds the cost and the risk of doing so.

The core loop

flowchart LR
    task([Task]) --> ROUTE
    subgraph envelope["governance envelope — SAFE-1.5 corrigible · OS-5.33/5.35 cost-bounded"]
      ROUTE["ROUTE\nKG-2.68 ReasonerRouter\n(CapabilityIndex reward-EMA designate)"]
      REASON["REASON\nparadigms: KG-2.69 induction ·\nKG-2.67 model-based · deductive · generative"]
      MEASURE["MEASURE\nSAFE-1.1 frontier scorers +\nAHE-3.24 capability ratchet + reliability"]
      LEARN["LEARN\nrecord_outcome → paradigm\nreward EMA self-adjusts"]
      ROUTE --> REASON --> MEASURE --> LEARN
      LEARN -- "routing reward" --> ROUTE
    end
    LEARN --> LEDGER["AHE-3.26 / SAFE-1.3\nRSI velocity ledger\n(improving? / research-gets-harder?)"]
    MEASURE -. "winning traces" .-> DISTIL["OS-5.34 search-distillation\n→ training corpus\n(SAFE-1.4 collapse-guarded)"]
    REASON -. "at scale" .-> MARKET["ORCH-1.46 market →\nORCH-1.47 emergent specialists →\nORCH-1.48 hierarchical coordination"]

The loop is closed: a paradigm's measured score becomes its routing reward, so the router learns which way of thinking works for which task class — the paper's recursive-improvement pathway applied to reasoning itself.

Why this is robust, not a switch

A naive paradigm selector is an if task_is_logic: use_owl. That cannot improve and encodes a guess. Instead the router reuses the proven reward-aware retrieval substrate (CapabilityIndex.designate, which already ranks candidates by similarity blended with a learned reward EMA, and record_outcome, which updates it). Registering each paradigm as a capability entity makes paradigm selection a learned designation — identical machinery to how the platform already routes tools/agents, so it inherits the same calibration, auditability, and tag-gating. The seam is at the routing layer, so the structurally different engines (a generative LLM, an inductive synthesizer, a model-based planner, a deductive chainer) never have to be forced under one implementation.

C4 — Level 3: Reasoning Substrate components

C4Component
    title Component diagram — Self-Improving Reasoning Substrate

    Container_Boundary(kg, "Knowledge Graph (KG-2)") {
        Component(facade, "KnowledgeGraph.reason()", "facade.py", "First-class entry: dispatch a ReasoningTask")
        Component(router, "ReasonerRouter", "core/reasoner.py — KG-2.68", "Routes by tags + learned reward EMA; feeds score back")
        Component(capidx, "CapabilityIndex", "retrieval/capability_index.py", "designate() reward-blended ranking + record_outcome() EMA")
        Component(induct, "ProgramSynthesisReasoner", "harness/program_synthesis.py — KG-2.69", "Inductive: shortest program (MDL/Occam prior)")
        Component(world, "WorldModelReasoner", "core/world_model.py — KG-2.67", "Model-based: roll a policy over predicted transitions")
        Component(deduce, "DeductiveReasoner", "core/reasoner.py", "Symbolic: forward-chain to a fixpoint")
        Component(gen, "GenerativeReasoner", "core/reasoner.py", "Generative: an injected completion fn")
    }
    Container_Boundary(ahe, "Agentic Harness (AHE-3) + Safety (SAFE-1)") {
        Component(frontier, "Frontier scorers", "harness/frontier_scorers.py — SAFE-1.1", "Non-saturating: compression, Elo, saturation detector")
        Component(ratchet, "CapabilityRatchet", "research/capability_ratchet.py — AHE-3.24", "Monotone capability gate on a baseline")
        Component(ledger, "ImprovementLedger", "research/improvement_ledger.py — AHE-3.26/SAFE-1.3", "RSI velocity: improving / stalling")
        Component(corr, "Corrigibility", "core/corrigibility.py — SAFE-1.5", "Yield-without-resisting + irreversibility aversion")
    }

    Rel(facade, router, "delegates")
    Rel(router, capidx, "designate / record_outcome")
    Rel(router, induct, "runs")
    Rel(router, world, "runs")
    Rel(router, deduce, "runs")
    Rel(router, gen, "runs")
    Rel(router, frontier, "scores results")
    Rel(ratchet, ledger, "feeds capability deltas")
    Rel(corr, router, "wraps autonomous loops")

C4 — Dynamic: one routed reasoning step

sequenceDiagram
    participant C as Caller
    participant F as KnowledgeGraph.reason()
    participant R as ReasonerRouter
    participant I as CapabilityIndex
    participant P as Paradigm (e.g. ProgramSynthesis)
    C->>F: ReasoningTask(goal, tags, payload)
    F->>R: reason(task)
    R->>I: designate(emb, required_caps=tags, reward_weight)
    I-->>R: top paradigm (tag-gated, reward-ranked)
    R->>P: reason(task)
    P-->>R: ReasoningResult(answer, score)
    R->>I: record_outcome(paradigm, reward=score)
    Note over I: paradigm reward EMA self-adjusts
    R-->>F: result (trace.routed_to)
    F-->>C: result

Concept → role map

Concept Module Role in the loop
KG-2.68 core/reasoner.py ROUTE — learning paradigm router (keystone)
KG-2.69 harness/program_synthesis.py REASON — inductive paradigm + MDL/Occam selection prior
KG-2.67 core/world_model.py REASON — model-based planning paradigm
SAFE-1.1 harness/frontier_scorers.py MEASURE — non-saturating progress signals
AHE-3.24 research/capability_ratchet.py MEASURE — monotone capability gate (+ AHE-3.22/3.23 generate/verify)
AHE-3.26 / SAFE-1.3 research/improvement_ledger.py LEARN — RSI velocity / research-gets-harder signal
OS-5.33 / OS-5.35 orchestration/cost_governor.py ENVELOPE — cost/throughput bound on compute & scaling
SAFE-1.5 core/corrigibility.py ENVELOPE — corrigible, irreversibility-averse autonomy
OS-5.34 / SAFE-1.4 (search-distillation) DISTIL — winning traces → guarded training corpus
ORCH-1.46/47/48 (multi-agent) SCALE — market → emergent specialists → hierarchical coordination

Engineering around the weaknesses

The paper's recurring thesis — for every friction, you need a countermeasure or a way to measure whether it binds — is the design rule here:

  • Paradigms don't unify → unify at the routing layer, not the implementation.
  • Can't tell capability from metric saturation → non-saturating frontier scorers (SAFE-1.1).
  • Self-improvement may degenerate / saturate → capability ratchet (AHE-3.24) + RSI velocity ledger (AHE-3.26) that flags a non-positive derivative.
  • Autonomy is brittle as the human leaves the loop → corrigibility + irreversibility aversion (SAFE-1.5) as objective-level, not just permission-level, safety.
  • The data wall → distil the loop's own verified winning traces back into training data (OS-5.34), guarded against model collapse (SAFE-1.4).

Status

Implemented + merged (local main): KG-2.67, KG-2.68, KG-2.69, SAFE-1.1, SAFE-1.3, SAFE-1.5, AHE-3.22, AHE-3.23, AHE-3.24, AHE-3.26, OS-5.35. The DISTIL and SCALE stages (OS-5.34/AHE-3.25, ORCH-1.46/47/48, SAFE-1.4) extend the same loop and are tracked in the gap-analysis report.