Recursive Language Models (RLM)¶

CONCEPT:ORCH-1.1 — Recursive Language Model Execution

Overview¶

The RLM subsystem provides a persistent Python REPL that enables agents to process arbitrarily long inputs through recursive, programmatic decomposition. Based on Zhang et al. (2025), the key insight is that long prompts should NOT be fed into the neural network directly — they should be treated as part of the environment the LLM symbolically and recursively interacts with.

RLM enables agents to: - Process inputs two orders of magnitude beyond model context windows - Perform unbounded semantic work through recursive sub-calls - Leverage OWL reasoning and KG bulk analysis within execution - Power AHE trace distillation for large-scale evolution analysis (CONCEPT:AHE-3.0)

Architecture¶

┌─────────────────────────────────────────┐
│  RLMEnvironment                         │
│                                         │
│  ┌─────────────────────────────┐        │
│  │  Persistent Globals Dict    │        │
│  │  - context, depth           │        │
│  │  - rlm_query(schema=)       │        │
│  │  - magma_view()             │        │
│  │  - graph_query()            │        │
│  │  - owl_query()      [NEW]   │        │
│  │  - kg_bulk_export() [NEW]   │        │
│  │  - sub_agent_call()         │        │
│  │  - FINAL_VAR()              │        │
│  │  - run_parallel_sub_calls() │        │
│  └─────────────────────────────┘        │
│                                         │
│  execute(code)                          │
│      │                                  │
│      ▼  SandboxRouter (ORCH-1.38)        │
│  ┌─────────────────────────────────────┐│
│  │ ast-analyze → cheapest capable tier ││
│  │ monty → wasm → docker → local        ││
│  │ (escalate on SandboxRejected)        ││
│  └─────────────────────────────────────┘│
└─────────────────────────────────────────┘

Code execution is no longer a hardcoded local/container switch: execute() routes each snippet through the tiered sandbox router (CONCEPT:ORCH-1.38) — see ORCH-1.38 — Tiered RLM Sandbox.

Invocation Triggers¶

RLM is automatically invoked when any of the following conditions are met. No global ENABLE_RLM=True is required — the system uses smart thresholds to route intelligently.

#	Trigger	Condition	Default Threshold
1	Global Override	`ENABLE_RLM=True`	Always
2	Long Horizon	`state.requires_long_horizon=True`	Always
3	Large Output	Tool/specialist output exceeds threshold	50,000 chars
4	AHE Distillation	Trace count exceeds threshold	500 traces
5	KG Bulk Analysis	KG query returns too many nodes	1,000 nodes

Use the unified RLMConfig.should_trigger() method for consistent routing:

from agent_utilities.rlm.config import RLMConfig

config = RLMConfig()
if config.should_trigger(output_size=len(data)):
    # Route to RLM
    ...

Whitepaper Alignment (Algorithm 1)¶

Our implementation aligns with the core algorithm from Zhang et al.:

Metadata-Only Root Prompting (config.metadata_only_root=True): The root LLM receives only constant-size metadata about the context:
context_length — character count
context_prefix — first 200 chars
context_type — inferred type (json, text, csv, xml)
Access instructions (slice, parse, split)

This prevents context window pollution and forces the model to rely on symbolic variable access.

Trimmed Stdout Feedback: Each turn's stdout is stored in a numbered variable (_stdout_N) and only metadata is fed back to the root LLM.
Recursive Sub-Calls: rlm_query() spawns a full sub-RLM at depth+1 with independent context.

Structured Outputs (Subagent Contracts)¶

CONCEPT:ORCH-1.12 — Structured Predict-RLM Runtime

A swarm of sub-agents only helps if the parent can cleanly aggregate what they return. When sub-agents reply with free-form prose, the parent has to re-read and re-classify dozens of unstructured blurbs and frequently loses the plot — it ends up hand-writing an answer instead of routing on the evidence. The fix is to force each sub-agent to return a schema-constrained, typed value that the parent reads directly. The booleans (or models, or lists) act as an external attention mask over the original context.

Passing a schema to sub-agents¶

Both fan-out helpers accept a schema. The sub-RLM's FINAL is validated and coerced against it, so the parent receives a real Python value — not a string to parse:

# Inside an RLM code block — chunk the context, then ask ONE boolean sub-agent
# per chunk whether it's relevant, in parallel. Filter on the typed result.
chunks = [context[i:i+5000] for i in range(0, len(context), 5000)]

flags = await run_parallel_sub_calls([
    {"prompt": "Does this chunk describe where Saltram lives?",
     "context": c, "schema": {"type": "boolean"}}
    for c in chunks
])
relevant = [c for c, keep in zip(chunks, flags) if keep]   # keep is a real bool

# A single typed sub-call:
is_relevant = await rlm_query("Relevant to his living situation?",
                              sub_context=chunk, schema=bool)

Supported schema forms¶

schema= is normalized by SchemaContract.from_spec() (rlm/schema.py) and accepts:

Form	Example	Validated via
Primitive type	`bool`, `int`, `str`, `float`	`pydantic.TypeAdapter`
Typing generic	`list[FindingModel]`, `dict[str, int]`	`pydantic.TypeAdapter`
Pydantic model	`class Finding(BaseModel): ...`	`model_validate`
Raw JSON Schema	`{"type": "boolean"}`	`jsonschema` (shallow fallback if absent)

Validate-on-FINAL, retry-don't-restart¶

When a contract is set, run_full_rlm validates the value the sub-agent passes to FINAL_VAR. On a mismatch the sub-agent is shown the required JSON Schema plus the specific validation errors (path: message) and asked to fix the value and call FINAL_VAR again — the REPL state is preserved, so it never restarts from scratch. The JSON Schema is also injected into the sub-REPL prompt at startup so the model knows the exact shape before it writes any code.

Root-level contracts¶

The same machinery enforces a contract on the root agent. Use a PredictRLM signature (InputField/OutputField) for a multi-field contract, or pass a single typed spec to the entry point:

from agent_utilities.rlm.runner import run_rlm

out = await run_rlm("Is this PR a security risk?", input_text=diff, output_type=bool)
# out["result"] is a real bool

Key Components¶

`RLMEnvironment` (`rlm/repl.py`)¶

The core execution environment. Initializes with a context variable and a set of approved helper functions exposed to the LLM-generated code.

env = RLMEnvironment(
    context={"data": large_dataset},
    depth=0,
    config=RLMConfig(max_depth=3, use_container=False),
    graph_deps=graph_deps,
)
result = await env.run_full_rlm("Analyze the dataset and find anomalies")

`RLMConfig` (`rlm/config.py`)¶

Configuration for RLM behavior:

Parameter	Default	Description
`max_depth`	`3`	Maximum recursion depth
`sandbox`	`"auto"`	Sandbox selection (ORCH-1.38): `auto` routes per-snippet (monty→wasm→docker→local); or pin `local`/`monty`/`wasm`/`docker`. Env `RLM_SANDBOX`.
`use_monty`	`False`	Legacy override: force the monty sandbox (maps onto `sandbox`)
`use_wasm`	`False`	Legacy override: force the wasm sandbox
`use_container`	`False`	Legacy override: force the Docker sandbox
`async_enabled`	`True`	Enable parallel sub-call execution
`sub_llm_model_large`	Provider default	Model for depth-0 reasoning
`sub_llm_model_small`	Provider default	Model for deeper recursion levels
`trajectory_storage`	`"process_flow"`	Where to store reasoning traces
`metadata_only_root`	`True`	Send only metadata to root LLM
`trigger_on_large_output`	`True`	Auto-trigger on large tool outputs
`trigger_on_ahe_distillation`	`True`	Auto-trigger for AHE trace analysis
`trigger_on_kg_bulk_analysis`	`True`	Auto-trigger for KG bulk queries
`ahe_trace_threshold`	`500`	Trace count for AHE auto-trigger
`kg_bulk_threshold`	`1000`	Node count for KG auto-trigger

Available Helpers¶

Functions available inside the RLM execution environment:

Helper	Signature	Purpose
`rlm_query`	`await rlm_query(prompt, context, schema=None)`	Spawn a recursive sub-RLM at depth+1; pass `schema=` to get a validated, typed return
`magma_view`	`await magma_view(query, views)`	MAGMA orthogonal memory views
`graph_query`	`await graph_query(cypher, params)`	Run Cypher against the knowledge graph
`owl_query`	`await owl_query(sparql)`	Run SPARQL against the OWL reasoner
`kg_bulk_export`	`await kg_bulk_export(node_type, limit)`	Export KG nodes as JSON for bulk analysis
`sub_agent_call`	`await sub_agent_call(prompt, agent_id, data)`	Dispatch to specialist agent
`FINAL_VAR`	`FINAL_VAR("name", value)`	Output the final result
`run_parallel_sub_calls`	`await run_parallel_sub_calls(calls)`	Run multiple sub-calls in parallel; each call dict may carry a per-call `"schema"`

`run_full_rlm()` Loop¶

The main agent loop:

LLM generates a response (potentially containing ```python blocks)
Code blocks are extracted and executed via execute()
stdout is captured and fed back to the LLM
If FINAL_VAR was called, the result is returned
Otherwise, the loop continues (up to max_turns=5)

AHE Integration (CONCEPT:AHE-3.0)¶

RLM is deeply integrated with the Agentic Harness Engineering evolution loop:

TraceDistiller × RLM¶

When the AHE TraceDistiller encounters more than ahe_trace_threshold (default: 500) failure traces in an evolution round, it automatically delegates clustering to an RLM sub-agent. The RLM can programmatically:

Loop over all failure entries
Apply semantic similarity grouping
Cross-reference with KG data via graph_query() and owl_query()
Produce structured FailureCluster objects

Falls back to keyword-based clustering when RLM is disabled or trace count is below threshold.

EvolveAgent × RLM¶

When the serialized EvidenceCorpus exceeds the context threshold, the EvolveAgent._deep_analyze_evidence() method uses RLM to:

Programmatically analyze all evidence entries
Cross-reference failure patterns with KG provenance chains
Produce a prioritized list of ComponentEdit proposals

KG/OWL Integration¶

`owl_query(sparql)`¶

Executes SPARQL queries against the OWL reasoner backend from within the RLM REPL. Enables transitive reasoning without loading raw triples into the context window:

# Inside RLM code block
results = await owl_query("""
    PREFIX au: <http://agent-utilities.dev/ontology#>
    SELECT ?manifest ?edit WHERE {
        ?manifest a au:ChangeManifest .
        ?manifest au:hasEditFor ?edit .
    }
""")
for r in results:
    print(f"Manifest {r['manifest']} -> Edit {r['edit']}")

`kg_bulk_export(node_type, limit)`¶

Exports KG nodes as JSON dicts for programmatic analysis. The LLM can aggregate, filter, and cross-reference nodes without context pollution:

# Inside RLM code block
memories = await kg_bulk_export("memory", limit=200)
failures = [m for m in memories if "error" in m.get("name", "").lower()]
FINAL_VAR("failure_memories", json.dumps(failures))

Security Considerations¶

CWE-94 (Code Injection): The RLM REPL intentionally uses exec() to execute LLM-generated code. This is by design — the execution namespace is restricted to approved helpers only.

Mitigations¶

Restricted globals: Only approved functions and modules (json, asyncio, nx) are exposed
Container mode: Set use_container=True to run code in an isolated Docker container
Recursion limits: max_depth prevents infinite recursion
Turn limits: Maximum 5 turns per RLM invocation
Trajectory storage: All executions are logged for audit

`rlm_large_output_hook` (`rlm/hook.py`)¶

A pre-call lifecycle hook (async def rlm_large_output_hook(input: HookInput)) that auto-routes oversized tool/specialist outputs into an RLM pass before they reach the root LLM context.

`RecursiveReasonerSpecialist` (`rlm/specialist.py`)¶

Integration with the graph executor as a specialist node (wrapping an RLMEnvironment via run_with_context()). The recursive reasoner can be routed to by the graph router for tasks requiring iterative code execution. A module-level recursive_reasoner_tool(...) coroutine exposes the same capability as a callable tool.

Example: Multi-Step Data Analysis¶

from agent_utilities.rlm.repl import RLMEnvironment
from agent_utilities.rlm.config import RLMConfig

config = RLMConfig(max_depth=2, use_container=True)
env = RLMEnvironment(
    context={"csv_data": "..."},
    config=config,
)

# The LLM will iteratively write code to analyze the data
result = await env.run_full_rlm(
    "Parse the CSV data, identify outliers using IQR method, "
    "and produce a summary report with FINAL_VAR."
)