Generalizing GEPA (CONCEPT:ORCH-1.30)

Overview

Makes GEPA-optimized skills transfer beyond the optimization split, as in the AppWorld RLM-GEPA result (the +7.2 pp SGC gain that held on the held-out test set). Adds the GEPA paper's held-out feedback/Pareto split, predict-rlm's AgentSpec anti-overfit grounding, and held-out candidate selection. Extends ORCH-1.13.

How it works

  • Held-out split (split_dataset): D_train is divided into D_feedback (used to propose candidates) and D_pareto (held out for selection). With dev_fraction > 0, GEPAOptimizer.optimize proposes and evaluates candidates on the feedback set, then selects the final candidate by its held-out score (select_best_on_heldout over _score_candidate_on), so a candidate that merely memorized the minibatch does not win.
  • AgentSpec grounding: the use_cases, runtime_grounding, scoring_rule, and counterfactual_axis fields are rendered with as_prompt() and prepended to the reflective-mutation prompt, steering the proposer toward a general standard operating procedure rather than rules that overfit the examples.
  • Patch-merge selection: select_best_on_heldout picks the winning instruction graft on the held-out set (ties go to the earlier/simpler generation).
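
The split and the held-out selection described in the list above can be sketched as follows. This is a minimal illustration, not the code in rlm/gepa.py: the function names come from this page, but the signatures, the seed parameter, the shuffling, the rounding of the held-out size, and the score_fn callback (a stand-in for _score_candidate_on) are all assumptions.

```python
import random
from typing import Callable, Sequence, TypeVar

Example = TypeVar("Example")
Candidate = TypeVar("Candidate")


def split_dataset(
    train: Sequence[Example], dev_fraction: float, seed: int = 0
) -> tuple[list[Example], list[Example]]:
    """Split D_train into (D_feedback, D_pareto). Signature is hypothetical."""
    if not 0.0 <= dev_fraction < 1.0:
        raise ValueError("dev_fraction must be in [0, 1)")
    items = list(train)
    if dev_fraction == 0.0 or len(items) < 2:
        # No held-out set: everything is used for feedback.
        return items, []
    random.Random(seed).shuffle(items)
    # Keep at least one example on each side of the split (assumed policy).
    n_heldout = min(max(round(len(items) * dev_fraction), 1), len(items) - 1)
    return items[n_heldout:], items[:n_heldout]


def select_best_on_heldout(
    candidates: Sequence[Candidate],
    heldout: Sequence[Example],
    score_fn: Callable[[Candidate, Example], float],
) -> Candidate:
    """Return the candidate with the highest mean held-out score.

    `candidates` is assumed to be ordered by generation, oldest first, so the
    strict `>` comparison resolves ties in favour of the earlier generation.
    """
    if not candidates or not heldout:
        raise ValueError("need at least one candidate and one held-out example")

    def mean_score(candidate: Candidate) -> float:
        return sum(score_fn(candidate, ex) for ex in heldout) / len(heldout)

    best, best_score = candidates[0], mean_score(candidates[0])
    for candidate in candidates[1:]:
        score = mean_score(candidate)
        if score > best_score:  # strict: a tie never displaces an earlier one
            best, best_score = candidate, score
    return best
```

In this sketch, candidates are never scored on D_feedback at selection time, so a candidate that only fits the feedback minibatches gains no advantage when the winner is chosen.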

Key files / API

Piece | Location
Generalization core | rlm/gepa.py (AgentSpec, split_dataset, select_best_on_heldout, GEPAOptimizer.optimize(dev_fraction=...), _score_candidate_on)
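
To make the AgentSpec grounding concrete, here is a hedged sketch of how such a spec might be rendered and prepended to a reflection prompt. The four field names and as_prompt() come from this page; the field types, the rendered layout, and the build_reflection_prompt helper are hypothetical and may differ from the actual definitions in rlm/gepa.py or predict-rlm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentSpec:
    # Field names follow this page; the types are assumptions.
    use_cases: tuple[str, ...]
    runtime_grounding: str
    scoring_rule: str
    counterfactual_axis: str

    def as_prompt(self) -> str:
        """Render the spec as plain text (layout is illustrative only)."""
        lines = ["Use cases:"]
        lines += [f"- {case}" for case in self.use_cases]
        lines.append(f"Runtime grounding: {self.runtime_grounding}")
        lines.append(f"Scoring rule: {self.scoring_rule}")
        lines.append(f"Counterfactual axis: {self.counterfactual_axis}")
        return "\n".join(lines)


def build_reflection_prompt(base_prompt: str, spec: Optional[AgentSpec]) -> str:
    """Hypothetical helper: prepend the rendered spec to the mutation prompt."""
    if spec is None:
        return base_prompt
    return f"{spec.as_prompt()}\n\n{base_prompt}"
```

Placing the spec ahead of the example-specific feedback means the proposer reads the general operating context first, which is the steering effect the grounding is meant to have.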

Wiring (≤3 hops)

graph_orchestrate(action="rlm_optimize") → optimize_rlm_skill → GEPAOptimizer.optimize (≤3 hops).

Research provenance

GEPA paper (Agrawal et al., ICLR 2026 — D_feedback/D_pareto split, Algorithm 1); predict-rlm src/rlm_gepa/schema.py (AgentSpec) — verified.