Skip to content

Task-Aware Sampling Profiles (CONCEPT:ORCH-1.58 / AHE-3.38 / KG-2.94–2.96)

The router already decides which model answers a question. Sampling profiles decide how to sample from it — temperature, top_p, top_k, min_p, repetition_penalty, max_tokens, penalties — per question, chosen by task class, evolved from outcomes, and mapped into the one OWL/RDF ontology.

Why

Model selection was dynamic (AdaptiveAgentRouter, ModelRegistry role routing, RLM depth tiers) but the inference parameters were static: a single ModelSettings was built once at agent construction (agent/factory.py) from core/config.py defaults (temperature 0.7, top_p 1.0) and never overridden per call. A deterministic extraction task and an open-ended brainstorm were sampled identically. vLLM already accepts per-request knobs via the OpenAI-compatible extra_body (proven in knowledge_graph/extraction/fact_extractor.py), but nothing threaded them dynamically, and neither models nor parameters existed in the ontology. This closes all three gaps without disturbing the static default — the old behaviour becomes the floor a profile merges over.

The shape: one value object, four flows

A SamplingProfile (agent/sampling_profile.py) bundles the per-call knobs. Every knob is optional; None means inherit the agent's base setting. It moves through the system exactly the way model selection already does:

flowchart LR
    Q["Question / task text"] --> CL["classify_task / router features"]
    CL --> SEL["ModelRegistry.pick_profile_for_task/role"]
    SEL --> P["SamplingProfile"]
    P -->|"to_model_settings over static base"| MS["per-call ModelSettings"]
    MS --> RUN["Agent.run model_settings=..."]
    RUN --> VLLM[("vLLM / OpenAI-compatible<br/>temperature·top_p·extra_body")]
    RUN --> OUT["outcome"]
    OUT -->|"reward EMA"| EV["AHE-3.38 evolve_profile"]
    EV -->|"promote winner"| REG[("registry.task_class_profiles")]
    REG -.->|"next call"| SEL
    P -.->|"projected"| OWL[("OWL: InferenceProfile / Model")]

Layer A — per-call threading (CONCEPT:ORCH-1.58)

SamplingProfile.to_model_settings(base) builds the per-call override from the agent's static base settings — because pydantic-ai replaces (does not deep-merge) per-call model_settings, an unset knob must inherit the base value. The vLLM-only knobs (top_k/min_p/repetition_penalty) are dict-merged into extra_body, so pre-existing keys (e.g. RLM's chat_template_kwargs) survive.

attach_profile_resolver(agent, base) wraps the agent's run/run_sync/run_stream in factory.create_agent, so every call resolves a profile from the prompt unless the caller passes an explicit model_settings. DEFAULT_PROFILE (all-None) reproduces today's behaviour exactly, guaranteeing zero change when no specific profile is resolved.

sequenceDiagram
    participant C as Caller
    participant W as run-wrapper (attach_profile_resolver)
    participant R as resolve_sampling_profile
    participant Reg as ModelRegistry
    participant A as pydantic-ai Agent
    C->>W: agent.run("extract fields from invoice")
    alt caller passed model_settings
        W->>A: run(... model_settings=caller's)
    else resolve per task
        W->>R: classify_task(prompt)
        R->>Reg: pick_profile_for_task("extraction")
        Reg-->>R: SamplingProfile(temp 0.0, top_k 20, min_p 0.0)
        R-->>W: profile
        W->>A: run(... model_settings=profile.to_model_settings(base))
    end
    A->>A: extra_body merged, base knobs preserved

RLM (rlm/repl.py) threads a depth-tiered profile explicitly: the root is the strong proposer (rlm-proposer → reasoning profile), recursive sub-calls are deterministic executors (rlm-executor → code profile).

Layer B — task-aware selection (CONCEPT:ORCH-1.58)

ModelRegistry (models/model_registry.py) carries curated task_class_profiles (module constants — no env flags) and pick_profile_for_task / pick_profile_for_role mirroring the existing pick_for_task / pick_for_role (ORCH-1.27) tier-fallback semantics, so selecting a model and selecting a profile share one task-class key. The AdaptiveAgentRouter populates RoutingDecision.sampling_profile from the TF-IDF task features it already computes (no new feature extraction).

Hand-tuned seeds (evolution refines them in place):

Task class temperature top_p extra knobs Intent
code 0.1 0.9 top_k 20, min_p 0 one right answer
extraction 0.0 0.8 top_k 20, min_p 0 deterministic
judge 0.0 1.0 stable verdicts
reasoning 0.6 0.95 balanced
plan 0.4 0.9 structured
generate 0.7 1.0 natural prose
brainstorm 1.0 1.0 presence_penalty 0.3 spread
default (inherit) (inherit) zero-change fallback

Layer C — evolution (CONCEPT:AHE-3.38)

The VariantPool (AHE-3.2) docstring always named "mutating configuration parameters (temperature, …)" as a parametric dimension — now it is a live loop. evolve_profile mutates the incumbent (Gaussian jitter on floats, ±step on ints, clamped to bounds and SHACL-gated via sampling_profile_violations), scores each candidate by the capability-reward EMA (CapabilityIndex.record_outcome), and tournament-promotes the winner into registry.task_class_profiles — which Layer B reads on the next route. No new RL machinery: it reuses the existing reward EMA and tournament.

flowchart TD
    INC["incumbent profile (task_class)"] --> MUT["mutate_profile<br/>bounded + SHACL-gated"]
    MUT --> CANDS["candidate profiles"]
    CANDS --> EVAL["evaluator → reward 0..1"]
    INC --> EVAL
    EVAL --> EMA["CapabilityIndex.record_outcome<br/>EMA per profile id"]
    EMA --> TOUR["tournament: highest EMA"]
    TOUR --> PROMO["registry.set_task_profile<br/>source=learned"]
    PROMO -->|"process-global registry"| LIVE["router/factory pick it next run"]

Layer D — ontology mapping (CONCEPT:KG-2.94 / 2.95 / 2.96)

Models, profiles, and the sampling knobs are first-class in the one OWL/RDF ontology (reached only via kg.ontology), so OWL reasoning can extrapolate which profile fits a task class from how related models/roles are tuned.

flowchart TD
    M["Model (model_id, tier)"] -->|"HAS_PROFILE / PROFILE_OF"| IP["InferenceProfile<br/>task_class + SHACL-bounded knobs"]
    IP -->|"TUNED_FOR"| TC["TaskClass"]
    IP -->|"BOUND_TO_ROLE"| RO["Role"]
    AG["Agent"] -->|"USES_PROFILE"| IP
    M -.->|"implements"| SC["SamplingConfigurable (interface)"]
    IP -.->|"validated by"| VT["value types: Temperature 0..2, TopP 0..1, TopK ≥1, ..."]
  • Value types (KG-2.94, ontology/value_types.py)Temperature [0,2], TopP/MinP [0,1], TopK/MaxTokens (int ≥1), RepetitionPenalty (>0), PresencePenalty/ FrequencyPenalty [-2,2]. Each compiles to a runtime validator + SHACL property shape
  • OWL datatype restriction. sampling_profile_violations() is the governance gate a profile passes before it is promoted (C) or set (E).
  • Interfaces (KG-2.94, ontology/interfaces.py)InferenceProfile (shape) and SamplingConfigurable (a Model object type implements it, declaring a HAS_PROFILE link). inference_owl_ttl() projects the registry's models + profiles to OWL.
  • Links (KG-2.96, ontology/links.py)HAS_PROFILE/PROFILE_OF, TUNED_FOR, BOUND_TO_ROLE, USES_PROFILE.

Layer E — two surfaces

The shared action core (mcp/tools/ontology_tools.py::ontology_sampling_profile) is reachable identically over MCP and REST (parity-mapped in kg_server.ACTION_TOOL_ROUTES/ontology/sampling-profiles, with granular GET twins in gateway/ontology_api.py):

action Effect
list every effective task-class profile (curated ∪ learned)
describe the profile served for one task class
resolve the profile that would be picked for a prompt / role (inspection)
set write a profile (rejected if sampling_profile_violations is non-empty)
evolve run one mutate→score→promote round for a task class
owl the OWL projection of models + profiles

How to use it

  • It just happens. Every factory-built agent resolves a task-aware profile per call; callers who need a fixed profile pass model_settings= explicitly and win.
  • Inspect: ontology_sampling_profile action=resolve task_text="…" (MCP) or GET /api/ontology/sampling-profiles/{task_class} (REST).
  • Tune by hand: ontology_sampling_profile action=set task_class=code profile_json='{"temperature":0.05,"top_k":10}'.
  • Let it learn: ontology_sampling_profile action=evolve task_class=code (the daemon evolution tick can drive this as outcomes accumulate).

Configuration

No new environment variables. Task-class profiles are module constants seeded in model_registry._DEFAULT_TASK_PROFILES; learned overrides live in the process-global registry (load_active_registry / reset_active_registry) and, where a registry file is configured (config.model_registry_path), round-trip through it.

Code paths

  • agent_utilities/agent/sampling_profile.pySamplingProfile, to_model_settings, DEFAULT_PROFILE, classify_task, resolve_sampling_profile, attach_profile_resolver.
  • agent_utilities/agent/factory.py — wires attach_profile_resolver onto every agent.
  • agent_utilities/rlm/repl.py — depth-tiered profile threading.
  • agent_utilities/models/model_registry.pytask_class_profiles, pick_profile_for_task/role, set_task_profile, load_active_registry, inference_owl_ttl.
  • agent_utilities/graph/adaptive_agent_router.pyRoutingDecision.sampling_profile.
  • agent_utilities/harness/variant_pool.pymutate_profile, evolve_profile.
  • agent_utilities/knowledge_graph/ontology/{value_types,interfaces,links}.py — value types + sampling_profile_violations, InferenceProfile/SamplingConfigurable/ Model, typed links.
  • agent_utilities/mcp/tools/ontology_tools.py, agent_utilities/mcp/kg_server.py, agent_utilities/gateway/ontology_api.py — the two surfaces.

Relationship to other concepts

  • Sits beside ORCH-1.27 (role-specialized model routing) — the same role/task-class key now also selects how to sample, not just which model.
  • Reuses AHE-3.2 (VariantPool tournament) and the KG-2.6 capability reward EMA (CapabilityIndex.record_outcome) as the evolution substrate — see What can be evolved.
  • Extends the KG-2.38/2.39/2.26 ontology layers (interfaces / value types / links) per the one-ontology principle: every new artifact becomes OWL the reasoner can traverse.