# Configuration Reference & Flag Audit
This is the single, authoritative inventory of every environment variable
agent-utilities reads, with a verdict for each: is the flag actually needed,
or should the system detect and self-configure instead?
It exists because the codebase had grown to ~96 distinct `KG_*` / `GRAPH_*` /
`EPISTEMIC_*` / `AGENT_UTILITIES_*` flags: over-configuration that is overwhelming
to operate and a frequent source of footguns. The rule for adding new flags is in
AGENTS.md → Configuration discipline. The CI gate `scripts/check_no_env_sprawl.py`
enforces that flags are declared on `AgentConfig` (`core/config.py`), not read with bare
`os.environ.get()` calls scattered across modules.
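
This audit does not reproduce the exact rules inside `scripts/check_no_env_sprawl.py`. The sketch below illustrates one way such a gate can work. It walks each module's AST and reports `os.environ.get(...)`, `os.getenv(...)` and `os.environ[...]` reads that occur outside an allowlist. The allowlist contents are an assumption, including `core/_env.py` (which backs `setting()`). So are the hypothetical names `find_bare_env_reads` and `check_file`. None of this is the script's actual code.

```python
import ast

# Assumed allowlist: the modules this page says may touch os.environ,
# plus core/_env.py, where setting() reads the environment.
ALLOWED_SUFFIXES = ("core/config.py", "core/paths.py", "core/_env.py")


def _is_os_environ(node: ast.AST) -> bool:
    """True for the expression `os.environ`."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "environ"
        and isinstance(node.value, ast.Name)
        and node.value.id == "os"
    )


def find_bare_env_reads(source: str) -> list[int]:
    """Line numbers of os.environ.get(...), os.getenv(...) and os.environ[...] reads."""
    hits: list[int] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            func = node.func
            if func.attr == "get" and _is_os_environ(func.value):
                hits.append(node.lineno)  # os.environ.get("X")
            elif (
                func.attr == "getenv"
                and isinstance(func.value, ast.Name)
                and func.value.id == "os"
            ):
                hits.append(node.lineno)  # os.getenv("X")
        elif (
            isinstance(node, ast.Subscript)
            and _is_os_environ(node.value)
            and isinstance(node.ctx, ast.Load)
        ):
            hits.append(node.lineno)  # os.environ["X"] read; assignments are ignored
    return sorted(hits)


def check_file(path: str, source: str) -> list[str]:
    """Violation messages for one module; allowlisted modules are skipped."""
    if path.replace("\\", "/").endswith(ALLOWED_SUFFIXES):
        return []
    return [f"{path}:{line}: bare env read" for line in find_bare_env_reads(source)]
```

A production gate would also diff against the baseline file and catch aliased imports such as `from os import getenv`. Both are omitted here to keep the sketch short.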
## How configuration is read (two centralized paths)
`core/config.py` and `core/paths.py` (plus `core/_env.py`, the dependency-free home of
`setting()` described below) are the only files that touch `os.environ`.
Every other module reads through one of two paths, both driven by `config.json`.
The XDG loader injects `~/.config/agent-utilities/config.json` (or the `config.json` in the
directory named by `AGENT_UTILITIES_CONFIG_DIR`) into `os.environ` before anything reads it,
so a JSON key `graph_db_uri` becomes `GRAPH_DB_URI`:
- Typed `AgentConfig` field (`Field(alias="MY_VAR")`, read as `config.my_var`): for static
  settings parsed once at import.
- `config.setting("MY_VAR", default, cast=…)`: the sanctioned accessor for reads that must
  stay live (daemon cadences, test-varied flags, runtime toggles). It reads `os.environ` at
  call time with a declared default and type coercion (inferred from the default's type, or
  from an explicit `cast`).

Decision: a field for static settings, `setting()` for dynamic ones. A module never does a bare
`os.environ.get` / `os.getenv` / `os.environ[...]` read.
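
This page does not show the loader itself. The sketch below models only the injection step, under several assumptions:

- The file lives in `AGENT_UTILITIES_CONFIG_DIR` when that is set, else in `$XDG_CONFIG_HOME/agent-utilities` (falling back to `~/.config`).
- Keys are upper-cased to form the variable name.
- Lists and dicts are JSON-encoded.
- A variable already present in the environment is left untouched.

That precedence rule and the helper names `config_dir` and `inject_config_json` are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import MutableMapping


def config_dir(environ: MutableMapping[str, str] = os.environ) -> Path:
    """Directory holding config.json (assumed resolution order)."""
    override = environ.get("AGENT_UTILITIES_CONFIG_DIR")
    if override:
        return Path(override).expanduser()
    xdg_home = environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(xdg_home) / "agent-utilities"


def _to_env_string(value: object) -> str:
    # Lists/dicts become JSON text (e.g. kg_watch_dirs); scalars use str().
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return str(value)


def inject_config_json(environ: MutableMapping[str, str] = os.environ) -> dict[str, str]:
    """Copy config.json keys into `environ` as UPPER_CASE variables.

    Returns the variables that were actually injected. An existing variable
    is never overwritten (assumed precedence: explicit env beats the file).
    """
    path = config_dir(environ) / "config.json"
    if not path.is_file():
        return {}
    data = json.loads(path.read_text(encoding="utf-8"))
    injected: dict[str, str] = {}
    for key, value in data.items():
        name = key.upper()  # graph_db_uri -> GRAPH_DB_URI
        if name not in environ:
            environ[name] = _to_env_string(value)
            injected[name] = environ[name]
    return injected
```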
The fold is complete: ZERO bare env reads remain anywhere in `agent_utilities/`
(every prefix: `KG_*`, `GRAPH_*`, `AGENT_*`, `VAULT_*`, `OTEL_*`, connector creds, …).
Deployment-varying/behavioral flags → `setting()` (or a typed field); pure load/cadence
tunables → auto-sized via `compute_ingest_worker_count()` or named module constants.
`scripts/check_no_env_sprawl.py` covers every prefix, and its baseline
(`scripts/env_flag_baseline.txt`) is empty, so any new bare read fails CI.
`setting()` itself lives in the dependency-free `core/_env.py` (re-exported by
`config`) so it stays importable while `config` is still initializing.
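
Only the call shape `setting("MY_VAR", default, cast=…)` comes from this page. The body below is a sketch under assumptions, not the contents of `core/_env.py`:

- It imports nothing from `config`, matching the dependency-free constraint above.
- It re-reads `os.environ` on every call.
- It infers the coercion from the default's type.

The accepted boolean spellings are also an assumption.

```python
import os
from typing import Any, Callable, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def _to_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def setting(name: str, default: Any = None, cast: Optional[Callable[[str], Any]] = None) -> Any:
    """Read `name` from os.environ at call time (nothing is cached).

    Coercion: an explicit `cast` wins; otherwise it is inferred from the type
    of `default` (bool, int, float, str). Unset variables return `default`.
    """
    raw = os.environ.get(name)  # the one sanctioned environment read
    if raw is None:
        return default
    if cast is None:
        if isinstance(default, bool):  # check bool first: bool is a subclass of int
            cast = _to_bool
        elif isinstance(default, (int, float, str)):
            cast = type(default)
        else:
            return raw  # no scalar default to infer from; pass cast= for other types
    return cast(raw)
```

Nothing is cached, so a daemon that calls `setting(...)` on each tick sees any change made to `os.environ` in-process (for example, a test's monkeypatch) without re-importing `config`. That is the live-read behaviour the Decision rule reserves for `setting()`.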
## Verdict legend
- KEEP: legitimate deployment config (path / DSN / secret / port / socket). Must be
  read via the central config object, not bare `os.environ`.
- AUTO: should be auto-detected/auto-sized from the runtime; the flag should be
  removed (or kept only as an override with an auto default).
- REMOVE: always-on behavior or stale experiment; delete the flag.
## A. Deployment config: KEEP (read via AgentConfig)

| Flag | Default | What it sets |
|---|---|---|
| `GRAPH_DB_URI` / `PGGRAPH_DSN` | none | Durable L3 Postgres/pg-age DSN |
| `GRAPH_BACKEND` / `GRAPH_BACKEND_L1` | `tiered` / `epistemic_graph` | Backend selection |
| `EPISTEMIC_GRAPH_SOCKET` | `/tmp/epistemic-graph.sock` | Rust engine UDS |
| `GRAPH_PERSISTENCE_PATH`, `GRAPH_SERVICE_PERSIST_DIR` | data dir | L1 snapshot dir |
| `GRAPH_DB_HOST/PORT/NAME/USER/PASSWORD/PATH` | unset | DSN parts (legacy; prefer `GRAPH_DB_URI`) |
| `GRAPH_FUSEKI_URL/USER/PASSWORD/DATASET` | unset | SPARQL endpoint (optional backend) |
| `GITLAB_INSTANCES` | none | JSON list of GitLab instances to index/query: the multi-tenant source of truth shared by the KG GitLab indexer and the gitlab-api connector registry. Each entry is `{"name":<str>,"url":<str>,"token":<str>,"verify_ssl":<bool>}`. Unset → single-host `GITLAB_URL`/`GITLAB_TOKEN` (CONCEPT:KG-2.9g) |
| `GRAPH_PGGRAPH_SCHEMA` | `public` | Postgres schema |
| `AGENT_UTILITIES_{CONFIG,DATA,CACHE,LOG,MEMORY,RUNTIME}_DIR` | XDG | Path overrides (resolved in `core/paths.py`) |
| `AGENT_UTILITIES_TOKEN_SECRET` | unset | Run-scoped tool-token secret |
| `KG_DAEMON_ROLE` | `auto` | `host`/`client`/`auto` election (topology) |
| `STATE_DB_URI` | none | Externalize ALL durable state (durable-exec checkpoints, sessions/turns/goals, KG task queue) to a shared Postgres; unset keeps the zero-infra per-host SQLite files (CONCEPT:OS-5.16) |
| `STATE_DB_POOL_SIZE` | `8` | Max connections in the ONE shared state-store psycopg pool (CONCEPT:OS-5.16) |
| `TASK_QUEUE_BACKEND` | none (auto) | Ingest task queue: `sqlite`, `postgres` or `kafka`. Unset = auto (`postgres` when `STATE_DB_URI` is set, else `sqlite`). An explicit `kafka`/`postgres` is FAIL-LOUD at startup (CONCEPT:KG-2.55) |
| `KG_TASKS_PARTITIONS` | `6` | Partitions ensured on the `kg_tasks` topic at startup (grow-only, never shrinks); bounds kg-ingest consumer-group parallelism (CONCEPT:KG-2.56) |
| `AGENT_DISPATCH_BACKEND` | `inline` | How agent turns (goal runs / orchestrator jobs) dispatch: `inline` keeps in-process execution; `queue` publishes a session-keyed envelope onto the `agent_turns` queue (transport follows `TASK_QUEUE_BACKEND`) and returns a job handle for the agent-dispatch-worker fleet (CONCEPT:ORCH-1.45) |
| `AGENT_TURNS_PARTITIONS` | `6` | Partitions ensured on the `agent_turns` topic when Kafka carries dispatched agent turns (grow-only); bounds fleet-wide concurrent-session parallelism (CONCEPT:ORCH-1.45) |
| `EPISTEMIC_GRAPH_AUTOSTART` | unset | Auto-spawn the engine (local `unix://` endpoint only; never remote shards) |
| `GRAPH_SERVICE_ENDPOINTS` | unset | Engine shard endpoints (comma-separated or JSON list). 2+ entries = tenant-partitioned sharding via HRW over graph names; unset or 1 entry = single-engine zero-infra default (CONCEPT:KG-2.58) |
| `KG_DEFAULT_GRAPH` | `__bus__` | Default named graph; in sharded mode the ambient `ActorContext` tenant maps it to `tenant__<t>__<base>` before HRW (CONCEPT:KG-2.58) |
| `KG_WATCH_DIRS` | unset | Operator document directories the file-watcher auto-ingests recursively, unified with the built-in ScholarX/research download dirs. New files are ingested and modified files re-ingested on the 5 s watch tick; unchanged files are delta-skipped by content hash (CONCEPT:KG-2.8). Value is a JSON array or an `os.pathsep`/comma-separated list of paths (`~` expanded), e.g. `~/Documents`. config.json key: `kg_watch_dirs`. Resolved by `sdd/watcher.py:get_watched_directories()` |
| `GRAPH_SERVICE_AUTH_SECRET` | auto-generated | Engine HMAC secret; unset → per-install secret persisted at `data_dir()/engine_secret` (mode 0600) (CONCEPT:OS-5.14) |
| `KG_ENGINE_INSECURE` | `false` | Dev opt-out of engine HMAC auth; sets `EPISTEMIC_GRAPH_ALLOW_INSECURE=1` on spawned engines (CONCEPT:OS-5.14) |
| `KG_AUTH_REQUIRED` | `false` | Require server-validated JWT identity for KG access (401 without it); caller `_actor`/`_roles`/`_tenant` kwargs are ignored (CONCEPT:OS-5.14) |
| `KG_AUTH_TOKEN` | unset | JWT that establishes the stdio MCP process identity (validated against `AUTH_JWT_JWKS_URI`) (CONCEPT:OS-5.14) |
| `KG_ACL_DEFAULT_ALLOW` | `false` | With `KG_BRAIN_ENFORCE` on: allow nodes WITHOUT an ACL (escape hatch from the fail-closed default-deny) (CONCEPT:OS-5.14) |
| `KG_SERVED_PROFILE` | `true` | Fail-closed served-security profile for network MCP transports: refuse to start without `AUTH_JWT_JWKS_URI`, auto-enable auth + enforcement. `0` = serve open (local dev only) (CONCEPT:OS-5.14) |
| `KG_ENGINE_POOL_SIZE` | `0` | Max warm per-tenant engine clients per process (elastic LRU pool over KG-2.58 routing); `0` = per-use construction (CONCEPT:KG-2.62) |
| `KG_ENGINE_POOL_DROP_ON_EVICT` | `false` | On pool eviction, also unload the tenant's named graph from the engine to reclaim memory; only safe when L3 mirrors the data (CONCEPT:KG-2.62) |
| `GATEWAY_METRICS` | `true` | Python-tier Prometheus middleware + `GET /metrics` on the gateway (CONCEPT:OS-5.23) |
| `GATEWAY_RATE_LIMIT` | `0` (off) | Per-tenant token-bucket rate limit, sustained req/s; buckets are per-process (CONCEPT:OS-5.23) |
| `GATEWAY_RATE_BURST` | `0` (→ 2× rate) | Token-bucket burst capacity (CONCEPT:OS-5.23) |
| `GATEWAY_WORKERS` | `1` | Pre-forked gateway worker processes on one shared listen socket; the flock host-lock elects ONE KG host among them (CONCEPT:OS-5.23) |
| `ENGINE_BREAKER_THRESHOLD` | `5` | Consecutive engine connect/timeout failures before the client circuit opens (`0` = off) (CONCEPT:OS-5.23) |
| `ENGINE_BREAKER_COOLDOWN` | `15` | Seconds an open engine circuit waits before the half-open probe (CONCEPT:OS-5.23) |
| `MCP_CHILD_MAX_CONCURRENCY` | `8` | Max in-flight tool calls per multiplexer child (`0` = unlimited); per-server `max_concurrency` override in `mcp_config.json` (CONCEPT:ECO-4.34) |
| `MCP_CHILD_QUEUE_TIMEOUT` | `30` | Seconds an excess call queues for a child slot before the typed `MCPChildBusyError`; per-server `queue_timeout` override (CONCEPT:ECO-4.34) |
| `MCP_CHILD_POOL_SIZE` | `1` | Session-pool size for remote (streamable-http/SSE) children: N round-robin connections for parallel calls; stdio stays single-pipe; per-server `pool_size` override (CONCEPT:ECO-4.34) |
| `MCP_CHILD_MAX_RESTARTS` | `5` | Auto-restarts a crashed child may consume inside the window before being parked as failed (`0` = no auto-restart); per-server `max_restarts` override (CONCEPT:ECO-4.34) |
| `MCP_CHILD_RESTART_WINDOW` | `300` | Sliding window (s) for the restart budget; older restarts are forgiven; per-server `restart_window` override (CONCEPT:ECO-4.34) |
| `MCP_CHILD_BREAKER_THRESHOLD` | `5` | Consecutive transport failures/timeouts before a child's circuit opens (typed `MCPChildCircuitOpenError`; `0` = off); per-server `breaker_threshold` override (CONCEPT:ECO-4.34) |
| `MCP_CHILD_BREAKER_COOLDOWN` | `15` | Seconds an open child circuit waits before the half-open probe; per-server `breaker_cooldown` override (CONCEPT:ECO-4.34) |
| `MCP_MULTIPLEXER_MODE` | `eager` | Tool-exposure strategy: `eager` spawns every child and exposes all tools at boot (historical); `dynamic` exposes only the `find_tools`/`load_tools`/`unload_tools`/`multiplexer_status` meta-tools + always-on children, mounting other tools on demand with a `tools/list_changed` notification (CONCEPT:ECO-4.36) |
| `MCP_DYNAMIC_ALWAYS_ON` | `["graph-os"]` | Child servers mounted at boot in dynamic mode (in addition to meta-tools); defaults to the KG server so `find_tools` can rank semantically (CONCEPT:ECO-4.36) |
| `MCP_DYNAMIC_TOP_K` | `8` | Default number of ranked candidates `find_tools` returns when `top_k` is unspecified (CONCEPT:ECO-4.36) |
| `ACTION_POLICY_PATH` | shipped default | Operational ActionPolicy YAML; empty → conservative `deploy/action-policy.default.yml` (everything mutating = `approval_required`). KG `governance_rule` overrides win (CONCEPT:OS-5.24) |
| `FLEET_RECONCILER` | `false` | Opt-in, leader-only desired-state fleet reconciler tick: diff registry vs observed, converge through the ActionPolicy gate + actuator seam (CONCEPT:OS-5.25) |
| `FLEET_RECONCILER_INTERVAL` | `120` | Seconds between fleet-reconciler ticks (CONCEPT:OS-5.25) |
| `FLEET_RECONCILER_MAX_ACTIONS` | `5` | Storm guard: max convergence actions per tick; the rest are deferred (CONCEPT:OS-5.25) |
| `FLEET_REGISTRY_PATH` | shipped registry | Fleet service registry YAML; empty → `deploy/mcp-fleet.registry.yml` (CONCEPT:OS-5.25) |
| `FLEET_DESIRED_STATE_PATH` | unset | Optional desired-state override YAML (per-service replicas/desired/version) layered on the registry (CONCEPT:OS-5.25) |
| `FLEET_ACTUATOR` | `dryrun` | Actuator selection: `dryrun` (records intent, mutates nothing) or `docker` (reference CLI actuator). Portainer/Swarm actuators are deployment-wired via `set_fleet_actuator()` (CONCEPT:OS-5.25) |
| `DEPLOY_WATCH_WINDOW` | `300` | Health-watch window (s) after a deploy/restart; a failure inside the window triggers the policy-gated rollback (CONCEPT:OS-5.27) |
| `DEPLOY_WATCH_POLL` | `15` | Seconds between health probes inside a deploy watch (CONCEPT:OS-5.27) |
| `FLEET_AUTOSCALER` | `false` | Opt-in, leader-only reactive replica autoscaler tick: load signal → registry-declared min/max bounds → policy-gated `scale_service` + deploy watch (CONCEPT:OS-5.29) |
| `FLEET_AUTOSCALER_INTERVAL` | `60` | Seconds between autoscaler ticks (CONCEPT:OS-5.29) |
| `SCALING_PROMETHEUS_URL` | unset | Prometheus base URL for autoscaling signals (instant `/api/v1/query` GETs); unset → zero-infra in-process gauges; a provider injected via `set_scaling_signal_provider()` wins (CONCEPT:OS-5.29) |

These genuinely vary per host and aren't derivable. Action: ensure each is a typed
AgentConfig field and remove duplicate bare reads (at audit time, `GRAPH_DB_URI` was read in
4 places and `AGENT_UTILITIES_CONFIG_DIR` in 5).
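
The sketch below makes the `GRAPH_SERVICE_ENDPOINTS` and `KG_DEFAULT_GRAPH` rows concrete with rendezvous (highest-random-weight) routing. It rests on these assumptions:

- SHA-256 over `"endpoint|graph"` stands in for whatever hash the real router uses.
- Only the default graph is tenant-namespaced.
- With a single endpoint, that endpoint is returned; with none, the result is `None` (the local default engine).

The functions `parse_endpoints`, `scoped_graph`, `hrw_pick` and `route` are hypothetical names.

```python
import hashlib
import json
from typing import List, Optional


def parse_endpoints(raw: Optional[str]) -> List[str]:
    """GRAPH_SERVICE_ENDPOINTS: a JSON list or a comma-separated string."""
    if not raw or not raw.strip():
        return []
    raw = raw.strip()
    if raw.startswith("["):
        return [str(e).strip() for e in json.loads(raw) if str(e).strip()]
    return [e.strip() for e in raw.split(",") if e.strip()]


def scoped_graph(graph: str, tenant: Optional[str], default_graph: str = "__bus__") -> str:
    # Sharded mode namespaces the default graph per tenant: tenant__<t>__<base>.
    if tenant and graph == default_graph:
        return f"tenant__{tenant}__{graph}"
    return graph


def hrw_pick(graph: str, endpoints: List[str]) -> str:
    """Rendezvous hashing: every endpoint scores the graph; the highest score wins."""
    def score(endpoint: str) -> int:
        digest = hashlib.sha256(f"{endpoint}|{graph}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return max(endpoints, key=score)


def route(graph: str, tenant: Optional[str], raw_endpoints: Optional[str]) -> Optional[str]:
    endpoints = parse_endpoints(raw_endpoints)
    if len(endpoints) < 2:
        # Unset or one entry: single-engine mode (None = the local default engine).
        return endpoints[0] if endpoints else None
    return hrw_pick(scoped_graph(graph, tenant), endpoints)
```

HRW keeps reshuffling small when the shard list changes. Adding an endpoint moves only the graphs it now wins, and removing one moves only the graphs it held. Every other graph stays on its shard.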
## B. Daemon on/off toggles, all default ON: REMOVED (Phase 3) ✓
Done. Six of the always-on toggles in the inventory below were deleted and collapsed behind a single
`KG_DEV_MODE` switch on AgentConfig. The switch is read through one `engine_tasks._kg_dev_mode()` helper
that gates the maintenance scheduler + embedding-backfill startup. Production keeps every daemon
on; `KG_DEV_MODE=1` silences the lot. Removed: `KG_EMBED_BACKFILL`, `KG_ENRICH_DAEMON`,
`KG_FILE_WATCH`, `KG_HYGIENE_DAEMON`, `KG_TASK_REAPER_DAEMON`, `KG_RECONCILE_DURABLE`. Also fixed
the `KG_EMBED_BACKFILL_BATCH` dual-default bug → two named constants
(`_EMBED_BACKFILL_BUDGET=256`, `_EMBED_BACKFILL_FETCH=512`). Sprawl baseline 95 → 88. (The
`KG_CONCEPT_CODE_LINK` toggle lives in another module and is still pending.)
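
The sketch below shows the single-switch pattern. It assumes a config object with a `kg_dev_mode` attribute and a dict of daemon starter callables. This page does not give the real `_kg_dev_mode()` helper's signature, so the version below takes the config explicitly. `start_background_daemons` and the starter names are hypothetical.

```python
from typing import Callable, Dict, List


def _kg_dev_mode(config: object) -> bool:
    # The one read point for the switch (assumed attribute name: kg_dev_mode).
    return bool(getattr(config, "kg_dev_mode", False))


def start_background_daemons(config: object, starters: Dict[str, Callable[[], None]]) -> List[str]:
    """Start every background daemon unless dev mode is on.

    There are deliberately no per-daemon toggles: production starts them all,
    and KG_DEV_MODE=1 skips them all.
    """
    if _kg_dev_mode(config):
        return []
    started: List[str] = []
    for name, start in starters.items():
        start()
        started.append(name)
    return started
```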
Original inventory (for reference):

| Flag | Default | Gated thread/job (`engine_tasks.py`) |
|---|---|---|
| `KG_EMBED_BACKFILL` | `1` | vector-embedding backfill drain (L402) |
| `KG_ENRICH_DAEMON` | `1` | semantic enrichment tick (L618) |
| `KG_FILE_WATCH` | `1` | SDD/skills/config file-watch (L635) |
| `KG_HYGIENE_DAEMON` | `1` | memory decay/dedup (L645) |
| `KG_TASK_REAPER_DAEMON` | `1` | zombie-task recovery (L657) |
| `KG_RECONCILE_DURABLE` | `1` | L1→L2/L3 autoheal (L609) |
| `KG_CONCEPT_CODE_LINK` | `1` | concept↔code bridge |
| `GRAPH_DIRECT_DISPATCH` | `true` | sync dispatch |
| `KG_RETRIEVAL_QUALITY_GATE` | `true` | relevance filter |

Nobody runs these off in production. Action: delete the env gates; if a dev escape
hatch is wanted, a single KG_DEV_MODE=1 disables all background daemons.
### B.1 Safety overrides: KEEP (typed on AgentConfig)

| Flag | Default | What it gates |
|---|---|---|
| `KG_ALLOW_FULL_SCAN` | `false` | Permit an unscoped Cypher query to enumerate the whole graph (ORCH-1.40). Off by default so a buggy unscoped query can never silently full-scan; deliberate opt-in only. Typed `config.kg_allow_full_scan`, read in `backends/epistemic_graph_backend.py`. |

## C. Ingest-throughput knobs: REMOVED ✓
Done. All three deleted:
- `KG_INGEST_FEATURES` / `KG_INGEST_PROFILE` → per-repo call-graph community detection is now
  always on. The hang risk that motivated the opt-out is fixed at the source. The engine's
  `community_detection` is deterministically bounded (15 s wall-clock + iteration cap,
  epistemic-graph `algorithms.rs`), and `make_community_fn` loads its scratch tenant in one
  `batch_update` round-trip instead of per-element RPCs.

  Correction (verified against code): one `KG_INGEST_PROFILE` read survives, for pipeline
  phase selection in `knowledge_graph/pipeline/__init__.py` (`select_phases`; values
  `structural` or `full`, unset = `full`). It no longer gates community detection. It is
  tracked in the bare-read baseline (see section H).
- `KG_BULK_INGEST` → the maintenance scheduler auto-detects a bulk ingest from the durable
  submission-queue depth (`_submission_queue.get_queue_size() > _BULK_QUEUE_THRESHOLD`) and defers
  its whole-graph passes per tick, instead of relying on a manual startup flag.
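
The sketch below covers the per-tick decision, assuming each maintenance pass is tagged as whole-graph or not. The threshold value (1000) and the pass names are hypothetical, since this page does not state `_BULK_QUEUE_THRESHOLD`'s value. The queue depth is re-read every tick, so deferred passes resume on their own once the backlog drains, with no flag to flip back.

```python
from typing import Iterable, List, Tuple

# Hypothetical value: the page does not state _BULK_QUEUE_THRESHOLD.
BULK_QUEUE_THRESHOLD = 1000


def plan_tick(
    queue_size: int,
    passes: Iterable[Tuple[str, bool]],
    threshold: int = BULK_QUEUE_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Split one maintenance tick into (run, deferred) pass names.

    `passes` yields (name, is_whole_graph). While the submission queue is deeper
    than `threshold`, whole-graph passes are deferred to a later tick.
    """
    bulk_ingest = queue_size > threshold
    run: List[str] = []
    deferred: List[str] = []
    for name, whole_graph in passes:
        if bulk_ingest and whole_graph:
            deferred.append(name)
        else:
            run.append(name)
    return run, deferred
```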
## D. Performance tunables: keep as constants / deployment config (auto-sizing deferred)
The one real defect here, the `KG_EMBED_BACKFILL_BATCH` dual-default, is fixed (Phase 3:
two named constants). The rest are left as-is by design. The worker pool already auto-sizes
(`engine_tasks.py`: 36% of CPU cores, capped by memory), and the remaining batch sizes / intervals
have correct universal defaults. Per the Configuration discipline rule, a tunable with a good default
should be a constant, not a knob. But mechanically converting every one to a CPU-derived auto-sizer is
speculative churn with behaviour-change risk and little payoff, so it is not pursued. `GRAPH_TIMEOUT`'s
20-minute default is noted (it made the old community-detection hang look infinite). That is now moot,
because the engine bounds the call itself.

| Flag | Default | Notes |
|---|---|---|
| `KG_LLM_CONCURRENCY` | `4` | Typed on AgentConfig (`kg_llm_concurrency`); max concurrent LLM calls for KG operations. Set it to match the inference endpoint's parallel capacity |
| `KG_PARSE_BATCH` | `128` | constant |
| `KG_ENRICH_BATCH` / `KG_ENRICH_MAX_BATCHES` | `16` / `8` | constants |
| `KG_EMBED_BACKFILL_BATCH` | `256` and `512` | Was read twice with different defaults (L1040, L1155); fixed in Phase 3 by splitting it into `_EMBED_BACKFILL_BUDGET=256` and `_EMBED_BACKFILL_FETCH=512` |
| `KG_BACKGROUND_MAX_CONCURRENT` | `2` | AUTO verdict (auto-sizing deferred) |
| `GRAPH_POOL_MIN/MAX` | `2` / `10` | AUTO verdict: derive from CPU count (auto-sizing deferred) |
| `KG_CHAT_CONCURRENCY` | `8` | AUTO verdict (auto-sizing deferred) |
| `KG_*_INTERVAL` (enrich/file_watch/embed/evolution/golden) | 20–3600 | constants unless deployment-varying |
| `GRAPH_TIMEOUT` | 1200000 ms | 20-min RPC timeout: far too long, and the root of "hangs look infinite" (now moot for community detection, which the engine bounds itself) |

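
The sketch below shows CPU-fraction sizing with a memory cap, in the spirit of `compute_ingest_worker_count()`. The 0.36 fraction comes from the 36% CPU figure in the note above. The following are assumptions:

- the 512 MiB per-worker budget,
- the override semantics (a positive `KG_INGESTION_WORKERS` value wins),
- the function name `sketch_worker_count`.

Memory is passed in rather than detected, which keeps the sketch portable and testable.

```python
import os
from typing import Optional

CPU_FRACTION = 0.36                   # the 36% CPU figure from the note above
PER_WORKER_BYTES = 512 * 1024 ** 2    # hypothetical per-worker memory budget


def sketch_worker_count(
    cpu_count: Optional[int] = None,
    mem_bytes: Optional[int] = None,
    override: Optional[int] = None,
) -> int:
    """Worker count = 36% of CPUs, capped by memory, never below 1."""
    if override and override > 0:     # explicit KG_INGESTION_WORKERS wins
        return override
    cpus = cpu_count or os.cpu_count() or 1
    by_cpu = max(1, int(cpus * CPU_FRACTION))
    if mem_bytes is None:              # no memory figure: CPU bound only
        return by_cpu
    by_mem = max(1, mem_bytes // PER_WORKER_BYTES)
    return min(by_cpu, by_mem)
```

On Linux, total physical memory can be read as `os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")` and passed as `mem_bytes`.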
## E. Experiment / feature gates
`KG_GOLDEN_*` (10 flags): collapsed onto AgentConfig ✓. Every `KG_GOLDEN_*` /
`KG_BREADTH_*` read was moved off bare `os.environ` onto typed AgentConfig fields:
opt-in, all off by default, single typed source of truth.

| Flag | Default | What it gates |
|---|---|---|
| `KG_LOOP` | `False` | enable the autonomous golden-loop daemon cycle (intake→acquire→resolve→distil→synthesize) |
| `KG_LOOP_DISTILL` | `False` | distillation step of the golden loop |
| `KG_LOOP_BREADTH` | `False` | breadth ingest step (auto-ingest of the roots below) |
| `KG_LOOP_STANDARDIZE` | `False` | standardization pass of the golden loop |
| `KG_GOLDEN_AUTO_MERGE` | `False` | governed auto-merge of promoted proposals |
| `KG_GOLDEN_MERGE_THRESHOLD` | None | score threshold gating auto-merge |
| `KG_LOOP_INTERVAL` | `3600` | daemon tick interval (s) |
| `KG_LOOP_TOPICS` | `5` | hot topics processed per cycle |
| `KG_BREADTH_LIBRARY_ROOTS` | `""` | comma-separated OSS library roots auto-ingested by the breadth step (deployment-specific; empty = no-op) |
| `KG_BREADTH_REPO_ROOTS` | `""` | comma-separated code-repo roots auto-ingested by the breadth step |

| Family | Count | Notes |
|---|---|---|
| `KG_EA_WRITEBACK`, `KG_ENABLE_HARD_NEGATIVE_MINING`, `KG_BRAIN_ENFORCE`, `KG_RESEARCH_EXTERNAL`, `KG_PROCESS_WRITEBACK` | 5 | remaining experiment gates: graduate (always-on) or delete |

`KG_PROCESS_WRITEBACK`: outbound process-intelligence writeback (CONCEPT:KG-2.8,
default off). It is opt-in because it performs outbound mutating calls into external
systems (Camunda process-instance variables, ARIS model attributes). That is exactly the
"expensive / side-effecting on every run" case the opt-in tier is reserved for. When
set, `resolve_process_writeback` pushes the KG's per-process intelligence (capability/code
lineage, OWL inferences, operational signals, glossary/data lineage) back onto the
live process world via the `graph_analyze action=process_writeback` surface. The
connector connection/credentials themselves are not agent-utilities config. They
live in the camunda-mcp / aris-mcp packages' own environment
(`CAMUNDA7_*`/`CAMUNDA8_*`, `ARIS_*`), read by each connector's `auth.get_client()`.
`EVOLUTION_WORKTREE_ROOT`: evolution→branch bridge (CONCEPT:AHE-3.21), typed on
AgentConfig (`evolution_worktree_root`). This is the root directory under which the `LocalBranchPublisher`
creates fresh git worktrees when it publishes a promoted proposal as a reviewable
local branch. Empty (default) resolves to `data_dir()/evolution_worktrees`, so publication
never writes into a canonical checkout's working tree.
`KG_FAILURE_*`: Failure-Driven Evolution (CONCEPT:AHE-3.18), typed on AgentConfig,
opt-in, all off by default. The boolean gates are parsed via `to_boolean`
(`"True"`/`"False"`, consistent with the fleet's other toggles). The table below also lists the
related `KG_DSPY_OPTIMIZATION*` gates (CONCEPT:AHE-3.46). See failure_driven_evolution.md.

| Flag | Default | Notes |
|---|---|---|
| `KG_FAILURE_EVOLUTION` | `False` | enable the daemon `failure_ingest` tick (pull Langfuse failures → remediation) |
| `KG_FAILURE_EVOLUTION_INTERVAL` | `3600` | daemon tick interval (s) |
| `KG_FAILURE_EVOLUTION_WINDOW` | `86400` | telemetry look-back window (s) |
| `KG_FAILURE_REGRESSION_DATASET` | `False` | enable the dataset-based regression path |
| `KG_DSPY_OPTIMIZATION` | `False` | enable the daemon `dspy_optimization` tick: a propose-only DSPy optimization sweep over the self-supervised targets (CONCEPT:AHE-3.46) |
| `KG_DSPY_OPTIMIZATION_INTERVAL` | `3600` | DSPy optimization sweep interval (s) |

`KG_FUSEKI_*`: ontology distribution to Apache Jena Fuseki (CONCEPT:KG-2.52), typed on
AgentConfig, opt-in. The `fuseki_publish` maintenance tick pushes the bundled ontology
modules (the authoritative TBox) to an optional enterprise Fuseki triplestore for SPARQL
federation. Off by default, because Fuseki is optional infrastructure.

| Flag | Default | Notes |
|---|---|---|
| `KG_FUSEKI_PUBLISH` | `False` | enable the daemon `fuseki_publish` tick |
| `KG_FUSEKI_ENDPOINT` | None | Fuseki URL; None defers to the publisher (`FUSEKI_ENDPOINT`, then localhost) |
| `KG_FUSEKI_PUBLISH_INTERVAL` | `3600` | daemon tick interval (s) |

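
The `KG_FUSEKI_ENDPOINT` row describes a three-step fallback, sketched below. This page does not give the exact localhost URL. The following are assumptions:

- `http://localhost:3030` (Fuseki's usual default port),
- the trailing-slash normalization,
- the function name `resolve_fuseki_endpoint`.

```python
import os
from typing import Mapping, Optional

# Assumption: the localhost fallback targets Fuseki's usual port, 3030.
LOCAL_FUSEKI = "http://localhost:3030"


def resolve_fuseki_endpoint(
    kg_fuseki_endpoint: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """KG_FUSEKI_ENDPOINT, then FUSEKI_ENDPOINT, then localhost."""
    for candidate in (kg_fuseki_endpoint, environ.get("FUSEKI_ENDPOINT")):
        if candidate and candidate.strip():
            return candidate.strip().rstrip("/")
    return LOCAL_FUSEKI
```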
`KG_WORKFLOW_SHAPE_GATE`: execution-time workflow ontology gate (CONCEPT:ORCH-1.42),
typed on AgentConfig, default ON. `execute_workflow` AND its background twin
`dispatch_workflow` (REST twin `/api/graph/orchestrate/dispatch-workflow`) SHACL-validate
the stored `WorkflowDefinition` (+ steps) against the governance shapes before dispatch. They
refuse malformed definitions with a structured violation report; the check is cheap and LLM-free. The
companion permission gate (ontology permissioning ACL on the workflow node) is governed by
the existing `KG_BRAIN_ENFORCE` flag (OS-5.14 fail-closed semantics), not a new one.

| Flag | Default | Notes |
|---|---|---|
| `KG_WORKFLOW_SHAPE_GATE` | `True` | SHACL-validate stored workflows before execution |

Langfuse (CONCEPT:AHE-3.18 / AHE-3.0): official SDK variable names only. The host
variable is `LANGFUSE_HOST`; the non-standard `LANGFUSE_BASE_URL` fallback was removed
(greenfield). Resolved through `AgentConfig.langfuse_host` / `langfuse_public_key` /
`langfuse_secret_key`.

| Flag | Default | Notes |
|---|---|---|
| `LANGFUSE_HOST` | `https://cloud.langfuse.com` | Langfuse base URL (read + OTEL write paths) |
| `LANGFUSE_PUBLIC_KEY` / `LANGFUSE_SECRET_KEY` | None | project API keypair |

## F. Testing: KEEP

| Flag | Default | Notes |
|---|---|---|
| `AGENT_UTILITIES_TESTING` | `False` | test-mode guard (read in ~8 places; centralize) |
| `AGENT_UTILITIES_GWT_STRICT` | `""` | global-workspace strict test mode |

## G. Complete AgentConfig inventory: platform fields beyond the KG/graph flag audit
Sections A–F are the original `KG_*`/`GRAPH_*` sprawl audit. AgentConfig
(`core/config.py`, pydantic-settings) additionally carries the platform's general
configuration surface. Totals, extracted programmatically from
`AgentConfig.model_fields`: 244 fields, 242 distinct environment variables.
`SECRETS_VAULT_URL` binds both `vault_url` and `secrets_vault_url`, and `SECRETS_VAULT_MOUNT`
binds both `vault_mount` and `secrets_vault_mount`; this legacy duplication is kept for
compatibility.
Environment-name resolution: every field declares an explicit alias which IS its
environment variable name (no env_prefix; matching is case-insensitive). Sources in
precedence order: constructor args → environment → .env → nested secrets file
(AGENT_SECRETS_FILE) → Docker secrets (/run/secrets). All fields below are
therefore environment-settable; none are internal-only.
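
pydantic-settings applies this ordering itself. The plain-Python sketch below only models the first-match-wins rule with case-insensitive names, to make the precedence concrete. Each source is represented as an already-loaded mapping. The real sources read the environment, the `.env` file, `AGENT_SECRETS_FILE` and `/run/secrets`. The function name `resolve` and the source labels are hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple

# (label, already-loaded values), highest precedence first.
Source = Tuple[str, Mapping[str, Any]]


def resolve(alias: str, sources: Sequence[Source]) -> Optional[Tuple[str, Any]]:
    """Return (source_label, value) from the first source defining `alias`.

    Names are compared case-insensitively, as for AgentConfig aliases.
    """
    wanted = alias.lower()
    for label, values in sources:
        for key, value in values.items():
            if key.lower() == wanted:
                return label, value
    return None
```

So a value exported in the shell beats the same key in `.env`. A Docker secret is consulted only when no earlier source defines the variable.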
### G.1 Model providers & routing

| Flag | Default | What it sets |
|---|---|---|
| `CHAT_MODELS` | `[]` | JSON list of chat-model configs (`id`/`provider`/`base_url`/`api_key`/`intelligence_level`); drives `default_chat_model` / `lite_chat_model` / `super_chat_model` |
| `EMBEDDING_MODELS` | `[]` | JSON list of embedding-model configs (first entry = default) |
| `OPENAI_API_KEY` / `OPENAI_BASE_URL` | None | OpenAI fallback credentials for ad-hoc model creation |
| `ANTHROPIC_API_KEY` | None | Anthropic fallback API key |
| `GEMINI_API_KEY` | None | Google Gemini fallback API key |
| `GROQ_API_KEY` | None | Groq fallback API key |
| `MISTRAL_API_KEY` | None | Mistral fallback API key |
| `HUGGING_FACE_API_KEY` | None | Hugging Face fallback API key |
| `DEEPSEEK_API_KEY` / `DEEPSEEK_BASE_URL` | None | DeepSeek fallback credentials |
| `VLLM_BASE_URL` | None | Dedicated vLLM inference server base URL |
| `MODEL_REGISTRY_PATH` | None | YAML/JSON model-registry file |
| `MODEL_ROLE_ROUTING` | `{}` | Role → `{tier,tags}` overrides for planner/generator/learner/judge selection (CONCEPT:ORCH-1.27); roles left empty use the `models/model_registry.py` defaults |
| `ROUTING_STRATEGY` | `hybrid` | Graph routing strategy |
| `ROUTING_PERCENTILE` | `50.0` | Routing percentile tunable |

### G.2 LLM request defaults

| Flag | Default | What it sets |
|---|---|---|
| `MAX_TOKENS` | `16384` | Default completion token cap |
| `TEMPERATURE` | `0.7` | Sampling temperature |
| `TOP_P` | `1.0` | Nucleus sampling |
| `TIMEOUT` | `32400` | LLM request timeout (s) |
| `TOOL_TIMEOUT` | `32400` | Tool-call timeout (s) |
| `PARALLEL_TOOL_CALLS` | `true` | Allow parallel tool calls |
| `SEED` | None | Deterministic sampling seed |
| `PRESENCE_PENALTY` / `FREQUENCY_PENALTY` | `0.0` | Repetition penalties |
| `LOGIT_BIAS` | None | Token logit-bias map (JSON) |
| `STOP_SEQUENCES` | None | Stop sequences (JSON list) |
| `EXTRA_HEADERS` / `EXTRA_BODY` | None | Extra provider request headers/body (JSON) |

### G.3 Agent identity & HTTP server

| Flag | Default | What it sets |
|---|---|---|
| `DEFAULT_AGENT_NAME` | package name | Agent display name |
| `AGENT_DESCRIPTION` | package description | Agent description |
| `AGENT_SYSTEM_PROMPT` | None | System prompt override |
| `WORKSPACE_PATH` | None | Workspace root override |
| `HOST` | `0.0.0.0` | Gateway bind address |
| `PORT` | `9000` | Gateway port |
| `DEBUG` | `false` | Debug mode |
| `ENABLE_WEB_UI` | `false` | Serve the web UI |
| `ENABLE_TERMINAL_UI` | `false` | Terminal UI mode (disables `GATEWAY_WORKERS` > 1) |
| `ENABLE_WEB_LOGS` | `true` | Web log streaming |
| `ENABLE_ACP` | `false` | Agent Client Protocol adapter |
| `ACP_PORT` | `8001` | ACP port |
| `ACP_SESSION_ROOT` | `.acp-sessions` | ACP session storage dir |
| `DEFAULT_TERMINAL_AGENT` | `agent-terminal-ui` | Terminal agent binary |
| `MCP_URL` | None | Remote MCP server URL the agent attaches to |
| `MCP_CONFIG` | None | Path to `mcp_config.json` |
| `AGENT_API_KEY` | None | Static API key for gateway auth |
| `ENABLE_API_AUTH` | `false` | Require the API key |
| `MAX_UPLOAD_SIZE` | `10485760` | Upload cap (bytes; 10 MiB) |
| `ALLOWED_ORIGINS` | None (= `*`) | CORS origins, comma-separated |
| `ALLOWED_HOSTS` | None | `TrustedHostMiddleware` hosts, comma-separated |

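
The sketch below parses the two comma-separated settings. It assumes blank values count as unset and that an unset `ALLOWED_ORIGINS` expands to `["*"]`, as the table states. This page does not say what an unset `ALLOWED_HOSTS` does (for example, whether `TrustedHostMiddleware` is skipped), so the sketch just returns `None` for it. `parse_csv`, `cors_origins` and `trusted_hosts` are hypothetical names.

```python
from typing import List, Optional


def parse_csv(raw: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated setting; unset or blank stays None."""
    if raw is None or not raw.strip():
        return None
    items = [item.strip() for item in raw.split(",")]
    return [item for item in items if item] or None


def cors_origins(raw: Optional[str]) -> List[str]:
    # Unset ALLOWED_ORIGINS means "allow any origin" per the table.
    return parse_csv(raw) or ["*"]


def trusted_hosts(raw: Optional[str]) -> Optional[List[str]]:
    # Unset ALLOWED_HOSTS: no host list (further behaviour is not documented here).
    return parse_csv(raw)
```

The resulting lists have the shape Starlette's `CORSMiddleware` (`allow_origins`) and `TrustedHostMiddleware` (`allowed_hosts`) accept.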
### G.4 Identity, JWT & delegation
`AUTH_JWT_JWKS_URI` is referenced by the OS-5.14 rows in section A (`KG_AUTH_TOKEN`, `KG_SERVED_PROFILE`). Its companions are:

| Flag | Default | What it sets |
|---|---|---|
| `AUTH_JWT_ISSUER` | None | Expected JWT issuer claim |
| `AUTH_JWT_AUDIENCE` | None | Expected JWT audience claim |
| `OIDC_CONFIG_URL` | None | OIDC discovery URL (any compliant IdP) |
| `OIDC_CLIENT_ID` / `OIDC_CLIENT_SECRET` | None | OAuth 2.0 client credentials |
| `ENABLE_DELEGATION` | `false` | RFC 8693 token exchange for downstream APIs (CONCEPT:ECO-4.0) |
| `AUDIENCE` | None | Target audience for delegated tokens |
| `DELEGATED_SCOPES` | `api` | Space-separated delegation scopes |
| `FLEET_EVENTS_TOKEN` | None | Shared secret for `POST /api/fleet/events` webhook ingress (`X-Fleet-Events-Token` header); unset = no token required (CONCEPT:OS-5.15) |

### G.5 Secrets backends

| Flag | Default | What it sets |
|---|---|---|
| `SECRETS_BACKEND` | `inmemory` | `inmemory`, `sqlite` or `vault` |
| `SECRETS_SQLITE_PATH` | None | SQLite secrets DB path |
| `SECRETS_VAULT_URL` | None | HashiCorp Vault / OpenBao URL (binds both `vault_url` and `secrets_vault_url`) |
| `SECRETS_VAULT_MOUNT` | `secret` | KV v2 mount (binds both `vault_mount` and `secrets_vault_mount`) |
| `VAULT_AUTH_METHOD` | `auto` | `oidc`, `approle`, `token`, `kubernetes` or `auto` |
| `VAULT_AUTH_MOUNT` | `jwt` | Auth-method mount path |
| `VAULT_ROLE` | None | Role for OIDC/JWT or Kubernetes login |
| `VAULT_PATH_PREFIX` | None | KV path prefix (e.g. `agents/mcp/`) |

### G.6 Graph service & KG runtime (fields not already in A–F)

| Flag | Default | What it sets |
|---|---|---|
| `GRAPH_PERSISTENCE_TYPE` | `file` | L1 persistence mode |
| `GRAPH_BACKEND_L2` | None (auto) | Explicit L2 backend; unset auto-selects (LadybugDB, or PostgreSQL when a DSN is configured) |
| `GRAPH_COMPUTE_BACKEND` | `rust` | Compute tier selection |
| `GRAPH_SERVICE_SOCKET` | None (XDG runtime dir) | Engine UDS path; default `$XDG_RUNTIME_DIR/epistemic-graph.sock` |
| `GRAPH_SERVICE_TCP_ADDR` | None | Engine TCP address (e.g. `0.0.0.0:9100`); `GRAPH_SERVICE_ENDPOINTS` overrides both this and `GRAPH_SERVICE_SOCKET` |
| `GRAPH_SERVICE_CHECKPOINT_SECS` | `300` | Engine auto-checkpoint interval (`0` = off) |
| `GRAPH_SERVICE_PERSIST_ON_SHUTDOWN` | `true` | Serialize all graphs on engine shutdown |
| `GRAPH_DIRECT_EXECUTION` | `true` | AG-UI/ACP adapters bypass the LLM tool-call hop and invoke graph execution directly |
| `GRAPH_ROUTER_TIMEOUT` / `GRAPH_VERIFIER_TIMEOUT` | `300` | Router/verifier timeouts (s) |
| `ENABLE_LLM_VALIDATION` | `false` | LLM validation pass |
| `ENABLE_KG_EMBEDDINGS` | `true` | KG embedding generation |
| `KG_EMBEDDING_DIM` | `768` | Must match the embedding model's output dimension; the schema's vector column size derives from it |
| `KG_BACKUPS` | `3` | KG backup retention count |
| `KG_INGESTION_WORKERS` | None (auto) | Ingestion worker count override; unset auto-sizes |
| `KG_ANALYSIS_MAX_DEPTH` | `2` | Max recursion depth for background research daemons |
| `MAX_RECURSION_DEPTH` | `2` | Graph recursion depth tunable |
| `KNOWLEDGE_GRAPH_SYNC_BACKGROUND` | `true` | Background task workers for the KG pipeline |
| `ENABLE_SDD_WATCHER` | `true` | Plan/task watcher thread in the KG MCP server |
| `KG_ANOMALY_CONSUMER` | `true` | Drain unconsumed `PerformanceAnomaly` nodes into `failure_gap` topics; LLM-free, bounded, propose-only (CONCEPT:AHE-3.19) |
| `SPARQL_ENDPOINTS` | `["https://query.wikidata.org/sparql"]` | External SPARQL endpoints to federate (CONCEPT:KG-2.7) |
| `JENA_FUSEKI_URL` | None | Local Jena Fuseki URL (distinct from the `KG_FUSEKI_*` publish tick in section E) |
| `KAFKA_BOOTSTRAP_SERVERS` | None | Kafka brokers (task-queue/event transport; one of the three scale knobs in `docs/scaling/capacity_model.md`) |
| `KAFKA_TOPIC` | None | Default Kafka topic for messaging/event ingestion |
| `NATS_URL` | None | NATS broker URL |

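
Two rows follow the same pattern: `GRAPH_BACKEND_L2` here and `TASK_QUEUE_BACKEND` in section A. In both, an explicit value wins; otherwise the backend is auto-selected from what is configured. The sketch below implements both resolvers over a plain mapping. All of the following are assumptions:

- the returned identifiers (`"ladybugdb"`, `"postgresql"`),
- treating `GRAPH_DB_URI` or `PGGRAPH_DSN` as "a DSN is configured",
- the specific prerequisite checks used to fail loudly for an explicit `kafka`/`postgres`.

```python
from typing import Mapping

_QUEUE_BACKENDS = {"sqlite", "postgres", "kafka"}


def select_l2_backend(environ: Mapping[str, str]) -> str:
    explicit = (environ.get("GRAPH_BACKEND_L2") or "").strip()
    if explicit:
        return explicit  # an explicit choice always wins
    # Assumed meaning of "a DSN is configured": either DSN variable is set.
    if environ.get("GRAPH_DB_URI") or environ.get("PGGRAPH_DSN"):
        return "postgresql"
    return "ladybugdb"


def select_task_queue_backend(environ: Mapping[str, str]) -> str:
    explicit = (environ.get("TASK_QUEUE_BACKEND") or "").strip().lower()
    if not explicit:
        # Auto: postgres when STATE_DB_URI is set, else the zero-infra sqlite queue.
        return "postgres" if environ.get("STATE_DB_URI") else "sqlite"
    if explicit not in _QUEUE_BACKENDS:
        raise ValueError(f"unknown TASK_QUEUE_BACKEND: {explicit!r}")
    # Explicit choices fail loudly instead of silently falling back (assumed checks).
    if explicit == "postgres" and not environ.get("STATE_DB_URI"):
        raise RuntimeError("TASK_QUEUE_BACKEND=postgres requires STATE_DB_URI")
    if explicit == "kafka" and not environ.get("KAFKA_BOOTSTRAP_SERVERS"):
        raise RuntimeError("TASK_QUEUE_BACKEND=kafka requires KAFKA_BOOTSTRAP_SERVERS")
    return explicit
```

Failing at startup is the point of the explicit path. An operator who asked for Kafka learns immediately that the brokers are missing, rather than finding out later that tasks went to a local SQLite file.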
### G.7 Observability exporters
The gateway Prometheus flags (GATEWAY_METRICS etc.) are in section A; the metric
series themselves are catalogued in ../reference/metrics.md.

| Flag | Default | What it sets |
|---|---|---|
| `ENABLE_OTEL` | `false` | OpenTelemetry tracing |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | None | OTLP collector endpoint |
| `OTEL_EXPORTER_OTLP_HEADERS` | None | OTLP headers |
| `OTEL_EXPORTER_OTLP_PUBLIC_KEY` / `OTEL_EXPORTER_OTLP_SECRET_KEY` | None | OTLP keypair (Langfuse-style basic auth) |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | `http/protobuf` | OTLP protocol |
| `LANGFUSE_DATASET_CAPTURE_THRESHOLD` | `0.0` | Score threshold for dataset capture (AHE-3.18 regression datasets) |
| `LANGFUSE_LATENCY_BASELINE_SECONDS` | `60` | Latency baseline for anomaly scoring |
| `LANGFUSE_TOKEN_BASELINE` | `20000` | Token-usage baseline for anomaly scoring |
| `LANGFUSE_VERIFIER_FALLBACK_LIMIT` | `1` | Verifier fallback attempts |

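
"Langfuse-style basic auth" means HTTP Basic authentication with the public key as the username and the secret key as the password. The sketch below turns the OTLP keypair into that header. This page does not show how agent-utilities actually merges it with `OTEL_EXPORTER_OTLP_HEADERS`, and `otlp_auth_headers` is a hypothetical name.

```python
import base64
from typing import Dict, Optional


def otlp_auth_headers(public_key: Optional[str], secret_key: Optional[str]) -> Dict[str, str]:
    """HTTP Basic auth header built from an OTLP public/secret keypair."""
    if not public_key or not secret_key:
        return {}  # incomplete keypair: send no Authorization header
    credentials = f"{public_key}:{secret_key}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}
```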
### G.8 A2A (agent-to-agent)

| Flag | Default | What it sets |
|---|---|---|
| `A2A_BROKER` | `in-memory` | A2A broker backend |
| `A2A_BROKER_URL` | None | Broker URL when not in-memory |
| `A2A_STORAGE` | `in-memory` | A2A storage backend |
| `A2A_STORAGE_URL` | None | Storage URL when not in-memory |
| `A2A_CONFIG` | None | `a2a_config.json` path for external agent discovery (CONCEPT:ECO-4.0) |
| `A2A_REFRESH_INTERVAL` | `300` | Agent-card re-fetch interval (s) |

### G.9 Orchestration, scheduler & guardrails
| Flag | Default | What it sets |
|---|---|---|
MIN_CONFIDENCE |
0.4 |
Minimum confidence gate |
VALIDATION_MODE |
false |
Validation-only execution mode |
APPROVAL_TIMEOUT |
0 |
Approval wait timeout (s; 0 = no wait) |
COGNITIVE_SCHEDULER_ENABLED |
true |
Priority-aware agent scheduler (CONCEPT:OS-5.2) |
MAX_CONCURRENT_AGENTS |
5 |
Concurrent specialist agents (CONCEPT:OS-5.2) |
AGENT_TOKEN_QUOTA |
100000 |
Per-agent token budget before preemption (CONCEPT:OS-5.2) |
PREEMPTION_THRESHOLD_PCT |
0.85 |
Quota usage triggering preemption warning |
AGENT_POLICIES_PATH |
None |
agent_policies.json for identity-based governance |
PERMISSIONS_SIGNING_KEY |
None (auto) |
HMAC key for agent identity tokens; auto-generated if unset |
SPECIALIST_REGISTRY_PATH |
None |
Local specialist registry dir |
MAX_PARALLEL_AGENTS |
60 |
Global engine-wide execution semaphore (CONCEPT:ORCH-1.8) |
WORKER_POOL_SIZE |
8 |
Workers per node for agent turns / graph mutations; active-concurrency scale knob (CONCEPT:ORCH-1.8) |
PARALLEL_BATCH_SIZE |
25 |
Agents per execution wave |
SYNTHESIS_STRATEGY |
auto |
auto | flat | hierarchical | progressive | rlm (CONCEPT:ORCH-1.26) |
SYNTHESIS_RATIO |
10 |
Outputs per hierarchical synthesis sub-node |
AGENT_EXECUTION_TIMEOUT |
120 |
Per-agent timeout (s) |
CIRCUIT_BREAKER_THRESHOLD |
3 |
Consecutive failures disabling an agent type |
ENABLE_PROGRESSIVE_SYNTHESIS |
true |
Streaming synthesis as agents complete |
HOMEOSTATIC_DOWNGRADE_ENABLED |
true |
Auto model-tier downgrade under budget pressure (CONCEPT:OS-5.2) |
ADVERSARIAL_VERIFICATION |
false |
Adversarial verification pass — opt-in, doubles verification cost (CONCEPT:AHE-3.1) |
MAINTENANCE_TOKEN_BUDGET |
0 (unlimited) |
Token budget for the autonomous maintenance cron |
MAINTENANCE_PRIORITY |
LOW |
Maintenance task priority (LOW/MEDIUM/HIGH) |
WATCHDOG_PATTERNS |
pyproject.toml, mcp_config.json, requirements*.txt |
File patterns for the file-watcher trigger (CONCEPT:OS-5.0) |
TOOL_GUARD_MODE |
strict |
Sensitive-tool guard mode |
SENSITIVE_TOOL_PATTERNS |
67 regexes | Tool-name patterns treated as mutating/sensitive (delete/exec/deploy/...); override only to extend |
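Several of these knobs interact arithmetically. The sketch below works through the defaults under three assumptions: a run is split into ceil(n / PARALLEL_BATCH_SIZE) execution waves, hierarchical synthesis groups SYNTHESIS_RATIO outputs per sub-node, and the preemption warning fires once usage reaches PREEMPTION_THRESHOLD_PCT of AGENT_TOKEN_QUOTA. These formulas are an illustrative reading of the table, not the scheduler's code.

```python
import math

PARALLEL_BATCH_SIZE = 25         # agents per execution wave
SYNTHESIS_RATIO = 10             # outputs per hierarchical synthesis sub-node
AGENT_TOKEN_QUOTA = 100_000      # per-agent token budget
PREEMPTION_THRESHOLD_PCT = 0.85  # a fraction of the quota, despite the _PCT name

def execution_waves(n_agents: int, batch: int = PARALLEL_BATCH_SIZE) -> int:
    """Assumed: agents run in fixed-size waves; the last wave may be partial."""
    return math.ceil(n_agents / batch)

def synthesis_subnodes(n_outputs: int, ratio: int = SYNTHESIS_RATIO) -> int:
    """Assumed: each hierarchical sub-node summarizes up to `ratio` outputs."""
    return math.ceil(n_outputs / ratio)

def preemption_warning(tokens_used: int,
                       quota: int = AGENT_TOKEN_QUOTA,
                       threshold: float = PREEMPTION_THRESHOLD_PCT) -> bool:
    """Warn once usage crosses threshold * quota (85,000 tokens by default)."""
    return tokens_used >= quota * threshold
```

At the MAX_PARALLEL_AGENTS default of 60, this reading gives 3 waves and 6 synthesis sub-nodes.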
G.10 Skills
| Flag | Default | What it sets |
|---|---|---|
| CUSTOM_SKILLS_DIRECTORY | None | Extra skills directory |
| SKILL_TYPES | None | Skill-type filter (JSON list) |
G.11 Native messaging backends (CONCEPT:ECO-4.0)
| Flag | Default | What it sets |
|---|---|---|
| MESSAGING_ENABLED_BACKENDS | [] | Backend IDs to auto-connect (e.g. ["discord","slack"]) |
| MESSAGING_KG_INGEST | true | Auto-ingest all inbound/outbound messages into the KG |
| MESSAGING_KG_MEMORY_TYPE | episodic | KG memory tier for inbound messages (episodic/semantic/procedural) |
| MESSAGING_ROUTE_TO_PLANNER | true | Route inbound events to the Planner Graph Agent |
| MESSAGING_DISCORD_TOKEN | None | Discord bot token (also reads DISCORD_BOT_TOKEN) |
| MESSAGING_SLACK_TOKEN | None | Slack bot token xoxb-... (also reads SLACK_BOT_TOKEN) |
| MESSAGING_SLACK_APP_TOKEN | None | Slack app-level token xapp-... (Socket Mode) |
| MESSAGING_TELEGRAM_TOKEN | None | Telegram bot token (also reads TELEGRAM_BOT_TOKEN) |
| MESSAGING_WHATSAPP_TOKEN | None | WhatsApp API token (also reads WHATSAPP_TOKEN) |
| MESSAGING_WHATSAPP_PHONE_NUMBER_ID | None | WhatsApp Business phone number ID |
| MESSAGING_WHATSAPP_USE_BUSINESS_API | false | Official Business API vs neonize bridge |
| MESSAGING_TEAMS_APP_ID / MESSAGING_TEAMS_APP_SECRET | None | Microsoft Teams Bot Framework credentials |
| MESSAGING_GOOGLECHAT_TOKEN | None | Google Chat service-account JSON path |
| MESSAGING_GOOGLEMEET_TOKEN | None | Google Meet service-account JSON path |
| MESSAGING_MATTERMOST_TOKEN / MESSAGING_MATTERMOST_URL | None | Mattermost token + server URL |
| MESSAGING_MATRIX_TOKEN / MESSAGING_MATRIX_HOMESERVER / MESSAGING_MATRIX_USER_ID | None | Matrix credentials |
| MESSAGING_IRC_SERVER | None | IRC server |
| MESSAGING_IRC_PORT | 6667 | IRC port |
| MESSAGING_IRC_NICKNAME | agent_bot | IRC nickname |
| MESSAGING_IRC_CHANNELS | [] | IRC channels to auto-join |
| MESSAGING_SIGNAL_TOKEN | None | Signal phone number (semaphore-bot) |
| MESSAGING_LINE_TOKEN | None | LINE channel access token |
| MESSAGING_TWITCH_TOKEN | None | Twitch OAuth token |
| MESSAGING_TWITCH_CHANNELS | [] | Twitch channels to join |
| MESSAGING_SYNOLOGY_WEBHOOK_URL | None | Synology Chat webhook URL |
| MESSAGING_VOICECALL_APP_ID / MESSAGING_VOICECALL_TOKEN / MESSAGING_VOICECALL_FROM_NUMBER | None | Twilio voice/SMS credentials (account SID / auth token / from number) |
| MESSAGING_NEXTCLOUD_URL / MESSAGING_NEXTCLOUD_TOKEN / MESSAGING_NEXTCLOUD_APP_ID | None | Nextcloud Talk credentials (URL / app token / username) |
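Two conventions recur in the messaging table. List-valued flags such as MESSAGING_ENABLED_BACKENDS take a JSON list, and several token flags fall back to a platform-native alias such as DISCORD_BOT_TOKEN. The sketch below covers both. It assumes a plain mapping stands in for the environment and that the MESSAGING_* name wins when both names are set; `json_list` and `resolve_token` are hypothetical helpers.

```python
import json
from typing import Mapping, Optional

# Aliases taken from the "also reads ..." notes in the table.
TOKEN_ALIASES = {
    "MESSAGING_DISCORD_TOKEN": "DISCORD_BOT_TOKEN",
    "MESSAGING_SLACK_TOKEN": "SLACK_BOT_TOKEN",
    "MESSAGING_TELEGRAM_TOKEN": "TELEGRAM_BOT_TOKEN",
    "MESSAGING_WHATSAPP_TOKEN": "WHATSAPP_TOKEN",
}

def json_list(env: Mapping[str, str], name: str) -> list:
    """Parse a JSON-list flag; unset or blank yields [] (the table default)."""
    raw = env.get(name, "").strip()
    if not raw:
        return []
    value = json.loads(raw)
    if not isinstance(value, list):
        raise ValueError(f"{name} must be a JSON list, got {type(value).__name__}")
    return value

def resolve_token(env: Mapping[str, str], name: str) -> Optional[str]:
    """Prefer the MESSAGING_* name (assumed precedence), then its alias, if any."""
    value = env.get(name)
    if value:
        return value
    alias = TOKEN_ALIASES.get(name)
    return env.get(alias) if alias else None
```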
H. Former bare KG_*/GRAPH_*/EPISTEMIC_* reads (now folded)
Every governed (KG_*/GRAPH_*/EPISTEMIC_*) bare os.environ read has been
folded off os.environ onto a centralized path, per Configuration discipline:
- Deployment-varying / behavioral / test-varied → config.setting("VAR", default) (live, config.json-driven). Still fully settable: set the lowercase key var in config.json, or VAR in the environment.
- Pure load tunables (concurrency/batch) → auto-sized via compute_ingest_worker_count().
- Single-value cadences/limits/timeouts → named module constants (no knob).
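The contrast between a live setting() read and a named module constant can be sketched as below. This is a hypothetical re-implementation of the kind of accessor that core/config.py centralizes, not its actual code. It reads os.environ on every call, falls back to the declared default, and coerces the string using the default's type unless `cast` is passed. The boolean spellings it accepts (1/true/yes/on) are an assumption.

```python
import os
from typing import Any, Callable, Optional

_TRUE = {"1", "true", "yes", "on"}  # assumed truthy spellings

def setting(name: str, default: Any, cast: Optional[Callable[[str], Any]] = None) -> Any:
    """Hypothetical live accessor: re-reads the environment on every call."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    if cast is not None:
        return cast(raw)
    if isinstance(default, bool):  # test bool before int: bool subclasses int
        return raw.strip().lower() in _TRUE
    if isinstance(default, (int, float)):
        return type(default)(raw)
    return raw

# A single-value cadence becomes a named module constant: no knob at all.
RECONCILE_INTERVAL_S = 900
```

Because the value is re-read on each call, changing KG_SERVER_PORT in the environment changes the very next read. The constant, by contrast, can only change through a code edit.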
scripts/check_no_env_sprawl.py ratchets bare reads of every prefix against
scripts/env_flag_baseline.txt. The governed reads are gone from it, and so is the former
non-KG burn-down (AGENT_*, VAULT_*, OTEL_*, connector creds, …): the baseline is now empty.
The table below is the reference for these settings and how each is now resolved (defaults
unchanged). For the setting() rows, the value can be overridden from config.json or the environment:
| Flag | Default | Read in | What it sets |
|---|---|---|---|
| KG_SERVER_HOST / KG_SERVER_PORT | 127.0.0.1 / 8100 | agent/factory.py, mcp/kg_coordinator.py, backends/contrib/ladybug_backend.py, core/config.py | KG coordinator server address |
| KG_DAEMON_LOG_LEVEL | INFO | gateway/daemon.py | Daemon log level |
| GRAPH_ROUTING_STRATEGY | hybrid | knowledge_graph/core/engine.py | Engine-side routing strategy (overlaps ROUTING_STRATEGY on AgentConfig) |
| KG_CARD_MODEL | lite | core/engine_tasks.py | lite or heavy model for enrichment cards |
| KG_LLM_TIMEOUT / KG_LLM_MAX_RETRIES | 30 / 1 | enrichment/cards.py | Enrichment LLM call timeout (s) / retries |
| KG_EMBED_BACKFILL_INTERVAL / KG_EMBED_BACKFILL_BUSY_SLEEP | 30 / 1 | core/engine_tasks.py | Embedding-backfill idle/busy sleep (s) |
| KG_RECONCILE_INTERVAL | 900 | core/engine_tasks.py | L1→L2/L3 reconcile tick (s) |
| KG_HYGIENE_INTERVAL | 86400 | core/engine_tasks.py | Memory decay/dedup tick (s) |
| KG_TASK_REAPER_INTERVAL | 120 | core/engine_tasks.py | Zombie-task reaper tick (s) |
| KG_TASK_ORPHAN_GRACE_SEC | 90 | core/engine_tasks.py | Grace period (s) before an orphaned task is reclaimed |
| KG_TASK_MAX_RUNTIME_SEC | 7200 | core/engine_tasks.py | Max task runtime (s) before requeue |
| KG_TASK_MAX_REQUEUE | 3 | core/engine_tasks.py | Max requeues before a task is failed |
| GRAPH_SERVICE_CHECKPOINT_INTERVAL | 60 | core/graph_compute.py | Spawned-engine checkpoint interval (distinct from GRAPH_SERVICE_CHECKPOINT_SECS) |
| KG_GRAPH_NAME | __bus__ | distillation/skill_graph_distiller.py | Target graph for skill-graph distillation |
| KG_INGEST_INFLIGHT | 40 | ingestion/batch_orchestrator.py | Max in-flight ingest submissions |
| KG_INGEST_PROFILE | unset (= full) | pipeline/__init__.py | Pipeline phase profile (structural or full); residual read, see the section C correction |
| KG_EVAL_CAPTURE | off | memory/optimization_engine.py | Capture retrieval evals |
| KG_MIN_RELEVANCE_THRESHOLD | unset (arg/schema-pack) | retrieval/retrieval_quality.py | Relevance-gate threshold override |
| KG_TRUST_HIERARCHY | built-in defaults | core/company_brain_runtime.py | JSON trust-hierarchy entries (with KG_BRAIN_ENFORCE) |
| GRAPH_SCHEMA_PACK | unset | models/schema_pack_loader.py | Schema-pack selection override |
| GRAPH_SCHEMA_AUDIT_DIR / GRAPH_SCHEMA_AUDIT_VERBOSE | unset / off | models/schema_pack_audit.py | Schema-audit output dir / verbosity |
| KG_PROVIDER_ADAPTER_BACKEND | static | prompting/provider_adapter.py | Prompting provider-adapter backend |
Disposition: the address / model-choice / behavioral / profile / schema rows
(KG_SERVER_*, GRAPH_ROUTING_STRATEGY, KG_CARD_MODEL, KG_GRAPH_NAME,
KG_INGEST_*, KG_EVAL_CAPTURE, KG_MIN_RELEVANCE_THRESHOLD, KG_TRUST_HIERARCHY,
GRAPH_SCHEMA_*, KG_PROVIDER_ADAPTER_BACKEND, KG_DAEMON_LOG_LEVEL) are now
config.setting(...) reads (config.json-/env-overridable). The cadence/limit/timeout
rows (KG_*_INTERVAL, KG_TASK_*, KG_LLM_TIMEOUT/_MAX_RETRIES,
KG_EMBED_BACKFILL_*, GRAPH_SERVICE_CHECKPOINT_INTERVAL) are now named module
constants. MCP_CHILD_* flags were already fully typed on AgentConfig with no bare
reads (mcp/child_resilience.py consumes the config object).
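The scripts/check_no_env_sprawl.py ratchet, described at the start of this section, is what keeps these reads folded. Its real implementation is not reproduced here. The following hypothetical sketch applies the same idea with Python's ast module: it flags os.environ.get(...), os.getenv(...) and os.environ[...] loads outside core/config.py and core/paths.py, then fails on anything not in the baseline. Keying hits by (path, line) is a simplification chosen for the sketch.

```python
import ast
from typing import Iterable, Set, Tuple

ALLOWED = {"core/config.py", "core/paths.py"}  # the only modules allowed to touch os.environ
Hit = Tuple[str, int]                           # (module path, line number)

def _is_os_environ(node: ast.AST) -> bool:
    """True for the expression `os.environ`."""
    return (isinstance(node, ast.Attribute) and node.attr == "environ"
            and isinstance(node.value, ast.Name) and node.value.id == "os")

def bare_env_reads(path: str, source: str) -> Set[Hit]:
    """Find bare env reads in one module (ignores `from os import ...` forms)."""
    if path in ALLOWED:
        return set()
    hits: Set[Hit] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            func = node.func
            if func.attr == "get" and _is_os_environ(func.value):      # os.environ.get(...)
                hits.add((path, node.lineno))
            elif (func.attr == "getenv" and isinstance(func.value, ast.Name)
                  and func.value.id == "os"):                           # os.getenv(...)
                hits.add((path, node.lineno))
        elif (isinstance(node, ast.Subscript) and isinstance(node.ctx, ast.Load)
              and _is_os_environ(node.value)):
            # Only loads count: os.environ["X"] = ... is a write, not a read.
            hits.add((path, node.lineno))
    return hits

def new_violations(files: Iterable[Tuple[str, str]], baseline: Set[Hit]) -> Set[Hit]:
    """Reads not grandfathered by the baseline; with an empty baseline, any read fails."""
    found: Set[Hit] = set()
    for path, source in files:
        found |= bare_env_reads(path, source)
    return found - baseline
```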
The agent toolset gates in tools/tool_registry.py are not KG-prefixed, but with the
ratchet now covering every prefix they fall under the same rule. The optional-infra toolsets all default
OFF and are opt-in:
| Flag | Default | What it gates |
|---|---|---|
| X_TOOLS | False | X/Grok social search + post browsing via xAI (needs XAI_API_KEY); production X/Grok deployments must set X_TOOLS=1 explicitly |
| MEDIA_TOOLS | False | Media generation / transcription services (ECO-4.30/4.31) |
| DB_TOOLS | False | Native database traversal tools (ECO-4.33) |
(The always-available local toolsets — WORKSPACE_TOOLS, GIT_TOOLS,
A2A_TOOLS, SCHEDULER_TOOLS, BROWSER_TOOLS, DEVELOPER_TOOLS — default
ON in the same registry.)
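These gates reduce to a default-OFF set and a default-ON set. The minimal sketch below rests on three assumptions: the truthy spellings are 1/true/yes/on, an explicit falsy value can switch off a default-ON toolset, and a plain mapping stands in for the environment. `enabled_toolsets` is hypothetical, not the tool_registry.py function. It also does not model the XAI_API_KEY requirement.

```python
from typing import Mapping, Set

OPT_IN = ("X_TOOLS", "MEDIA_TOOLS", "DB_TOOLS")  # default OFF
DEFAULT_ON = ("WORKSPACE_TOOLS", "GIT_TOOLS", "A2A_TOOLS",
              "SCHEDULER_TOOLS", "BROWSER_TOOLS", "DEVELOPER_TOOLS")  # default ON

_TRUE = {"1", "true", "yes", "on"}  # assumed truthy spellings

def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Unset means the documented default; any set value is parsed as a bool."""
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE

def enabled_toolsets(env: Mapping[str, str]) -> Set[str]:
    on = {name for name in DEFAULT_ON if _flag(env, name, True)}
    on |= {name for name in OPT_IN if _flag(env, name, False)}
    return on
```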
Coverage statement
Verified against agent_utilities/core/config.py on this branch by extracting
AgentConfig.model_fields programmatically: 244 fields / 242 distinct env
variables, every one documented above — sections A–F cover the KG/graph audit
surface, section G the remaining platform fields (no field was deemed
internal-only: every AgentConfig field declares an env alias and is settable
from the environment). Section H additionally documents the user-facing flags
that were formerly bare os.environ reads and are now folded onto setting() or named module constants. Drift fixed in this
pass: KG_LLM_CONCURRENCY default is 4 (doc previously said 6), and
KG_INGEST_PROFILE retains one phase-selection read despite the section C
removal note.
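The "244 fields / 242 distinct env variables" count means that, under a one-alias-per-field reading, some env names back more than one field. The sketch below shows that count and a documentation check over a plain {field: alias} mapping. On a pydantic v2 model, such a mapping could be built from `Model.model_fields`, whose `FieldInfo` objects expose an `alias` attribute. The field and variable names in the test are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def coverage(field_aliases: Dict[str, str]) -> Tuple[int, int, List[str]]:
    """Return (field count, distinct env-var count, env vars shared by >1 field)."""
    counts = Counter(field_aliases.values())
    shared = sorted(alias for alias, n in counts.items() if n > 1)
    return len(field_aliases), len(counts), shared

def undocumented(field_aliases: Dict[str, str], documented: Set[str]) -> List[str]:
    """Env vars declared on the config model but missing from the reference."""
    return sorted(set(field_aliases.values()) - documented)
```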
Known bugs surfaced by this audit
- KG_EMBED_BACKFILL_BATCH dual default: 256 (engine_tasks.py:1040) vs 512 (:1155). Same flag, two meanings (per-tick budget vs DB fetch batch). Split into two named constants or one config field.
- Scattered duplicate reads: GRAPH_DB_URI (4×), AGENT_UTILITIES_CONFIG_DIR (5×), KG_DAEMON_ROLE / KG_INGEST_PROFILE (2× each). No single source of truth.
- GRAPH_TIMEOUT=1200000 (ms, i.e. 20 min) makes a non-converging engine call look like an infinite hang for 20 minutes before erroring.
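The first and third items could be addressed roughly as sketched below. Both constant names are hypothetical and do not exist in engine_tasks.py today. The sketch also assumes the two meanings pair with the two values in the order listed. Giving each meaning its own name removes the dual default, and an explicit unit makes it visible that 1200000 is 20 minutes.

```python
# Hypothetical names: one constant per meaning of KG_EMBED_BACKFILL_BATCH.
EMBED_BACKFILL_TICK_BUDGET = 256  # per-tick budget (the :1040 value)
EMBED_BACKFILL_FETCH_BATCH = 512  # DB fetch batch size (the :1155 value)

GRAPH_TIMEOUT_MS = 1_200_000      # current GRAPH_TIMEOUT default, in milliseconds

def ms_to_minutes(ms: int) -> float:
    """Convert a millisecond timeout to minutes for logs and docs."""
    return ms / 60_000
```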
Target end-state
~96 flags → roughly the ~27 KEEP deployment items (all typed on AgentConfig),
behavior otherwise auto-detected, and scripts/check_no_env_sprawl.py blocking
regressions. A fresh graph-os-daemon with zero KG_* env vars set should ingest a
full corpus correctly.
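The end-state replaces knobs with auto-detection. The sketch below shows the general shape of auto-sizing a worker count from the host CPU count, clamped between a floor and a cap. It is hypothetical and does not reproduce compute_ingest_worker_count(), whose formula is not documented here; the floor of 1 and cap of 32 are invented. The second helper lists any KG_* variables present, which supports a zero-config smoke test.

```python
import os
from typing import List, Mapping, Optional

def auto_worker_count(cpu_count: Optional[int] = None,
                      floor: int = 1, cap: int = 32) -> int:
    """Hypothetical sizing: one worker per CPU, clamped to [floor, cap]."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(floor, min(cap, cpus))

def kg_vars_set(env: Mapping[str, str]) -> List[str]:
    """Names of any KG_* variables present; [] is the zero-config target."""
    return sorted(name for name in env if name.startswith("KG_"))
```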