Configuration Reference & Flag Audit¶

This is the single, authoritative inventory of every environment variable agent-utilities reads, with a verdict for each: is the flag actually needed, or should the system detect and self-configure instead?

It exists because the codebase had grown to ~96 distinct KG_* / GRAPH_* / EPISTEMIC_* / AGENT_UTILITIES_* flags — over-configuration that is overwhelming to operate and a frequent source of footguns. The rule for adding new flags is in AGENTS.md → Configuration discipline. The CI gate scripts/check_no_env_sprawl.py enforces that flags are declared on AgentConfig (core/config.py), not read with bare os.environ.get() scattered across modules.

How configuration is read (two centralized paths)¶

core/config.py (and core/paths.py) are the only files that touch os.environ. Every other module reads through one of two paths, both driven by config.json — the XDG loader injects ~/.config/agent-utilities/config.json (or AGENT_UTILITIES_CONFIG_DIR) into os.environ before anything reads it, so a JSON key graph_db_uri becomes GRAPH_DB_URI:

Typed AgentConfig field (Field(alias="MY_VAR"), read config.my_var) — for static settings parsed once at import.
config.setting("MY_VAR", default, cast=…) — the sanctioned accessor for reads that must stay live (daemon cadences, test-varied flags, runtime toggles). It reads os.environ at call time with a declared default and type coercion (inferred from the default's type, or pass cast).

Decision: field for static, setting() for dynamic — never a bare os.environ.get / os.getenv / os.environ[...] read in a module.

The fold is complete: ZERO bare env reads remain anywhere in agent_utilities/ (every prefix — KG_*, GRAPH_*, AGENT_*, VAULT_*, OTEL_*, connector creds, …). Deployment-varying/behavioral flags → setting() (or a typed field); pure load/cadence tunables → auto-sized via compute_ingest_worker_count() or named module constants. scripts/check_no_env_sprawl.py covers every prefix and its baseline (scripts/env_flag_baseline.txt) is empty — any new bare read fails CI. setting() itself lives in the dependency-free core/_env.py (re-exported by config) so it stays importable while config is still initializing.

Verdict legend - KEEP — legitimate deployment config (path / DSN / secret / port / socket). Must be read via the central config object, not bare os.environ. - AUTO — should be auto-detected/auto-sized from the runtime; the flag should be removed (or kept only as an override with an auto default). - REMOVE — always-on behavior or stale experiment; delete the flag.

A. Deployment config — KEEP (read via `AgentConfig`)¶

Flag	Default	What it sets
`GRAPH_DB_URI` / `PGGRAPH_DSN`	none	Durable L3 Postgres/pg-age DSN
`GRAPH_BACKEND` / `GRAPH_BACKEND_L1`	`tiered` / `epistemic_graph`	Backend selection
`EPISTEMIC_GRAPH_SOCKET`	`/tmp/epistemic-graph.sock`	Rust engine UDS
`GRAPH_PERSISTENCE_PATH`, `GRAPH_SERVICE_PERSIST_DIR`	data dir	L1 snapshot dir
`GRAPH_DB_HOST/PORT/NAME/USER/PASSWORD/PATH`	—	DSN parts (legacy; prefer `GRAPH_DB_URI`)
`GRAPH_FUSEKI_URL/USER/PASSWORD/DATASET`	—	SPARQL endpoint (optional backend)
`GITLAB_INSTANCES`	none	JSON list of GitLab instances to index/query — the multi-tenant source of truth shared by the KG GitLab indexer and the `gitlab-api` connector registry. Each entry `{"name":<str>,"url":<str>,"token":<str>,"verify_ssl":<bool>}`. Unset → single-host `GITLAB_URL`/`GITLAB_TOKEN` (CONCEPT:KG-2.9g)
`GRAPH_PGGRAPH_SCHEMA`	`public`	Postgres schema
`AGENT_UTILITIES_{CONFIG,DATA,CACHE,LOG,MEMORY,RUNTIME}_DIR`	XDG	Path overrides (resolved in `core/paths.py`)
`AGENT_UTILITIES_TOKEN_SECRET`	—	Run-scoped tool-token secret
`KG_DAEMON_ROLE`	`auto`	host/client/auto election (topology)
`STATE_DB_URI`	none	Externalize ALL durable state (durable-exec checkpoints, sessions/turns/goals, KG task queue) to a shared Postgres; unset keeps the zero-infra per-host SQLite files (CONCEPT:OS-5.16)
`STATE_DB_POOL_SIZE`	`8`	Max connections in the ONE shared state-store psycopg pool (CONCEPT:OS-5.16)
`TASK_QUEUE_BACKEND`	none (auto)	Ingest task queue: `sqlite`\|`postgres`\|`kafka`. Unset = auto (postgres when `STATE_DB_URI` set, else sqlite). Explicit kafka/postgres is FAIL-LOUD at startup (CONCEPT:KG-2.55)
`KG_TASKS_PARTITIONS`	`6`	Partitions ensured on the `kg_tasks` topic at startup (grow-only, never shrinks); bounds kg-ingest consumer-group parallelism (CONCEPT:KG-2.56)
`AGENT_DISPATCH_BACKEND`	`inline`	How agent turns (goal runs / orchestrator jobs) dispatch: `inline` keeps the in-process execution; `queue` publishes a session-keyed envelope onto the `agent_turns` queue (transport follows `TASK_QUEUE_BACKEND`) and returns a job handle for the `agent-dispatch-worker` fleet (CONCEPT:ORCH-1.45)
`AGENT_TURNS_PARTITIONS`	`6`	Partitions ensured on the `agent_turns` topic when Kafka carries dispatched agent turns (grow-only); bounds fleet-wide concurrent-session parallelism (CONCEPT:ORCH-1.45)
`EPISTEMIC_GRAPH_AUTOSTART`	—	Auto-spawn the engine (local `unix://` endpoint only; never remote shards)
`GRAPH_SERVICE_ENDPOINTS`	unset	Engine shard endpoints (comma/JSON list). 2+ entries = tenant-partitioned sharding via HRW over graph names; unset/1 = single-engine zero-infra default (CONCEPT:KG-2.58)
`KG_DEFAULT_GRAPH`	`__bus__`	Default named graph; in sharded mode the ambient ActorContext tenant maps it to `tenant__<t>__<base>` before HRW (CONCEPT:KG-2.58)
`KG_WATCH_DIRS`	unset	Operator document directories the file-watcher auto-ingests recursively, unified with the built-in ScholarX/research download dirs. New files are ingested and modified files re-ingested on the 5s watch tick; unchanged files delta-skip by content hash (CONCEPT:KG-2.8). Value is a JSON array or an `os.pathsep`/comma-separated list of paths (`~` expanded), e.g. `~/Documents`. config.json key: `kg_watch_dirs`. Resolved by `sdd/watcher.py:get_watched_directories()`
`GRAPH_SERVICE_AUTH_SECRET`	auto-generated	Engine HMAC secret; unset → per-install secret persisted at `data_dir()/engine_secret` (0600) (CONCEPT:OS-5.14)
`KG_ENGINE_INSECURE`	`false`	Dev opt-out of engine HMAC auth; sets `EPISTEMIC_GRAPH_ALLOW_INSECURE=1` on spawned engines (CONCEPT:OS-5.14)
`KG_AUTH_REQUIRED`	`false`	Require server-validated JWT identity for KG access — 401 without it; caller `_actor`/`_roles`/`_tenant` kwargs ignored (CONCEPT:OS-5.14)
`KG_AUTH_TOKEN`	—	JWT minting the stdio MCP process identity (validated against `AUTH_JWT_JWKS_URI`) (CONCEPT:OS-5.14)
`KG_ACL_DEFAULT_ALLOW`	`false`	With `KG_BRAIN_ENFORCE` on: allow nodes WITHOUT an ACL (escape hatch from the fail-closed default-deny) (CONCEPT:OS-5.14)
`KG_SERVED_PROFILE`	`true`	Fail-closed served-security profile for network MCP transports: refuse to start without `AUTH_JWT_JWKS_URI`, auto-enable auth + enforcement. `0` = serve open (local dev only) (CONCEPT:OS-5.14)
`KG_ENGINE_POOL_SIZE`	`0`	Max warm per-tenant engine clients per process (elastic LRU pool over KG-2.58 routing); `0` = per-use construction (CONCEPT:KG-2.62)
`KG_ENGINE_POOL_DROP_ON_EVICT`	`false`	On pool eviction also unload the tenant's named graph from the engine to reclaim memory — only safe when L3 mirrors the data (CONCEPT:KG-2.62)
`GATEWAY_METRICS`	`true`	Python-tier Prometheus middleware + `GET /metrics` on the gateway (CONCEPT:OS-5.23)
`GATEWAY_RATE_LIMIT`	`0` (off)	Per-tenant token-bucket rate limit, sustained req/s; buckets are per-process (CONCEPT:OS-5.23)
`GATEWAY_RATE_BURST`	`0` (→ 2× rate)	Token-bucket burst capacity (CONCEPT:OS-5.23)
`GATEWAY_WORKERS`	`1`	Pre-forked gateway worker processes on one shared listen socket; the flock host-lock elects ONE KG host among them (CONCEPT:OS-5.23)
`ENGINE_BREAKER_THRESHOLD`	`5`	Consecutive engine connect/timeout failures before the client circuit opens (0 = off) (CONCEPT:OS-5.23)
`ENGINE_BREAKER_COOLDOWN`	`15`	Seconds an open engine circuit waits before the half-open probe (CONCEPT:OS-5.23)
`MCP_CHILD_MAX_CONCURRENCY`	`8`	Max in-flight tool calls per multiplexer child (0 = unlimited); per-server `max_concurrency` override in `mcp_config.json` (CONCEPT:ECO-4.34)
`MCP_CHILD_QUEUE_TIMEOUT`	`30`	Seconds an excess call queues for a child slot before the typed `MCPChildBusyError`; per-server `queue_timeout` override (CONCEPT:ECO-4.34)
`MCP_CHILD_POOL_SIZE`	`1`	Session-pool size for remote (streamable-http/SSE) children — N round-robin connections for parallel calls; stdio stays single-pipe; per-server `pool_size` override (CONCEPT:ECO-4.34)
`MCP_CHILD_MAX_RESTARTS`	`5`	Auto-restarts a crashed child may consume inside the window before being parked `failed` (0 = no auto-restart); per-server `max_restarts` override (CONCEPT:ECO-4.34)
`MCP_CHILD_RESTART_WINDOW`	`300`	Sliding window (s) for the restart budget; older restarts are forgiven; per-server `restart_window` override (CONCEPT:ECO-4.34)
`MCP_CHILD_BREAKER_THRESHOLD`	`5`	Consecutive transport failures/timeouts before a child's circuit opens (typed `MCPChildCircuitOpenError`, 0 = off); per-server `breaker_threshold` override (CONCEPT:ECO-4.34)
`MCP_CHILD_BREAKER_COOLDOWN`	`15`	Seconds an open child circuit waits before the half-open probe; per-server `breaker_cooldown` override (CONCEPT:ECO-4.34)
`MCP_MULTIPLEXER_MODE`	`eager`	Tool-exposure strategy: `eager` spawns every child and exposes all tools at boot (historical); `dynamic` exposes only the `find_tools`/`load_tools`/`unload_tools`/`multiplexer_status` meta-tools + always-on children, mounting other tools on demand with a `tools/list_changed` notification (CONCEPT:ECO-4.36)
`MCP_DYNAMIC_ALWAYS_ON`	`["graph-os"]`	Child servers mounted at boot in `dynamic` mode (in addition to meta-tools); defaults to the KG server so `find_tools` can rank semantically (CONCEPT:ECO-4.36)
`MCP_DYNAMIC_TOP_K`	`8`	Default number of ranked candidates `find_tools` returns when `top_k` is unspecified (CONCEPT:ECO-4.36)
`ACTION_POLICY_PATH`	shipped default	Operational ActionPolicy YAML; empty → conservative `deploy/action-policy.default.yml` (everything mutating = approval_required). KG `governance_rule` overrides win (CONCEPT:OS-5.24)
`FLEET_RECONCILER`	`false`	Opt-in leader-only desired-state fleet reconciler tick — diff registry vs observed, converge through the ActionPolicy gate + actuator seam (CONCEPT:OS-5.25)
`FLEET_RECONCILER_INTERVAL`	`120`	Seconds between fleet-reconciler ticks (CONCEPT:OS-5.25)
`FLEET_RECONCILER_MAX_ACTIONS`	`5`	Storm guard: max convergence actions per tick, rest deferred (CONCEPT:OS-5.25)
`FLEET_REGISTRY_PATH`	shipped registry	Fleet service registry YAML; empty → `deploy/mcp-fleet.registry.yml` (CONCEPT:OS-5.25)
`FLEET_DESIRED_STATE_PATH`	unset	Optional desired-state override YAML (per-service `replicas`/`desired`/`version`) layered on the registry (CONCEPT:OS-5.25)
`FLEET_ACTUATOR`	`dryrun`	Actuator selection: `dryrun` (records intent, mutates nothing) or `docker` (reference CLI actuator). Portainer/Swarm actuators are deployment-wired via `set_fleet_actuator()` (CONCEPT:OS-5.25)
`DEPLOY_WATCH_WINDOW`	`300`	Health-watch window (s) after a deploy/restart; failure inside the window triggers the policy-gated rollback (CONCEPT:OS-5.27)
`DEPLOY_WATCH_POLL`	`15`	Seconds between health probes inside a deploy watch (CONCEPT:OS-5.27)
`FLEET_AUTOSCALER`	`false`	Opt-in leader-only reactive replica autoscaler tick — load signal → registry-declared min/max bounds → policy-gated `scale_service` + deploy watch (CONCEPT:OS-5.29)
`FLEET_AUTOSCALER_INTERVAL`	`60`	Seconds between autoscaler ticks (CONCEPT:OS-5.29)
`SCALING_PROMETHEUS_URL`	unset	Prometheus base URL for autoscaling signals (instant `/api/v1/query` GETs); unset → zero-infra in-process gauges; injected provider via `set_scaling_signal_provider()` wins (CONCEPT:OS-5.29)

These genuinely vary per host and aren't derivable. Action: ensure each is a typed AgentConfig field; remove duplicate bare reads (GRAPH_DB_URI is read in 4 places, AGENT_UTILITIES_CONFIG_DIR in 5).

B. Daemon on/off toggles, all default ON — REMOVED (Phase 3) ✓¶

Done. The six always-on toggles below were deleted and collapsed behind a single KG_DEV_MODE switch on AgentConfig, read through one engine_tasks._kg_dev_mode() helper that gates the maintenance scheduler + embedding-backfill startup. Production keeps every daemon on; KG_DEV_MODE=1 silences the lot. Removed: KG_EMBED_BACKFILL, KG_ENRICH_DAEMON, KG_FILE_WATCH, KG_HYGIENE_DAEMON, KG_TASK_REAPER_DAEMON, KG_RECONCILE_DURABLE. Also fixed the KG_EMBED_BACKFILL_BATCH dual-default bug → two named constants (_EMBED_BACKFILL_BUDGET=256, _EMBED_BACKFILL_FETCH=512). Sprawl baseline 95 → 88. (The KG_CONCEPT_CODE_LINK toggle lives in another module and is still pending.)

Original inventory (for reference):

Flag	Default	Gated thread/job (`engine_tasks.py`)
`KG_EMBED_BACKFILL`	`1`	vector-embedding backfill drain (L402)
`KG_ENRICH_DAEMON`	`1`	semantic enrichment tick (L618)
`KG_FILE_WATCH`	`1`	SDD/skills/config file-watch (L635)
`KG_HYGIENE_DAEMON`	`1`	memory decay/dedup (L645)
`KG_TASK_REAPER_DAEMON`	`1`	zombie-task recovery (L657)
`KG_RECONCILE_DURABLE`	`1`	L1→L2/L3 autoheal (L609)
`KG_CONCEPT_CODE_LINK`	`1`	concept↔code bridge
`GRAPH_DIRECT_DISPATCH`	`true`	sync dispatch
`KG_RETRIEVAL_QUALITY_GATE`	`true`	relevance filter

Nobody runs these off in production. Action: delete the env gates; if a dev escape hatch is wanted, a single KG_DEV_MODE=1 disables all background daemons.

B.1 Safety overrides — KEEP (typed on `AgentConfig`)¶

Flag	Default	What it gates
`KG_ALLOW_FULL_SCAN`	`false`	Permit an unscoped Cypher query to enumerate the whole graph (ORCH-1.40). Off by default so a buggy unscoped query can never silently full-scan; deliberate opt-in only. Typed `config.kg_allow_full_scan`, read in `backends/epistemic_graph_backend.py`.

C. Ingest-throughput knobs — REMOVED ✓¶

Done. All three deleted: - KG_INGEST_FEATURES / KG_INGEST_PROFILE → per-repo call-graph community detection is now always on. The hang risk that motivated the opt-out is fixed at the source: the engine's community_detection is deterministically bounded (15s wall-clock + iteration cap, epistemic-graph algorithms.rs), and make_community_fn loads its scratch tenant in one batch_update round-trip instead of per-element RPCs. Correction (verified against code): one KG_INGEST_PROFILE read survives — pipeline phase selection in knowledge_graph/pipeline/__init__.py (select_phases; values structural | full, unset = full). It no longer gates community detection. It is tracked in the bare-read baseline (see section H). - KG_BULK_INGEST → the maintenance scheduler auto-detects a bulk ingest from the durable submission-queue depth (_submission_queue.get_queue_size() > _BULK_QUEUE_THRESHOLD) and defers its whole-graph passes per-tick, instead of a manual startup flag.

D. Performance tunables — keep as constants / deployment config (auto-sizing deferred)¶

The one real defect here — the KG_EMBED_BACKFILL_BATCH dual-default — is fixed (Phase 3: two named constants). The rest are left as-is by design: the worker pool already auto-sizes (engine_tasks.py CPU 36% + mem cap), and the remaining batch sizes / intervals have correct universal defaults. Per the Configuration discipline rule, a tunable with a good default should be a constant, not a knob — but mechanically converting every one to a CPU-derived auto-sizer is speculative churn with behaviour-change risk and little payoff, so it is not pursued. GRAPH_TIMEOUT's 20-minute default is noted (it made the old community-detection hang look infinite); now moot since the engine bounds the call itself.

Flag	Default	Notes
`KG_LLM_CONCURRENCY`	4	typed on `AgentConfig` (`kg_llm_concurrency`); max concurrent LLM calls for KG operations — set to match the inference endpoint's parallel capacity
`KG_PARSE_BATCH`	128	constant
`KG_ENRICH_BATCH` / `KG_ENRICH_MAX_BATCHES`	16 / 8	constants
`KG_EMBED_BACKFILL_BATCH`	*256 and* 512 (BUG)**	read twice with different defaults (L1040, L1155) — unify
`KG_BACKGROUND_MAX_CONCURRENT`	2	auto
`GRAPH_POOL_MIN/MAX`	2 / 10	auto from cpu
`KG_CHAT_CONCURRENCY`	8	auto
`KG_*_INTERVAL` (enrich/file_watch/embed/evolution/golden)	20–3600	constants unless deployment-varying
`GRAPH_TIMEOUT`	1200000 ms	20-min RPC timeout — far too long; the root of "hangs look infinite"

E. Experiment / feature gates¶

KG_GOLDEN_* (10 flags) — collapsed onto AgentConfig ✓. Every KG_GOLDEN_* / KG_BREADTH_* read was moved off bare os.environ onto typed AgentConfig fields — opt-in, all off by default, single typed source of truth.

Flag	Default	What it gates
`KG_LOOP`	`False`	enable the autonomous golden-loop daemon cycle (intake→acquire→resolve→distil→synthesize)
`KG_LOOP_DISTILL`	`False`	distillation step of the golden loop
`KG_LOOP_BREADTH`	`False`	breadth ingest step (auto-ingest of the roots below)
`KG_LOOP_STANDARDIZE`	`False`	standardization pass of the golden loop
`KG_GOLDEN_AUTO_MERGE`	`False`	governed auto-merge of promoted proposals
`KG_GOLDEN_MERGE_THRESHOLD`	`None`	score threshold gating auto-merge
`KG_LOOP_INTERVAL`	`3600`	daemon tick interval (s)
`KG_LOOP_TOPICS`	`5`	hot topics processed per cycle
`KG_BREADTH_LIBRARY_ROOTS`	`""`	comma-separated OSS library roots auto-ingested by the breadth step (deployment-specific; empty = no-op)
`KG_BREADTH_REPO_ROOTS`	`""`	comma-separated code-repo roots auto-ingested by the breadth step

Family	Count	Notes
`KG_EA_WRITEBACK`, `KG_ENABLE_HARD_NEGATIVE_MINING`, `KG_BRAIN_ENFORCE`, `KG_RESEARCH_EXTERNAL`, `KG_PROCESS_WRITEBACK`	5	remaining experiment gates — graduate (always-on) or delete

KG_PROCESS_WRITEBACK — outbound process-intelligence writeback (CONCEPT:KG-2.8, default off). Opt-in because it performs outbound mutating calls into external systems (Camunda process-instance variables, ARIS model attributes) — exactly the "expensive / side-effecting on every run" case the opt-in tier is reserved for. When set, resolve_process_writeback pushes the KG's per-process intelligence (capability/ code lineage, OWL inferences, operational signals, glossary/data lineage) back onto the live process world via the graph_analyze action=process_writeback surface. The connector connection/credentials themselves are not agent-utilities config — they live in the camunda-mcp / aris-mcp packages' own environment (CAMUNDA7_*/CAMUNDA8_*, ARIS_*), read by each connector's auth.get_client().

EVOLUTION_WORKTREE_ROOT — evolution→branch bridge (CONCEPT:AHE-3.21), typed on AgentConfig (evolution_worktree_root). Root directory the LocalBranchPublisher creates fresh git worktrees under when publishing a promoted proposal as a reviewable local branch. Empty (default) resolves to data_dir()/evolution_worktrees — publication never writes into a canonical checkout's working tree.

KG_FAILURE_* — Failure-Driven Evolution (CONCEPT:AHE-3.18), typed on AgentConfig, opt-in, all off by default. The boolean gates are parsed via to_boolean ("True"/"False", consistent with the fleet's other toggles). See failure_driven_evolution.md.

Flag	Default	Notes
`KG_FAILURE_EVOLUTION`	`False`	enable the daemon `failure_ingest` tick (pull Langfuse failures → remediation)
`KG_FAILURE_EVOLUTION_INTERVAL`	`3600`	daemon tick interval (s)
`KG_FAILURE_EVOLUTION_WINDOW`	`86400`	telemetry look-back window (s)
`KG_FAILURE_REGRESSION_DATASET`	`False`	enable the dataset-based regression path
`KG_DSPY_OPTIMIZATION`	`False`	enable the daemon `dspy_optimization` tick — propose-only DSPy optimization sweep over the self-supervised targets (CONCEPT:AHE-3.46)
`KG_DSPY_OPTIMIZATION_INTERVAL`	`3600`	DSPy optimization sweep interval (s)

KG_FUSEKI_* — Ontology distribution to Apache Jena Fuseki (CONCEPT:KG-2.52), typed on AgentConfig, opt-in. The fuseki_publish maintenance tick pushes the bundled ontology modules (the authoritative TBox) to an optional enterprise Fuseki triplestore for SPARQL federation. Off by default — Fuseki is optional infrastructure.

Flag	Default	Notes
`KG_FUSEKI_PUBLISH`	`False`	enable the daemon `fuseki_publish` tick
`KG_FUSEKI_ENDPOINT`	`None`	Fuseki URL; `None` defers to the publisher (`FUSEKI_ENDPOINT`, then localhost)
`KG_FUSEKI_PUBLISH_INTERVAL`	`3600`	daemon tick interval (s)

KG_WORKFLOW_SHAPE_GATE — execution-time workflow ontology gate (CONCEPT:ORCH-1.42), typed on AgentConfig, default ON. execute_workflow AND its background twin dispatch_workflow (REST twin /api/graph/orchestrate/dispatch-workflow) SHACL-validate the stored WorkflowDefinition (+ steps) against the governance shapes before dispatch and refuse malformed definitions with a structured violation report; cheap and LLM-free. The companion permission gate (ontology permissioning ACL on the workflow node) is governed by the existing KG_BRAIN_ENFORCE flag (OS-5.14 fail-closed semantics), not a new one.

Flag	Default	Notes
`KG_WORKFLOW_SHAPE_GATE`	`True`	SHACL-validate stored workflows before execution

Langfuse (CONCEPT:AHE-3.18 / AHE-3.0) — official SDK variable names only. The host variable is LANGFUSE_HOST (the non-standard LANGFUSE_BASE_URL fallback was removed — greenfield). Resolved through AgentConfig.langfuse_host / langfuse_public_key / langfuse_secret_key.

Flag	Default	Notes
`LANGFUSE_HOST`	`https://cloud.langfuse.com`	Langfuse base URL (read + OTEL write paths)
`LANGFUSE_PUBLIC_KEY` / `LANGFUSE_SECRET_KEY`	`None`	project API keypair

F. Testing — KEEP¶

Flag	Default	Notes
`AGENT_UTILITIES_TESTING`	`False`	test-mode guard (read in ~8 places — centralize)
`AGENT_UTILITIES_GWT_STRICT`	`""`	global-workspace strict test mode

G. Complete `AgentConfig` inventory — platform fields beyond the KG/graph flag audit¶

Sections A–F are the original KG_*/GRAPH_* sprawl audit. AgentConfig (core/config.py, pydantic-settings) additionally carries the platform's general configuration surface. Totals, extracted programmatically from AgentConfig.model_fields: 244 fields, 242 distinct environment variables (SECRETS_VAULT_URL and SECRETS_VAULT_MOUNT each bind two fields — vault_url/ secrets_vault_url and vault_mount/secrets_vault_mount — a legacy duplication kept for compatibility).

Environment-name resolution: every field declares an explicit alias which IS its environment variable name (no env_prefix; matching is case-insensitive). Sources in precedence order: constructor args → environment → .env → nested secrets file (AGENT_SECRETS_FILE) → Docker secrets (/run/secrets). All fields below are therefore environment-settable; none are internal-only.

G.1 Model providers & routing¶

Flag	Default	What it sets
`CHAT_MODELS`	`[]`	JSON list of chat-model configs (id/provider/base_url/api_key/intelligence_level); drives `default_chat_model` / `lite_chat_model` / `super_chat_model`
`EMBEDDING_MODELS`	`[]`	JSON list of embedding-model configs (first entry = default)
`OPENAI_API_KEY` / `OPENAI_BASE_URL`	`None`	OpenAI fallback credentials for ad-hoc model creation
`ANTHROPIC_API_KEY`	`None`	Anthropic fallback API key
`GEMINI_API_KEY`	`None`	Google Gemini fallback API key
`GROQ_API_KEY`	`None`	Groq fallback API key
`MISTRAL_API_KEY`	`None`	Mistral fallback API key
`HUGGING_FACE_API_KEY`	`None`	Hugging Face fallback API key
`DEEPSEEK_API_KEY` / `DEEPSEEK_BASE_URL`	`None`	DeepSeek fallback credentials
`VLLM_BASE_URL`	`None`	Dedicated vLLM inference server base URL
`MODEL_REGISTRY_PATH`	`None`	YAML/JSON model-registry file
`MODEL_ROLE_ROUTING`	`{}`	Role→{tier,tags} overrides for planner/generator/learner/judge selection (CONCEPT:ORCH-1.27); empty roles use `models/model_registry.py` defaults
`ROUTING_STRATEGY`	`hybrid`	Graph routing strategy
`ROUTING_PERCENTILE`	`50.0`	Routing percentile tunable

G.2 LLM request defaults¶

Flag	Default	What it sets
`MAX_TOKENS`	`16384`	Default completion token cap
`TEMPERATURE`	`0.7`	Sampling temperature
`TOP_P`	`1.0`	Nucleus sampling
`TIMEOUT`	`32400`	LLM request timeout (s)
`TOOL_TIMEOUT`	`32400`	Tool-call timeout (s)
`PARALLEL_TOOL_CALLS`	`true`	Allow parallel tool calls
`SEED`	`None`	Deterministic sampling seed
`PRESENCE_PENALTY` / `FREQUENCY_PENALTY`	`0.0`	Repetition penalties
`LOGIT_BIAS`	`None`	Token logit-bias map (JSON)
`STOP_SEQUENCES`	`None`	Stop sequences (JSON list)
`EXTRA_HEADERS` / `EXTRA_BODY`	`None`	Extra provider request headers/body (JSON)

G.3 Agent identity & HTTP server¶

Flag	Default	What it sets
`DEFAULT_AGENT_NAME`	package name	Agent display name
`AGENT_DESCRIPTION`	package description	Agent description
`AGENT_SYSTEM_PROMPT`	`None`	System prompt override
`WORKSPACE_PATH`	`None`	Workspace root override
`HOST`	`0.0.0.0`	Gateway bind address
`PORT`	`9000`	Gateway port
`DEBUG`	`false`	Debug mode
`ENABLE_WEB_UI`	`false`	Serve the web UI
`ENABLE_TERMINAL_UI`	`false`	Terminal UI mode (disables `GATEWAY_WORKERS>1`)
`ENABLE_WEB_LOGS`	`true`	Web log streaming
`ENABLE_ACP`	`false`	Agent Client Protocol adapter
`ACP_PORT`	`8001`	ACP port
`ACP_SESSION_ROOT`	`.acp-sessions`	ACP session storage dir
`DEFAULT_TERMINAL_AGENT`	`agent-terminal-ui`	Terminal agent binary
`MCP_URL`	`None`	Remote MCP server URL the agent attaches to
`MCP_CONFIG`	`None`	Path to `mcp_config.json`
`AGENT_API_KEY`	`None`	Static API key for gateway auth
`ENABLE_API_AUTH`	`false`	Require the API key
`MAX_UPLOAD_SIZE`	`10485760`	Upload cap (bytes)
`ALLOWED_ORIGINS`	`None` (= `*`)	CORS origins, comma-separated
`ALLOWED_HOSTS`	`None`	TrustedHostMiddleware hosts, comma-separated

G.4 Identity, JWT & delegation¶

AUTH_JWT_JWKS_URI is in section A's orbit (OS-5.14); its companions:

Flag	Default	What it sets
`AUTH_JWT_ISSUER`	`None`	Expected JWT issuer claim
`AUTH_JWT_AUDIENCE`	`None`	Expected JWT audience claim
`OIDC_CONFIG_URL`	`None`	OIDC discovery URL (any compliant IdP)
`OIDC_CLIENT_ID` / `OIDC_CLIENT_SECRET`	`None`	OAuth 2.0 client credentials
`ENABLE_DELEGATION`	`false`	RFC 8693 token exchange for downstream APIs (CONCEPT:ECO-4.0)
`AUDIENCE`	`None`	Target audience for delegated tokens
`DELEGATED_SCOPES`	`api`	Space-separated delegation scopes
`FLEET_EVENTS_TOKEN`	`None`	Shared secret for `POST /api/fleet/events` webhook ingress (`X-Fleet-Events-Token` header); unset = no token required (CONCEPT:OS-5.15)

G.5 Secrets backends¶

Flag	Default	What it sets
`SECRETS_BACKEND`	`inmemory`	`inmemory` \| `sqlite` \| `vault`
`SECRETS_SQLITE_PATH`	`None`	SQLite secrets DB path
`SECRETS_VAULT_URL`	`None`	HashiCorp Vault / OpenBao URL (binds both `vault_url` and `secrets_vault_url`)
`SECRETS_VAULT_MOUNT`	`secret`	KV v2 mount (binds both `vault_mount` and `secrets_vault_mount`)
`VAULT_AUTH_METHOD`	`auto`	`oidc` \| `approle` \| `token` \| `kubernetes` \| `auto`
`VAULT_AUTH_MOUNT`	`jwt`	Auth-method mount path
`VAULT_ROLE`	`None`	Role for OIDC/JWT or Kubernetes login
`VAULT_PATH_PREFIX`	`None`	KV path prefix (e.g. `agents/mcp/`)

G.6 Graph service & KG runtime (fields not already in A–F)¶

Flag	Default	What it sets
`GRAPH_PERSISTENCE_TYPE`	`file`	L1 persistence mode
`GRAPH_BACKEND_L2`	`None` (auto)	Explicit L2 backend; unset auto-selects (LadybugDB, or PostgreSQL when a DSN is configured)
`GRAPH_COMPUTE_BACKEND`	`rust`	Compute tier selection
`GRAPH_SERVICE_SOCKET`	`None` (XDG runtime dir)	Engine UDS path; default `$XDG_RUNTIME_DIR/epistemic-graph.sock`
`GRAPH_SERVICE_TCP_ADDR`	`None`	Engine TCP address (e.g. `0.0.0.0:9100`); `GRAPH_SERVICE_ENDPOINTS` overrides both
`GRAPH_SERVICE_CHECKPOINT_SECS`	`300`	Engine auto-checkpoint interval (0 = off)
`GRAPH_SERVICE_PERSIST_ON_SHUTDOWN`	`true`	Serialize all graphs on engine shutdown
`GRAPH_DIRECT_EXECUTION`	`true`	AG-UI/ACP adapters bypass the LLM tool-call hop and invoke graph execution directly
`GRAPH_ROUTER_TIMEOUT` / `GRAPH_VERIFIER_TIMEOUT`	`300`	Router/verifier timeouts (s)
`ENABLE_LLM_VALIDATION`	`false`	LLM validation pass
`ENABLE_KG_EMBEDDINGS`	`true`	KG embedding generation
`KG_EMBEDDING_DIM`	`768`	Must match the embedding model's output dimension; the schema vector column size derives from it
`KG_BACKUPS`	`3`	KG backup retention count
`KG_INGESTION_WORKERS`	`None` (auto)	Ingestion worker count override; unset auto-sizes
`KG_ANALYSIS_MAX_DEPTH`	`2`	Max recursion depth for background research daemons
`MAX_RECURSION_DEPTH`	`2`	Graph recursion depth tunable
`KNOWLEDGE_GRAPH_SYNC_BACKGROUND`	`true`	Background task workers for the KG pipeline
`ENABLE_SDD_WATCHER`	`true`	Plan/task watcher thread in the KG MCP server
`KG_ANOMALY_CONSUMER`	`true`	Drain unconsumed PerformanceAnomaly nodes into failure_gap topics; LLM-free, bounded, propose-only (CONCEPT:AHE-3.19)
`SPARQL_ENDPOINTS`	`["https://query.wikidata.org/sparql"]`	External SPARQL endpoints to federate (CONCEPT:KG-2.7)
`JENA_FUSEKI_URL`	`None`	Local Jena Fuseki URL (distinct from the `KG_FUSEKI_*` publish tick in section E)
`KAFKA_BOOTSTRAP_SERVERS`	`None`	Kafka brokers (task-queue/event transport; one of the three scale knobs in `docs/scaling/capacity_model.md`)
`KAFKA_TOPIC`	`None`	Default Kafka topic for messaging/event ingestion
`NATS_URL`	`None`	NATS broker URL

G.7 Observability exporters¶

The gateway Prometheus flags (GATEWAY_METRICS etc.) are in section A; the metric series themselves are catalogued in ../reference/metrics.md.

Flag	Default	What it sets
`ENABLE_OTEL`	`false`	OpenTelemetry tracing
`OTEL_EXPORTER_OTLP_ENDPOINT`	`None`	OTLP collector endpoint
`OTEL_EXPORTER_OTLP_HEADERS`	`None`	OTLP headers
`OTEL_EXPORTER_OTLP_PUBLIC_KEY` / `OTEL_EXPORTER_OTLP_SECRET_KEY`	`None`	OTLP keypair (Langfuse-style basic auth)
`OTEL_EXPORTER_OTLP_PROTOCOL`	`http/protobuf`	OTLP protocol
`LANGFUSE_DATASET_CAPTURE_THRESHOLD`	`0.0`	Score threshold for dataset capture (AHE-3.18 regression datasets)
`LANGFUSE_LATENCY_BASELINE_SECONDS`	`60`	Latency baseline for anomaly scoring
`LANGFUSE_TOKEN_BASELINE`	`20000`	Token-usage baseline for anomaly scoring
`LANGFUSE_VERIFIER_FALLBACK_LIMIT`	`1`	Verifier fallback attempts

G.8 A2A (agent-to-agent)¶

Flag	Default	What it sets
`A2A_BROKER`	`in-memory`	A2A broker backend
`A2A_BROKER_URL`	`None`	Broker URL when not in-memory
`A2A_STORAGE`	`in-memory`	A2A storage backend
`A2A_STORAGE_URL`	`None`	Storage URL when not in-memory
`A2A_CONFIG`	`None`	`a2a_config.json` path for external agent discovery (CONCEPT:ECO-4.0)
`A2A_REFRESH_INTERVAL`	`300`	Agent-card re-fetch interval (s)

G.9 Orchestration, scheduler & guardrails¶

Flag	Default	What it sets
`MIN_CONFIDENCE`	`0.4`	Minimum confidence gate
`VALIDATION_MODE`	`false`	Validation-only execution mode
`APPROVAL_TIMEOUT`	`0`	Approval wait timeout (s; 0 = no wait)
`COGNITIVE_SCHEDULER_ENABLED`	`true`	Priority-aware agent scheduler (CONCEPT:OS-5.2)
`MAX_CONCURRENT_AGENTS`	`5`	Concurrent specialist agents (CONCEPT:OS-5.2)
`AGENT_TOKEN_QUOTA`	`100000`	Per-agent token budget before preemption (CONCEPT:OS-5.2)
`PREEMPTION_THRESHOLD_PCT`	`0.85`	Quota usage triggering preemption warning
`AGENT_POLICIES_PATH`	`None`	`agent_policies.json` for identity-based governance
`PERMISSIONS_SIGNING_KEY`	`None` (auto)	HMAC key for agent identity tokens; auto-generated if unset
`SPECIALIST_REGISTRY_PATH`	`None`	Local specialist registry dir
`MAX_PARALLEL_AGENTS`	`60`	Global engine-wide execution semaphore (CONCEPT:ORCH-1.8)
`WORKER_POOL_SIZE`	`8`	Workers per node for agent turns / graph mutations; active-concurrency scale knob (CONCEPT:ORCH-1.8)
`PARALLEL_BATCH_SIZE`	`25`	Agents per execution wave
`SYNTHESIS_STRATEGY`	`auto`	`auto` \| `flat` \| `hierarchical` \| `progressive` \| `rlm` (CONCEPT:ORCH-1.26)
`SYNTHESIS_RATIO`	`10`	Outputs per hierarchical synthesis sub-node
`AGENT_EXECUTION_TIMEOUT`	`120`	Per-agent timeout (s)
`CIRCUIT_BREAKER_THRESHOLD`	`3`	Consecutive failures disabling an agent type
`ENABLE_PROGRESSIVE_SYNTHESIS`	`true`	Streaming synthesis as agents complete
`HOMEOSTATIC_DOWNGRADE_ENABLED`	`true`	Auto model-tier downgrade under budget pressure (CONCEPT:OS-5.2)
`ADVERSARIAL_VERIFICATION`	`false`	Adversarial verification pass — opt-in, doubles verification cost (CONCEPT:AHE-3.1)
`MAINTENANCE_TOKEN_BUDGET`	`0` (unlimited)	Token budget for the autonomous maintenance cron
`MAINTENANCE_PRIORITY`	`LOW`	Maintenance task priority (LOW/MEDIUM/HIGH)
`WATCHDOG_PATTERNS`	`pyproject.toml, mcp_config.json, requirements*.txt`	File patterns for the file-watcher trigger (CONCEPT:OS-5.0)
`TOOL_GUARD_MODE`	`strict`	Sensitive-tool guard mode
`SENSITIVE_TOOL_PATTERNS`	67 regexes	Tool-name patterns treated as mutating/sensitive (delete/exec/deploy/...); override only to extend

G.10 Skills¶

Flag	Default	What it sets
`CUSTOM_SKILLS_DIRECTORY`	`None`	Extra skills directory
`SKILL_TYPES`	`None`	Skill-type filter (JSON list)

G.11 Native messaging backends (CONCEPT:ECO-4.0)¶

Flag	Default	What it sets
`MESSAGING_ENABLED_BACKENDS`	`[]`	Backend IDs to auto-connect (e.g. `["discord","slack"]`)
`MESSAGING_KG_INGEST`	`true`	Auto-ingest all inbound/outbound messages into the KG
`MESSAGING_KG_MEMORY_TYPE`	`episodic`	KG memory tier for inbound messages (`episodic`/`semantic`/`procedural`)
`MESSAGING_ROUTE_TO_PLANNER`	`true`	Route inbound events to the Planner Graph Agent
`MESSAGING_DISCORD_TOKEN`	`None`	Discord bot token (also reads `DISCORD_BOT_TOKEN`)
`MESSAGING_SLACK_TOKEN`	`None`	Slack bot token `xoxb-...` (also reads `SLACK_BOT_TOKEN`)
`MESSAGING_SLACK_APP_TOKEN`	`None`	Slack app-level token `xapp-...` (Socket Mode)
`MESSAGING_TELEGRAM_TOKEN`	`None`	Telegram bot token (also reads `TELEGRAM_BOT_TOKEN`)
`MESSAGING_WHATSAPP_TOKEN`	`None`	WhatsApp API token (also reads `WHATSAPP_TOKEN`)
`MESSAGING_WHATSAPP_PHONE_NUMBER_ID`	`None`	WhatsApp Business phone number ID
`MESSAGING_WHATSAPP_USE_BUSINESS_API`	`false`	Official Business API vs neonize bridge
`MESSAGING_TEAMS_APP_ID` / `MESSAGING_TEAMS_APP_SECRET`	`None`	Microsoft Teams Bot Framework credentials
`MESSAGING_GOOGLECHAT_TOKEN`	`None`	Google Chat service-account JSON path
`MESSAGING_GOOGLEMEET_TOKEN`	`None`	Google Meet service-account JSON path
`MESSAGING_MATTERMOST_TOKEN` / `MESSAGING_MATTERMOST_URL`	`None`	Mattermost token + server URL
`MESSAGING_MATRIX_TOKEN` / `MESSAGING_MATRIX_HOMESERVER` / `MESSAGING_MATRIX_USER_ID`	`None`	Matrix credentials
`MESSAGING_IRC_SERVER`	`None`	IRC server
`MESSAGING_IRC_PORT`	`6667`	IRC port
`MESSAGING_IRC_NICKNAME`	`agent_bot`	IRC nickname
`MESSAGING_IRC_CHANNELS`	`[]`	IRC channels to auto-join
`MESSAGING_SIGNAL_TOKEN`	`None`	Signal phone number (semaphore-bot)
`MESSAGING_LINE_TOKEN`	`None`	LINE channel access token
`MESSAGING_TWITCH_TOKEN`	`None`	Twitch OAuth token
`MESSAGING_TWITCH_CHANNELS`	`[]`	Twitch channels to join
`MESSAGING_SYNOLOGY_WEBHOOK_URL`	`None`	Synology Chat webhook URL
`MESSAGING_VOICECALL_APP_ID` / `MESSAGING_VOICECALL_TOKEN` / `MESSAGING_VOICECALL_FROM_NUMBER`	`None`	Twilio voice/SMS credentials (account SID / auth token / from number)
`MESSAGING_NEXTCLOUD_URL` / `MESSAGING_NEXTCLOUD_TOKEN` / `MESSAGING_NEXTCLOUD_APP_ID`	`None`	Nextcloud Talk credentials (URL / app token / username)

H. Former bare `KG_`/`GRAPH_`/`EPISTEMIC_*` reads (now folded)¶

Every governed (KG_*/GRAPH_*/EPISTEMIC_*) bare os.environ read has been folded off os.environ onto a centralized path, per Configuration discipline:

Deployment-varying / behavioral / test-varied → config.setting("VAR", default) (live, config.json-driven). Still fully settable — set var in config.json or VAR in the environment.
Pure load tunables (concurrency/batch) → auto-sized via compute_ingest_worker_count().
Single-value cadences/limits/timeouts → named module constants (no knob).

scripts/check_no_env_sprawl.py now ratchets all-prefix bare reads against scripts/env_flag_baseline.txt (the governed reads are gone from it; what remains is the non-KG burn-down — AGENT_*, VAULT_*, OTEL_*, connector creds, …). The table below is the reference for these settings and how each is now resolved (defaults unchanged); for the setting() rows, the value is config.json-/env-overridable:

Flag	Default	Read in	What it sets
`KG_SERVER_HOST` / `KG_SERVER_PORT`	`127.0.0.1` / `8100`	`agent/factory.py`, `mcp/kg_coordinator.py`, `backends/contrib/ladybug_backend.py`, `core/config.py`	KG coordinator server address
`KG_DAEMON_LOG_LEVEL`	`INFO`	`gateway/daemon.py`	Daemon log level
`GRAPH_ROUTING_STRATEGY`	`hybrid`	`knowledge_graph/core/engine.py`	Engine-side routing strategy (overlaps `ROUTING_STRATEGY` on `AgentConfig`)
`KG_CARD_MODEL`	`lite`	`core/engine_tasks.py`	`lite` or `heavy` model for enrichment cards
`KG_LLM_TIMEOUT` / `KG_LLM_MAX_RETRIES`	`30` / `1`	`enrichment/cards.py`	Enrichment LLM call timeout (s) / retries
`KG_EMBED_BACKFILL_INTERVAL` / `KG_EMBED_BACKFILL_BUSY_SLEEP`	`30` / `1`	`core/engine_tasks.py`	Embedding-backfill idle/busy sleep (s)
`KG_RECONCILE_INTERVAL`	`900`	`core/engine_tasks.py`	L1→L2/L3 reconcile tick (s)
`KG_HYGIENE_INTERVAL`	`86400`	`core/engine_tasks.py`	Memory decay/dedup tick (s)
`KG_TASK_REAPER_INTERVAL`	`120`	`core/engine_tasks.py`	Zombie-task reaper tick (s)
`KG_TASK_ORPHAN_GRACE_SEC`	`90`	`core/engine_tasks.py`	Grace before an orphaned task is reclaimed
`KG_TASK_MAX_RUNTIME_SEC`	`7200`	`core/engine_tasks.py`	Max task runtime before requeue
`KG_TASK_MAX_REQUEUE`	`3`	`core/engine_tasks.py`	Max requeues before a task is failed
`GRAPH_SERVICE_CHECKPOINT_INTERVAL`	`60`	`core/graph_compute.py`	Spawned-engine checkpoint interval (distinct from `GRAPH_SERVICE_CHECKPOINT_SECS`)
`KG_GRAPH_NAME`	`__bus__`	`distillation/skill_graph_distiller.py`	Target graph for skill-graph distillation
`KG_INGEST_INFLIGHT`	`40`	`ingestion/batch_orchestrator.py`	Max in-flight ingest submissions
`KG_INGEST_PROFILE`	unset (= `full`)	`pipeline/__init__.py`	Pipeline phase profile (`structural` \| `full`) — residual read, see the section C correction
`KG_EVAL_CAPTURE`	off	`memory/optimization_engine.py`	Capture retrieval evals
`KG_MIN_RELEVANCE_THRESHOLD`	unset (arg/schema-pack)	`retrieval/retrieval_quality.py`	Relevance-gate threshold override
`KG_TRUST_HIERARCHY`	built-in defaults	`core/company_brain_runtime.py`	JSON trust-hierarchy entries (with `KG_BRAIN_ENFORCE`)
`GRAPH_SCHEMA_PACK`	unset	`models/schema_pack_loader.py`	Schema-pack selection override
`GRAPH_SCHEMA_AUDIT_DIR` / `GRAPH_SCHEMA_AUDIT_VERBOSE`	unset / off	`models/schema_pack_audit.py`	Schema-audit output dir / verbosity
`KG_PROVIDER_ADAPTER_BACKEND`	`static`	`prompting/provider_adapter.py`	Prompting provider-adapter backend

Disposition: the address / model-choice / behavioral / profile / schema rows (KG_SERVER_*, GRAPH_ROUTING_STRATEGY, KG_CARD_MODEL, KG_GRAPH_NAME, KG_INGEST_*, KG_EVAL_CAPTURE, KG_MIN_RELEVANCE_THRESHOLD, KG_TRUST_HIERARCHY, GRAPH_SCHEMA_*, KG_PROVIDER_ADAPTER_BACKEND, KG_DAEMON_LOG_LEVEL) are now config.setting(...) reads (config.json-/env-overridable). The cadence/limit/timeout rows (KG_*_INTERVAL, KG_TASK_*, KG_LLM_TIMEOUT/_MAX_RETRIES, KG_EMBED_BACKFILL_*, GRAPH_SERVICE_CHECKPOINT_INTERVAL) are now named module constants. MCP_CHILD_* flags were already fully typed on AgentConfig with no bare reads (mcp/child_resilience.py consumes the config object).

The agent toolset gates in tools/tool_registry.py are also bare reads (not KG-prefixed, so outside the ratchet). The optional-infra toolsets all default OFF and are opt-in:

Flag	Default	What it gates
`X_TOOLS`	`False`	X/Grok social search + post browsing via xAI (needs `XAI_API_KEY`); production X/Grok deployments must set `X_TOOLS=1` explicitly
`MEDIA_TOOLS`	`False`	Media generation / transcription services (ECO-4.30/4.31)
`DB_TOOLS`	`False`	Native database traversal tools (ECO-4.33)

(The always-available local toolsets — WORKSPACE_TOOLS, GIT_TOOLS, A2A_TOOLS, SCHEDULER_TOOLS, BROWSER_TOOLS, DEVELOPER_TOOLS — default ON in the same registry.)

Coverage statement¶

Verified against agent_utilities/core/config.py on this branch by extracting AgentConfig.model_fields programmatically: 244 fields / 242 distinct env variables, every one documented above — sections A–F cover the KG/graph audit surface, section G the remaining platform fields (no field was deemed internal-only: every AgentConfig field declares an env alias and is settable from the environment). Section H additionally documents the user-facing flags that exist only as baseline-frozen bare os.environ reads. Drift fixed in this pass: KG_LLM_CONCURRENCY default is 4 (doc previously said 6), and KG_INGEST_PROFILE retains one phase-selection read despite the section C removal note.

Known bugs surfaced by this audit¶

KG_EMBED_BACKFILL_BATCH dual default — 256 (engine_tasks.py:1040) vs 512 (:1155). Same flag, two meanings (per-tick budget vs DB fetch batch). Split into two named constants or one config field.
Scattered duplicate reads — GRAPH_DB_URI (4×), AGENT_UTILITIES_CONFIG_DIR (5×), KG_DAEMON_ROLE/KG_INGEST_PROFILE (2× each). No single source of truth.
GRAPH_TIMEOUT=1200000 (20 min) makes a non-converging engine call look like an infinite hang for 20 minutes before erroring.

Target end-state¶

~96 flags → roughly the ~27 KEEP deployment items (all typed on AgentConfig), behavior otherwise auto-detected, and scripts/check_no_env_sprawl.py blocking regressions. A fresh graph-os-daemon with zero KG_* env vars set should ingest a full corpus correctly.