MCP Multiplexer — Dynamic Tool Gateway¶

The mcp-multiplexer aggregates many child MCP servers behind a single MCP endpoint, and (in dynamic mode) turns that aggregation into a progressive-disclosure gateway: a client sees a handful of meta-tools up front and pulls in exactly the tools it needs at runtime — instead of being flooded with every tool in the fleet.

CONCEPT:ECO-4.0 (aggregation) · CONCEPT:ECO-4.34 (per-child resilience) · CONCEPT:ECO-4.36 (dynamic tool gateway)

Why¶

A real fleet has dozens of MCP servers and hundreds of tools (this homelab: ~52 servers, ~531 tools). Loading them all into one client is a problem:

Context saturation — hundreds of tool schemas crowd the model's context.
Tool-selection degradation — accuracy drops as the tool list grows.
Boot cost & blast radius — eagerly starting every child is slow and means one flaky server can disrupt startup.

The multiplexer solves the first two by keeping the visible tool list tiny and mounting tools on demand; it solves the third with lazy, per-child-hardened spawning and graceful failure isolation.

Architecture¶

flowchart TB
    Client["MCP client<br/>(Claude Code / IDE agent)"]

    subgraph MUX["mcp-multiplexer (FastMCP)"]
        direction TB
        Meta["Meta-tools<br/>find_tools · list_catalog<br/>load_tools · unload_tools · multiplexer_status"]
        Catalog["Catalog + prefix map<br/>(collision-free, algorithmic)"]
        Probe["Self-catalog probe cache<br/>(connect → list_tools → release)"]
        Forwarders["Live forwarding tools<br/>(only what's been loaded)"]
        Runtimes["ChildRuntime pool<br/>(concurrency limit · circuit breaker · auto-restart)"]
    end

    subgraph Children["Child MCP servers"]
        direction TB
        KG["graph-os<br/>(stdio, always-on)"]
        Remote["52× *-mcp<br/>(streamable-http)"]
    end

    Caddy["Caddy reverse proxy<br/>(*.arpa → :8000)"]
    KGEngine[["Knowledge Graph<br/>(semantic ranking)"]]

    Client -->|"tools/list · call"| Meta
    Client -.->|"tools/list_changed"| Forwarders
    Meta --> Catalog
    Meta --> Probe
    Meta -->|"mount on demand"| Runtimes
    Forwarders -->|"forward call"| Runtimes
    Runtimes --> KG
    Runtimes -->|"HTTP"| Caddy --> Remote
    Probe -.->|"rank (optional)"| KGEngine
    KG --- KGEngine

Key idea: the client's tool list is decoupled from the fleet's tool list. load_tools adds forwarding tools at runtime and emits notifications/tools/list_changed, so the client re-fetches and the new tools become callable mid-session — no reconnect.

Modes¶

Selected by MCP_MULTIPLEXER_MODE (default eager).

Mode	Boot	Visible tools	Use when
eager	spawns every child, registers all tools	all (hundreds)	small fleet, or a client with no tool-count pressure
dynamic	spawns only the always-on children; exposes meta-tools	~5 meta-tools + always-on	large fleet; keep the client's context lean

Both modes share the same child lifecycle, prefixing, and enabled/disabled filtering. dynamic simply defers spawning and exposure to runtime.

The discover → load → call flow¶

sequenceDiagram
    participant C as Client (model)
    participant M as Multiplexer
    participant P as Probe cache
    participant Ch as Child server

    C->>M: find_tools("manage github PRs")
    M->>P: probe_catalog() (cached after 1st call)
    P->>Ch: connect → list_tools → release (per server, concurrent)
    P-->>M: real tool metadata, fleet-wide
    M-->>C: ranked [{server, prefixed_name, score, mountable}] + unavailable{}
    C->>M: load_tools(tools=["gh__pulls"])
    M->>Ch: mount_child(github-mcp) — lazy spawn
    M->>M: add forwarding tool gh__pulls
    M--)C: notifications/tools/list_changed
    C->>M: gh__pulls(...)
    M->>Ch: forward call
    Ch-->>C: result

Meta-tools (dynamic mode)¶

Tool	Purpose
`find_tools(query, top_k)`	Semantic search across the whole fleet by intent. Returns ranked prefixed names + an `unavailable` map of unreachable servers. Surfaces only enabled tools.
`list_catalog(server?, include_tools?)`	Flat browse: every server with tool counts, `enabled`/`disabled_tools` split, reachability, mount state. Pass `server` to drill in with descriptions.
`load_tools(tools?, servers?)`	Mount the owning child(ren) lazily and expose the requested tools (or a whole server). Reports `failed` per-server with the reason; sends `tools/list_changed`.
`unload_tools(tools)`	Retract tools to reclaim context; sends `tools/list_changed`.
`multiplexer_status`	Per-child health: state, restart count, concurrency, in-flight/queued, which children are mounted.

find_tools is "find the right tool for X"; list_catalog is "show me everything." Both ride the self-catalog probe (below), so the first call takes a few seconds and subsequent calls are cached.

How it works¶

Self-cataloging (no external dependency)¶

To rank/list tools for servers that aren't mounted, the multiplexer probes each catalog server itself: connect → list_tools() → release, run concurrently (bounded) and cached. This gives real per-server tool metadata without holding connections and without depending on the KG being warm. KG semantic search is layered in as an optional re-rank when available. Unreachable servers record their error (the anyio ExceptionGroup is unwrapped to the real cause, e.g. HTTP 502) instead of failing the whole call.

Collision-free, algorithmic prefixes¶

Tool names are namespaced by a short per-server prefix, derived 100% algorithmically — no lookup table, so any third-party server works:

an explicit prefix on the server's config entry (override), else
auto-derived from the name (strip noise tokens mcp/server/agent/api, camelCase/separator tokenize, initials-acronym for multi-word or short stem for single-word; readable <initials>_<hostid> for multi-instance).

A deterministic, catalog-aware resolver then guarantees uniqueness across the whole fleet (e.g. foo-bar/foo-baz both want fb → one keeps it, the other disambiguates). Examples: github-mcp → gith, container-manager-mcp → cm, graph-os → go, systems-manager-mcp-r510 → sm_r510.

Lazy mounting & per-child resilience¶

mount_child spawns exactly one child on demand (stdio subprocess or streamable-http connection) inside the serving event loop. Each child is wrapped in a ChildRuntime (CONCEPT:ECO-4.34): bounded concurrency + queue, circuit breaker, auto-restart, session pool. A down server fails its mount/probe gracefully and is reported — it never takes down the gateway.

Hybrid transports¶

stdio and streamable-http children coexist in one config. A child is remote when it declares a url (or an http/sse transport); otherwise it's a local subprocess via command. Typical homelab setup: graph-os local stdio (always-on), the rest streamable-http behind Caddy.

Configuration¶

Flags (on `AgentConfig`)¶

Env	Default	Meaning
`MCP_MULTIPLEXER_MODE`	`eager`	`eager` exposes all tools at boot; `dynamic` exposes meta-tools + always-on and mounts on demand.
`MCP_DYNAMIC_ALWAYS_ON`	`["graph-os"]`	Children mounted at boot in dynamic mode (so `find_tools` can rank semantically).
`MCP_DYNAMIC_TOP_K`	`8`	Default candidate count for `find_tools`.
`MCP_CHILD_MAX_CONCURRENCY` / `_QUEUE_TIMEOUT` / `_POOL_SIZE` / `_MAX_RESTARTS` / `_RESTART_WINDOW` / `_BREAKER_THRESHOLD` / `_BREAKER_COOLDOWN`	see configuration.md	Per-child resilience (CONCEPT:ECO-4.34).

`mcp_config.json` (child servers)¶

{
  "mcpServers": {
    // local stdio child (always-on KG)
    "graph-os": {
      "command": "/path/.venv/bin/graph-os",
      "env": { "GRAPH_BACKEND": "tiered" }
    },
    // remote streamable-http child via Caddy
    "github-mcp": {
      "transport": "streamable-http",
      "url": "http://github-mcp.arpa/mcp",
      "timeout": 15,          // fast connect-fail for a down server
      "call_timeout": 300     // generous per-call ceiling
    },
    // per-server tool filtering + explicit prefix override
    "container-manager-mcp": {
      "transport": "streamable-http",
      "url": "http://container-manager-mcp.arpa/mcp",
      "prefix": "cm",                      // optional; else auto-derived
      "disabledTools": ["trace_port_namespace"],
      "enabledTools": ["*"]                // whitelist (fnmatch); optional
    }
  }
}

Per-server keys: command/args/env (stdio) · url/transport/headers (remote) · prefix · enabledTools/disabledTools (fnmatch) · timeout/call_timeout/pool_size/max_concurrency · disabled.

Recipes¶

A. Claude Code (stdio), dynamic mode — recommended for large fleets¶

~/.claude.json (or your client's MCP config):

{
  "mcpServers": {
    "mcp-multiplexer": {
      "command": "/path/.venv/bin/python",
      "args": ["-m", "agent_utilities.mcp.multiplexer",
               "--config", "/path/mcp_config.json"],
      "env": { "MCP_MULTIPLEXER_MODE": "dynamic" }
    }
  }
}

Boot shows ~5 meta-tools + the always-on go__* (graph-os) tools. The model calls find_tools / list_catalog → load_tools to reach the rest. Install: pip install "agent-utilities[mcp]" (provides mcp-multiplexer).

B. Deploy as a streamable-http server¶

mcp-multiplexer --transport streamable-http --host 0.0.0.0 --port 8000 \
  --config /path/mcp_config.json
# MCP_MULTIPLEXER_MODE=dynamic in the environment

Front it with Caddy (reverse_proxy multiplexer:8000) and point remote clients at http://multiplexer.arpa/mcp.

Minimal image. The multiplexer is a thin MCP routing layer — it imports none of agent-utilities' heavy base deps (no KG engine, llama-index, kafka, …) and reaches graph-os over MCP as an optional child, never embedded. So docker/Dockerfile.multiplexer installs the package --no-deps plus only the ~7 light runtime deps (docker/requirements-multiplexer.txt) — a ~370MB image vs multi-GB for agent-utilities[all]. find_tools works on the self-cataloging probe even when graph-os is unavailable; the KG only adds optional semantic re-ranking.

C. Hybrid homelab fleet (the real setup)¶

graph-os local stdio + 52 deployed -mcp servers as streamable-http behind Caddy (http://<name>-mcp.arpa/mcp). Generate the config from the deploy registry + Caddyfile, then validate it (below). Enable dynamic mode so the client only ever sees ~29 tools at boot, expanding on demand.

Validating the config¶

scripts/validate_mcp_config.py reconciles every url entry against the Caddyfile and (with --live) probes each endpoint:

python scripts/validate_mcp_config.py \
  --config mcp_config.json --caddyfile services/caddy/Caddyfile --live
# → "Checked 52 url entries (52 ok, 0 invalid, 0 unreachable). PASS ✓"

It flags url hosts with no Caddy route (typos), Caddy *-mcp.arpa routes with no config entry (coverage gaps), and routed-but-dead backends (a 502). Exits non-zero — wire it into pre-commit/CI.

References¶

Code: agent_utilities/mcp/multiplexer.py (gateway, meta-tools, prefixes, probe), agent_utilities/mcp/child_resilience.py (ChildRuntime).
Config flags: configuration.md.
Consumption transports: ../guides/consumption-models.md.
Validator: scripts/validate_mcp_config.py.