Semantic Compactor & Refactorer (CONCEPT:KG-2.7)¶

Overview¶

The Semantic Compactor resolves the exponential database bloat caused by active agents generating millions of step-by-step reasoning traces.

It periodically executes an offline compilation and refactoring sweep. The compactor aggregates raw execution traces, interaction logs, and tool parameters, replacing thousands of individual AgentProcess nodes with synthesized high-level declarative state triples (e.g., summarizing total execution steps, token usage, and outcomes).

Problem Statement¶

At 1M+ agent scale, each agent execution generates 10–100 AgentProcess trace nodes. After 10K executions: - 1M+ trace nodes accumulate in the graph - Query latency degrades from sub-ms to 100ms+ - Memory consumption grows linearly without bound - Graph traversal algorithms (centrality, PageRank) become O(n²)

The Semantic Compactor reduces the trace node count by 95–99% while preserving aggregate semantics.

Lifecycle¶

1. QUERY: Find AgentProcess nodes for a given agent_id
       ↓
2. THRESHOLD: If count < threshold (default 10), skip
       ↓
3. AGGREGATE: Sum token usage, count state distributions
       ↓
4. CREATE: Merge a SemanticSummary node with aggregated stats
       ↓
5. LINK: Connect agent → SemanticSummary via HAS_COMPACTED_HISTORY
       ↓
6. PRUNE: DETACH DELETE all original AgentProcess nodes
       ↓
7. LOG: Report compaction count and summary node ID

API Surface¶

from agent_utilities.knowledge_graph.memory import SemanticCompactor

compactor = SemanticCompactor(engine=kg_engine)

# Compact traces for a specific agent (threshold: 10 traces)
deleted = compactor.compact_traces("agent:planner", threshold=10)
# Returns: number of trace nodes deleted (e.g., 47)

# Custom threshold for high-frequency agents
deleted = compactor.compact_traces("agent:scraper", threshold=5)

Compaction Thresholds¶

Agent Type	Recommended Threshold	Rationale
Background scrapers	5	High volume, low value per trace
Orchestration routers	10 (default)	Medium volume, moderate diagnostic value
Critical planners	25	Lower volume, high forensic value
Evaluation runners	50	Traces needed for statistical analysis

Graph Schema¶

Before Compaction¶

(Agent {id: "agent:planner"})
  -[:HAS_PROCESS]→ (AgentProcess {id: "proc:1", state: "completed", tokens_used: 1500})
  -[:HAS_PROCESS]→ (AgentProcess {id: "proc:2", state: "completed", tokens_used: 2300})
  -[:HAS_PROCESS]→ (AgentProcess {id: "proc:3", state: "failed", tokens_used: 800})
  ... (47 more)

After Compaction¶

(Agent {id: "agent:planner"})
  -[:HAS_COMPACTED_HISTORY]→ (SemanticSummary {
      id: "summary:agent:planner:50_compacted",
      compacted_count: 50,
      total_tokens_consumed: 85000,
      agent_id: "agent:planner"
  })

OWL Alignment¶

The SemanticSummary node type should be declared in the SDD ontology as:

:SemanticSummary a owl:Class ;
    rdfs:subClassOf :KnowledgeNode ;
    rdfs:comment "Synthesized execution trace summary produced by KG-2.7 compaction" ;
    :hasProperty :compacted_count, :total_tokens_consumed, :agent_id .

:HAS_COMPACTED_HISTORY a owl:ObjectProperty ;
    rdfs:domain :Agent ;
    rdfs:range :SemanticSummary ;
    rdfs:comment "Links an agent to its compacted trace history" .

This ensures SHACL validators can verify that every Agent node with more than threshold historical processes has a corresponding SemanticSummary.

Error Handling¶

Condition	Behavior
No KG engine provided	Returns `0` (no-op)
Query fails	Catches exception, logs error, returns `0`
Partial deletion failure	Logs individual failures, returns count of successful deletes
Agent has no traces	Returns `0` (below threshold)
Backend returns unexpected row format	Handles `dict`, `list/tuple`, and object formats defensively

Integration Points¶

Cognitive Scheduler (OS-5.2): After preempting and completing many processes, schedule periodic compaction
Telemetry Engine (OS-5.6): Compaction events are logged for observability dashboards
Memory Tiers (KG-2.6): Compaction operates on the episodic tier, preserving semantic and procedural memories

Implementation Details¶

Source Code: agent_context.py (SemanticCompactor)
Classes: SemanticCompactor
Tests: test_synergies.py
Pillar: KG
Package Export: agent_utilities.knowledge_graph.memory.SemanticCompactor