Skip to content

Provenance Tracking

The Question: Know what happened, why it mattered, who saw it, which source is trusted, and what action followed.


Why Provenance Is Non-Negotiable

Without provenance, a Company Brain is just a database with extra steps. Provenance answers:

  • Who wrote this? — A human analyst, an AI agent, or a hybrid team?
  • What type of assertion is this? — Raw data, agent inference, or human judgment?
  • Why was it written? — What was the rationale for this mutation?
  • From where was it derived? — What source nodes or systems informed this?
  • How confident is the writer? — Self-assessed reliability score
  • Who has read this? — Read audit trail for compliance and debugging

Write Provenance

Every mutation to the Company Brain generates a ProvenanceRecord:

from agent_utilities.knowledge_graph.core.company_brain import CompanyBrain
from agent_utilities.models.company_brain import ActorType, AssertionType

brain = CompanyBrain(enforce_provenance=True)

# AI agent writes a risk assessment
brain.provenance.record_write(
    node_id="customer:001",
    actor_id="agent:risk-analyzer",
    actor_type=ActorType.AI_AGENT,
    action="update",
    assertion_type=AssertionType.AGENT_INFERENCE,
    confidence=0.87,
    source_system="internal_model",
    derived_from=["transaction:batch-2024-Q4", "customer:001:history"],
    rationale="Updated risk score based on Q4 transaction pattern analysis",
    tenant_id="trading-desk",
)

# Human analyst overrides
brain.provenance.record_write(
    node_id="customer:001",
    actor_id="analyst:jane",
    actor_type=ActorType.HUMAN,
    action="update",
    assertion_type=AssertionType.HUMAN_JUDGMENT,
    confidence=0.95,
    source_system="manual_review",
    derived_from=["customer:001:interview-notes", "regulatory:aml-guidelines"],
    rationale="Customer interview revealed legitimate business justification for unusual patterns",
    tenant_id="compliance",
)

Read Audits

The Company Brain tracks who queried what, closing the "who saw it" gap:

brain.provenance.record_read(
    actor_id="agent:report-generator",
    actor_type=ActorType.AI_AGENT,
    nodes_accessed=["customer:001", "customer:002", "customer:003"],
    query_summary="Generated quarterly risk report for compliance review",
    tenant_id="compliance",
)

This enables: - Compliance auditing — Prove who accessed PII and when - Debugging — Trace how an agent formed a conclusion by seeing what it read - Security — Detect unusual access patterns (an agent reading nodes it normally doesn't)


Trust Hierarchies

Formalize source authority so the conflict resolver knows which sources to trust:

from agent_utilities.models.company_brain import TrustHierarchyEntry

# Define source trust levels
brain.provenance.add_trust_entry(TrustHierarchyEntry(
    source_system="crm",
    data_domain="customer",
    authority_level=0.95,
    rationale="CRM is the system of record for customer master data",
    overrides=["slack", "email"],
))

brain.provenance.add_trust_entry(TrustHierarchyEntry(
    source_system="human_review",
    data_domain="compliance",
    authority_level=0.99,
    rationale="Regulatory decisions require human sign-off per SOX",
    overrides=["crm", "internal_model"],
))

# Query trust level
trust = brain.provenance.get_trust_level("crm", "customer")  # → 0.95

Provenance-Gated Retrieval

Filter query results to only include nodes from trusted sources:

all_nodes = ["customer:001", "customer:002", "customer:003"]

# Only return nodes with high-confidence provenance
trusted = brain.provenance.filter_by_trust(all_nodes, min_trust=0.8)

ProvenanceRecord Fields

Field Type Description
record_id str Unique provenance record identifier
node_id str The node this provenance applies to
actor_id str Who performed the write
actor_type ActorType Human, AI, hybrid team, service, system
action str create, update, delete, merge
assertion_type AssertionType raw_data, agent_inference, human_judgment, synthesized, external_import
confidence float Self-assessed confidence (0.0–1.0)
source_system str System of record (CRM, Slack, git, manual)
derived_from list[str] Node IDs this was derived from
attributed_to str Actor or system this is attributed to
rationale str Free-text explanation
timestamp str ISO timestamp
session_id str Session context for grouping
tenant_id str Tenant scope

PROV-O Alignment

Provenance records map directly to W3C PROV-O concepts:

ProvenanceRecord Field PROV-O Edge
derived_from prov:wasDerivedFrom
attributed_to prov:wasAttributedTo
actor_id + action prov:wasGeneratedBy
timestamp prov:generatedAtTime