Knowledge Graph Ingestion Stability & Locking Architecture¶
This document details the robust locking and process lifecycle architecture implemented to resolve persistent C++ segmentation faults and database connection/lock failures during bulk ingestion of open-source repositories into the agent-utilities Knowledge Graph.
Scope note: The locking/lifecycle hardening below is specific to the LadybugDB backend, which is now an opt-in contrib driver (
backends/contrib/ladybug_backend.py, selected viaGRAPH_BACKEND=ladybug). The default working store is the Rust-native EpistemicGraph (GRAPH_BACKEND=memory/file) and the production durable tier is PostgreSQL + pg-age — neither of which uses the SQLite file-lock mechanics described here. This document therefore applies only when LadybugDB is explicitly enabled.
1. POSIX Advisory Locking & Watcher Synchronization¶
The Locking Hazard (Before)¶
Previously, the backend attempted to actively delete the advisory lock file (*.lock) in LadybugBackend._recover_connection() or upon release. Since filelock uses OS-level advisory locking (flock/fcntl), unlinking the lock file from disk while another process is holding the active lock descriptor breaks mutual exclusion on POSIX systems.
Unlinking the file allows a second process to create a new inode with the exact same name and acquire a lock on the new descriptor, leading to concurrent writes, catalog exceptions, or fatal segmentation faults.
Robust POSIX Advisory Locking (After)¶
To prevent this concurrency hazard, the lock file is never unlinked from disk once created. Mutual exclusion is managed naturally via the filesystem inode's file-lock metadata.
The background synchronization watcher (agent_utilities/sdd/watcher.py) no longer checks for file existence using os.path.exists(lock_path). Instead, it attempts a non-blocking lock acquisition (timeout=0). If the lock is held by active ingestion, it gracefully skips the iteration; if acquired, it releases it immediately and runs the scan safely.
Process Synchronization Flow¶
sequenceDiagram
autonumber
actor CLI as Bulk Ingester
participant lock as POSIX flock (.lock)
participant Watcher as Background Watcher
participant DB as LadybugDB Storage
Note over CLI, DB: Scenario 1: Ingestion Active (Non-Blocking Skip)
CLI->>lock: Acquire flock (Exclusive)
activate lock
Watcher->>lock: Try non-blocking acquisition (timeout=0)
lock-->>Watcher: Lock already held (Timeout Exception)
Watcher->>Watcher: Gracefully skip iteration (No Block)
Note over CLI, DB: Scenario 2: Ingestion Complete (Scan Execution)
CLI->>lock: Release flock (fd closed, file remains)
deactivate lock
Watcher->>lock: Try non-blocking acquisition (timeout=0)
lock-->>Watcher: Lock acquired successfully
Watcher->>lock: Release lock immediately
Watcher->>DB: Scan & update repository metadata
2. Native C++ Handle Destruction & Connection Cleanup¶
The Destruction Sequence Hazard (Before)¶
LadybugDB uses native C++ bindings for high-performance SQLite operations and HNSW vector computations. The C++ ladybug.Connection instances rely on active references to ladybug.Database handles.
If the python interpreter performs garbage collection out-of-order, or if Python attempts to destroy the parent Database handle while the children Connection handles are still active, it results in native C++ null-pointer dereferences or double-free segmentation faults.
Explicit Cleanup & Reference Ordering (After)¶
We enforce a strict connection cleanup sequence in LadybugBackend.close() to ensure child handles are entirely freed and garbage-collected before unreferencing the parent database handles.
graph TD
subgraph "flaw | Out-of-Order Destruction (Segfault)"
A1["Unreference DB & Connections"] --> B1["gc.collect() called randomly"]
B1 --> C1["DB handle destroyed first"]
C1 --> D1["Active Connection handle unreferenced"]
D1 --> E1["CRASH: C++ Segfault / Double Free"]
style E1 fill:#f96,stroke:#333,stroke-width:2px
end
subgraph "correct | Strict Cleanup Sequence (Stable)"
A2["Call close()"] --> B2["1. Close & unreference Connection (self.conn = None)"]
B2 --> C2["2. Force gc.collect() to clear Connection handles from C++ memory"]
C2 --> D2["3. Unreference Database (self.db = None)"]
D2 --> E2["4. Force gc.collect() to clear Database handle from C++ memory"]
E2 --> F2["5. Safe termination without dangling pointers"]
style F2 fill:#6c6,stroke:#333,stroke-width:2px
end
Destruction & Cleanup Sequence Details¶
sequenceDiagram
autonumber
participant App as LadybugBackend
participant Conn as ladybug.Connection (C++)
participant DB as ladybug.Database (C++)
participant GC as Python Garbage Collector
App->>Conn: Close connection explicitly
App->>App: Set self.conn = None
App->>GC: Invoke gc.collect()
Note over Conn, GC: Connection handle fully destroyed in C++ layer
App->>App: Set self.db = None
App->>GC: Invoke gc.collect()
Note over DB, GC: Database handle fully destroyed in C++ layer
3. Nested Metadata Scanning & Domain Routing¶
Bulk ingestion handles 65 open-source repositories categorized by domain slugs. The scanner uses direct domain matching to assign repositories to the correct Concept Schemes and folder hierarchies:
| Domain Category | Category Path Pattern | Target Repositories (Examples) |
|---|---|---|
agent-frameworks |
agent-frameworks/ |
pydantic-ai, crewai, langgraph |
enterprise-ai-infra |
enterprise-ai-infra/ |
caddy, keycloak, twenty, mattermost |
memory-rag-kg |
memory-rag-kg/ |
ladybugdb, chromadb, milvus |
quant-trading |
quant-trading/ |
crypto-trader, qlib, freqtrade |
graph LR
Path["Repository Path Input"] --> Match{"Nested under Domain?"}
Match -->|Yes| ExtractNested["Extract slug, repo, and nested subfolders"]
Match -->|No| DirectRepo["Fallback classification by repository name"]
ExtractNested --> Assign["Assign to Active Concept Scheme & SKOS Concept Hierarchy"]
DirectRepo --> Assign