Graph Database Deployment & Multi-Backend Guide¶

CONCEPT:KG-2.0 — High-Scale Graph Database Backends

This guide provides a comprehensive, production-ready reference for deploying, configuring, and maintaining scale-out persistent graph backends for the agent-utilities Knowledge Graph engine.

The Rust-native EpistemicGraph (GRAPH_BACKEND=memory/file) is the zero-config, zero-dependency embedded default, and PostgreSQL + pgvector (pg-age) is the primary durable/production tier. The demoted contrib backends — LadybugDB, FalkorDB, and Neo4j — remain fully supported as opt-in imports (backends/contrib/) with Cypher CRUD, HNSW vector embeddings, and relationship indexing. Select any of them via create_backend() or the GRAPH_BACKEND environment variable.

📊 Database Comparison Matrix¶

Select the right database backend based on your scale, infrastructure constraints, and architectural patterns:

[!NOTE] The zero-config embedded default is the Rust-native EpistemicGraph (GRAPH_BACKEND=memory/file), not listed below. PostgreSQL + pg-age is the primary durable/production tier. LadybugDB, FalkorDB, and Neo4j are demoted contrib backends (backends/contrib/) — fully supported but opt-in (require their driver packages).

Metric / Feature	🐞 LadybugDB (contrib)	🦅 FalkorDB (contrib)	🟢 Neo4j (contrib)	🐘 PostgreSQL + pg-age
Primary Use Case	Embedded / Local Dev	Low Latency / High Throughput	Enterprise Property Graph	Relational + Graph Unity
ACID Compliance	Single-writer WAL	In-Memory (Optional AOF)	Full ACID (Clustered)	Full ACID (Standard Relational)
Vector Search (HNSW)	✅ Embedded	✅ Redis-native	✅ Neo4j GenAI Index	✅ `pgvector` HNSW
Full-Text BM25 Search	—	—	—	✅ ParadeDB `pg_search`
Scale Limits	Single Node / Local	Sharded Redis Cluster	Multi-region clusters	High-scale Postgres instances
Zero Configuration	✅ Yes	❌ Requires Docker/Server	❌ Requires Docker/Server	❌ Requires Docker/Server
Traversal Engine	Native SQLite Cypher	Redis RESP-native RESP	Native Bolt JVM	pg-age CSR Traversal

🛠️ Docker Deployment and Configuration¶

Below are the complete, production-grade Docker Compose configurations and environment parameters for deploying each database backend.

[!NOTE] For development environments running on the same host, ensure database ports do not conflict. The compose configurations below use standard testing/staging ports, but can be customized as needed.

1. FalkorDB (Low-Latency Graph Workloads)¶

FalkorDB is a low-latency, high-throughput graph database built on Redis, utilizing the fast RESP protocol and sparse matrix multiplication to achieve near-instantaneous Cypher evaluations.

🐳 Docker Compose (`docker/falkordb.compose.yml`)¶

version: "3.8"

services:
  falkordb-db:
    image: falkordb/falkordb:latest
    container_name: agent-falkordb
    ports:
      - "6380:6379"
    volumes:
      - falkordb-data:/data
    restart: unless-stopped

volumes:
  falkordb-data:

⚙️ Configuration Setup¶

To point your agents to FalkorDB, set the following variables:

export GRAPH_BACKEND=falkordb
export GRAPH_DB_HOST=localhost
export GRAPH_DB_PORT=6380
export GRAPH_DB_NAME=agent_graph

🔌 Python Connection Example¶

from agent_utilities.knowledge_graph.backends import create_backend

backend = create_backend(
    backend_type="falkordb",
    host="localhost",
    port=6380,
    db_name="agent_graph"
)
# The backend auto-initializes the schema on connection

2. Neo4j (Enterprise Property Graphs)¶

Neo4j is the industry standard for ACID property graphs, supporting multi-region clustering, fine-grained role permissions, and the APOC (Awesome Procedures on Cypher) plugin suite.

🐳 Docker Compose (`docker/neo4j.compose.yml`)¶

version: "3.8"

services:
  neo4j-db:
    image: neo4j:latest
    container_name: agent-neo4j
    environment:
      - NEO4J_AUTH=neo4j/password
      - NEO4J_PLUGINS=["apoc"]
      - NEO4J_dbms_security_procedures_unrestricted=apoc.*
    ports:
      - "7474:7474"   # HTTP Admin Dashboard
      - "7687:7687"   # Bolt Query Protocol
    volumes:
      - neo4j-data:/data
    restart: unless-stopped

volumes:
  neo4j-data:

⚙️ Configuration Setup¶

Configure your environment to point to Neo4j via the Bolt protocol:

export GRAPH_BACKEND=neo4j
export GRAPH_DB_URI=bolt://localhost:7687
export GRAPH_DB_USER=neo4j
export GRAPH_DB_PASSWORD=password

🔌 Python Connection Example¶

from agent_utilities.knowledge_graph.backends import create_backend

backend = create_backend(
    backend_type="neo4j",
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password"
)

3. PostgreSQL + pg-age (Relational & Graph Unity)¶

The PostgreSQL backend leverages pgvector for HNSW cosine search, ParadeDB (pg_search) for BM25 lexical search, and the pg-age extension for highly optimized CSR (Compressed Sparse Row) graph traversals. The CypherTranspiler translates Cypher queries directly into PostgreSQL SQL parameters.

🐳 Docker Compose (`docker/pg-age.compose.yml`)¶

version: "3.8"

services:
  pg-age-db:
    image: paradedb/paradedb:latest
    container_name: agent-pg-age
    environment:
      POSTGRES_USER: agent
      POSTGRES_PASSWORD: agent
      POSTGRES_DB: agent_kg
    ports:
      - "5433:5432"   # Mapped to 5433 to avoid host conflicts
    volumes:
      - pg-age-data:/var/lib/postgresql
      - ./pg-age-init:/docker-entrypoint-initdb.d
    restart: unless-stopped

volumes:
  pg-age-data:

📜 Initialization Script (`docker/pg-age-init/01-extensions.sql`)¶

Place the following inside the initialization directory to enable the required extensions automatically on container boot:

-- pg-age + pgvector initialization script
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- ParadeDB extensions (included in paradedb image)
CREATE EXTENSION IF NOT EXISTS pg_search;

-- pg-age extension (optional, uncomment if installed)
-- CREATE EXTENSION IF NOT EXISTS pg-age;

⚙️ Configuration Setup¶

Point your agents to the Postgres graph backend:

export GRAPH_BACKEND=postgresql
export GRAPH_DB_URI=postgresql://agent:agent@localhost:5433/agent_kg
export GRAPH_POOL_MIN=2
export GRAPH_POOL_MAX=10
export GRAPH_PGGRAPH_SCHEMA=public

🔌 Python Connection Example¶

from agent_utilities.knowledge_graph.backends import create_backend

backend = create_backend(
    backend_type="postgresql",
    uri="postgresql://agent:agent@localhost:5433/agent_kg",
    db_name="agent_graph"
)

🔄 Verification & Operations¶

Health Checks¶

Before starting the agent workflow or running pipelines, verify that the databases are online and accessible. You can do this programmatically or via CLI.

Dynamic Connectivity Verifier (`verify_connection.py`)¶

import socket
import time

def wait_for_db(host: str, port: int, timeout: int = 30):
    start_time = time.time()
    while True:
        try:
            with socket.create_connection((host, port), timeout=1):
                print(f"✅ Connection successful to {host}:{port}!")
                return True
        except (socket.timeout, ConnectionRefusedError):
            if time.time() - start_time > timeout:
                raise TimeoutError(f"Database at {host}:{port} failed to start in {timeout}s.")
            print(f"Waiting for database at {host}:{port}...")
            time.sleep(1)

# Example: Check pg-age port
wait_for_db("localhost", 5433)

🧪 Sequential Integration Testing Harness¶

To prevent resource starvation or port collisions on localized development environments, the integration test suite runs sequentially. The harness manages starting the container, waiting for it to accept connections, executing deep CRUD/vector operations, stress-testing with the ingestion pipeline, and tearing down the container cleanly.

# Run the sequential multi-backend integration tests
PYTHONPATH=. pytest tests/integration/knowledge_graph/test_multibackend_integration.py -v -s

Flow Diagram of the Testing Lifecycle¶

graph TD
    A[Start integration run] --> B[Initialize Docker Container]
    B --> C[Check Connectivity Socket]
    C -->|Waiting...| C
    C -->|Connected| D[Init DB Backend Schema]
    D --> E["CRUD & Node Writes"]
    E --> F["Vector Search & HNSW Indexing"]
    F --> G[Run Ingestion Pipeline]
    G --> H[Stop and Prune Container]
    H --> I[Execute Next Database Backend]

⚠️ Troubleshooting & Best Practices¶

1. Vector Dimension Mismatch¶

Ensure your embedding chunk sizes match what the database indexes were configured with. The standard configuration uses 768 dimensions (for nomic-embed). - Symptom: ERROR: vector dimension mismatch on Postgres or Vector index dimension size mismatch on Neo4j. - Fix: Verify embedding_models in config.json is set to the same model provider and chunk size dimensions.

2. Neo4j Authentication Failures¶

If you get Neo.ClientError.Security.Unauthorized, double check NEO4J_AUTH in docker-compose matches GRAPH_DB_PASSWORD. - Note: Neo4j requires the password to be at least 8 characters. Do not use extremely simple passwords like 123 as Neo4j will reject them during container boot.

3. Transaction Deadlocks on Scale-Out¶

Under heavy multi-agent concurrency, database locks might trigger transaction retries. - SQLite (LadybugDB): File locking can occur. Use a unique GRAPH_DB_PATH per agent or switch to Neo4j or PostgreSQL which natively support row-level locks and high-concurrency connections. - Postgres Connection Pool: Set GRAPH_POOL_MAX appropriately based on the number of active agents. Each agent should get a dedicated connection from the pool.

Graph Database Deployment & Multi-Backend Guide¶

📊 Database Comparison Matrix¶

🛠️ Docker Deployment and Configuration¶

1. FalkorDB (Low-Latency Graph Workloads)¶

🐳 Docker Compose (docker/falkordb.compose.yml)¶

⚙️ Configuration Setup¶

🔌 Python Connection Example¶

2. Neo4j (Enterprise Property Graphs)¶

🐳 Docker Compose (docker/neo4j.compose.yml)¶

⚙️ Configuration Setup¶

🔌 Python Connection Example¶

3. PostgreSQL + pg-age (Relational & Graph Unity)¶

🐳 Docker Compose (docker/pg-age.compose.yml)¶

📜 Initialization Script (docker/pg-age-init/01-extensions.sql)¶

⚙️ Configuration Setup¶

🔌 Python Connection Example¶

🔄 Verification & Operations¶

Health Checks¶

Dynamic Connectivity Verifier (verify_connection.py)¶

🧪 Sequential Integration Testing Harness¶

Flow Diagram of the Testing Lifecycle¶

⚠️ Troubleshooting & Best Practices¶

1. Vector Dimension Mismatch¶

2. Neo4j Authentication Failures¶

3. Transaction Deadlocks on Scale-Out¶

🐳 Docker Compose (`docker/falkordb.compose.yml`)¶

🐳 Docker Compose (`docker/neo4j.compose.yml`)¶

🐳 Docker Compose (`docker/pg-age.compose.yml`)¶

📜 Initialization Script (`docker/pg-age-init/01-extensions.sql`)¶

Dynamic Connectivity Verifier (`verify_connection.py`)¶