Architecture Overview
This document provides a high-level overview of the components and data flows in DuraGraph.
Components
Section titled “Components”-
API Server (
cmd/server) Exposes REST + SSE endpoints. Implements LangGraph Cloud-compatible API. Handles request validation, rate limiting, and error responses. -
Command Handlers (
internal/application/command) Process write operations following CQRS. Create and modify domain aggregates, emitting domain events. -
Query Handlers (
internal/application/query) Process read operations. Query projections directly for optimized reads. -
Graph Engine (
internal/infrastructure/graph) Executes workflow graphs with support for conditionals, loops, subgraphs, and human-in-the-loop interrupts. -
Event Store (
internal/infrastructure/persistence) PostgreSQL-based event sourcing. Stores all domain events for audit and state reconstruction. Uses optimistic concurrency control with version columns. -
NATS JetStream (
internal/infrastructure/messaging) Reliable event streaming via the outbox pattern. Publishes domain events for real-time updates. -
Projections Read-optimized views of domain state. Updated from events for fast queries.
-
Rate Limiting Middleware (
internal/infrastructure/http/middleware) Token bucket rate limiting per IP or authenticated user. Supports in-memory (SimpleRateLimit), Redis-based, and tiered strategies.
Data Flows
Section titled “Data Flows”- Client calls API (e.g.,
POST /runs). - API validates request, applies rate limiting, and invokes command handler.
- Command Handler creates/modifies aggregate, emitting domain events.
- Repository saves events to event store + outbox in single transaction with optimistic concurrency check.
- Outbox Relay publishes events to NATS JetStream (uses
FOR UPDATE SKIP LOCKEDfor multi-instance safety). - Graph Engine executes workflow nodes (LLM, tools, DSPy modules, conditions).
- Client receives real-time updates via SSE stream.
Horizontal Scaling
Section titled “Horizontal Scaling”DuraGraph supports multi-instance deployment using PostgreSQL primitives only — no external coordination services (etcd, ZooKeeper) required.
Optimistic Concurrency Control
Section titled “Optimistic Concurrency Control”Every run aggregate carries a version column. Updates use optimistic locking:
UPDATE runs SET ..., version = version + 1WHERE id = $1 AND version = $2If another instance modified the run concurrently, the update affects zero rows and the operation is retried or fails with a concurrency conflict error.
Lease Epoch Fencing
Section titled “Lease Epoch Fencing”Worker task assignments carry a lease_epoch that increments on every assignment or retry. This prevents stale workers from processing tasks that have been reassigned — the worker must present the current epoch to complete the task.
Advisory Locks for Singleton Jobs
Section titled “Advisory Locks for Singleton Jobs”Background jobs that must run on exactly one instance (e.g., lease expiry monitor) use PostgreSQL advisory locks:
SELECT pg_try_advisory_lock(42)Only the instance that acquires the lock runs the job. Other instances skip it gracefully.
Skip Locked for Concurrent Access
Section titled “Skip Locked for Concurrent Access”Outbox relay and expired lease scanning use FOR UPDATE SKIP LOCKED to allow multiple instances to process work concurrently without conflicts:
SELECT * FROM outbox WHERE NOT publishedORDER BY created_atFOR UPDATE SKIP LOCKEDLIMIT 100Sequence Diagram
Section titled “Sequence Diagram”%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#fff7ed', 'primaryTextColor': '#9a3412', 'primaryBorderColor': '#f97316', 'lineColor': '#fb923c', 'secondaryColor': '#fef3c7', 'actorBkg': '#fff7ed', 'actorBorder': '#f97316', 'actorTextColor': '#9a3412', 'signalColor': '#fb923c', 'signalTextColor': '#fdba74', 'noteBkgColor': '#fff7ed', 'noteTextColor': '#9a3412', 'noteBorderColor': '#f97316', 'activationBkgColor': '#fed7aa', 'activationBorderColor': '#f97316'}}}%%
sequenceDiagram
participant Client
participant API
participant RateLimit
participant CommandHandler
participant Repository
participant EventStore
participant GraphEngine
participant NATS
Client->>API: POST /runs
API->>RateLimit: Check rate limit
RateLimit-->>API: Allowed
API->>CommandHandler: CreateRun
CommandHandler->>Repository: Save(run)
Repository->>EventStore: Append Events (version check)
Repository->>NATS: Publish via Outbox
API-->>Client: 201 Created
GraphEngine->>EventStore: Execute Graph
GraphEngine-->>NATS: Emit Events
NATS-->>Client: SSE Stream