Profiling
DuraGraph ships a net/http/pprof listener for contributor profiling. It exposes Go’s standard profile surface — heap, goroutines, CPU, allocations, mutex contention, execution traces — on a separate localhost-bound port so it can’t accidentally leak through the main API.
Enabling
duragraph dev turns the listener on automatically at 127.0.0.1:6060 (see cmd/duragraph/cmd/dev.go’s applyDevEnvDefaults).
duragraph serve leaves it off. Operators opt in with:
```sh
DURAGRAPH_PPROF_ADDR=127.0.0.1:6060 duragraph serve
```

Public binds (any address other than 127.0.0.1, localhost, or ::1) are refused by default. pprof endpoints expose heap contents and accept long-running profile requests that can DoS the engine. You can bypass the guard with DURAGRAPH_PPROF_ALLOW_PUBLIC=true if you have a specific reason (for example, profiling inside a private network), but never set it in a production deployment that is reachable from the public internet.
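For contributors wiring up something similar, here is a minimal Go sketch of a localhost-bound pprof listener with a default-deny guard. It is not DuraGraph’s actual code. The helper names (isLoopbackAddr, checkPprofAddr, newPprofMux) are hypothetical. The exact rules are also assumptions: the sketch accepts all of 127.0.0.0/8 and treats only the literal string "true" as an opt-in. Only the two environment variable names come from this page.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/pprof"
	"os"
	"strings"
)

// isLoopbackAddr reports whether a host:port listen address binds only to
// loopback: the name "localhost", ::1, or any 127.0.0.0/8 address.
// An empty host (":6060") means "all interfaces" and is rejected.
func isLoopbackAddr(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	if strings.EqualFold(host, "localhost") {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// checkPprofAddr applies the default-deny rule: non-loopback binds are
// refused unless the caller explicitly opts in.
func checkPprofAddr(addr string, allowPublic bool) error {
	if allowPublic || isLoopbackAddr(addr) {
		return nil
	}
	return fmt.Errorf("refusing non-loopback pprof bind %q; set DURAGRAPH_PPROF_ALLOW_PUBLIC=true to override", addr)
}

// newPprofMux registers the pprof handlers on a dedicated mux. Importing
// net/http/pprof also registers them on http.DefaultServeMux as a side
// effect, so the main API should serve from its own mux, not the default one.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // index plus heap, goroutine, allocs, mutex, block, ...
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

func main() {
	addr := os.Getenv("DURAGRAPH_PPROF_ADDR")
	if addr == "" {
		return // listener stays off, as with `duragraph serve` by default
	}
	if err := checkPprofAddr(addr, os.Getenv("DURAGRAPH_PPROF_ALLOW_PUBLIC") == "true"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// ListenAndServe only returns on failure.
	fmt.Fprintln(os.Stderr, http.ListenAndServe(addr, newPprofMux()))
}
```

Keeping the handlers on their own mux and their own port is what keeps them off the main API, independently of the loopback check.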
Finding a memory leak
```sh
# 1. Take a snapshot before the suspect path
curl -o before.pprof http://localhost:6060/debug/pprof/heap

# 2. Exercise the engine: replay the workflow, send a flurry of runs,
#    whatever you think is leaking. Wait ~30 seconds after it settles
#    so GC has a chance to reclaim anything not actually retained.
#    (Appending ?gc=1 to the heap URL forces a GC before sampling.)

# 3. Take an after snapshot
curl -o after.pprof http://localhost:6060/debug/pprof/heap

# 4. Diff
go tool pprof -base=before.pprof after.pprof
```

Then in the pprof REPL:
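```text
(pprof) top10               # biggest growth between snapshots
(pprof) list someFunction   # source-level annotation
(pprof) web                 # call graph in your browser (requires Graphviz)
```

For a flame graph, open the same diff in the web UI instead (go tool pprof -http=localhost:8080 -base=before.pprof after.pprof) and pick Flame Graph from the View menu.

Allocation sites that keep growing across repeated snapshots are your leak candidates. Typical culprits: maps that never delete entries, goroutines that hold references to large structs, and response bodies or connections that are never closed.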
Finding a goroutine leak
Goroutine leaks tend to be more common than plain heap leaks in event-driven systems like DuraGraph, and they cause heap growth too: a goroutine that never exits keeps every variable it references reachable, often a large transitive graph of state. The engine spawns goroutines for:
- HTTP request handlers
- SSE stream subscribers
- The outbox relay
- The cron scheduler
- The watch supervisor (in dev mode)
- The lease monitor + stale-worker cleanup loops
- Each subscribed NATS consumer
Count goroutines over time:
```sh
# Baseline count
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' | grep -c '^goroutine'

# Exercise the suspect path

# Re-count
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' | grep -c '^goroutine'
```

A healthy engine settles at a steady-state goroutine count (roughly 50–200, depending on how many consumers are subscribed). If the count climbs without bound, dump the stacks to see what’s stuck:
```sh
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' > goroutines.txt
# Look for many goroutines parked at the same line of the same function.
# That's your stuck call: usually a channel send/recv with no matching receiver/sender.
# Tip: ?debug=1 groups identical stacks and prefixes each group with its count.
```

CPU profiling
```sh
# Profile for 30 seconds while the engine is under load
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
```

REPL commands work the same as for heap profiles.
Execution traces (Go runtime trace)
For visualising scheduler behaviour, GC pauses, or blocking syscalls:
```sh
curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'
go tool trace trace.out
```

go tool trace opens an interactive HTML viewer in your browser.
Available endpoints
| Endpoint | Purpose |
|---|---|
| GET /debug/pprof/ | HTML index of available profiles |
| GET /debug/pprof/heap | Current heap snapshot |
| GET /debug/pprof/goroutine | Goroutine count + stack dumps (?debug=2 for full stacks) |
| GET /debug/pprof/profile?seconds=N | CPU profile (default 30 s) |
| GET /debug/pprof/allocs | Cumulative alloc tracking (all allocations since start) |
| GET /debug/pprof/mutex | Mutex contention profile |
| GET /debug/pprof/block | Goroutine blocking events |
| GET /debug/pprof/trace?seconds=N | Runtime execution trace |
| GET /debug/pprof/cmdline | The engine’s argv |
| GET /debug/pprof/symbol | Symbol resolution helper (used by pprof itself) |
Security recap
- Never bind to 0.0.0.0 or a public interface in production. The default-deny guard catches accidents, but don’t override it without a clear reason.
- Anyone with network access to the pprof listener can read your heap (which often contains tokens, credentials, and intermediate LLM state) and can DoS your engine.
- The dev-mode default of 127.0.0.1:6060 is safe from remote access because loopback traffic never leaves the host. Other processes and users on the same machine can still reach it.