
Profiling

DuraGraph ships a net/http/pprof listener for contributor profiling. It exposes Go’s standard profile surface — heap, goroutines, CPU, allocations, mutex contention, execution traces — on a separate localhost-bound port so it can’t accidentally leak through the main API.

duragraph dev turns the listener on automatically at 127.0.0.1:6060 (see cmd/duragraph/cmd/dev.go’s applyDevEnvDefaults).

duragraph serve leaves it off. Operators opt in with:

DURAGRAPH_PPROF_ADDR=127.0.0.1:6060 duragraph serve

Public binds (anything not on 127.0.0.1 / localhost / ::1) are refused by default — pprof endpoints leak heap state and accept long-running profile requests that can DoS the engine. The guard can be bypassed with DURAGRAPH_PPROF_ALLOW_PUBLIC=true if you have a specific reason (e.g. profiling inside a private network), but it should never be set in a production deploy with public reach.
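
A default-deny bind guard of this kind can be sketched in a few lines of Go. This is an assumption about the shape of the check, not the engine's actual code: `isLoopbackBind` and `checkPprofAddr` are hypothetical names, and the sketch accepts the whole 127.0.0.0/8 range as loopback, which is slightly broader than the three hosts listed above.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackBind reports whether addr ("host:port") names a loopback host.
// An empty host (":6060") binds every interface, so it counts as public.
func isLoopbackBind(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	// Assumption: IsLoopback accepts all of 127.0.0.0/8 as well as ::1.
	return ip != nil && ip.IsLoopback()
}

// checkPprofAddr rejects non-loopback binds unless allowPublic is set.
// allowPublic models DURAGRAPH_PPROF_ALLOW_PUBLIC=true (hypothetical wiring).
func checkPprofAddr(addr string, allowPublic bool) error {
	if allowPublic || isLoopbackBind(addr) {
		return nil
	}
	return fmt.Errorf("refusing public pprof bind %q", addr)
}

func main() {
	for _, addr := range []string{"127.0.0.1:6060", "[::1]:6060", "0.0.0.0:6060", ":6060"} {
		fmt.Printf("%-16s allowed=%v\n", addr, checkPprofAddr(addr, false) == nil)
	}
}
```

Failing closed on parse errors and on an empty host keeps a typo in the address from silently turning into a public bind.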

To hunt a heap leak, diff two heap snapshots taken before and after the suspect path:

# 1. Take a snapshot before the suspect path
curl -o before.pprof http://localhost:6060/debug/pprof/heap
# 2. Exercise the engine — replay the workflow, send a flurry of runs,
# whatever you think is leaking. Wait ~30 seconds after it settles
# so GC has a chance to reclaim anything not actually retained.
# 3. Take an after snapshot
curl -o after.pprof http://localhost:6060/debug/pprof/heap
# 4. Diff
go tool pprof -base=before.pprof after.pprof

Then in the pprof REPL:

(pprof) top10 # biggest growth between snapshots
(pprof) list someFunction # source-level annotation
(pprof) web # call graph in your browser (needs Graphviz); for flame graphs use the -http web UI

Anything that grew between snapshots and keeps growing when you repeat the exercise is your leak. Typical culprits: maps that never delete entries, goroutines that hold references to large structs, and response bodies or connections that are never closed.

Goroutine leaks are more common than heap leaks in event-driven systems like DuraGraph — every long-running goroutine retains every variable it references, often a transitive heap-full of state. The engine spawns goroutines for:

  • HTTP request handlers
  • SSE stream subscribers
  • The outbox relay
  • The cron scheduler
  • The watch supervisor (in dev mode)
  • The lease monitor + stale-worker cleanup loops
  • Each subscribed NATS consumer

Count goroutines over time:

# Baseline count
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' | grep -c '^goroutine'
# Exercise the suspect path
# Re-count
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' | grep -c '^goroutine'

A healthy engine settles at a steady-state goroutine count (~50–200 depending on consumers). If the count climbs without bound, dump the stacks to see what’s stuck:

curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' > goroutines.txt
# Look for: many goroutines parked at the same line of the same function.
# That's your stuck call. Usually a channel send/recv that has no receiver/sender.

To capture a CPU profile:

# Profile for 30 seconds while the engine is under load
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'

REPL commands work the same as for heap profiles.

For visualising scheduler behaviour, GC pauses, or blocking syscalls:

curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'
go tool trace trace.out

go tool trace opens an interactive viewer in your browser.

| Endpoint | Purpose |
| --- | --- |
| GET /debug/pprof/ | HTML index of available profiles |
| GET /debug/pprof/heap | Current heap snapshot |
| GET /debug/pprof/goroutine | Goroutine count + stack dumps (?debug=2 for full stacks) |
| GET /debug/pprof/profile?seconds=N | CPU profile (default 30 s) |
| GET /debug/pprof/allocs | Cumulative alloc tracking (all allocations since start) |
| GET /debug/pprof/mutex | Mutex contention profile |
| GET /debug/pprof/block | Goroutine blocking events |
| GET /debug/pprof/trace?seconds=N | Runtime execution trace |
| GET /debug/pprof/cmdline | The engine’s argv |
| GET /debug/pprof/symbol | Symbol resolution helper (used by pprof itself) |

  • Never bind to 0.0.0.0 or a public interface in production. The default-deny guard catches accidents, but don’t override it without a clear reason.
  • Anyone with network access to the pprof listener can read your heap, which often contains tokens, credentials, and intermediate LLM state, and can DoS your engine.
  • The dev-mode default of 127.0.0.1:6060 is safe from remote hosts because loopback addresses can’t be reached from off the machine. Other local users and processes on the same host can still connect.