webhooks · concurrency · reliability · architecture · delivery

Webhook Concurrency Control: Preventing Parallel Delivery to the Same Destination

Running multiple delivery workers is great for throughput — until they all hit the same fragile endpoint simultaneously. Here's how to implement per-destination concurrency limits that keep your delivery pipeline fast without overwhelming consumer services.

Camille Beaumont
Backend Architect
April 12, 2026
9 min read

Scaling a webhook delivery worker is straightforward on paper: add more worker processes, poll the queue more aggressively, and watch throughput climb. The problem shows up when you trace what actually happens on the receiving end. Ten workers can simultaneously dequeue ten events for the same destination and fire ten concurrent HTTP requests at an endpoint that was built to handle two. The destination returns 429s or 503s, your retry queue grows, and you've created the very reliability problem your infrastructure was supposed to prevent.

Per-destination concurrency control is the fix. The idea is simple: limit how many in-flight deliveries any single destination can have at one time. The implementation has a few sharp edges.


Why Global Worker Concurrency Isn't Enough

Most delivery systems control concurrency at the worker level — N workers, each processing one event at a time, giving you N total in-flight requests. This works for protecting your own infrastructure, but it tells you nothing about load distribution across destinations.

Consider a queue with 200 pending events: 180 for one high-volume destination and 20 spread across 19 others. With 10 workers and no per-destination limits, all 10 workers race to process the largest backlog and you get 10 concurrent requests hammering that single destination, while the 19 other destinations see no delivery at all.

The resulting problems:

| Problem | Cause | Effect |
| --- | --- | --- |
| Destination overload | No per-destination cap | 429s, 503s, connection refused |
| Starvation | High-volume destination monopolizes workers | Other destinations miss SLA windows |
| Out-of-order delivery | Parallel delivery to same destination | Consumer state machine corruption |
| Retry amplification | Overloaded destination triggers retries | Queue depth grows instead of shrinking |

Per-destination limits solve all four. The tradeoff is implementation complexity and a small scheduling overhead.


Approach 1: Optimistic Locking with an In-Flight Counter

The most portable approach uses a counter column on the destinations table. Before dispatching an event, your worker atomically claims a concurrency slot. If no slot is available, the worker skips that event and picks the next one.

sql
-- Add concurrency tracking columns to destinations
ALTER TABLE destinations
    ADD COLUMN max_concurrency    INT  NOT NULL DEFAULT 5,
    ADD COLUMN inflight_count     INT  NOT NULL DEFAULT 0;

The dispatch query atomically increments the counter only when it's below the limit:

sql
-- Claim a concurrency slot for destination $1
UPDATE destinations
SET inflight_count = inflight_count + 1
WHERE id = $1
  AND inflight_count < max_concurrency
RETURNING inflight_count;

If the UPDATE returns zero rows, the destination is at capacity — no slot was claimed. Your worker moves to the next event in the queue.

When delivery completes (success, failure, or timeout), the worker releases the slot:

sql
UPDATE destinations
SET inflight_count = GREATEST(0, inflight_count - 1)
WHERE id = $1;

The GREATEST(0, ...) guard prevents the counter from going negative when a slot is released twice, or when a release races with the cleanup job below. A worker that crashes between claiming and releasing a slot leaves the counter stuck too high instead, so you'll also want a periodic cleanup job that resets inflight_count to zero for destinations where the count is non-zero but no active delivery jobs exist:

sql
-- Run every 60 seconds via your job queue
UPDATE destinations d
SET inflight_count = 0
WHERE inflight_count > 0
  AND NOT EXISTS (
    SELECT 1 FROM delivery_jobs j
    WHERE j.destination_id = d.id
      AND j.status = 'delivering'
  );
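
In application code, the claim-and-release pair becomes two small helpers. Here's a minimal Go sketch using database/sql and the queries above; claimConcurrencySlot matches the name used in the dispatch loop later in this post, while releaseConcurrencySlot and the error handling are illustrative:

go
// claimConcurrencySlot runs the slot-claim UPDATE. Zero returned rows
// means the destination is already at capacity.
func (w *DeliveryWorker) claimConcurrencySlot(ctx context.Context, destID uuid.UUID) (bool, error) {
    var inflight int
    err := w.db.QueryRowContext(ctx, `
        UPDATE destinations
        SET inflight_count = inflight_count + 1
        WHERE id = $1
          AND inflight_count < max_concurrency
        RETURNING inflight_count`, destID).Scan(&inflight)
    if errors.Is(err, sql.ErrNoRows) {
        return false, nil // at capacity: no slot claimed
    }
    if err != nil {
        return false, err
    }
    return true, nil
}

// releaseConcurrencySlot returns the slot once delivery completes,
// clamping at zero in case of a double release.
func (w *DeliveryWorker) releaseConcurrencySlot(ctx context.Context, destID uuid.UUID) error {
    _, err := w.db.ExecContext(ctx, `
        UPDATE destinations
        SET inflight_count = GREATEST(0, inflight_count - 1)
        WHERE id = $1`, destID)
    return err
}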

Approach 2: Advisory Locks per Destination

If you want strict single-concurrency (at most one in-flight delivery per destination at a time), PostgreSQL advisory locks give you this without any schema changes. Advisory locks are session-scoped, which means they're automatically released when the worker's database connection closes, including when the process dies. The flip side: acquire and release must happen on the same connection, so use a dedicated connection per worker rather than a pooled handle.

go
// destLockKey derives a 64-bit advisory lock key from the first
// 8 bytes of the destination UUID.
func destLockKey(destID uuid.UUID) int64 {
    return int64(destID[0])<<56 | int64(destID[1])<<48 |
        int64(destID[2])<<40 | int64(destID[3])<<32 |
        int64(destID[4])<<24 | int64(destID[5])<<16 |
        int64(destID[6])<<8 | int64(destID[7])
}

func (w *DeliveryWorker) tryAcquireDestinationLock(ctx context.Context, destID uuid.UUID) (bool, error) {
    // pg_try_advisory_lock takes a 64-bit integer key. NOTE: advisory
    // locks are tied to the database session, so acquire and release
    // must run on the same connection (e.g. a dedicated *sql.Conn),
    // not on a pooled handle that may hand back a different session.
    var acquired bool
    err := w.db.QueryRowContext(ctx,
        `SELECT pg_try_advisory_lock($1)`, destLockKey(destID),
    ).Scan(&acquired)
    return acquired, err
}

func (w *DeliveryWorker) releaseDestinationLock(ctx context.Context, destID uuid.UUID) error {
    _, err := w.db.ExecContext(ctx, `SELECT pg_advisory_unlock($1)`, destLockKey(destID))
    return err
}

Advisory locks are ideal when you need ordering guarantees — delivering events to a destination one at a time ensures they arrive in queue order. The downside is throughput: a destination that can handle 10 concurrent requests gets throttled to 1. Use advisory locks for destinations that need ordered delivery; use the counter approach for destinations where throughput matters more than order.
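
Putting the two lock helpers together, the delivery path for an ordered destination might look like this sketch (DeliveryEvent and deliver are assumed from the worker shown below):

go
func (w *DeliveryWorker) deliverOrdered(ctx context.Context, event DeliveryEvent) {
    // If another worker holds the destination lock, skip: the event
    // stays queued and is picked up on a later dispatch pass.
    acquired, err := w.tryAcquireDestinationLock(ctx, event.DestinationID)
    if err != nil || !acquired {
        return
    }
    defer w.releaseDestinationLock(ctx, event.DestinationID)

    w.deliver(ctx, event)
}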


Integrating Concurrency Control into the Dispatch Loop

Your dispatch loop needs to handle the case where all available events are locked out by concurrency limits. Without this, a worker that finds a full queue for every destination will spin in a tight loop burning CPU.

go
func (w *DeliveryWorker) runDispatchLoop(ctx context.Context) {
    for {
        dispatched, err := w.dispatchBatch(ctx)
        if err != nil {
            w.logger.Error("dispatch error", "err", err)
            time.Sleep(5 * time.Second)
            continue
        }

        if dispatched == 0 {
            // Either the queue is empty or all destinations are at capacity.
            // Back off to avoid a spin loop.
            select {
            case <-ctx.Done():
                return
            case <-time.After(500 * time.Millisecond):
            }
        }
    }
}

func (w *DeliveryWorker) dispatchBatch(ctx context.Context) (int, error) {
    // Fetch candidates, skipping destinations already at capacity
    events, err := w.store.FetchDispatchable(ctx, w.batchSize)
    if err != nil {
        return 0, err
    }

    dispatched := 0
    for _, event := range events {
        claimed, err := w.claimConcurrencySlot(ctx, event.DestinationID)
        if err != nil {
            return dispatched, err
        }
        if !claimed {
            continue // destination at capacity, skip
        }

        go w.deliver(ctx, event) // release slot in defer inside deliver()
        dispatched++
    }

    return dispatched, nil
}

The key pattern: FetchDispatchable should use FOR UPDATE SKIP LOCKED to avoid workers competing for the same events, and it should order by next_attempt_at so retries fire on schedule even when primary delivery is backlogged.
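
A sketch of the query behind FetchDispatchable, assuming the delivery_jobs table from the cleanup job earlier plus next_attempt_at and payload columns:

sql
-- Fetch up to $1 due jobs; rows locked by other workers are skipped
SELECT id, destination_id, payload
FROM delivery_jobs
WHERE status = 'pending'
  AND next_attempt_at <= now()
ORDER BY next_attempt_at ASC
LIMIT $1
FOR UPDATE SKIP LOCKED;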


Setting Concurrency Limits Per Destination

A fixed global limit (e.g., 5 concurrent requests per destination) is a reasonable default but a poor fit for all cases. Some destinations are internal services on the same network with sub-millisecond response times; others are third-party APIs with 30-second timeouts and strict rate limits.

Expose max_concurrency as a configurable field on each destination:

json
POST /v1/destinations
{
  "name": "Order Fulfillment Service",
  "url": "https://fulfillment.internal/webhooks",
  "max_concurrency": 20,
  "timeout_seconds": 5
}

And a conservative configuration for a rate-limited third-party endpoint:

json
POST /v1/destinations
{
  "name": "Slack Notifications",
  "url": "https://hooks.slack.com/services/...",
  "max_concurrency": 1,
  "timeout_seconds": 10
}

The Slack example illustrates an important case: Slack's incoming webhooks have undocumented rate limits that are easy to hit. Setting max_concurrency: 1 combined with exponential backoff on 429s gives you reliable delivery without manual throttling configuration.
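
Here's a sketch of that backoff calculation; the one-second base and 256-second cap are illustrative choices, not documented Slack values:

go
// nextRetryDelay returns an exponential backoff with full jitter for
// the nth retry attempt after a 429.
func nextRetryDelay(attempt int) time.Duration {
    if attempt > 8 {
        attempt = 8 // cap the exponent: 1s << 8 = 256s
    }
    ceiling := time.Second << attempt
    // Full jitter: a random delay up to the ceiling avoids
    // synchronized retry bursts across workers.
    return time.Duration(rand.Int63n(int64(ceiling))) + time.Millisecond
}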

A useful heuristic for default values:

| Destination Type | Suggested Default | Rationale |
| --- | --- | --- |
| Internal service (same VPC) | 20–50 | Low latency, high capacity |
| Internal service (cross-region) | 5–10 | Network latency amplifies queuing |
| Third-party API with known limits | 1–3 | Respect provider rate limits |
| Third-party API (unknown limits) | 3 | Conservative starting point |
| Webhook endpoint on shared hosting | 1–2 | Often single-threaded, easily overloaded |

Start conservative and let customers increase limits if they need throughput. It's much easier to raise a limit than to explain why you crashed their endpoint.


Observing Concurrency in Practice

Concurrency limits only help if you can tell when they're the bottleneck. Add two metrics to your delivery worker:

delivery.concurrency_slot_denied (counter) — incremented each time a worker tries to claim a slot and can't. A spike here means destinations are consistently at capacity: either deliveries are slow (investigate response times and timeouts) or the limit is too low (raise max_concurrency).

delivery.inflight_per_destination (gauge) — the current inflight_count per destination, sampled every 30 seconds. Destinations that are always at max capacity are your throughput bottleneck.

go
// Emit after each failed slot claim
w.metrics.Inc("delivery.concurrency_slot_denied",
    "destination_id", event.DestinationID.String(),
    "destination_name", event.DestinationName,
)

// Emit periodically from a background goroutine
func (w *DeliveryWorker) emitConcurrencyMetrics(ctx context.Context) {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            rows, err := w.db.QueryContext(ctx,
                `SELECT id, name, inflight_count, max_concurrency FROM destinations WHERE inflight_count > 0`,
            )
            if err != nil {
                continue
            }
            for rows.Next() {
                var id, name string
                var inflight, maxC int
                if err := rows.Scan(&id, &name, &inflight, &maxC); err != nil {
                    break
                }
                w.metrics.Set("delivery.inflight_per_destination", float64(inflight),
                    "destination_id", id,
                    "destination_name", name,
                    "max_concurrency", strconv.Itoa(maxC),
                )
            }
            rows.Close()
        }
    }
}

If you're using GetHook as your delivery layer, per-destination concurrency is configurable on each destination and the current inflight count is visible in the delivery metrics panel — no custom instrumentation required.


The Ordering Implication

Per-destination concurrency limits have a side effect worth making explicit: they reduce (but don't eliminate) out-of-order delivery. With max_concurrency: 1, events to a destination are delivered serially — but serial delivery is only ordered if your queue orders them correctly.

Make sure your dispatch query orders by (destination_id, created_at ASC) within the FOR UPDATE SKIP LOCKED window. If two workers both query the queue simultaneously and each grabs a different event for the same destination, the concurrency limit prevents both from running at once — but whichever worker happens to win the slot determines which event fires first.
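
Concretely, the ordered variant of the dispatch query swaps the pure next_attempt_at ordering for per-destination age (same assumed schema as the earlier sketch):

sql
SELECT id, destination_id, payload
FROM delivery_jobs
WHERE status = 'pending'
  AND next_attempt_at <= now()
ORDER BY destination_id, created_at ASC
LIMIT $1
FOR UPDATE SKIP LOCKED;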

For strict ordering guarantees, advisory locks are the right tool: they guarantee exclusive access per destination across all workers, regardless of queue ordering.


Concurrency control at the destination level is one of those features that feels like an implementation detail until you skip it. The first time a partner's webhook endpoint goes down under unexpected load from your retry backlog, you'll wish you had per-destination limits in place. The counter-based approach requires one schema migration and roughly 50 lines of Go — the investment is small relative to the failure modes it prevents.

If you want concurrency limits, per-destination configuration, and delivery metrics out of the box, start with GetHook.

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.