webhooks, rate limiting, delivery engine, reliability, infrastructure

Per-Destination Throttling: Respecting Rate Limits on Webhook Delivery

When a destination returns 429, your delivery engine needs to do more than retry — it needs to pause, back off per endpoint, and keep delivering to everyone else. Here's how to build per-destination throttling that doesn't stall your entire worker pool.

Dmitri Volkov
Distributed Systems Engineer
April 3, 2026
9 min read

Most webhook delivery engines are built with one failure mode in mind: the destination is down. Retry with exponential backoff, move to dead letter after N attempts — done. But there's a second failure mode that gets less attention and is more insidious: the destination is up, accepting most requests, but rate limiting you.

A 429 response is not a transient network error. It's the destination telling you it has a defined throughput ceiling and you're exceeding it. Treating 429 the same as a 503 — back off and retry — is the right instinct, but if that backoff lives only in a shared job queue, one throttled destination can delay delivery to every other destination.

This post explains how to build per-destination throttling that isolates rate-limited destinations, respects Retry-After headers, and keeps the rest of your delivery pipeline moving.


The Problem With a Single Delivery Queue

The typical Postgres-backed delivery queue looks like this:

sql
SELECT id, event_id, destination_id, payload
FROM delivery_jobs
WHERE status = 'queued'
  AND next_attempt_at <= NOW()
ORDER BY next_attempt_at ASC
FOR UPDATE SKIP LOCKED
LIMIT 10;

Workers poll this query, pick up jobs, attempt delivery, and reschedule failures with backoff. This works well when failures are independent: a timeout on destination A doesn't affect delivery to destination B.

But when destination A is rate-limiting you, several things go wrong:

  1. Every attempt to A returns 429, incrementing its retry delay.
  2. The queue fills with A's rescheduled jobs, all with near-term next_attempt_at values.
  3. Workers waste cycles on A's jobs, reducing throughput for B, C, and D.
  4. If A's volume is high enough, it starves the queue entirely.

The fix is to make the throttle state explicit and per-destination, not implicit in the job retry schedule.


Tracking Throttle State Per Destination

Add a destination_throttle table that records when a destination is rate-limited and when it should be retried:

sql
CREATE TABLE destination_throttle (
    destination_id  UUID PRIMARY KEY REFERENCES destinations(id),
    throttled_until TIMESTAMPTZ NOT NULL,
    reason          TEXT,         -- e.g., "429 Too Many Requests"
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- NOW() isn't IMMUTABLE, so it can't appear in a partial index predicate;
-- a plain index on throttled_until is enough for the skip-throttled lookup.
CREATE INDEX destination_throttle_active
    ON destination_throttle (throttled_until);

When the delivery worker receives a 429, it writes a row here before rescheduling the job. The throttle window comes from the response's Retry-After header if present, or from a default backoff schedule.

go
func (w *Worker) handleRateLimited(ctx context.Context, job DeliveryJob, resp *http.Response) error {
    retryAfter := parseRetryAfter(resp.Header.Get("Retry-After"))

    if err := w.store.SetThrottle(ctx, job.DestinationID, retryAfter, "429 Too Many Requests"); err != nil {
        return fmt.Errorf("set throttle: %w", err)
    }

    // Reschedule the job to attempt after the throttle window clears
    return w.store.RescheduleJob(ctx, job.ID, retryAfter)
}

func parseRetryAfter(header string) time.Time {
    if header == "" {
        // Default: 60 seconds if no Retry-After provided
        return time.Now().Add(60 * time.Second)
    }

    // Retry-After can be a delay in seconds or an HTTP-date
    if secs, err := strconv.Atoi(header); err == nil {
        return time.Now().Add(time.Duration(secs) * time.Second)
    }

    if t, err := http.ParseTime(header); err == nil {
        return t
    }

    return time.Now().Add(60 * time.Second)
}
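
The SetThrottle store method is referenced but not shown; a minimal sketch, assuming the store wraps a *sql.DB on a Postgres driver, is a plain upsert so that repeated 429s simply extend the existing window:

go
// Minimal sketch of SetThrottle, assuming Store wraps a *sql.DB (pgx/stdlib
// or lib/pq). The upsert means repeated 429s extend the window in place.
func (s *Store) SetThrottle(ctx context.Context, destinationID uuid.UUID, until time.Time, reason string) error {
    _, err := s.db.ExecContext(ctx, `
        INSERT INTO destination_throttle (destination_id, throttled_until, reason, updated_at)
        VALUES ($1, $2, $3, NOW())
        ON CONFLICT (destination_id) DO UPDATE
        SET throttled_until = EXCLUDED.throttled_until,
            reason          = EXCLUDED.reason,
            updated_at      = NOW()`,
        destinationID, until, reason)
    return err
}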

Skipping Throttled Destinations in the Queue Query

Now that throttle state is explicit, update the worker query to skip destinations that are currently throttled:

sql
SELECT dj.id, dj.event_id, dj.destination_id, dj.payload
FROM delivery_jobs dj
LEFT JOIN destination_throttle dt ON dt.destination_id = dj.destination_id
WHERE dj.status = 'queued'
  AND dj.next_attempt_at <= NOW()
  AND (dt.destination_id IS NULL OR dt.throttled_until <= NOW())
ORDER BY dj.next_attempt_at ASC
FOR UPDATE OF dj SKIP LOCKED
LIMIT 10;

The LEFT JOIN with the IS NULL OR throttled_until <= NOW() condition excludes any destination that has an active throttle entry. The lock is scoped with FOR UPDATE OF dj because Postgres can't lock the nullable side of an outer join. Jobs for throttled destinations sit in the queue untouched — they don't consume worker capacity, and they don't push back on other destinations' delivery timing.

This is the key insight: throttle isolation requires a data model change, not just backoff tuning.


Worker Pool Partitioning

The query above works well at moderate scale. At high volume, a second improvement is partitioning workers by destination so that a heavily throttled destination doesn't hold up workers even momentarily.

The simplest approach is to assign destinations to worker slots based on a hash of the destination ID:

go
const numWorkerSlots = 16

func workerSlot(destinationID uuid.UUID) int {
    h := fnv.New32a()
    h.Write(destinationID[:])
    return int(h.Sum32()) % numWorkerSlots
}

Each worker instance owns a subset of slots and only picks up jobs whose destination hashes into one of them. This provides natural isolation: a flood of 429s from destination A doesn't contend with workers handling destination B, because they land in different slots.
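
How the slot filter reaches the queue query is up to you; one sketch — assuming a worker_slot column on delivery_jobs, populated with workerSlot(destinationID) at enqueue time — adds it to the claim query:

go
// Hypothetical sketch: each worker claims jobs only from its own slot, so a
// flood of throttled jobs in another slot never touches it. Assumes a
// worker_slot column written at enqueue time.
func (s *Store) ClaimJobs(ctx context.Context, slot, limit int) (*sql.Rows, error) {
    return s.db.QueryContext(ctx, `
        SELECT dj.id, dj.event_id, dj.destination_id, dj.payload
        FROM delivery_jobs dj
        LEFT JOIN destination_throttle dt ON dt.destination_id = dj.destination_id
        WHERE dj.status = 'queued'
          AND dj.next_attempt_at <= NOW()
          AND dj.worker_slot = $1
          AND (dt.destination_id IS NULL OR dt.throttled_until <= NOW())
        ORDER BY dj.next_attempt_at ASC
        FOR UPDATE OF dj SKIP LOCKED
        LIMIT $2`,
        slot, limit)
}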

The tradeoff is that this partitioning is static. If one destination has 10x the volume of all others, its slot becomes the hot path. A dynamic partitioning scheme (consistent hashing with rebalancing) handles this better, but for most teams, static slot assignment with 16–32 slots is sufficient.


Backoff Schedule for Persistent 429s

What happens when a destination is rate-limited for an extended period — say, their plan limits them to 1,000 events per day and your platform sends 10,000? The throttle window should grow with repeated 429 responses, not stay flat.

Consecutive 429s    Throttle window
1                   60 seconds (or Retry-After)
2                   5 minutes
3                   15 minutes
4                   1 hour
5+                  6 hours

Track the consecutive 429 count on the throttle row and use it to compute the window:

go
var throttleBackoff = []time.Duration{
    60 * time.Second,
    5 * time.Minute,
    15 * time.Minute,
    1 * time.Hour,
    6 * time.Hour,
}

func throttleWindow(consecutiveCount int, retryAfterHeader string) time.Duration {
    // Always respect Retry-After if the destination provides it
    if retryAfterHeader != "" {
        if secs, err := strconv.Atoi(retryAfterHeader); err == nil {
            return time.Duration(secs) * time.Second
        }
    }

    idx := consecutiveCount - 1
    if idx >= len(throttleBackoff) {
        idx = len(throttleBackoff) - 1
    }
    if idx < 0 {
        idx = 0
    }
    return throttleBackoff[idx]
}

Reset the counter to zero when delivery succeeds. The goal is to converge to the destination's actual capacity, not to permanently suppress delivery.
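
Where the counter lives is up to you; one sketch — assuming destination_throttle gains a consecutive_429s INT NOT NULL DEFAULT 1 column — folds the increment into the upsert and drops the row on success:

go
// Sketch assuming destination_throttle has a consecutive_429s INT NOT NULL
// DEFAULT 1 column. The upsert bumps the counter and returns it; the caller
// feeds it to throttleWindow and then writes throttled_until via SetThrottle.
func (s *Store) IncrementThrottleCount(ctx context.Context, destinationID uuid.UUID) (int, error) {
    var count int
    err := s.db.QueryRowContext(ctx, `
        INSERT INTO destination_throttle (destination_id, throttled_until, reason)
        VALUES ($1, NOW(), '429 Too Many Requests')
        ON CONFLICT (destination_id) DO UPDATE
        SET consecutive_429s = destination_throttle.consecutive_429s + 1,
            updated_at       = NOW()
        RETURNING consecutive_429s`,
        destinationID).Scan(&count)
    return count, err
}

// On the first successful delivery, drop the row so the backoff starts fresh
// the next time the destination rate-limits you.
func (s *Store) ClearThrottle(ctx context.Context, destinationID uuid.UUID) error {
    _, err := s.db.ExecContext(ctx,
        `DELETE FROM destination_throttle WHERE destination_id = $1`, destinationID)
    return err
}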


Surfacing Throttle State to Your Users

When a destination is actively throttled, your customers should know. Burying this in delivery attempt logs means they'll open a support ticket wondering why events are delayed rather than identifying the root cause themselves.

Expose throttle state in your destination detail endpoint:

json
{
  "id": "dst_abc123",
  "name": "Order Fulfillment Service",
  "url": "https://fulfillment.example.com/webhooks",
  "status": "throttled",
  "throttled_until": "2026-04-03T14:30:00Z",
  "throttle_reason": "429 Too Many Requests",
  "queued_events": 847
}

The queued_events count tells the operator exactly how much of a backlog has accumulated. If that number is growing faster than the throttle window is draining it, they need to either increase their destination's capacity or filter which events they're routing there.
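
How queued_events is computed is an implementation detail; one sketch of the read side, assuming the delivery_jobs and destination_throttle tables above plus a destinations table, looks like this:

go
// Sketch: assemble the throttle fields of the destination detail response.
// The nullable throttled_until scans into a *time.Time when no throttle row exists.
type ThrottleStatus struct {
    ThrottledUntil *time.Time `json:"throttled_until,omitempty"`
    ThrottleReason string     `json:"throttle_reason,omitempty"`
    QueuedEvents   int        `json:"queued_events"`
}

func (s *Store) ThrottleStatus(ctx context.Context, destinationID uuid.UUID) (ThrottleStatus, error) {
    var st ThrottleStatus
    err := s.db.QueryRowContext(ctx, `
        SELECT dt.throttled_until,
               COALESCE(dt.reason, ''),
               (SELECT COUNT(*) FROM delivery_jobs
                 WHERE destination_id = $1 AND status = 'queued')
        FROM destinations d
        LEFT JOIN destination_throttle dt ON dt.destination_id = d.id
        WHERE d.id = $1`,
        destinationID).Scan(&st.ThrottledUntil, &st.ThrottleReason, &st.QueuedEvents)
    return st, err
}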

GetHook surfaces throttle state directly in the destination view alongside the delivery attempt timeline, so you can correlate when rate limiting started with the delivery gap your customers are seeing.


Distinguishing 429 from 503

One nuance: a 503 Service Unavailable is also a signal to back off, but its semantics differ from 429. A 503 means the service is temporarily overloaded or restarting. A 429 means you're specifically over the rate limit.

The practical difference is in how you schedule recovery:

Status              Interpretation                                       Strategy
429 + Retry-After   You're over the limit; wait exactly this long       Respect Retry-After, then resume at normal pace
429, no header      You're over the limit; server didn't say how long   Use backoff schedule, reset on success
503                 Server temporarily unavailable                       Standard exponential backoff (same as network errors)
503 + Retry-After   Planned maintenance window                           Respect Retry-After, treat like a long 503

Don't conflate them in your retry logic. 503s should not increment the throttle counter. 429s should not trigger the circuit breaker that marks a destination as unhealthy. They're different signals that require different responses.
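
In the worker, keeping the two signals apart can be as simple as a switch on the status code. This is only a sketch — handleServerError and MarkDelivered stand in for whatever backoff and bookkeeping your engine already has:

go
// Sketch: route 429 and 503 to different recovery paths. handleServerError and
// MarkDelivered are hypothetical stand-ins for existing worker/store methods.
func (w *Worker) handleResponse(ctx context.Context, job DeliveryJob, resp *http.Response) error {
    switch {
    case resp.StatusCode == http.StatusTooManyRequests:
        // 429: write a throttle entry; don't feed the unhealthy-destination circuit breaker.
        return w.handleRateLimited(ctx, job, resp)
    case resp.StatusCode == http.StatusServiceUnavailable:
        // 503: standard exponential backoff; don't touch the throttle counter.
        return w.handleServerError(ctx, job, resp)
    case resp.StatusCode >= 200 && resp.StatusCode < 300:
        return w.store.MarkDelivered(ctx, job.ID)
    default:
        return w.handleServerError(ctx, job, resp)
    }
}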


Testing Your Throttle Logic

Throttle handling is easy to skip in testing because it requires a destination that returns controlled 429 responses. A simple test server helps:

go
func rateLimitingTestServer(limit int, window time.Duration) *httptest.Server {
    var mu sync.Mutex
    count := 0
    resetAt := time.Now().Add(window)

    return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        mu.Lock()
        defer mu.Unlock()

        now := time.Now()
        if now.After(resetAt) {
            count = 0
            resetAt = now.Add(window)
        }

        count++
        if count > limit {
            retryAfter := int(time.Until(resetAt).Seconds()) + 1
            w.Header().Set("Retry-After", strconv.Itoa(retryAfter))
            w.WriteHeader(http.StatusTooManyRequests)
            return
        }

        w.WriteHeader(http.StatusOK)
    }))
}

Use this in integration tests to verify that your worker correctly reads Retry-After, writes a throttle entry, skips the throttled destination in subsequent polls, and resumes delivery after the window expires.
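
A usage sketch — these assertions only exercise the test server itself; from there, register srv.URL as a destination for your worker and assert that the throttle row appears and later clears:

go
func TestRateLimitingTestServer(t *testing.T) {
    // Allow 3 requests per 2-second window, then expect 429 + Retry-After.
    srv := rateLimitingTestServer(3, 2*time.Second)
    defer srv.Close()

    for i := 0; i < 3; i++ {
        resp, err := http.Get(srv.URL)
        if err != nil {
            t.Fatalf("request %d: %v", i, err)
        }
        resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            t.Fatalf("request %d: expected 200, got %d", i, resp.StatusCode)
        }
    }

    resp, err := http.Get(srv.URL)
    if err != nil {
        t.Fatal(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusTooManyRequests {
        t.Fatalf("expected 429, got %d", resp.StatusCode)
    }
    if resp.Header.Get("Retry-After") == "" {
        t.Fatal("expected a Retry-After header on the 429 response")
    }
}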


Per-destination throttling is one of those features that seems easy (just back off on 429) until you need to explain to a customer why their event was delayed 45 minutes. The difference between naive retry and true per-destination isolation is the difference between one throttled endpoint slowing your entire platform and that endpoint sitting quietly in a holding pattern while everything else delivers normally.

If you want to see how GetHook handles destination throttling alongside delivery observability, get started at gethook.to/setup.

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.