rate limiting · scaling · infrastructure · reliability

Webhook Rate Limiting: Protecting Your Infrastructure Without Losing Events

Naive rate limiting on webhook endpoints drops events permanently. Here's how to implement rate limiting that protects your infrastructure while ensuring every event is eventually delivered — with the right strategies for inbound and outbound flows.

Tomasz Brzezinski
Staff Infrastructure Engineer
March 11, 2026
10 min read

Rate limiting webhook endpoints is a different problem than rate limiting a standard REST API. With a REST API, a rate-limited client gets a 429 and can try again. With webhooks, you don't control when events arrive. A Stripe payment.succeeded event doesn't wait for your rate limit window to reset.

Drop a payment.succeeded event because of rate limiting and that payment may never get processed on your end. The customer was charged, but the inventory wasn't decremented and the confirmation email wasn't sent.

This post covers rate limiting strategies that protect your infrastructure without dropping events.


Why Webhook Rate Limiting Is Hard

The fundamental tension:

  • Without rate limiting: a traffic spike or a malicious actor can overwhelm your ingest layer and cause cascading failures
  • With naive rate limiting: returning 429 to a webhook sender causes them to either retry (creating more load) or give up (you lose the event)

The solution is to separate acceptance from processing. Rate limit at the processing layer, not the acceptance layer.

                        ┌──────────────┐
                        │  Ingest API  │  ← NO rate limiting here
                        │  (accept all)│    Accept everything, persist durably
                        └──────┬───────┘
                               │
                        ┌──────▼───────┐
                        │  Event Queue │  ← Buffer here
                        │  (durable)   │
                        └──────┬───────┘
                               │
                        ┌──────▼───────┐
                        │   Workers    │  ← Rate limit here
                        │  (process)   │    Control throughput
                        └──────────────┘

Your ingest endpoint accepts everything and persists it immediately. The rate limiting happens in the worker tier — workers process at a controlled rate, regardless of how fast events arrive.
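
As a sketch of the worker side of this picture (Queue, Event, and process are placeholders for your own types, not a specific library), the loop drains the durable queue at a fixed rate no matter how fast events arrive:

go
// Drain the durable queue at a controlled rate, independent of ingest speed.
// Queue, Event, and process are hypothetical stand-ins for your own types.
func runWorker(ctx context.Context, q *Queue, perSecond int) {
    ticker := time.NewTicker(time.Second / time.Duration(perSecond))
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            ev, ok := q.Dequeue(ctx)
            if !ok {
                continue // nothing pending; events stay safe in the queue
            }
            // Failures are rescheduled back onto the queue, never dropped.
            process(ev)
        }
    }
}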


Inbound Rate Limiting: The Right Approach

What to rate limit at the ingest layer

Not event volume — request rate from a specific source token. This prevents:

  • Accidentally-looped webhook senders (a service that sends the same event in a tight loop)
  • Intentional flooding of a specific source endpoint
  • Malformed clients stuck in a retry loop

Rate limit per path_token (source), not per IP or globally.

Token bucket per source

go
type RateLimiter struct {
    mu      sync.Mutex
    buckets map[string]*TokenBucket
}

type TokenBucket struct {
    tokens     float64
    lastRefill time.Time
    rate       float64 // tokens per second
    capacity   float64
}

func (rl *RateLimiter) Allow(sourceToken string) bool {
    rl.mu.Lock()
    defer rl.mu.Unlock()

    bucket, ok := rl.buckets[sourceToken]
    if !ok {
        bucket = &TokenBucket{
            tokens:     100, // new sources start with a modest allowance
            lastRefill: time.Now(),
            rate:       100,  // refill at 100 requests/second per source
            capacity:   1000, // refills can accumulate into bursts of up to 1000
        }
        rl.buckets[sourceToken] = bucket
    }

    // Refill tokens based on elapsed time
    now := time.Now()
    elapsed := now.Sub(bucket.lastRefill).Seconds()
    bucket.tokens = min(bucket.capacity, bucket.tokens + elapsed * bucket.rate)
    bucket.lastRefill = now

    if bucket.tokens < 1 {
        return false
    }

    bucket.tokens--
    return true
}

If the token bucket is empty, return 429 Too Many Requests with a Retry-After: 1 header. The webhook sender will back off and retry. Meanwhile, your queue continues to drain at the normal rate.
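
Wired into the ingest handler, the check might look like this (a sketch; the handler shape and the rateLimiter field are assumptions, not GetHook's actual code):

go
// Sketch: apply the per-source token bucket before accepting a request.
func (s *Server) handleIngest(w http.ResponseWriter, r *http.Request) {
    sourceToken := strings.TrimPrefix(r.URL.Path, "/ingest/")

    if !s.rateLimiter.Allow(sourceToken) {
        // Ask the sender to back off briefly; the queue keeps draining.
        w.Header().Set("Retry-After", "1")
        http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
        return
    }

    // Otherwise: persist the event durably (not shown) and acknowledge.
    w.WriteHeader(http.StatusAccepted)
}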

Rate limit thresholds for ingest

Source type                                  Recommended rate limit
Standard integration                         1,000 req/min
High-volume source (e-commerce, payments)    10,000 req/min
Test/development source                      100 req/min
Abuse detection threshold                    50,000 req/min → alert + manual review

At 1,000 req/min per source, you can handle real-world burst traffic while still catching runaway loops.


Outbound Rate Limiting: Per-Destination Throttling

When delivering webhooks to customer endpoints, you need to respect the destination's capacity. Some destinations are slow. Some have their own rate limits (429). Some are flaky under load.

The problem with unthrottled outbound delivery

Imagine a customer endpoint can handle 50 webhook deliveries per second. You have a queue of 10,000 backlogged events (after a brief outage). Your workers start delivering at maximum speed — 500/second.

The destination immediately starts returning 429 or 503. Your workers record failures and schedule retries. The retry queue grows. You're now hammering a struggling endpoint with 500 req/sec of retries in addition to the new event stream.

This is the "thundering herd on recovery" problem.

Per-destination rate limiting

Track delivery rate per destination and respect their rate limit signals:

go
type DeliveryRateLimiter struct {
    limits map[uuid.UUID]*DestinationLimiter
}

type DestinationLimiter struct {
    mu              sync.Mutex
    tokensPerSecond float64
    currentTokens   float64
    lastUpdated     time.Time
    pausedUntil     *time.Time
}

func (l *DestinationLimiter) HandleResponse(resp *http.Response) {
    l.mu.Lock()
    defer l.mu.Unlock()

    if resp.StatusCode == 429 {
        // Respect Retry-After header
        retryAfter := resp.Header.Get("Retry-After")
        if retryAfter != "" {
            seconds, err := strconv.Atoi(retryAfter)
            if err == nil {
                until := time.Now().Add(time.Duration(seconds) * time.Second)
                l.pausedUntil = &until
            }
        } else {
            // Default 60s pause if no Retry-After
            until := time.Now().Add(60 * time.Second)
            l.pausedUntil = &until
        }

        // Also reduce delivery rate
        l.tokensPerSecond = max(1, l.tokensPerSecond * 0.5)
    }

    if resp.StatusCode >= 200 && resp.StatusCode < 300 {
        // Any 2xx counts as success: gradually increase rate back to normal
        l.tokensPerSecond = min(50, l.tokensPerSecond * 1.1)
        l.pausedUntil = nil
    }
}

This implements an adaptive rate limiter that backs off when a destination signals overwhelm and gradually recovers when it's healthy again.
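
The consuming side isn't shown above. A minimal Allow method to pair with HandleResponse, under the same field assumptions, could look like this:

go
// Sketch: decide whether a delivery to this destination may go out right now.
func (l *DestinationLimiter) Allow() bool {
    l.mu.Lock()
    defer l.mu.Unlock()

    now := time.Now()

    // Still inside a Retry-After pause window: hold all deliveries.
    if l.pausedUntil != nil && now.Before(*l.pausedUntil) {
        return false
    }

    // Refill tokens based on elapsed time, capped at one second of burst.
    elapsed := now.Sub(l.lastUpdated).Seconds()
    l.currentTokens = min(l.tokensPerSecond, l.currentTokens+elapsed*l.tokensPerSecond)
    l.lastUpdated = now

    if l.currentTokens < 1 {
        return false // over this destination's rate; pick another delivery
    }
    l.currentTokens--
    return true
}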


Global Rate Limiting for Multi-Tenant Systems

At scale, you're managing delivery across hundreds or thousands of customer accounts simultaneously. A single large account shouldn't monopolize worker capacity.

Fair queuing with priority

Implement per-account delivery limits to ensure fair sharing of delivery worker capacity:

go
const maxConcurrentDeliveriesPerAccount = 10

func (w *Worker) canDeliverForAccount(accountID uuid.UUID) bool {
    current := w.activeDeliveries.Load(accountID)
    return current < maxConcurrentDeliveriesPerAccount
}

This ensures that an account with 10,000 backlogged events doesn't starve other accounts with fresh events.
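
The activeDeliveries counter above is assumed rather than a standard-library type; a minimal sketch of it is a mutex-guarded per-account count, incremented when a delivery starts and decremented when the attempt finishes:

go
// Sketch of the hypothetical activeDeliveries counter: in-flight deliveries per account.
type deliveryCounter struct {
    mu     sync.Mutex
    counts map[uuid.UUID]int
}

func newDeliveryCounter() *deliveryCounter {
    return &deliveryCounter{counts: make(map[uuid.UUID]int)}
}

// Load returns the current number of in-flight deliveries for an account.
func (c *deliveryCounter) Load(accountID uuid.UUID) int {
    c.mu.Lock()
    defer c.mu.Unlock()
    return c.counts[accountID]
}

// Add adjusts the count: +1 when a delivery starts, -1 when it completes.
func (c *deliveryCounter) Add(accountID uuid.UUID, delta int) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.counts[accountID] += delta
}

In practice you'd combine the capacity check and the increment under one lock so two workers can't both squeeze past the cap, but the shape is the same.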

Priority queuing

Not all events are equal. A payment.succeeded event should be delivered before a user.profile_updated event from a lower-priority integration.

Add a priority field to your event queue and process higher-priority events first:

sql
SELECT id, account_id, endpoint_id, payload
FROM webhook_events
WHERE status = 'pending'
  AND next_attempt_at <= NOW()
ORDER BY priority DESC, next_attempt_at ASC
LIMIT 50
FOR UPDATE SKIP LOCKED

Let customers configure priority per webhook endpoint or per event type.
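
One way to assign that priority at enqueue time is a static mapping from event type to a numeric level (the types and levels below are illustrative, not a fixed scheme):

go
// Illustrative priority levels assigned when an event is enqueued.
var eventPriority = map[string]int{
    "payment.succeeded":    100,
    "payment.failed":       100,
    "order.created":        50,
    "user.profile_updated": 10,
}

func priorityFor(eventType string) int {
    if p, ok := eventPriority[eventType]; ok {
        return p
    }
    return 50 // unknown event types land in the middle of the range
}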


Rate Limiting at the Infrastructure Layer

Beyond application-level rate limiting, add network-layer protection:

Nginx / Load Balancer Rate Limiting

nginx
# Limit the ingest endpoint per source token (the token is part of the URI key)
limit_req_zone $uri zone=ingest:10m rate=1000r/m;

location ~ ^/ingest/(.+)$ {
    limit_req zone=ingest burst=500 nodelay;
    limit_req_status 429;
    proxy_pass http://api_backend;
}

This is a coarse first line of defense. Fine-grained per-source limiting still needs to happen in the application layer.

DDoS Protection

For a public ingest endpoint, consider:

  • Cloudflare — Bot Fight Mode blocks volumetric attacks before they reach your origin
  • AWS Shield Standard — included with CloudFront, protects against common DDoS patterns
  • Rate limiting at the CDN layer — Cloudflare Workers or CloudFront Functions can enforce per-IP limits before requests reach your origin

Monitoring Rate Limit Effectiveness

Track these metrics to know if your rate limiting is correctly calibrated:

Metric                           Description                                    Alert if
ingest.rate_limited_requests     % of ingest requests returning 429             > 1% (might be too strict)
delivery.paused_destinations     Count of destinations currently paused         Rising trend
delivery.backlog_age_p95         Age of oldest undelivered event per account    > 5 minutes
worker.throughput_per_account    Delivery rate per account                      One account > 10× average

If ingest.rate_limited_requests exceeds 1%, your limits may be too strict — some legitimate integrations are being throttled. If delivery.backlog_age_p95 is rising, workers can't keep up with the ingest rate.
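
If you export these through Prometheus, recording the ingest-side counter might look like the sketch below (the metric name mirrors ingest.rate_limited_requests with dots swapped for underscores; the label set is an assumption):

go
// Sketch: count rate-limited ingest requests per source with Prometheus.
var ingestRateLimited = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "ingest_rate_limited_requests_total",
        Help: "Ingest requests rejected with 429, by source token.",
    },
    []string{"source"},
)

func init() {
    prometheus.MustRegister(ingestRateLimited)
}

// In the ingest handler, when Allow() returns false:
//     ingestRateLimited.WithLabelValues(sourceToken).Inc()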


Practical Configuration for GetHook

GetHook's rate limiting is configured per-source:

  • Ingest rate limit — configurable per source, default 1,000 req/min
  • Delivery rate — per-destination delivery rate, defaults to 50/second
  • Circuit breaker — automatically pauses delivery to destinations returning sustained 5xx or 429

For high-volume integrations (e-commerce checkouts, payment processors), increase the per-source ingest limit in source settings. For sensitive endpoints (payment processors, CRMs), reduce the delivery rate to respect their capacity.

The goal is to accept everything and deliver at the pace each destination can handle.

Configure rate limits in GetHook →

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.