Rate limiting webhook endpoints is a different problem from rate limiting a standard REST API. With a REST API, a rate-limited client gets a 429 and can try again. With webhooks, you don't control when events arrive. A Stripe `payment.succeeded` event doesn't wait for your rate limit window to reset.
Drop a `payment.succeeded` event because of rate limiting and that payment may never be processed on your end. The customer was charged, but the inventory wasn't decremented and the confirmation email was never sent.
This post covers rate limiting strategies that protect your infrastructure without dropping events.
## Why Webhook Rate Limiting Is Hard
The fundamental tension:
- Without rate limiting: a traffic spike or a malicious actor can overwhelm your ingest layer and cause cascading failures
- With naive rate limiting: returning `429` to a webhook sender causes them to either retry (creating more load) or give up (you lose the event)
The solution is to separate acceptance from processing. Rate limit at the processing layer, not the acceptance layer.
```text
┌──────────────┐
│  Ingest API  │ ← NO rate limiting here
│ (accept all) │   Accept everything, persist durably
└──────┬───────┘
       │
┌──────▼───────┐
│ Event Queue  │ ← Buffer here
│  (durable)   │
└──────┬───────┘
       │
┌──────▼───────┐
│   Workers    │ ← Rate limit here
│  (process)   │   Control throughput
└──────────────┘
```

Your ingest endpoint accepts everything and persists it immediately. Rate limiting happens in the worker tier: workers process at a controlled rate, regardless of how fast events arrive.
## Inbound Rate Limiting: The Right Approach

### What to rate limit at the ingest layer

Not event volume, but request rate from a specific source token. This prevents:
- Accidentally looped webhook senders (a service that sends the same event in a tight loop)
- Intentional flooding of a specific source endpoint
- Malformed clients stuck in a retry loop

Rate limit per `path_token` (source), not per IP or globally.
### Token bucket per source
```go
import (
	"sync"
	"time"
)

type RateLimiter struct {
	mu      sync.Mutex
	buckets map[string]*TokenBucket
}

// NewRateLimiter initialises the bucket map; writing to a nil map would panic.
func NewRateLimiter() *RateLimiter {
	return &RateLimiter{buckets: make(map[string]*TokenBucket)}
}

type TokenBucket struct {
	tokens     float64
	lastRefill time.Time
	rate       float64 // tokens per second
	capacity   float64
}

func (rl *RateLimiter) Allow(sourceToken string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()

	bucket, ok := rl.buckets[sourceToken]
	if !ok {
		bucket = &TokenBucket{
			tokens:     100,
			lastRefill: time.Now(),
			rate:       100,  // 100 requests/second per source
			capacity:   1000, // burst up to 1,000
		}
		rl.buckets[sourceToken] = bucket
	}

	// Refill tokens based on elapsed time.
	now := time.Now()
	elapsed := now.Sub(bucket.lastRefill).Seconds()
	bucket.tokens = min(bucket.capacity, bucket.tokens+elapsed*bucket.rate)
	bucket.lastRefill = now

	if bucket.tokens < 1 {
		return false
	}
	bucket.tokens--
	return true
}
```

If the token bucket is empty, return `429 Too Many Requests` with a `Retry-After: 1` header. The webhook sender will back off and retry; meanwhile, your queue continues to drain at the normal rate.
### Rate limit thresholds for ingest
| Source type | Recommended rate limit |
|---|---|
| Standard integration | 1,000 req/min |
| High-volume source (e-commerce, payments) | 10,000 req/min |
| Test/development source | 100 req/min |
| Abuse detection threshold | 50,000 req/min → alert + manual review |
At 1,000 req/min per source, you can handle real-world burst traffic while still catching runaway loops.
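A per-minute limit from the table translates into token-bucket parameters roughly like this; the 10% burst headroom is an assumed policy choice, not a documented default:

```go
package main

import "fmt"

// bucketParams converts a req/min limit into a steady refill rate
// (tokens/sec) and a burst capacity. The 10% burst headroom is an
// assumed policy, not a fixed rule.
func bucketParams(reqPerMin float64) (ratePerSec, capacity float64) {
	ratePerSec = reqPerMin / 60
	capacity = reqPerMin * 0.1 // allow a burst of 10% of the per-minute budget
	return
}

func main() {
	r, c := bucketParams(1000)
	fmt.Printf("rate=%.2f/s capacity=%.0f\n", r, c) // rate=16.67/s capacity=100
}
```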
## Outbound Rate Limiting: Per-Destination Throttling
When delivering webhooks to customer endpoints, you need to respect the destination's capacity. Some destinations are slow. Some have their own rate limits (429). Some are flaky under load.
### The problem with unthrottled outbound delivery
Imagine a customer endpoint can handle 50 webhook deliveries per second. You have a queue of 10,000 backlogged events (after a brief outage). Your workers start delivering at maximum speed — 500/second.
The destination immediately starts returning 429 or 503. Your workers record failures and schedule retries. The retry queue grows. You're now hammering a struggling endpoint with 500 req/sec of retries in addition to the new event stream.
This is the "thundering herd on recovery" problem.
### Per-destination rate limiting
Track delivery rate per destination and respect their rate limit signals:
```go
import (
	"net/http"
	"strconv"
	"sync"
	"time"

	"github.com/google/uuid"
)

type DeliveryRateLimiter struct {
	limits map[uuid.UUID]*DestinationLimiter
}

type DestinationLimiter struct {
	mu              sync.Mutex
	tokensPerSecond float64
	currentTokens   float64
	lastUpdated     time.Time
	pausedUntil     *time.Time
}

func (l *DestinationLimiter) HandleResponse(resp *http.Response) {
	l.mu.Lock()
	defer l.mu.Unlock()

	if resp.StatusCode == 429 {
		// Respect the Retry-After header if the destination sent a usable one.
		if seconds, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil {
			until := time.Now().Add(time.Duration(seconds) * time.Second)
			l.pausedUntil = &until
		} else {
			// Default to a 60s pause if Retry-After is missing or unparseable.
			until := time.Now().Add(60 * time.Second)
			l.pausedUntil = &until
		}
		// Also halve the delivery rate, with a floor of 1/second.
		l.tokensPerSecond = max(1, l.tokensPerSecond*0.5)
	}

	if resp.StatusCode == 200 {
		// Gradually ramp the rate back toward the 50/second ceiling on success.
		l.tokensPerSecond = min(50, l.tokensPerSecond*1.1)
		l.pausedUntil = nil
	}
}
```

This implements an adaptive rate limiter that backs off when a destination signals it is overwhelmed and gradually recovers when the destination is healthy again.
## Global Rate Limiting for Multi-Tenant Systems
At scale, you're managing delivery across hundreds or thousands of customer accounts simultaneously. A single large account shouldn't monopolize worker capacity.
### Fair queuing with priority
Implement per-account delivery limits to ensure fair sharing of delivery worker capacity:
```go
const maxConcurrentDeliveriesPerAccount = 10

func (w *Worker) canDeliverForAccount(accountID uuid.UUID) bool {
	current := w.activeDeliveries.Load(accountID) // in-flight deliveries for this account
	return current < maxConcurrentDeliveriesPerAccount
}
```

This ensures that an account with 10,000 backlogged events doesn't starve other accounts with fresh events.
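The `activeDeliveries` counter can be as simple as a mutex-guarded map. This is a sketch; the `acquire`/`release` names and the string account IDs are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

const maxConcurrentDeliveriesPerAccount = 10

// accountSlots tracks in-flight deliveries per account.
type accountSlots struct {
	mu     sync.Mutex
	active map[string]int // keyed by account ID
}

// acquire reserves a delivery slot, or reports that the account is at its cap.
func (s *accountSlots) acquire(accountID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.active[accountID] >= maxConcurrentDeliveriesPerAccount {
		return false
	}
	s.active[accountID]++
	return true
}

// release frees a slot once the delivery attempt finishes.
func (s *accountSlots) release(accountID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.active[accountID]--
}

func main() {
	s := &accountSlots{active: make(map[string]int)}
	for i := 0; i < maxConcurrentDeliveriesPerAccount; i++ {
		s.acquire("acct_1")
	}
	fmt.Println(s.acquire("acct_1")) // at the cap: false
	fmt.Println(s.acquire("acct_2")) // other accounts unaffected: true
	s.release("acct_1")
	fmt.Println(s.acquire("acct_1")) // slot freed: true
}
```

Workers that fail to acquire a slot simply skip that account's events and pick up work for another account.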
### Priority queuing

Not all events are equal. A `payment.succeeded` event should be delivered before a `user.profile_updated` event from a lower-priority integration.
Add a priority field to your event queue and process higher-priority events first:
```sql
SELECT id, account_id, endpoint_id, payload
FROM webhook_events
WHERE status = 'pending'
  AND next_attempt_at <= NOW()
ORDER BY priority DESC, next_attempt_at ASC
LIMIT 50
FOR UPDATE SKIP LOCKED;
```

Let customers configure priority per webhook endpoint or per event type.
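A priority assignment could be as simple as a lookup by event type. The tiers and event names below are illustrative, not a GetHook default:

```go
package main

import "fmt"

// priorityFor maps an event type to a queue priority; higher is delivered
// first. The specific tiers here are an assumed example policy.
func priorityFor(eventType string) int {
	switch eventType {
	case "payment.succeeded", "payment.failed":
		return 100 // money-moving events first
	case "order.created":
		return 50
	default:
		return 10 // e.g. user.profile_updated
	}
}

func main() {
	fmt.Println(priorityFor("payment.succeeded"))    // 100
	fmt.Println(priorityFor("user.profile_updated")) // 10
}
```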
## Rate Limiting at the Infrastructure Layer
Beyond application-level rate limiting, add network-layer protection:
### Nginx / Load Balancer Rate Limiting
```nginx
# Limit the ingest endpoint, keyed on the request path (which embeds the source token)
limit_req_zone $uri zone=ingest:10m rate=1000r/m;

location ~ ^/ingest/(.+)$ {
    limit_req zone=ingest burst=500 nodelay;
    limit_req_status 429;
    proxy_pass http://api_backend;
}
```

This is a coarse first line of defense. Fine-grained per-source limiting still needs to happen in the application layer.
### DDoS Protection

For a public ingest endpoint, consider:

- Cloudflare — Bot Fight Mode blocks volumetric attacks before they reach your origin
- AWS Shield Standard — included with CloudFront, protects against common DDoS patterns
- Rate limiting at the CDN layer — Cloudflare Workers or CloudFront Functions can enforce per-IP limits before requests reach your origin
## Monitoring Rate Limit Effectiveness
Track these metrics to know if your rate limiting is correctly calibrated:
| Metric | Description | Alert if |
|---|---|---|
| `ingest.rate_limited_requests` | % of ingest requests returning 429 | > 1% (might be too strict) |
| `delivery.paused_destinations` | Count of destinations currently paused | Rising trend |
| `delivery.backlog_age_p95` | Age of oldest undelivered event per account | > 5 minutes |
| `worker.throughput_per_account` | Delivery rate per account | One account > 10× average |
If `ingest.rate_limited_requests` exceeds 1%, your limits may be too strict — some legitimate integrations are being throttled. If `delivery.backlog_age_p95` is rising, workers can't keep up with the ingest rate.
## Practical Configuration for GetHook
GetHook's rate limiting is configured per-source:
- Ingest rate limit — configurable per source, default 1,000 req/min
- Delivery rate — per-destination delivery rate, defaults to 50/second
- Circuit breaker — automatically pauses delivery to destinations returning sustained `5xx` or `429`
For high-volume integrations (e-commerce checkouts, payment processors), increase the per-source ingest limit in source settings. For sensitive endpoints (payment processors, CRMs), reduce the delivery rate to respect their capacity.
The goal is to accept everything, and deliver at the pace each destination can handle.