Black Friday starts at midnight. Your e-commerce platform goes from 10 orders per minute to 2,000. Stripe begins firing payment_intent.succeeded, charge.captured, and order.updated events at a rate your webhook endpoint has never seen. By 12:03 AM, your ingest service is throwing 503s and Stripe has started marking your endpoint as unreliable.
This is a preventable failure. The pattern isn't specific to Black Friday — any bursty third-party provider (Twilio for SMS delivery receipts, GitHub for CI builds, Shopify for order events during a flash sale) can spike your inbound webhook volume by 100x in seconds. The difference between teams that handle it gracefully and teams that don't comes down to a few architectural decisions made well before the spike arrives.
## Why Webhooks Are Harder to Burst-Handle Than API Calls
With an outbound API call, you control the request rate. With inbound webhooks, the provider controls it. Most providers have their own internal queue and will fire events as fast as they can — which may be much faster than your endpoint can process them.
The compounding problem: most webhook providers retry on failure. If your endpoint responds with 500 or 503, the provider queues the event for retry. This means a traffic spike that overwhelms your endpoint doesn't just cause immediate failures — it creates a delayed second wave of retries that arrives after the spike, just as you're recovering.
The failure modes stack up:
| Problem | Immediate effect | Delayed effect |
|---|---|---|
| Endpoint returns 503 | Events queued for retry | Retry wave 5–30 min later |
| Processing too slow | Queue depth grows | Events expire before delivery |
| Database write bottleneck | Ingest latency spikes | Provider marks endpoint unhealthy |
| Worker can't keep up | Delivery backlog grows | Customer-visible delays |
Solving any one of these in isolation isn't enough. You need to decouple each layer.
## The Core Principle: Accept Fast, Process Slow
The most important architectural decision is separating ingest from processing. Your webhook endpoint should do exactly two things:
- Validate the incoming request (signature, payload size)
- Persist the raw event to a durable store
Nothing else. No database lookups, no business logic, no calling downstream services. The goal is to return a 200 OK in under 50ms for every request, regardless of load.
```go
func (h *IngestHandler) Handle(w http.ResponseWriter, r *http.Request) {
	// 1. Validate signature — fast, CPU-only operation
	body, err := io.ReadAll(io.LimitReader(r.Body, maxPayloadBytes))
	if err != nil || !h.verifySignature(r, body) {
		http.Error(w, "invalid signature", http.StatusUnauthorized)
		return
	}

	// 2. Persist raw event — the only I/O operation
	sourceID := r.PathValue("source_id") // source identifier from the route, e.g. /ingest/{source_id}
	eventID, err := h.store.Enqueue(r.Context(), body, sourceID)
	if err != nil {
		http.Error(w, "storage error", http.StatusInternalServerError)
		return
	}

	// 3. Return 200 immediately — processing happens asynchronously
	w.WriteHeader(http.StatusOK)
	_ = json.NewEncoder(w).Encode(map[string]string{"id": eventID})
}
```

Processing — decoding, routing, calling downstream services, updating application state — happens in a separate worker process that pulls from the queue at a controlled rate. The ingest endpoint and the processing workers are independently scalable.
## Sizing Your Ingest Tier for Burst Traffic
The ingest tier needs to be sized for peak concurrency, not average throughput. The key question is: how many concurrent HTTP requests can your ingest handler sustain while keeping P99 latency under 200ms?
For a Postgres-backed queue (which GetHook uses), the bottleneck is typically write throughput to the events table. Benchmark this before you need it:
```bash
# Benchmark ingest write throughput using wrk
wrk -t 8 -c 200 -d 30s \
  -s post_event.lua \
  https://your-ingest-host/ingest/src_abc123
```

A well-tuned single Postgres instance can sustain 5,000–10,000 INSERT operations per second for simple event rows. That's enough for most burst scenarios. If you need more, consider:
- Connection pooling via PgBouncer — reduces per-connection overhead significantly under concurrent load
- Bulk inserts — batch multiple events in a single `INSERT ... VALUES (...)` statement when processing from a buffer (see the sketch after this list)
- Partitioned tables — partition the events table by date so writes land on the current partition, reducing index contention
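If you go the bulk-insert route, the buffer flush is straightforward. Here's a minimal sketch using only `database/sql`, assuming the events table has source_id, payload, and status columns (matching the other examples in this post) and that raw events are buffered in memory between flushes:

```go
// A minimal sketch of the bulk-insert option. Column names and the rawEvent
// type are assumptions, not the production schema.
type rawEvent struct {
	SourceID string
	Payload  []byte
}

func flushBuffer(ctx context.Context, db *sql.DB, buffered []rawEvent) error {
	if len(buffered) == 0 {
		return nil
	}
	// Build one multi-row INSERT: ($1, $2, 'queued'), ($3, $4, 'queued'), ...
	values := make([]string, 0, len(buffered))
	args := make([]any, 0, len(buffered)*2)
	for i, ev := range buffered {
		values = append(values, fmt.Sprintf("($%d, $%d, 'queued')", i*2+1, i*2+2))
		args = append(args, ev.SourceID, ev.Payload)
	}
	query := "INSERT INTO events (source_id, payload, status) VALUES " +
		strings.Join(values, ", ")
	_, err := db.ExecContext(ctx, query, args...)
	return err
}
```

A common pattern is to flush on whichever comes first, a size threshold or a short timer, so a quiet stretch doesn't leave events sitting in the buffer.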
### Horizontal ingest scaling
Because the ingest endpoint is stateless (it just writes to Postgres), you can run multiple instances behind a load balancer and scale horizontally. The only shared state is the database.
```
Provider → Load Balancer → [ingest-1, ingest-2, ingest-3, ...] → Postgres
```

Add instances until your write throughput ceiling is the bottleneck. At that point, move to sharded writes or a message queue in front of Postgres.
## Controlling Processing Rate With a Worker Pool
The delivery worker is where back-pressure matters. You don't want to process events as fast as possible — you want to process them at a rate that your downstream services can absorb.
A Postgres job queue with FOR UPDATE SKIP LOCKED gives you natural concurrency control: the number of concurrent delivery workers determines your processing rate.
```sql
-- Workers compete for the next batch of events
SELECT id, payload, destination_id
FROM events
WHERE status = 'queued'
  AND next_attempt_at <= NOW()
ORDER BY next_attempt_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED;
```

With 5 workers each polling for 10 events, you're processing up to 50 events per poll cycle. Increase worker count to scale up, decrease to throttle.
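A common variant of this query wraps the SELECT in an `UPDATE ... RETURNING` so a worker atomically marks its batch as in-flight before it starts delivering. A minimal sketch in Go, assuming a 'processing' status value and an Event struct that aren't part of the original schema:

```go
type Event struct {
	ID            string
	Payload       []byte
	DestinationID string
}

// claimBatch atomically claims up to 10 queued events for this worker.
// Other workers skip the locked rows, so batches never overlap.
func (w *Worker) claimBatch(ctx context.Context) ([]Event, error) {
	rows, err := w.db.QueryContext(ctx, `
		UPDATE events SET status = 'processing'
		WHERE id IN (
			SELECT id FROM events
			WHERE status = 'queued' AND next_attempt_at <= NOW()
			ORDER BY next_attempt_at ASC
			LIMIT 10
			FOR UPDATE SKIP LOCKED
		)
		RETURNING id, payload, destination_id`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var batch []Event
	for rows.Next() {
		var e Event
		if err := rows.Scan(&e.ID, &e.Payload, &e.DestinationID); err != nil {
			return nil, err
		}
		batch = append(batch, e)
	}
	return batch, rows.Err()
}
```

Rows stuck in 'processing' after a worker crash can be swept back to 'queued' by a periodic job that looks for stale claims.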
For burst handling specifically, consider a dynamic worker pool that scales the number of workers based on queue depth:
| Queue depth | Worker count |
|---|---|
| 0–100 events | 2 workers |
| 100–1,000 events | 5 workers |
| 1,000–10,000 events | 20 workers |
| > 10,000 events | 50 workers (max) |
The max cap is important. Scaling workers indefinitely to drain a burst queue will hammer downstream services with more traffic than they can handle — which converts your burst problem into a downstream outage.
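The scaling policy itself can be a simple lookup from queue depth to a target worker count. Here's a sketch that mirrors the table above; the supervisor that actually resizes the pool is assumed to live elsewhere, and the thresholds are starting points, not rules:

```go
// targetWorkers maps current queue depth to a desired worker count,
// with a hard cap so a deep backlog can't become unbounded downstream traffic.
func targetWorkers(queueDepth int) int {
	switch {
	case queueDepth > 10_000:
		return 50 // hard cap
	case queueDepth > 1_000:
		return 20
	case queueDepth > 100:
		return 5
	default:
		return 2
	}
}
```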
## Per-Destination Rate Limiting
Not all destinations are equal. During a burst, you may be delivering to 50 different customer endpoints. Some are robust, some are flimsy. Hammering all of them at maximum throughput will cause failures in the flimsy ones, which triggers retries, which makes the backlog worse.
Implement per-destination rate limiting with a token bucket:
```go
// Uses golang.org/x/time/rate for the token buckets.
type DestinationLimiter struct {
	mu      sync.Mutex
	buckets map[string]*rate.Limiter
	rps     float64 // requests per second per destination
}

func (l *DestinationLimiter) Allow(destinationID string) bool {
	l.mu.Lock()
	limiter, ok := l.buckets[destinationID]
	if !ok {
		// Lazily create a limiter per destination: rps sustained, 2×rps burst
		limiter = rate.NewLimiter(rate.Limit(l.rps), int(l.rps*2))
		l.buckets[destinationID] = limiter
	}
	l.mu.Unlock()
	return limiter.Allow()
}
```

A reasonable default is 10 requests/second per destination, with the ability to configure higher limits for destinations that have demonstrated capacity. When a destination starts returning 429 (Too Many Requests), respect the Retry-After header and back off that specific destination without pausing delivery to others.
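One way to wire in that behavior is to record a per-destination pause next to the token buckets. A sketch, assuming DestinationLimiter gains a pauseUntil map[string]time.Time (initialized in its constructor) that Allow also checks before consuming a token; the 30-second fallback is arbitrary:

```go
// HandleResponse backs off a single destination after a 429, honoring
// Retry-After when it's present as a delay in seconds.
func (l *DestinationLimiter) HandleResponse(destinationID string, resp *http.Response) {
	if resp.StatusCode != http.StatusTooManyRequests {
		return
	}
	delay := 30 * time.Second // fallback when Retry-After is missing or unparsable
	if ra := resp.Header.Get("Retry-After"); ra != "" {
		if secs, err := strconv.Atoi(ra); err == nil && secs > 0 {
			delay = time.Duration(secs) * time.Second
		}
	}
	l.mu.Lock()
	l.pauseUntil[destinationID] = time.Now().Add(delay) // assumed field; see note above
	l.mu.Unlock()
}
```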
## Handling Provider-Specific Retry Behavior
Different providers have different retry policies, and understanding them changes how you should handle failures:
| Provider | Retry window | Retry count | Retry on |
|---|---|---|---|
| Stripe | 72 hours | Up to ~87 attempts | 4xx (except 400), 5xx, timeout |
| GitHub | 3 days | Not published | Non-200 responses |
| Shopify | 48 hours | Up to 19 attempts | 4xx (except 410), 5xx |
| Twilio | 4 hours | Up to 3 attempts | 4xx (except 400/401), 5xx |
| SendGrid | 72 hours | Variable | 4xx, 5xx |
The critical insight here: if your endpoint returns 5xx during a burst, Stripe will retry for up to 72 hours. That's your burst becoming a multi-day tail. A 200 OK that you process asynchronously is always better than a 503 that triggers days of retries.
Return 200 OK the moment you've durably written the event. If your processing later fails, that's your internal retry problem — not the provider's.
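Internally, that means running your own retry schedule. A sketch of rescheduling a failed delivery with exponential backoff, assuming the events table also carries an attempts counter alongside the next_attempt_at column used in the queue query:

```go
// scheduleRetry pushes a failed event back into the queue with exponential
// backoff: 1s, 2s, 4s, ... capped at one hour between attempts.
func (w *Worker) scheduleRetry(ctx context.Context, eventID string, attempt int) error {
	backoff := time.Hour
	if attempt < 12 {
		backoff = time.Duration(1<<attempt) * time.Second
	}
	_, err := w.db.ExecContext(ctx, `
		UPDATE events
		SET status = 'queued',
		    attempts = attempts + 1,
		    next_attempt_at = NOW() + $1 * interval '1 second'
		WHERE id = $2`,
		backoff.Seconds(), eventID)
	return err
}
```

Events that exhaust a maximum attempt count should move to a dead-letter status instead of retrying forever.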
## Testing Burst Readiness Before It Matters
Run a load test against your ingest endpoint at 10x expected peak before every major traffic event:
```bash
# Generate a 60-second burst of webhook traffic over 500 concurrent connections
# post_stripe_event.lua sends a signed Stripe-format payload
wrk -t 16 -c 500 -d 60s \
  --timeout 5s \
  -s post_stripe_event.lua \
  https://staging.yoursaas.com/ingest/src_abc123

# Measure:
# - Requests/sec sustained
# - P99 latency (should be < 200ms)
# - Error rate (should be 0%)
# - Queue depth after burst ends
# - Time to drain queue back to 0
```

Track queue drain time specifically. If it takes 30 minutes to drain a 60-second burst, you have a worker capacity problem that will be visible to customers as delivery delays.
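Two numbers make that drain time easy to watch: current queue depth and the age of the oldest queued event. A sketch of the check, assuming the events table has a created_at timestamp column:

```go
// reportBacklog logs queue depth and how long the oldest queued event has
// been waiting. Run it on a ticker during and after the load test.
func reportBacklog(ctx context.Context, db *sql.DB) error {
	var depth int
	var oldestWaitSeconds float64
	err := db.QueryRowContext(ctx, `
		SELECT COUNT(*),
		       COALESCE(EXTRACT(EPOCH FROM NOW() - MIN(created_at)), 0)::float8
		FROM events
		WHERE status = 'queued'`).Scan(&depth, &oldestWaitSeconds)
	if err != nil {
		return err
	}
	log.Printf("queue depth=%d, oldest queued event waiting %.0fs", depth, oldestWaitSeconds)
	return nil
}
```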
GetHook's delivery pipeline is designed around the accept-fast-process-slow pattern — ingest endpoints that return 200 OK in under 50ms and a worker pool that can be scaled independently based on queue depth. During burst conditions, the queue acts as a buffer so no events are dropped and delivery continues at a controlled rate.
## A Checklist for Burst Readiness
Before your next high-traffic event:
- Ingest endpoint does no processing — writes to queue and returns 200
- Ingest tier is horizontally scalable (stateless, behind a load balancer)
- Worker pool has a tested maximum concurrency cap
- Per-destination rate limiting is in place
- Queue depth alert is configured (fire at > 5,000 events)
- Load test run at 10x expected peak within the last 30 days
- Provider retry policies documented — know your recovery window
- Runbook exists for "queue not draining" scenario
The teams that handle Black Friday well aren't the ones with the most capacity — they're the ones who decoupled their ingest from their processing and never let the two bottlenecks interfere with each other.
If you want webhook infrastructure that handles bursts without custom operations work, start with GetHook →