Black Friday starts at midnight. Your e-commerce platform goes from 10 orders per minute to 2,000. Stripe begins firing payment_intent.succeeded, charge.captured, and order.updated events at a rate your webhook endpoint has never seen. By 12:03 AM, your ingest service is throwing 503s and Stripe has started marking your endpoint as unreliable.
This is a preventable failure. The pattern isn't specific to Black Friday — any bursty third-party provider (Twilio for SMS delivery receipts, GitHub for CI builds, Shopify for order events during a flash sale) can spike your inbound webhook volume by 100x in seconds. The difference between teams that handle it gracefully and teams that don't comes down to a few architectural decisions made well before the spike arrives.
## Why Webhooks Are Harder to Burst-Handle Than API Calls
With an outbound API call, you control the request rate. With inbound webhooks, the provider controls it. Most providers have their own internal queue and will fire events as fast as they can — which may be much faster than your endpoint can process them.
The compounding problem: most webhook providers retry on failure. If your endpoint responds with 500 or 503, the provider queues the event for retry. This means a traffic spike that overwhelms your endpoint doesn't just cause immediate failures — it creates a delayed second wave of retries that arrives after the spike, just as you're recovering.
The failure modes stack up:
| Problem | Immediate effect | Delayed effect |
|---|---|---|
| Endpoint returns 503 | Events queued for retry | Retry wave 5–30 min later |
| Processing too slow | Queue depth grows | Events expire before delivery |
| Database write bottleneck | Ingest latency spikes | Provider marks endpoint unhealthy |
| Worker can't keep up | Delivery backlog grows | Customer-visible delays |
Solving any one of these in isolation isn't enough. You need to decouple each layer.
## The Core Principle: Accept Fast, Process Slow
The most important architectural decision is separating ingest from processing. Your webhook endpoint should do exactly two things:
- Validate the incoming request (signature, payload size)
- Persist the raw event to a durable store
Nothing else. No database lookups, no business logic, no calling downstream services. The goal is to return a 200 OK in under 50ms for every request, regardless of load.
```go
func (h *IngestHandler) Handle(w http.ResponseWriter, r *http.Request) {
	// 1. Validate signature — fast, CPU-only operation
	body, err := io.ReadAll(io.LimitReader(r.Body, maxPayloadBytes))
	if err != nil || !h.verifySignature(r, body) {
		http.Error(w, "invalid signature", http.StatusUnauthorized)
		return
	}

	// 2. Persist raw event — the only I/O operation
	sourceID := r.PathValue("source_id") // source identifier from the route, e.g. /ingest/{source_id}
	eventID, err := h.store.Enqueue(r.Context(), body, sourceID)
	if err != nil {
		http.Error(w, "storage error", http.StatusInternalServerError)
		return
	}

	// 3. Return 200 immediately — processing happens asynchronously
	w.WriteHeader(http.StatusOK)
	_ = json.NewEncoder(w).Encode(map[string]string{"id": eventID})
}
```

Processing — decoding, routing, calling downstream services, updating application state — happens in a separate worker process that pulls from the queue at a controlled rate. The ingest endpoint and the processing workers are independently scalable.
## Sizing Your Ingest Tier for Burst Traffic
The ingest tier needs to be sized for peak concurrency, not average throughput. The key question is: how many concurrent HTTP requests can your ingest handler sustain while keeping P99 latency under 200ms?
For a Postgres-backed queue (which GetHook uses), the bottleneck is typically write throughput to the events table. Benchmark this before you need it:
```bash
# Benchmark ingest write throughput using wrk
wrk -t 8 -c 200 -d 30s \
  -s post_event.lua \
  https://your-ingest-host/ingest/src_abc123
```

A well-tuned single Postgres instance can sustain 5,000–10,000 INSERT operations per second for simple event rows. That's enough for most burst scenarios. If you need more, consider:
- Connection pooling via PgBouncer — reduces per-connection overhead significantly under concurrent load
- Bulk inserts — batch multiple events in a single `INSERT ... VALUES (...)` statement when processing from a buffer (see the sketch after this list)
- Partitioned tables — partition the events table by date so writes land on the current partition, reducing index contention
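If you go the bulk-insert route, the buffer flush is straightforward. Here's a minimal sketch using only `database/sql`, assuming the events table has source_id, payload, and status columns (matching the other examples in this post) and that raw events are buffered in memory between flushes:

```go
// A minimal sketch of the bulk-insert option. Column names and the rawEvent
// type are assumptions, not the production schema.
type rawEvent struct {
	SourceID string
	Payload  []byte
}

func flushBuffer(ctx context.Context, db *sql.DB, buffered []rawEvent) error {
	if len(buffered) == 0 {
		return nil
	}
	// Build one multi-row INSERT: ($1, $2, 'queued'), ($3, $4, 'queued'), ...
	values := make([]string, 0, len(buffered))
	args := make([]any, 0, len(buffered)*2)
	for i, ev := range buffered {
		values = append(values, fmt.Sprintf("($%d, $%d, 'queued')", i*2+1, i*2+2))
		args = append(args, ev.SourceID, ev.Payload)
	}
	query := "INSERT INTO events (source_id, payload, status) VALUES " +
		strings.Join(values, ", ")
	_, err := db.ExecContext(ctx, query, args...)
	return err
}
```

A common pattern is to flush on whichever comes first, a size threshold or a short timer, so a quiet stretch doesn't leave events sitting in the buffer.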
### Horizontal ingest scaling
Because the ingest endpoint is stateless (it just writes to Postgres), you can run multiple instances behind a load balancer and scale horizontally. The only shared state is the database.
```
Provider → Load Balancer → [ingest-1, ingest-2, ingest-3, ...] → Postgres
```

Add instances until your write throughput ceiling is the bottleneck. At that point, move to sharded writes or a message queue in front of Postgres.
## Controlling Processing Rate With a Worker Pool
The delivery worker is where back-pressure matters. You don't want to process events as fast as possible — you want to process them at a rate that your downstream services can absorb.
A Postgres job queue with FOR UPDATE SKIP LOCKED gives you natural concurrency control: the number of concurrent delivery workers determines your processing rate.
```sql
-- Workers compete for the next batch of events
SELECT id, payload, destination_id
FROM events
WHERE status = 'queued'
  AND next_attempt_at <= NOW()
ORDER BY next_attempt_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED;
```

With 5 workers each polling for 10 events, you're processing up to 50 events per poll cycle. Increase worker count to scale up, decrease to throttle.
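A common variant of this query wraps the SELECT in an `UPDATE ... RETURNING` so a worker atomically marks its batch as in-flight before it starts delivering. A minimal sketch in Go, assuming a 'processing' status value and an Event struct that aren't part of the original schema:

```go
type Event struct {
	ID            string
	Payload       []byte
	DestinationID string
}

// claimBatch atomically claims up to 10 queued events for this worker.
// Other workers skip the locked rows, so batches never overlap.
func (w *Worker) claimBatch(ctx context.Context) ([]Event, error) {
	rows, err := w.db.QueryContext(ctx, `
		UPDATE events SET status = 'processing'
		WHERE id IN (
			SELECT id FROM events
			WHERE status = 'queued' AND next_attempt_at <= NOW()
			ORDER BY next_attempt_at ASC
			LIMIT 10
			FOR UPDATE SKIP LOCKED
		)
		RETURNING id, payload, destination_id`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var batch []Event
	for rows.Next() {
		var e Event
		if err := rows.Scan(&e.ID, &e.Payload, &e.DestinationID); err != nil {
			return nil, err
		}
		batch = append(batch, e)
	}
	return batch, rows.Err()
}
```

Rows stuck in 'processing' after a worker crash can be swept back to 'queued' by a periodic job that looks for stale claims.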
For burst handling specifically, consider a dynamic worker pool that scales the number of workers based on queue depth:
| Queue depth | Worker count |
|---|---|
| 0–100 events | 2 workers |
| 100–1,000 events | 5 workers |
| 1,000–10,000 events | 20 workers |
| > 10,000 events | 50 workers (max) |
The max cap is important. Scaling workers indefinitely to drain a burst queue will hammer downstream services with more traffic than they can handle — which converts your burst problem into a downstream outage.
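The scaling policy itself can be a simple lookup from queue depth to a target worker count. Here's a sketch that mirrors the table above; the supervisor that actually resizes the pool is assumed to live elsewhere, and the thresholds are starting points, not rules:

```go
// targetWorkers maps current queue depth to a desired worker count,
// with a hard cap so a deep backlog can't become unbounded downstream traffic.
func targetWorkers(queueDepth int) int {
	switch {
	case queueDepth > 10_000:
		return 50 // hard cap
	case queueDepth > 1_000:
		return 20
	case queueDepth > 100:
		return 5
	default:
		return 2
	}
}
```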
## Per-Destination Rate Limiting
Not all destinations are equal. During a burst, you may be delivering to 50 different customer endpoints. Some are robust, some are flimsy. Hammering all of them at maximum throughput will cause failures in the flimsy ones, which triggers retries, which makes the backlog worse.
Implement per-destination rate limiting with a token bucket:
```go
// Uses golang.org/x/time/rate for the token buckets.
type DestinationLimiter struct {
	mu      sync.Mutex
	buckets map[string]*rate.Limiter
	rps     float64 // requests per second per destination
}

func (l *DestinationLimiter) Allow(destinationID string) bool {
	l.mu.Lock()
	limiter, ok := l.buckets[destinationID]
	if !ok {
		// Lazily create a limiter per destination: rps sustained, 2×rps burst
		limiter = rate.NewLimiter(rate.Limit(l.rps), int(l.rps*2))
		l.buckets[destinationID] = limiter
	}
	l.mu.Unlock()
	return limiter.Allow()
}
```

A reasonable default is 10 requests/second per destination, with the ability to configure higher limits for destinations that have demonstrated capacity. When a destination starts returning 429 (Too Many Requests), respect the Retry-After header and back off that specific destination without pausing delivery to others.
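One way to wire in that behavior is to record a per-destination pause next to the token buckets. A sketch, assuming DestinationLimiter gains a pauseUntil map[string]time.Time (initialized in its constructor) that Allow also checks before consuming a token; the 30-second fallback is arbitrary:

```go
// HandleResponse backs off a single destination after a 429, honoring
// Retry-After when it's present as a delay in seconds.
func (l *DestinationLimiter) HandleResponse(destinationID string, resp *http.Response) {
	if resp.StatusCode != http.StatusTooManyRequests {
		return
	}
	delay := 30 * time.Second // fallback when Retry-After is missing or unparsable
	if ra := resp.Header.Get("Retry-After"); ra != "" {
		if secs, err := strconv.Atoi(ra); err == nil && secs > 0 {
			delay = time.Duration(secs) * time.Second
		}
	}
	l.mu.Lock()
	l.pauseUntil[destinationID] = time.Now().Add(delay) // assumed field; see note above
	l.mu.Unlock()
}
```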
## Handling Provider-Specific Retry Behavior
Different providers have different retry policies, and understanding them changes how you should handle failures:
| Provider | Retry window | Retry count | Retry on |
|---|---|---|---|
| Stripe | 72 hours | Up to ~87 attempts | 4xx (except 400), 5xx, timeout |
| GitHub | 3 days | Not published | Non-200 responses |
| Shopify | 48 hours | Up to 19 attempts | 4xx (except 410), 5xx |
| Twilio | 4 hours | Up to 3 attempts | 4xx (except 400/401), 5xx |
| SendGrid | 72 hours | Variable | 4xx, 5xx |
The critical insight here: if your endpoint returns 5xx during a burst, Stripe will retry for up to 72 hours. That's your burst becoming a multi-day tail. A 200 OK that you process asynchronously is always better than a 503 that triggers days of retries.
Return 200 OK the moment you've durably written the event. If your processing later fails, that's your internal retry problem — not the provider's.
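Internally, that means running your own retry schedule. A sketch of rescheduling a failed delivery with exponential backoff, assuming the events table also carries an attempts counter alongside the next_attempt_at column used in the queue query:

```go
// scheduleRetry pushes a failed event back into the queue with exponential
// backoff: 1s, 2s, 4s, ... capped at one hour between attempts.
func (w *Worker) scheduleRetry(ctx context.Context, eventID string, attempt int) error {
	backoff := time.Hour
	if attempt < 12 {
		backoff = time.Duration(1<<attempt) * time.Second
	}
	_, err := w.db.ExecContext(ctx, `
		UPDATE events
		SET status = 'queued',
		    attempts = attempts + 1,
		    next_attempt_at = NOW() + $1 * interval '1 second'
		WHERE id = $2`,
		backoff.Seconds(), eventID)
	return err
}
```

Events that exhaust a maximum attempt count should move to a dead-letter status instead of retrying forever.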
## Testing Burst Readiness Before It Matters
Run a load test against your ingest endpoint at 10x expected peak before every major traffic event:
```bash
# Generate a 60-second burst of webhook traffic over 500 concurrent connections
# post_stripe_event.lua sends a signed Stripe-format payload
wrk -t 16 -c 500 -d 60s \
  --timeout 5s \
  -s post_stripe_event.lua \
  https://staging.yoursaas.com/ingest/src_abc123

# Measure:
# - Requests/sec sustained
# - P99 latency (should be < 200ms)
# - Error rate (should be 0%)
# - Queue depth after burst ends
# - Time to drain queue back to 0
```

Track queue drain time specifically. If it takes 30 minutes to drain a 60-second burst, you have a worker capacity problem that will be visible to customers as delivery delays.
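Two numbers make that drain time easy to watch: current queue depth and the age of the oldest queued event. A sketch of the check, assuming the events table has a created_at timestamp column:

```go
// reportBacklog logs queue depth and how long the oldest queued event has
// been waiting. Run it on a ticker during and after the load test.
func reportBacklog(ctx context.Context, db *sql.DB) error {
	var depth int
	var oldestWaitSeconds float64
	err := db.QueryRowContext(ctx, `
		SELECT COUNT(*),
		       COALESCE(EXTRACT(EPOCH FROM NOW() - MIN(created_at)), 0)::float8
		FROM events
		WHERE status = 'queued'`).Scan(&depth, &oldestWaitSeconds)
	if err != nil {
		return err
	}
	log.Printf("queue depth=%d, oldest queued event waiting %.0fs", depth, oldestWaitSeconds)
	return nil
}
```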
GetHook's delivery pipeline is designed around the accept-fast-process-slow pattern — ingest endpoints that return 200 OK in under 50ms and a worker pool that can be scaled independently based on queue depth. During burst conditions, the queue acts as a buffer so no events are dropped and delivery continues at a controlled rate.
## A Checklist for Burst Readiness
Before your next high-traffic event:
- Ingest endpoint does no processing — writes to queue and returns 200
- Ingest tier is horizontally scalable (stateless, behind a load balancer)
- Worker pool has a tested maximum concurrency cap
- Per-destination rate limiting is in place
- Queue depth alert is configured (fire at > 5,000 events)
- Load test run at 10x expected peak within the last 30 days
- Provider retry policies documented — know your recovery window
- Runbook exists for "queue not draining" scenario
The teams that handle Black Friday well aren't the ones with the most capacity — they're the ones who decoupled their ingest from their processing and never let the two bottlenecks interfere with each other.
If you want webhook infrastructure that handles bursts without custom operations work, start with GetHook →