The first time most teams learn their webhook pipeline has a throughput ceiling is when a provider sends a burst during a sale, a migration, or a major incident — and events start backing up. By then, you're already in triage mode.
Load testing a webhook pipeline is not the same as load testing a REST API. The ingest layer is only one piece. The queue depth, worker concurrency, per-destination retry isolation, and database I/O under sustained load all behave differently than they do at normal throughput. This post walks through how to test each layer systematically, what metrics to watch, and how to interpret the results before they become incident postmortems.
## What You're Actually Testing
A webhook pipeline has at least three distinct layers, each with its own failure modes:
| Layer | What can fail | Leading indicator |
|---|---|---|
| Ingest | Request queue saturation, body parsing overhead, signature verification CPU | p99 response time rising, 503s |
| Job queue | Table bloat, lock contention, worker starvation | queued event count growing, next_attempt_at lag |
| Delivery worker | Per-destination concurrency limits, network timeouts, connection pool exhaustion | Attempt rate falling below enqueue rate |
A load test that only hammers the ingest endpoint and declares success because it returned 200s misses the queue and worker layers entirely. The ingest endpoint can look healthy right up until the queue is 200,000 events deep and workers are 4 hours behind.
Test all three layers.
## Setting Up a Realistic Load Profile
Before you generate load, build a realistic event profile. Most webhook pipelines have non-uniform traffic: a baseline of steady events, occasional spikes from external providers, and periodic replay bursts from customers who missed events.
A useful model for test scenarios:
```text
Scenario A: Sustained baseline
  Rate: 200 events/second for 15 minutes
  Goal: Verify steady-state delivery lag stays under 5 seconds

Scenario B: Provider burst
  Rate: ramp from 200 to 2000 events/second over 30 seconds,
        sustain for 5 minutes, ramp back down
  Goal: Queue drains within 10 minutes of burst ending

Scenario C: Replay flood
  Rate: 5000 events enqueued immediately (batch insert)
  Goal: Delivery workers process all events within 30 minutes;
        no events drop to dead-letter
```

These three scenarios test different parts of your pipeline. Scenario A tests steady-state behavior. Scenario B tests your queue's ability to absorb and drain a spike. Scenario C tests bulk-replay throughput without assuming a steady ingest stream.
## Generating Load Against the Ingest Layer
k6 is the right tool for this. It's written in Go, handles high concurrency cleanly, and lets you model ramp profiles in code.
```javascript
// k6 load test for webhook ingest
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    sustained_baseline: {
      executor: 'constant-arrival-rate',
      rate: 200,
      timeUnit: '1s',
      duration: '15m',
      preAllocatedVUs: 50,
      maxVUs: 200,
    },
  },
  thresholds: {
    http_req_duration: ['p(99)<200'], // 99th percentile under 200ms
    http_req_failed: ['rate<0.001'],  // error rate under 0.1%
  },
};

const INGEST_URL = __ENV.INGEST_URL || 'http://localhost:8080/ingest/src_test_token';

export default function () {
  const payload = JSON.stringify({
    id: `evt_${Math.random().toString(36).slice(2)}`,
    type: 'order.created',
    created: Math.floor(Date.now() / 1000),
    data: { order_id: 12345, amount: 9900, currency: 'usd' },
  });
  const res = http.post(INGEST_URL, payload, {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, {
    'status is 202': (r) => r.status === 202,
  });
}
```

Key detail: use the constant-arrival-rate executor, not constant-vus. With constant-vus, if your endpoint slows down, VUs back off and effective RPS drops — you stop testing your target rate right when things get interesting. constant-arrival-rate maintains the request rate regardless of latency, which is what a real provider sending webhooks does.
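Scenario B needs a ramp rather than a flat rate. Here's a sketch using k6's ramping-arrival-rate executor, meant to swap into the scenarios block above; the stage targets mirror the burst profile from earlier, and the VU pool sizes are rough guesses you should adjust to your observed latency:

```javascript
// Scenario B sketch: ramp from 200 to 2000 events/s over 30s,
// hold the burst for 5 minutes, then ramp back down.
export const options = {
  scenarios: {
    provider_burst: {
      executor: 'ramping-arrival-rate',
      startRate: 200,
      timeUnit: '1s',
      preAllocatedVUs: 200, // illustrative; size to your latency
      maxVUs: 2000,
      stages: [
        { target: 2000, duration: '30s' }, // ramp up
        { target: 2000, duration: '5m' },  // sustain the burst
        { target: 200, duration: '30s' },  // ramp back down
      ],
    },
  },
};
```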
## Monitoring the Queue During Load
While k6 generates ingest traffic, you need a parallel view of queue depth. If you're using a Postgres-backed job queue (like GetHook does), this is a query you should have running in a separate terminal or piped into your observability stack throughout the test:
```sql
-- Poll this every 5 seconds during load test
SELECT
    status,
    COUNT(*) AS count,
    MIN(created_at) AS oldest_event,
    EXTRACT(EPOCH FROM (now() - MIN(created_at))) AS oldest_lag_seconds
FROM events
WHERE created_at > now() - INTERVAL '1 hour'
GROUP BY status
ORDER BY status;
```

What you want to see: queued count grows during the burst, then drains. What you don't want to see: queued count grows monotonically without draining, or retry_scheduled count climbing while delivered count stalls.
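If you're watching from a psql session rather than a dashboard, the \watch metacommand re-runs a query on an interval, so you don't need an external loop. A compact version of the snapshot above, as a sketch:

```sql
-- psql only: \watch replaces the trailing semicolon and
-- re-executes the query every 5 seconds
SELECT status, COUNT(*) AS count
FROM events
WHERE created_at > now() - INTERVAL '1 hour'
GROUP BY status \watch 5
```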
Add a second query to track worker throughput:
```sql
-- Delivery attempt rate over rolling 1-minute windows
SELECT
    date_trunc('minute', created_at) AS window,
    COUNT(*) AS attempts,
    COUNT(*) FILTER (WHERE outcome = 'success') AS successes,
    COUNT(*) FILTER (WHERE outcome != 'success') AS failures
FROM delivery_attempts
WHERE created_at > now() - INTERVAL '30 minutes'
GROUP BY 1
ORDER BY 1;
```

If attempt rate (row count per minute) falls significantly below ingest rate, workers are the bottleneck.
## Identifying the Real Bottleneck
Most pipelines hit one of three bottlenecks first, and the symptoms are distinct:
**CPU-bound ingest:** p99 latency climbs, CPU on ingest nodes approaches 100%, but queue depth stays modest. Fix: scale ingest horizontally, or optimize signature verification (cache secrets, avoid double-parsing the body).
**Queue lock contention:** Worker throughput is lower than expected, and pg_stat_activity shows queries waiting on row locks. This happens when your poll query doesn't use FOR UPDATE SKIP LOCKED, or when the job table lacks the right indexes. Fix: add the SKIP LOCKED clause and an index on (status, next_attempt_at), as sketched below.
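For reference, a minimal claim-query sketch using SKIP LOCKED, reusing the events table and status values from the monitoring queries above (the batch size and the 'delivering' status are illustrative):

```sql
-- Claim a batch of due events without blocking other workers.
-- Rows another worker has already locked are skipped, not waited on.
UPDATE events
SET status = 'delivering'
WHERE id IN (
    SELECT id
    FROM events
    WHERE status = 'queued'
      AND next_attempt_at <= now()
    ORDER BY next_attempt_at
    LIMIT 100
    FOR UPDATE SKIP LOCKED
)
RETURNING id;

-- Supporting index so the inner scan stays cheap under load
CREATE INDEX IF NOT EXISTS idx_events_status_next_attempt
    ON events (status, next_attempt_at);
```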
**Worker connection pool exhaustion:** Workers are spawning goroutines but delivery latency is high. Check pg_stat_activity — if you see many connections in idle in transaction state, your connection pool is undersized for your worker concurrency. Fix: tune max_open_conns and max_idle_conns on the pool, or reduce worker concurrency to match available connections.
A quick diagnostic query for connection usage during load:
```sql
SELECT
    state,
    COUNT(*) AS connections,
    MAX(EXTRACT(EPOCH FROM (now() - state_change))) AS longest_seconds
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state
ORDER BY connections DESC;
```

More than ~20% of your max_connections sitting in idle in transaction during load is a signal to tune.
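If your workers are in Go, the knobs the fix above refers to live on database/sql (pgx pools have equivalents). A sketch of the relationship to tune, with illustrative numbers rather than recommendations:

```go
// Pool sizing sketch: keep open connections in line with worker
// concurrency so goroutines don't queue on the pool.
package worker

import (
	"database/sql"
	"time"

	_ "github.com/lib/pq" // Postgres driver; pgx's stdlib driver also works
)

func openPool(dsn string, workerConcurrency int) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	// One connection per in-flight delivery, plus headroom for the poller.
	db.SetMaxOpenConns(workerConcurrency + 5)
	db.SetMaxIdleConns(workerConcurrency / 2)
	// Recycle connections so a stuck one can't linger forever.
	db.SetConnMaxLifetime(5 * time.Minute)
	return db, nil
}
```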
## Testing the Delivery Layer Directly
The ingest load test exercises the ingest path and queue. To test delivery throughput in isolation — without waiting for ingest to populate the queue — batch-insert synthetic events directly into your events table:
```sql
-- Insert 10,000 synthetic queued events pointing to a test destination
INSERT INTO events (
    id, account_id, source_id, direction, status,
    event_type, payload, next_attempt_at, created_at
)
SELECT
    gen_random_uuid(),
    'your-account-id',
    'your-source-id',
    'inbound',
    'queued',
    'order.created',
    '{"id": "test", "type": "order.created"}'::jsonb,
    now(),
    now() - (random() * INTERVAL '10 minutes')
FROM generate_series(1, 10000);
```

This lets you measure delivery worker throughput in events/second without having to run the ingest layer at scale. Point the destination to a simple HTTP sink that returns 200 immediately (an http.HandleFunc that reads the body and responds in ~1ms), and measure how long the worker takes to drain the queue.
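A sink like that is a few lines of Go. A sketch (the port and path are arbitrary) that drains the body and answers 200 so the destination is never the bottleneck:

```go
// Minimal HTTP sink for delivery-throughput tests.
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/sink", func(w http.ResponseWriter, r *http.Request) {
		// Drain the body so keep-alive connections can be reused.
		io.Copy(io.Discard, r.Body)
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```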
A good baseline target for a single worker node: drain 10,000 events in under 10 minutes (~17 events/second). If you're below that, you're either hitting destination I/O limits or worker concurrency is too conservative.
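To put a number on the drain, you can derive measured throughput from the attempt log. A sketch, assuming the synthetic batch was the only traffic in the window:

```sql
-- Measured drain throughput for the test window, in events/second
SELECT
    COUNT(*) AS delivered,
    COUNT(*) / GREATEST(EXTRACT(EPOCH FROM (MAX(created_at) - MIN(created_at))), 1)
        AS events_per_second
FROM delivery_attempts
WHERE outcome = 'success'
  AND created_at > now() - INTERVAL '1 hour';
```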
## Setting Pass/Fail Thresholds
Load tests without defined pass/fail criteria are expensive noise. Define thresholds before you run:
| Metric | Acceptable | Degraded | Failing |
|---|---|---|---|
| Ingest p99 latency | < 150 ms | 150–500 ms | > 500 ms |
| Ingest error rate | < 0.1% | 0.1–1% | > 1% |
| Queue drain time after burst | < 10 min | 10–30 min | > 30 min |
| Dead-letter rate | 0% | < 0.1% | > 0.1% |
| Worker throughput (events/sec) | > 20 | 10–20 | < 10 |
These thresholds should match your SLA commitments. If you've told customers "events deliver within 60 seconds under normal conditions," then a 30-minute drain time after a burst violates that SLA even if nothing technically errored.
## Running Load Tests in CI
One-off load tests are useful. Automated load tests in CI catch regressions before they reach production. The practical approach: run a lightweight version (30 seconds at 200 RPS, not 15 minutes) as a step in your staging deployment pipeline.
```yaml
# .github/workflows/load-test.yml (excerpt)
- name: Run ingest load test
  env:
    INGEST_URL: ${{ secrets.STAGING_INGEST_URL }}
  run: |
    k6 run \
      --env INGEST_URL=$INGEST_URL \
      --out json=load-results.json \
      scripts/k6/ingest-baseline.js

- name: Assert thresholds
  run: |
    # k6 exits non-zero if thresholds are breached;
    # CI fails the step automatically
    echo "Load test passed"
```

A 30-second baseline at 200 RPS won't catch every bottleneck, but it will catch a slow database migration that added unindexed columns to the job queue, a code change that moved to per-request secret fetching, or a deploy that accidentally dropped the connection pool size. These are the common regressions — catching them in staging is far cheaper than in production.
## What GetHook Exposes for Observability
If you're using GetHook as your delivery layer, the delivery attempt log and event status timeline give you the same visibility during load tests that you'd otherwise build yourself. Watching the events table drain via the dashboard during a burst test is a fast sanity check — if the queued count stops decreasing, something is wrong in the worker layer before you've even written a query.
The /healthz and /readyz endpoints are also worth polling from your load testing script. /readyz returning non-200 during a burst is an early signal of database connection exhaustion, not just a signal that the health endpoint itself is slow.
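Polling doesn't need a separate tool: a small k6 script can hit the endpoint on an interval while the burst runs. A sketch, assuming the endpoint lives on the same host as ingest (READYZ_URL is yours to point at your deployment):

```javascript
// Readiness probe during load: one VU, one request every ~5 seconds.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    readiness_probe: {
      executor: 'constant-vus',
      vus: 1,
      duration: '15m', // match the length of your load scenario
    },
  },
};

const READYZ_URL = __ENV.READYZ_URL || 'http://localhost:8080/readyz';

export default function () {
  const res = http.get(READYZ_URL);
  check(res, { 'readyz is 200': (r) => r.status === 200 });
  sleep(5);
}
```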
Load testing a webhook pipeline requires thinking end-to-end: ingest throughput, queue drain rate, and worker delivery speed are three separate numbers, and your pipeline is only as strong as the slowest one. Build the test suite once, define your thresholds, and run it on every major deploy. The burst that finds your bottleneck first should be yours.
If you'd rather start with a delivery layer that's already been stress-tested and instrumented, set up GetHook in under 10 minutes →