The first time most teams learn their webhook pipeline has a throughput ceiling is when a provider sends a burst during a sale, a migration, or a major incident — and events start backing up. By then, you're already in triage mode.
Load testing a webhook pipeline is not the same as load testing a REST API. The ingest layer is only one piece. The queue depth, worker concurrency, per-destination retry isolation, and database I/O under sustained load all behave differently than they do at normal throughput. This post walks through how to test each layer systematically, what metrics to watch, and how to interpret the results before they become incident postmortems.
## What You're Actually Testing
A webhook pipeline has at least three distinct layers, each with its own failure modes:
| Layer | What can fail | Leading indicator |
|---|---|---|
| Ingest | Request queue saturation, body parsing overhead, signature verification CPU | p99 response time rising, 503s |
| Job queue | Table bloat, lock contention, worker starvation | queued event count growing, next_attempt_at lag |
| Delivery worker | Per-destination concurrency limits, network timeouts, connection pool exhaustion | Attempt rate falling below enqueue rate |
A load test that only hammers the ingest endpoint and declares success because it returned 200s misses the queue and worker layers entirely. The ingest endpoint can look healthy right up until the queue is 200,000 events deep and workers are 4 hours behind.
Test all three layers.
## Setting Up a Realistic Load Profile
Before you generate load, build a realistic event profile. Most webhook pipelines have non-uniform traffic: a baseline of steady events, occasional spikes from external providers, and periodic replay bursts from customers who missed events.
A useful model for test scenarios:
```text
Scenario A: Sustained baseline
  Rate: 200 events/second for 15 minutes
  Goal: Verify steady-state delivery lag stays under 5 seconds

Scenario B: Provider burst
  Rate: ramp from 200 to 2000 events/second over 30 seconds,
        sustain for 5 minutes, ramp back down
  Goal: Queue drains within 10 minutes of burst ending

Scenario C: Replay flood
  Rate: 5000 events enqueued immediately (batch insert)
  Goal: Delivery workers process all events within 30 minutes;
        no events drop to dead-letter
```

These three scenarios test different parts of your pipeline. Scenario A tests steady-state behavior. Scenario B tests your queue's ability to absorb and drain a spike. Scenario C tests bulk-replay throughput without assuming a steady ingest stream.
## Generating Load Against the Ingest Layer
k6 is the right tool for this. It's written in Go, handles high concurrency cleanly, and lets you model ramp profiles in code.
```javascript
// k6 load test for webhook ingest
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    sustained_baseline: {
      executor: 'constant-arrival-rate',
      rate: 200,
      timeUnit: '1s',
      duration: '15m',
      preAllocatedVUs: 50,
      maxVUs: 200,
    },
  },
  thresholds: {
    http_req_duration: ['p(99)<200'], // 99th percentile under 200ms
    http_req_failed: ['rate<0.001'],  // error rate under 0.1%
  },
};

const INGEST_URL = __ENV.INGEST_URL || 'http://localhost:8080/ingest/src_test_token';

export default function () {
  const payload = JSON.stringify({
    id: `evt_${Math.random().toString(36).slice(2)}`,
    type: 'order.created',
    created: Math.floor(Date.now() / 1000),
    data: { order_id: 12345, amount: 9900, currency: 'usd' },
  });
  const res = http.post(INGEST_URL, payload, {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, {
    'status is 202': (r) => r.status === 202,
  });
}
```

Key detail: use the constant-arrival-rate executor, not constant-vus. With constant-vus, if your endpoint slows down, VUs back off and effective RPS drops — you stop testing your target rate right when things get interesting. constant-arrival-rate maintains the request rate regardless of latency, which is what a real provider sending webhooks does.
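Scenario B needs a ramp rather than a flat rate. Here's a sketch using k6's ramping-arrival-rate executor, meant to swap into the scenarios block above; the stage targets mirror the burst profile from earlier, and the VU pool sizes are rough guesses you should adjust to your observed latency:

```javascript
// Scenario B sketch: ramp from 200 to 2000 events/s over 30s,
// hold the burst for 5 minutes, then ramp back down.
export const options = {
  scenarios: {
    provider_burst: {
      executor: 'ramping-arrival-rate',
      startRate: 200,
      timeUnit: '1s',
      preAllocatedVUs: 200, // illustrative; size to your latency
      maxVUs: 2000,
      stages: [
        { target: 2000, duration: '30s' }, // ramp up
        { target: 2000, duration: '5m' },  // sustain the burst
        { target: 200, duration: '30s' },  // ramp back down
      ],
    },
  },
};
```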
## Monitoring the Queue During Load
While k6 generates ingest traffic, you need a parallel view of queue depth. If you're using a Postgres-backed job queue (like GetHook does), this is a query you should have running in a separate terminal or piped into your observability stack throughout the test:
```sql
-- Poll this every 5 seconds during load test
SELECT
    status,
    COUNT(*) AS count,
    MIN(created_at) AS oldest_event,
    EXTRACT(EPOCH FROM (now() - MIN(created_at))) AS oldest_lag_seconds
FROM events
WHERE created_at > now() - INTERVAL '1 hour'
GROUP BY status
ORDER BY status;
```

What you want to see: queued count grows during the burst, then drains. What you don't want to see: queued count grows monotonically without draining, or retry_scheduled count climbing while delivered count stalls.
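If you're watching from a psql session rather than a dashboard, the \watch metacommand re-runs a query on an interval, so you don't need an external loop. A compact version of the snapshot above, as a sketch:

```sql
-- psql only: \watch replaces the trailing semicolon and
-- re-executes the query every 5 seconds
SELECT status, COUNT(*) AS count
FROM events
WHERE created_at > now() - INTERVAL '1 hour'
GROUP BY status \watch 5
```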
Add a second query to track worker throughput:
```sql
-- Delivery attempt rate over rolling 1-minute windows
SELECT
    date_trunc('minute', created_at) AS window,
    COUNT(*) AS attempts,
    COUNT(*) FILTER (WHERE outcome = 'success') AS successes,
    COUNT(*) FILTER (WHERE outcome != 'success') AS failures
FROM delivery_attempts
WHERE created_at > now() - INTERVAL '30 minutes'
GROUP BY 1
ORDER BY 1;
```

If attempt rate (row count per minute) falls significantly below ingest rate, workers are the bottleneck.
## Identifying the Real Bottleneck
Most pipelines hit one of three bottlenecks first, and the symptoms are distinct:
**CPU-bound ingest:** p99 latency climbs, CPU on ingest nodes approaches 100%, but queue depth stays modest. Fix: scale ingest horizontally, or optimize signature verification (cache secrets, avoid double-parsing the body).
**Queue lock contention:** Worker throughput is lower than expected, and pg_stat_activity shows queries waiting on row locks. This happens when your poll query doesn't use FOR UPDATE SKIP LOCKED, or when the job table lacks the right indexes. Fix: add the SKIP LOCKED clause and an index on (status, next_attempt_at), as sketched below.
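For reference, a minimal claim-query sketch using SKIP LOCKED, reusing the events table and status values from the monitoring queries above (the batch size and the 'delivering' status are illustrative):

```sql
-- Claim a batch of due events without blocking other workers.
-- Rows another worker has already locked are skipped, not waited on.
UPDATE events
SET status = 'delivering'
WHERE id IN (
    SELECT id
    FROM events
    WHERE status = 'queued'
      AND next_attempt_at <= now()
    ORDER BY next_attempt_at
    LIMIT 100
    FOR UPDATE SKIP LOCKED
)
RETURNING id;

-- Supporting index so the inner scan stays cheap under load
CREATE INDEX IF NOT EXISTS idx_events_status_next_attempt
    ON events (status, next_attempt_at);
```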
**Worker connection pool exhaustion:** Workers are spawning goroutines but delivery latency is high. Check pg_stat_activity — if you see many connections in idle in transaction state, your connection pool is undersized for your worker concurrency. Fix: tune max_open_conns and max_idle_conns on the pool, or reduce worker concurrency to match available connections.
A quick diagnostic query for connection usage during load:
```sql
SELECT
    state,
    COUNT(*) AS connections,
    MAX(EXTRACT(EPOCH FROM (now() - state_change))) AS longest_seconds
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state
ORDER BY connections DESC;
```

More than ~20% of your max_connections sitting in idle in transaction during load is a signal to tune.
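If your workers are in Go, the knobs the fix above refers to live on database/sql (pgx pools have equivalents). A sketch of the relationship to tune, with illustrative numbers rather than recommendations:

```go
// Pool sizing sketch: keep open connections in line with worker
// concurrency so goroutines don't queue on the pool.
package worker

import (
	"database/sql"
	"time"

	_ "github.com/lib/pq" // Postgres driver; pgx's stdlib driver also works
)

func openPool(dsn string, workerConcurrency int) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	// One connection per in-flight delivery, plus headroom for the poller.
	db.SetMaxOpenConns(workerConcurrency + 5)
	db.SetMaxIdleConns(workerConcurrency / 2)
	// Recycle connections so a stuck one can't linger forever.
	db.SetConnMaxLifetime(5 * time.Minute)
	return db, nil
}
```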
## Testing the Delivery Layer Directly
The ingest load test exercises the ingest path and queue. To test delivery throughput in isolation — without waiting for ingest to populate the queue — batch-insert synthetic events directly into your events table:
```sql
-- Insert 10,000 synthetic queued events pointing to a test destination
INSERT INTO events (
    id, account_id, source_id, direction, status,
    event_type, payload, next_attempt_at, created_at
)
SELECT
    gen_random_uuid(),
    'your-account-id',
    'your-source-id',
    'inbound',
    'queued',
    'order.created',
    '{"id": "test", "type": "order.created"}'::jsonb,
    now(),
    now() - (random() * INTERVAL '10 minutes')
FROM generate_series(1, 10000);
```

This lets you measure delivery worker throughput in events/second without having to run the ingest layer at scale. Point the destination to a simple HTTP sink that returns 200 immediately (an http.HandleFunc that reads the body and responds in ~1ms), and measure how long the worker takes to drain the queue.
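A sink like that is a few lines of Go. A sketch (the port and path are arbitrary) that drains the body and answers 200 so the destination is never the bottleneck:

```go
// Minimal HTTP sink for delivery-throughput tests.
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/sink", func(w http.ResponseWriter, r *http.Request) {
		// Drain the body so keep-alive connections can be reused.
		io.Copy(io.Discard, r.Body)
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```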
A good baseline target for a single worker node: drain 10,000 events in under 10 minutes (~17 events/second). If you're below that, you're either hitting destination I/O limits or worker concurrency is too conservative.
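To put a number on the drain, you can derive measured throughput from the attempt log. A sketch, assuming the synthetic batch was the only traffic in the window:

```sql
-- Measured drain throughput for the test window, in events/second
SELECT
    COUNT(*) AS delivered,
    COUNT(*) / GREATEST(EXTRACT(EPOCH FROM (MAX(created_at) - MIN(created_at))), 1)
        AS events_per_second
FROM delivery_attempts
WHERE outcome = 'success'
  AND created_at > now() - INTERVAL '1 hour';
```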
## Setting Pass/Fail Thresholds
Load tests without defined pass/fail criteria are expensive noise. Define thresholds before you run:
| Metric | Acceptable | Degraded | Failing |
|---|---|---|---|
| Ingest p99 latency | < 150 ms | 150–500 ms | > 500 ms |
| Ingest error rate | < 0.1% | 0.1–1% | > 1% |
| Queue drain time after burst | < 10 min | 10–30 min | > 30 min |
| Dead-letter rate | 0% | < 0.1% | > 0.1% |
| Worker throughput (events/sec) | > 20 | 10–20 | < 10 |
These thresholds should match your SLA commitments. If you've told customers "events deliver within 60 seconds under normal conditions," then a 30-minute drain time after a burst violates that SLA even if nothing technically errored.
## Running Load Tests in CI
One-off load tests are useful. Automated load tests in CI catch regressions before they reach production. The practical approach: run a lightweight version (30 seconds at 200 RPS, not 15 minutes) as a step in your staging deployment pipeline.
```yaml
# .github/workflows/load-test.yml (excerpt)
- name: Run ingest load test
  env:
    INGEST_URL: ${{ secrets.STAGING_INGEST_URL }}
  run: |
    k6 run \
      --env INGEST_URL=$INGEST_URL \
      --out json=load-results.json \
      scripts/k6/ingest-baseline.js

- name: Assert thresholds
  run: |
    # k6 exits non-zero if thresholds are breached;
    # CI fails the step automatically
    echo "Load test passed"
```

A 30-second baseline at 200 RPS won't catch every bottleneck, but it will catch a slow database migration that added unindexed columns to the job queue, a code change that moved to per-request secret fetching, or a deploy that accidentally dropped the connection pool size. These are the common regressions — catching them in staging is far cheaper than in production.
## What GetHook Exposes for Observability
If you're using GetHook as your delivery layer, the delivery attempt log and event status timeline give you the same visibility during load tests that you'd otherwise build yourself. Watching the events table drain via the dashboard during a burst test is a fast sanity check — if the queued count stops decreasing, something is wrong in the worker layer before you've even written a query.
The /healthz and /readyz endpoints are also worth polling from your load testing script. /readyz returning non-200 during a burst is an early signal of database connection exhaustion, not just a signal that the health endpoint itself is slow.
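Polling doesn't need a separate tool: a small k6 script can hit the endpoint on an interval while the burst runs. A sketch, assuming the endpoint lives on the same host as ingest (READYZ_URL is yours to point at your deployment):

```javascript
// Readiness probe during load: one VU, one request every ~5 seconds.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    readiness_probe: {
      executor: 'constant-vus',
      vus: 1,
      duration: '15m', // match the length of your load scenario
    },
  },
};

const READYZ_URL = __ENV.READYZ_URL || 'http://localhost:8080/readyz';

export default function () {
  const res = http.get(READYZ_URL);
  check(res, { 'readyz is 200': (r) => r.status === 200 });
  sleep(5);
}
```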
Load testing a webhook pipeline requires thinking end-to-end: ingest throughput, queue drain rate, and worker delivery speed are three separate numbers, and your pipeline is only as strong as the slowest one. Build the test suite once, define your thresholds, and run it on every major deploy. The burst that finds your bottleneck first should be yours.
If you'd rather start with a delivery layer that's already been stress-tested and instrumented, set up GetHook in under 10 minutes →