reliability · testing · observability · webhooks · infrastructure

Synthetic End-to-End Testing for Webhook Delivery Pipelines

Unit tests and health checks don't tell you whether an event ingested right now will reach its destination in the next 30 seconds. Synthetic canary events do — and they catch the failures your other monitors miss.

Aleksa Vukovic
Developer Relations
April 25, 2026
10 min read

Your webhook delivery pipeline is a chain of at least four moving parts: the ingest endpoint, the persistence layer, the delivery queue, and the worker process that makes the outbound HTTP call. Each link can fail independently. A deployment can break the worker without touching the ingest endpoint. A database migration can silently stall the queue without returning errors. The ingest endpoint can accept events and return 200 OK while the downstream worker is wedged.

Standard monitoring — uptime checks on your API, process health on your workers, database connection pool metrics — tells you whether each component is alive. It does not tell you whether an event traveling through all of them right now will actually arrive at its destination. That gap is where synthetic end-to-end testing lives.

Synthetic testing means deliberately sending known test events through your real production pipeline on a schedule and asserting that they arrive at a controlled destination within an expected time window. It catches delivery regressions the moment they happen rather than the moment a customer files a ticket.


What Synthetic Testing Catches That Health Checks Miss

Before writing any code, it's worth being specific about the failure modes that synthetic testing surfaces and health checks do not.

| Failure | Standard Health Check | Synthetic E2E Test |
| --- | --- | --- |
| Worker process is running but not polling the queue | Not caught — process is "up" | Caught — event never delivered |
| Queue backlog spike — events enqueued but not dispatched | Not caught unless queue depth alerting is configured | Caught — delivery latency exceeds threshold |
| Route misconfiguration — event type not matching any destination | Not caught | Caught — canary event goes undelivered |
| HMAC signing failure — worker signs with stale key | Not caught — delivery returns 2xx if destination ignores signatures | Caught if canary destination verifies the signature |
| Network connectivity between worker and destination broken | Not caught by ingest or queue health | Caught — delivery fails with network error |
| Database index bloat slowing queue polling to a crawl | Not caught at P50 — only visible at P99 | Caught — latency threshold exceeded even at modest load |

The worker-not-polling failure is particularly insidious. A worker process that crashed and restarted with a config error might be running but consuming no events. Your process health check sees a running process. Your API health check sees a healthy ingest endpoint. Nothing alerts. Your customers start noticing that their webhooks stopped arriving.


The Architecture of a Canary Pipeline

You need three components to run synthetic e2e tests:

  1. A canary sender — a scheduled job that injects a known test event into your pipeline
  2. A canary receiver — an HTTP endpoint you control that accepts and records delivery
  3. An assertion service — something that checks whether the sent event was received within the latency SLA

The canary sender and assertion service can be the same job, with a delay between send and check. The receiver is a separate endpoint — it can be as simple as a serverless function that writes to a table.

Canary Sender ──► POST /ingest/{token} ──► Queue ──► Worker ──► Canary Receiver
                                                                      │
                                                              records received_at
                                                                      │
Assertion Job ◄─────────────────── checks: (sent_at + SLA) > received_at? ──┘

Step 1: Build the Canary Receiver

The receiver is a purpose-built HTTP endpoint that accepts webhook deliveries and records them. Keep it simple — it should do almost nothing that could fail independently.

go
package main

import (
    "database/sql"
    "encoding/json"
    "log"
    "net/http"
    "os"
    "time"

    _ "github.com/lib/pq"
)

type CanaryRecord struct {
    EventID     string    `json:"id"`
    ReceivedAt  time.Time `json:"received_at"`
    Source      string    `json:"source"`
}

func canaryHandler(db *sql.DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        var payload struct {
            ID   string `json:"id"`
            Type string `json:"type"`
        }

        if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
            http.Error(w, "bad request", http.StatusBadRequest)
            return
        }

        _, err := db.ExecContext(r.Context(), `
            INSERT INTO canary_receipts (event_id, received_at, source)
            VALUES ($1, NOW(), $2)
            ON CONFLICT (event_id) DO NOTHING
        `, payload.ID, r.Header.Get("X-Canary-Source"))

        if err != nil {
            log.Printf("canary insert failed: %v", err)
            // Still return 200 — don't cause the delivery worker to retry
        }

        w.WriteHeader(http.StatusOK)
    }
}

func main() {
    // Minimal wiring so the snippet runs standalone. The route and port are
    // placeholders; adapt them to your deployment.
    db, err := sql.Open("postgres", os.Getenv("CANARY_DB_URL"))
    if err != nil {
        log.Fatalf("db open failed: %v", err)
    }
    http.HandleFunc("/canary", canaryHandler(db))
    log.Fatal(http.ListenAndServe(":8080", nil))
}

The ON CONFLICT DO NOTHING on event_id handles the case where your delivery layer retries a canary event — you only want to record the first receipt. The handler always returns 200 OK even on DB failure to avoid triggering delivery retries that would pollute your latency measurements.


Step 2: The Canary Schema

sql
CREATE TABLE canary_events (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id     TEXT NOT NULL UNIQUE,   -- the ID you inject into the pipeline
    pipeline     TEXT NOT NULL,          -- "primary", "high-priority", etc.
    sent_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    sla_seconds  INT NOT NULL DEFAULT 30,
    received_at  TIMESTAMPTZ,            -- filled in by the assertion job
    latency_ms   INT GENERATED ALWAYS AS (
        (EXTRACT(EPOCH FROM (received_at - sent_at)) * 1000)::INT
    ) STORED
);

CREATE TABLE canary_receipts (
    event_id    TEXT PRIMARY KEY,
    received_at TIMESTAMPTZ NOT NULL,
    source      TEXT
);

CREATE INDEX canary_events_unresolved ON canary_events (sent_at)
    WHERE received_at IS NULL;

The latency_ms generated column gives you delivery latency as a first-class metric without a JOIN — useful for dashboards and alerting queries.
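
For example, a percentile rollup over recent canaries can drive a latency panel directly. A sketch against the schema above:

sql
-- Delivery latency percentiles per pipeline over the last hour
SELECT
    pipeline,
    PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY latency_ms) AS p50_ms,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99_ms
FROM canary_events
WHERE sent_at > NOW() - INTERVAL '1 hour'
  AND received_at IS NOT NULL
GROUP BY pipeline;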


Step 3: The Sender Job

The sender fires on a cron schedule — every 60 seconds is a reasonable cadence for a production pipeline. It creates a uniquely identifiable event and ingests it through the real ingest endpoint.

bash
#!/usr/bin/env bash
# canary-send.sh — run every 60 seconds

set -euo pipefail

PIPELINE="${PIPELINE:-primary}"
SLA_SECONDS="${SLA_SECONDS:-30}"
INGEST_URL="${INGEST_URL}"      # e.g. https://ingest.gethook.to/ingest/src_abc123
EVENT_ID="canary_$(date +%s)_$(openssl rand -hex 4)"

# Record the send in the canary table
psql "$CANARY_DB_URL" -c "
  INSERT INTO canary_events (event_id, pipeline, sla_seconds)
  VALUES ('$EVENT_ID', '$PIPELINE', $SLA_SECONDS)
"

# Fire the event into the real ingest pipeline
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
  -X POST "$INGEST_URL" \
  -H "Content-Type: application/json" \
  -H "X-Canary-Source: $PIPELINE" \
  -d "{
    \"id\": \"$EVENT_ID\",
    \"type\": \"canary.ping\",
    \"sent_at\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\",
    \"pipeline\": \"$PIPELINE\"
  }")

if [ "$HTTP_STATUS" != "200" ]; then
  echo "ALERT: canary ingest returned $HTTP_STATUS" >&2
  exit 1
fi

echo "canary event $EVENT_ID sent, HTTP $HTTP_STATUS"

The X-Canary-Source header identifies this as a canary delivery in the receiver logs, so you can distinguish canary traffic from real customer events in your observability stack.
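
That header also makes canary volume easy to audit on the receiving side. A sketch against the canary_receipts table:

sql
-- Canary receipts per source over the last hour, a quick check that
-- each pipeline's canaries are actually arriving
SELECT source, COUNT(*) AS receipts
FROM canary_receipts
WHERE received_at > NOW() - INTERVAL '1 hour'
GROUP BY source
ORDER BY source;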


Step 4: The Assertion Job

The assertion job runs slightly after the SLA window closes. For a 30-second SLA, run it every 60 seconds and only evaluate events sent at least 45 seconds ago (the SLA plus a 15-second buffer, so events still legitimately in flight are never flagged). It finds canary events past that threshold that have not been received.

sql
-- Canary events past their SLA with no receipt
SELECT
    ce.event_id,
    ce.pipeline,
    ce.sent_at,
    ce.sla_seconds,
    EXTRACT(EPOCH FROM (NOW() - ce.sent_at))::INT AS age_seconds,
    cr.received_at
FROM canary_events ce
LEFT JOIN canary_receipts cr ON cr.event_id = ce.event_id
WHERE
    ce.sent_at > NOW() - INTERVAL '10 minutes'   -- don't look too far back
    AND ce.sent_at < NOW() - ((ce.sla_seconds + 15) || ' seconds')::INTERVAL  -- SLA plus a 15-second buffer
    AND cr.received_at IS NULL;

If this query returns any rows, you have a delivery regression. Fire an alert: PagerDuty, Slack, or wherever your on-call team watches.

Also update resolved canary records with receipt data for latency tracking:

sql
UPDATE canary_events ce
SET received_at = cr.received_at
FROM canary_receipts cr
WHERE cr.event_id = ce.event_id
  AND ce.received_at IS NULL;

What to Measure

Once canary events are flowing, you have a continuous latency signal from ingest to delivery. Track:

| Metric | Alert Threshold |
| --- | --- |
| canary_sla_miss_rate (missed / sent per 5 min) | > 0% over 3 consecutive windows |
| canary_latency_p50 | > 2× baseline |
| canary_latency_p99 | > SLA threshold |
| canary_ingest_failure_rate | > 0 (ingest returned non-200) |
| canary_age_max (oldest unresolved event) | > 2× SLA |

The canary_age_max metric is particularly useful for catching slow degradations — a worker that's polling at half speed shows up as gradual latency creep before it becomes a full SLA miss.
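
The partial index from Step 2 keeps this cheap to compute on a tight loop. A sketch:

sql
-- Age of the oldest unresolved canary event, per pipeline
SELECT
    pipeline,
    MAX(EXTRACT(EPOCH FROM (NOW() - sent_at)))::INT AS canary_age_max_seconds
FROM canary_events
WHERE received_at IS NULL
  AND sent_at > NOW() - INTERVAL '10 minutes'
GROUP BY pipeline;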


Isolating Canary Traffic in Production

Canary events should not pollute your customers' event feeds or dashboards. A few approaches:

Use a dedicated canary source. Create a source specifically for canary testing with a route to your canary receiver. Canary events never touch customer destinations. This is the cleanest isolation.

Filter in your event listing queries. Add a type != 'canary.ping' filter to any customer-facing event list. Canary events are still stored and visible in your internal ops tooling, but hidden from customer dashboards.
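
As a sketch, assuming a hypothetical events table with account_id and created_at columns:

sql
-- Customer-facing event list: hide synthetic canary traffic
SELECT id, type, created_at
FROM events
WHERE account_id = $1
  AND type != 'canary.ping'
ORDER BY created_at DESC
LIMIT 50;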

Tag canary events explicitly. Use a metadata field or custom header to mark events as synthetic. Some teams add a "canary": true field to the payload. This lets you filter in logs, metrics, and traces without needing a dedicated source.

GetHook's event filtering lets you route by event_type_pattern, which means a pattern like canary.* can send all canary events exclusively to a dedicated destination without any overlap with customer routes.


Multi-Region and Multi-Pipeline Canaries

If you run delivery workers across multiple regions or priority queues (e.g., a high-priority queue for paid accounts, a standard queue for free accounts), run a separate canary per path.

| Canary | Pipeline | SLA |
| --- | --- | --- |
| canary-primary-us | Standard queue, US worker | 60s |
| canary-primary-eu | Standard queue, EU worker | 60s |
| canary-priority-us | High-priority queue, US worker | 15s |

Each canary has its own source, its own route to a regional receiver, and its own SLA threshold. A regression in the EU pipeline shows up immediately without obscuring the US pipeline's health signal — and vice versa.
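
Because every canary row records its pipeline, one query rolls up health across all paths. A sketch against the schema above:

sql
-- Per-pipeline delivery health over the last 15 minutes
SELECT
    pipeline,
    COUNT(*) AS sent,
    COUNT(*) FILTER (
        WHERE received_at IS NULL
          AND sent_at < NOW() - ((sla_seconds + 15) || ' seconds')::INTERVAL
    ) AS missed
FROM canary_events
WHERE sent_at > NOW() - INTERVAL '15 minutes'
GROUP BY pipeline;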


Avoiding Alert Fatigue

Synthetic testing produces false positives when your test infrastructure itself fails. The most common culprits:

  • The canary sender script fails because psql isn't available in the container
  • The canary receiver is deployed to a non-production environment and the URL isn't updated
  • The assertion job runs before the SLA window has fully elapsed for the first batch of events after a restart

Guard against these:

  • Track sender failures as a separate metric (not as a delivery SLA miss)
  • Run the assertion job with a generous buffer after the SLA (sla_seconds + 15)
  • Alert on "canary sender hasn't fired in 3 minutes" as a separate signal from "canary event missed SLA" (a sketch of this check follows below)
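
The sender-freshness check is a single query against the canary table. A sketch:

sql
-- Sender health: alert when this returns true, or NULL (meaning no
-- canary rows exist at all)
SELECT MAX(sent_at) < NOW() - INTERVAL '3 minutes' AS sender_stalled
FROM canary_events;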

The goal is a signal with high specificity: when the alert fires, you have high confidence the delivery pipeline is broken, not just that the monitoring script is misconfigured.


Synthetic end-to-end testing is the difference between finding out your delivery pipeline is broken from a customer and finding out from your own alerting — before the customer notices. The implementation is straightforward; the operational discipline of keeping the canary tests healthy and the SLA thresholds calibrated is what makes it stick.

If you're building on GetHook and want to add canary delivery to your monitoring setup, start here to configure a dedicated source and route for your test events.
