Your webhook delivery pipeline is a chain of at least four moving parts: the ingest endpoint, the persistence layer, the delivery queue, and the worker process that makes the outbound HTTP call. Each link can fail independently. A deployment can break the worker without touching the ingest endpoint. A database migration can silently stall the queue without returning errors. The ingest endpoint can accept events and return 200 OK while the downstream worker is wedged.
Standard monitoring — uptime checks on your API, process health on your workers, database connection pool metrics — tells you whether each component is alive. It does not tell you whether an event traveling through all of them right now will actually arrive at its destination. That gap is where synthetic end-to-end testing lives.
Synthetic testing means deliberately sending known test events through your real production pipeline on a schedule and asserting that they arrive at a controlled destination within an expected time window. It catches delivery regressions the moment they happen rather than the moment a customer files a ticket.
## What Synthetic Testing Catches That Health Checks Miss
Before writing any code, it's worth being specific about the failure modes that synthetic testing surfaces and health checks do not.
| Failure | Standard Health Check | Synthetic E2E Test |
|---|---|---|
| Worker process is running but not polling the queue | Not caught — process is "up" | Caught — event never delivered |
| Queue backlog spike — events enqueued but not dispatched | Not caught unless queue depth alerting is configured | Caught — delivery latency exceeds threshold |
| Route misconfiguration — event type not matching any destination | Not caught | Caught — canary event goes undelivered |
| HMAC signing failure — worker signs with stale key | Not caught — delivery returns 2xx if destination ignores signatures | Caught if canary destination verifies the signature |
| Network connectivity between worker and destination broken | Not caught by ingest or queue health | Caught — delivery fails with network error |
| Database index bloat slowing queue polling to a crawl | Not caught at P50 — only visible at P99 | Caught — latency threshold exceeded even at modest load |
The worker-not-polling failure is particularly insidious. A worker process that crashed and restarted with a config error might be running but consuming no events. Your process health check sees a running process. Your API health check sees a healthy ingest endpoint. Nothing alerts. Your customers start noticing that their webhooks stopped arriving.
## The Architecture of a Canary Pipeline
You need three components to run synthetic e2e tests:
- A canary sender — a scheduled job that injects a known test event into your pipeline
- A canary receiver — an HTTP endpoint you control that accepts and records delivery
- An assertion service — something that checks whether the sent event was received within the latency SLA
The canary sender and assertion service can be the same job, with a delay between send and check. The receiver is a separate endpoint — it can be as simple as a serverless function that writes to a table.
```
Canary Sender ──► POST /ingest/{token} ──► Queue ──► Worker ──► Canary Receiver
                                                                      │
                                                             records received_at
                                                                      │
Assertion Job ◄── checks: received_at ≤ sent_at + SLA? ◄──────────────┘
```

## Step 1: Build the Canary Receiver
The receiver is a purpose-built HTTP endpoint that accepts webhook deliveries and records them. Keep it simple — it should do almost nothing that could fail independently.
```go
package main

import (
	"database/sql"
	"encoding/json"
	"log"
	"net/http"
	"time"

	_ "github.com/lib/pq" // Postgres driver, registered for database/sql
)

// CanaryRecord is the shape stored in canary_receipts.
type CanaryRecord struct {
	EventID    string    `json:"id"`
	ReceivedAt time.Time `json:"received_at"`
	Source     string    `json:"source"`
}

func canaryHandler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var payload struct {
			ID   string `json:"id"`
			Type string `json:"type"`
		}
		if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		_, err := db.ExecContext(r.Context(), `
			INSERT INTO canary_receipts (event_id, received_at, source)
			VALUES ($1, NOW(), $2)
			ON CONFLICT (event_id) DO NOTHING
		`, payload.ID, r.Header.Get("X-Canary-Source"))
		if err != nil {
			log.Printf("canary insert failed: %v", err)
			// Still return 200 — don't cause the delivery worker to retry.
		}
		w.WriteHeader(http.StatusOK)
	}
}
```

The `ON CONFLICT (event_id) DO NOTHING` handles the case where your delivery layer retries a canary event — you only want to record the first receipt. The handler always returns 200 OK even on a database failure, so a receiver-side hiccup doesn't trigger delivery retries that would pollute your latency measurements.
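That idempotency contract is worth pinning down in a test. The sketch below swaps the Postgres table for an in-memory map (a stand-in, not the real storage) and simulates the delivery worker retrying the same canary event:

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"sync"
)

// memoryStore stands in for the canary_receipts table so the handler's
// idempotency can be exercised without Postgres.
type memoryStore struct {
	mu       sync.Mutex
	receipts map[string]string // event_id -> source
}

// recordFirst keeps only the first receipt per event_id, mirroring
// ON CONFLICT (event_id) DO NOTHING.
func (s *memoryStore) recordFirst(eventID, source string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, seen := s.receipts[eventID]; !seen {
		s.receipts[eventID] = source
	}
}

func canaryHandlerMem(store *memoryStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var payload struct {
			ID string `json:"id"`
		}
		if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		store.recordFirst(payload.ID, r.Header.Get("X-Canary-Source"))
		w.WriteHeader(http.StatusOK) // always 200: never trigger worker retries
	}
}

// deliverTwice simulates the delivery worker retrying the same canary
// event; it returns both status codes plus the final receipt count.
func deliverTwice() (first, second, receipts int) {
	store := &memoryStore{receipts: map[string]string{}}
	h := canaryHandlerMem(store)
	codes := make([]int, 0, 2)
	for i := 0; i < 2; i++ {
		req := httptest.NewRequest(http.MethodPost, "/canary",
			bytes.NewReader([]byte(`{"id":"canary_123"}`)))
		req.Header.Set("X-Canary-Source", "primary")
		rec := httptest.NewRecorder()
		h(rec, req)
		codes = append(codes, rec.Code)
	}
	return codes[0], codes[1], len(store.receipts)
}
```

Both deliveries get a 200, and only one receipt is recorded — exactly the behavior the latency measurements depend on.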
## Step 2: The Canary Schema
```sql
CREATE TABLE canary_events (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id    TEXT NOT NULL UNIQUE,   -- the ID you inject into the pipeline
    pipeline    TEXT NOT NULL,          -- "primary", "high-priority", etc.
    sent_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    sla_seconds INT NOT NULL DEFAULT 30,
    received_at TIMESTAMPTZ,            -- filled by receiver
    -- EXTRACT(MILLISECONDS ...) returns only the seconds field of the
    -- interval; EPOCH * 1000 gives the total latency in milliseconds.
    latency_ms  INT GENERATED ALWAYS AS (
        (EXTRACT(EPOCH FROM (received_at - sent_at)) * 1000)::INT
    ) STORED
);

CREATE TABLE canary_receipts (
    event_id    TEXT PRIMARY KEY,
    received_at TIMESTAMPTZ NOT NULL,
    source      TEXT
);

CREATE INDEX canary_events_unresolved ON canary_events (sent_at)
    WHERE received_at IS NULL;
```

The `latency_ms` generated column gives you delivery latency as a first-class metric without a JOIN — useful for dashboards and alerting queries.
## Step 3: The Sender Job
The sender fires on a cron schedule — every 60 seconds is a reasonable cadence for a production pipeline. It creates a uniquely identifiable event and ingests it through the real ingest endpoint.
```bash
#!/usr/bin/env bash
# canary-send.sh — run every 60 seconds
set -euo pipefail

PIPELINE="${PIPELINE:-primary}"
SLA_SECONDS="${SLA_SECONDS:-30}"
INGEST_URL="${INGEST_URL:?INGEST_URL must be set}" # e.g. https://ingest.gethook.to/ingest/src_abc123

EVENT_ID="canary_$(date +%s)_$(openssl rand -hex 4)"

# Record the send in the canary table
psql "$CANARY_DB_URL" -c "
  INSERT INTO canary_events (event_id, pipeline, sla_seconds)
  VALUES ('$EVENT_ID', '$PIPELINE', $SLA_SECONDS)
"

# Fire the event into the real ingest pipeline; treat a curl failure as
# status 000 so set -e doesn't exit before the alert below fires
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
  -X POST "$INGEST_URL" \
  -H "Content-Type: application/json" \
  -H "X-Canary-Source: $PIPELINE" \
  -d "{
    \"id\": \"$EVENT_ID\",
    \"type\": \"canary.ping\",
    \"sent_at\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\",
    \"pipeline\": \"$PIPELINE\"
  }") || HTTP_STATUS="000"

if [ "$HTTP_STATUS" != "200" ]; then
  echo "ALERT: canary ingest returned $HTTP_STATUS" >&2
  exit 1
fi
echo "canary event $EVENT_ID sent, HTTP $HTTP_STATUS"
```

The `X-Canary-Source` header identifies this as a canary delivery in the receiver logs, so you can distinguish canary traffic from real customer events in your observability stack.
## Step 4: The Assertion Job
The assertion job runs slightly after the SLA window. For a 30-second SLA, run it every 60 seconds with a 45-second lookback. It finds canary events that were sent more than `sla_seconds` ago but have not been received.
```sql
-- Canary events past their SLA with no receipt
SELECT
    ce.event_id,
    ce.pipeline,
    ce.sent_at,
    ce.sla_seconds,
    EXTRACT(EPOCH FROM (NOW() - ce.sent_at))::INT AS age_seconds,
    cr.received_at
FROM canary_events ce
LEFT JOIN canary_receipts cr ON cr.event_id = ce.event_id
WHERE
    ce.sent_at > NOW() - INTERVAL '10 minutes'  -- don't look too far back
    AND ce.sent_at < NOW() - (ce.sla_seconds || ' seconds')::INTERVAL
    AND cr.received_at IS NULL;
```

If this query returns any rows, you have a delivery regression. Fire an alert: PagerDuty, Slack, or wherever your on-call team watches.
Also update resolved canary records with receipt data for latency tracking:
```sql
UPDATE canary_events ce
SET received_at = cr.received_at
FROM canary_receipts cr
WHERE cr.event_id = ce.event_id
  AND ce.received_at IS NULL;
```

## What to Measure
Once canary events are flowing, you have a continuous latency signal from ingest to delivery. Track:
| Metric | Alert Threshold |
|---|---|
| `canary_sla_miss_rate` (missed / sent per 5 min) | > 0% over 3 consecutive windows |
| `canary_latency_p50` | > 2× baseline |
| `canary_latency_p99` | > SLA threshold |
| `canary_ingest_failure_rate` | > 0 (ingest returned non-200) |
| `canary_age_max` (oldest unresolved event) | > 2× SLA |
The `canary_age_max` metric is particularly useful for catching slow degradations — a worker that's polling at half speed shows up as gradual latency creep before it becomes a full SLA miss.
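A sketch of that creep detection, assuming canary latency samples are already in hand; nearest-rank percentiles are plenty for dashboard alerting, and the 2× factor matches the thresholds in the table above:

```go
package main

import "sort"

// percentileMs returns the pth percentile (0-100) of latency samples
// in milliseconds, using the nearest-rank method.
func percentileMs(samples []int, p float64) int {
	if len(samples) == 0 {
		return 0
	}
	s := append([]int(nil), samples...)
	sort.Ints(s)
	rank := int(float64(len(s))*p/100.0 + 0.5)
	if rank < 1 {
		rank = 1
	}
	if rank > len(s) {
		rank = len(s)
	}
	return s[rank-1]
}

// latencyCreep flags the "worker polling at half speed" signal:
// current p50 drifting past 2x the baseline p50, well before a full
// SLA miss shows up.
func latencyCreep(baseline, current []int) bool {
	return percentileMs(current, 50) > 2*percentileMs(baseline, 50)
}
```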
## Isolating Canary Traffic in Production
Canary events should not pollute your customers' event feeds or dashboards. A few approaches:
**Use a dedicated canary source.** Create a source specifically for canary testing, with a route to your canary receiver. Canary events never touch customer destinations. This is the cleanest isolation.

**Filter in your event listing queries.** Add a `type != 'canary.ping'` filter to any customer-facing event list. Canary events are still stored and visible in your internal ops tooling, but hidden from customer dashboards.

**Tag canary events explicitly.** Use a metadata field or custom header to mark events as synthetic. Some teams add a `"canary": true` field to the payload. This lets you filter in logs, metrics, and traces without needing a dedicated source.
GetHook's event filtering lets you route by `event_type_pattern`, which means a pattern like `canary.*` can send all canary events exclusively to a dedicated destination without any overlap with customer routes.
## Multi-Region and Multi-Pipeline Canaries
If you run delivery workers across multiple regions or priority queues (e.g., a high-priority queue for paid accounts, a standard queue for free accounts), run a separate canary per path.
| Canary | Pipeline | SLA |
|---|---|---|
| `canary-primary-us` | Standard queue, US worker | 60s |
| `canary-primary-eu` | Standard queue, EU worker | 60s |
| `canary-priority-us` | High-priority queue, US worker | 15s |
Each canary has its own source, its own route to a regional receiver, and its own SLA threshold. A regression in the EU pipeline shows up immediately without obscuring the US pipeline's health signal — and vice versa.
## Avoiding Alert Fatigue
Synthetic testing produces false positives when your test infrastructure itself fails. The most common culprits:
- The canary sender script fails because `psql` isn't available in the container
- The canary receiver is deployed to a non-production environment and the URL isn't updated
- The assertion job runs before the SLA window has fully elapsed for the first batch of events after a restart
Guard against these by: tracking sender failures as a separate metric (not as a delivery SLA miss), running the assertion job with a generous buffer after the SLA (`sla_seconds + 15`), and alerting on "canary sender hasn't fired in 3 minutes" as a separate signal from "canary event missed SLA."
The goal is a signal with high specificity: when the alert fires, you have high confidence the delivery pipeline is broken, not just that the monitoring script is misconfigured.
Synthetic end-to-end testing is the difference between finding out your delivery pipeline is broken from a customer and finding out from your own alerting — before the customer notices. The implementation is straightforward; the operational discipline of keeping the canary tests healthy and the SLA thresholds calibrated is what makes it stick.
If you're building on GetHook and want to add canary delivery to your monitoring setup, start here to configure a dedicated source and route for your test events.