webhooks · reliability · architecture · onboarding · replay

The Webhook Cold Start Problem: Safely Bootstrapping a New Consumer Endpoint

Subscribing a new endpoint to an existing webhook stream is harder than it looks. Here's how to catch up on missed events, avoid re-processing old ones, and get your new consumer to steady state without data loss or duplicate side effects.

Priya Nair
Developer Advocate
April 11, 2026
9 min read

When you register a brand-new webhook endpoint, you face a problem that rarely gets discussed: what happens to the events that fired before your endpoint existed? For many use cases, the answer is "nothing" — you only care about future events. But for use cases where events represent state transitions — order status changes, subscription lifecycle events, payment reconciliation — missing the historical window means starting with an incomplete picture.

This is the webhook cold start problem. It has four sub-problems that you need to solve in sequence:

  1. Backfill: How do you catch up on events that fired before your endpoint was registered?
  2. Ordering: During backfill, how do you handle events arriving out of order?
  3. Deduplication: During the cutover from backfill to live stream, how do you avoid processing the same event twice?
  4. Validation: How do you know when your endpoint has reached steady state?

This post works through each of these. The solutions apply whether you're building your own webhook platform or integrating with a third-party provider.


Why Cold Starts Are Easy to Get Wrong

The instinct when onboarding a new consumer is to start it, register the webhook, and trust that everything from that point forward will arrive. For stateless events — "a user clicked something" — this is fine. For stateful events — "an order moved to status X" — you now have a consumer that knows about state transitions from timestamp T onwards, but has no view of the state that existed before T.

The concrete failure mode: your new order management system registers a webhook for order.status_changed. It starts receiving events. Thirty minutes later, you discover that 12 orders were in a payment_failed state before your endpoint came online. Your system has no record of them. When the payment provider sends order.status_changed for one of those orders moving to payment_retry, your consumer has no context for the transition and either drops the event or creates corrupt state.

Backfilling is not optional for stateful consumers. It's part of the bootstrap.


Phase 1: Backfill via Event Replay

Most production webhook platforms expose an event history API — a paginated feed of past events that you can query. Before you start processing live events, you need to replay history up to the moment your subscription became active.

The pattern is a cursor-based replay loop:

go
func backfill(client *WebhookClient, subscriptionStart time.Time) error {
    cursor := ""
    for {
        resp, err := client.ListEvents(ListEventsParams{
            Before:  subscriptionStart,
            After:   subscriptionStart.Add(-72 * time.Hour), // 3-day window
            Cursor:  cursor,
            Limit:   100,
        })
        if err != nil {
            return fmt.Errorf("backfill fetch: %w", err)
        }

        for _, event := range resp.Events {
            if err := processEvent(event); err != nil {
                // Log but don't abort — partial backfill is better than none
                log.Printf("backfill event %s failed: %v", event.ID, err)
            }
        }

        if !resp.HasMore {
            break
        }
        cursor = resp.NextCursor
    }
    return nil
}

Key decisions in this loop:

How far back to go. This depends on your domain. If you're reconciling payment state, you might need 7 days of history. If you're syncing inventory counts, 24 hours is probably enough. Pick a window that covers your longest meaningful state transition lifecycle.

What to do with failures. A single event processing failure during backfill should not abort the entire backfill. Log the failure, continue, and handle it manually later. A failed backfill that stops at 40% completion is worse than a completed backfill with 3 skipped events.

Ordering during backfill. Most event history APIs return events newest-first. If your state transitions are order-sensitive, you need to reverse the page order — collect all pages into a buffer, then process from oldest to newest. Alternatively, use an API parameter like sort=asc if the provider supports it.


Phase 2: The Cutover Window

The hardest part of bootstrapping is the gap between "backfill started" and "live stream processing started." During this window, new events are arriving on the live subscription, but you're still processing backfill history. You need to buffer the live stream without processing it yet.

Here's the sequence:

T=0   Create subscription. Start buffering incoming events.
T=1   Begin backfill from (T=0 - window) to T=0.
T=N   Backfill completes.
T=N+1 Drain the buffer. Begin processing live stream.

The buffer is the key mechanism. Your webhook endpoint should accept incoming events during the backfill, write them to a holding table, and return 200 immediately — without processing them. Once backfill is complete, you drain the buffer in order.

sql
CREATE TABLE webhook_buffer (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id      TEXT NOT NULL UNIQUE,  -- provider's event ID for deduplication
    event_type    TEXT NOT NULL,
    payload       JSONB NOT NULL,
    received_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    processed_at  TIMESTAMPTZ
);

CREATE INDEX webhook_buffer_unprocessed ON webhook_buffer (received_at)
    WHERE processed_at IS NULL;

The event_id column with a UNIQUE constraint is your deduplication key. If a backfill event and a live buffered event have the same ID, the insert will fail — and you can safely ignore the duplicate.


Phase 3: Deduplication at the Seam

The overlap between the tail end of backfill and the start of the live buffer is where duplicates appear. An event that fired at T=0-2s might show up in both your backfill replay and your live subscription. You need to process it exactly once.

The cleanest approach is an idempotency table — a record of every event ID you have already processed:

sql
CREATE TABLE processed_events (
    event_id    TEXT PRIMARY KEY,
    processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Before processing any event — whether from backfill or live stream — check and record atomically:

go
func processOnce(db *sql.DB, eventID string, fn func() error) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Try to claim this event ID. ON CONFLICT DO NOTHING makes the insert
    // a no-op if another path (backfill, buffer drain, or a racing worker)
    // already claimed it.
    res, err := tx.Exec(
        `INSERT INTO processed_events (event_id) VALUES ($1) ON CONFLICT DO NOTHING`,
        eventID,
    )
    if err != nil {
        return fmt.Errorf("claim event: %w", err)
    }

    // Zero rows affected means the event was already processed — skip it.
    rows, err := res.RowsAffected()
    if err != nil {
        return fmt.Errorf("rows affected: %w", err)
    }
    if rows == 0 {
        return nil
    }

    if err := fn(); err != nil {
        return err // tx.Rollback() fires via defer, releasing the claim
    }

    return tx.Commit()
}

The rows-affected check is the crux: with INSERT ... ON CONFLICT DO NOTHING, zero rows inserted means the event ID already exists, so the event was already processed and fn never runs. The transaction boundary ensures that even if two workers race on the same event ID, only one commits the claim and performs the side effects — the loser either blocks until the winner commits and then sees zero rows, or rolls back cleanly.


Ordering Guarantees During Drain

When you drain the buffer, process events in the order you received them (received_at ASC), not in the order the provider created them. For most event types this is the same thing. For providers that deliver retries out of chronological order, the distinction matters.

Event source       | Recommended ordering key
Backfill replay    | event.created_at ASC (provider timestamp)
Live buffer drain  | webhook_buffer.received_at ASC
Steady-state live  | Process as received, rely on idempotency

If your domain requires strict causal ordering (event B must be processed after event A for the same entity), group events by entity ID during drain and process each entity's events sequentially. You can still process different entities concurrently.


Phase 4: Validating Steady State

How do you know the bootstrap is complete and your consumer is healthy? Define success criteria before you start, not after.

A useful checklist:

Check                                   | How to verify
Backfill completed without fatal errors | Log a completion record with error count
Buffer fully drained                    | SELECT COUNT(*) FROM webhook_buffer WHERE processed_at IS NULL returns 0
No processing gaps                      | Compare event count from provider API vs. your processed_events table
Live stream latency normal              | P99 time from event creation to processing < SLA threshold
No unusual error rate                   | Error rate on live events matches baseline from similar consumers

The event count comparison is worth elaborating on. Most providers expose an event count endpoint or let you count events from the history API. After draining your buffer, query both:

bash
# Events the provider says fired in your bootstrap window
PROVIDER_COUNT=$(curl -s "https://api.provider.com/events?after=T0&before=T1&count=true" \
  -H "Authorization: Bearer $API_KEY" | jq '.total_count')

# Events you recorded as processed in the same window
YOUR_COUNT=$(psql -tAc "
  SELECT COUNT(*) FROM processed_events
  WHERE processed_at BETWEEN '$T0' AND '$T1'
")

echo "Provider: $PROVIDER_COUNT | Ours: $YOUR_COUNT | Gap: $((PROVIDER_COUNT - YOUR_COUNT))"

A nonzero gap is not necessarily a problem — providers sometimes include internal or system events in their count that are not delivered to subscribers. But a gap larger than a few percent warrants investigation before you declare the consumer healthy.


When the Provider Doesn't Support Replay

Not every webhook provider has a replay API. Stripe does. GitHub does. Many SaaS platforms don't. If you're integrating with a provider that only delivers events going forward, your bootstrap strategy changes:

  1. Seed from the REST API. Before registering the webhook, call the provider's REST API to fetch current state for all relevant entities. Write that state to your database. This is your backfill substitute.
  2. Register the webhook subscription after the REST fetch completes. Any events fired during the REST fetch will appear in your live stream. Your idempotency layer handles events that duplicate state already fetched.
  3. Reconcile periodically. For the first 24–48 hours after bootstrap, run a scheduled job that calls the REST API and compares state against what your webhook stream has delivered. Differences are events you missed during the bootstrap window.

This is more work than replay-based bootstrap, but it's the practical answer for providers that don't expose event history.

GetHook's replay API lets you re-deliver any event by ID or replay all events for a source within a time window — which makes bootstrapping new destinations against existing sources straightforward without building your own replay infrastructure.


Checklist for a Safe Cold Start

Step | Action
1    | Create the subscription and start buffering. Record subscription_start timestamp.
2    | Backfill from provider event history up to subscription_start.
3    | Process backfill events oldest-first. Record each event ID in processed_events.
4    | Drain the live buffer in received_at ASC order. Skip duplicates via ON CONFLICT DO NOTHING.
5    | Switch to live stream processing.
6    | Validate: compare event counts, check buffer is empty, verify error rate.
7    | Tear down the buffer after a 24-hour monitoring window.

The cold start problem is one of those distributed systems details that seems minor until it causes an incident. Getting it right at the start saves you from a messy post-hoc reconciliation job — and gives you a reliable, auditable record of exactly which events your consumer has processed from day one.


Ready to build webhook consumers with replay, deduplication, and full event history built in? Start with GetHook — or explore the event replay docs to see how replay-based bootstrapping works against your existing sources.

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.