Tags: fanout, architecture, reliability, distributed systems

Webhook Fanout: Routing One Event to 50 Destinations Reliably

Fanout sounds simple — receive one event, deliver to many destinations. In practice it surfaces ordering problems, partial failure scenarios, and queue design challenges that aren't obvious until you're debugging at 2am.

Dmitri Volkov
Distributed Systems Engineer
March 25, 2026
9 min read

When your webhook infrastructure needs to route a single inbound event to multiple destinations, you enter fanout territory. The canonical use case: a payment provider sends payment.succeeded to your ingest endpoint, and your system must deliver it to your order service, your analytics pipeline, your fraud detection system, your accounting integration, and every customer who has subscribed to payment events through your platform.

Fanout at 2 destinations is trivial. At 50 it requires deliberate design. This post covers the failure modes, the architectural trade-offs, and the implementation patterns that keep fanout reliable under load.


Why Fanout Is Not Just a Loop

The naive implementation looks like this:

```go
destinations := getDestinationsForEvent(event)
for _, dest := range destinations {
    deliver(event, dest)
}
```

This has several problems:

Sequential delivery means destination 50 waits for destinations 1–49 to complete. If each delivery takes 200ms, you're looking at 10 seconds of sequential work for a 50-destination fanout. This blocks your worker and inflates end-to-end latency.

One failure blocks the rest. If destination 23 is down, you have to decide: continue to destinations 24–50 and mark 23 as failed, or abort and retry everything? Neither is clean without explicit state tracking per destination.

No independent retry. A synchronous loop can't retry destination 23 independently of the others. If you retry the loop, destinations 1–22 and 24–50 receive duplicate deliveries.

The fix is to decompose fanout into one delivery job per destination, created atomically when the event arrives.


The Right Data Model

Fanout reliability starts with the right schema. You need to track delivery state independently for each destination:

```sql
-- One row per (event, destination) pair
CREATE TABLE delivery_attempts (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id      UUID NOT NULL REFERENCES events(id),
    destination_id UUID NOT NULL REFERENCES destinations(id),
    attempt_number INT NOT NULL DEFAULT 1,
    status        TEXT NOT NULL DEFAULT 'queued',
    -- queued | delivering | delivered | failed | dead_letter
    http_status   INT,
    outcome       TEXT,
    -- success | timeout | network_error | http_4xx | http_5xx
    scheduled_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    attempted_at  TIMESTAMPTZ,
    UNIQUE (event_id, destination_id, attempt_number)
);

CREATE INDEX ON delivery_attempts (status, scheduled_at)
    WHERE status IN ('queued', 'failed');
```

The UNIQUE constraint on (event_id, destination_id, attempt_number) prevents duplicate delivery records from being inserted, even under concurrent worker processes.

The partial index on (status, scheduled_at) is what the worker uses to claim work — only rows in queued or failed status appear in the index.


Atomically Expanding Events into Fanout Jobs

When an inbound event arrives, you need to atomically:

  1. Persist the event
  2. Resolve all matching destinations
  3. Insert one delivery job per destination

This must happen in a single transaction. If you persist the event but crash before inserting delivery jobs, those destinations never receive anything.

```go
func (s *EventStore) IngestAndFanout(ctx context.Context, event *Event) error {
    tx, err := s.db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // 1. Insert the event
    if err := insertEvent(ctx, tx, event); err != nil {
        return err
    }

    // 2. Resolve matching routes and destinations
    destinations, err := resolveDestinations(ctx, tx, event)
    if err != nil {
        return err
    }

    if len(destinations) == 0 {
        return tx.Commit() // No fanout needed
    }

    // 3. Insert one delivery job per destination
    for _, dest := range destinations {
        if err := insertDeliveryJob(ctx, tx, event.ID, dest.ID); err != nil {
            return err
        }
    }

    return tx.Commit()
}
```

If this transaction commits, every destination has a delivery job. If it fails, nothing is persisted — the provider's retry will re-send the event and you'll try again.
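The body of insertDeliveryJob isn't shown above; one way to make it safe under concurrent ingest (a hypothetical sketch, leaning on the UNIQUE constraint from the schema) is to let a duplicate insert become a silent no-op:

```sql
-- Hypothetical body of insertDeliveryJob: the UNIQUE constraint on
-- (event_id, destination_id, attempt_number) turns a duplicate insert
-- into a no-op instead of an error, so a re-delivered event cannot
-- create a second job for the same destination.
INSERT INTO delivery_attempts (event_id, destination_id)
VALUES ($1, $2)
ON CONFLICT (event_id, destination_id, attempt_number) DO NOTHING;
```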


Worker Design for Parallel Fanout Delivery

With delivery jobs in Postgres, workers claim and execute them in parallel using FOR UPDATE SKIP LOCKED:

```sql
UPDATE delivery_attempts
SET status = 'delivering', attempted_at = now()
WHERE id IN (
    SELECT id FROM delivery_attempts
    WHERE status = 'queued'
      AND scheduled_at <= now()
    ORDER BY scheduled_at
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
RETURNING *;
```

FOR UPDATE SKIP LOCKED lets multiple workers run simultaneously without contention — each worker grabs a batch of jobs that no other worker is currently processing. This is the key to parallel fanout without an external queue system.

For a 50-destination event with 10 workers each claiming 10 jobs, all 50 deliveries complete in roughly one round — limited by the slowest destination's response time, not by sequential chaining.


Handling Partial Fanout Failures

With independent delivery jobs, partial failure is the default case — some destinations succeed, others fail, others time out. This is actually the correct behavior. What matters is how you handle each case.

| Destination outcome | Action |
| --- | --- |
| 2xx response | Mark delivered, done |
| 4xx response (except 429) | Mark dead_letter, do not retry (client error; retrying won't fix it) |
| 429 Too Many Requests | Retry with backoff, respecting the Retry-After header if present |
| 5xx response | Retry with exponential backoff |
| Connection timeout | Retry; destination may be temporarily unreachable |
| Network error | Retry; transient infrastructure issue |
| TLS handshake failure | Retry once; if persistent, dead-letter with a clear error |

The critical rule for 4xx: do not retry 4xx errors (429 aside). A 404 Not Found means the endpoint no longer exists; retrying it 5 times is noise. A 401 Unauthorized means the signing secret doesn't match, so the customer needs to fix their configuration. Retrying doesn't help and inflates your retry queue.


Fanout at Scale: When 50 Destinations Becomes 5,000

Everything described above works well up to a few hundred destinations per event. At larger fanout counts (multi-tenant platforms where a single event triggers deliveries to thousands of customer endpoints), you need to think about write amplification.

Inserting 5,000 delivery job rows per event at 100 events/second means 500,000 row inserts per second. Few Postgres instances sustain that comfortably; at this scale you must batch writes aggressively and measure before assuming the database can absorb the load.

The practical limits to watch:

| Scale | Concern |
| --- | --- |
| 1–100 destinations | No special consideration needed |
| 100–1,000 destinations | Batch inserts (INSERT with multi-row VALUES) to reduce round trips |
| 1,000–10,000 destinations | Consider a fanout expansion service that runs asynchronously |
| 10,000+ destinations | Partition delivery jobs by account or region; purpose-built fanout queue |

For most SaaS webhook platforms, the 100–1,000 destination range is the realistic ceiling. Batch insert your delivery jobs in chunks of 100 to keep transaction size manageable.

```go
// Batch insert delivery jobs in chunks
const chunkSize = 100
for i := 0; i < len(destinations); i += chunkSize {
    end := i + chunkSize
    if end > len(destinations) {
        end = len(destinations)
    }
    chunk := destinations[i:end]
    if err := insertDeliveryJobsBatch(ctx, tx, event.ID, chunk); err != nil {
        return err
    }
}
```

Ordering Guarantees (or Lack Thereof)

Fanout inherently breaks strict ordering. If you have destinations A, B, and C for a given source, and event E1 is followed by event E2:

  • E1 may be delivered to A before E2
  • E2 may be delivered to B before E1 (if E1 is retrying)
  • E1 and E2 may arrive at C simultaneously if two workers claim them at the same time

If ordering matters for a specific destination, you need per-destination sequencing — a mechanism that ensures no E2 delivery is attempted for a destination until E1 is confirmed delivered.

This is expensive: it serializes delivery for that destination, eliminating the parallelism that makes fanout fast. Use it only where ordering is a hard requirement (e.g., financial ledger updates where event order directly affects state).

The practical approach: design your destination handlers to be idempotent and order-tolerant. Include sequence numbers in your event payload so consumers can detect and handle out-of-order delivery themselves.


Observing Fanout Health

Standard delivery metrics don't tell the full story for fanout. You need per-event fanout visibility:

  • Fanout completion rate: what percentage of events have all destinations delivered (not just one)?
  • Fanout partial failure rate: events where at least one destination is in dead-letter
  • Per-destination success rate: which specific destinations are consistently failing?
  • Fanout lag: time between event receipt and the last destination receiving delivery

A useful query for identifying events with incomplete fanout:

```sql
SELECT
    e.id AS event_id,
    e.created_at,
    COUNT(*) FILTER (WHERE da.status = 'delivered') AS delivered_count,
    COUNT(*) FILTER (WHERE da.status = 'dead_letter') AS dead_letter_count,
    COUNT(*) FILTER (WHERE da.status IN ('queued', 'delivering')) AS pending_count,
    COUNT(*) AS total_destinations
FROM events e
JOIN delivery_attempts da ON da.event_id = e.id
WHERE e.created_at > now() - interval '1 hour'
GROUP BY e.id, e.created_at
HAVING COUNT(*) FILTER (WHERE da.status IN ('queued', 'delivering', 'dead_letter')) > 0
ORDER BY e.created_at DESC;
```

This surfaces events that haven't fully fanned out within the last hour — the starting point for any fanout incident investigation.


How GetHook Handles Fanout

GetHook's route model is designed for fanout from the start. A single source can have multiple routes, each mapping to a different destination with its own retry policy, signing secret, and timeout configuration. When an event arrives, GetHook expands it into per-destination delivery jobs atomically, processes them in parallel with independent retry state, and surfaces per-destination outcomes in the event timeline.

For platforms building customer-facing webhook infrastructure with many subscribers per event, the delivery isolation per destination means one slow or failed customer endpoint never delays delivery to the others.

If you're building fanout into your platform, start with GetHook's route configuration to avoid rebuilding the delivery infrastructure from scratch.


Summary

Reliable webhook fanout requires:

  1. Decompose fanout into one delivery job per destination at ingest time, atomically
  2. Use FOR UPDATE SKIP LOCKED to let workers process jobs in parallel without contention
  3. Track delivery state independently per destination — partial failure is normal and manageable
  4. Don't retry 4xx errors (other than 429); they indicate a configuration problem, not a transient failure
  5. For high-destination-count fanout, batch insert delivery jobs to manage write amplification
  6. Add per-event fanout visibility metrics — aggregate success rate hides partial failures

The patterns here scale from 2 destinations to thousands without architectural changes. The hardest part isn't the mechanics — it's recognizing early that a loop won't get you there.

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.