Most teams treat webhooks and analytics as separate concerns: webhooks handle operational events, analytics runs on scheduled ETL jobs that query the database. But if your product emits webhooks for state changes — order.completed, subscription.cancelled, payment.refunded — you already have a real-time event stream. The question is whether you're capturing it.
This post covers how to wire your inbound webhook stream into an analytics pipeline: what to buffer, how to handle schema evolution, how to deal with late-arriving and duplicate events, and where pipelines typically break under load.
Why Webhooks Instead of ETL
Traditional ETL queries the operational database every 15–60 minutes, extracts changed rows, and loads them into the warehouse. It works but has structural problems:
| Approach | Latency | Complexity | Completeness |
|---|---|---|---|
| Scheduled ETL | 5–60 min lag | High (change tracking) | Misses deletes, soft state |
| CDC (change data capture) | Near real-time | Very high (replication slot mgmt) | Complete, but fragile |
| Webhooks | Seconds | Low (HTTP receiver) | Event-level, not row-level |
| Polling REST API | Variable | Medium | Limited by pagination rate |
Webhooks give you event-level semantics — each payload represents a discrete business fact — at low operational overhead. The trade-off is handling the unreliability of HTTP delivery: retries, duplicates, out-of-order arrival, and gaps when your receiver is down.
The Architecture
A minimal webhook-to-analytics pipeline has four components:
```
[Provider] → [Receiver / Buffer] → [Transformer] → [Warehouse]
```

**Receiver:** an HTTP endpoint that validates the signature, returns 200, and writes the raw payload to a buffer. The buffer is the critical reliability layer — it decouples fast ingest from slow warehouse writes.

**Buffer:** a durable queue. Kafka, SQS, and a Postgres table all work. For most pipelines under 50k events/day, a Postgres-backed buffer is operationally simpler than Kafka.

**Transformer:** a consumer that reads from the buffer, normalizes the schema, and produces rows in the format your warehouse expects.

**Warehouse:** ClickHouse, BigQuery, Redshift, or DuckDB, depending on your scale and query patterns.
The receiver must do exactly two things: verify the signature and enqueue the raw payload. Any parsing or enrichment in the receiver path creates latency that causes the provider to time out and retry.
Writing the Receiver
Here is a minimal receiver in Go that validates a Stripe-compatible HMAC signature and enqueues to a Postgres buffer:
```go
import (
	"database/sql"
	"io"
	"net/http"
)

// WebhookHandler carries the buffer connection and the provider's signing secret.
type WebhookHandler struct {
	db     *sql.DB
	secret string
}

func (h *WebhookHandler) Ingest(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20)) // 1 MB limit
	if err != nil {
		http.Error(w, "read error", http.StatusBadRequest)
		return
	}
	sig := r.Header.Get("Webhook-Signature")
	if !verifyHMAC(body, sig, h.secret) {
		http.Error(w, "invalid signature", http.StatusUnauthorized)
		return
	}
	// Enqueue raw payload — no parsing here. ON CONFLICT DO NOTHING absorbs
	// duplicate deliveries: the unique index on (source, event_id), defined
	// with the buffer table below, rejects the second copy while the
	// provider still gets a 200.
	_, err = h.db.ExecContext(r.Context(), `
		INSERT INTO webhook_buffer (id, source, raw_payload, received_at)
		VALUES (gen_random_uuid(), $1, $2, NOW())
		ON CONFLICT (source, event_id) DO NOTHING
	`, r.Header.Get("X-Source-ID"), string(body)) // string, so the driver sends text, not bytea
	if err != nil {
		// Return 500 so the provider retries
		http.Error(w, "enqueue error", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}
```

The io.LimitReader guards against oversized payloads. Returning 500 on an enqueue failure is correct: it tells the provider to retry, because you haven't durably recorded the event yet. Never return 200 and silently discard.
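For completeness, here is a minimal sketch of the signature check. It assumes the provider sends a hex-encoded HMAC-SHA256 of the raw body in the Webhook-Signature header; real schemes such as Stripe's also sign a timestamp to block replayed requests, so adapt this to your provider's documented format.

```go
import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// verifyHMAC checks a hex-encoded HMAC-SHA256 of the raw body.
// A sketch: adjust the encoding and signed content to your provider.
func verifyHMAC(body []byte, signature, secret string) bool {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body) // hash.Hash writes never return an error
	expected := hex.EncodeToString(mac.Sum(nil))
	// hmac.Equal compares in constant time, which prevents timing
	// attacks against the signature check.
	return hmac.Equal([]byte(expected), []byte(signature))
}
```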
The Buffer Table
```sql
CREATE TABLE webhook_buffer (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source TEXT NOT NULL,
    raw_payload JSONB NOT NULL,
    received_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    event_id TEXT GENERATED ALWAYS AS (raw_payload->>'id') STORED,
    event_type TEXT GENERATED ALWAYS AS (raw_payload->>'type') STORED,
    processed_at TIMESTAMPTZ,
    error TEXT
);

CREATE UNIQUE INDEX webhook_buffer_event_id ON webhook_buffer (source, event_id);

CREATE INDEX webhook_buffer_unprocessed ON webhook_buffer (received_at)
    WHERE processed_at IS NULL;
```

The UNIQUE index on (source, event_id) handles duplicate delivery automatically: a second insert for the same event ID hits the conflict, and the ON CONFLICT DO NOTHING clause in the receiver swallows it. The generated columns on event_id and event_type let the transformer filter events by type without parsing JSONB in the query.
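For example, a transformer that only handles order events can select on the generated column directly; a sketch:

```sql
-- event_type is a stored TEXT column, so this filter never
-- parses the JSONB payload at query time.
SELECT id, raw_payload
FROM webhook_buffer
WHERE event_type = 'order.completed'
  AND processed_at IS NULL
ORDER BY received_at;
```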
Surviving Schema Evolution
The biggest operational risk in a webhook analytics pipeline is schema drift. Providers update payload structure without notice. A field gets renamed; a type changes from string to integer.
The pattern that survives this: schema-on-read for raw storage, schema-on-write for the warehouse layer.
Store every raw payload verbatim in a JSONB column. Never parse into typed columns at ingest time. The transformer applies the current schema expectations. When the provider changes their payload, you update the transformer and reprocess historical events from the raw buffer with the new logic.
Handle field renames in the transformer with pointer types and a resolver:
```go
type OrderCompletedData struct {
	AmountCents *int64 `json:"amount_cents"`
	Amount      *int64 `json:"amount"` // legacy field name
	Currency    string `json:"currency"`
	CustomerID  string `json:"customer_id"`
}

func (d *OrderCompletedData) ResolvedAmountCents() int64 {
	if d.AmountCents != nil {
		return *d.AmountCents
	}
	if d.Amount != nil {
		return *d.Amount
	}
	return 0
}
```

The resolver reads whichever field is present. When you're confident the legacy field is gone from all historical events, remove the fallback.
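Usage in the transformer is then a plain unmarshal followed by the resolver; a sketch, assuming encoding/json and fmt are imported:

```go
// amountFromPayload decodes an order.completed data object and resolves
// the amount across both field names, old and new.
func amountFromPayload(payload []byte) (int64, error) {
	var data OrderCompletedData
	if err := json.Unmarshal(payload, &data); err != nil {
		return 0, fmt.Errorf("decode order.completed: %w", err)
	}
	return data.ResolvedAmountCents(), nil
}
```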
Late Arrivals and Out-of-Order Events
Webhook delivery is not ordered. Due to retries, order.refunded may arrive before order.completed for the same order. For most analytics use cases — counting events, computing aggregates — this is acceptable. But for metrics like time-to-completion or conversion rates, ordering matters.
| Strategy | Latency | Ordering Accuracy | Complexity |
|---|---|---|---|
| Process immediately, use received_at | Seconds | Low | Minimal |
| Process immediately, use event_created_at | Seconds | Medium | Low |
| Delay transformer 5–10 min, use event_created_at | 5–10 min | High | Low |
| Streaming with watermarks (Flink, Spark) | Seconds | Very high | Very high |
Store event_created_at from the payload envelope alongside received_at from the buffer. For most pipelines, delaying the transformer by 5–10 minutes and querying against event_created_at hits the right accuracy/latency trade-off. Events delivered within 10 minutes of creation — the vast majority — are grouped into the correct event-time window.
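One minimal way to implement the delay against the Postgres buffer from earlier, assuming the transformer polls on an interval (claimBatch is an illustrative name, not a prescribed API):

```go
// claimBatch reads unprocessed events that have aged past the reordering
// window. Events younger than 10 minutes stay in the buffer, so stragglers
// from provider retries land in the correct event-time window.
func claimBatch(ctx context.Context, db *sql.DB, limit int) (*sql.Rows, error) {
	return db.QueryContext(ctx, `
		SELECT id, source, raw_payload
		FROM webhook_buffer
		WHERE processed_at IS NULL
		  AND received_at < NOW() - INTERVAL '10 minutes'
		ORDER BY received_at
		LIMIT $1
	`, limit)
}
```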
Deduplication Before the Warehouse
Providers retry on delivery failure. If your receiver is briefly unavailable, you may process the same event twice. The deduplication layer belongs in the transformer, not the warehouse — warehouses like BigQuery and Redshift make deduplication expensive because they don't support efficient upserts at scale.
Maintain a processed-events table:
```sql
CREATE TABLE analytics_processed_events (
    source TEXT NOT NULL,
    event_id TEXT NOT NULL,
    loaded_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (source, event_id)
);
```

In the transformer: read a batch from the buffer, filter out IDs already in analytics_processed_events, transform the remainder, record the processed IDs, write to the warehouse, then mark the buffer rows done. Record the event ID before writing to the warehouse — a duplicate warehouse row is easier to remove in SQL than a missing row is to recover.
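Putting that ordering into code, a sketch of the transformer's inner loop (BufferedEvent, transform, and loadToWarehouse are placeholder names standing in for your own types, normalization logic, and warehouse client):

```go
import (
	"context"
	"database/sql"
)

// BufferedEvent is a hypothetical struct for a row read from webhook_buffer.
type BufferedEvent struct {
	ID      string // buffer row id
	Source  string
	EventID string
	Payload []byte
}

// processBatch runs the dedup-then-load step for one claimed batch.
func processBatch(ctx context.Context, db *sql.DB, batch []BufferedEvent) error {
	for _, ev := range batch {
		// Record the event ID first. A no-op insert means a previous
		// run already loaded this event, so skip it.
		res, err := db.ExecContext(ctx, `
			INSERT INTO analytics_processed_events (source, event_id)
			VALUES ($1, $2)
			ON CONFLICT DO NOTHING
		`, ev.Source, ev.EventID)
		if err != nil {
			return err
		}
		if n, _ := res.RowsAffected(); n == 0 {
			continue // duplicate delivery, already processed
		}
		row, err := transform(ev) // normalize into the warehouse row shape
		if err != nil {
			return err
		}
		if err := loadToWarehouse(ctx, row); err != nil {
			return err // buffer row stays unprocessed and will be retried
		}
		// Mark the buffer row done so the next claim skips it.
		if _, err := db.ExecContext(ctx,
			`UPDATE webhook_buffer SET processed_at = NOW() WHERE id = $1`,
			ev.ID); err != nil {
			return err
		}
	}
	return nil
}
```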
Backfilling Historical Events
When you first deploy this pipeline you need to backfill history. If your webhook events flow through GetHook, the replay API lets you re-deliver any event by ID or replay all events for a source within a time window — no manual export needed. The receiver handles replayed events identically to live ones; deduplication catches any overlap.
If the provider doesn't offer replay, call their REST API to export historical records and load them as synthetic events with the same envelope format. Consistent envelope structure is what makes the transformer reusable across backfill and live delivery.
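A sketch of that envelope shape, matching the id and type keys the buffer's generated columns read (the exact field set is an assumption; mirror your provider's live envelope exactly):

```go
import (
	"encoding/json"
	"time"
)

// Envelope mirrors the live webhook wrapper so backfilled records flow
// through the same receiver and transformer path as live deliveries.
type Envelope struct {
	ID        string          `json:"id"`         // stable per record, so dedup treats re-runs as duplicates
	Type      string          `json:"type"`       // e.g. "order.completed"
	CreatedAt time.Time       `json:"created_at"` // event time taken from the API record
	Data      json.RawMessage `json:"data"`       // the exported record, verbatim
}
```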
What to Monitor
Four signals tell you whether the pipeline is healthy:
**Receiver error rate:** percentage of deliveries returning non-2xx. Above 0.1% over a 5-minute window is worth investigating — usually a slow database or a signature verification mismatch.

**Buffer depth:** count of unprocessed rows in webhook_buffer. In steady state this should be near zero. A growing buffer means the transformer is falling behind.

**Transformer lag:** difference between NOW() and the received_at of the oldest unprocessed event. Your analytics freshness SLA defines the acceptable upper bound.

**Warehouse write error rate:** percentage of transformer batches that fail to load. These stay in the buffer and will be retried, but repeated failures signal a schema mismatch or connectivity issue.
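Buffer depth and transformer lag fall out of a single query against webhook_buffer; a sketch:

```sql
-- Buffer depth and transformer lag in one pass over unprocessed rows.
SELECT
    COUNT(*) AS buffer_depth,
    COALESCE(EXTRACT(EPOCH FROM NOW() - MIN(received_at)), 0)
        AS transformer_lag_seconds
FROM webhook_buffer
WHERE processed_at IS NULL;
```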
Alert on sustained buffer growth and on transformer lag exceeding your SLA. Let error rates drive investigation rather than pages; buffer depth and transformer lag are what actually break your analytics queries.
A webhook analytics pipeline built on these patterns gives you event-level granularity, near-real-time freshness, and a reprocessable raw store that survives schema changes and provider migrations. The operational cost is low: a buffer table, a transformer process, and a deduplication index.
If you want reliable, durable event delivery with replay built in, start with GetHook.