Polling works until it doesn't. You start with a cron job that calls /api/orders?since=... every minute, it's fine for a while, and then three things happen: your provider starts rate-limiting you, your data freshness requirements tighten, and your server bill grows because you're making 1,440 API calls a day to retrieve an average of 12 new events.
The fix is obvious — switch to webhooks. The provider pushes events to you the moment they happen. No polling, no wasted requests, sub-second freshness.
The migration, though, is where teams get tripped up. You can't just "turn off the poll and turn on the webhook." There's a transition window where events can fall through the gap, a new set of reliability concerns to handle on your side, and a cutover sequence that has to be executed carefully. This guide walks through it end to end.
## Why Polling Breaks Down
Before committing to the migration, it's worth being precise about the failure modes. Polling fails in four specific ways:
| Failure mode | Symptom | Root cause |
|---|---|---|
| Rate limiting | 429 Too Many Requests from provider | Too many calls per minute/hour |
| Event gaps | Events missed between poll cycles | Events created and resolved within one poll interval |
| Thundering herd | Spike in load after downtime | Catching up on missed polls simultaneously |
| Stale data | Users see outdated state | Long poll intervals (5m, 15m) to reduce API load |
The event gap problem is the most insidious. If you poll every 60 seconds and a payment is created and marked as failed within that window, your poll may never see the intermediate `payment.created` state, only the terminal `payment.failed`. Or, depending on your query, it may miss the event entirely if your timestamp filter is off by milliseconds.
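To make the timestamp problem concrete, here's a minimal sketch of a typical poll cursor. The endpoint and parameter names are illustrative, not any particular provider's API:

```go
import (
	"fmt"
	"net/url"
	"time"
)

// buildPollURL turns the timestamp of the newest event seen so far into the
// next poll's query. If the API treats created_after as exclusive, any event
// created in the same instant as the cursor is silently skipped; if either
// side truncates timestamps to whole seconds, the gap widens.
func buildPollURL(lastSeen time.Time) string {
	q := url.Values{}
	q.Set("created_after", lastSeen.UTC().Format(time.RFC3339))
	return fmt.Sprintf("https://api.provider.com/events?%s", q.Encode())
}
```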
Webhooks eliminate all four. Events are pushed immediately. No polling budget, no gaps, no staleness.
## Step 1: Map What You're Polling
Before writing any code, produce a complete inventory of what your polling job does. This is the most commonly skipped step, and it causes incomplete webhook configurations later.
For each polling loop, document:
- Which endpoint is being polled
- Which fields from the response are being used
- What action is taken on each record (write to DB, trigger a workflow, send an email)
- How deduplication is handled: do you check whether you've already processed a record?
- What the polling interval is and what freshness SLA it represents
Example inventory table for a payment integration:
| Poll target | Interval | Action | Dedup mechanism |
|---|---|---|---|
| `GET /payments?status=pending` | 60s | Charge card, update order status | Check `payment_id` in `processed_payments` table |
| `GET /refunds?created_after=...` | 5m | Issue credit to customer balance | Check `refund_id` in `ledger_entries` |
| `GET /disputes?status=open` | 15m | Alert support team via Slack | Check `dispute_id` in `dispute_alerts` |
Every row in this table maps to one or more webhook event types you'll need to subscribe to.
## Step 2: Identify the Equivalent Webhook Events
Most providers document their webhook event catalog alongside their REST API. Map each poll target to its webhook equivalent:
| Current poll | Equivalent webhook event(s) |
|---|---|
| `GET /payments?status=pending` | `payment.created`, `payment.updated` |
| `GET /refunds?created_after=...` | `refund.created` |
| `GET /disputes?status=open` | `dispute.created`, `dispute.updated` |
Watch for mismatches. A webhook may fire for states you don't care about (e.g., `payment.updated` fires for every status change, not just the ones your poll was filtering for). You'll need to add filtering logic on your side to replace what your polling query previously handled implicitly with a `WHERE` clause.
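As a sketch of that filtering, assuming a hypothetical payload shape (the type and field names here are illustrative, not a specific provider's schema):

```go
// PaymentEvent is an assumed payload shape for illustration.
type PaymentEvent struct {
	ID     string `json:"id"`
	Type   string `json:"type"`
	Status string `json:"status"`
}

// shouldProcess re-creates the old poll's WHERE status = 'pending' filter
// in handler code, since the webhook fires for every status change.
func shouldProcess(e PaymentEvent) bool {
	switch e.Type {
	case "payment.created":
		return true
	case "payment.updated":
		return e.Status == "pending"
	default:
		return false
	}
}
```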
Also check whether the webhook payload contains everything your handler needs. Some providers send "thin" events, just `{ "type": "payment.updated", "id": "pay_123" }`, requiring you to make a follow-up API call to fetch the full record. That's the fetch-on-webhook pattern, and it's worth knowing upfront because it changes your handler design.
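A sketch of what that handler design looks like, with `APIClient.GetPayment` standing in for whatever your provider's SDK exposes (the client, its method, and `applyPaymentUpdateFromAPI` are all assumptions):

```go
import (
	"context"
	"fmt"
)

// ThinEvent is the minimal payload some providers send.
type ThinEvent struct {
	Type string `json:"type"`
	ID   string `json:"id"`
}

// handleThinEvent fetches the full record before acting, since the event
// itself carries only a type and an ID.
func handleThinEvent(ctx context.Context, client *APIClient, evt ThinEvent) error {
	payment, err := client.GetPayment(ctx, evt.ID) // follow-up API call
	if err != nil {
		return fmt.Errorf("fetch payment %s: %w", evt.ID, err)
	}
	return applyPaymentUpdateFromAPI(ctx, payment)
}
```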
## Step 3: Build the Webhook Handler (Before Cutting Over)
The critical rule: build and validate your webhook handler while the polling job is still running. Don't cut over until you've confirmed the handler works in production.
A production-ready webhook handler needs four things:
### 1. Signature verification
Validate the HMAC signature on every request before doing anything else. This prevents spoofed events.
```go
import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"strconv"
	"strings"
	"time"
)

func verifySignature(body []byte, sigHeader, secret string) error {
	// Stripe-style header: "t=<unix>,v1=<hex>"
	parts := strings.SplitN(sigHeader, ",", 2)
	if len(parts) != 2 {
		return errors.New("malformed signature header")
	}
	timestamp := strings.TrimPrefix(parts[0], "t=")
	signature := strings.TrimPrefix(parts[1], "v1=")

	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(timestamp + "." + string(body)))
	expected := hex.EncodeToString(mac.Sum(nil))
	if !hmac.Equal([]byte(signature), []byte(expected)) {
		return errors.New("signature mismatch")
	}

	// Reject events older than 5 minutes (replay attack prevention)
	ts, err := strconv.ParseInt(timestamp, 10, 64)
	if err != nil || time.Now().Unix()-ts > 300 {
		return errors.New("event timestamp out of tolerance")
	}
	return nil
}
```

### 2. Idempotent processing
Your polling job probably had implicit idempotency: you checked whether you'd seen a record before acting. Make that explicit in your webhook handler:
```go
import (
	"context"
	"database/sql"
)

func handlePaymentUpdated(ctx context.Context, db *sql.DB, event PaymentEvent) error {
	// Idempotency check: have we already processed this event?
	var exists bool
	err := db.QueryRowContext(ctx,
		`SELECT EXISTS(SELECT 1 FROM processed_events WHERE event_id = $1)`,
		event.ID,
	).Scan(&exists)
	if err != nil {
		return err
	}
	if exists {
		return nil // already processed; return 200 to ack
	}

	// Process the event
	if err := applyPaymentUpdate(ctx, db, event); err != nil {
		return err
	}

	// Mark as processed
	_, err = db.ExecContext(ctx,
		`INSERT INTO processed_events (event_id, processed_at) VALUES ($1, NOW())`,
		event.ID,
	)
	return err
}
```
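One caveat with check-then-mark: two concurrent deliveries of the same event can both pass the SELECT before either row is written. If your provider retries aggressively, a unique constraint on `event_id` plus a transaction closes that window. A sketch, assuming Postgres and a hypothetical tx-aware `applyPaymentUpdateTx` variant of the helper above:

```go
func handlePaymentUpdatedAtomic(ctx context.Context, db *sql.DB, event PaymentEvent) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	// Claim the event. Zero rows affected means a concurrent or earlier
	// delivery already claimed it, so we ack without reprocessing.
	res, err := tx.ExecContext(ctx,
		`INSERT INTO processed_events (event_id, processed_at)
		 VALUES ($1, NOW()) ON CONFLICT (event_id) DO NOTHING`,
		event.ID,
	)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return nil // duplicate delivery
	}
	if err := applyPaymentUpdateTx(ctx, tx, event); err != nil {
		return err // rollback releases the claim; the provider's retry gets a clean slate
	}
	return tx.Commit()
}
```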
### 3. Fast acknowledgement
Return `200 OK` within 5 seconds (most providers have this timeout). Do the heavy lifting asynchronously:
```go
import (
	"io"
	"net/http"
)

// secret and queue are package-level dependencies: the shared webhook
// signing secret and an async job queue.
func webhookHandler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	if err := verifySignature(body, r.Header.Get("X-Webhook-Signature"), secret); err != nil {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	// Enqueue for async processing, ack immediately
	if err := queue.Enqueue(body); err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}
```

### 4. Structured logging
Log every event with enough context to reconstruct the sequence later:
```json
{
  "event": "webhook.received",
  "event_type": "payment.updated",
  "event_id": "evt_1NxPQ2LkdIw...",
  "source": "stripe",
  "payload_bytes": 1243,
  "signature_valid": true,
  "timestamp": "2026-03-23T10:14:22.003Z"
}
```
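If your stack is Go, the standard library's `log/slog` with a JSON handler gets you most of this shape for free (note that slog emits `msg` and `time` keys rather than `event` and `timestamp`, so adjust field names to taste). A minimal sketch with illustrative values:

```go
import (
	"log/slog"
	"os"
)

func init() {
	// Emit one JSON object per log line.
	slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stdout, nil)))
}

func logWebhookReceived(eventType, eventID string, payloadBytes int, sigValid bool) {
	slog.Info("webhook.received",
		"event_type", eventType,
		"event_id", eventID,
		"source", "stripe",
		"payload_bytes", payloadBytes,
		"signature_valid", sigValid,
	)
}
```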
## Step 4: Run Both Systems in Parallel
This is the overlap period, the most important phase of the migration.
Register your webhook endpoint with the provider. Leave your polling job running. For the next 48–72 hours, both systems are active. Your webhook handler should process events but write a flag indicating the source:
```sql
ALTER TABLE processed_events ADD COLUMN source TEXT DEFAULT 'poll';
-- 'poll' or 'webhook'
```
Run queries to compare coverage:
```sql
-- Events processed by webhook but not poll (webhook is ahead)
SELECT event_id FROM processed_events WHERE source = 'webhook'
EXCEPT
SELECT event_id FROM processed_events WHERE source = 'poll';

-- Events processed by poll but not webhook (webhook missed something)
SELECT event_id FROM processed_events
WHERE source = 'poll'
  AND created_at > '<webhook_registration_time>'
EXCEPT
SELECT event_id FROM processed_events WHERE source = 'webhook';
```
The second query is the important one. Any event that the poll caught but the webhook missed is a gap you need to investigate before cutting over.
Common causes of gaps during parallel mode:
- The webhook event type doesn't cover all the states your poll was filtering for
- The webhook subscription was created after some events had already fired
- Events are being delivered to a different environment (staging vs. production)
Resolve gaps before proceeding. Don't rush this step.
## Step 5: Handle the Backfill Window
When you register a webhook, the provider typically only sends events going forward. It does not replay historical events. This means any events that occurred between your last successful poll and your webhook registration time will never arrive via webhook.
Explicitly backfill that window:
- Note the timestamp of your last successful poll (`T_last_poll`)
- Note the timestamp of your webhook registration (`T_webhook_start`)
- Run a one-time script that polls the API for events between `T_last_poll` and `T_webhook_start` and processes them through your handler
```bash
# Example: fetch all events in the backfill window.
# jq -c emits one JSON object per line so xargs can split on newlines;
# xargs -d '\n' (GNU) stops the quotes inside the JSON from being mangled.
curl "https://api.provider.com/events?created_after=T_last_poll&created_before=T_webhook_start" \
  -H "Authorization: Bearer $API_KEY" \
  | jq -c '.data[]' \
  | xargs -d '\n' -I{} ./process-historical-event '{}'
```
Mark these events with `source = 'backfill'` so you can distinguish them in audits.
## Step 6: Cut Over
Once parallel mode has run cleanly for 48–72 hours with no gaps, you're ready to cut over.
The sequence:
1. Disable the polling job (comment out the cron entry or set a feature flag)
2. Keep the polling code deployed for one more release cycle (easy rollback)
3. Monitor your webhook handler error rate and processing volume for 24 hours
4. Set alerts on webhook delivery failures via your gateway (GetHook, or your provider's dashboard)
5. After 7 days of clean operation, delete the polling code
Do not delete the polling code at cutover. You want rollback to take seconds, not a redeploy.
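If you take the feature-flag route, the gate can be as small as an environment variable check at the top of the poll loop; a sketch, with the flag name `POLLING_ENABLED` as an assumption:

```go
import (
	"log"
	"os"
	"time"
)

// runPoller keeps the polling code deployed but inert. Rollback is a config
// change (set POLLING_ENABLED=true) instead of a redeploy.
func runPoller(interval time.Duration, pollOnce func() error) {
	for {
		if os.Getenv("POLLING_ENABLED") == "true" {
			if err := pollOnce(); err != nil {
				log.Printf("poll failed: %v", err) // next cycle retries
			}
		}
		time.Sleep(interval)
	}
}
```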
## Step 7: Manage Reliability Post-Cutover
Webhooks shift reliability responsibility to your side. The provider will retry if you return a non-2xx response, but you need to handle:
| Concern | Polling equivalent | Webhook equivalent |
|---|---|---|
| Provider outage | Poll fails; retry next cycle | No events received; gaps when provider recovers |
| Your outage | Poll resumes from last timestamp | Events retried by provider; check retry window |
| Event ordering | Query ordered by timestamp | Not guaranteed; use event timestamps, not arrival order |
| Duplicate events | Idempotency check on ID | Same — idempotency check is still required |
For provider outages, the most important mitigation is knowing your provider's event retention policy. Stripe, for example, retries webhook delivery for 72 hours. If your outage exceeds that window, you'll need to backfill from the API — the same pattern you used during the migration.
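In that case, a reconciliation pass has the same shape as the migration backfill. A minimal sketch, assuming a hypothetical `client.ListEvents` helper over the provider's events API, plus the `processed_events` table and `queue` from earlier:

```go
import (
	"context"
	"database/sql"
	"time"
)

// reconcile re-enqueues any event in [from, to] that the outage caused us
// to miss, using the provider's events API as the source of truth.
func reconcile(ctx context.Context, db *sql.DB, client *APIClient, from, to time.Time) error {
	events, err := client.ListEvents(ctx, from, to) // assumed helper
	if err != nil {
		return err
	}
	for _, evt := range events {
		var seen bool
		if err := db.QueryRowContext(ctx,
			`SELECT EXISTS(SELECT 1 FROM processed_events WHERE event_id = $1)`,
			evt.ID,
		).Scan(&seen); err != nil {
			return err
		}
		if !seen {
			if err := queue.Enqueue(evt.Raw); err != nil { // same path as live webhooks
				return err
			}
		}
	}
	return nil
}
```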
GetHook helps on both ends: it absorbs inbound webhooks from providers and retries delivery to your services independently of the provider's retry logic, giving you a wider recovery window and full event history for backfill queries.
## Checklist Before Cutover
Use this before disabling any polling job:
- [ ] Webhook handler deployed to production with signature verification
- [ ] Idempotency check in place for every event type
- [ ] Events queued asynchronously; handler returns 200 in under 2 seconds
- [ ] Parallel mode ran for at least 48 hours with no unexplained gaps
- [ ] Backfill window between last poll and webhook registration was processed
- [ ] Alerts configured on webhook delivery failure rate
- [ ] Rollback plan documented (re-enable polling cron, timeline for rollback decision)
The migration from polling to webhooks is one of the higher-leverage infrastructure improvements you can make. Fewer wasted API calls, better data freshness, and a more honest model of how event-driven systems should work. Done carefully, the cutover is low-risk. Done hastily, you drop events in production.
Take the parallel period seriously, validate your idempotency logic, and the rest follows.