A payment authorization expires in 7 minutes. An OTP is valid for 60 seconds. A price alert fires because a stock crossed a threshold that it reversed 30 seconds later.
Your webhook delivery infrastructure doesn't know any of this. It retries all three events on the same exponential backoff schedule (30s, 2m, 10m, 1h) until they're either delivered or exhausted. When the OTP event finally reaches your consumer 3 minutes after the code expired, the delivery system marks it delivered. But it wasn't delivered in any useful sense. It was delivered late, which is a different outcome, and in some cases a worse one than not delivering it at all.
This post covers how to model event TTLs, enforce them in the delivery layer, and configure retry policies that match the actual semantics of each event type.
The Two Event Categories
The core issue is that most webhook systems treat every event as equally durable. They're not.
| Category | Example event types | Late delivery consequence |
|---|---|---|
| Durable | user.created, subscription.updated, invoice.paid | Delayed but harmless — state is still valid |
| Time-sensitive | otp.requested, payment.auth.captured, price.alert.triggered, inventory.depleted | Incorrect state — the relevant window has closed |
A user.created event delivered 20 minutes late is just a delay. Your consumer creates the user 20 minutes late. That's a problem, but not a correctness problem.
A payment.auth.captured event delivered 10 minutes late is different. The authorization window has closed. Any action your consumer takes (fulfilling an order, reserving inventory, sending a confirmation) is based on stale state. The authorization may have already expired and the hold been released. Acting on it causes incorrect behavior, not just a delay.
The retry logic that helps durable events actively harms time-sensitive ones. A well-intentioned 5-attempt backoff schedule delivers an event that should have been discarded.
Modeling TTLs on the Event Envelope
The fix starts with making TTL semantics explicit in the event itself. Two approaches work in practice.
Absolute expiry timestamp — the event carries an expires_at field:
{
  "id": "evt_01HX9P3KQY",
  "type": "otp.requested",
  "created_at": "2026-04-28T14:00:00Z",
  "expires_at": "2026-04-28T14:01:00Z",
  "data": {
    "user_id": "usr_abc123",
    "otp_code": "847291",
    "channel": "sms"
  }
}

This is the most precise approach. Each event carries its own expiry, which can vary by context: a payment authorization over SEPA Direct Debit might expire in 24 hours, while one over a card swipe might expire in 7 minutes. The delivery layer checks expires_at before each attempt and skips delivery if the window has passed.
Per-event-type TTL policy — rather than per-event, the TTL is a configuration property of the event type that your delivery infrastructure applies uniformly:
{
  "event_type": "otp.requested",
  "delivery_ttl_seconds": 60,
  "max_attempts": 2
}

This is operationally simpler. You configure the rule once, and it applies to every event of that type. It works well when all events of a given type share the same time sensitivity, which is common for OTP and alerting events.
If your upstream provider includes an expiry timestamp in the payload (Stripe Checkout Session objects, for example, carry an expires_at field), honor it. Otherwise, configure per-type TTLs based on your understanding of the business semantics.
Enforcing TTLs in the Delivery Worker
TTL metadata is useless without enforcement. Your delivery worker needs to check TTL state before attempting delivery:
func (w *Worker) shouldDeliver(event Event) (bool, string) {
	// Check absolute expiry field if present
	if event.ExpiresAt != nil && time.Now().After(*event.ExpiresAt) {
		return false, "event expired before delivery"
	}
	// Check per-event-type TTL policy
	if policy, ok := w.typePolicies[event.Type]; ok && policy.DeliveryTTLSeconds > 0 {
		ttl := time.Duration(policy.DeliveryTTLSeconds) * time.Second
		if time.Since(event.CreatedAt) > ttl {
			return false, fmt.Sprintf("event exceeded type TTL of %v", ttl)
		}
	}
	return true, ""
}

func (w *Worker) processJob(ctx context.Context, job DeliveryJob) error {
	event, err := w.events.Get(ctx, job.EventID)
	if err != nil {
		return err
	}
	if ok, reason := w.shouldDeliver(event); !ok {
		return w.events.MarkExpired(ctx, event.ID, reason)
	}
	return w.deliver(ctx, job, event)
}

The MarkExpired transition sets status = 'expired' with a reason string. This is a distinct terminal state from dead_letter. A dead-lettered event exhausted delivery attempts. An expired event was intentionally skipped. The distinction matters for debugging and for operator dashboards: you want to know whether events aren't reaching consumers because delivery failed or because they were discarded as stale.
Retry Policy Interaction
The standard retry schedule — 30s → 2m → 10m → 1h — is designed for durable events where eventual delivery is the goal. For time-sensitive events, it's the wrong shape.
An event with a 60-second TTL gets two realistic attempts on this schedule: the initial delivery at T+0 and one retry at T+30s. The second retry is due 2 minutes after the first, at roughly T+2m30s, so the event has already expired before it fires. Your retry policy is effectively truncated by the TTL, whether you've designed it that way or not.
Better to be explicit:
| Event TTL | Recommended retry strategy |
|---|---|
| < 2 minutes | 1–2 attempts, 15s interval, no further retries |
| 2–15 minutes | 2–3 attempts, 30s–60s interval |
| > 15 minutes | Standard exponential backoff up to TTL boundary |
| No TTL | Standard exponential backoff with full jitter |
The principle: maximize delivery probability within the valid window, then stop. For short-TTL events, that means aggressive early retries and hard termination, not long backoff schedules that extend well past expiry.
func retryPolicyForEvent(event Event, typePolicies map[string]EventTypePolicy) RetryPolicy {
	// If event has an absolute expiry, tune retry policy to the remaining window
	if event.ExpiresAt != nil {
		remaining := time.Until(*event.ExpiresAt)
		switch {
		case remaining < 2*time.Minute:
			return RetryPolicy{MaxAttempts: 2, BaseDelay: 15 * time.Second, Jitter: false}
		case remaining < 15*time.Minute:
			return RetryPolicy{MaxAttempts: 3, BaseDelay: 30 * time.Second, Jitter: true}
		}
	}
	// Fall back to per-type configuration or the global default
	if p, ok := typePolicies[event.Type]; ok {
		return p.RetryPolicy
	}
	return defaultRetryPolicy // 5 attempts, exponential backoff, full jitter
}

Consumer-Side Validation
TTL enforcement at the delivery layer stops most stale deliveries. But there's a gap: an event delivered just before expiry can sit in your consumer's internal processing queue past the TTL. Your consumer should validate whether the event is still actionable before taking irreversible action.
func (h *OTPHandler) Handle(event WebhookEvent) error {
	var payload struct {
		UserID  string `json:"user_id"`
		OTPCode string `json:"otp_code"`
		Channel string `json:"channel"`
	}
	if err := json.Unmarshal(event.Data, &payload); err != nil {
		return err
	}
	// Validate the OTP is still valid in our own system
	if !h.otpStore.IsValid(payload.UserID, payload.OTPCode) {
		log.Printf("otp.requested delivered but OTP already expired: event_id=%s", event.ID)
		// Return nil (HTTP 200): this is not a delivery failure
		return nil
	}
	return h.notifier.Send(payload.Channel, payload.UserID, payload.OTPCode)
}

The important detail: return 200 OK when you've decided not to act on a stale event. A non-2xx response would typically trigger retries of an event you already know is expired. A 200 tells the delivery layer that the event was received and handled; your consumer made the call about what to do with it. The delivery system's job is delivery, not business logic.
Document TTL Semantics for Your Consumers
If you're building a webhook-producing platform, your TTL semantics belong in your event catalog alongside payload schemas. Consumers who don't know an event is time-sensitive will treat it as durable and build handlers that act on stale state.
| Event type | TTL | Consumer guidance |
|---|---|---|
| otp.requested | 60 seconds | Confirm the OTP is still valid in your own store before sending it |
| payment.auth.captured | 7 minutes | Check authorization status via API before fulfilling |
| price.alert.triggered | 5 minutes | Re-fetch current price before surfacing to end user |
| inventory.depleted | 2 minutes | Check current stock level before triggering resupply |
| user.created | None | Durable; safe to process at any point |
| invoice.paid | None | Durable; safe to process at any point |
This table should live in your webhook documentation, not just in a Jira comment. Consumers who don't know price.alert.triggered is time-sensitive will write handlers that fire correctly in development (where there's no delivery delay) and incorrectly in production (where there is).
GetHook lets you configure per-event-type TTL policies and retry strategies per source, and surfaces expired events separately in the delivery log so you can audit how many time-sensitive events are being discarded versus delivered successfully.
Late delivery of time-sensitive events is a predictable failure mode of any webhook system that treats all events identically. Adding expires_at to your event envelope, enforcing TTLs before each delivery attempt, and tuning retry policies to match event semantics is a day's worth of work. The alternative is debugging payment double-charges, OTPs that arrive after the session expired, and price alerts that send customers on fruitless chases — all of which look like application bugs until you trace them to the delivery layer.