Most engineering teams treat webhook infrastructure as an undifferentiated line item — compute, database, and egress costs lumped into "platform overhead." That works until you need to answer one of these questions:
- What does it actually cost to deliver a webhook to one of your customers?
- Which customer segments generate the most delivery retries, and what is that costing you?
- If you add ten new enterprise customers with aggressive webhook usage, what incremental infrastructure spend should you plan for?
Without cost attribution at the event level, you're flying blind on webhook unit economics. This post walks through how to model the cost per delivered event, what drives that cost up, and the levers you have to pull it back down.
Why Webhook Costs Are Non-Linear
A naive cost model says: more events, proportionally more cost. Reality is more complicated, because webhook infrastructure has several non-linear cost drivers.
Retry amplification. A delivery that takes five attempts generates five times the delivery cost of one that succeeds on the first attempt. If your p50 delivery is one attempt but your p99 is four attempts, your average cost per event may be double what a simple throughput calculation suggests. A destination with a consistently unhealthy endpoint can generate 80% of your retry costs while accounting for 5% of your events.
Dead letter accumulation. Events that exhaust their retry policy end up in dead letter queues. If you store those events indefinitely, storage costs grow monotonically even when delivery volume is flat. Many teams never purge DLQ entries after a customer-side issue is resolved.
Egress amplification. Each retry is a network egress call. At cloud pricing — typically $0.08–$0.12 per GB for inter-region or internet egress — a 50 KB payload retried five times across 1,000 events costs $0.02–$0.03 in egress alone. Small individually, but at scale these numbers compound.
Fan-out multipliers. One inbound event routed to ten destinations generates ten delivery attempts, ten sets of retry state, and ten audit log entries. Cost-per-event attribution must account for route multiplicity, not just raw event count.
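The interaction between retry amplification and fan-out can be sketched as a simple multiplication. The numbers below are illustrative, not measurements from any real deployment:

```python
# Sketch of attempt amplification: retries and fan-out multiply raw
# event count into billable delivery attempts. Illustrative numbers only.

def total_attempts(events: int, destinations: int,
                   avg_attempts_per_delivery: float) -> float:
    """Each event fans out to every destination; each delivery may retry."""
    return events * destinations * avg_attempts_per_delivery

# 1M events, one healthy destination (1.1 attempts per delivery)
baseline = total_attempts(1_000_000, 1, 1.1)

# Same events fanned out to 10 destinations, some flaky (avg 1.8 attempts)
fanned_out = total_attempts(1_000_000, 10, 1.8)

print(f"baseline: {baseline:,.0f} attempts")
print(f"fan-out:  {fanned_out:,.0f} attempts ({fanned_out / baseline:.1f}x)")
```

The same throughput of inbound events produces wildly different attempt counts depending on route multiplicity and destination health, which is why attempt count, not event count, is the right costing base.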
Building a Cost Model
Start with these six components. Measure each separately before combining them.
| Component | What drives it | How to measure |
|---|---|---|
| Compute (ingest) | Events received per second | CPU core-hours ÷ ingest throughput |
| Compute (delivery worker) | Delivery attempts per second (not events) | CPU core-hours ÷ attempt throughput |
| Database (storage) | Events stored × retention period | Storage GB × hourly rate |
| Database (queue ops) | FOR UPDATE SKIP LOCKED poll frequency | Query count × avg query duration |
| Network egress | Delivery attempt payload size × attempt count | GB egressed × egress rate |
| DLQ storage | Failed events × retention period | Storage GB × hourly rate |
The key distinction: cost the delivery worker on attempts, not events. An event with three delivery attempts to two destinations generates six attempts total. If your worker costs $0.002 of compute per attempt, an event that generates six attempts costs $0.012 in compute before you factor in storage or egress.
Here is a simplified SQL query you can run against your events and delivery_attempts tables to get attempt-to-event ratios per customer:

```sql
SELECT
    e.account_id,
    COUNT(DISTINCT e.id) AS total_events,
    COUNT(da.id) AS total_attempts,
    ROUND(COUNT(da.id)::numeric / NULLIF(COUNT(DISTINCT e.id), 0), 2)
        AS attempts_per_event,
    COUNT(da.id) FILTER (WHERE da.outcome = 'success') AS successful_attempts,
    COUNT(da.id) FILTER (WHERE da.outcome IN ('timeout', 'network_error', 'http_5xx'))
        AS failed_attempts
FROM events e
LEFT JOIN delivery_attempts da ON da.event_id = e.id
WHERE e.created_at >= NOW() - INTERVAL '30 days'
GROUP BY e.account_id
ORDER BY total_attempts DESC;
```

Run this monthly. Accounts with attempts_per_event above 2.0 are your cost outliers — either their destinations are unhealthy, their event volume is unusually bursty, or your retry policy is misconfigured for their usage pattern.
The Retry Tax
Retry behavior is the largest variable cost in webhook delivery. Your retry schedule determines how many attempts a failing event accumulates before reaching dead letter status.
A typical exponential backoff schedule — 0s, 30s, 2m, 10m, 1h — means an event that never delivers generates five attempts spread over roughly 72 minutes. For a destination that is down for six hours, every event ingested during that window exhausts the full retry schedule. If you receive 10,000 events during a six-hour outage and your cost per attempt is $0.001, that's $50 in retry cost from one destination outage — before accounting for storage and egress.
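The outage arithmetic above can be packaged as a small calculator. The schedule and per-attempt rate mirror the example figures; substitute your own:

```python
# Back-of-envelope retry-tax calculator for a destination outage.
# Schedule and cost-per-attempt are the illustrative figures from the
# text, not universal constants.

RETRY_SCHEDULE_SECONDS = [0, 30, 120, 600, 3600]  # 0s, 30s, 2m, 10m, 1h
COST_PER_ATTEMPT = 0.001                          # dollars, illustrative

def outage_retry_cost(events_during_outage: int) -> float:
    """Every event ingested during the outage burns the full schedule."""
    attempts_per_event = len(RETRY_SCHEDULE_SECONDS)
    return events_during_outage * attempts_per_event * COST_PER_ATTEMPT

schedule_span_min = sum(RETRY_SCHEDULE_SECONDS) / 60   # ~72.5 minutes
print(f"schedule spans ~{schedule_span_min:.1f} minutes")
print(f"6h outage, 10,000 events: ${outage_retry_cost(10_000):.2f}")
```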
Three levers reduce retry tax:
Circuit breaking. Stop retrying to a destination that has failed consistently for the past N attempts or the past M minutes. A destination in an open-circuit state accumulates zero retry cost. The tradeoff is that events may reach DLQ faster than they would with continued retrying — acceptable if you offer event replay.
Per-destination retry limits. Rather than a global retry policy, let customers configure retry aggressiveness. A customer who processes high-volume, low-value events (analytics pings) may prefer fast DLQ to prolonged retry expense. A customer processing payment confirmations wants aggressive retry.
Delivery attempt caps per time window. Instead of allowing five retries regardless of destination state, cap the total attempts a single destination can generate in a rolling hour. This prevents one unhealthy destination from consuming disproportionate worker capacity.
Egress: The Cost That Sneaks Up on You
Egress billing varies significantly by provider and architecture:
| Traffic path | Typical cost |
|---|---|
| Same region, same VPC | Free or near-free |
| Same region, different VPC | $0.01–$0.02/GB |
| Cross-region (same cloud) | $0.02–$0.08/GB |
| Internet egress | $0.08–$0.12/GB |
| Internet egress (committed use) | $0.04–$0.06/GB |
If your webhook gateway delivers to customer endpoints on the public internet, every delivery attempt carries egress cost. For a 10 KB payload with a 3× retry multiplier, the egress per delivered event is roughly 30 KB. At $0.09/GB that's $0.0000027 — negligible per event but $2.70 per million events. At 100 million events per month, egress alone is $270 — enough to be worth optimizing.
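The per-event egress arithmetic generalizes to a one-line formula (decimal KB-to-GB conversion, matching how cloud providers bill):

```python
# Egress cost per delivered event, using the rates from the table above.
# All inputs are adjustable assumptions.

def egress_cost_per_event(payload_kb: float, retry_multiplier: float,
                          rate_per_gb: float) -> float:
    gb_per_event = payload_kb * retry_multiplier / 1_000_000  # KB -> GB
    return gb_per_event * rate_per_gb

per_event = egress_cost_per_event(payload_kb=10, retry_multiplier=3,
                                  rate_per_gb=0.09)
print(f"per event:          ${per_event:.7f}")
print(f"per million events: ${per_event * 1_000_000:.2f}")
```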
The most effective egress reduction is payload compression. Most webhook payloads are JSON, which compresses well. Adding Content-Encoding: gzip to outbound delivery attempts reduces typical payload size by 60–80%, cutting egress costs proportionally. Verify that the destination can handle compressed bodies before enabling — most modern frameworks do, but some legacy systems do not.
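A quick way to estimate your own savings is to gzip a representative payload. The payload below is synthetic; real ratios depend on your payload shape, but repetitive JSON typically lands in the 60–80% range:

```python
# Measure gzip savings on a synthetic JSON webhook payload.
# Field names and values are made up for illustration.

import gzip
import json

payload = json.dumps({
    "event": "order.updated",
    "items": [
        {"sku": f"SKU-{i:05d}", "status": "shipped", "qty": 1}
        for i in range(50)
    ],
}).encode("utf-8")

compressed = gzip.compress(payload)
savings = 1 - len(compressed) / len(payload)
print(f"raw: {len(payload)} B, gzip: {len(compressed)} B "
      f"({savings:.0%} smaller)")
```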
Storage Cost Attribution
Webhook event storage costs are driven by three variables: event count, payload size, and retention period.
The retention period is the lever most teams underutilize. If you store all events indefinitely, storage costs grow forever. Define a retention policy and enforce it:
```sql
-- Purge delivered events older than 90 days
DELETE FROM events
WHERE status = 'delivered'
  AND created_at < NOW() - INTERVAL '90 days';

-- Purge dead-letter events older than 180 days
DELETE FROM events
WHERE status = 'dead_letter'
  AND created_at < NOW() - INTERVAL '180 days';

-- Purge delivery attempts for purged events (cascade may handle this)
DELETE FROM delivery_attempts
WHERE event_id NOT IN (SELECT id FROM events);
```

Run these as scheduled jobs, not bulk deletes. Deleting millions of rows in a single transaction locks the table. Delete in batches of 1,000–10,000 rows with a short sleep between batches to avoid I/O saturation.
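A batched purge driver can be sketched as below. The SQL string assumes the events schema from this post, and `delete_batch` is injected so the loop itself can run without a live database:

```python
# Batched purge driver: delete in small chunks with a pause so the job
# never holds a long table lock or saturates I/O. PURGE_BATCH_SQL is
# illustrative; delete_batch is whatever executes it and returns rowcount.

import time

PURGE_BATCH_SQL = """
DELETE FROM events
WHERE id IN (
    SELECT id FROM events
    WHERE status = 'delivered'
      AND created_at < NOW() - INTERVAL '90 days'
    LIMIT {limit}
)
"""

def purge_in_batches(delete_batch, batch_size: int = 5_000,
                     pause_seconds: float = 0.5) -> int:
    """Run delete_batch(batch_size) until a batch comes back short."""
    total = 0
    while True:
        deleted = delete_batch(batch_size)
        total += deleted
        if deleted < batch_size:
            return total  # last batch was partial: purge complete
        time.sleep(pause_seconds)
```

Committing after each batch keeps transactions short; the sleep gives replication and vacuum a chance to keep up.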
DLQ events deserve separate retention logic. A dead-letter event has value for debugging — it represents a delivery failure that the customer may want to investigate and replay. Purging them too aggressively destroys that value. Purging them too conservatively inflates storage costs. A 180-day DLQ retention with optional customer-triggered purge is a reasonable default.
Surfacing Cost Attribution to Customers
If you operate a multi-tenant webhook platform — whether as a SaaS product or internal platform team serving multiple engineering teams — cost attribution enables better conversations.
Instead of absorbing all webhook infrastructure costs as platform overhead, surface per-account usage in terms that correlate to actual cost drivers:
- Total events received (30 days)
- Total delivery attempts (30 days)
- Delivery success rate
- Average attempts per event
- Total payload bytes delivered
You do not need to expose raw dollar figures. Surfacing these metrics lets customers understand the delivery health of their own destinations — and creates natural incentives to fix unhealthy endpoints that are generating retry cost for everyone.
GetHook's events dashboard surfaces delivery attempt counts alongside success rates, so customers can see at a glance whether a destination's retry count is unusually high. That visibility alone typically prompts customers to investigate and fix unhealthy endpoints before they generate significant retry accumulation.
A Practical Cost Per Event Calculation
Putting it together: here is a simplified model for a mid-scale deployment.
Assumptions:
- 5 million events per month
- 1.4 average attempts per event = 7 million total attempts
- Average payload: 8 KB
- Compute: 2 vCPUs at $0.048/vCPU-hour = $69/month
- Database: 100 GB at $0.115/GB-month = $11.50/month
- Egress: 7M × 8 KB = 56 GB × $0.09 = $5.04/month
- Storage: 5M events × 2 KB metadata = 10 GB × $0.115 = $1.15/month
Total: ~$86.69/month for 5 million events ≈ $0.0000173 per event
The 1.4× retry multiplier is the swing factor. If your retry multiplier climbs to 2.0 (from destination instability), attempts double to 10 million, compute rises to ~$98/month, egress to ~$7.20/month, and your cost-per-event increases by roughly 40%.
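The worked model above fits in a short function, which makes the retry-multiplier sensitivity easy to explore. The rates are the assumptions listed above; compute is treated as scaling linearly with attempts, which is a simplification:

```python
# The cost model from the text as a function of retry multiplier.
# Rates mirror the stated assumptions; compute is assumed to scale
# linearly with attempt count.

def monthly_cost(events: int, attempts_per_event: float,
                 payload_kb: float = 8.0) -> dict[str, float]:
    attempts = events * attempts_per_event
    compute = 69.0 * (attempts / 7_000_000)            # scales with attempts
    database = 11.50                                    # 100 GB working set
    egress = attempts * payload_kb / 1_000_000 * 0.09   # KB -> GB at $0.09/GB
    storage = events * 2 / 1_000_000 * 0.115            # 2 KB metadata/event
    total = compute + database + egress + storage
    return {"total": total, "per_event": total / events}

base = monthly_cost(5_000_000, 1.4)   # the 1.4x baseline above
worse = monthly_cost(5_000_000, 2.0)  # retry multiplier degrades to 2.0
print(f"1.4x: ${base['total']:.2f}/mo, ${base['per_event']:.7f}/event")
print(f"2.0x: ${worse['total']:.2f}/mo "
      f"(+{worse['total'] / base['total'] - 1:.0%})")
```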
Keeping retry multipliers low — through circuit breaking, destination health monitoring, and customer-side endpoint reliability — is where the leverage is.
Webhook infrastructure is predictable to cost if you measure the right things. The teams that get surprised by infrastructure bills are the ones treating event count as their only metric. Track attempts-per-event, watch egress, enforce retention policies, and circuit-break unhealthy destinations. Those four habits keep webhook unit economics stable as you scale.
If you want event-level delivery telemetry and per-destination attempt counts without building the instrumentation yourself, start with GetHook — the data you need to run this model is available out of the box.