Most webhook delivery systems treat all events equally. An event lands in a queue, a worker picks it up, delivery is attempted. This works fine when your event volume is modest and all events have similar urgency. It breaks down the moment you're mixing time-critical events — payment failures, security alerts, SLA breach notifications — with high-volume background events like analytics summaries, audit log exports, or nightly digest payloads.
When both types share the same delivery queue, a burst of 10,000 analytics events can delay a payment failure alert by minutes. That's the wrong trade-off.
This post covers how to model webhook event priorities, implement priority-aware queuing on top of Postgres, and avoid the common pitfalls that make priority queues fragile in practice.
## The Problem with a Single Queue
A simple Postgres-backed webhook queue looks something like this:
```sql
SELECT id, destination_id, payload, next_attempt_at
FROM webhook_events
WHERE status = 'queued'
  AND next_attempt_at <= NOW()
ORDER BY next_attempt_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED;
```

This is a reasonable starting point. `FOR UPDATE SKIP LOCKED` gives you safe concurrent workers without Redis. Ordering by `next_attempt_at` ensures scheduled retries fire on time.
The problem: with this query, a wave of low-priority events all scheduled with next_attempt_at = NOW() will block high-priority events that arrive seconds later. The queue has no concept of urgency — only arrival order and scheduled time.
## Modeling Priority
The simplest model is a small integer priority field: 1 (critical), 2 (high), 3 (normal), 4 (bulk). Resist the urge to use more levels than you actually need. If you cannot clearly articulate which events belong in which tier, you do not need that tier.
A practical mapping for a B2B SaaS product:
| Priority | Level | Example event types |
|---|---|---|
| 1 | Critical | payment.failed, fraud.detected, subscription.cancelled, incident.triggered |
| 2 | High | payment.succeeded, user.signup, order.placed |
| 3 | Normal | record.updated, sync.completed, comment.added |
| 4 | Bulk | export.ready, digest.daily, report.generated |
Add the priority column to your events table:
```sql
ALTER TABLE webhook_events
  ADD COLUMN priority SMALLINT NOT NULL DEFAULT 3
  CHECK (priority BETWEEN 1 AND 4);

CREATE INDEX idx_webhook_events_priority_queue
  ON webhook_events (priority ASC, next_attempt_at ASC)
  WHERE status = 'queued';
```

The compound index on `(priority, next_attempt_at)` is key. It ensures the query planner can efficiently find the highest-priority ready events without a full table scan, and the partial `WHERE status = 'queued'` clause keeps the index small because delivered events drop out of it.
## Priority-Aware Queue Query
With the index in place, update the worker's fetch query:
```sql
SELECT id, destination_id, payload, priority, next_attempt_at
FROM webhook_events
WHERE status = 'queued'
  AND next_attempt_at <= NOW()
ORDER BY priority ASC, next_attempt_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED;
```

Primary sort by `priority ASC` means a priority-1 event always beats a priority-4 event. Secondary sort by `next_attempt_at ASC` ensures fairness within the same priority tier — events at the same priority level are delivered in order.
This single query change is often sufficient for moderate-volume systems. At 1,000–5,000 events per second, Postgres handles this comfortably without dedicated per-priority queues.
## Dedicated Workers per Priority Band
At higher throughput, a single shared worker pool with a priority-ordered query can still starve low-priority events during sustained high-priority bursts. An alternative: run separate worker pools per priority band.
```go
type WorkerPool struct {
	priority     int
	batchSize    int
	workers      int
	pollInterval time.Duration
}

func startWorkerPools(db *sql.DB, forwarder *Forwarder) {
	pools := []WorkerPool{
		{priority: 1, batchSize: 25, workers: 4, pollInterval: 100 * time.Millisecond},
		{priority: 2, batchSize: 20, workers: 4, pollInterval: 250 * time.Millisecond},
		{priority: 3, batchSize: 15, workers: 8, pollInterval: 500 * time.Millisecond},
		{priority: 4, batchSize: 10, workers: 2, pollInterval: 2 * time.Second},
	}
	for _, pool := range pools {
		for i := 0; i < pool.workers; i++ {
			go runWorker(db, forwarder, pool)
		}
	}
}

func runWorker(db *sql.DB, forwarder *Forwarder, pool WorkerPool) {
	for {
		events, err := fetchByPriority(db, pool.priority, pool.batchSize)
		if err != nil || len(events) == 0 {
			// Treat an error like an empty batch: sleep one poll interval
			// rather than hot-looping against the database.
			time.Sleep(pool.pollInterval)
			continue
		}
		for _, event := range events {
			forwarder.Deliver(event)
		}
	}
}
```

The fetch query for a dedicated pool targets only one priority level:
```sql
SELECT id, destination_id, payload, next_attempt_at
FROM webhook_events
WHERE status = 'queued'
  AND priority = $1
  AND next_attempt_at <= NOW()
ORDER BY next_attempt_at ASC
LIMIT $2
FOR UPDATE SKIP LOCKED;
```

Dedicated pools give you independent scaling. If bulk exports routinely spike, add workers to the priority-4 pool without touching the critical-alert pool. You can also tune poll intervals independently — critical events poll every 100 ms for low latency; bulk events poll every 2 seconds to reduce database pressure.
## Setting Priority at Ingestion Time
Priority assignment should happen at ingest, not at the worker. The priority must exist as a column value before the fetch query runs, because both the partial index and any per-priority worker pools depend on it; deferring classification to delivery time would mean re-evaluating every candidate row on every poll.
There are two common patterns for assigning priority:
1. Route-based priority: Define priority at the route level. All events flowing through a given source-to-destination route inherit a configured priority.
```json
{
  "route_id": "rte_abc123",
  "source_id": "src_payments",
  "destination_id": "dst_pagerduty",
  "event_type_pattern": "payment.failed",
  "priority": 1
}
```

2. Payload-based priority: Inspect the event at ingest and derive priority from a field value such as the event type. This is more flexible but requires a small classification step in your ingest handler.
```go
func classifyPriority(eventType string) int {
	switch {
	case eventType == "payment.failed",
		eventType == "subscription.cancelled",
		strings.HasPrefix(eventType, "fraud."),
		strings.HasPrefix(eventType, "incident."):
		return 1
	case strings.HasPrefix(eventType, "payment."),
		strings.HasPrefix(eventType, "subscription."):
		return 2
	case strings.HasPrefix(eventType, "export."),
		strings.HasPrefix(eventType, "digest."),
		strings.HasPrefix(eventType, "report."),
		strings.HasSuffix(eventType, ".daily"):
		return 4
	default:
		return 3
	}
}
```

Route-based priority is simpler to operate — you configure it once and it's visible in your dashboard. Payload-based classification is better when the same route carries both urgent and non-urgent events and you cannot split them at the source.
## Avoiding Priority Starvation
Pure priority ordering risks starvation: if priority-1 events arrive at a steady rate, priority-4 events never get delivered. Two mitigations:
**Age-based promotion.** If a lower-priority event has been waiting longer than a threshold, promote it:

```sql
UPDATE webhook_events
SET priority = priority - 1
WHERE status = 'queued'
  AND priority > 1
  AND created_at < NOW() - INTERVAL '30 minutes' * (5 - priority);
```

Run this as a periodic maintenance query (every few minutes). Scaling the threshold with the current tier means each run promotes an event at most one level: a bulk event waiting 30 minutes gets one priority tier bump, after 60 minutes another, and so on. (A flat 30-minute cutoff would keep re-promoting the same event on every run until it hit priority 1.) This guarantees eventual delivery for every event while keeping urgent events fast.
**Dedicated bulk worker.** Keep at least one or two workers exclusively for bulk (priority-4) events. Even during a priority-1 burst, those workers continue draining the bulk queue at a floor rate.
## Observability
Priority queues create new failure modes: silent priority-4 starvation is easy to miss. Add queue depth metrics broken down by priority:
```sql
SELECT
  priority,
  status,
  COUNT(*) AS count,
  MIN(created_at) AS oldest_event
FROM webhook_events
WHERE status IN ('queued', 'retry_scheduled')
GROUP BY priority, status
ORDER BY priority, status;
```

Alert on:
- Priority-1 queue depth > 10 (immediate page)
- Priority-4 oldest event age > 2 hours (warning — starvation risk)
- Priority-3 queue depth > 5,000 (investigate throughput)
In GetHook's delivery metrics, each event carries its priority through the full delivery lifecycle — from ingest to delivery attempt — so you can trace latency percentiles broken down by priority tier in your observability dashboard.
## Retry Priority Inheritance
When a delivery attempt fails and the event is rescheduled, it should keep its original priority — not reset to the default. This sounds obvious, but it is easy to break in practice. A common mistake is using an ORM or insert helper that sets priority = DEFAULT on the retry-scheduled update.
Make the retry update explicit:
```sql
UPDATE webhook_events
SET
  status = 'retry_scheduled',
  next_attempt_at = NOW() + $1,
  attempts_count = attempts_count + 1
  -- priority is NOT changed
WHERE id = $2;
```

A failed payment alert on its fourth retry attempt is still a critical event. Resetting it to normal priority on retry would defeat the entire system.
## When You Don't Need Priority Queues
Not every webhook system needs this. If your event volume is under a few hundred per second and your event types have roughly similar urgency, a single ordered queue with FOR UPDATE SKIP LOCKED is simpler and easier to operate. Add priority queuing when you have at least two of:
- A meaningful mix of time-critical and background event types
- Volume bursts on low-priority events (exports, digests, batch syncs)
- Customer SLAs tied to specific event types (e.g., "payment failure notifications within 30 seconds")
The compound index, the priority column, and the age-based promotion query add maybe two hours of implementation work. The return — guaranteed low latency for your most critical event types, regardless of bulk event volume — is worth it at any meaningful production scale.
If you want to see priority-aware delivery in action without building the queue yourself, GetHook supports per-route priority configuration out of the box. Configure a route, set its priority tier, and your critical events get their own fast lane.