If you're building a SaaS product, at some point your customers will ask: "Do you have webhooks?" They want to be notified when something happens in your system — a payment completes, a user is created, a document is processed.
Building the consumer side of webhooks (receiving and processing inbound events) is relatively straightforward. Building the provider side — being the service that sends reliable webhooks to potentially hundreds of customer endpoints — is an entirely different engineering challenge.
This post covers the architecture for building outbound webhook infrastructure that's reliable enough for production use.
What You're Actually Building
When your customers ask for webhooks, they're asking for:
- Event subscriptions: the ability to say "notify me when payment.succeeded happens"
- Reliable delivery: a guarantee that the event reaches their endpoint at least once
- Retry logic: automatic retries on failures
- Event history: view past events and redeliver them
- Security: signed payloads they can verify came from you
- Developer experience: documentation, testing tools, logs
The engineering work behind this list is substantial. Teams consistently underestimate it. Before you build, read our cost analysis post.
The Data Model
The core entities for an outbound webhook system:
-- Customer-configured webhook endpoints
CREATE TABLE webhook_endpoints (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id     UUID NOT NULL REFERENCES accounts(id),
    url            TEXT NOT NULL,
    description    TEXT,
    signing_secret TEXT NOT NULL, -- Encrypted with AES-256-GCM
    status         TEXT NOT NULL DEFAULT 'active', -- active, disabled
    created_at     TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Which event types each endpoint subscribes to
CREATE TABLE webhook_subscriptions (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    endpoint_id UUID NOT NULL REFERENCES webhook_endpoints(id),
    event_type  TEXT NOT NULL, -- e.g. 'payment.succeeded' or '*'
    UNIQUE(endpoint_id, event_type)
);

-- The events waiting to be (or already) delivered.
-- One row per (event, endpoint) pair, so each endpoint keeps its own retry state.
CREATE TABLE webhook_events (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id      UUID NOT NULL,
    endpoint_id     UUID NOT NULL REFERENCES webhook_endpoints(id),
    event_type      TEXT NOT NULL,
    payload         JSONB NOT NULL,
    status          TEXT NOT NULL DEFAULT 'pending',
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    next_attempt_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Delivery attempt log
CREATE TABLE webhook_deliveries (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id       UUID NOT NULL REFERENCES webhook_events(id),
    endpoint_id    UUID NOT NULL REFERENCES webhook_endpoints(id),
    attempt_number INT NOT NULL,
    http_status    INT,
    outcome        TEXT NOT NULL, -- success, timeout, http_4xx, http_5xx
    duration_ms    INT,
    attempted_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Publishing an Event
When something happens in your system that should trigger webhooks, call a PublishEvent function:
func (s *WebhookService) PublishEvent(
	ctx context.Context,
	accountID uuid.UUID,
	eventType string,
	payload interface{},
) error {
	// Serialize the event envelope once; every endpoint receives the same body
	data, err := json.Marshal(map[string]interface{}{
		"id":          "evt_" + newID(),
		"type":        eventType,
		"api_version": "2026-01",
		"created_at":  time.Now().UTC().Format(time.RFC3339),
		"data": map[string]interface{}{
			"object": payload,
		},
	})
	if err != nil {
		return err
	}

	// Find subscribed endpoints for this account + event type
	endpoints, err := s.findSubscribedEndpoints(ctx, accountID, eventType)
	if err != nil {
		return err
	}

	// Create a delivery job per endpoint inside one transaction (assuming s.db
	// is a *sql.DB), so a failure partway through does not leave only some
	// endpoints with a job.
	tx, err := s.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded
	for _, endpoint := range endpoints {
		_, err = tx.ExecContext(ctx, `
			INSERT INTO webhook_events
				(account_id, event_type, payload, endpoint_id, status, next_attempt_at)
			VALUES ($1, $2, $3, $4, 'pending', NOW())
		`, accountID, eventType, data, endpoint.ID)
		if err != nil {
			return err
		}
	}
	return tx.Commit()
}
This is the "fan-out on publish" pattern: one event creates one delivery job per subscribed endpoint. Each endpoint has independent retry state.
The Delivery Worker
The delivery worker polls for pending events and delivers them:
func (w *Worker) Run(ctx context.Context) {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			w.processBatch(ctx)
		}
	}
}

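func (w *Worker) processBatch(ctx context.Context) {
	// Claim a batch using FOR UPDATE SKIP LOCKED. The row locks only last until
	// this statement commits, so the same statement marks the rows 'delivering';
	// recordAttempt must later set them back to 'pending' or to a final status.
	rows, err := w.db.QueryContext(ctx, `
		UPDATE webhook_events
		SET status = 'delivering'
		WHERE id IN (
			SELECT id
			FROM webhook_events
			WHERE status = 'pending'
			  AND next_attempt_at <= NOW()
			ORDER BY next_attempt_at ASC
			LIMIT 50
			FOR UPDATE SKIP LOCKED
		)
		RETURNING id, endpoint_id, payload
	`)
	if err != nil {
		log.Printf("claim batch: %v", err)
		return
	}
	// Read the whole batch first so no connection is held during slow HTTP calls
	var batch []PendingEvent
	for rows.Next() {
		var event PendingEvent
		if err := rows.Scan(&event.ID, &event.EndpointID, &event.Payload); err != nil {
			log.Printf("scan event: %v", err)
			continue
		}
		batch = append(batch, event)
	}
	rows.Close()
	for _, event := range batch {
		w.deliver(ctx, event)
	}
}
FOR UPDATE SKIP LOCKED is the key: multiple worker instances can run concurrently without claiming the same event. Because the row locks are released as soon as the claiming statement commits, that statement also has to mark the rows as claimed. This is your horizontal scaling primitive.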
HMAC Signing on Delivery
Every outbound webhook must be signed. Generate the signature immediately before delivery:
func (w *Worker) signPayload(payload []byte, secret string, timestamp int64) string {
	signedContent := fmt.Sprintf("%d.%s", timestamp, string(payload))
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(signedContent))
	sig := hex.EncodeToString(mac.Sum(nil))
	return fmt.Sprintf("t=%d,v1=%s", timestamp, sig)
}
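On the receiving side, a customer would check this header before trusting the body. The following is a minimal sketch of that check, assuming the "t=<unix>,v1=<hex>" format produced by signPayload. The names sign, computeMAC and verifySignature, the example secret, and the 5-minute tolerance are illustrative assumptions, not part of the original design.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// computeMAC returns HMAC-SHA256 over "<timestamp>.<body>", the same content
// that signPayload signs.
func computeMAC(body []byte, secret string, timestamp int64) []byte {
	mac := hmac.New(sha256.New, []byte(secret))
	fmt.Fprintf(mac, "%d.%s", timestamp, body)
	return mac.Sum(nil)
}

// sign builds the "t=<unix>,v1=<hex>" header value (sender side).
func sign(body []byte, secret string, timestamp int64) string {
	sig := hex.EncodeToString(computeMAC(body, secret, timestamp))
	return fmt.Sprintf("t=%d,v1=%s", timestamp, sig)
}

// verifySignature checks a header against the raw request body (receiver side).
// nowUnix is the current Unix time in seconds. tolerance bounds the accepted
// clock skew and limits how long a captured request can be replayed.
func verifySignature(header string, body []byte, secret string, nowUnix int64, tolerance time.Duration) error {
	var timestamp int64
	var signature string
	for _, part := range strings.Split(header, ",") {
		key, value, _ := strings.Cut(part, "=")
		switch key {
		case "t":
			n, err := strconv.ParseInt(value, 10, 64)
			if err != nil {
				return errors.New("invalid timestamp")
			}
			timestamp = n
		case "v1":
			signature = value
		}
	}
	if timestamp == 0 || signature == "" {
		return errors.New("malformed signature header")
	}
	age := time.Duration(nowUnix-timestamp) * time.Second
	if age > tolerance || age < -tolerance {
		return errors.New("timestamp outside tolerance")
	}
	got, err := hex.DecodeString(signature)
	if err != nil {
		return errors.New("invalid signature encoding")
	}
	// hmac.Equal compares in constant time to avoid leaking timing information.
	if !hmac.Equal(got, computeMAC(body, secret, timestamp)) {
		return errors.New("signature mismatch")
	}
	return nil
}

func main() {
	body := []byte(`{"type":"payment.succeeded"}`)
	now := time.Now().Unix()
	header := sign(body, "whsec_example", now)
	fmt.Println(verifySignature(header, body, "whsec_example", now, 5*time.Minute))
}
```

The check must run against the raw request bytes: re-serializing parsed JSON can change whitespace or key order and break the comparison.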

// deliver signs the payload, POSTs it to the endpoint, and records the attempt.
func (w *Worker) deliver(ctx context.Context, event PendingEvent) {
	endpoint, err := w.getEndpoint(ctx, event.EndpointID)
	if err != nil {
		w.recordAttempt(ctx, event, nil, err)
		return
	}
	timestamp := time.Now().Unix()
	signature := w.signPayload(event.Payload, endpoint.SigningSecret, timestamp)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint.URL,
		bytes.NewReader(event.Payload))
	if err != nil {
		w.recordAttempt(ctx, event, nil, err)
		return
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-Webhook-Signature", signature)
	req.Header.Set("X-Webhook-Timestamp", strconv.FormatInt(timestamp, 10))
	req.Header.Set("X-Webhook-Id", event.ID.String())
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if resp != nil {
		defer resp.Body.Close() // resp is nil on timeouts and network errors
	}
	w.recordAttempt(ctx, event, resp, err)
}
Retry Policy Design
Your retry policy directly impacts how customers experience outages. The standard approach:
| Attempt | Delay | Cumulative Time |
|---|---|---|
| 1 | Immediate | 0s |
| 2 | 30s | 30s |
| 3 | 5 minutes | 5.5m |
| 4 | 30 minutes | 35.5m |
| 5 | 2 hours | 2h 35.5m |
| DLQ | — | — |
var retryDelays = []time.Duration{
	0,
	30 * time.Second,
	5 * time.Minute,
	30 * time.Minute,
	2 * time.Hour,
}

// nextAttemptAt returns when to try again, given how many attempts have been
// made so far. A nil result means the schedule is exhausted.
func nextAttemptAt(attemptNumber int) *time.Time {
	if attemptNumber >= len(retryDelays) {
		return nil // Move to DLQ
	}
	t := time.Now().Add(retryDelays[attemptNumber])
	return &t
}
Permanent vs. transient failures:
- 5xx responses and timeouts → retry (transient)
- 4xx responses → don't retry (permanent: the endpoint is rejecting your events)
- Exception: 429 Too Many Requests → retry after the delay given in the Retry-After header
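The classification above can be sketched as a single decision function. This is an illustration under assumptions: decideRetry is a hypothetical helper, a status of 0 stands for timeouts and network errors, only the seconds form of Retry-After is parsed, and taking the larger of Retry-After and the scheduled backoff is a design choice rather than something the policy above prescribes.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// retryDelays mirrors the schedule in the table above.
var retryDelays = []time.Duration{
	0,
	30 * time.Second,
	5 * time.Minute,
	30 * time.Minute,
	2 * time.Hour,
}

// decideRetry reports whether a failed attempt should be retried and how long
// to wait first. status is the HTTP status code (0 for timeouts and network
// errors), retryAfter is the raw Retry-After header value, and attempts is the
// number of attempts made so far.
func decideRetry(status int, retryAfter string, attempts int) (retry bool, delay time.Duration) {
	if attempts >= len(retryDelays) {
		return false, 0 // schedule exhausted: move to the DLQ
	}
	backoff := retryDelays[attempts]
	switch {
	case status == 429:
		// Honor Retry-After when it is a number of seconds. The HTTP-date
		// form is not parsed in this sketch and falls back to the schedule.
		if secs, err := strconv.Atoi(retryAfter); err == nil && secs > 0 {
			if d := time.Duration(secs) * time.Second; d > backoff {
				return true, d
			}
		}
		return true, backoff
	case status == 0 || status >= 500:
		return true, backoff // transient: timeout, network error, or 5xx
	case status >= 400:
		return false, 0 // permanent: the endpoint is rejecting the event
	default:
		return false, 0 // 2xx is a success; 3xx handling is left out here
	}
}

func main() {
	fmt.Println(decideRetry(503, "", 1))
	fmt.Println(decideRetry(429, "120", 1))
	fmt.Println(decideRetry(404, "", 1))
}
```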
Customer-Facing Dashboard Features
The delivery infrastructure is the backend. Customers also need a dashboard to:
View their endpoints:
- Endpoint URL, status (active/disabled/failing)
- Health indicator: success rate over the last 24h
- Subscribed event types
View event history:
- List of recent events with status (delivered, pending, failed)
- Searchable by event type, date range, status
- Per-event detail: all delivery attempts, response codes, response bodies
Replay events:
- Redeliver any historical event to its original endpoint
- Useful after deploying a bug fix: replay the events your bad code mishandled
Test webhooks:
- Send a synthetic test event to verify the endpoint is configured correctly
- Shows exactly what payload the customer should expect
Manage signing secrets:
- View current secret (masked), rotate secret, download public key if using asymmetric signing
Event Type Design
Name your event types carefully — they're forever. Use the resource.action convention:
payment.succeeded
payment.failed
payment.refunded
subscription.created
subscription.updated
subscription.cancelled
user.created
user.deleted
invoice.created
invoice.paid
invoice.overdue
Avoid:
- paymentSucceeded (camelCase: inconsistent with most platforms)
- payment_succeeded (underscores: harder to scan)
- new_payment (reversed order, action before resource: ambiguous)
- payment (no action: what happened?)
Support wildcard subscriptions (*) for customers who want all events, and category wildcards (payment.*) for customers who want all payment events.
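Matching a subscription pattern against an event type is a small function. The sketch below assumes the three pattern forms just described (exact name, `*`, and `resource.*`); matchesSubscription is a hypothetical name, and deeper patterns such as `*.created` are deliberately not supported.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSubscription reports whether a subscription pattern covers an event type.
func matchesSubscription(pattern, eventType string) bool {
	switch {
	case pattern == "*":
		return true // global wildcard: every event
	case strings.HasSuffix(pattern, ".*"):
		// Keep the trailing dot so "payment.*" matches "payment.succeeded"
		// but not "payments.created".
		return strings.HasPrefix(eventType, strings.TrimSuffix(pattern, "*"))
	default:
		return pattern == eventType
	}
}

func main() {
	fmt.Println(matchesSubscription("payment.*", "payment.succeeded"))
	fmt.Println(matchesSubscription("payment.*", "invoice.paid"))
	fmt.Println(matchesSubscription("*", "user.created"))
}
```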
Rate Limiting Outbound Delivery
If a customer endpoint suddenly starts responding slowly (rate limiting, degraded performance), you risk overwhelming it with retries.
Implement a circuit breaker per endpoint:
type EndpointCircuitBreaker struct {
	ConsecutiveFailures int
	LastFailureAt       time.Time
	State               string // closed, open, half-open
}

// If 5 consecutive failures in 5 minutes, open the circuit
// Try one request every 60s (half-open)
// If it succeeds, close the circuit
An open circuit pauses delivery to that endpoint, preventing retry storms during outages.
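One way to turn those three rules into code is sketched below. It keeps the struct fields shown above and adds a mutex plus two private timestamps, which are assumptions of this sketch, as are the method names Allow, RecordSuccess and RecordFailure. The current time is passed in so the behavior is easy to test.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	failureThreshold = 5               // consecutive failures before opening
	failureWindow    = 5 * time.Minute // the failures must fall within this window
	probeInterval    = 60 * time.Second
)

// EndpointCircuitBreaker tracks delivery health for one endpoint.
// The zero value behaves as a closed breaker.
type EndpointCircuitBreaker struct {
	mu                  sync.Mutex
	ConsecutiveFailures int
	LastFailureAt       time.Time
	State               string // closed, open, half-open

	streakStart time.Time // when the current run of failures began
	nextProbeAt time.Time // earliest time the next probe may be sent
}

// Allow reports whether a delivery may be attempted at time now.
func (b *EndpointCircuitBreaker) Allow(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.State != "open" && b.State != "half-open" {
		return true
	}
	if now.Before(b.nextProbeAt) {
		return false
	}
	// Let a single probe through and schedule the next one.
	b.State = "half-open"
	b.nextProbeAt = now.Add(probeInterval)
	return true
}

// RecordSuccess closes the circuit and clears the failure streak.
func (b *EndpointCircuitBreaker) RecordSuccess() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.State = "closed"
	b.ConsecutiveFailures = 0
}

// RecordFailure counts a failed delivery and opens the circuit when the
// threshold is reached or a half-open probe fails.
func (b *EndpointCircuitBreaker) RecordFailure(now time.Time) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.ConsecutiveFailures == 0 || now.Sub(b.streakStart) > failureWindow {
		b.streakStart = now // start a new streak
		b.ConsecutiveFailures = 0
	}
	b.ConsecutiveFailures++
	b.LastFailureAt = now
	if b.State == "half-open" || b.ConsecutiveFailures >= failureThreshold {
		b.State = "open"
		b.nextProbeAt = now.Add(probeInterval)
	}
}

func main() {
	b := &EndpointCircuitBreaker{}
	now := time.Now()
	for i := 0; i < failureThreshold; i++ {
		b.RecordFailure(now)
	}
	fmt.Println(b.State, b.Allow(now), b.Allow(now.Add(probeInterval)))
}
```

A worker would call Allow before each delivery and skip (or reschedule) the event when it returns false. With multiple worker processes the breaker state would need to live in shared storage rather than in memory.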
Using GetHook for Outbound Webhooks
Building outbound webhook infrastructure from scratch is substantial — the worker, retry logic, signing, dashboard, circuit breakers, and delivery logs together represent 4–6 weeks of focused engineering work.
GetHook provides this infrastructure via the outbound events API. Your application calls POST /v1/outbound-events to publish an event; GetHook handles delivery, retry, signing, and the delivery dashboard for your customers.
curl -X POST https://api.gethook.to/v1/outbound-events \
  -H "Authorization: Bearer hk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "event_type": "payment.succeeded",
    "destination_id": "dst_customer_endpoint_id",
    "payload": {
      "payment_id": "pay_123",
      "amount": 4999,
      "currency": "usd"
    }
  }'
Your application code focuses on publishing events. GetHook handles the rest.