Writing a webhook consumer looks deceptively simple: expose an HTTP endpoint, parse the JSON, do something with it. The naive version works fine in development and starts causing problems in production — usually when a traffic spike hits, when a provider retries an event your handler already processed, or when a slow database pushes you past the provider's 30-second timeout.
This post walks through the patterns that separate a webhook handler that works from one that's production-ready: signature verification you can actually trust, async processing that never blocks the HTTP response, idempotency that survives retries, and structured logging that makes debugging tractable.
The examples are in Go, using only the standard library. None of this requires a framework.
## Start with Signature Verification
Every inbound webhook handler must verify the request signature before processing the payload. Signature verification is not optional polish — it's the mechanism that prevents an attacker from POSTing fabricated events to your endpoint.
Most providers use HMAC-SHA256 with a Stripe-compatible format: a header containing a Unix timestamp and a hex-encoded signature, like `t=1714561200,v1=abc123...`. The timestamp is included to prevent replay attacks.
```go
package webhook

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
	"time"
)

const (
	signatureHeader  = "X-Webhook-Signature"
	maxBodyBytes     = 1 << 20 // 1 MB
	replayWindowSecs = 300     // reject timestamps older than 5 minutes
)

var (
	ErrMissingSignature = errors.New("missing signature header")
	ErrInvalidSignature = errors.New("invalid signature")
	ErrTimestampTooOld  = errors.New("timestamp outside replay window")
)

// VerifySignature reads the body, verifies the HMAC, and returns the raw body
// bytes so callers don't need to re-read the (already consumed) request body.
func VerifySignature(r *http.Request, secret string) ([]byte, error) {
	sigHeader := r.Header.Get(signatureHeader)
	if sigHeader == "" {
		return nil, ErrMissingSignature
	}
	var ts, v1 string
	for _, part := range strings.Split(sigHeader, ",") {
		if strings.HasPrefix(part, "t=") {
			ts = strings.TrimPrefix(part, "t=")
		}
		if strings.HasPrefix(part, "v1=") {
			v1 = strings.TrimPrefix(part, "v1=")
		}
	}
	if ts == "" || v1 == "" {
		return nil, ErrInvalidSignature
	}
	unix, err := strconv.ParseInt(ts, 10, 64)
	if err != nil {
		return nil, ErrInvalidSignature
	}
	// Reject events outside the replay window.
	age := time.Now().Unix() - unix
	if age < 0 || age > replayWindowSecs {
		return nil, ErrTimestampTooOld
	}
	body, err := io.ReadAll(io.LimitReader(r.Body, maxBodyBytes))
	if err != nil {
		return nil, fmt.Errorf("reading body: %w", err)
	}
	// Reconstruct the signed payload: "<timestamp>.<body>"
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(ts + "." + string(body)))
	// Decode the claimed signature and compare with hmac.Equal, which is
	// constant-time and so avoids timing attacks.
	gotBytes, err := hex.DecodeString(v1)
	if err != nil || !hmac.Equal(mac.Sum(nil), gotBytes) {
		return nil, ErrInvalidSignature
	}
	return body, nil
}
```

Three details that matter here:
- `io.LimitReader` caps the body at 1 MB. Without this, a malicious sender can POST a 500 MB body and exhaust your process's memory.
- `hmac.Equal` does a constant-time comparison. A naive `expected == got` string comparison is vulnerable to timing attacks.
- The replay window check rejects events with timestamps older than 5 minutes. This prevents an attacker who captured a valid request from replaying it later.
## Never Process Synchronously
The most common mistake in webhook consumer design is doing real work inside the HTTP handler. Real work means: database writes, calls to downstream services, email sending, inventory updates. Any of these can be slow or fail.
When your handler does real work synchronously and takes more than the provider's timeout (commonly 10–30 seconds), the provider receives no response, marks the delivery as failed, and retries. Now your handler processes the same event again — and again. You've manufactured a retry loop not because of a transient failure, but because your handler was too slow.
The correct pattern: acknowledge immediately, process asynchronously.
```go
type OrderHandler struct {
	queue  chan<- orderEvent
	secret string
	log    *slog.Logger
}

type orderEvent struct {
	EventID string          `json:"event_id"`
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

func (h *OrderHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	body, err := VerifySignature(r, h.secret)
	if err != nil {
		h.log.Warn("signature verification failed",
			slog.String("error", err.Error()),
			slog.String("remote_addr", r.RemoteAddr),
		)
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	var evt orderEvent
	if err := json.Unmarshal(body, &evt); err != nil {
		h.log.Error("malformed payload", slog.String("error", err.Error()))
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	// Non-blocking send: if the queue is full, return 503 so the provider retries.
	select {
	case h.queue <- evt:
		h.log.Info("event enqueued",
			slog.String("event_id", evt.EventID),
			slog.String("type", evt.Type),
		)
		w.WriteHeader(http.StatusOK)
	default:
		h.log.Warn("queue full, returning 503", slog.String("event_id", evt.EventID))
		http.Error(w, "service unavailable", http.StatusServiceUnavailable)
	}
}
```

The handler's job is exactly: verify, parse, enqueue, respond. The `select` with a `default` branch is intentional — if the in-memory queue is at capacity, returning a 503 is better than blocking the HTTP goroutine (under sustained load, blocked handler goroutines pile up and eventually exhaust memory). The provider will retry the event after backoff.
For production, replace the `chan orderEvent` with a durable queue backed by Postgres or a message broker. In-memory channels don't survive restarts: any event in the channel when your process dies is lost.
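A minimal sketch of the Postgres-backed variant, assuming an illustrative `webhook_events` table (the schema and the `enqueueDurable` name are this post's assumptions, not code from above):

```go
// Illustrative schema (assumed):
// CREATE TABLE webhook_events (
//     event_id     TEXT PRIMARY KEY,
//     event_type   TEXT NOT NULL,
//     payload      JSONB NOT NULL,
//     received_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
//     processed_at TIMESTAMPTZ
// );

// enqueueDurable persists the event instead of sending it to a channel.
// A background worker polls for rows WHERE processed_at IS NULL.
func enqueueDurable(ctx context.Context, db *sql.DB, evt orderEvent) error {
	_, err := db.ExecContext(ctx,
		`INSERT INTO webhook_events (event_id, event_type, payload)
		 VALUES ($1, $2, $3)
		 ON CONFLICT (event_id) DO NOTHING`,
		evt.EventID, evt.Type, []byte(evt.Payload),
	)
	return err
}
```

A side benefit: the primary key makes the enqueue itself idempotent, which dovetails with the next section.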
## Idempotency Is Not Optional
Providers retry on any non-2xx response — including timeouts, network errors, and 5xx from your handler. Your consumer will receive the same event more than once. Design for it from the start.
The standard pattern is to track processed event IDs in a database table and skip duplicates:
```go
// idempotency table:
// CREATE TABLE processed_events (
//     event_id     TEXT PRIMARY KEY,
//     processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
// );

func processEvent(ctx context.Context, db *sql.DB, evt orderEvent) error {
	// Attempt to claim the event ID. With ON CONFLICT DO NOTHING, the INSERT
	// affects zero rows when the ID already exists, i.e., a duplicate.
	res, err := db.ExecContext(ctx,
		`INSERT INTO processed_events (event_id) VALUES ($1)
		 ON CONFLICT (event_id) DO NOTHING`,
		evt.EventID,
	)
	if err != nil {
		return fmt.Errorf("claiming event: %w", err)
	}
	n, err := res.RowsAffected()
	if err != nil {
		return fmt.Errorf("claiming event: %w", err)
	}
	if n == 0 {
		// Already processed; skip without erroring.
		return nil
	}
	return handleOrderEvent(ctx, db, evt)
}
```

The `ON CONFLICT DO NOTHING` approach works for simple cases. For operations that must be atomic — claim the event AND update the order record — wrap both in a transaction:
```go
func processEventTx(ctx context.Context, db *sql.DB, evt orderEvent) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once the transaction commits

	var inserted bool
	err = tx.QueryRowContext(ctx,
		`INSERT INTO processed_events (event_id) VALUES ($1)
		 ON CONFLICT (event_id) DO UPDATE SET event_id = EXCLUDED.event_id
		 RETURNING (xmax = 0) AS inserted`,
		evt.EventID,
	).Scan(&inserted)
	if err != nil {
		return fmt.Errorf("idempotency check: %w", err)
	}
	if !inserted {
		// Already processed; commit the no-op and return.
		return tx.Commit()
	}
	if err := updateOrderInTx(ctx, tx, evt); err != nil {
		return err
	}
	return tx.Commit()
}
```

The `(xmax = 0)` trick returns true when the row was freshly inserted (not conflicted), letting you distinguish new events from duplicates in a single round trip.
## Structured Logging for Debuggability
When a webhook fails to process correctly — wrong payload shape, a downstream service error, a duplicate you didn't expect — you need to be able to reconstruct what happened from logs. Structured logging with consistent fields makes this tractable.
Every log line from your webhook handler should include:
| Field | Why |
|---|---|
| `event_id` | Correlate across multiple log lines for the same event |
| `event_type` | Filter by what happened (`order.created` vs. `order.cancelled`) |
| `source` | Which provider or source sent this event |
| `attempt_number` | Distinguish first deliveries from retries |
| `latency_ms` | How long processing took; surface slow handlers before they time out |
| `outcome` | `success`, `duplicate`, `processing_error`, `invalid_signature` |
Using Go's `slog` package (available since Go 1.21):
```go
type worker struct {
	db  *sql.DB
	log *slog.Logger
}

func (w *worker) process(ctx context.Context, evt orderEvent) {
	start := time.Now()
	logger := w.log.With(
		slog.String("event_id", evt.EventID),
		slog.String("event_type", evt.Type),
	)
	err := processEventTx(ctx, w.db, evt)
	latency := time.Since(start).Milliseconds()
	if err != nil {
		logger.Error("processing failed",
			slog.String("outcome", "processing_error"),
			slog.Int64("latency_ms", latency),
			slog.String("error", err.Error()),
		)
		return
	}
	logger.Info("event processed",
		slog.String("outcome", "success"),
		slog.Int64("latency_ms", latency),
	)
}
```

With this structure, finding all processing errors for a specific event type over the last hour is a single log query — no grep-and-parse required.
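One prerequisite for those queries: `slog`'s default handler emits plain text. Construct the logger with the JSON handler so every attribute becomes a structured field (a sketch; the writer and level are your choice):

```go
// JSON output turns every slog attribute into a queryable field.
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
	Level: slog.LevelInfo,
}))
slog.SetDefault(logger)
```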
## Graceful Shutdown
When your process receives a SIGTERM (during a deploy or a scale-down), any events currently in your in-memory queue need to be flushed before the process exits. Without graceful shutdown, those events are simply dropped, and because your handler already returned 200 at enqueue time, the provider considers them delivered and will never retry them.
```go
func main() {
	db := openDB() // hypothetical helper: returns an initialized *sql.DB
	queue := make(chan orderEvent, 1000)
	handler := &OrderHandler{
		queue:  queue,
		secret: os.Getenv("WEBHOOK_SECRET"),
		log:    slog.Default(),
	}
	srv := &http.Server{
		Addr:    ":8080",
		Handler: handler,
	}

	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Give the worker its own context: the signal context is canceled on
		// SIGTERM, which would abort in-flight database work mid-drain.
		runWorker(context.Background(), queue, db)
	}()

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatal(err)
		}
	}()

	<-ctx.Done()

	// Stop accepting new requests.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	srv.Shutdown(shutdownCtx)

	// Close the queue channel and wait for the worker to drain it.
	close(queue)
	wg.Wait()
}
```

The worker's `runWorker` loop should range over the channel — when the channel is closed, the range exits after processing remaining items, giving you a clean drain.
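For completeness, here is one way to write that loop, reusing the `worker` type from the logging section; the exact wiring is a sketch, but the `for ... range` over the queue is the load-bearing part:

```go
// runWorker consumes events until the queue channel is closed and drained.
func runWorker(ctx context.Context, queue <-chan orderEvent, db *sql.DB) {
	w := &worker{db: db, log: slog.Default()}
	for evt := range queue {
		// After close(queue), this loop keeps running until every buffered
		// event has been handled, then exits: the clean drain described above.
		w.process(ctx, evt)
	}
}
```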
## Error Response Strategy
What status code to return from your webhook handler matters more than most engineers realize, because it directly controls provider retry behavior:
| Your response | Provider behavior |
|---|---|
| 2xx | Delivery considered successful; no retry |
| 4xx (except 429) | Delivery considered permanently failed; no retry (most providers) |
| 429 | Retry after backoff; respect `Retry-After` if present |
| 5xx | Retry with backoff |
| Timeout (no response) | Retry with backoff |
Return 400 for payloads that are structurally invalid — wrong schema, missing required fields. Retrying a malformed event won't fix it. Return 500 for transient failures — database connection errors, downstream service unavailable. These are worth retrying. Return 200 as soon as you've enqueued the event (not after processing), so the provider doesn't time out waiting for your processing to complete.
The one subtle case: if your idempotency check detects a duplicate, return 200, not 409. From the provider's perspective, the event was delivered successfully on the first attempt. A 4xx on a duplicate often confuses retry logic and can trigger alerts on the provider side.
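If you run the idempotency claim inline rather than in the worker, the mapping can be centralized. In this sketch, `ErrDuplicate` and `ErrMalformed` are hypothetical sentinels, not errors defined earlier in this post:

```go
// Hypothetical sentinels an inline claim/validation step might return.
var (
	ErrDuplicate = errors.New("event already processed")
	ErrMalformed = errors.New("payload failed validation")
)

func writeOutcome(w http.ResponseWriter, err error) {
	switch {
	case err == nil, errors.Is(err, ErrDuplicate):
		// Success and duplicate both acknowledge delivery: no retry wanted.
		w.WriteHeader(http.StatusOK)
	case errors.Is(err, ErrMalformed):
		// Retrying a structurally invalid payload won't fix it.
		http.Error(w, "bad request", http.StatusBadRequest)
	default:
		// Transient failure: a 500 asks the provider to retry with backoff.
		http.Error(w, "internal error", http.StatusInternalServerError)
	}
}
```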
## Putting It Together
A production-grade webhook consumer in Go has five components working together:
- Signature verification — before any parsing or processing, reject unauthenticated requests
- Body limit — cap at 1 MB (or your provider's documented max) to prevent memory exhaustion
- Async processing — acknowledge immediately, process out-of-band
- Idempotency — track processed event IDs; skip duplicates without erroring
- Graceful shutdown — drain the in-memory queue before the process exits
None of these are difficult to implement individually. The challenge is that they interact: async processing requires durable queuing to be reliable; idempotency requires a persistent store; graceful shutdown requires coordinating the HTTP server and the worker goroutine. Getting all five right together, on the first implementation, is the part that takes experience.
If you're building the sending side — exposing webhooks to your customers — GetHook handles delivery retries, dead-letter queuing, signing, and replay for you. The consumer patterns above apply to any webhook endpoint your team writes, regardless of which gateway sends the events.