When a customer reports "our webhook didn't fire," your log output determines whether you can answer that in two minutes or two hours. Most teams treat webhook pipeline logging as an afterthought — log.Printf("delivering event %s", id) calls scattered through the worker code. That gets you nowhere fast.
Structured logging means every log line is a machine-readable JSON object with a consistent schema. It means every log entry for event evt_01JQMR9KXV4B2P7TNWY63ZHFD shares an event_id field you can filter on. It means your on-call engineer can write a single query and reconstruct exactly what happened to a specific event — without guessing at log formats or grepping through multi-gigabyte files.
This post covers what fields belong in each layer of your webhook pipeline, how to structure them in Go, and what queries to build on top.
The Five Layers You Need to Log
A webhook delivery pipeline has distinct phases, and each phase can fail independently. Log at every boundary:
| Layer | What to log | Key fields |
|---|---|---|
| Ingest | Event received from provider or sender | event_id, source_id, event_type, payload_bytes, ingest_latency_ms |
| Validation | Signature check pass/fail | event_id, source_id, algorithm, result, failure_reason |
| Queue | Event enqueued for delivery | event_id, destination_id, attempt_number, scheduled_at |
| Delivery | HTTP request to destination | event_id, destination_id, attempt_number, http_status, duration_ms, outcome |
| Retry | Next attempt scheduled | event_id, destination_id, attempt_number, next_attempt_at, backoff_seconds |
Missing any of these means you'll have a gap in your trace when something goes wrong. Ingest and delivery are the two most critical; don't ship without both.
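One small habit that keeps this schema honest across services is defining the layer names once as shared constants instead of retyping string literals everywhere. A minimal sketch (the examples below use plain literals for brevity):

```go
// Layer values used in every log entry. Centralizing them prevents one service
// logging "deliver" while another logs "delivery", which would silently break
// layer-based queries.
const (
	LayerIngest   = "ingest"
	LayerValidate = "validate"
	LayerQueue    = "queue"
	LayerDeliver  = "deliver"
	LayerRetry    = "retry"
)
```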
Your Base Log Structure
Start with a base event type that every log entry embeds. In Go, this looks like:
```go
package logging

import (
	"context"
	"encoding/json"
	"os"
	"time"
)
type WebhookLogEntry struct {
	Timestamp      time.Time `json:"ts"`
	Level          string    `json:"level"`
	Layer          string    `json:"layer"` // ingest|validate|queue|deliver|retry
	EventID        string    `json:"event_id,omitempty"`
	SourceID       string    `json:"source_id,omitempty"`
	DestID         string    `json:"destination_id,omitempty"`
	AccountID      string    `json:"account_id,omitempty"`
	EventType      string    `json:"event_type,omitempty"`
	Attempt        int       `json:"attempt_number,omitempty"`
	Outcome        string    `json:"outcome,omitempty"` // success|timeout|network_error|http_4xx|http_5xx|dead_letter
	HTTPStatus     int       `json:"http_status,omitempty"`
	DurationMS     int64     `json:"duration_ms,omitempty"`
	BackoffSeconds int64     `json:"backoff_seconds,omitempty"`
	Msg            string    `json:"msg"`
	Error          string    `json:"error,omitempty"`
}
// Emit writes one JSON log line per entry. Encoding the struct directly keeps
// the output identical to the struct tags above; ctx is accepted so callers can
// thread request-scoped values (such as a trace ID) through later.
func Emit(ctx context.Context, entry WebhookLogEntry) {
	entry.Timestamp = time.Now().UTC()
	b, err := json.Marshal(entry)
	if err != nil {
		return
	}
	// One Write call per entry keeps concurrent workers from interleaving lines.
	os.Stdout.Write(append(b, '\n'))
}
```

Every field has a name you can filter on. event_id appears in logs from ingest all the way through final delivery — it's the thread you pull to reconstruct a complete delivery trace.
The layer field is underappreciated. When you're diagnosing a failure, knowing which layer emitted the log entry tells you immediately whether the problem is upstream of the queue, inside the delivery worker, or in the retry scheduler. Don't make your engineers parse message strings to figure that out.
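As a concrete example, here is what a single delivery-attempt entry emitted through the Emit helper above might look like (field values are illustrative):

```go
logging.Emit(ctx, logging.WebhookLogEntry{
	Layer:      "deliver",
	Level:      "error",
	EventID:    "evt_01JQMR9KXV4B2P7TNWY63ZHFD",
	DestID:     "dst_abc123",
	Attempt:    3,
	Outcome:    "http_5xx",
	HTTPStatus: 503,
	DurationMS: 1042,
	Msg:        "delivery attempt",
})
// Produces a single line like:
// {"ts":"2025-06-01T12:00:00Z","level":"error","layer":"deliver","event_id":"evt_01JQMR9KXV4B2P7TNWY63ZHFD","destination_id":"dst_abc123","attempt_number":3,"outcome":"http_5xx","http_status":503,"duration_ms":1042,"msg":"delivery attempt"}
```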
Logging the Ingest Layer
When an event arrives at your ingest endpoint, emit a log entry as soon as you've read the body and assigned an ID, before any persistence or enqueueing:
```go
func (h *IngestHandler) Handle(w http.ResponseWriter, r *http.Request) {
start := time.Now()
body, err := io.ReadAll(io.LimitReader(r.Body, maxBodyBytes))
if err != nil {
logging.Emit(r.Context(), logging.WebhookLogEntry{
Layer: "ingest",
Level: "error",
Msg: "failed to read request body",
Error: err.Error(),
})
http.Error(w, "bad request", http.StatusBadRequest)
return
}
eventID := generateEventID()
// sourceID and accountID are assumed to have been resolved earlier in the
// handler, e.g. from auth middleware or the request path.
logging.Emit(r.Context(), logging.WebhookLogEntry{
Layer: "ingest",
Level: "info",
EventID: eventID,
SourceID: sourceID,
AccountID: accountID,
EventType: r.Header.Get("X-Event-Type"),
Msg: "event received",
DurationMS: time.Since(start).Milliseconds(),
})
// ... persist and enqueue
}
```

Log event_id at the moment you assign it — this is the key you'll use to track everything downstream. If you log it only after persistence and the persistence fails, you lose the entry.
One field worth adding that many teams miss: payload_bytes. Knowing the size of the incoming payload is useful for capacity planning and for diagnosing issues where oversized payloads cause downstream failures. Don't log the payload body itself — that's a security and compliance problem — but the byte count is safe and useful.
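If you do adopt it, the change is small. A sketch, assuming you extend WebhookLogEntry with a payload_bytes field and set it from the body you already read at ingest:

```go
// Added to WebhookLogEntry:
//	PayloadBytes int `json:"payload_bytes,omitempty"`

// Set in the ingest handler, right after reading the size-limited body:
logging.Emit(r.Context(), logging.WebhookLogEntry{
	Layer:        "ingest",
	Level:        "info",
	EventID:      eventID,
	PayloadBytes: len(body), // byte count only, never the payload content
	Msg:          "event received",
})
```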
Logging Signature Validation
Signature verification failures are common and important to distinguish from delivery failures. Log them separately with enough detail to tell a spoofed request from a misconfigured HMAC secret:
```go
type ValidationResult struct {
EventID string
SourceID string
Algorithm string // "hmac-sha256"
Result string // "pass" | "fail"
FailureReason string // "signature_mismatch" | "timestamp_expired" | "missing_header"
}
func logValidation(ctx context.Context, v ValidationResult) {
level := "info"
if v.Result == "fail" {
level = "warn"
}
logging.Emit(ctx, logging.WebhookLogEntry{
Layer: "validate",
Level: level,
EventID: v.EventID,
SourceID: v.SourceID,
Outcome: v.Result,
Msg: "signature validation " + v.Result,
Error: v.FailureReason,
})
}
```

The distinction between signature_mismatch and timestamp_expired is operationally significant. A signature_mismatch from a new integration usually means the customer is computing the HMAC with the wrong secret. A timestamp_expired means their server clock has drifted, or they're replaying an old request. Different failures, different conversations with the customer.
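None of the verification logic is shown above, so here is a rough sketch of how a verifier might populate ValidationResult. It assumes an HMAC-SHA256 scheme where the sender signs "<timestamp>.<body>", the signature arrives hex-encoded alongside a Unix-timestamp header, and you enforce a five-minute replay window; the function name, parameters, and tolerance are illustrative, not any specific provider's scheme.

```go
import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strconv"
	"time"
)

func verifySignature(eventID, sourceID, sigHex, tsHeader string, body, secret []byte) ValidationResult {
	res := ValidationResult{EventID: eventID, SourceID: sourceID, Algorithm: "hmac-sha256", Result: "fail"}

	if sigHex == "" || tsHeader == "" {
		res.FailureReason = "missing_header"
		return res
	}

	// Reject timestamps outside the replay window (clock drift or replayed requests).
	ts, err := strconv.ParseInt(tsHeader, 10, 64)
	if drift := time.Since(time.Unix(ts, 0)); err != nil || drift > 5*time.Minute || drift < -5*time.Minute {
		res.FailureReason = "timestamp_expired"
		return res
	}

	// Recompute the HMAC over "<timestamp>.<body>" and compare in constant time.
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(tsHeader + "."))
	mac.Write(body)
	got, err := hex.DecodeString(sigHex)
	if err != nil || !hmac.Equal(got, mac.Sum(nil)) {
		res.FailureReason = "signature_mismatch"
		return res
	}

	res.Result = "pass"
	res.FailureReason = ""
	return res
}
```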
Logging the Delivery Worker
The delivery layer is where most failures surface. You want a log entry for every HTTP attempt — not just failures:
```go
func (w *Worker) deliver(ctx context.Context, event Event, dest Destination) DeliveryResult {
start := time.Now()
// Assumes httpClient is an *http.Client and event.Payload holds the raw request body.
resp, err := w.httpClient.Post(dest.URL, "application/json", bytes.NewReader(event.Payload))
durationMS := time.Since(start).Milliseconds()
if resp != nil {
	defer resp.Body.Close()
}
outcome, httpStatus := classifyOutcome(resp, err)
logging.Emit(ctx, logging.WebhookLogEntry{
Layer: "deliver",
Level: outcomeLogLevel(outcome),
EventID: event.ID,
DestID: dest.ID,
AccountID: event.AccountID,
Attempt: event.AttemptsCount + 1,
Outcome: outcome,
HTTPStatus: httpStatus,
DurationMS: durationMS,
Msg: "delivery attempt",
})
return DeliveryResult{Outcome: outcome, DurationMS: durationMS}
}
func outcomeLogLevel(outcome string) string {
switch outcome {
case "success":
return "info"
case "http_4xx":
return "warn"
case "http_5xx", "timeout", "network_error":
return "error"
default:
return "info"
}
}
```

Log level should reflect the outcome. A success is info. An http_5xx is error. This matters because your log aggregation system — Loki, Datadog, CloudWatch Logs Insights — typically lets you filter by level. An on-call engineer can immediately find all delivery errors without constructing complex queries.
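classifyOutcome is referenced in the worker but not defined above. A minimal version, assuming Go's net/http client and the outcome vocabulary from the base structure, might look like:

```go
import (
	"errors"
	"net"
	"net/http"
)

// classifyOutcome maps a response (or transport error) onto the outcome values
// used in the logs: success, timeout, network_error, http_4xx, http_5xx.
func classifyOutcome(resp *http.Response, err error) (outcome string, httpStatus int) {
	if err != nil {
		var netErr net.Error
		if errors.As(err, &netErr) && netErr.Timeout() {
			return "timeout", 0
		}
		return "network_error", 0
	}
	switch {
	case resp.StatusCode >= 500:
		return "http_5xx", resp.StatusCode
	case resp.StatusCode >= 400:
		return "http_4xx", resp.StatusCode
	default:
		// 2xx (and any followed redirects) count as a successful delivery.
		return "success", resp.StatusCode
	}
}
```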
The attempt_number field lets you distinguish "this event failed on first attempt and succeeded on retry" from "this event failed all five attempts and went to dead letter." Both are failures at the attempt level, but they're very different situations at the event level.
Logging Retry Scheduling
When an event is scheduled for retry, log the backoff decision:
```go
func logRetryScheduled(ctx context.Context, event Event, nextAttemptAt time.Time) {
backoffSeconds := int64(time.Until(nextAttemptAt).Seconds())
logging.Emit(ctx, logging.WebhookLogEntry{
Layer: "retry",
Level: "info",
EventID: event.ID,
DestID: event.DestinationID,
AccountID: event.AccountID,
Attempt: event.AttemptsCount,
Msg: "retry scheduled",
BackoffSeconds: backoffSeconds,
})
}
```

When an event reaches max attempts and moves to dead letter, emit a separate log entry at error level with outcome: dead_letter. This is the signal your alerting should fire on — not individual delivery failures, which are expected.
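That dead-letter entry might look like the following sketch (the helper name is illustrative; the outcome value matches the dead_letter convention used in the queries below):

```go
func logDeadLettered(ctx context.Context, event Event) {
	logging.Emit(ctx, logging.WebhookLogEntry{
		Layer:     "retry",
		Level:     "error",
		EventID:   event.ID,
		DestID:    event.DestinationID,
		AccountID: event.AccountID,
		Attempt:   event.AttemptsCount,
		Outcome:   "dead_letter",
		Msg:       "max attempts exhausted, event moved to dead letter",
	})
}
```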
Queries You Should Be Able to Run
Once your logs are in an aggregation system, these queries should work out of the box:
```sql
-- Full delivery trace for a single event
SELECT ts, layer, outcome, http_status, duration_ms, attempt_number, error
FROM webhook_logs
WHERE event_id = 'evt_01JQMR9KXV4B2P7TNWY63ZHFD'
ORDER BY ts ASC;
-- All delivery failures for a destination in the last hour
SELECT ts, event_id, attempt_number, http_status, outcome
FROM webhook_logs
WHERE layer = 'deliver'
AND destination_id = 'dst_abc123'
AND outcome != 'success'
AND ts > now() - INTERVAL '1 hour'
ORDER BY ts DESC;
-- Dead letter rate by account over the last 24 hours
SELECT account_id, COUNT(*) AS dead_letter_count
FROM webhook_logs
WHERE outcome = 'dead_letter'
AND ts > now() - INTERVAL '24 hours'
GROUP BY account_id
ORDER BY dead_letter_count DESC;
```

These queries are simple because the log schema is consistent. The layer, outcome, destination_id, and account_id fields are at known paths in every log entry — no string parsing, no regex, no guessing.
GetHook exposes this delivery history through the events API and dashboard, so you can trace any event from ingest to delivery without writing these queries yourself. But if you're building your own pipeline, structuring your logs this way gives you the same capability in your own log stack.
What Not to Log
Structured logging is not "log everything." A few rules:
- Never log raw payloads. Webhook payloads frequently contain PII, payment data, or API keys. Log byte counts, not content.
- Never log signing secrets or HMAC keys. Even in validation failure messages. Log the failure reason, not the expected vs. actual signature values.
- Avoid logging at debug level in production by default. Set debug logs to require explicit opt-in (env var or per-account flag; see the sketch after this list). Debug logs at volume will saturate your log aggregator and cost money.
- Don't log ephemeral connection metadata (remote IP, TLS version) unless you're actively debugging a security incident. It adds noise and has retention and compliance implications.
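A minimal way to make debug logging opt-in, assuming a plain environment variable toggle (the variable name is illustrative):

```go
// Debug entries are dropped unless the process is started with WEBHOOK_DEBUG_LOGS=1.
var debugEnabled = os.Getenv("WEBHOOK_DEBUG_LOGS") == "1"

// EmitDebug behaves like Emit but is a no-op unless debug logging is switched on.
func EmitDebug(ctx context.Context, entry WebhookLogEntry) {
	if !debugEnabled {
		return
	}
	entry.Level = "debug"
	Emit(ctx, entry)
}
```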
The goal is a log entry that answers "what happened to this event?" — not a complete audit trail of every internal state transition.
Tying It Together with a Trace ID
If you want to go beyond per-event tracing to cross-service correlation, add a trace_id to your base log structure. Generate it at ingest and propagate it through the queue via the event row. When the worker picks it up and logs a delivery attempt, the same trace_id appears.
```go
// At ingest
traceID := uuid.New().String()
event := Event{
ID: eventID,
TraceID: traceID,
// ...
}
// In the worker
logging.Emit(ctx, logging.WebhookLogEntry{
// ...
EventID: event.ID,
// Requires adding a TraceID field (`json:"trace_id"`) to WebhookLogEntry,
// set here from event.TraceID.
})
```

Once you have trace_id flowing through your logs, you can join them with application traces in OpenTelemetry if and when you add that layer. Structured logs and distributed traces are complementary — structured logs answer "what happened," traces answer "how long did each part take and where did latency come from."
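If your ingest service is already instrumented with OpenTelemetry, one option is to reuse the active span's trace ID instead of minting a separate UUID, so log lines and spans share an identifier. A sketch, falling back to a random ID when no span is present (the helper name is illustrative):

```go
import (
	"context"

	"github.com/google/uuid"
	"go.opentelemetry.io/otel/trace"
)

// traceIDFor returns the OpenTelemetry trace ID carried in ctx when one exists,
// and a freshly generated UUID otherwise.
func traceIDFor(ctx context.Context) string {
	if sc := trace.SpanContextFromContext(ctx); sc.HasTraceID() {
		return sc.TraceID().String()
	}
	return uuid.New().String()
}
```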
Good webhook delivery logs are not a nice-to-have. They're the difference between a 5-minute incident response and a 3-hour war room. The schema above takes an afternoon to implement and pays back that time in full the first time a customer asks "why didn't my webhook fire?"
If you want a webhook gateway where delivery tracing is already built in — with per-event attempt history, outcome classification, and full payload metadata — start with GetHook.