webhooks · observability · logging · infrastructure · debugging

Structured Logging for Your Webhook Delivery Pipeline

Unstructured logs are useless during a 2 AM incident. Here's how to emit actionable structured logs from every layer of your webhook pipeline — ingest, queue, delivery, and retry — so you can answer 'what happened to event X?' in under a minute.

Aleksa Vukovic
Developer Relations
April 16, 2026
9 min read

When a customer reports "our webhook didn't fire," your log output determines whether you can answer that in two minutes or two hours. Most teams treat webhook pipeline logging as an afterthought — a log.Printf("delivering event %s", id) scattered through the worker code. That gets you nowhere fast.

Structured logging means every log line is a machine-readable JSON object with a consistent schema. It means every log entry for event evt_01JQMR9KXV4B2P7TNWY63ZHFD carries the same event_id field you can filter on. It means your on-call engineer can write a single query and reconstruct exactly what happened to a specific event — without guessing at log formats or grepping through multi-gigabyte files.
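As a concrete illustration, here is what one such entry might look like on the wire. The field names follow the schema developed below; the specific values are invented for the example:

```json
{
  "ts": "2026-04-16T02:14:07.512Z",
  "level": "error",
  "layer": "deliver",
  "event_id": "evt_01JQMR9KXV4B2P7TNWY63ZHFD",
  "destination_id": "dst_abc123",
  "attempt_number": 3,
  "outcome": "http_5xx",
  "http_status": 503,
  "duration_ms": 1842,
  "msg": "delivery attempt"
}
```

Every key is a stable, queryable path; nothing needs to be parsed out of the message string.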

This post covers what fields belong in each layer of your webhook pipeline, how to structure them in Go, and what queries to build on top.


The Five Layers You Need to Log

A webhook delivery pipeline has distinct phases, and each phase can fail independently. Log at every boundary:

| Layer | What to log | Key fields |
|---|---|---|
| Ingest | Event received from provider or sender | event_id, source_id, event_type, payload_bytes, ingest_latency_ms |
| Validation | Signature check pass/fail | event_id, source_id, algorithm, result, failure_reason |
| Queue | Event enqueued for delivery | event_id, destination_id, attempt_number, scheduled_at |
| Delivery | HTTP request to destination | event_id, destination_id, attempt_number, http_status, duration_ms, outcome |
| Retry | Next attempt scheduled | event_id, destination_id, attempt_number, next_attempt_at, backoff_seconds |

Missing any of these means you'll have a gap in your trace when something goes wrong. Ingest and delivery are the two most critical; don't ship without both.


Your Base Log Structure

Start with a base event type that every log entry embeds. In Go, this looks like:

go
package logging

import (
    "context"
    "encoding/json"
    "fmt"
    "os"
    "time"
)

type WebhookLogEntry struct {
    Timestamp    time.Time `json:"ts"`
    Level        string    `json:"level"`
    Layer        string    `json:"layer"` // ingest|validate|queue|deliver|retry
    EventID      string    `json:"event_id,omitempty"`
    SourceID     string    `json:"source_id,omitempty"`
    DestID       string    `json:"destination_id,omitempty"`
    AccountID    string    `json:"account_id,omitempty"`
    EventType    string    `json:"event_type,omitempty"`
    Attempt      int       `json:"attempt_number,omitempty"`
    Outcome      string    `json:"outcome,omitempty"` // success|timeout|network_error|http_4xx|http_5xx
    HTTPStatus   int       `json:"http_status,omitempty"`
    DurationMS   int64     `json:"duration_ms,omitempty"`
    PayloadBytes int       `json:"payload_bytes,omitempty"` // size only, never the body
    Msg          string    `json:"msg"`
    Error        string    `json:"error,omitempty"`
}

// One encoder writing to stdout; Encode appends a newline after each entry.
var out = json.NewEncoder(os.Stdout)

func Emit(ctx context.Context, entry WebhookLogEntry) {
    entry.Timestamp = time.Now().UTC()
    // Encode the entry directly. Passing the marshaled string to another
    // JSON logger would double-encode every line and break field queries.
    if err := out.Encode(entry); err != nil {
        fmt.Fprintln(os.Stderr, "log emit failed:", err)
    }
}

Every field has a name you can filter on. event_id appears in logs from ingest all the way through final delivery — it's the thread you pull to reconstruct a complete delivery trace.

The layer field is underappreciated. When you're diagnosing a failure, knowing which layer emitted the log entry tells you immediately whether the problem is upstream of the queue, inside the delivery worker, or in the retry scheduler. Don't make your engineers parse message strings to figure that out.


Logging the Ingest Layer

When an event arrives at your ingest endpoint, emit a log entry before any processing:

go
func (h *IngestHandler) Handle(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    body, err := io.ReadAll(io.LimitReader(r.Body, maxBodyBytes))
    if err != nil {
        logging.Emit(r.Context(), logging.WebhookLogEntry{
            Layer: "ingest",
            Level: "error",
            Msg:   "failed to read request body",
            Error: err.Error(),
        })
        http.Error(w, "bad request", http.StatusBadRequest)
        return
    }

    eventID := generateEventID()
    sourceID, accountID := resolveSource(r) // source/account lookup elided

    logging.Emit(r.Context(), logging.WebhookLogEntry{
        Layer:      "ingest",
        Level:      "info",
        EventID:    eventID,
        SourceID:   sourceID,
        AccountID:  accountID,
        EventType:  r.Header.Get("X-Event-Type"),
        Msg:        "event received",
        DurationMS: time.Since(start).Milliseconds(), // ingest latency
    })

    // ... persist body and enqueue
}

Log event_id at the moment you assign it — this is the key you'll use to track everything downstream. If you log it only after persistence and the persistence fails, you lose the entry.

One field worth adding that many teams miss: payload_bytes. Knowing the size of the incoming payload is useful for capacity planning and for diagnosing issues where oversized payloads cause downstream failures. Don't log the payload body itself — that's a security and compliance problem — but the byte count is safe and useful.


Logging Signature Validation

Signature verification failures are common and important to distinguish from delivery failures. Log them separately with enough detail to tell a spoofed request from a misconfigured HMAC secret:

go
type ValidationResult struct {
    EventID       string
    SourceID      string
    Algorithm     string // "hmac-sha256"
    Result        string // "pass" | "fail"
    FailureReason string // "signature_mismatch" | "timestamp_expired" | "missing_header"
}

func logValidation(ctx context.Context, v ValidationResult) {
    level := "info"
    if v.Result == "fail" {
        level = "warn"
    }

    logging.Emit(ctx, logging.WebhookLogEntry{
        Layer:    "validate",
        Level:    level,
        EventID:  v.EventID,
        SourceID: v.SourceID,
        Outcome:  v.Result,
        Msg:      "signature validation " + v.Result,
        Error:    v.FailureReason,
    })
}

The distinction between signature_mismatch and timestamp_expired is operationally significant. A signature_mismatch from a new integration usually means the customer is computing the HMAC with the wrong secret. A timestamp_expired means their server clock is drifted, or they're replaying an old request. Different failures, different conversations with the customer.


Logging the Delivery Worker

The delivery layer is where most failures surface. You want a log entry for every HTTP attempt — not just failures:

go
func (w *Worker) deliver(ctx context.Context, event Event, dest Destination) DeliveryResult {
    start := time.Now()
    // Assumes httpClient is a *http.Client: Post takes a content type and an io.Reader.
    resp, err := w.httpClient.Post(dest.URL, "application/json", bytes.NewReader(event.Payload))
    durationMS := time.Since(start).Milliseconds()
    if resp != nil {
        defer resp.Body.Close() // always close so connections can be reused
    }

    outcome, httpStatus := classifyOutcome(resp, err)

    logging.Emit(ctx, logging.WebhookLogEntry{
        Layer:      "deliver",
        Level:      outcomeLogLevel(outcome),
        EventID:    event.ID,
        DestID:     dest.ID,
        AccountID:  event.AccountID,
        Attempt:    event.AttemptsCount + 1,
        Outcome:    outcome,
        HTTPStatus: httpStatus,
        DurationMS: durationMS,
        Msg:        "delivery attempt",
    })

    return DeliveryResult{Outcome: outcome, DurationMS: durationMS}
}

func outcomeLogLevel(outcome string) string {
    switch outcome {
    case "success":
        return "info"
    case "http_4xx":
        return "warn"
    case "http_5xx", "timeout", "network_error":
        return "error"
    default:
        return "info"
    }
}

Log level should reflect the outcome. A success is info. An http_5xx is error. This matters because your log aggregation system — Loki, Datadog, CloudWatch Logs Insights — typically lets you filter by level. An on-call engineer can immediately find all delivery errors without constructing complex queries.

The attempt_number field lets you distinguish "this event failed on first attempt and succeeded on retry" from "this event failed all five attempts and went to dead letter." Both are failures at the attempt level, but they're very different situations at the event level.


Logging Retry Scheduling

When an event is scheduled for retry, log the backoff decision:

go
func logRetryScheduled(ctx context.Context, event Event, nextAttemptAt time.Time) {
    backoffSeconds := int64(time.Until(nextAttemptAt).Seconds())

    logging.Emit(ctx, logging.WebhookLogEntry{
        Layer:     "retry",
        Level:     "info",
        EventID:   event.ID,
        DestID:    event.DestinationID,
        AccountID: event.AccountID,
        Attempt:   event.AttemptsCount,
        Msg:       "retry scheduled at " + nextAttemptAt.UTC().Format(time.RFC3339),
        // Reusing duration_ms for the backoff is a stopgap; a dedicated
        // backoff_seconds field keeps the schema honest.
        DurationMS: backoffSeconds * 1000,
    })
}
}

When an event reaches max attempts and moves to dead letter, emit a separate log entry at error level with outcome: dead_letter. This is the signal your alerting should fire on — not individual delivery failures, which are expected.


Queries You Should Be Able to Run

Once your logs are in an aggregation system, these queries should work out of the box:

sql
-- Full delivery trace for a single event
SELECT ts, layer, outcome, http_status, duration_ms, attempt_number, error
FROM webhook_logs
WHERE event_id = 'evt_01JQMR9KXV4B2P7TNWY63ZHFD'
ORDER BY ts ASC;

-- All delivery failures for a destination in the last hour
SELECT ts, event_id, attempt_number, http_status, outcome
FROM webhook_logs
WHERE layer = 'deliver'
  AND destination_id = 'dst_abc123'
  AND outcome != 'success'
  AND ts > now() - INTERVAL '1 hour'
ORDER BY ts DESC;

-- Dead letter rate by account over the last 24 hours
SELECT account_id, COUNT(*) AS dead_letter_count
FROM webhook_logs
WHERE outcome = 'dead_letter'
  AND ts > now() - INTERVAL '24 hours'
GROUP BY account_id
ORDER BY dead_letter_count DESC;

These queries are simple because the log schema is consistent. The layer, outcome, destination_id, and account_id fields are at known paths in every log entry — no string parsing, no regex, no guessing.

GetHook exposes this delivery history through the events API and dashboard, so you can trace any event from ingest to delivery without writing these queries yourself. But if you're building your own pipeline, structuring your logs this way gives you the same capability in your own log stack.


What Not to Log

Structured logging is not "log everything." A few rules:

  • Never log raw payloads. Webhook payloads frequently contain PII, payment data, or API keys. Log byte counts, not content.
  • Never log signing secrets or HMAC keys. Even in validation failure messages. Log the failure reason, not the expected vs. actual signature values.
  • Avoid logging at debug level in production by default. Set debug logs to require explicit opt-in (env var or per-account flag). Debug logs at volume will saturate your log aggregator and cost money.
  • Don't log ephemeral connection metadata (remote IP, TLS version) unless you're actively debugging a security incident. It adds noise and has retention compliance implications.

The goal is a log entry that answers "what happened to this event?" — not a complete audit trail of every internal state transition.


Tying It Together with a Trace ID

If you want to go beyond per-event tracing to cross-service correlation, add a trace_id to your base log structure. Generate it at ingest and propagate it through the queue via the event row. When the worker picks it up and logs a delivery attempt, the same trace_id appears.

go
// At ingest: mint the trace ID and store it on the event row
traceID := uuid.New().String()
event := Event{
    ID:      eventID,
    TraceID: traceID,
    // ...
}

// In the worker (assumes a TraceID string `json:"trace_id,omitempty"`
// field added to WebhookLogEntry)
logging.Emit(ctx, logging.WebhookLogEntry{
    EventID: event.ID,
    TraceID: event.TraceID,
    // ...
})

Once you have trace_id flowing through your logs, you can join them with application traces in OpenTelemetry if and when you add that layer. Structured logs and distributed traces are complementary — structured logs answer "what happened," traces answer "how long did each part take and where did latency come from."


Good webhook delivery logs are not a nice-to-have. They're the difference between a 5-minute incident response and a 3-hour war room. The schema above takes an afternoon to implement and pays back that time in full the first time a customer asks "why didn't my webhook fire?"

If you want a webhook gateway where delivery tracing is already built in — with per-event attempt history, outcome classification, and full payload metadata — start with GetHook.

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.