kubernetes · infrastructure · deployment · reliability · webhooks

Deploying Webhook Consumers in Kubernetes: Ingress, Probes, and Zero-Downtime Rollouts

Kubernetes introduces subtle failure modes for webhook consumers that don't exist in traditional deployments. Here's how to configure ingress, health probes, and rolling updates so your endpoint never drops an event.

Lena Hartmann
Infrastructure Engineer
April 28, 2026
9 min read

Webhook consumers look like ordinary HTTP services, but they behave differently under load and during deployments. A standard API endpoint can safely return HTTP 503 when it isn't ready — the client retries. A webhook sender, depending on its configuration, may treat that 503 as a failed delivery, start an exponential backoff window, or — in the worst case — silently drop the event after exhausting retries.

Running a webhook consumer in Kubernetes without tuning it for these semantics will cause dropped events during rollouts. This post covers the configuration that matters: ingress setup, liveness vs. readiness probes, graceful shutdown, and rolling update strategy.


Expose Your Endpoint Correctly

There's nothing exotic about a webhook ingress; it's a standard HTTP route. Two details are easy to miss, though:

1. Preserve the Host header

Many providers include the destination hostname in their HMAC signature computation. If your ingress rewrites or strips the Host header, signature verification will fail for every request. With NGINX ingress, pass the original host explicitly:

yaml
nginx.ingress.kubernetes.io/configuration-snippet: |
  proxy_set_header Host $http_host;
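
To make the failure mode concrete, here's a minimal verification sketch for a hypothetical provider that signs the method, host, path, and raw body. The signing scheme and the X-Webhook-Signature header name are illustrative, not any specific provider's:

go
package consumer

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "net/http"
)

// verifySignature checks an HMAC-SHA256 signature for a hypothetical
// provider that signs "<method>\n<host>\n<path>\n" followed by the raw
// request body. If the ingress rewrites the Host header, r.Host no longer
// matches what the provider signed, and every verification fails.
func verifySignature(r *http.Request, body, secret []byte) bool {
    mac := hmac.New(sha256.New, secret)
    fmt.Fprintf(mac, "%s\n%s\n%s\n", r.Method, r.Host, r.URL.Path)
    mac.Write(body)
    want := hex.EncodeToString(mac.Sum(nil))
    got := r.Header.Get("X-Webhook-Signature") // header name varies by provider
    return hmac.Equal([]byte(got), []byte(want))
}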

2. Raise body size and read timeout limits

Batch webhooks (aggregated events from providers like Shopify or Stripe) can be several megabytes. The NGINX ingress defaults (1m body size, 60s read timeout) will reject oversized payloads with 413 Request Entity Too Large before they ever reach your application, so nothing shows up in your logs. Set these limits on the webhook ingress specifically so you aren't relaxing them globally:

yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "16m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "90"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "90"
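
Putting both together, a minimal sketch of the webhook Ingress. The hostname, path, and service name are placeholders:

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webhook-consumer
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "16m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "90"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "90"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Host $http_host;
spec:
  ingressClassName: nginx
  rules:
    - host: hooks.example.com        # placeholder
      http:
        paths:
          - path: /webhooks
            pathType: Prefix
            backend:
              service:
                name: webhook-consumer
                port:
                  number: 8080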

Health Probes: The Most Misunderstood Part

Most teams copy a liveness probe from a tutorial and move on. For webhook consumers, the liveness/readiness distinction is load-bearing.

Liveness probe — answers: "is this process alive and worth keeping?" When liveness fails, Kubernetes kills and restarts the pod. A flapping liveness probe causes cascading restarts that drop in-flight events.

Readiness probe — answers: "should I route traffic to this pod right now?" When readiness fails, the pod is removed from the Service's endpoint list without being killed. Traffic stops arriving; the pod waits for its dependency to recover.

For webhook consumers, readiness is almost always the right primitive. Your consumer is "ready" when it can accept an event and durably enqueue it. If your database or internal queue is unhealthy, return 503 on /readyz — the sender retries against a healthy pod.

go
// Assumes mux is an *http.ServeMux and db is an *sql.DB (or any
// dependency you can cheaply ping).

// /healthz — is the process alive?
mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
    _, _ = w.Write([]byte("ok"))
})

// /readyz — can we durably accept events right now?
mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
    defer cancel()
    if err := db.PingContext(ctx); err != nil {
        http.Error(w, "db unavailable", http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
    _, _ = w.Write([]byte("ready"))
})

In your pod spec:

yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2

Keep failureThreshold low on readiness: with periodSeconds: 5 and failureThreshold: 2, traffic diverts within roughly ten seconds of a dependency failing, rather than after 30 seconds of 503s that exhaust the sender's retry budget.


Graceful Shutdown

When Kubernetes terminates a pod, it sends SIGTERM and then waits terminationGracePeriodSeconds before issuing SIGKILL. The default grace period is 30 seconds, which is plenty of time to drain in-flight requests — but only if your application actually responds to SIGTERM.

Go's net/http server does nothing special with SIGTERM. By default the signal terminates the process immediately, cutting off in-flight requests mid-write and surfacing as connection reset errors at the sender. You have to catch the signal and call Shutdown explicitly so the server stops accepting new connections and drains the requests already in progress.

go
package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    srv := &http.Server{
        Addr:         ":8080",
        Handler:      buildRouter(), // your mux: /healthz, /readyz, webhook route
        ReadTimeout:  30 * time.Second,
        WriteTimeout: 60 * time.Second,
    }

    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("server error: %v", err)
        }
    }()

    stop := make(chan os.Signal, 1)
    signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)
    <-stop

    log.Println("shutting down — draining in-flight requests")
    ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
    defer cancel()

    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("shutdown error: %v", err)
    }
    log.Println("shutdown complete")
}

Set terminationGracePeriodSeconds a few seconds longer than your application's shutdown timeout to avoid races:

yaml
spec:
  terminationGracePeriodSeconds: 30  # app drains in 25s; K8s allows 30s
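
One related race: Kubernetes removes a terminating pod from Service endpoints asynchronously with sending SIGTERM, so a pod can receive a few new connections just after the signal lands. A common mitigation, sketched below, is a short preStop sleep so deregistration completes before your server begins draining. This assumes a sleep binary in the container image:

yaml
spec:
  containers:
    - name: consumer
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]  # let endpoint removal propagate first

The grace-period clock starts when the preStop hook does, so a 5s sleep ahead of a 25s drain means bumping terminationGracePeriodSeconds to at least 35.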

Rolling Update Strategy

The default RollingUpdate strategy is the right choice, but maxUnavailable: 25% is often too aggressive for webhook consumers. With a 4-replica deployment, a rollout can drop you to 3 pods, and if the replacement pods are slow to become ready (cold starts, pending migrations), you're running at reduced capacity exactly when retrying senders are increasing load.

Use maxUnavailable: 0 combined with maxSurge: 1. Kubernetes will spin up a new pod before removing an old one, so you never dip below your target replica count:

yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

The trade-off is brief overcapacity (5 pods during the rollout) and slightly slower deployments. For webhook consumers, that's the correct trade-off.


Autoscaling Webhook Workloads

Webhook traffic is bursty. CPU utilization is a lagging indicator for I/O-bound consumers — by the time CPU rises, you're already dropping requests. Scale on metrics that reflect actual queue pressure:

| Metric | Typical Threshold | Notes |
| --- | --- | --- |
| HTTP requests per second | 200–400 req/s per pod | Best general-purpose signal |
| p95 request latency | > 400ms | Early signal of consumer backpressure |
| Pod CPU utilization | 70% | Useful only if processing is CPU-bound |
| Queue depth (custom metric) | > 500 unprocessed | Best signal if you use an internal queue |

With the NGINX ingress controller, you can expose RPS via Prometheus and feed it into an HPA using the external metrics API. For most teams, starting with CPU-based HPA and refining to RPS after a few weeks of production data is the pragmatic approach — better than tuning without real traffic patterns.
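
As a sketch of where that refinement ends up: an autoscaling/v2 HPA targeting roughly 300 req/s per pod. The metric name here is an assumption; it depends on how your Prometheus adapter exposes the ingress controller's request rate:

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webhook-consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webhook-consumer
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: External
      external:
        metric:
          name: nginx_ingress_requests_per_second  # name depends on adapter config
        target:
          type: AverageValue
          averageValue: "300"   # per-pod target: total RPS / replica count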


A Minimal Production Deployment

Putting it all together:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webhook-consumer
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: webhook-consumer
  template:
    metadata:
      labels:
        app: webhook-consumer
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: consumer
          image: your-registry/webhook-consumer:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 2
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

One caveat on CPU limits: if your consumer is CPU-throttled during a burst, request latency rises and your readiness probe starts failing — removing healthy pods from the load balancer at the worst possible time. Monitor p95 latency against CPU throttle metrics (container_cpu_cfs_throttled_seconds_total) in your first few weeks to validate your limit settings.
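
If you run the Prometheus operator, that check can be a standing alert. A sketch, with the threshold, duration, and pod selector as assumptions to tune:

yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: webhook-consumer-throttling
spec:
  groups:
    - name: webhook-consumer
      rules:
        - alert: WebhookConsumerCPUThrottled
          expr: |
            rate(container_cpu_cfs_throttled_seconds_total{pod=~"webhook-consumer-.*"}[5m]) > 0.5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "webhook-consumer pods are CPU throttled; readiness flaps likely"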


Common Mistakes and Fixes

| Mistake | What Breaks | Fix |
| --- | --- | --- |
| No readiness probe | Traffic sent to pods before the DB connection is established | Add /readyz with a dependency ping |
| Liveness probe too aggressive | Cascading restarts under high load | Raise failureThreshold and periodSeconds |
| Missing SIGTERM handler | In-flight requests killed mid-write | Call http.Server.Shutdown on signal |
| maxUnavailable: 25% on small fleets | Capacity dips during rollouts, triggering sender retries | Set maxUnavailable: 0, maxSurge: 1 |
| Default ingress body size limit | Large batch payloads rejected with 413 | Set proxy-body-size on webhook routes |
| CPU-only HPA | Scaling lags behind burst traffic | Add an RPS or latency metric to the HPA |

If you're routing inbound events through GetHook, delivery retries use exponential backoff with jitter — so a brief 503 during your rolling update doesn't burn through your retry budget before a healthy pod comes online. Connect your Kubernetes consumer to GetHook →

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.