Webhook consumers look like ordinary HTTP services, but they behave differently under load and during deployments. A standard API endpoint can safely return HTTP 503 when it isn't ready — the client retries. A webhook sender, depending on its configuration, may treat that 503 as a failed delivery, start an exponential backoff window, or — in the worst case — silently drop the event after exhausting retries.
Running a webhook consumer in Kubernetes without tuning it for these semantics will cause dropped events during rollouts. This post covers the configuration that matters: ingress setup, liveness vs. readiness probes, graceful shutdown, and rolling update strategy.
Expose Your Endpoint Correctly
There's nothing exotic about a webhook ingress — it's a standard HTTP route. But two details are easy to miss:
1. Preserve the Host header
Many providers include the destination hostname in their HMAC signature computation. If your ingress rewrites or strips the Host header, signature verification will fail for every request. With NGINX ingress, pass the original host explicitly:
```yaml
nginx.ingress.kubernetes.io/configuration-snippet: |
  proxy_set_header Host $http_host;
```
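On the consumer side, that header feeds straight into verification. Here is a minimal sketch, assuming a hypothetical provider that signs host + path + raw body with a shared secret and sends the hex digest in an X-Webhook-Signature header (the function and header name are placeholders; check your provider's docs for the real scheme). The point is the dependence on r.Host, which the ingress must pass through untouched:

```go
import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"net/http"
)

// verifySignature is a sketch for a hypothetical scheme that signs
// "<host><path><body>". Real providers differ, but the principle holds:
// if the ingress rewrites the Host header, r.Host no longer matches what
// the sender signed, and every delivery fails verification.
func verifySignature(r *http.Request, body, secret []byte) bool {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(r.Host)) // Host header as forwarded by the ingress
	mac.Write([]byte(r.URL.Path))
	mac.Write(body)
	expected := hex.EncodeToString(mac.Sum(nil))
	// Header name is a placeholder; providers use their own.
	return hmac.Equal([]byte(expected), []byte(r.Header.Get("X-Webhook-Signature")))
}
```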
2. Raise body size and read timeout limits

Batch webhooks — aggregated events from providers like Shopify or Stripe — can be several megabytes. Default NGINX limits (1m body size, 60s read timeout) will reject oversized payloads with 413 Request Entity Too Large, a failure that's easy to miss unless you're watching ingress logs. Set these on the webhook ingress specifically so you aren't relaxing limits globally:
```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "16m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "90"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "90"
```

Health Probes: The Most Misunderstood Part
Most teams copy a liveness probe from a tutorial and move on. For webhook consumers, the liveness/readiness distinction is load-bearing.
Liveness probe — answers: "is this process alive and worth keeping?" When liveness fails, Kubernetes kills and restarts the pod. A flapping liveness probe causes cascading restarts that drop in-flight events.
Readiness probe — answers: "should I route traffic to this pod right now?" When readiness fails, the pod is removed from the Service's endpoint list without being killed. Traffic stops arriving; the pod waits for its dependency to recover.
For webhook consumers, readiness is almost always the right primitive. Your consumer is "ready" when it can accept an event and durably enqueue it. If your database or internal queue is unhealthy, return 503 on /readyz — the sender retries against a healthy pod.
```go
// /healthz — is the process alive?
mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	_, _ = w.Write([]byte("ok"))
})

// /readyz — can we durably accept events right now?
mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
	defer cancel()
	if err := db.PingContext(ctx); err != nil {
		http.Error(w, "db unavailable", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
	_, _ = w.Write([]byte("ready"))
})
```

In your pod spec:
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2
```

Keep failureThreshold low on readiness — with a 5-second period and a threshold of 2, traffic diverts within roughly ten seconds of your connection pool going unhealthy, rather than after 30 seconds of 503s exhausting the sender's retry budget. Set timeoutSeconds above the /readyz handler's own 2-second dependency check so the kubelet (default timeout: 1 second) doesn't time the probe out before the handler can answer.
Graceful Shutdown
When Kubernetes terminates a pod, it sends SIGTERM and then waits terminationGracePeriodSeconds before issuing SIGKILL. The default grace period is 30 seconds, which is plenty of time to drain in-flight requests — but only if your application actually responds to SIGTERM.
Go's net/http server does nothing graceful on its own. If the process never catches SIGTERM, the Go runtime's default behavior is to exit immediately, cutting off in-flight requests; if it catches the signal but never calls Shutdown, the server keeps accepting new connections right up until it is killed. Either way, a rolling update produces connection resets mid-request.
```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:         ":8080",
		Handler:      buildRouter(),
		ReadTimeout:  30 * time.Second,
		WriteTimeout: 60 * time.Second,
	}

	// Serve in the background so main can block waiting for the signal.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// SIGTERM is what Kubernetes sends; SIGINT covers local Ctrl-C runs.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)
	<-stop

	log.Println("shutting down — draining in-flight requests")

	// 25s is deliberately shorter than terminationGracePeriodSeconds (30s below).
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown error: %v", err)
	}
	log.Println("shutdown complete")
}
```

Set terminationGracePeriodSeconds a few seconds longer than your application's shutdown timeout to avoid races:
```yaml
spec:
  terminationGracePeriodSeconds: 30 # app drains in 25s; K8s allows 30s
```

Rolling Update Strategy
The default RollingUpdate strategy is the right choice, but the default maxUnavailable: 25% is often too aggressive for webhook consumers. With a 4-replica deployment it lets a rollout drop you to 3 pods — and if the replacement pods are still cold-starting or running migrations, effective capacity is even lower exactly when retrying senders are adding load.
Use maxUnavailable: 0 combined with maxSurge: 1. Kubernetes will spin up a new pod before removing an old one, so you never dip below your target replica count:
```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```

The trade-off is brief overcapacity (5 pods during the rollout) and slightly slower deployments. For webhook consumers, that's the correct trade-off.
Autoscaling Webhook Workloads
Webhook traffic is bursty. CPU utilization is a lagging indicator for I/O-bound consumers — by the time CPU rises, you're already dropping requests. Scale on metrics that reflect actual queue pressure:
| Metric | Typical Threshold | Notes |
|---|---|---|
| HTTP requests per second | 200–400 req/s per pod | Best general-purpose signal |
| p95 request latency | > 400ms | Early signal of consumer backpressure |
| Pod CPU utilization | 70% | Useful only if processing is CPU-bound |
| Queue depth (custom metric) | > 500 unprocessed | Best signal if you use an internal queue |
With the NGINX ingress controller, you can expose RPS via Prometheus and feed it into an HPA using the external metrics API. For most teams, starting with CPU-based HPA and refining to RPS after a few weeks of production data is the pragmatic approach — better than tuning without real traffic patterns.
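If you do move to an RPS or latency target, the consumer has to expose those numbers first. Below is a minimal sketch using the Prometheus Go client (github.com/prometheus/client_golang); the metric names, the instrument wrapper, and the registerRoutes/handleEvent names are illustrative, so align them with whatever your metrics adapter is configured to query for the HPA:

```go
import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Counter behind the "requests per second" signal (rate() over this series).
	webhookRequests = promauto.NewCounter(prometheus.CounterOpts{
		Name: "webhook_requests_total",
		Help: "Webhook deliveries received.",
	})
	// Histogram behind the p95 latency signal.
	webhookLatency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "webhook_request_duration_seconds",
		Help:    "Time spent handling a delivery.",
		Buckets: prometheus.DefBuckets,
	})
)

// instrument wraps a handler so request rate and latency become scrapeable.
func instrument(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		webhookRequests.Inc()
		next(w, r)
		webhookLatency.Observe(time.Since(start).Seconds())
	}
}

// Wire-up: metrics endpoint plus the instrumented webhook route.
func registerRoutes(mux *http.ServeMux, handleEvent http.HandlerFunc) {
	mux.Handle("/metrics", promhttp.Handler())
	mux.HandleFunc("/webhooks/events", instrument(handleEvent))
}
```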
A Minimal Production Deployment
Putting it all together:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webhook-consumer
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: webhook-consumer
  template:
    metadata:
      labels:
        app: webhook-consumer
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: consumer
          image: your-registry/webhook-consumer:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
```

One caveat on CPU limits: if your consumer is CPU-throttled during a burst, request latency rises and your readiness probe starts failing — removing healthy pods from the load balancer at the worst possible time. Monitor p95 latency against CPU throttle metrics (container_cpu_cfs_throttled_seconds_total) in your first few weeks to validate your limit settings.
Common Mistakes and Fixes
| Mistake | What Breaks | Fix |
|---|---|---|
| No readiness probe | Traffic sent to pods before DB connection is established | Add /readyz with a dependency ping |
| Liveness probe too aggressive | Cascading restarts under high load | Raise failureThreshold and periodSeconds |
| Missing SIGTERM handler | In-flight requests killed mid-write | Call http.Server.Shutdown on signal |
| maxUnavailable: 25% on small fleets | Rollouts dip below target capacity, triggering sender retries | Set maxUnavailable: 0, maxSurge: 1 |
| Default ingress body size limit | Large batch payloads rejected with 413 | Set proxy-body-size on webhook routes |
| CPU-only HPA | Scaling lags behind burst traffic | Add RPS or latency metric to HPA |
If you're routing inbound events through GetHook, delivery retries use exponential backoff with jitter — so a brief 503 during your rolling update doesn't burn through your retry budget before a healthy pod comes online. Connect your Kubernetes consumer to GetHook →