
Webhook Signature Verification Performance: Avoiding the Crypto Bottleneck at Scale

HMAC-SHA256 verification is cheap per request but expensive in aggregate. Here's how high-throughput systems keep verification off the critical path without weakening security.

Finn Eriksson
Payments Engineer
April 5, 2026
9 min read

Signature verification is one of those things that feels free until it isn't. A single HMAC-SHA256 computation takes microseconds. At 50 events per second it's invisible. At 50,000 events per second — during a payment processor surge, a flash sale, or a mass notification broadcast — it's enough CPU to saturate a core and add measurable latency to every ingest request.

This post covers the mechanics of where HMAC verification overhead actually comes from, the implementation mistakes that make it worse, and the architectural patterns that keep verification fast at high throughput without cutting corners on security.


Why HMAC Verification Is Not Free

HMAC-SHA256 is a fast primitive. On modern hardware, a single core can compute roughly 500,000–700,000 HMAC-SHA256 operations per second on small payloads (under 1 KB). For larger payloads, throughput drops proportionally — the hash function processes data in 64-byte blocks, so a 64 KB payload takes approximately 64× longer than a 1 KB one.

In a typical webhook ingest path, you're not doing just one HMAC operation per request. You're:

  1. Parsing the Webhook-Signature header to extract the timestamp and one or more signature values
  2. Constructing the signed message string <timestamp>.<raw body> from the header timestamp and the raw request body
  3. Computing HMAC-SHA256 of that message with each active secret
  4. Comparing the result against each signature in the header using a constant-time comparison

For a destination with a single active secret, this is one hash and one comparison. For a destination mid-rotation with two active secrets (the overlap pattern), it's two hashes and up to four comparisons. The constant-time comparison itself — while essential for security — is slower than a naive == comparison because it never short-circuits.

The payload read is often the hidden cost. If you read the raw body into memory to verify the signature and then parse it again for processing, you're traversing potentially large payloads twice. At scale, this doubles your memory allocation and GC pressure on top of the CPU cost.


Mistakes That Make It Slower

Several common implementation patterns compound the baseline overhead:

| Mistake | Impact |
| --- | --- |
| Reading the body with io.ReadAll, then re-reading it for parsing | Double memory allocation; GC pressure at high RPS |
| Re-allocating the signed message string per request | Unnecessary heap allocation on every ingest |
| Querying the database for the signing secret per request | Network round-trip on the critical ingest path |
| Using strings.Compare or == for signature comparison | Vulnerable to timing attacks |
| Verifying the signature after JSON parsing | Wastes parse work on unauthenticated requests |

The ordering mistake is worth calling out specifically. Some implementations parse the JSON body first (to extract event metadata for routing) and then verify the signature. This means an attacker who sends malformed or malicious payloads forces your JSON parser to do work before the request is even authenticated. Always verify the signature first, with the raw body bytes, before you parse anything.


Reading the Body Once

The canonical Go pattern for single-pass body handling:

go
func (h *IngestHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // Enforce a size limit before reading anything
    r.Body = http.MaxBytesReader(w, r.Body, 2<<20) // 2 MB

    // Read the body exactly once; both verification and parsing reuse this slice
    body, err := io.ReadAll(r.Body)
    if err != nil {
        httpx.BadRequest(w, "failed to read request body")
        return
    }

    // Verify before any other processing
    sigHeader := r.Header.Get("Webhook-Signature")
    if err := h.verifier.Verify(body, sigHeader); err != nil {
        httpx.Unauthorized(w, "invalid signature")
        return
    }

    // Now parse — only for authenticated requests
    var event EventPayload
    if err := json.Unmarshal(body, &event); err != nil {
        httpx.BadRequest(w, "invalid JSON payload")
        return
    }

    // ... enqueue, respond
}

The body slice is passed to both Verify and json.Unmarshal — no second read from the network, no second allocation. If you're using a sync.Pool for the read buffer, you can avoid the heap allocation entirely on the common path.


Caching Signing Secrets In-Process

The most impactful optimization is almost always secret caching. If you query the database for the destination's signing secret on every ingest request, you're paying a database round-trip (typically 1–5 ms) on every event. At 1,000 RPS, that's 1,000 secret-lookup queries per second hitting your database before any real work happens.

The solution is an in-process cache with a short TTL:

go
type SecretCache struct {
    mu      sync.RWMutex
    entries map[string]cachedSecret
}

type cachedSecret struct {
    secrets   [][]byte // decoded, ready-to-use
    fetchedAt time.Time
}

const secretCacheTTL = 5 * time.Minute

func (c *SecretCache) Get(ctx context.Context, sourceToken string, store SecretStore) ([][]byte, error) {
    c.mu.RLock()
    entry, ok := c.entries[sourceToken]
    c.mu.RUnlock()

    if ok && time.Since(entry.fetchedAt) < secretCacheTTL {
        return entry.secrets, nil
    }

    // Cache miss or stale — fetch from store
    secrets, err := store.GetSecretsForSource(ctx, sourceToken)
    if err != nil {
        return nil, err
    }

    c.mu.Lock()
    c.entries[sourceToken] = cachedSecret{secrets: secrets, fetchedAt: time.Now()}
    c.mu.Unlock()

    return secrets, nil
}

A 5-minute TTL means a secret rotation propagates to all ingest nodes within 5 minutes — short enough for operational safety, long enough that cache pressure is negligible even at thousands of unique source tokens.

Two operational notes:

  1. When you rotate a signing secret, the overlap window (during which both old and new secrets are accepted) must be longer than the cache TTL. If your TTL is 5 minutes and your overlap window is 1 minute, nodes with a stale cache will reject events signed with the new secret before they learn about it.
  2. If a source is deleted or disabled, you need a way to invalidate its cache entry immediately. A simple approach: on cache miss, always re-fetch; on explicit source deletion, delete the cache entry via a background goroutine or pub/sub notification.

Constant-Time Comparison Is Non-Negotiable

Timing attacks against HMAC verification are real. If your comparison short-circuits on the first mismatched byte, an attacker can measure response latency to reconstruct the expected signature one byte at a time. At scale, with many requests, the statistical signal is clear enough to be exploitable.

Go's standard library provides the right tool:

go
import "crypto/hmac"

// Do NOT use this:
if computedSig == providedSig { ... }

// Use this instead:
if hmac.Equal([]byte(computedSig), []byte(providedSig)) { ... }

hmac.Equal always processes both slices to completion, regardless of where they differ, so comparison time depends on the slice length rather than the position of the first mismatch. (If the lengths differ, it returns false immediately; that is safe, because the expected signature length is public.)

One subtlety: hmac.Equal operates on raw byte slices, but signatures in headers are usually hex-encoded strings. You have two options: compare hex strings with subtle.ConstantTimeCompare, or decode both to bytes first and compare with hmac.Equal. The latter is preferable because hmac.Equal is specifically designed for this use case and is harder to misuse.

go
import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

func verifyHMAC(payload []byte, timestamp int64, providedHex string, secret []byte) bool {
    message := fmt.Sprintf("%d.%s", timestamp, payload)
    mac := hmac.New(sha256.New, secret)
    mac.Write([]byte(message))
    expected := mac.Sum(nil)

    provided, err := hex.DecodeString(providedHex)
    if err != nil {
        return false
    }

    return hmac.Equal(expected, provided)
}

Parallelizing Verification for Multi-Secret Rotation

During a secret rotation, you have two active secrets and potentially two signatures in the header. Verifying both in sequence is fine for most workloads. At extreme throughput, you can verify them in parallel:

go
func verifyAny(payload []byte, timestamp int64, sigs []string, secrets [][]byte) bool {
    type result struct{ ok bool }
    results := make(chan result, len(secrets))

    for _, secret := range secrets {
        s := secret // capture
        go func() {
            ok := false
            for _, sig := range sigs {
                if verifyHMAC(payload, timestamp, sig, s) {
                    ok = true
                    break
                }
            }
            results <- result{ok}
        }()
    }

    for range secrets {
        if r := <-results; r.ok {
            return true
        }
    }
    return false
}

In practice, this is only worth the goroutine overhead when you have more than two active secrets simultaneously (unusual outside of a migration window) or when payloads are large enough that each HMAC takes multiple milliseconds. For typical webhook payloads (under 64 KB) with two active secrets, sequential verification is fast enough.


Profiling Your Ingest Path

Before optimizing, measure. The Go profiler makes this straightforward in production:

go
import _ "net/http/pprof"

// In your health/debug server (not your public ingest server):
go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
}()

Then sample CPU during a load test:

bash
# Run a 30-second CPU profile while load testing
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"

In the flame graph, look for time spent in:

  • crypto/sha256 — the hash computation itself
  • io.ReadAll — body reads
  • encoding/json.Unmarshal — JSON parsing (should happen after verification)
  • database/sql — any secret lookups hitting the database

If crypto/sha256 is your top frame, you're either processing very large payloads or have a pathological number of active secrets per source. If database/sql is high, add the secret cache described above. If encoding/json appears before crypto/sha256 in the call stack, fix your verification ordering.


Putting It Together: A Verification Budget

A useful mental model for production capacity planning:

| Component | Typical latency contribution | Optimization lever |
| --- | --- | --- |
| Body read (1 KB payload) | ~10 µs | sync.Pool for read buffer |
| Secret cache lookup (hit) | ~1 µs | Keep TTL 1–5 min; warm on startup |
| Secret cache lookup (miss) | 1–5 ms | Pre-warm; keep miss rate < 1% |
| HMAC-SHA256 (1 KB payload) | ~3 µs | No optimization needed at this size |
| HMAC-SHA256 (64 KB payload) | ~200 µs | Consider payload size limits |
| Constant-time comparison | ~1 µs | Already optimal; do not skip |
| JSON parsing (1 KB payload) | ~20 µs | Parse after verification only |

For a 1 KB payload with a warm cache, total verification overhead is under 35 µs. At 10,000 ingest RPS, that's 350 ms of CPU time per second — one core handles it comfortably. For 64 KB payloads the budget is tighter: the larger body read, hash, and parse together approach 2 ms per event, meaning a single core saturates at around 500 RPS. Know your payload size distribution.

GetHook enforces a 2 MB ingest body limit by default, which bounds the worst-case verification time while accommodating any realistic webhook payload. Sources with consistently large payloads (above 100 KB) should be reviewed — most legitimate webhook events are far smaller.


Signature verification overhead is manageable at almost any scale if you avoid the common mistakes: reading the body twice, hitting the database per request, verifying after parsing, and using non-constant-time comparisons. With a warm secret cache and correct implementation, verification adds under 50 µs to your ingest path — cheap enough that you should never be tempted to skip it.

If you want an ingest layer with these optimizations already built in — including secret caching, constant-time HMAC verification, and configurable payload limits — start with GetHook and focus on your application logic instead.

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.