Signature verification is one of those things that feels free until it isn't. A single HMAC-SHA256 computation takes microseconds. At 50 events per second it's invisible. At 50,000 events per second — during a payment processor surge, a flash sale, or a mass notification broadcast — it's enough CPU to saturate a core and add measurable latency to every ingest request.
This post covers the mechanics of where HMAC verification overhead actually comes from, the implementation mistakes that make it worse, and the architectural patterns that keep verification fast at high throughput without cutting corners on security.
## Why HMAC Verification Is Not Free
HMAC-SHA256 is a fast primitive. On modern hardware, a single core can compute roughly 500,000–700,000 HMAC-SHA256 operations per second on small payloads (under 1 KB). For larger payloads, throughput drops proportionally — the hash function processes data in 64-byte blocks, so a 64 KB payload takes approximately 64× longer than a 1 KB one.
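Those figures are ballpark, and the honest way to get numbers for your own hardware is to benchmark. A minimal Go sketch (the payload size and secret are placeholders):

```go
package hmacbench

import (
	"crypto/hmac"
	"crypto/sha256"
	"testing"
)

// BenchmarkHMACSHA256 times one HMAC-SHA256 over a 1 KB payload.
// Run with: go test -bench=HMAC -benchmem
func BenchmarkHMACSHA256(b *testing.B) {
	secret := []byte("placeholder-signing-secret")
	payload := make([]byte, 1024) // vary this to see block-proportional scaling

	b.SetBytes(int64(len(payload))) // reports MB/s alongside ns/op
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		mac := hmac.New(sha256.New, secret)
		mac.Write(payload)
		_ = mac.Sum(nil)
	}
}
```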
In a typical webhook ingest path, you're not doing just one HMAC operation per request. You're:
- Parsing the `Webhook-Signature` header to extract the timestamp and one or more signature values
- Recomputing `t=<timestamp>.<raw body>` into a signed message string
- Computing HMAC-SHA256 of that message with each active secret
- Comparing the result against each signature in the header using a constant-time comparison
For a destination with a single active secret, this is one hash and one comparison. For a destination mid-rotation with two active secrets (the overlap pattern), it's two hashes and up to four comparisons. The constant-time comparison itself — while essential for security — is slower than a naive `==` comparison because it never short-circuits.
The payload read is often the hidden cost. If you read the raw body into memory to verify the signature and then parse it again for processing, you're traversing potentially large payloads twice. At scale, this doubles your memory allocation and GC pressure on top of the CPU cost.
## Mistakes That Make It Slower
Several common implementation patterns compound the baseline overhead:
| Mistake | Impact |
|---|---|
| Reading the body with `io.ReadAll`, then re-parsing it | Double memory allocation; GC pressure at high RPS |
| Re-allocating the signed message string per request | Unnecessary heap allocation on every ingest |
| Querying the database for the signing secret per request | Network round-trip on the critical ingest path |
| Using `strings.Compare` or `==` for signature comparison | Vulnerable to timing attacks: the comparison short-circuits on the first mismatched byte |
| Verifying the signature after JSON parsing | Wastes parse work on unauthenticated requests |
The ordering mistake is worth calling out specifically. Some implementations parse the JSON body first (to extract event metadata for routing) and then verify the signature. This means an attacker who sends malformed or malicious payloads forces your JSON parser to do work before the request is even authenticated. Always verify the signature first, with the raw body bytes, before you parse anything.
## Reading the Body Once
The canonical Go pattern for single-pass body handling:
```go
func (h *IngestHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Enforce a size limit before reading anything
	r.Body = http.MaxBytesReader(w, r.Body, 2<<20) // 2 MB

	// Read the raw body exactly once
	body, err := io.ReadAll(r.Body)
	if err != nil {
		httpx.BadRequest(w, "failed to read request body")
		return
	}

	// Verify before any other processing
	sigHeader := r.Header.Get("Webhook-Signature")
	if err := h.verifier.Verify(body, sigHeader); err != nil {
		httpx.Unauthorized(w, "invalid signature")
		return
	}

	// Now parse — only for authenticated requests
	var event EventPayload
	if err := json.Unmarshal(body, &event); err != nil {
		httpx.BadRequest(w, "invalid JSON payload")
		return
	}

	// ... enqueue, respond
}
```

The body slice is passed to both `Verify` and `json.Unmarshal` — no second read from the network, no second allocation. If you're using a `sync.Pool` for the read buffer, you can avoid the heap allocation entirely on the common path, as sketched below.
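A minimal sketch of that pooled-buffer variant (the 64 KB initial capacity and the `readBodyPooled` helper name are illustrative, not a fixed API):

```go
import (
	"bytes"
	"io"
	"net/http"
	"sync"
)

var bodyBufPool = sync.Pool{
	// Pre-size for the common case; buffers grow on demand and are reused.
	New: func() any { return bytes.NewBuffer(make([]byte, 0, 64<<10)) },
}

// readBodyPooled reads the request body into a pooled buffer and returns
// the bytes plus a release func. The slice aliases the pooled buffer, so
// call release only after verification and parsing are finished with it.
func readBodyPooled(r *http.Request) ([]byte, func(), error) {
	buf := bodyBufPool.Get().(*bytes.Buffer)
	buf.Reset()
	if _, err := io.Copy(buf, r.Body); err != nil {
		bodyBufPool.Put(buf)
		return nil, nil, err
	}
	return buf.Bytes(), func() { bodyBufPool.Put(buf) }, nil
}
```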
## Caching Signing Secrets In-Process
The most impactful optimization is almost always secret caching. If you query the database for the destination's signing secret on every ingest request, you're paying a database round-trip (typically 1–5 ms) on every event. At 1,000 RPS, that's 1,000 database queries per second just for secret lookups.
The solution is an in-process cache with a short TTL:
```go
type SecretCache struct {
	mu      sync.RWMutex
	entries map[string]cachedSecret
}

type cachedSecret struct {
	secrets   [][]byte // decoded, ready-to-use
	fetchedAt time.Time
}

const secretCacheTTL = 5 * time.Minute

func (c *SecretCache) Get(ctx context.Context, sourceToken string, store SecretStore) ([][]byte, error) {
	c.mu.RLock()
	entry, ok := c.entries[sourceToken]
	c.mu.RUnlock()
	if ok && time.Since(entry.fetchedAt) < secretCacheTTL {
		return entry.secrets, nil
	}

	// Cache miss or stale — fetch from store
	secrets, err := store.GetSecretsForSource(ctx, sourceToken)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	c.entries[sourceToken] = cachedSecret{secrets: secrets, fetchedAt: time.Now()}
	c.mu.Unlock()
	return secrets, nil
}
```

A 5-minute TTL means a secret rotation propagates to all ingest nodes within 5 minutes — short enough for operational safety, long enough that cache pressure is negligible even at thousands of unique source tokens.
Two operational notes:
- When you rotate a signing secret, the overlap window (during which both old and new secrets are accepted) must be longer than the cache TTL. If your TTL is 5 minutes and your overlap window is 1 minute, nodes with a stale cache will reject events signed with the new secret before they learn about it.
- If a source is deleted or disabled, you need a way to invalidate its cache entry immediately. A simple approach: on cache miss, always re-fetch; on explicit source deletion, delete the cache entry via a background goroutine or pub/sub notification (see the sketch after this list).
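A minimal sketch of that invalidation hook, building on the `SecretCache` above (wiring it to a pub/sub subscriber or admin endpoint is left to your infrastructure):

```go
// Invalidate drops the cached secrets for a source so the next ingest
// request re-fetches from the store. Call it from whatever channel
// delivers source-deletion events (e.g., a pub/sub message handler).
func (c *SecretCache) Invalidate(sourceToken string) {
	c.mu.Lock()
	delete(c.entries, sourceToken)
	c.mu.Unlock()
}
```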
## Constant-Time Comparison Is Non-Negotiable
Timing attacks against HMAC verification are real. If your comparison short-circuits on the first mismatched byte, an attacker can measure response latency to reconstruct the expected signature one byte at a time. At scale, with many requests, the statistical signal is clear enough to be exploitable.
Go's standard library provides the right tool:
import "crypto/hmac"
// Do NOT use this:
if computedSig == providedSig { ... }
// Use this instead:
if hmac.Equal([]byte(computedSig), []byte(providedSig)) { ... }hmac.Equal always processes both slices to completion, regardless of where they differ. The comparison time is proportional to the length of the slices, not the position of the first mismatch.
One subtlety: `hmac.Equal` operates on raw byte slices, but signatures in headers are usually hex-encoded strings. You have two options: compare hex strings with `subtle.ConstantTimeCompare`, or decode both to bytes first and compare with `hmac.Equal`. The latter is preferable because `hmac.Equal` is specifically designed for this use case and is harder to misuse.
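For reference, the hex-string variant is a one-liner (a sketch; `computedHex` and `providedHex` are stand-ins for your hex-encoded values, and `subtle.ConstantTimeCompare` returns 1 only when both slices have equal length and contents):

```go
import "crypto/subtle"

// Compare the hex strings byte-for-byte without short-circuiting.
if subtle.ConstantTimeCompare([]byte(computedHex), []byte(providedHex)) == 1 { ... }
```

The preferred decode-then-compare form: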
```go
import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func verifyHMAC(payload []byte, timestamp int64, providedHex string, secret []byte) bool {
	// Reconstruct the signed message: "<timestamp>.<raw body>"
	message := fmt.Sprintf("%d.%s", timestamp, payload)

	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(message))
	expected := mac.Sum(nil)

	provided, err := hex.DecodeString(providedHex)
	if err != nil {
		return false
	}
	return hmac.Equal(expected, provided)
}
```

## Parallelizing Verification for Multi-Secret Rotation
During a secret rotation, you have two active secrets and potentially two signatures in the header. Verifying both in sequence is fine for most workloads. At extreme throughput, you can verify them in parallel:
```go
func verifyAny(payload []byte, timestamp int64, sigs []string, secrets [][]byte) bool {
	type result struct{ ok bool }
	results := make(chan result, len(secrets)) // buffered so no goroutine blocks on send

	for _, secret := range secrets {
		s := secret // capture loop variable for the goroutine
		go func() {
			ok := false
			for _, sig := range sigs {
				if verifyHMAC(payload, timestamp, sig, s) {
					ok = true
					break
				}
			}
			results <- result{ok}
		}()
	}

	for range secrets {
		if r := <-results; r.ok {
			return true
		}
	}
	return false
}
```

In practice, this is only worth the goroutine overhead when you have more than two active secrets simultaneously (unusual outside of a migration window) or when payloads are large enough that each HMAC takes multiple milliseconds. For typical webhook payloads (under 64 KB) with two active secrets, sequential verification is fast enough.
## Profiling Your Ingest Path
Before optimizing, measure. The Go profiler makes this straightforward in production:
import _ "net/http/pprof"
// In your health/debug server (not your public ingest server):
go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()Then sample CPU during a load test:
```bash
# Run a 30-second CPU profile while load testing
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"
```

In the flame graph, look for time spent in:
- `crypto/sha256` — the hash computation itself
- `io.ReadAll` — body reads
- `encoding/json.Unmarshal` — JSON parsing (should happen after verification)
- `database/sql` — any secret lookups hitting the database
If `crypto/sha256` is your top frame, you're either processing very large payloads or have a pathological number of active secrets per source. If `database/sql` is high, add the secret cache described above. If `encoding/json` appears before `crypto/sha256` in the call stack, fix your verification ordering.
## Putting It Together: A Verification Budget
A useful mental model for production capacity planning:
| Component | Typical latency contribution | Optimization lever |
|---|---|---|
| Body read (1 KB payload) | ~10 µs | `sync.Pool` for read buffer |
| Secret cache lookup (hit) | ~1 µs | Keep TTL 1–5 min; warm on startup |
| Secret cache lookup (miss) | 1–5 ms | Pre-warm; keep miss rate < 1% |
| HMAC-SHA256 (1 KB payload) | ~3 µs | No optimization needed at this size |
| HMAC-SHA256 (64 KB payload) | ~200 µs | Consider payload size limits |
| Constant-time comparison | ~1 µs | Already optimal; do not skip |
| JSON parsing (1 KB payload) | ~20 µs | Parse after verification only |
For a 1 KB payload with a warm cache, total verification overhead is under 35 µs. At 10,000 ingest RPS on a single goroutine pool, that's 350 ms of CPU time per second — one core handles it comfortably. For 64 KB payloads, the budget is tighter: roughly 2 ms of per-request CPU once the body read and JSON parse are included, meaning a single core saturates at around 500 RPS. Know your payload size distribution.
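The cache-miss row assumes misses stay rare. One way to guarantee that is warming the cache before a node starts accepting traffic; a minimal sketch, assuming the `SecretStore` also exposes a `ListSourceTokens` method (hypothetical, not shown earlier):

```go
// WarmCache pre-populates the secret cache for every known source so the
// first real request hits a warm entry. ListSourceTokens is a hypothetical
// store method; substitute whatever enumeration your store provides.
func WarmCache(ctx context.Context, cache *SecretCache, store SecretStore) error {
	tokens, err := store.ListSourceTokens(ctx)
	if err != nil {
		return err
	}
	for _, token := range tokens {
		if _, err := cache.Get(ctx, token, store); err != nil {
			return err // or log and continue, depending on startup policy
		}
	}
	return nil
}
```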
GetHook enforces a 2 MB ingest body limit by default, which bounds the worst-case verification time while accommodating any realistic webhook payload. Sources with consistently large payloads (above 100 KB) should be reviewed — most legitimate webhook events are far smaller.
Signature verification overhead is manageable at almost any scale if you avoid the common mistakes: reading the body twice, hitting the database per request, verifying after parsing, and using non-constant-time comparisons. With a warm secret cache and correct implementation, verification adds under 50 µs to your ingest path — cheap enough that you should never be tempted to skip it.
If you want an ingest layer with these optimizations already built in — including secret caching, constant-time HMAC verification, and configurable payload limits — start with GetHook and focus on your application logic instead.