Webhook infrastructure exists at an uncomfortable intersection with data privacy law. Your delivery system needs to store event payloads to support retry, replay, and debugging. GDPR requires that you not retain personal data beyond its stated purpose, and that you can honor erasure requests, generally within one month of receipt.
Most teams treat this as a problem for later. It rarely stays that way. A single data subject access request (DSAR) or right-to-erasure request for a customer's data will expose every place you stored their personal information — including your webhook event log.
This post covers how to design webhook storage with GDPR compliance from the start, rather than retrofitting it after an audit.
## Why Webhook Payloads Are a GDPR Concern
Webhooks carry whatever data the upstream provider chose to include. That often means:
- Stripe payment events contain billing name, email, address, and the last four card digits
- Shopify order events contain full customer name, shipping address, email, and phone
- Auth0/Okta user events contain email, IP address, and user agent
- Twilio SMS events contain phone numbers and sometimes message content
- HubSpot CRM events contain contact names, emails, companies, and deal amounts
If your webhook delivery system stores raw payloads (and nearly all of them do, for retry and replay purposes), you are storing personal data. Under GDPR, that data needs a documented retention period (Article 5), a lawful basis for processing (Article 6), and a deletion mechanism (Article 17).
The good news: GDPR doesn't prohibit storing this data. It requires that you do so deliberately, with controls in place.
## The Data Minimization Decision
Before thinking about retention, ask: do you need to store the full payload at all?
For many webhook use cases, you process the payload and store derived state — not the raw event. If a Stripe payment.succeeded event updates a row in your orders table, you may not need the raw payload after delivery succeeds.
| Payload storage approach | Pros | Cons |
|---|---|---|
| Full raw payload stored indefinitely | Maximum debuggability; replay possible at any point | Personal data retained beyond its purpose; largest GDPR surface |
| Full payload with time-bounded retention | Replay window limited; cleaner compliance posture | Requires automated deletion; metadata-only after expiry |
| Payload hash only (no body) | Near-zero personal data exposure | No replay, no content debugging |
| Payload with PII fields stripped | Replay possible with redacted content | Requires field-level parsing per provider; complex to maintain |
| No payload storage (metadata only) | Minimal GDPR exposure | Debugging requires provider re-sends; no replay |
For production webhook infrastructure, time-bounded full payload retention is the most practical trade-off. Store the full payload for a window that covers your legitimate operational need (typically 7–30 days), then delete or truncate to metadata.
## Defining Your Retention Windows
GDPR doesn't specify retention periods — it requires that you define them based on purpose. The purpose for webhook payload storage is typically:
- **Retry:** If delivery fails, you need the payload to attempt redelivery. Your retry window defines the minimum. For a 5-attempt exponential backoff strategy (0s → 30s → 2m → 10m → 1h), the maximum retry window is ~73 minutes. Retention beyond that exceeds the retry purpose.
- **Debugging:** Developers investigating delivery failures need to inspect payloads. 7 days covers the vast majority of debugging scenarios; 30 days covers edge cases like incidents discovered late.
- **Replay:** Event replay is a separate operational concern from retry. If your product offers replay as a feature, document it explicitly as a lawful basis, define its window, and surface it to customers so they can factor it into their own GDPR compliance posture.
A reasonable default policy:
| Purpose | Retention period | Action on expiry |
|---|---|---|
| Active delivery (retry window) | 72 hours | No action needed; retry state resolves |
| Delivery debugging | 30 days | Delete payload body; retain metadata |
| Event replay | 90 days (configurable) | Delete payload body; retain metadata |
| Audit trail (delivery attempts) | 1 year | Retain metadata without payload body |
The metadata you retain after deleting the payload body — event ID, timestamp, source, destination, outcome, HTTP status — is sufficient for audit and monitoring purposes and contains no personal data.
## Schema Design for Retention-Ready Storage
The key is separating the payload from the metadata at the schema level:
```sql
-- Metadata: retained long-term for audit
CREATE TABLE events (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id  UUID NOT NULL,
    source_id   UUID,
    event_type  TEXT NOT NULL,
    direction   TEXT NOT NULL,  -- 'inbound' | 'outbound'
    status      TEXT NOT NULL,
    received_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    payload_id  UUID,           -- nullable; NULL after deletion
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Payload: deleted on schedule
CREATE TABLE event_payloads (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id     UUID NOT NULL REFERENCES events(id),
    headers      JSONB,
    body         BYTEA NOT NULL, -- encrypted at rest
    body_size    INT NOT NULL,
    content_type TEXT,
    expires_at   TIMESTAMPTZ NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- expires_at is declared NOT NULL, so a plain index suffices
-- (a partial index with WHERE expires_at IS NOT NULL would never
-- exclude a row)
CREATE INDEX event_payloads_expires_at ON event_payloads (expires_at);
```

The `expires_at` column drives automated deletion. Your retention job is a simple query:
```sql
-- Run periodically (every hour is fine)
DELETE FROM event_payloads
WHERE expires_at < NOW()
RETURNING id, event_id;
```

After deletion, the `events` row remains, giving you a complete audit trail; use the returned `event_id` values to set `payload_id` to NULL so that no personal data (and no dangling reference) survives.
## Handling Right-to-Erasure Requests
Under GDPR Article 17, data subjects can request deletion of their personal data. For webhook infrastructure, this means identifying and deleting every payload that contains data belonging to that individual.
This is harder than it sounds. A Stripe customer ID might appear in dozens of webhook payloads across multiple event types. You can't query `event_payloads` for a specific email address without decrypting and parsing every row, which is expensive and defeats the purpose of encrypting payloads at rest.
The practical approaches:
1. Maintain a PII index at ingest time. When an event arrives, extract known PII identifiers (customer ID, email, user ID) from the payload and store them in a separate index table:
```sql
CREATE TABLE event_pii_index (
    event_id   UUID NOT NULL REFERENCES events(id),
    pii_type   TEXT NOT NULL, -- 'customer_id', 'email', 'user_id'
    pii_value  TEXT NOT NULL, -- hashed, not plaintext
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (event_id, pii_type, pii_value)
);

CREATE INDEX event_pii_index_lookup ON event_pii_index (pii_type, pii_value);
```

Store a hash of the PII value (SHA-256 is sufficient for lookup purposes), not the plaintext. When an erasure request arrives, hash the identifier and look it up:
```sql
SELECT event_id FROM event_pii_index
WHERE pii_type = 'email'
  AND pii_value = encode(sha256('user@example.com'), 'hex');
```

Delete the matched payloads immediately, ahead of the scheduled retention window.
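The ingest path has to write the same hash that this lookup reads. A Go sketch, with one assumption added beyond the SQL above: identifiers are lowercased before hashing so that `User@Example.com` and `user@example.com` match (the SQL side would then need a matching `lower()`):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashPII produces the lookup key stored in event_pii_index.pii_value.
// Lowercasing first is an assumption added here; both the ingest and
// erasure-request paths must apply the same normalization.
func hashPII(value string) string {
	sum := sha256.Sum256([]byte(strings.ToLower(value)))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(hashPII("User@Example.com") == hashPII("user@example.com")) // true
	fmt.Println(len(hashPII("user@example.com")))                           // 64
}
```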
2. Rely on retention windows. If your retention window is 30 days, any erasure request for data older than 30 days is automatically satisfied. For requests within the window, you still need to delete proactively — but the blast radius is bounded.
3. Payload-level encryption with per-subject keys. Encrypt each payload with a key derived from the data subject's identifier. To satisfy an erasure request, delete the key. The payload becomes unrecoverable without needing to delete it. This is elegant but adds complexity to your key management infrastructure and isn't supported by most standard database encryption setups.
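A minimal sketch of that crypto-shredding idea, assuming AES-GCM and an in-memory key map standing in for a real KMS (the subject ID `cus_123` and all function names here are illustrative):

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// keyStore maps a data-subject identifier to its encryption key.
// Deleting the key renders every payload encrypted under it unreadable.
var keyStore = map[string][]byte{}

func encryptForSubject(subject string, payload []byte) ([]byte, error) {
	key, ok := keyStore[subject]
	if !ok {
		key = make([]byte, 32) // AES-256
		if _, err := rand.Read(key); err != nil {
			return nil, err
		}
		keyStore[subject] = key
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Prepend the nonce so decryption is self-contained.
	return gcm.Seal(nonce, nonce, payload, nil), nil
}

func decryptForSubject(subject string, ciphertext []byte) ([]byte, error) {
	key, ok := keyStore[subject]
	if !ok {
		return nil, fmt.Errorf("key erased: payload unrecoverable")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(ciphertext) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, body := ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():]
	return gcm.Open(nil, nonce, body, nil)
}

func main() {
	ct, _ := encryptForSubject("cus_123", []byte(`{"email":"user@example.com"}`))
	pt, _ := decryptForSubject("cus_123", ct)
	fmt.Println(string(pt))

	delete(keyStore, "cus_123") // the erasure request: drop the key
	_, err := decryptForSubject("cus_123", ct)
	fmt.Println(err != nil) // true: ciphertext is now unreadable
}
```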
For most teams, approach 1 (PII index with hashed identifiers) combined with short retention windows is the practical choice.
## What the Audit Trail Must Contain
Even after payload deletion, you need an audit trail for your own operational and compliance purposes. The minimum viable audit record per event:
| Field | Retained? | Reason |
|---|---|---|
| Event ID | Yes | Cross-reference with provider logs |
| Account ID | Yes | Tenant attribution |
| Source and event type | Yes | Operational monitoring |
| Received timestamp | Yes | Timeline reconstruction |
| Delivery status | Yes | SLA reporting |
| HTTP response status per attempt | Yes | Debugging delivery failures |
| Destination ID (not URL) | Yes | Audit without storing endpoint URLs |
| Payload body | No (after expiry) | Personal data; deleted per retention policy |
| Raw headers | No (after expiry) | May contain auth tokens or personal data |
| Destination URL | Consider | URLs can contain personal data (e.g., /users/email@domain.com) |
Destination URLs deserve special attention. If your customers configure webhook endpoints with PII embedded in the path or query string (it happens), storing those URLs creates the same retention obligation as storing payload bodies. Store the destination ID instead, and resolve the URL at delivery time from a separately managed configuration.
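Resolving at delivery time is a straightforward lookup; `destinations` and `resolveDestination` below are illustrative stand-ins for a separately managed endpoint store:

```go
package main

import "fmt"

// destinations stands in for the separately managed endpoint
// configuration; a real system would load this from its own store.
var destinations = map[string]string{
	"dst_01ABC": "https://api.example.com/webhooks/inbox",
}

// resolveDestination turns the audit-safe destination ID stored on the
// event into a URL only at delivery time, so URLs that embed PII never
// enter long-lived audit rows or logs.
func resolveDestination(destinationID string) (string, error) {
	url, ok := destinations[destinationID]
	if !ok {
		return "", fmt.Errorf("unknown destination %s", destinationID)
	}
	return url, nil
}

func main() {
	url, err := resolveDestination("dst_01ABC")
	fmt.Println(url, err)
}
```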
## Logging and Observability Hygiene
Delivery logs are another vector for personal data retention that teams often overlook. A log line like:
```
INFO delivering event evt_01HX... to https://api.acme.com/webhooks body={"email":"user@example.com","order_id":"ord_123"}
```

stores personal data in your log aggregation system, which likely has its own retention policy and GDPR posture. Structured logging that captures outcomes without payload content avoids this:
```go
slog.Info("delivery attempt",
    "event_id", attempt.EventID,
    "destination_id", attempt.DestinationID,
    "attempt_number", attempt.Number,
    "outcome", attempt.Outcome,
    "http_status", attempt.HTTPStatus,
    "duration_ms", attempt.DurationMs,
    // Do NOT log: body, headers, or the destination URL if it contains PII
)
```

The delivery system knows everything it needs to debug (event ID, destination ID, outcome, latency) without touching the payload. Payload inspection goes through your dedicated event storage, which has the retention controls you've already built.
## Communicating Retention Policy to Your Customers
If you operate a webhook infrastructure platform (i.e., your customers send webhooks to their own customers), your retention policy affects their GDPR compliance too. Your customers are data controllers; you are a data processor under Article 28.
The minimum your data processing agreement (DPA) should specify:
- Maximum retention period for event payloads
- How customers can trigger early deletion (API endpoint or dashboard)
- What metadata is retained after payload deletion, and for how long
- Subprocessors who receive payload data (cloud provider, log aggregation)
- Breach notification timeline
If you're building on GetHook, the platform's retention and deletion APIs give you the building blocks to expose these controls to your own customers — so their erasure requests can be satisfied programmatically rather than through manual support tickets.
Treating GDPR compliance as an afterthought creates expensive retrofit projects and audit risk. The patterns here — separated payload storage, short retention windows, a PII index, and payload-free audit logs — add minimal engineering overhead when designed in from the start.
If you want webhook infrastructure with built-in retention controls, start with GetHook →