GDPR · compliance · data retention · security · webhooks

GDPR and Webhook Data Retention: What to Log, What to Delete

Webhook payloads routinely contain personal data — email addresses, order details, user IDs. Under GDPR, storing that data indefinitely is a liability. Here's a practical framework for what to log, how long to keep it, and how to delete it without breaking your replay guarantees.

Nadia Kowalski
Security Engineer
March 26, 2026
10 min read

Webhook infrastructure exists at an uncomfortable intersection with data privacy law. Your delivery system needs to store event payloads to support retry, replay, and debugging. GDPR requires that you not retain personal data beyond its stated purpose, and that you can delete it on request, within one month of receiving that request.

Most teams treat this as a problem for later. It rarely stays that way. A single data subject access request (DSAR) or right-to-erasure request for a customer's data will expose every place you stored their personal information — including your webhook event log.

This post covers how to design webhook storage with GDPR compliance from the start, rather than retrofitting it after an audit.


Why Webhook Payloads Are a GDPR Concern

Webhooks carry whatever data the upstream provider chose to include. That often means:

  • Stripe payment events contain billing name, email, address, last 4 of card
  • Shopify order events contain full customer name, shipping address, email, phone
  • Auth0/Okta user events contain email, IP address, user agent
  • Twilio SMS events contain phone numbers and sometimes message content
  • HubSpot CRM events contain contact names, emails, companies, deal amounts

If your webhook delivery system stores raw payloads — and nearly all of them do, for retry and replay purposes — you are storing personal data. Under GDPR Articles 5 and 17, that data must have a documented retention period, a lawful basis for processing, and a deletion mechanism.

The good news: GDPR doesn't prohibit storing this data. It requires that you do so deliberately, with controls in place.


The Data Minimization Decision

Before thinking about retention, ask: do you need to store the full payload at all?

For many webhook use cases, you process the payload and store derived state — not the raw event. If a Stripe payment.succeeded event updates a row in your orders table, you may not need the raw payload after delivery succeeds.

| Payload storage approach | Pros | Cons |
| --- | --- | --- |
| Full raw payload stored indefinitely | Maximum debuggability; replay possible at any point | Personal data retained beyond its purpose; largest GDPR surface |
| Full payload with time-bounded retention | Replay window limited; cleaner compliance posture | Requires automated deletion; metadata-only after expiry |
| Payload hash only (no body) | Near-zero personal data exposure | No replay, no content debugging |
| Payload with PII fields stripped | Replay possible with redacted content | Requires field-level parsing per provider; complex to maintain |
| No payload storage (metadata only) | Minimal GDPR exposure | Debugging requires provider re-sends; no replay |

For production webhook infrastructure, time-bounded full payload retention is the most practical trade-off. Store the full payload for a window that covers your legitimate operational need (typically 7–30 days), then delete or truncate to metadata.


Defining Your Retention Windows

GDPR doesn't specify retention periods — it requires that you define them based on purpose. The purpose for webhook payload storage is typically:

  1. Retry: If delivery fails, you need the payload to attempt redelivery. Your retry window defines the minimum. For a 5-attempt exponential backoff strategy (0s → 30s → 2m → 10m → 1h), the maximum retry window is ~73 minutes. Anything beyond that is beyond the retry purpose.

  2. Debugging: Developers investigating delivery failures need to inspect payloads. 7 days covers the vast majority of debugging scenarios. 30 days covers edge cases like incidents discovered late.

  3. Replay: Event replay is a separate operational concern from retry. If your product offers replay as a feature, document it explicitly as a lawful basis, define its window, and surface it to customers so they can factor it into their own GDPR compliance posture.

A reasonable default policy:

| Purpose | Retention period | Action on expiry |
| --- | --- | --- |
| Active delivery (retry window) | 72 hours | No action needed; retry state resolves |
| Delivery debugging | 30 days | Delete payload body; retain metadata |
| Event replay | 90 days (configurable) | Delete payload body; retain metadata |
| Audit trail (delivery attempts) | 1 year | Retain metadata without payload body |

The metadata you retain after deleting the payload body — event ID, timestamp, source, destination, outcome, HTTP status — is sufficient for audit and monitoring purposes and contains no personal data.


Schema Design for Retention-Ready Storage

The key is separating the payload from the metadata at the schema level:

sql
-- Metadata: retained long-term for audit
CREATE TABLE events (
    id            UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id    UUID        NOT NULL,
    source_id     UUID,
    event_type    TEXT        NOT NULL,
    direction     TEXT        NOT NULL,      -- 'inbound' | 'outbound'
    status        TEXT        NOT NULL,
    received_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    payload_id    UUID,                      -- nullable; NULL after deletion
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Payload: deleted on schedule
CREATE TABLE event_payloads (
    id            UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    event_id      UUID        NOT NULL REFERENCES events(id),
    headers       JSONB,
    body          BYTEA       NOT NULL,      -- encrypted at rest
    body_size     INT         NOT NULL,
    content_type  TEXT,
    expires_at    TIMESTAMPTZ NOT NULL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX event_payloads_expires_at ON event_payloads (expires_at);

The expires_at column drives automated deletion. Your retention job is a simple query:

sql
-- Run periodically (every hour is fine)
WITH deleted AS (
    DELETE FROM event_payloads
    WHERE expires_at < NOW()
    RETURNING event_id
)
UPDATE events
SET payload_id = NULL
WHERE id IN (SELECT event_id FROM deleted);

After deletion, the events row remains — giving you a complete audit trail — but payload_id is nulled out and no personal data survives.


Handling Right-to-Erasure Requests

Under GDPR Article 17, data subjects can request deletion of their personal data. For webhook infrastructure, this means identifying and deleting every payload that contains data belonging to that individual.

This is harder than it sounds. A Stripe customer ID might appear in dozens of webhook payloads across multiple event types. You can't query event_payloads for a specific email address without decrypting and parsing every row, which is expensive and undermines your encryption-at-rest posture.

The practical approaches:

1. Maintain a PII index at ingest time. When an event arrives, extract known PII identifiers (customer ID, email, user ID) from the payload and store them in a separate index table:

sql
CREATE TABLE event_pii_index (
    event_id    UUID    NOT NULL REFERENCES events(id),
    pii_type    TEXT    NOT NULL,  -- 'customer_id', 'email', 'user_id'
    pii_value   TEXT    NOT NULL,  -- hashed, not plaintext
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (event_id, pii_type, pii_value)
);

CREATE INDEX event_pii_index_lookup ON event_pii_index (pii_type, pii_value);

Store a hash of the PII value rather than the plaintext. SHA-256 is sufficient for lookup purposes, though note that anyone holding the index can confirm a guessed email by hashing it themselves; a keyed HMAC closes that gap if your threat model requires it. When an erasure request arrives, hash the identifier and look it up:

sql
SELECT event_id FROM event_pii_index
WHERE pii_type = 'email'
  AND pii_value = encode(sha256('user@example.com'::bytea), 'hex');

Delete the matched payloads immediately — ahead of the scheduled retention window.

2. Rely on retention windows. If your retention window is 30 days, any erasure request for data older than 30 days is automatically satisfied. For requests within the window, you still need to delete proactively — but the blast radius is bounded.

3. Payload-level encryption with per-subject keys. Encrypt each payload with a key derived from the data subject's identifier. To satisfy an erasure request, delete the key. The payload becomes unrecoverable without needing to delete it. This is elegant but adds complexity to your key management infrastructure and isn't supported by most standard database encryption setups.

For most teams, approach 1 (PII index with hashed identifiers) combined with short retention windows is the practical choice.


What the Audit Trail Must Contain

Even after payload deletion, you need an audit trail for your own operational and compliance purposes. The minimum viable audit record per event:

| Field | Retained? | Reason |
| --- | --- | --- |
| Event ID | Yes | Cross-reference with provider logs |
| Account ID | Yes | Tenant attribution |
| Source and event type | Yes | Operational monitoring |
| Received timestamp | Yes | Timeline reconstruction |
| Delivery status | Yes | SLA reporting |
| HTTP response status per attempt | Yes | Debugging delivery failures |
| Destination ID (not URL) | Yes | Audit without storing endpoint URLs |
| Payload body | No (after expiry) | Personal data; deleted per retention policy |
| Raw headers | No (after expiry) | May contain auth tokens or personal data |
| Destination URL | Consider | URLs can contain personal data (e.g., /users/email@domain.com) |

Destination URLs deserve special attention. If your customers configure webhook endpoints with PII embedded in the path or query string (it happens), storing those URLs creates the same retention obligation as storing payload bodies. Store the destination ID instead, and resolve the URL at delivery time from a separately managed configuration.


Logging and Observability Hygiene

Delivery logs are another vector for personal data retention that teams often overlook. A log line like:

INFO  delivering event evt_01HX... to https://api.acme.com/webhooks body={"email":"user@example.com","order_id":"ord_123"}

stores personal data in your log aggregation system, which likely has its own retention policy and GDPR posture. Structured logging that captures outcomes without payload content avoids this:

go
slog.Info("delivery attempt",
    "event_id", attempt.EventID,
    "destination_id", attempt.DestinationID,
    "attempt_number", attempt.Number,
    "outcome", attempt.Outcome,
    "http_status", attempt.HTTPStatus,
    "duration_ms", attempt.DurationMs,
    // Do NOT log: body, headers, destination URL if it contains PII
)

The delivery system knows everything it needs to debug — event ID, destination ID, outcome, latency — without touching the payload. Payload inspection goes through your dedicated event storage, which has the retention controls you've already built.


Communicating Retention Policy to Your Customers

If you operate a webhook infrastructure platform (i.e., your customers send webhooks to their own customers), your retention policy affects their GDPR compliance too. Your customers are data controllers; you are a data processor under Article 28.

The minimum your data processing agreement (DPA) should specify:

  • Maximum retention period for event payloads
  • How customers can trigger early deletion (API endpoint or dashboard)
  • What metadata is retained after payload deletion and for how long
  • Subprocessors who receive payload data (cloud provider, log aggregation)
  • Breach notification timeline

If you're building on GetHook, the platform's retention and deletion APIs give you the building blocks to expose these controls to your own customers — so their erasure requests can be satisfied programmatically rather than through manual support tickets.


Treating GDPR compliance as an afterthought creates expensive retrofit projects and audit risk. The patterns here — separated payload storage, short retention windows, a PII index, and payload-free audit logs — add minimal engineering overhead when designed in from the start.

If you want webhook infrastructure with built-in retention controls, start with GetHook →

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.