architecture · microservices · kafka · event-driven · webhooks

Webhooks for Internal Microservice Communication: When to Use Them vs. Kafka and NATS

Kafka and NATS get all the attention for internal event-driven systems, but HTTP webhooks are often the right choice — and teams dismiss them too quickly. Here's a clear-eyed comparison of the trade-offs.

Camille Beaumont
Backend Architect
March 27, 2026
10 min read

The moment a team decides to decouple two internal services with events, someone opens a Kafka ticket. It's the default — even when the system has three services, handles five thousand events per day, and has a six-person engineering team.

Kafka is the right answer for some problems. It is a poor fit for many others. HTTP webhooks, usually dismissed as "just for external integrations," are often a better internal communication primitive than teams give them credit for. NATS lands somewhere in between.

This post is a direct comparison: where each option excels, where it hurts you, and how to make the call without defaulting to whatever the biggest tech company's engineering blog recommends.


What We're Actually Comparing

"Webhooks vs. Kafka" conflates two different dimensions:

  1. Transport: HTTP vs. TCP with a custom protocol
  2. Broker model: Push vs. pull; durable vs. transient; log-based vs. queue-based

Kafka is a durable, ordered, pull-based, log-structured broker. NATS is a lightweight, push-based messaging system that offers both transient (core NATS) and durable (JetStream) modes. HTTP webhooks are synchronous push calls over standard HTTP — typically made reliable by a delivery layer that handles retries.

None of these are inherently superior. They optimize for different things.


The Case for Kafka

Kafka dominates in environments that need:

Ordered replay. Kafka's log structure means every event is addressable by offset. A new consumer can replay from offset 0 and get every event, in order, that ever happened on a topic. This is invaluable for event sourcing, audit logs, and building derived data stores.
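
For instance, here's a minimal replay sketch using the segmentio/kafka-go client; the broker address and topic name are illustrative, and it assumes a direct partition reader rather than a consumer group:

go
// Replay every event on partition 0 of the "orders" topic, in order,
// starting from the earliest retained offset.
package main

import (
    "context"
    "fmt"

    "github.com/segmentio/kafka-go"
)

func main() {
    r := kafka.NewReader(kafka.ReaderConfig{
        Brokers:   []string{"kafka.internal:9092"},
        Topic:     "orders",
        Partition: 0, // direct partition reader, no consumer group
    })
    defer r.Close()

    // Rewind to the start of the log: every retained event is re-read.
    if err := r.SetOffset(kafka.FirstOffset); err != nil {
        panic(err)
    }

    for {
        msg, err := r.ReadMessage(context.Background())
        if err != nil {
            break
        }
        fmt.Printf("offset %d: %s\n", msg.Offset, msg.Value)
    }
}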

Fan-out to many consumers. A single Kafka topic can have 50 independent consumer groups, each reading at their own pace. Adding a new consumer doesn't affect existing ones.
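
A sketch of that fan-out, again assuming segmentio/kafka-go; group and topic names are illustrative:

go
// Each consumer group tracks its own committed offsets, so "billing"
// can lag hours behind "analytics" without affecting it.
package main

import (
    "context"
    "log"

    "github.com/segmentio/kafka-go"
)

func consume(groupID string) {
    r := kafka.NewReader(kafka.ReaderConfig{
        Brokers: []string{"kafka.internal:9092"},
        Topic:   "orders",
        GroupID: groupID, // offsets are committed per group
    })
    defer r.Close()
    for {
        msg, err := r.ReadMessage(context.Background())
        if err != nil {
            return
        }
        log.Printf("[%s] offset %d: %s", groupID, msg.Offset, msg.Value)
    }
}

func main() {
    go consume("analytics")
    consume("billing") // both groups receive every event
}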

Very high throughput. A Kafka cluster can sustain millions of events per second across its partitions. If you're processing high-frequency time-series data, telemetry, or financial ticks, HTTP isn't going to cut it at that volume.

Long retention. Kafka can retain events for days, weeks, or indefinitely (with tiered storage). Your consumers can fall behind by hours and catch up at their own pace.

The cost: operational complexity. Kafka requires ZooKeeper or KRaft-mode controllers, careful partition sizing, consumer group management, a schema registry if you care about schema evolution, and continuous monitoring of consumer lag. A team without Kafka expertise will spend significant time operating it before getting value from it.


The Case for NATS

NATS fills the gap between raw HTTP and Kafka:

Low latency. Core NATS delivers messages in microseconds. There's no durable log overhead on the hot path.

Simple deployment. A NATS server is a single binary with sensible defaults. JetStream (NATS's durable stream) adds persistence without the operational weight of Kafka.

Push model with back-pressure. NATS JetStream can push messages to consumers and slow down delivery if the consumer signals back-pressure — unlike Kafka's pull model, which requires consumers to continuously poll.
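
As a rough sketch with the nats.go client, assuming a JetStream stream already covers the subject; the subject and durable name are illustrative:

go
// Durable JetStream push consumer: the server delivers messages to the
// callback and redelivers anything that isn't acked.
package main

import (
    "log"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect("nats://nats.internal:4222")
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }

    _, err = js.Subscribe("orders.placed", func(m *nats.Msg) {
        log.Printf("got: %s", m.Data)
        m.Ack() // unacked messages are redelivered
    }, nats.Durable("inventory"), nats.ManualAck())
    if err != nil {
        log.Fatal(err)
    }
    select {} // keep the process alive
}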

Good for request-reply patterns. Core NATS has first-class support for synchronous request-reply over pub-sub, which maps well to RPC-style internal service calls.
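
With nats.go, request-reply is nearly a one-liner on each side; the subject name is illustrative:

go
// Core NATS request-reply: the responder answers on a subject, the
// caller blocks for a single reply with a timeout.
package main

import (
    "fmt"
    "log"
    "time"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect("nats://nats.internal:4222")
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    // Responder: behaves like an RPC handler.
    nc.Subscribe("inventory.check", func(m *nats.Msg) {
        m.Respond([]byte(`{"in_stock":true}`))
    })

    // Caller: publish and wait for exactly one reply.
    resp, err := nc.Request("inventory.check", []byte(`{"sku":"A-42"}`), 2*time.Second)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(resp.Data))
}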

The cost: less mature ecosystem compared to Kafka, and JetStream's durability guarantees are weaker than Kafka's in failure scenarios. At very high scale (millions of events/sec), Kafka's architecture wins.


The Case for HTTP Webhooks

HTTP webhooks for internal communication get dismissed because they're associated with external integrations. That's a category error. The mechanism is just HTTP — the same protocol your services already use for REST APIs.

Here's where webhooks outperform both Kafka and NATS for internal use:

Zero new infrastructure. Every service already speaks HTTP. Adding a webhook call between two services requires no new broker, no new dependency, no new operational surface. The only addition is a retry/delivery layer — which can be as simple as a Postgres-backed queue (we sketch one later in this post).

Explicit delivery contracts. With Kafka, a producer has no idea whether any consumer processed its event — it just appended to a log. With webhooks, the recipient's HTTP response code is the acknowledgement. A 200 means it worked. A 500 means try again. This explicitness simplifies debugging: you can see exactly which deliveries succeeded and which failed.

Per-destination retry control. You can give service A a 5-second timeout and 3 retries, while service B gets a 30-second timeout and 10 retries. Kafka doesn't have a native concept of per-consumer retry policies — that logic lives in the consumer application code.
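
To make that concrete, here's one hypothetical way a delivery layer might model per-destination policies. The struct and destination URLs are illustrative, not any particular product's API:

go
package delivery

import "time"

// Hypothetical per-destination retry configuration. The delivery layer
// looks up the policy for a destination before each attempt.
type RetryPolicy struct {
    Timeout     time.Duration // per-attempt HTTP timeout
    MaxAttempts int           // attempts before dead-lettering
    BaseBackoff time.Duration // initial backoff, doubled per failure
}

var destinationPolicies = map[string]RetryPolicy{
    "https://service-a.internal/hooks": {Timeout: 5 * time.Second, MaxAttempts: 3, BaseBackoff: time.Second},
    "https://service-b.internal/hooks": {Timeout: 30 * time.Second, MaxAttempts: 10, BaseBackoff: 5 * time.Second},
}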

Easy observability without special tooling. HTTP logs are structured, ubiquitous, and understood. Every APM tool, every log aggregator, every trace collector already speaks HTTP. Kafka consumer lag requires specialized monitoring (Prometheus JMX exporter, Burrow, etc.) and expertise to interpret correctly.

Replay is explicit. Dead-letter events are first-class objects you can inspect, replay on demand, or route elsewhere. With Kafka, "replay" means resetting a consumer group offset — which affects all instances of that consumer simultaneously, with no per-event granularity.

The cost: throughput ceiling. HTTP adds per-request overhead: TLS handshake (mitigated by connection reuse), request/response framing, and a round-trip latency floor that's typically 1–50ms per delivery. For tens of thousands of events per second between internal services, this becomes a bottleneck. For thousands per second or below, it's irrelevant.


Decision Framework

Use this table as a starting point, not a hard rule:

| Criteria | HTTP Webhooks | NATS JetStream | Kafka |
|---|---|---|---|
| Events per second | < 10K | < 500K | Millions |
| New infra budget | None | Low | High |
| Consumer count | 1–20 | 1–100+ | Unlimited |
| Replay granularity | Per-event | Per-sequence | Per-offset (batch) |
| Retry control | Per-destination | Per-consumer | In consumer code |
| Delivery visibility | HTTP status codes | Ack/nack | Consumer lag |
| Schema evolution | In payload | In payload | Schema registry |
| Team expertise needed | HTTP (universal) | NATS docs | Kafka ecosystem |
| Ordering guarantee | Per-source FIFO | Per-subject | Per-partition |

A useful shortcut: if your team can't name five Kafka operational tasks they're prepared to own, they're probably not ready for Kafka. Start with webhooks or NATS, and migrate when the pain of scale is real, not hypothetical.


A Concrete Internal Webhook Setup

Here's what an internal webhook setup looks like in practice. Service A emits events by POSTing to a delivery layer; service B registers as a destination.

go
// Service A: publish an internal event
func (s *OrderService) placeOrder(ctx context.Context, order Order) error {
    if err := s.db.InsertOrder(ctx, order); err != nil {
        return err
    }

    payload, err := json.Marshal(map[string]any{
        "event":    "order.placed",
        "order_id": order.ID,
        "amount":   order.Amount,
        "currency": order.Currency,
    })
    if err != nil {
        return fmt.Errorf("encode event: %w", err)
    }

    // POST to the internal event gateway, which handles retries
    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        "https://events.internal/ingest/"+s.sourceToken,
        bytes.NewReader(payload),
    )
    if err != nil {
        return fmt.Errorf("build request: %w", err)
    }
    req.Header.Set("Content-Type", "application/json")

    resp, err := s.httpClient.Do(req)
    if err != nil {
        return fmt.Errorf("event publish failed: %w", err)
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        return fmt.Errorf("event publish failed: status %d", resp.StatusCode)
    }
    return nil
}

Service B registers its endpoint as a destination. The delivery layer handles retries with exponential backoff, records every attempt, and provides a replay API. Service B's handler just needs to process the event and return 200:

go
// Service B: receive the internal event
func (h *InventoryHandler) HandleOrderPlaced(w http.ResponseWriter, r *http.Request) {
    var event struct {
        OrderID string `json:"order_id"`
        Amount  int    `json:"amount"`
    }
    if err := json.NewDecoder(r.Body).Decode(&event); err != nil {
        http.Error(w, "bad payload", http.StatusBadRequest)
        return
    }

    // The delivery layer retries on failure, so this handler must be
    // idempotent: reserving inventory twice for the same order ID
    // should be a no-op.
    if err := h.reserveInventory(r.Context(), event.OrderID); err != nil {
        // Return 500 — the delivery layer will retry
        http.Error(w, "reservation failed", http.StatusInternalServerError)
        return
    }

    w.WriteHeader(http.StatusOK)
}

The delivery layer is the key component. Without it, you're just making raw HTTP calls with no retry logic — which is worse than Kafka in every reliability dimension. With a proper delivery layer (Postgres-backed queue, exponential backoff, dead-letter support), you get durability comparable to NATS JetStream without the infrastructure overhead.
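
To give a sense of how small that layer can start, here's a sketch of a single worker iteration against a hypothetical Postgres deliveries table; the schema (id, url, payload, attempts, next_attempt_at, delivered_at) is an assumption for illustration:

go
package delivery

import (
    "bytes"
    "context"
    "database/sql"
    "net/http"
)

// deliverOne claims one due row, attempts the HTTP POST, and either
// marks it delivered or schedules an exponentially backed-off retry.
func deliverOne(ctx context.Context, db *sql.DB, client *http.Client) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    var (
        id       int64
        url      string
        payload  []byte
        attempts int
    )
    // SKIP LOCKED lets many workers claim rows without contention.
    err = tx.QueryRowContext(ctx, `
        SELECT id, url, payload, attempts FROM deliveries
        WHERE delivered_at IS NULL AND next_attempt_at <= now()
        ORDER BY next_attempt_at
        LIMIT 1
        FOR UPDATE SKIP LOCKED`).Scan(&id, &url, &payload, &attempts)
    if err != nil {
        return err // sql.ErrNoRows: nothing is due
    }

    req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(payload))
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/json")

    resp, httpErr := client.Do(req)
    if httpErr == nil && resp.StatusCode < 300 {
        resp.Body.Close()
        _, err = tx.ExecContext(ctx,
            `UPDATE deliveries SET delivered_at = now() WHERE id = $1`, id)
    } else {
        if resp != nil {
            resp.Body.Close()
        }
        // Exponential backoff: 2^attempts seconds until the next try.
        _, err = tx.ExecContext(ctx,
            `UPDATE deliveries SET attempts = attempts + 1,
                next_attempt_at = now() + make_interval(secs => $1)
             WHERE id = $2`,
            1<<uint(attempts), id)
    }
    if err != nil {
        return err
    }
    return tx.Commit()
}

Dead-lettering is one more branch: once attempts crosses a threshold, move the row to a dead-letter table instead of rescheduling it.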

GetHook provides this delivery layer and supports internal service-to-service delivery patterns — not just external customer-facing webhooks. You get per-destination retry policies, a full delivery attempt history, and one-click replay, without running a broker.


When to Migrate Away from Webhooks

Start with webhooks. Migrate to Kafka (or NATS) when you hit one of these concrete thresholds:

  • Throughput: you need to process more than 10,000 events/second per consumer pair, and your HTTP delivery layer is becoming the bottleneck
  • Consumer fan-out: you have more than 20–30 consumers subscribing to the same event type, and managing destination registration is becoming operationally painful
  • Strict ordering: you need global ordering guarantees across events from multiple producers, not just per-source FIFO
  • Retention at scale: you're storing billions of events and need log compaction or tiered storage to keep costs manageable

Until you hit one of these, you're paying Kafka's operational tax for a problem you don't have.


The default in software engineering is often to use the biggest tool available. Kafka is impressive. It's also a serious operational commitment. HTTP webhooks between internal services — backed by a reliable delivery layer — solve 80% of event-driven communication needs with 20% of the operational complexity.

Evaluate your actual event volume, your team's operational bandwidth, and your fan-out requirements before you commit. The right answer for most teams at most stages is simpler than they think.

If you want to see how a delivery layer handles internal webhook routing in practice, start with GetHook →

Stop losing webhook events.

GetHook gives you reliable delivery, automatic retry, and full observability — in minutes.