Delivery Semantics


1. What Is It?

Delivery semantics describe how many times a message produced into a streaming system can be observed by a consumer: at-most-once, at-least-once, or effectively-once (often called exactly-once). The choice is forced by the combination of producer acknowledgments, broker durability, offset/commit strategy, and any external side effects the consumer performs.

The problem they solve: distributed systems can fail at any step (producer crash before ack, broker crash before replication, consumer crash before commit). Each guarantee defines a different way of trading off duplicates, drops, and complexity. Without an explicit choice, you default to whatever the broker happens to give you, and you will be surprised, usually by duplicates in production after the first failover.

QUICK CHECK

A backend service consumes messages from a Kafka topic and writes order records to a database. After a deployment, the team notices that some orders are being inserted twice during broker failovers, even though the business logic is correct. Which root cause best explains this behavior?

2. How It Works

The three guarantees correspond to where the system handles failure on the producer-to-broker and broker-to-consumer hops:

| Guarantee | Producer ack | Consumer commit | Failure result |
| --- | --- | --- | --- |
| At-most-once | Fire-and-forget | Commit before processing | Drops on crash, never duplicates |
| At-least-once | Wait for ack + retry on failure | Commit after processing | Duplicates on crash, never drops |
| Effectively-once | Idempotent / transactional producer + transactional or idempotent consumer | Coordinated with output | Neither drops nor duplicates observable in the output |

Producer mechanics:

  • acks=0 → at-most-once on the produce hop (no broker ack).
  • acks=1 → durability only to the leader; data is lost if the leader dies before replication.
  • acks=all + enable.idempotence=true → no duplicates from retries, no loss on single broker failure. This is the floor on the producer side.
  • acks=all + idempotence + transactions (transactional.id) + Kafka-to-Kafka using read_committed → the building blocks for end-to-end exactly-once within Kafka.
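
To make the idempotence bullet concrete, here is a minimal sketch of why a retried send does not create a duplicate. It is a toy in-memory model, not Kafka's wire protocol: the `Broker` class, `send_with_lost_ack`, and the `producer-1` ID are all hypothetical names invented for illustration. The only idea borrowed from Kafka is that the broker remembers the highest sequence number it has accepted per producer.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy single-partition log (an assumption for illustration, not Kafka's real protocol)."""
    idempotent: bool
    log: list = field(default_factory=list)
    last_seq: dict = field(default_factory=dict)  # producer_id -> highest sequence appended

    def append(self, producer_id: str, seq: int, value: str) -> None:
        if self.idempotent and seq <= self.last_seq.get(producer_id, -1):
            return  # retry of a record already written: acknowledge without appending again
        self.log.append(value)
        self.last_seq[producer_id] = seq


def send_with_lost_ack(broker: Broker) -> list:
    """Send one record, pretend the ack was lost, then retry with the same sequence number."""
    broker.append("producer-1", 0, "order-42")  # write succeeds, ack is lost in transit
    broker.append("producer-1", 0, "order-42")  # producer times out and retries
    return broker.log


print(send_with_lost_ack(Broker(idempotent=False)))  # ['order-42', 'order-42']
print(send_with_lost_ack(Broker(idempotent=True)))   # ['order-42']
```

The retry itself is what keeps the send at-least-once; the sequence check is what stops that retry from showing up twice in the log.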

Consumer mechanics:

  • Auto-commit before processing → at-most-once.
  • Commit after processing → at-least-once (duplicates possible if the consumer crashes between processing and commit).
  • Transactional commit alongside the output write → effectively-once.
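
The first two consumer cases can be simulated without a broker. The sketch below is a simplified model under stated assumptions: `run_consumer` is a hypothetical helper, offsets are plain integers, and a "crash" is just an early return followed by a restart from the last committed offset.

```python
def run_consumer(messages, commit_first, crash_on):
    """Run once with a crash at offset `crash_on`, then restart from the committed offset.

    Returns the messages whose side effect actually ran, in order.
    """
    committed = 0   # next offset to read after a restart
    processed = []  # side effects that actually happened

    def attempt(crash_at):
        nonlocal committed
        for offset in range(committed, len(messages)):
            if commit_first:
                committed = offset + 1
                if offset == crash_at:
                    return  # crash after commit, before processing
                processed.append(messages[offset])
            else:
                processed.append(messages[offset])
                if offset == crash_at:
                    return  # crash after processing, before commit
                committed = offset + 1

    attempt(crash_at=crash_on)  # first run crashes at the given offset
    attempt(crash_at=None)      # restart resumes from the committed offset
    return processed


msgs = ["a", "b", "c"]
print(run_consumer(msgs, commit_first=True, crash_on=1))   # ['a', 'c']  -> "b" is dropped
print(run_consumer(msgs, commit_first=False, crash_on=1))  # ['a', 'b', 'b', 'c']  -> "b" runs twice
```

Same messages, same crash point; only the order of commit and side effect changes, and that alone flips the result from a drop to a duplicate.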

Concrete example. A consumer reads payments, calls Stripe to refund, then commits the offset:

  • Crash after refund, before commit. On restart, the message is redelivered; Stripe is called again — duplicate refund. To avoid this, either (a) use an idempotency key Stripe deduplicates by, or (b) move the commit into the same transaction as the write — which only works for Kafka-to-Kafka sinks.
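
Option (a) can be sketched with a stand-in for the payment API. Everything here is hypothetical: `FakeRefundProvider`, `handle`, the `refund-<event_id>` key format, and the event fields are assumptions for illustration, and the real provider's client library is deliberately not used. The point is only that the key is derived from the event, so a redelivery sends the same key.

```python
class FakeRefundProvider:
    """Stand-in for an external payment API that dedupes on an idempotency key."""

    def __init__(self):
        self.refunds = {}     # idempotency_key -> refund record
        self.money_moved = 0  # total cents actually refunded

    def refund(self, idempotency_key, amount_cents):
        if idempotency_key in self.refunds:
            return self.refunds[idempotency_key]  # replay: return the original result
        self.money_moved += amount_cents
        record = {"id": f"re_{len(self.refunds) + 1}", "amount": amount_cents}
        self.refunds[idempotency_key] = record
        return record


def handle(event, provider):
    # Derive the key from the event itself so a redelivery produces the same key.
    return provider.refund(
        idempotency_key=f"refund-{event['event_id']}",
        amount_cents=event["amount_cents"],
    )


provider = FakeRefundProvider()
event = {"event_id": "evt-1001", "amount_cents": 2500}

first = handle(event, provider)   # refund succeeds, then the consumer crashes before commit
second = handle(event, provider)  # same event redelivered after restart

print(first == second, provider.money_moved)  # True 2500
```

A key generated fresh on each attempt (a random UUID, a timestamp) would defeat the purpose, because the redelivered message would look like a new refund.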

QUICK CHECK

A payment service consumes messages from a Kafka topic, calls an external refund API, and then commits the Kafka offset. The service crashes after the refund API call succeeds but before the offset is committed. What happens when the service restarts, and what delivery semantic does this behavior reflect?

3. What Mid-Senior SWEs Actually Need to Know

  • Default behavior is at-least-once. If you do nothing special, you will see duplicates on restarts and rebalances. Design every consumer with that assumption.
  • "Exactly-once" is end-to-end and contextual. Kafka EOS (exactly-once semantics) only covers Kafka-to-Kafka (or Kafka → state → Kafka). The moment your consumer calls an external system (HTTP, DB, third-party API), you are responsible for idempotency on that side.
  • Idempotency in the sink is usually cheaper than transactions. A unique key (event ID, deterministic primary key, upsert) on the sink turns at-least-once into effectively-once at the system boundary.
  • Producer idempotence (enable.idempotence=true) is essentially free. It eliminates duplicates from retries within the same producer session. It defaults to true in modern Kafka; leave it on.
  • Transactions cost throughput. Transactional producers add ~10–30% overhead and a few ms of latency from the commit protocol. Use them where exactly-once matters; skip them otherwise.
  • The order of commit and side effect determines the semantic. This is the one rule to internalize: commit-before-process = at-most-once; process-before-commit = at-least-once; commit-with-process = effectively-once (and requires a transactional or idempotent sink).
  • Common misunderstanding: Turning on Kafka EOS does not make your consumer → REST API call idempotent. EOS is a Kafka-side guarantee; the REST endpoint still sees duplicates unless it dedupes.
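
The "idempotent sink" point above can be shown with a database write keyed on the event ID. This is a minimal sketch using SQLite from the standard library; the `orders` table, its columns, and the `write_order` helper are assumptions for illustration, and a production database would use its own upsert syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (event_id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL)"
)


def write_order(conn, event):
    """Insert the order unless this event_id was already written."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO orders (event_id, total_cents) VALUES (?, ?)",
            (event["event_id"], event["total_cents"]),
        )
    return cur.rowcount == 1  # True only when a new row was inserted


event = {"event_id": "evt-7", "total_cents": 1999}
write_order(conn, event)  # first delivery inserts the row
write_order(conn, event)  # redelivery hits the primary key and is ignored
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

Delivery is still at-least-once here; the primary key is what makes the table look as if each event arrived once.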

QUICK CHECK

Your Kafka consumer reads a payment event, calls an external payment processor via HTTP, and then commits its offset. The payment processor's API is not idempotent. During a consumer rebalance, the offset hasn't been committed yet, so the event is reprocessed and the HTTP call is made a second time. Which delivery semantic does this architecture exhibit, and what is the most practical fix?

4. Tradeoffs & Decisions

| If you need... | Pick... | Choose differently when... |
| --- | --- | --- |
| Lowest latency, can tolerate some loss (metrics, logs) | At-most-once | Each event matters |
| Pragmatic default for most pipelines | At-least-once + idempotent sink | Either drops or duplicates are catastrophic and unrecoverable |
| Kafka-to-Kafka pipeline where duplicates are unacceptable | Kafka EOS (transactional) | Pipeline crosses to an external system; handle idempotency there |
| Side effects on external systems (payments, emails) | At-least-once + dedup key in the external system | The external system has no idempotency mechanism; then add one |

Key tradeoff: complexity / throughput vs duplicate-tolerance. At-least-once is operationally simple and fast; you push duplicate handling into the sink, where you usually have natural keys. Transactional exactly-once is heavier and only worth it inside Kafka-native pipelines where no external system needs deduping.

Secondary tradeoff: scope creep. People say "exactly-once" when they mean "no duplicates in the user-visible output." Almost always, the right design is at-least-once with idempotent writes, not Kafka EOS everywhere.

5. Interview & System Design Cheat Sheet

  • The three modes (at-most-once, at-least-once, effectively-once) are determined by the order of side effect and commit, not by a single setting.
  • Kafka's default is at-least-once; idempotence + acks=all should always be on; that's the floor.
  • "Exactly-once" in Kafka means Kafka EOS, which holds only across Kafka-to-Kafka boundaries. Anything else requires idempotency at the external boundary.
  • The dominant production pattern is: at-least-once delivery + idempotent sink writes. It's simpler, faster, and crosses external boundaries cleanly.
  • Late and out-of-order events are a separate problem from delivery semantics — don't conflate "may arrive twice" with "may arrive after the window closed."

Common follow-ups:

  • "How does Kafka EOS work, at a high level?" A transactional producer atomically writes output data and consumer offsets to Kafka in one transaction; consumers in read_committed mode see only committed messages. A transaction coordinator manages the two-phase commit.
  • "How would you design idempotent processing for a payment refund?" Pass an idempotency key (e.g. event ID) to the payment provider so the provider deduplicates on its side. Locally, use UPSERT on a refunds(idempotency_key UNIQUE) table.
  • "What goes wrong if I set acks=0 to speed up the producer?" On any failure or network glitch you'll silently lose messages. Acceptable for fire-and-forget telemetry, never for business events.
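
For the EOS follow-up, a toy model can help when explaining it at a whiteboard. The sketch below is a teaching simplification and not Kafka's transaction protocol: `ToyLog`, `process_batch`, the transaction IDs, and the group name `g1` are all hypothetical. It only shows the two properties named above: outputs and offsets commit together, and a read_committed reader never sees output from a transaction that did not commit.

```python
class ToyLog:
    """In-memory output log with transaction tags (a teaching model, not Kafka's protocol)."""

    def __init__(self):
        self.records = []            # (txn_id, value)
        self.committed_txns = set()  # transactions that reached their commit point
        self.offsets = {}            # consumer group -> committed input offset

    def read_committed(self):
        return [v for txn, v in self.records if txn in self.committed_txns]

    def read_uncommitted(self):
        return [v for _, v in self.records]


def process_batch(log, txn_id, group, inputs, start, crash_before_commit=False):
    """Write outputs and the new input offset as one unit."""
    for value in inputs[start:]:
        log.records.append((txn_id, value.upper()))
    if crash_before_commit:
        return  # transaction never commits; its outputs stay invisible to read_committed
    # Commit point: outputs and the consumed offset become visible together.
    log.committed_txns.add(txn_id)
    log.offsets[group] = len(inputs)


log = ToyLog()
inputs = ["a", "b"]

process_batch(log, "txn-1", "g1", inputs, start=0, crash_before_commit=True)
# Restart: the offset was never committed, so the batch is reprocessed from offset 0.
process_batch(log, "txn-2", "g1", inputs, start=log.offsets.get("g1", 0))

print(log.read_uncommitted())  # ['A', 'B', 'A', 'B']
print(log.read_committed())    # ['A', 'B']
```

The duplicates still exist physically in the log; they are hidden from committed readers. That is also why the guarantee stops at the Kafka boundary: an external API has no equivalent of "ignore what the failed transaction wrote".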

If asked to design X, anchor on this: Default to at-least-once + idempotent sink. Reach for Kafka EOS only when the pipeline is Kafka-native end-to-end and the sink can't be made idempotent another way. Be explicit about which boundary your guarantee applies to — that's the senior-vs-junior distinction.

QUICK CHECK

Your team is building a payment refund pipeline that reads events from Kafka and calls an external payment provider's API. A senior engineer argues against using Kafka's exactly-once semantics (EOS) and instead recommends at-least-once delivery combined with idempotent sink writes. What is the primary reason this approach is preferred here?
