Delivery Semantics
1. What Is It?
Delivery semantics describe how many times a message produced into a streaming system can be observed by a consumer: at-most-once, at-least-once, or effectively-once (often called exactly-once). The choice is forced by the combination of acknowledgments, durability, offset/commit strategy, and any external side effects the consumer performs.
The problem they solve: distributed systems can fail at any step (producer crash before ack, broker crash before replication, consumer crash before commit). Each guarantee defines a different way of trading off duplicates, drops, and complexity. Without an explicit choice, you default to whatever the broker happens to give you, and you'll be surprised, usually by duplicates in production after the first failover.
2. How It Works
The three guarantees correspond to where the system handles failure on the producer-to-broker and broker-to-consumer hops:
| Guarantee | Producer ack | Consumer commit | Failure result |
|---|---|---|---|
| At-most-once | Fire-and-forget | Commit before processing | Drops on crash, never duplicates |
| At-least-once | Wait for ack + retry on failure | Commit after processing | Duplicates on crash, never drops |
| Effectively-once | Idempotent / transactional producer + transactional or idempotent consumer | Coordinated with output | Neither drops nor duplicates observable in the output |
Producer mechanics:
- `acks=0` → at-most-once on the produce hop (no broker ack).
- `acks=1` → durability only to the leader; data is lost if the leader dies before replication.
- `acks=all` + `enable.idempotence=true` → no duplicates from retries, no loss on single broker failure. This is the floor on the producer side.
- `acks=all` + idempotence + transactions (`transactional.id`) + Kafka-to-Kafka using `read_committed` → the building blocks for end-to-end exactly-once within Kafka.
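The producer tiers above can be sketched as a small lookup. This is only an illustration: the config keys (`acks`, `enable.idempotence`, `transactional.id`) are Kafka producer settings named in the text, but `producer_tier` is a hypothetical helper, not part of any Kafka client, and it assumes every setting is given explicitly rather than relying on broker or client defaults.

```python
def producer_tier(config: dict) -> str:
    """Hypothetical helper: label a producer config with the tier it lands in.

    Assumes settings are passed explicitly; client defaults are not modeled.
    """
    acks = str(config["acks"])
    idempotent = bool(config.get("enable.idempotence", False))
    transactional = "transactional.id" in config

    if acks == "0":
        # No broker ack: the producer never learns about a lost write.
        return "at-most-once on the produce hop"
    if acks == "1":
        # Leader ack only: data is lost if the leader dies before replication.
        return "leader-only durability"
    if acks in ("all", "-1"):
        if idempotent and transactional:
            return "transactional (building block for Kafka EOS)"
        if idempotent:
            return "idempotent (no duplicates from retries)"
        return "replicated, but retries may duplicate"
    raise ValueError(f"unknown acks value: {acks!r}")


fire_and_forget = {"acks": "0"}
safe_floor = {"acks": "all", "enable.idempotence": True}
eos_producer = {**safe_floor, "transactional.id": "orders-processor-1"}  # example id

for cfg in (fire_and_forget, safe_floor, eos_producer):
    print(cfg, "->", producer_tier(cfg))
```

Reading a config through a helper like this makes it harder to ship `acks=1` by accident on a topic that carries business events.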
Consumer mechanics:
- Auto-commit before processing → at-most-once.
- Commit after processing → at-least-once (duplicates possible if the consumer crashes between processing and commit).
- Transactional commit alongside the output write → effectively-once.
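To make the first two consumer orderings concrete, here is a minimal in-memory simulation. It assumes a list standing in for a partition, an integer standing in for the committed offset, and a single injected crash between the two steps; `run_consumer` and `Crash` are illustrative names, and nothing here talks to a real broker.

```python
class Crash(Exception):
    """Simulated consumer crash."""


def run_consumer(log, commit_first, crash_at=None):
    """Consume `log`, restarting from the committed offset after a crash.

    The consumer crashes once at offset `crash_at`, between the commit and
    the processing step (in whichever order they run).
    Returns the messages whose side effect actually happened.
    """
    committed = 0      # durable offset, survives the crash
    processed = []     # side effects, also survive the crash
    crashed = False

    while committed < len(log):
        offset = committed  # a restart resumes from the last committed offset
        try:
            while offset < len(log):
                msg = log[offset]
                crash_now = offset == crash_at and not crashed
                if commit_first:
                    committed = offset + 1       # commit, then process
                    if crash_now:
                        crashed = True
                        raise Crash
                    processed.append(msg)
                else:
                    processed.append(msg)        # process, then commit
                    if crash_now:
                        crashed = True
                        raise Crash
                    committed = offset + 1
                offset += 1
        except Crash:
            continue
    return processed


log = ["a", "b", "c"]
print(run_consumer(log, commit_first=True, crash_at=1))   # ['a', 'c']: "b" is dropped
print(run_consumer(log, commit_first=False, crash_at=1))  # ['a', 'b', 'b', 'c']: "b" is duplicated
```

The only difference between the two runs is the order of two lines, which is the point: the semantic comes from ordering, not from a dedicated setting.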
Concrete example. A consumer reads payments, calls Stripe to refund, then commits the offset:
- Crash after refund, before commit. On restart, the message is redelivered and Stripe is called again: a duplicate refund. To avoid this, either (a) use an idempotency key that Stripe deduplicates by, or (b) move the commit into the same transaction as the write, which only works for Kafka-to-Kafka sinks.
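Option (a) can be sketched with a stand-in provider. `FakeRefundProvider` is a hypothetical in-memory class that imitates a payment API deduplicating on an idempotency key; it is not Stripe's client library, and the event fields (`event_id`, `payment_id`, `amount_cents`) are assumed for illustration.

```python
class FakeRefundProvider:
    """Hypothetical stand-in for an external API that dedupes on an idempotency key."""

    def __init__(self):
        self._results = {}        # idempotency key -> first response
        self.refunds_issued = 0   # how many refunds actually moved money

    def refund(self, idempotency_key, payment_id, amount_cents):
        if idempotency_key in self._results:
            # Replay of a request already handled: return the stored result.
            return self._results[idempotency_key]
        self.refunds_issued += 1
        result = {
            "refund_id": f"re_{self.refunds_issued}",
            "payment_id": payment_id,
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = result
        return result


def handle_refund_event(event, provider):
    # The key is derived only from the event, so a redelivered message
    # produces exactly the same key as the first delivery.
    key = f"refund-{event['event_id']}"
    return provider.refund(key, event["payment_id"], event["amount_cents"])


provider = FakeRefundProvider()
event = {"event_id": "evt-42", "payment_id": "pay-7", "amount_cents": 1500}

first = handle_refund_event(event, provider)
# Crash before the offset commit: the same event is delivered again.
second = handle_refund_event(event, provider)

print(first == second, provider.refunds_issued)
```

The key must be deterministic. A random UUID generated per attempt would defeat the deduplication, because the redelivery would carry a new key.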
3. What Mid-Senior SWEs Actually Need to Know
- Default behavior is at-least-once. If you do nothing special, you will see duplicates on restarts and rebalances. Design every consumer with that assumption.
- "Exactly-once" is end-to-end and contextual. Kafka EOS (exactly-once semantics) only covers Kafka-to-Kafka (or Kafka → state → Kafka). The moment your consumer calls an external system (HTTP, DB, third-party API), you are responsible for idempotency on that side.
- Idempotency in the sink is usually cheaper than transactions. A unique key (event ID, deterministic primary key, upsert) on the sink turns at-least-once into effectively-once at the system boundary.
- Producer idempotence (`enable.idempotence=true`) is essentially free. It eliminates duplicates from retries within the same session. Default `true` in modern Kafka. Always on.
- Transactions cost throughput. Transactional producers add ~10–30% overhead and a few ms of latency from the commit protocol. Use them where exactly-once matters; skip them otherwise.
- The order of commit and side-effect determines the semantic. This is the one rule to internalize: commit-before-process = at-most-once; process-before-commit = at-least-once; commit-with-process = effectively-once (and requires a transactional or idempotent sink).
- Common misunderstanding: Turning on Kafka EOS does not make your `consumer → REST API` call idempotent. EOS is a Kafka-side guarantee; the REST endpoint still sees duplicates unless it dedupes.
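The "commit-with-process" case from the ordering rule can be illustrated with a transactional sink. The sketch below is not Kafka EOS: it swaps in a different technique, storing the consumer's offset in the sink database (SQLite here) so the output write and the offset update commit or roll back together. The table names, the single partition, and the `process` function are all assumptions made for the example.

```python
import sqlite3


def open_sink():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE totals (account TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE offsets (partition_id INTEGER PRIMARY KEY, next_offset INTEGER NOT NULL)")
    conn.execute("INSERT INTO offsets VALUES (0, 0)")
    conn.commit()
    return conn


def next_offset(conn):
    return conn.execute("SELECT next_offset FROM offsets WHERE partition_id = 0").fetchone()[0]


def process(conn, log, crash_at=None):
    """Apply (account, delta) events; returns False if the simulated crash fired."""
    offset = next_offset(conn)  # resume from the offset stored in the sink
    while offset < len(log):
        account, delta = log[offset]
        try:
            with conn:  # one transaction: commit on success, rollback on exception
                conn.execute(
                    "INSERT INTO totals VALUES (?, ?) "
                    "ON CONFLICT(account) DO UPDATE SET balance = balance + excluded.balance",
                    (account, delta),
                )
                if offset == crash_at:
                    raise RuntimeError("simulated crash before the offset update")
                conn.execute(
                    "UPDATE offsets SET next_offset = ? WHERE partition_id = 0",
                    (offset + 1,),
                )
        except RuntimeError:
            return False
        offset += 1
    return True


conn = open_sink()
log = [("alice", 10), ("alice", 5), ("bob", 7)]
process(conn, log, crash_at=1)   # crashes mid-transaction on the second event
process(conn, log)               # restart: resumes at offset 1, no double increment
print(conn.execute("SELECT account, balance FROM totals ORDER BY account").fetchall())
```

The increment is deliberately not idempotent, so the correct totals after the restart come only from the write and the offset sharing one transaction.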
4. Tradeoffs & Decisions
| If you need... | Pick... | Choose differently when... |
|---|---|---|
| Lowest latency, can tolerate some loss (metrics, logs) | At-most-once | Each event matters |
| Pragmatic default for most pipelines | At-least-once + idempotent sink | Either drops or duplicates are catastrophic and unrecoverable |
| Kafka-to-Kafka pipeline where duplicates are unacceptable | Kafka EOS (transactional) | Pipeline crosses to an external system; handle idempotency there |
| Side effects on external systems (payments, emails) | At-least-once + dedup key in the external system | The external system has no idempotency mechanism — then add one |
Key tradeoff: complexity / throughput vs duplicate-tolerance. At-least-once is operationally simple and fast; you push duplicate handling into the sink, where you usually have natural keys. Transactional exactly-once is heavier and only worth it inside Kafka-native pipelines (e.g. Kafka → Kafka) where no external system needs deduping.
Secondary tradeoff: scope creep. People say "exactly-once" when they mean "no duplicates in the user-visible output." Almost always, the right design is at-least-once with idempotent writes, not Kafka EOS everywhere.
5. Interview & System Design Cheat Sheet
- The three modes (at-most-once, at-least-once, effectively-once) are determined by the order of side effect and commit, not by a single setting.
- Kafka's default is at-least-once; idempotence + `acks=all` should always be on; that's the floor.
- "Exactly-once" in Kafka means Kafka EOS, which holds only across Kafka-to-Kafka boundaries. Anything else requires idempotency at the external boundary.
- The dominant production pattern is: at-least-once delivery + idempotent sink writes. It's simpler, faster, and crosses external boundaries cleanly.
- Late and out-of-order events are a separate problem from delivery semantics — don't conflate "may arrive twice" with "may arrive after the window closed."
Common follow-ups:
- "How does Kafka EOS work, at a high level?" The transactional producer atomically writes data + offsets to Kafka in one transaction; consumers in `read_committed` mode see only committed messages. A transaction coordinator manages the two-phase commit.
- "How would you design idempotent processing for a payment refund?" Pass an idempotency key (e.g. event ID) to the payment provider; the provider deduplicates on its side. Locally, use UPSERT on a `refunds(idempotency_key UNIQUE)` table.
- "What goes wrong if I set `acks=0` to speed up the producer?" On any failure or network glitch you'll silently lose messages. Acceptable for fire-and-forget telemetry, never for business events.
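The local half of the refund answer, a `refunds` table with a unique idempotency key, might look like the following. It is a minimal sketch using SQLite from the standard library; the column names beyond `idempotency_key` and the `record_refund` helper are assumptions, and a production system would use its own database's upsert syntax.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE refunds ("
        " idempotency_key TEXT NOT NULL UNIQUE,"
        " payment_id TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL)"
    )
    return conn


def record_refund(conn, idempotency_key, payment_id, amount_cents):
    """Insert the refund once; return True only for the first delivery."""
    with conn:
        cur = conn.execute(
            "INSERT INTO refunds VALUES (?, ?, ?) "
            "ON CONFLICT(idempotency_key) DO NOTHING",
            (idempotency_key, payment_id, amount_cents),
        )
    # rowcount is 0 when the unique key already existed and nothing was written.
    return cur.rowcount == 1


conn = open_db()
print(record_refund(conn, "evt-42", "pay-7", 1500))  # first delivery
print(record_refund(conn, "evt-42", "pay-7", 1500))  # redelivery after a crash
print(conn.execute("SELECT COUNT(*) FROM refunds").fetchone()[0])
```

The return value lets the caller decide whether a redelivered event still needs its downstream work, while the unique constraint keeps the table correct even if the caller ignores it.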
If asked to design X, anchor on this: Default to at-least-once + idempotent sink. Reach for Kafka EOS only when the pipeline is Kafka-native end-to-end and the sink can't be made idempotent another way. Be explicit about which boundary your guarantee applies to — that's the senior-vs-junior distinction.