Offsets & Commits
7 min read
Streaming Systems Index
Streaming Systems Index
Offsets & Commits
1. What Is It?
An is the monotonically increasing integer position of a record within a . Every record written to a is stamped with the next ; consumers track their progress through a partition by remembering "the next offset I haven't processed yet." A commit is the act of persisting that position back to (specifically, to the internal __consumer_offsets , keyed by group.id + + partition) so that on restart or rebalance, processing resumes from there instead of from the beginning or end.
The problem it solves: consumers crash, get rebalanced, get redeployed. Without durable progress tracking, every restart would either re-read everything (duplicates) or skip to the latest (data loss). Offsets are the shared, server-side ledger that makes "where am I in this stream" survivable across failures. Crucially, the offset is not part of the record — it's metadata Kafka assigns. And commits are decoupled from reads — you can read 1,000 records and commit none, some, or all of them, which is the lever that controls delivery semantics.
A consumer service reads 500 records from a Kafka partition in a single poll, processes each record by writing to a database, but crashes halfway through before committing any offsets. When the service restarts and rejoins the consumer group, what will happen?
2. How It Works
- The sends a record; the assigns the next for that (offsets are -local — partition 0 and partition 1 each have their own 0, 1, 2, ...).
- A in a group polls; the returns batches of records starting from the 's current fetch position.
- The consumer processes records. At some point — depending on commit strategy — the consumer calls
commitSync()orcommitAsync()(or auto-commit fires on a timer). - The committed is written to
__consumer_offsetsas(group.id, , partition) → offset. This is the offset will hand back to a future poll if the consumer restarts or another member takes over the partition during a rebalance. - The committed offset is the offset of the next record to read, not the last record processed. So if you've processed offsets 0–9, you commit
10.
Concrete example. A payments with 3 partitions, a ledger-writer. After an hour:
- partition 0: high 1,000,000 — committed offset 999,800 — lag = 200
- partition 1: high 1,000,000 — committed offset 1,000,000 — lag = 0
- partition 2: high watermark 1,000,000 — committed offset 999,950 — lag = 50
If the consumer crashes now and restarts, it resumes from 999,800 on partition 0, 1,000,000 on partition 1, 999,950 on partition 2. The records between committed offset and high watermark on partitions 0 and 2 will be re-fetched — and if they were already processed but not committed, you get duplicates. That's the cost of committing after processing.
A Kafka consumer in group order-processor has successfully processed records at offsets 50 through 59 on a partition, then crashes before committing. What offset should the consumer commit to allow a clean resume from the correct position, and what is the risk if it had already processed those records before crashing without committing?
3. What Mid-Senior SWEs Actually Need to Know
- The commit order vs processing order determines delivery semantics. Commit before processing → (you can lose the record if processing fails after the commit). Commit after processing → (you can re-process if the dies between processing and commit). True across systems requires either transactions on -side output or idempotent downstream writes — commits alone don't give it to you.
- Auto-commit (
enable.auto.commit=true) commits the fetched , not the processed . It fires on a timer (default 5s), committing whatever was last polled. If your polled 1,000 records, processed 200, then crashed — auto-commit may have already committed past record 1,000. For anything beyond toy consumers, setenable.auto.commit=falseand commit explicitly after processing. commitSyncvscommitAsync: sync blocks until the commit is acknowledged (safer, slower); async fires-and-forgets with a callback (faster, but late failures can be lost). Common pattern:commitAsyncin the steady-state loop,commitSyncon shutdown / before rebalance.auto.offset.resetonly applies when there's no committed offset for the group. Values are typicallyearliest(start from beginning),latest(start from end, ignoring backlog), ornone(throw). Once even one commit has happened, this config is ignored for that .- Lag = high − committed offset. This is the operational metric for consumer health. Rising lag means you're falling behind. Use
-consumer-groups.sh --describe --group Xor JMXrecords-lag-max. - Offsets are per . "Resume from offset 100" is meaningless without specifying which partition. Re-keying a or repartitioning invalidates all offsets — the new partitions have their own offset space.
- Resetting offsets is an operational tool, not a runtime API.
kafka-consumer-groups.sh --reset-offsets --to-earliest|--to-datetime|--to-offsetonly works when the group has no active members. Stop consumers, reset, restart — or your reset is a no-op. - Common misunderstanding: "the offset is unique." It's unique within a partition, not across the . Two records in different partitions can both be offset 1,000.
- Common misunderstanding: "I should store offsets myself for safety." Storing offsets in your own DB (alongside the side-effect of processing) is the correct pattern for into that DB — write the row + the offset atomically, ignore Kafka's
__consumer_offsets. But for most workloads, server-side commits are fine and far simpler.
// Idiomatic at-least-once consumer loop KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props); consumer.subscribe(List.of("payments")); try { while (running) { ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(500)); for (ConsumerRecord<String, String> record : batch) { processRecord(record); // side effects happen here } consumer.commitSync(); // commit only after successful processing } } finally { consumer.commitSync(); // final commit before close consumer.close(); }
4. Tradeoffs & Decisions
| If you need... | Choose... | Tradeoff |
|---|---|---|
| At-least-once delivery (the sane default) | enable.auto.commit=false; commit after processing | Duplicates on crash — downstream must be idempotent |
| At-most-once | Commit before processing (rare) | Data loss on crash — only ok for metrics-class data |
| Exactly-once into a database | Store offset + side effect in the same DB transaction | Couples your offset storage to that DB; can't use Kafka tooling for lag |
| Exactly-once into another Kafka topic | Producer transactions + isolation.level=read_committed on downstream | Higher latency, transactional broker overhead |
| Lowest commit overhead | commitAsync in steady state, commitSync at shutdown / on rebalance callback | Async errors are non-blocking — must handle in callback |
| Replay a topic from scratch | New group.id with auto.offset.reset=earliest, or stop the group + kafka-consumer-groups --reset-offsets | Reprocessing cost; idempotency required |
| Skip historical backlog on first deploy | auto.offset.reset=latest on a fresh group | You lose anything written before the consumer started |
Key tradeoff: commit frequency vs throughput. Committing every record is correct but adds a round-trip per record. Committing every poll batch (the standard pattern) amortizes the cost and only re-processes one batch on crash. Committing every N seconds is what auto-commit does — usually wrong because it commits records you may not have processed yet.
Secondary tradeoff: server-side commits vs application-managed offsets. Server-side (__consumer_offsets) is simple, gives you --groups tooling, and is enough for . Application-managed ( stored next to the side effect in the same transaction) is the only path to true into a non- system, but you give up Kafka's lag-monitoring story and have to roll your own.
A payment processing service consumes Kafka events and writes each payment record to a PostgreSQL database. The team wants true exactly-once semantics so that a consumer crash never causes a payment to be recorded twice or lost. Which approach achieves this?
5. Interview & System Design Cheat Sheet
- An is a per- position; a commit is persisted progress for a
(group, , )triple stored in__consumer_offsets. - The committed is the next offset to read — process 0–9, commit 10.
- Commit-after-process = ; commit-before-process = ; commits alone never give across systems — you need transactions or idempotency.
auto.offset.resetonly applies when no commit exists — it does not "reset" anything once a group has progress.- Lag = high − committed offset is the canonical health metric.
Common follow-ups:
- "How do you guarantee no duplicates downstream?" — Either transactions (if writing back to ) or write the side effect + offset together in the downstream system's transaction (if writing to a DB). Server-side commits alone cannot.
- "What happens on rebalance during processing?" — The partition you're processing may get reassigned. Standard pattern: register
ConsumerRebalanceListener.onPartitionsRevoked, callcommitSyncthere so the next owner picks up exactly where you stopped. - "Our lag is rising on one partition only — why?" — Almost always a hot key (one partition is getting disproportionate traffic because
key % partitionCountis skewed) or one instance is slow / GCing / I/O-blocked on that partition's processing.
If asked to design X, anchor on this: Decide who owns the offset — Kafka's __consumer_offsets is the default; only move to application-managed offsets when you specifically need transactional into one downstream system. Then choose commit strategy: explicit, after processing, batched per poll — that single decision determines your delivery semantics.
A Kafka consumer group processes messages from a topic partition and commits offsets to __consumer_offsets after each batch is fully processed. Despite this setup, you notice occasional duplicate records in your downstream PostgreSQL database after consumer crashes. What is the most accurate explanation for why duplicates occur, and what would eliminate them?
Glossary History
Click dotted jargon to save explanations here.
Glossary History
Click dotted jargon to save explanations here.