Offsets & Commits

7 min read

Reading Progress0%
Streaming Systems Index
Streaming Systems Index

Offsets & Commits

1. What Is It?

An is the monotonically increasing integer position of a record within a . Every record written to a is stamped with the next ; consumers track their progress through a partition by remembering "the next offset I haven't processed yet." A commit is the act of persisting that position back to (specifically, to the internal __consumer_offsets , keyed by group.id + + partition) so that on restart or rebalance, processing resumes from there instead of from the beginning or end.

The problem it solves: consumers crash, get rebalanced, get redeployed. Without durable progress tracking, every restart would either re-read everything (duplicates) or skip to the latest (data loss). Offsets are the shared, server-side ledger that makes "where am I in this stream" survivable across failures. Crucially, the offset is not part of the record — it's metadata Kafka assigns. And commits are decoupled from reads — you can read 1,000 records and commit none, some, or all of them, which is the lever that controls delivery semantics.

QUICK CHECK

A consumer service reads 500 records from a Kafka partition in a single poll, processes each record by writing to a database, but crashes halfway through before committing any offsets. When the service restarts and rejoins the consumer group, what will happen?

Choose one answer

2. How It Works

  1. The sends a record; the assigns the next for that (offsets are -local — partition 0 and partition 1 each have their own 0, 1, 2, ...).
  2. A in a group polls; the returns batches of records starting from the 's current fetch position.
  3. The consumer processes records. At some point — depending on commit strategy — the consumer calls commitSync() or commitAsync() (or auto-commit fires on a timer).
  4. The committed is written to __consumer_offsets as (group.id, , partition) → offset. This is the offset will hand back to a future poll if the consumer restarts or another member takes over the partition during a rebalance.
  5. The committed offset is the offset of the next record to read, not the last record processed. So if you've processed offsets 0–9, you commit 10.

Concrete example. A payments with 3 partitions, a ledger-writer. After an hour:

  • partition 0: high 1,000,000 — committed offset 999,800 — lag = 200
  • partition 1: high 1,000,000 — committed offset 1,000,000 — lag = 0
  • partition 2: high watermark 1,000,000 — committed offset 999,950 — lag = 50

If the consumer crashes now and restarts, it resumes from 999,800 on partition 0, 1,000,000 on partition 1, 999,950 on partition 2. The records between committed offset and high watermark on partitions 0 and 2 will be re-fetched — and if they were already processed but not committed, you get duplicates. That's the cost of committing after processing.

QUICK CHECK

A Kafka consumer in group order-processor has successfully processed records at offsets 50 through 59 on a partition, then crashes before committing. What offset should the consumer commit to allow a clean resume from the correct position, and what is the risk if it had already processed those records before crashing without committing?

Choose one answer

3. What Mid-Senior SWEs Actually Need to Know

  • The commit order vs processing order determines delivery semantics. Commit before processing → (you can lose the record if processing fails after the commit). Commit after processing → (you can re-process if the dies between processing and commit). True across systems requires either transactions on -side output or idempotent downstream writes — commits alone don't give it to you.
  • Auto-commit (enable.auto.commit=true) commits the fetched , not the processed . It fires on a timer (default 5s), committing whatever was last polled. If your polled 1,000 records, processed 200, then crashed — auto-commit may have already committed past record 1,000. For anything beyond toy consumers, set enable.auto.commit=false and commit explicitly after processing.
  • commitSync vs commitAsync: sync blocks until the commit is acknowledged (safer, slower); async fires-and-forgets with a callback (faster, but late failures can be lost). Common pattern: commitAsync in the steady-state loop, commitSync on shutdown / before rebalance.
  • auto.offset.reset only applies when there's no committed offset for the group. Values are typically earliest (start from beginning), latest (start from end, ignoring backlog), or none (throw). Once even one commit has happened, this config is ignored for that .
  • Lag = high − committed offset. This is the operational metric for consumer health. Rising lag means you're falling behind. Use -consumer-groups.sh --describe --group X or JMX records-lag-max.
  • Offsets are per . "Resume from offset 100" is meaningless without specifying which partition. Re-keying a or repartitioning invalidates all offsets — the new partitions have their own offset space.
  • Resetting offsets is an operational tool, not a runtime API. kafka-consumer-groups.sh --reset-offsets --to-earliest|--to-datetime|--to-offset only works when the group has no active members. Stop consumers, reset, restart — or your reset is a no-op.
  • Common misunderstanding: "the offset is unique." It's unique within a partition, not across the . Two records in different partitions can both be offset 1,000.
  • Common misunderstanding: "I should store offsets myself for safety." Storing offsets in your own DB (alongside the side-effect of processing) is the correct pattern for into that DB — write the row + the offset atomically, ignore Kafka's __consumer_offsets. But for most workloads, server-side commits are fine and far simpler.
// Idiomatic at-least-once consumer loop
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("payments"));

try {
    while (running) {
        ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : batch) {
            processRecord(record);  // side effects happen here
        }
        consumer.commitSync();      // commit only after successful processing
    }
} finally {
    consumer.commitSync();          // final commit before close
    consumer.close();
}

4. Tradeoffs & Decisions

If you need...Choose...Tradeoff
At-least-once delivery (the sane default)enable.auto.commit=false; commit after processingDuplicates on crash — downstream must be idempotent
At-most-onceCommit before processing (rare)Data loss on crash — only ok for metrics-class data
Exactly-once into a databaseStore offset + side effect in the same DB transactionCouples your offset storage to that DB; can't use Kafka tooling for lag
Exactly-once into another Kafka topicProducer transactions + isolation.level=read_committed on downstreamHigher latency, transactional broker overhead
Lowest commit overheadcommitAsync in steady state, commitSync at shutdown / on rebalance callbackAsync errors are non-blocking — must handle in callback
Replay a topic from scratchNew group.id with auto.offset.reset=earliest, or stop the group + kafka-consumer-groups --reset-offsetsReprocessing cost; idempotency required
Skip historical backlog on first deployauto.offset.reset=latest on a fresh groupYou lose anything written before the consumer started

Key tradeoff: commit frequency vs throughput. Committing every record is correct but adds a round-trip per record. Committing every poll batch (the standard pattern) amortizes the cost and only re-processes one batch on crash. Committing every N seconds is what auto-commit does — usually wrong because it commits records you may not have processed yet.

Secondary tradeoff: server-side commits vs application-managed offsets. Server-side (__consumer_offsets) is simple, gives you --groups tooling, and is enough for . Application-managed ( stored next to the side effect in the same transaction) is the only path to true into a non- system, but you give up Kafka's lag-monitoring story and have to roll your own.

QUICK CHECK

A payment processing service consumes Kafka events and writes each payment record to a PostgreSQL database. The team wants true exactly-once semantics so that a consumer crash never causes a payment to be recorded twice or lost. Which approach achieves this?

Choose one answer

5. Interview & System Design Cheat Sheet

  • An is a per- position; a commit is persisted progress for a (group, , ) triple stored in __consumer_offsets.
  • The committed is the next offset to read — process 0–9, commit 10.
  • Commit-after-process = ; commit-before-process = ; commits alone never give across systems — you need transactions or idempotency.
  • auto.offset.reset only applies when no commit exists — it does not "reset" anything once a group has progress.
  • Lag = high − committed offset is the canonical health metric.

Common follow-ups:

  • "How do you guarantee no duplicates downstream?" — Either transactions (if writing back to ) or write the side effect + offset together in the downstream system's transaction (if writing to a DB). Server-side commits alone cannot.
  • "What happens on rebalance during processing?" — The partition you're processing may get reassigned. Standard pattern: register ConsumerRebalanceListener.onPartitionsRevoked, call commitSync there so the next owner picks up exactly where you stopped.
  • "Our lag is rising on one partition only — why?" — Almost always a hot key (one partition is getting disproportionate traffic because key % partitionCount is skewed) or one instance is slow / GCing / I/O-blocked on that partition's processing.

If asked to design X, anchor on this: Decide who owns the offsetKafka's __consumer_offsets is the default; only move to application-managed offsets when you specifically need transactional into one downstream system. Then choose commit strategy: explicit, after processing, batched per poll — that single decision determines your delivery semantics.

QUICK CHECK

A Kafka consumer group processes messages from a topic partition and commits offsets to __consumer_offsets after each batch is fully processed. Despite this setup, you notice occasional duplicate records in your downstream PostgreSQL database after consumer crashes. What is the most accurate explanation for why duplicates occur, and what would eliminate them?

Choose one answer