Topics & Partitions

5 min read

Reading Progress0%

Streaming Systems Index

Topics & Partitions

1. What Is It?

A is a named, durable stream of messages — e.g., orders, clicks, user-events. A is one append-only log within a ; a topic is split into 1..N partitions. Every message lives in exactly one . Partitions are the unit of parallelism, ordering, and in .

The problem this solves: a single log can only be written and read serially. To horizontally scale a topic — both throughput and throughput — splits it into independent partitions that can live on different brokers and be consumed by different workers in parallel. Without partitions, a topic is a single-machine bottleneck.

QUICK CHECK

A team is building a high-traffic event ingestion service using Kafka. Their user-events topic currently has only 1 partition, and they notice that consumer throughput is not scaling even after adding more consumer instances. What is the most likely reason for this bottleneck?

2. How It Works

When you create a , you choose the count (and factor). E.g., -topics --create -- orders --partitions 12 ---factor 3.
spreads leaders across brokers ( 1 leads P0, P3, P6, P9; 2 leads P1, ...).
Producers decide which partition each record goes to:
- If the record has a key → partition = hash(key) % numPartitions.
- If no key → round-robin or sticky (batched) assignment.
- Or explicit partitioner code.
Within a partition, messages are appended in order and assigned monotonically increasing offsets.
Consumers within a group are assigned disjoint subsets of partitions; each reads its assigned partitions sequentially.

Concrete example. Topic payments with 12 partitions, by customer_id:

All events for customer_id=42 always land in the same partition (hash(42) % 12). They are processed in order by whichever is assigned that partition.
Events for different customers spread across all 12 partitions.
A with 4 instances gets 3 partitions each. With 12 instances, each gets exactly 1. With 13 instances, one sits idle.

QUICK CHECK

A Kafka topic called payments has 12 partitions and a consumer group with 5 consumer instances. Each message is published with a customer_id key. How are partitions distributed across the consumers, and what guarantees does keyed partitioning provide?

3. What Mid-Senior SWEs Actually Need to Know

count caps parallelism. 12 partitions = at most 12 active consumers in a group. Choose partitions for peak future throughput, not current.
count is sticky. You can increase partitions on a but changing the count breaks key-based ordering: a key that used to hash to partition 3 may now hash to partition 9, so its old and new messages live in different partitions and arrive out of order to consumers. Treat partition count as an architectural decision.
key choice is critical.
- Pick a key with high cardinality to spread load evenly. by country_code with 5 unique values → uneven distribution and hot partitions.
- Pick a key that matches your required ordering unit. If you need per-order ordering, key by order_id; if per-user, key by user_id.
- No key (or null key) → no ordering guarantees, even within a single call.
Hot partitions are a top operational pain. One key with 1000× the traffic of any other ⇒ one overloaded, one lagging. Monitor per-partition byte rate.
Partition placement and rack awareness. Configure .rack so replicas of one partition span racks/AZs. Otherwise an AZ outage can take an entire partition (all replicas) offline.
Sizing heuristics (orders of magnitude, not laws):
- Aim for ~tens of MB/sec per partition as a comfortable upper bound; many production topics live well below this.
- Total partitions per broker in the low thousands is typical; beyond that, controller metadata pressure starts to bite.
- Each partition has a fixed RAM/file-descriptor cost; don't go to 10,000 partitions on a single without reason.
Common misunderstanding: "I'll just set 1000 partitions and never have to think about it." Cost: longer rebalance times, larger metadata, slower failover, and once your is bigger than the partitions, you cannot scale further without changing partition count.

QUICK CHECK

A Kafka topic currently has 8 partitions and uses order_id as the partitioning key. Your team decides to increase the partition count to 16 to support higher throughput. What is the most significant risk introduced by this change?

4. Tradeoffs & Decisions

If you need...	Choose...	Tradeoff
Strict per-entity ordering	Partition by that entity ID	Hot keys can overload one partition; less parallelism for that entity
Maximum throughput, no ordering needs	Many partitions, round-robin / sticky producer	No per-key ordering; harder to reason about
Future headroom for consumer scaling	Over-provision partitions early	Higher controller metadata cost; cannot reduce later
Migrate to more partitions later without breaking ordering	Don't — instead build a new topic and dual-write or cut over	More operational work; only choice when stuck
Per-tenant fairness in a multi-tenant topic	Custom partitioner that mixes tenant + load-balancing salt	Cannot use stock consumer-side ordering for that tenant

Key tradeoff: count = parallelism vs metadata cost vs commit weight. Senior engineers pick a count that comfortably handles ~2–4× current peak throughput, with the assumption it won't change again for the life of the .

Secondary tradeoff: ordering vs evenness. Sharper ordering guarantees mean choosing a narrower key, which risks skew. Broader keys spread load but weaken ordering. Match the key to the actual ordering unit you need — usually finer-grained than instinct suggests.

QUICK CHECK

Your team's Kafka topic has 12 partitions keyed by user_id, and a single high-traffic user is causing one partition to receive 80% of all messages. You need strict per-user ordering and cannot tolerate dropped messages during migration. What is the recommended approach to resolve the hot partition problem?

5. Interview & System Design Cheat Sheet

A is a logical stream; a is a single append-only log within it. Parallelism, ordering, and all happen at the level.
Choose the key from the smallest unit you need ordered — usually the entity ID (user_id, order_id, device_id).
The partition count is effectively immutable once consumers depend on key-based ordering. Pick for headroom: 2–4× expected peak.
A can have at most one per partition — partition count is the ceiling on parallelism.
Hot partitions are the dominant failure mode in real systems; monitor per-partition throughput, not just throughput.

Common follow-ups:

"What happens if I add partitions to a live topic?" — Existing data stays where it is; new messages with a given key may land in a different partition than before. Per-key ordering across the change boundary is broken.
"How do you avoid hot partitions for power users?" — Either accept the skew and partition more aggressively, use a composite key (user_id + bucket), or move that tenant to a separate topic.
"How do you size partitions?" — Estimate peak throughput per topic, divide by a conservative per-partition rate (e.g. tens of MB/sec), then add headroom for growth and parallelism.

If asked to design X, anchor on this: Name the topic, the partition count, the key (and why — what ordering does it enforce?), and the factor. Those four numbers define a topic.

QUICK CHECK

A Kafka topic currently has 6 partitions, and all messages are keyed by user_id. Your team decides to increase the partition count to 12 because throughput is growing. What is the most critical consequence of this change for a running system?

Glossary History

Click dotted jargon to save explanations here.

Glossary History

Click dotted jargon to save explanations here.

Topics & Partitions

Streaming Systems Index

Streaming Systems Index

Topics & Partitions

1. What Is It?

2. How It Works

3. What Mid-Senior SWEs Actually Need to Know

4. Tradeoffs & Decisions

5. Interview & System Design Cheat Sheet

On This Page

On This Page

Glossary History

Glossary History