Topics & Partitions
5 min read
Streaming Systems Index
Streaming Systems Index
Topics & Partitions
1. What Is It?
A is a named, durable stream of messages — e.g., orders, clicks, user-events. A is one append-only log within a ; a topic is split into 1..N partitions. Every message lives in exactly one . Partitions are the unit of parallelism, ordering, and in .
The problem this solves: a single log can only be written and read serially. To horizontally scale a topic — both throughput and throughput — splits it into independent partitions that can live on different brokers and be consumed by different workers in parallel. Without partitions, a topic is a single-machine bottleneck.
A team is building a high-traffic event ingestion service using Kafka. Their user-events topic currently has only 1 partition, and they notice that consumer throughput is not scaling even after adding more consumer instances. What is the most likely reason for this bottleneck?
2. How It Works
- When you create a , you choose the count (and factor). E.g.,
-topics --create -- orders --partitions 12 ---factor 3. - spreads leaders across brokers ( 1 leads P0, P3, P6, P9; 2 leads P1, ...).
- Producers decide which partition each record goes to:
- If the record has a key →
partition = hash(key) % numPartitions. - If no key → round-robin or sticky (batched) assignment.
- Or explicit partitioner code.
- If the record has a key →
- Within a partition, messages are appended in order and assigned monotonically increasing offsets.
- Consumers within a group are assigned disjoint subsets of partitions; each reads its assigned partitions sequentially.
Concrete example. Topic payments with 12 partitions, by customer_id:
- All events for
customer_id=42always land in the same partition (hash(42) % 12). They are processed in order by whichever is assigned that partition. - Events for different customers spread across all 12 partitions.
- A with 4 instances gets 3 partitions each. With 12 instances, each gets exactly 1. With 13 instances, one sits idle.
A Kafka topic called payments has 12 partitions and a consumer group with 5 consumer instances. Each message is published with a customer_id key. How are partitions distributed across the consumers, and what guarantees does keyed partitioning provide?
3. What Mid-Senior SWEs Actually Need to Know
- count caps parallelism. 12 partitions = at most 12 active consumers in a group. Choose partitions for peak future throughput, not current.
- count is sticky. You can increase partitions on a but changing the count breaks key-based ordering: a key that used to hash to partition 3 may now hash to partition 9, so its old and new messages live in different partitions and arrive out of order to consumers. Treat partition count as an architectural decision.
- key choice is critical.
- Pick a key with high cardinality to spread load evenly. by
country_codewith 5 unique values → uneven distribution and hot partitions. - Pick a key that matches your required ordering unit. If you need per-order ordering, key by
order_id; if per-user, key byuser_id. - No key (or
nullkey) → no ordering guarantees, even within a single call.
- Pick a key with high cardinality to spread load evenly. by
- Hot partitions are a top operational pain. One key with 1000× the traffic of any other ⇒ one overloaded, one lagging. Monitor per-partition byte rate.
- Partition placement and rack awareness. Configure
.rackso replicas of one partition span racks/AZs. Otherwise an AZ outage can take an entire partition (all replicas) offline. - Sizing heuristics (orders of magnitude, not laws):
- Aim for ~tens of MB/sec per partition as a comfortable upper bound; many production topics live well below this.
- Total partitions per broker in the low thousands is typical; beyond that, controller metadata pressure starts to bite.
- Each partition has a fixed RAM/file-descriptor cost; don't go to 10,000 partitions on a single without reason.
- Common misunderstanding: "I'll just set 1000 partitions and never have to think about it." Cost: longer rebalance times, larger metadata, slower failover, and once your is bigger than the partitions, you cannot scale further without changing partition count.
A Kafka topic currently has 8 partitions and uses order_id as the partitioning key. Your team decides to increase the partition count to 16 to support higher throughput. What is the most significant risk introduced by this change?
4. Tradeoffs & Decisions
| If you need... | Choose... | Tradeoff |
|---|---|---|
| Strict per-entity ordering | Partition by that entity ID | Hot keys can overload one partition; less parallelism for that entity |
| Maximum throughput, no ordering needs | Many partitions, round-robin / sticky producer | No per-key ordering; harder to reason about |
| Future headroom for consumer scaling | Over-provision partitions early | Higher controller metadata cost; cannot reduce later |
| Migrate to more partitions later without breaking ordering | Don't — instead build a new topic and dual-write or cut over | More operational work; only choice when stuck |
| Per-tenant fairness in a multi-tenant topic | Custom partitioner that mixes tenant + load-balancing salt | Cannot use stock consumer-side ordering for that tenant |
Key tradeoff: count = parallelism vs metadata cost vs commit weight. Senior engineers pick a count that comfortably handles ~2–4× current peak throughput, with the assumption it won't change again for the life of the .
Secondary tradeoff: ordering vs evenness. Sharper ordering guarantees mean choosing a narrower key, which risks skew. Broader keys spread load but weaken ordering. Match the key to the actual ordering unit you need — usually finer-grained than instinct suggests.
Your team's Kafka topic has 12 partitions keyed by user_id, and a single high-traffic user is causing one partition to receive 80% of all messages. You need strict per-user ordering and cannot tolerate dropped messages during migration. What is the recommended approach to resolve the hot partition problem?
5. Interview & System Design Cheat Sheet
- A is a logical stream; a is a single append-only log within it. Parallelism, ordering, and all happen at the level.
- Choose the key from the smallest unit you need ordered — usually the entity ID (
user_id,order_id,device_id). - The partition count is effectively immutable once consumers depend on key-based ordering. Pick for headroom: 2–4× expected peak.
- A can have at most one per partition — partition count is the ceiling on parallelism.
- Hot partitions are the dominant failure mode in real systems; monitor per-partition throughput, not just throughput.
Common follow-ups:
- "What happens if I add partitions to a live topic?" — Existing data stays where it is; new messages with a given key may land in a different partition than before. Per-key ordering across the change boundary is broken.
- "How do you avoid hot partitions for power users?" — Either accept the skew and partition more aggressively, use a composite key (
user_id + bucket), or move that tenant to a separate topic. - "How do you size partitions?" — Estimate peak throughput per topic, divide by a conservative per-partition rate (e.g. tens of MB/sec), then add headroom for growth and parallelism.
If asked to design X, anchor on this: Name the topic, the partition count, the key (and why — what ordering does it enforce?), and the factor. Those four numbers define a topic.
A Kafka topic currently has 6 partitions, and all messages are keyed by user_id. Your team decides to increase the partition count to 12 because throughput is growing. What is the most critical consequence of this change for a running system?
Glossary History
Click dotted jargon to save explanations here.
Glossary History
Click dotted jargon to save explanations here.