Pub/Sub vs Queue

1. What Is It?

A queue is a delivery channel where each message is consumed by exactly one worker: multiple workers compete for messages and share the load. A pub/sub (publish/subscribe) channel is one where each message is delivered to every subscriber: multiple consumers each get their own copy. Kafka's "consumer groups" express both patterns within the same system: workers in the same group share the load (queue behavior), while workers in different groups each see every message (pub/sub behavior).

The problem they solve: how to move work or facts between processes asynchronously. Without queues, you can't horizontally scale a worker pool against a backlog. Without pub/sub, you can't fan out the same event to many independent consumers without the producer knowing each one.

QUICK CHECK

Your e-commerce platform publishes an 'order_placed' event whenever a customer checks out. Both the inventory service and the email notification service need to react to every order — independently of each other. Which messaging pattern should you use to deliver these events?

2. How It Works

Queue (work distribution):

  1. A producer writes a message to the queue.
  2. The broker delivers the message to one of N competing workers.
  3. The worker acks; the broker deletes (or hides) the message.
  4. If the worker fails before acking, the message becomes visible again and another worker picks it up.
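
The four steps above can be sketched as a toy in-memory queue. This is a minimal illustration, not how SQS or RabbitMQ are implemented: the class name `VisibilityQueue` and its methods are made up for this example, and time is passed in explicitly as `now` so the behavior is deterministic.

```python
import itertools


class VisibilityQueue:
    """Toy in-memory queue with ack and visibility-timeout redelivery (hypothetical)."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count(1)
        self._messages = {}          # message id -> body
        self._invisible_until = {}   # message id -> time it becomes visible again

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        self._invisible_until[msg_id] = 0.0  # visible immediately
        return msg_id

    def receive(self, now: float):
        """Hand the oldest visible message to one worker and hide it for the timeout."""
        for msg_id, body in self._messages.items():
            if self._invisible_until[msg_id] <= now:
                self._invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None  # nothing visible right now

    def ack(self, msg_id: int) -> None:
        """Worker finished: delete the message for good."""
        self._messages.pop(msg_id, None)
        self._invisible_until.pop(msg_id, None)


queue = VisibilityQueue(visibility_timeout=30.0)
queue.send("resize image 1")

first = queue.receive(now=0.0)          # worker A takes the message
hidden = queue.receive(now=10.0)        # worker B gets nothing: message is in flight
redelivered = queue.receive(now=31.0)   # A never acked, so B gets it after the timeout
queue.ack(redelivered[0])
empty = queue.receive(now=100.0)        # acked messages never come back
```

Note that worker B receives the same message worker A already started on. That is why queue consumers generally need to be idempotent.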

Pub/sub (broadcast):

  1. A producer publishes a message to a topic.
  2. The broker delivers a copy to every subscriber.
  3. Each subscriber tracks its own progress independently.
  4. A new subscriber may (or may not) see historical messages, depending on the broker.
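
The broadcast steps can be sketched the same way. This is a toy, non-persistent broker, so a late subscriber sees no history (the "may not" case in step 4). The `Broker` class and the subscriber names are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class Broker:
    """Toy non-persistent pub/sub broker: one copy per current subscriber (hypothetical)."""

    def __init__(self):
        # topic -> {subscriber name: that subscriber's private inbox}
        self._subscribers = defaultdict(dict)

    def subscribe(self, topic: str, name: str) -> list:
        inbox = []
        self._subscribers[topic][name] = inbox
        return inbox

    def publish(self, topic: str, message: str) -> int:
        """Append a copy to every subscriber's inbox; return how many received it."""
        inboxes = self._subscribers[topic].values()
        for inbox in inboxes:
            inbox.append(message)
        return len(inboxes)


broker = Broker()
inventory = broker.subscribe("orders", "inventory")
email = broker.subscribe("orders", "email")
broker.publish("orders", "order_placed:1001")

# A subscriber that joins late misses everything published before it subscribed.
late = broker.subscribe("orders", "analytics")
broker.publish("orders", "order_placed:1002")
```

Each inbox is drained independently, which is the "tracks its own progress" property: a slow email service does not hold back inventory.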

Kafka unifies both via consumer groups. Partitions in a topic are distributed across the consumers of a single group (queue semantics within the group). Multiple groups each independently consume every message (pub/sub across groups).
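
A rough sketch of that assignment logic, assuming a plain round-robin strategy. Kafka's real assignors are more sophisticated, and `assign_partitions` plus the group and member names below are invented for the example.

```python
def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread partitions round-robin over the consumers of ONE group (simplified)."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Two groups reading the same 6-partition topic.
groups = {
    "thumbnailer": ["thumb-0", "thumb-1", "thumb-2"],  # shares the load
    "analytics": ["analytics-0"],                      # reads everything itself
}
plan = {group: assign_partitions(6, members) for group, members in groups.items()}

# More consumers than partitions: the extra one gets nothing to read.
crowded = assign_partitions(12, [f"w-{i}" for i in range(13)])
idle = [consumer for consumer, parts in crowded.items() if not parts]
```

Within `thumbnailer`, every partition has exactly one owner (queue behavior). Across groups, every partition is assigned once per group, so each group sees every message (pub/sub behavior).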

Concrete example. An image-upload event needs two things to happen: generate a thumbnail (one worker is enough), and update analytics (separate system).

  • Queue: A thumbnail-jobs SQS queue with 20 worker processes. Each upload produces one job; one worker handles it.
  • Pub/sub: An uploads SNS topic. The thumbnail service subscribes; the analytics service subscribes. Each gets every event.
  • Kafka unified: Topic uploads with consumer group thumbnailer (20 instances sharing partitions) and consumer group analytics (separately reading every event).

QUICK CHECK

Your team needs to process uploaded videos: a transcoding service with 10 worker instances should handle each video exactly once, while a separate notification service and a separate analytics service each need to receive every upload event independently. Which messaging setup best satisfies all three requirements?

3. What Mid-Senior SWEs Actually Need to Know

  • Queue semantics need ack/nack and visibility timeouts. SQS, RabbitMQ: a message is "in flight" while a worker holds it; if not acked within the timeout, it reappears. Setting the timeout too low causes duplicates; too high causes stuck messages on crashed workers.
  • Pub/sub without persistence loses messages. Classic SNS / Redis pub/sub delivers only to subscribers connected at publish time. A consumer that was down or slow misses events forever. Modern pub/sub (Kafka, Pulsar, Kinesis) persists for a retention window precisely to fix this.
  • "Fan-out from a queue" is a common antipattern. Don't have one queue with N workers each filtering "is this for me?"; use pub/sub (or a topic per consumer) instead.
  • Kafka consumer-group partition counts matter. Workers within a group are limited by partition count: 12 partitions = at most 12 active workers in a group; adding a 13th means one sits idle.
  • Ordering rules differ.
    • Queues like SQS standard: no ordering. FIFO SQS: per MessageGroupId.
    • Kafka: per-partition ordering. Partition by your ordering key.
    • Pub/sub topics: typically no global ordering across subscribers.
  • Common misunderstanding: SNS-style pub/sub is sometimes thought of as "real-time" or "low latency" — it is, but it has no replay. If you need both fan-out and replayability (analytics, debugging, new consumer onboarding), you need a log-based pub/sub like Kafka.
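
To make "partition by your ordering key" concrete, here is a small sketch. It assumes a simple hash-mod partitioner, with CRC32 as a stand-in hash (Kafka's default partitioner for keyed records uses murmur2). The helper names and event data are made up.

```python
import zlib

NUM_PARTITIONS = 4


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition. CRC32 is a stand-in hash for illustration."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Events for two orders arrive interleaved.
events = [
    ("order-1", "placed"),
    ("order-2", "placed"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]

# Each partition is an append-only list; same key always lands in the same one.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, event in events:
    partitions[partition_for(key)].append((key, event))


def history(key: str) -> list[str]:
    """Events for one key, in the order its partition stored them."""
    return [event for k, event in partitions[partition_for(key)] if k == key]
```

`history("order-1")` returns the events in publish order because they all share one partition. There is no ordering guarantee between `order-1` and `order-2` unless they happen to hash to the same partition.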

4. Tradeoffs & Decisions

| If you need... | Pick... | Choose the other when... |
| --- | --- | --- |
| Distribute work across a worker pool, each job done once | Queue (SQS, RabbitMQ) | Multiple independent systems each need the same input |
| Fan-out events to many independent services | Pub/sub (Kafka, SNS, EventBridge) | Only one consumer ever, and it's pure work-queue semantics |
| Replay history, add new consumers later | Log-based (Kafka, Pulsar, Kinesis) | Storage cost matters and old data is worthless |
| Per-message ack with redelivery on failure | Queue with visibility timeout | You can re-process from offsets (log-based) |
| Both fan-out and load-sharing within each consumer | Kafka with consumer groups | You don't need replay: SNS → SQS fan-out is simpler |

Key tradeoff: delete-on-consume vs append-only log. Classic queues delete messages once consumed: cheap and simple, but no replay. Log-based brokers retain messages for a window: heavier storage, but they let you onboard a new consumer that re-reads from the beginning, debug by inspecting history, and reprocess after a bug.
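
The difference is easy to see in a toy model. Both classes below are hypothetical sketches, not real client APIs, and the log ignores retention limits for simplicity.

```python
from collections import deque


class DeleteOnConsumeQueue:
    """Classic queue: a consumed message is gone."""

    def __init__(self):
        self._items = deque()

    def send(self, message: str) -> None:
        self._items.append(message)

    def receive(self):
        return self._items.popleft() if self._items else None


class Log:
    """Append-only log: records stay put, each group only moves its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # group name -> next offset to read

    def append(self, message: str) -> None:
        self._records.append(message)

    def poll(self, group: str) -> list:
        start = self._offsets.get(group, 0)  # unknown groups start at the beginning
        batch = self._records[start:]
        self._offsets[group] = len(self._records)
        return batch

    def seek(self, group: str, offset: int) -> None:
        self._offsets[group] = offset


orders = ["order-1", "order-2", "order-3"]

queue = DeleteOnConsumeQueue()
log = Log()
for order in orders:
    queue.send(order)
    log.append(order)

drained = [queue.receive() for _ in orders]
after_drain = queue.receive()          # nothing left to reprocess

first_pass = log.poll("billing")       # normal consumption
caught_up = log.poll("billing")        # no new records
log.seek("billing", 0)                 # bug found: rewind and reprocess
replayed = log.poll("billing")
backfill = log.poll("analytics")       # brand-new consumer reads from the start
```

Rewinding is just an offset change on the consumer side; the broker does not need to know a replay is happening.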

QUICK CHECK

Your team discovers a bug that caused an order-processing service to silently miscalculate totals for the past 6 hours. You need to reprocess all affected messages through a corrected version of the service. Which messaging system characteristic makes this possible, and which type of broker provides it?

5. Interview & System Design Cheat Sheet

  • A queue distributes one message to one worker: used for work. A pub/sub channel delivers one message to every subscriber: used for facts.
  • Kafka collapses both into one mental model: partitions + consumer groups. Within a group = queue. Across groups = pub/sub.
  • Always ask: "Will another team need this data later?" If yes, use a log-based broker now, because retrofitting fan-out onto a queue is painful.
  • Ordering requires partition-key control (Kafka) or MessageGroupId (SQS FIFO). Anything else is best-effort.
  • The replay capability of log-based systems is the single biggest reason Kafka displaced traditional message queues in event-driven backends.

Common follow-ups:

  • "Why not just use SNS → SQS fan-out?" Works well for low-throughput notification fan-out without replay needs. Fails when you want a new analytics consumer to backfill the last week of events.
  • "How does Kafka act as a queue?" One consumer group with N consumers shares the topic's partitions. That's a queue. The "queue" persists, which is the main behavioral difference from SQS.
  • "What guarantees does pub/sub give about ordering?" Generally none across subscribers. Within a single partition / message-group, ordering is preserved per consumer.

If asked to design X, anchor on this: Decide for each interaction whether the consumer side is "work to distribute" (queue) or "facts to broadcast" (pub/sub). Then ask whether replay matters. Those two answers pick the broker.
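
As a memory aid, the two questions can be written as a tiny decision function. This is only a sketch of the rule of thumb in this guide; the function name and return strings are made up, and real decisions also weigh cost, throughput, and what your team already operates.

```python
def pick_broker(fan_out: bool, needs_replay: bool) -> str:
    """Rule-of-thumb broker choice from the two questions above (illustrative only)."""
    if needs_replay:
        # Replay means the broker must retain messages after they are consumed.
        return "log-based broker (Kafka, Pulsar, Kinesis)"
    if fan_out:
        # Facts to broadcast, no history needed.
        return "pub/sub (for example SNS -> SQS fan-out)"
    # Work to distribute, each job done once.
    return "queue (SQS, RabbitMQ)"


thumbnail_jobs = pick_broker(fan_out=False, needs_replay=False)
order_events = pick_broker(fan_out=True, needs_replay=True)
notifications = pick_broker(fan_out=True, needs_replay=False)
```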

QUICK CHECK

Your team uses an SQS queue to publish order-placement events. A new analytics team now wants to consume those same events and backfill the last week of data to bootstrap their pipeline. What is the core problem with the current setup, and what should you have used instead?
