Heaps and Priority Queues
5 min read
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming
Heaps and Priority Queues
1. What It Is and Why It Exists
A binary is a complete binary tree (stored compactly in an array) maintaining the property: in a min-heap every parent is ≤ its children, so the minimum is always at the root. A is the abstract "always give me the most extreme element next" structure; a heap is its standard O(log n) implementation. It exists because many problems need repeated access to the current min/max of a changing set: keeping the data fully sorted costs O(n log n) and re-sorting on every change is wasteful, while a heap gives you the extreme in O(1) and insert/extract in O(log n).
The "why it works" intuition: a heap maintains only a partial order — just "parent beats child" — which is all you need when you only ever want the extreme, not a full sort. That weaker invariant is exactly why push/pop are O(log n) (one root-to-leaf sift) instead of O(n). The classic interview leverage: top-K, k-way merge, running median, and all reduce to "repeatedly pull the smallest/largest from a live set."
A backend service maintains a live leaderboard of user scores and must frequently add new scores and retrieve the current highest score. A developer considers two approaches: (A) keep scores in a fully sorted array, re-sorting after every insertion, or (B) use a max-heap. Which statement best explains why the heap is the better choice here?
2. Operations and Complexity
| Operation | Average | Worst | Notes |
|---|---|---|---|
Peek min/max (heap[0]) | O(1) | O(1) | The defining strength |
| Push | O(log n) | O(log n) | Sift up |
| Pop (extract extreme) | O(log n) | O(log n) | Move last to root, sift down |
heapreplace / heappushpop | O(log n) | O(log n) | One sift instead of two |
Build heap from n items (heapify) | O(n) | O(n) | Bottom-up — not O(n log n) |
| Search for arbitrary element | O(n) | O(n) | A heap is not searchable/sorted |
| Delete arbitrary / update key | O(n) (or O(log n) with index map) | O(n) | Lazy deletion is the usual interview trick |
Space O(n). Most-probed: heapify is O(n), not O(n log n) (bottom-up sift; the proof is a converging series — state the result, not the proof); a gives the extreme in O(1) but arbitrary search is O(n) (it is only partially ordered); heapq is min- only — for a max-heap push negatives (or (-key, item) tuples).
A backend service needs to initialize a priority queue from a list of 1,000,000 existing task priorities loaded from a database at startup. A developer argues that building the heap from all items at once will take O(n log n) time — the same as sorting. Is the developer correct, and what is the actual complexity?
3. Implementation
A binary min- on an array, with the two sift routines that define it:
class MinHeap: def __init__(self): self._h = [] # complete tree as array: kids of i are 2i+1, 2i+2 def push(self, x): # O(log n): place at end, sift up self._h.append(x) i = len(self._h) - 1 while i > 0 and self._h[(i - 1) // 2] > self._h[i]: par = (i - 1) // 2 self._h[i], self._h[par] = self._h[par], self._h[i] i = par def pop(self): # O(log n): root is the min last = self._h.pop() if not self._h: return last top, self._h[0] = self._h[0], last i, n = 0, len(self._h) while True: # sift down to the smaller child l, r, smallest = 2*i + 1, 2*i + 2, i if l < n and self._h[l] < self._h[smallest]: smallest = l if r < n and self._h[r] < self._h[smallest]: smallest = r if smallest == i: break self._h[i], self._h[smallest] = self._h[smallest], self._h[i] i = smallest return top
In a real interview, use heapq on a plain list — never hand-roll the above unless asked:
import heapq h = [] heapq.heappush(h, 3); heapq.heappush(h, 1) heapq.heappop(h) # -> 1 (smallest) heapq.heapify(arr) # O(n) build in place heapq.nlargest(k, arr) # top-K convenience # Max-heap: push -x and negate on pop, or push (-priority, item) tuples.
Your backend service needs a max-heap to always serve the highest-priority job first. You're using Python's heapq module, which only provides a min-heap. What is the standard approach to simulate a max-heap with heapq?
4. When to Reach for It
- "Top K", "K largest/smallest", "K closest" → a of size
k:O(n log k)instead ofO(n log n)from sorting. - "K-th largest/smallest" in a stream or static array → size-
k. - Merge K sorted lists/streams → push one element per list, pop-and-refill (
O(N log k)). - "Running / sliding median" → two heaps (max-heap of low half, min-heap of high half).
- / Prim → keyed by current best distance/weight.
- Schedule by priority / repeatedly take the current extreme of a changing set.
A backend service processes a stream of user events and must always report the current median response latency after each new event arrives. Which data structure approach handles this efficiently?
5. Common Pitfalls
- Wanting a max- with
heapq: it's min-only. Negate values, or push(-key, item)tuples — and remember to negate back on pop. - Tuple ties throwing
TypeError:(priority, item)comparison falls through toitemwhen priorities tie; ifitemisn't orderable it crashes. Add a tie-breaker:(priority, counter, item). - Thinking
heapifyisO(n log n): bottom-up build isO(n). Stating it wrong is a classic miss. - Searching/deleting an arbitrary element: that's
O(n)— a isn't sorted. Use lazy deletion (mark stale, skip on pop) or an index map. - Heap vs sorted: a heap gives you the extreme cheaply but iterating it in order still costs
O(n log n)— it is not a sorted container. - Wrong-size top-K heap: for the k largest, keep a min-heap of size
k(pop the smallest when it exceedsk) — using a max-heap of allndefeats the point.
A backend service needs to efficiently track the top 10 most-requested API endpoints out of 1 million total endpoints. A developer proposes maintaining a max-heap containing all 1 million entries, then popping the top 10 at query time. What is the main problem with this approach, and what is the better alternative?
6. Interview Cheat Sheet
- "A keeps only a partial order — parent beats child — so I get the extreme in
O(1)and push/pop inO(log n); full sort isn't needed." - "
heapifybuilds inO(n)bottom-up, notO(n log n)." - "For top-K I keep a size-
kmin- →O(n log k), better than sorting'sO(n log n)." - "
heapqis min-only; for a max-heap I push negatives or(-key, item)tuples, with a counter tie-breaker to avoid comparing unorderable items."
Follow-ups:
- "Top K vs sorting?" — Heap of size
k:O(n log k)time,O(k)space; sorting isO(n log n). Quickselect can get the kth element inO(n)average but doesn't keep them ordered. - "Find the running median." — Two heaps: a max-heap for the lower half and a min-heap for the upper half, rebalanced so sizes differ by ≤1; median is a root (or the average of both roots).
O(log n)per insert. - "Decrease-key for ?" —
heapqhas no decrease-key; push the updated(dist, node)and skip stale pairs when popped (lazy deletion) — overall stillO(E log V).
Glossary History
Click dotted jargon to save explanations here.
Glossary History
Click dotted jargon to save explanations here.