Heaps and Priority Queues

5 min read

Reading Progress0%
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming

Heaps and Priority Queues

1. What It Is and Why It Exists

A binary is a complete binary tree (stored compactly in an array) maintaining the property: in a min-heap every parent is ≤ its children, so the minimum is always at the root. A is the abstract "always give me the most extreme element next" structure; a heap is its standard O(log n) implementation. It exists because many problems need repeated access to the current min/max of a changing set: keeping the data fully sorted costs O(n log n) and re-sorting on every change is wasteful, while a heap gives you the extreme in O(1) and insert/extract in O(log n).

The "why it works" intuition: a heap maintains only a partial order — just "parent beats child" — which is all you need when you only ever want the extreme, not a full sort. That weaker invariant is exactly why push/pop are O(log n) (one root-to-leaf sift) instead of O(n). The classic interview leverage: top-K, k-way merge, running median, and all reduce to "repeatedly pull the smallest/largest from a live set."

QUICK CHECK

A backend service maintains a live leaderboard of user scores and must frequently add new scores and retrieve the current highest score. A developer considers two approaches: (A) keep scores in a fully sorted array, re-sorting after every insertion, or (B) use a max-heap. Which statement best explains why the heap is the better choice here?

Choose one answer

2. Operations and Complexity

OperationAverageWorstNotes
Peek min/max (heap[0])O(1)O(1)The defining strength
PushO(log n)O(log n)Sift up
Pop (extract extreme)O(log n)O(log n)Move last to root, sift down
heapreplace / heappushpopO(log n)O(log n)One sift instead of two
Build heap from n items (heapify)O(n)O(n)Bottom-up — not O(n log n)
Search for arbitrary elementO(n)O(n)A heap is not searchable/sorted
Delete arbitrary / update keyO(n) (or O(log n) with index map)O(n)Lazy deletion is the usual interview trick

Space O(n). Most-probed: heapify is O(n), not O(n log n) (bottom-up sift; the proof is a converging series — state the result, not the proof); a gives the extreme in O(1) but arbitrary search is O(n) (it is only partially ordered); heapq is min- only — for a max-heap push negatives (or (-key, item) tuples).

QUICK CHECK

A backend service needs to initialize a priority queue from a list of 1,000,000 existing task priorities loaded from a database at startup. A developer argues that building the heap from all items at once will take O(n log n) time — the same as sorting. Is the developer correct, and what is the actual complexity?

Choose one answer

3. Implementation

A binary min- on an array, with the two sift routines that define it:

class MinHeap:
    def __init__(self):
        self._h = []                      # complete tree as array: kids of i are 2i+1, 2i+2

    def push(self, x):                    # O(log n): place at end, sift up
        self._h.append(x)
        i = len(self._h) - 1
        while i > 0 and self._h[(i - 1) // 2] > self._h[i]:
            par = (i - 1) // 2
            self._h[i], self._h[par] = self._h[par], self._h[i]
            i = par

    def pop(self):                        # O(log n): root is the min
        last = self._h.pop()
        if not self._h:
            return last
        top, self._h[0] = self._h[0], last
        i, n = 0, len(self._h)
        while True:                       # sift down to the smaller child
            l, r, smallest = 2*i + 1, 2*i + 2, i
            if l < n and self._h[l] < self._h[smallest]: smallest = l
            if r < n and self._h[r] < self._h[smallest]: smallest = r
            if smallest == i:
                break
            self._h[i], self._h[smallest] = self._h[smallest], self._h[i]
            i = smallest
        return top

In a real interview, use heapq on a plain list — never hand-roll the above unless asked:

import heapq
h = []
heapq.heappush(h, 3); heapq.heappush(h, 1)
heapq.heappop(h)                 # -> 1   (smallest)
heapq.heapify(arr)               # O(n) build in place
heapq.nlargest(k, arr)           # top-K convenience
# Max-heap: push -x and negate on pop, or push (-priority, item) tuples.
QUICK CHECK

Your backend service needs a max-heap to always serve the highest-priority job first. You're using Python's heapq module, which only provides a min-heap. What is the standard approach to simulate a max-heap with heapq?

Choose one answer

4. When to Reach for It

  • "Top K", "K largest/smallest", "K closest" → a of size k: O(n log k) instead of O(n log n) from sorting.
  • "K-th largest/smallest" in a stream or static array → size-k .
  • Merge K sorted lists/streams → push one element per list, pop-and-refill (O(N log k)).
  • "Running / sliding median" → two heaps (max-heap of low half, min-heap of high half).
  • / Prim keyed by current best distance/weight.
  • Schedule by priority / repeatedly take the current extreme of a changing set.
QUICK CHECK

A backend service processes a stream of user events and must always report the current median response latency after each new event arrives. Which data structure approach handles this efficiently?

Choose one answer

5. Common Pitfalls

  • Wanting a max- with heapq: it's min-only. Negate values, or push (-key, item) tuples — and remember to negate back on pop.
  • Tuple ties throwing TypeError: (priority, item) comparison falls through to item when priorities tie; if item isn't orderable it crashes. Add a tie-breaker: (priority, counter, item).
  • Thinking heapify is O(n log n): bottom-up build is O(n). Stating it wrong is a classic miss.
  • Searching/deleting an arbitrary element: that's O(n) — a isn't sorted. Use lazy deletion (mark stale, skip on pop) or an index map.
  • Heap vs sorted: a heap gives you the extreme cheaply but iterating it in order still costs O(n log n) — it is not a sorted container.
  • Wrong-size top-K heap: for the k largest, keep a min-heap of size k (pop the smallest when it exceeds k) — using a max-heap of all n defeats the point.
QUICK CHECK

A backend service needs to efficiently track the top 10 most-requested API endpoints out of 1 million total endpoints. A developer proposes maintaining a max-heap containing all 1 million entries, then popping the top 10 at query time. What is the main problem with this approach, and what is the better alternative?

Choose one answer

6. Interview Cheat Sheet

  • "A keeps only a partial order — parent beats child — so I get the extreme in O(1) and push/pop in O(log n); full sort isn't needed."
  • "heapify builds in O(n) bottom-up, not O(n log n)."
  • "For top-K I keep a size-k min-O(n log k), better than sorting's O(n log n)."
  • "heapq is min-only; for a max-heap I push negatives or (-key, item) tuples, with a counter tie-breaker to avoid comparing unorderable items."

Follow-ups:

  • "Top K vs sorting?"Heap of size k: O(n log k) time, O(k) space; sorting is O(n log n). Quickselect can get the kth element in O(n) average but doesn't keep them ordered.
  • "Find the running median." — Two heaps: a max-heap for the lower half and a min-heap for the upper half, rebalanced so sizes differ by ≤1; median is a root (or the average of both roots). O(log n) per insert.
  • "Decrease-key for ?"heapq has no decrease-key; push the updated (dist, node) and skip stale pairs when popped (lazy deletion) — overall still O(E log V).