Binary Trees and BSTs
5 min read
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming
Binary Trees and BSTs
1. What It Is and Why It Exists
A binary tree is a hierarchy where each node has at most two children (left, right). It exists to model anything naturally branching or recursive — parse trees, decision trees, file systems — and because the recursive shape makes "solve the whole tree = combine solutions of the two subtrees" the default solution strategy. Most binary-tree interview problems are one recursive function over (left subtree, right subtree).
A (BST) adds one invariant: for every node, all keys in its left subtree are smaller and all keys in its right subtree are larger. That ordering is what makes a BST useful: search/insert/delete each follow a single root-to-leaf path, so they cost O(h) where h is the height. The "why it works" intuition: a BST is made dynamic — each comparison discards one subtree, so on a balanced tree h = O(log n) and operations are O(log n). The catch interviewers always probe: inserting sorted data degenerates the BST into a , h = n, and everything becomes O(n) — which is why self-balancing trees (and Python's sortedcontainers) exist.
A backend service uses a binary search tree to index user IDs for fast lookup. During a bulk import, user IDs are inserted in ascending order (1, 2, 3, 4, … 1,000,000). What is the time complexity of looking up a user ID after this import, and why?
2. Operations and Complexity
| Operation | Average (balanced) | Worst (skewed) | Notes |
|---|---|---|---|
| BST search | O(log n) | O(n) | One root-to-leaf path |
| BST insert | O(log n) | O(n) | — |
| BST delete | O(log n) | O(n) | Two-child case: swap with in-order successor |
| Traversal (in/pre/post/level order) | O(n) | O(n) | Must visit every node |
| Find min / max (BST) | O(log n) | O(n) | Leftmost / rightmost node |
| In-order traversal of a BST | O(n) | O(n) | Yields keys in sorted order — key fact |
Space: O(n) for the tree; traversal uses O(h) (O(log n) balanced, O(n) skewed). Most-probed: all BST operations are O(h), which is O(log n) only if balanced — worst case O(n); and in-order traversal of a BST is sorted, the trick behind validate-BST, kth-smallest, and BST-iterator problems.
A backend service stores user IDs in a Binary Search Tree (BST). A developer needs to implement an API endpoint that returns all user IDs in ascending sorted order. What is the most efficient way to retrieve them in sorted order, and what is its time complexity?
3. Implementation
A BST with the three operations interviewers ask for, plus the four traversals:
class TreeNode: def __init__(self, val): self.val = val self.left = None self.right = None def bst_search(root, key): # O(h): each step discards one subtree while root and root.val != key: root = root.left if key < root.val else root.right return root def bst_insert(root, key): # O(h) if root is None: return TreeNode(key) if key < root.val: root.left = bst_insert(root.left, key) else: root.right = bst_insert(root.right, key) return root def bst_delete(root, key): # O(h) if root is None: return None if key < root.val: root.left = bst_delete(root.left, key) elif key > root.val: root.right = bst_delete(root.right, key) else: # found it if not root.left: return root.right # 0/1 child if not root.right: return root.left succ = root.right # 2 children: in-order successor while succ.left: succ = succ.left root.val = succ.val root.right = bst_delete(root.right, succ.val) return root def inorder(root, out): # BST in-order => SORTED keys if root: inorder(root.left, out) out.append(root.val) inorder(root.right, out) # Level-order (BFS) needs a queue, not recursion: from collections import deque def level_order(root): res, q = [], deque([root] if root else []) while q: level = [] for _ in range(len(q)): # snapshot size = one full level n = q.popleft() level.append(n.val) if n.left: q.append(n.left) if n.right: q.append(n.right) res.append(level) return res
There is no idiomatic stdlib BST in Python — implement TreeNode yourself (interview problems hand you one). When you genuinely need an ordered map/multiset with guaranteed O(log n) ops, the real-world reach is the third-party sortedcontainers.SortedList/SortedDict; in interviews, state that and proceed with the node class.
4. When to Reach for It
- Input is given as a tree (
TreeNode root) — traversal, depth, path, LCA, validate, serialize problems. - The phrase "in-order / sorted order" appears with a BST → in-order traversal yields sorted keys (kth-smallest, validate-BST, BST iterator).
- You need a dynamic ordered set supporting insert/delete and min/max/predecessor/successor/range — a balanced BST (or
SortedList); a can't do ordering. - Recursive "combine left and right subtree answers" structure (height, diameter, path sums, tree DP).
- keyword "level" → level-order traversal with a .
You are building a leaderboard service that needs to support three operations efficiently: insert a new score, delete an existing score, and retrieve the 10th-lowest score at any time. A teammate suggests using a hash table for O(1) inserts and deletes. Why is a hash table insufficient here, and what data structure would be more appropriate?
5. Common Pitfalls
- Assuming
O(log n): BST ops areO(h); a skewed tree (sorted inserts) makes themO(n). Always state "O(h),O(log n)if balanced". - Validating a BST by checking only direct children: the property is global. Pass down
(low, high)bounds, or check that the in-order traversal is strictly increasing. - BST delete with two children done wrong: must replace with the in-order successor (smallest in right subtree) or predecessor, then delete that node — not just splice.
- depth:
O(h); a deep/skewed tree can hit Python's limit — mention an explicit as the iterative alternative. - Empty tree / single node: every traversal and recursion must handle
root is Noneas the base case. - Confusing complete/full/perfect/balanced "binary tree" terms — clarify which the problem means before assuming
h = log n.
A backend engineer builds a BST-based index for a database table and inserts records in ascending order of their primary key. After insertion, search queries are noticeably slow. What is the most accurate explanation for the performance degradation?
6. Interview Cheat Sheet
- "BST search/insert/delete are
O(h)—O(log n)if balanced,O(n)if skewed by sorted inserts." - "In-order traversal of a BST yields keys in sorted order — that's how I validate a BST or get the kth smallest."
- "Most binary-tree problems are one : solve left, solve right, combine — base case
root is None." - "Level-order uses a with a per-level size snapshot; the other three traversals are (or an explicit to avoid depth limits)."
Follow-ups:
- "How do you keep a BST balanced?" — Self-balancing variants (AVL, red-black) rotate on insert/delete to keep
h = O(log n); in Python interviews you'd citesortedcontainersrather than code a red-black tree. - "Validate a BST." — Recurse with
(low, high)bounds, tightening them as you descend; or assert the in-order traversal is strictly increasing.O(n). - "Lowest common ancestor in a BST?" — Walk from the root: both targets smaller → go left, both larger → go right, otherwise this node is the split point and the LCA.
O(h).
Glossary History
Click dotted jargon to save explanations here.
Glossary History
Click dotted jargon to save explanations here.