Binary Trees and BSTs

5 min read

Reading Progress0%
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming

Binary Trees and BSTs

1. What It Is and Why It Exists

A binary tree is a hierarchy where each node has at most two children (left, right). It exists to model anything naturally branching or recursive — parse trees, decision trees, file systems — and because the recursive shape makes "solve the whole tree = combine solutions of the two subtrees" the default solution strategy. Most binary-tree interview problems are one recursive function over (left subtree, right subtree).

A (BST) adds one invariant: for every node, all keys in its left subtree are smaller and all keys in its right subtree are larger. That ordering is what makes a BST useful: search/insert/delete each follow a single root-to-leaf path, so they cost O(h) where h is the height. The "why it works" intuition: a BST is made dynamic — each comparison discards one subtree, so on a balanced tree h = O(log n) and operations are O(log n). The catch interviewers always probe: inserting sorted data degenerates the BST into a , h = n, and everything becomes O(n) — which is why self-balancing trees (and Python's sortedcontainers) exist.

QUICK CHECK

A backend service uses a binary search tree to index user IDs for fast lookup. During a bulk import, user IDs are inserted in ascending order (1, 2, 3, 4, … 1,000,000). What is the time complexity of looking up a user ID after this import, and why?

Choose one answer

2. Operations and Complexity

OperationAverage (balanced)Worst (skewed)Notes
BST searchO(log n)O(n)One root-to-leaf path
BST insertO(log n)O(n)
BST deleteO(log n)O(n)Two-child case: swap with in-order successor
Traversal (in/pre/post/level order)O(n)O(n)Must visit every node
Find min / max (BST)O(log n)O(n)Leftmost / rightmost node
In-order traversal of a BSTO(n)O(n)Yields keys in sorted order — key fact

Space: O(n) for the tree; traversal uses O(h) (O(log n) balanced, O(n) skewed). Most-probed: all BST operations are O(h), which is O(log n) only if balanced — worst case O(n); and in-order traversal of a BST is sorted, the trick behind validate-BST, kth-smallest, and BST-iterator problems.

QUICK CHECK

A backend service stores user IDs in a Binary Search Tree (BST). A developer needs to implement an API endpoint that returns all user IDs in ascending sorted order. What is the most efficient way to retrieve them in sorted order, and what is its time complexity?

Choose one answer

3. Implementation

A BST with the three operations interviewers ask for, plus the four traversals:

class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def bst_search(root, key):               # O(h): each step discards one subtree
    while root and root.val != key:
        root = root.left if key < root.val else root.right
    return root

def bst_insert(root, key):               # O(h)
    if root is None:
        return TreeNode(key)
    if key < root.val:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root

def bst_delete(root, key):               # O(h)
    if root is None:
        return None
    if key < root.val:
        root.left = bst_delete(root.left, key)
    elif key > root.val:
        root.right = bst_delete(root.right, key)
    else:                                # found it
        if not root.left:  return root.right    # 0/1 child
        if not root.right: return root.left
        succ = root.right                # 2 children: in-order successor
        while succ.left:
            succ = succ.left
        root.val = succ.val
        root.right = bst_delete(root.right, succ.val)
    return root

def inorder(root, out):                  # BST in-order => SORTED keys
    if root:
        inorder(root.left, out)
        out.append(root.val)
        inorder(root.right, out)

# Level-order (BFS) needs a queue, not recursion:
from collections import deque
def level_order(root):
    res, q = [], deque([root] if root else [])
    while q:
        level = []
        for _ in range(len(q)):          # snapshot size = one full level
            n = q.popleft()
            level.append(n.val)
            if n.left:  q.append(n.left)
            if n.right: q.append(n.right)
        res.append(level)
    return res

There is no idiomatic stdlib BST in Python — implement TreeNode yourself (interview problems hand you one). When you genuinely need an ordered map/multiset with guaranteed O(log n) ops, the real-world reach is the third-party sortedcontainers.SortedList/SortedDict; in interviews, state that and proceed with the node class.

4. When to Reach for It

  • Input is given as a tree (TreeNode root) — traversal, depth, path, LCA, validate, serialize problems.
  • The phrase "in-order / sorted order" appears with a BST → in-order traversal yields sorted keys (kth-smallest, validate-BST, BST iterator).
  • You need a dynamic ordered set supporting insert/delete and min/max/predecessor/successor/range — a balanced BST (or SortedList); a can't do ordering.
  • Recursive "combine left and right subtree answers" structure (height, diameter, path sums, tree DP).
  • keyword "level" → level-order traversal with a .
QUICK CHECK

You are building a leaderboard service that needs to support three operations efficiently: insert a new score, delete an existing score, and retrieve the 10th-lowest score at any time. A teammate suggests using a hash table for O(1) inserts and deletes. Why is a hash table insufficient here, and what data structure would be more appropriate?

Choose one answer

5. Common Pitfalls

  • Assuming O(log n): BST ops are O(h); a skewed tree (sorted inserts) makes them O(n). Always state "O(h), O(log n) if balanced".
  • Validating a BST by checking only direct children: the property is global. Pass down (low, high) bounds, or check that the in-order traversal is strictly increasing.
  • BST delete with two children done wrong: must replace with the in-order successor (smallest in right subtree) or predecessor, then delete that node — not just splice.
  • depth: O(h) ; a deep/skewed tree can hit Python's limit — mention an explicit as the iterative alternative.
  • Empty tree / single node: every traversal and recursion must handle root is None as the base case.
  • Confusing complete/full/perfect/balanced "binary tree" terms — clarify which the problem means before assuming h = log n.
QUICK CHECK

A backend engineer builds a BST-based index for a database table and inserts records in ascending order of their primary key. After insertion, search queries are noticeably slow. What is the most accurate explanation for the performance degradation?

Choose one answer

6. Interview Cheat Sheet

  • "BST search/insert/delete are O(h)O(log n) if balanced, O(n) if skewed by sorted inserts."
  • "In-order traversal of a BST yields keys in sorted order — that's how I validate a BST or get the kth smallest."
  • "Most binary-tree problems are one : solve left, solve right, combine — base case root is None."
  • "Level-order uses a with a per-level size snapshot; the other three traversals are (or an explicit to avoid depth limits)."

Follow-ups:

  • "How do you keep a BST balanced?" — Self-balancing variants (AVL, red-black) rotate on insert/delete to keep h = O(log n); in Python interviews you'd cite sortedcontainers rather than code a red-black tree.
  • "Validate a BST." — Recurse with (low, high) bounds, tightening them as you descend; or assert the in-order traversal is strictly increasing. O(n).
  • "Lowest common ancestor in a BST?" — Walk from the root: both targets smaller → go left, both larger → go right, otherwise this node is the split point and the LCA. O(h).