Graphs and Representations
5 min read
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming
Graphs and Representations
1. What It Is and Why It Exists
A graph is a set of vertices (nodes) connected by edges. It exists because an enormous class of problems is fundamentally "things and relationships between them": road maps, social networks, course prerequisites, task dependencies, state machines. Trees and linked lists are just restricted graphs; the moment a problem has arbitrary connections, cycles, or "reachability/shortest path/dependency" language, it is a graph problem and you model it as one.
The representation you choose decides your complexity, so it's the first decision you state. An adjacency list maps each vertex to its neighbors — O(V + E) space, and you iterate a vertex's neighbors in O(degree). An adjacency matrix is a V×V grid where M[u][v] marks an edge — O(V^2) space but O(1) edge existence checks. The "why it works" intuition: real graphs are usually sparse (E ≪ V^2), so an adjacency list wastes no space and makes traversal O(V + E); you only prefer a matrix when the graph is dense or you need constant-time "is there an edge u→v?" checks. Most interview graphs are given as edge lists you convert to an adjacency list first.
You're building a backend service that models a social network where users can follow each other. The network has 10 million users but each user follows an average of only 300 others. You need to frequently traverse a user's list of followers. Which graph representation is most appropriate, and why?
2. Operations and Complexity
| Operation | Adjacency List | Adjacency Matrix | Notes |
|---|---|---|---|
| Space | O(V + E) | O(V^2) | List wins on sparse graphs (the usual case) |
| Add edge | O(1) | O(1) | — |
Check edge u–v exists | O(deg(u)) | O(1) | Matrix's one advantage |
Iterate neighbors of u | O(deg(u)) | O(V) | List wins — only real neighbors |
| Full traversal (BFS/DFS) | O(V + E) | O(V^2) | List is why traversal is O(V+E) |
Most-probed facts: default to an adjacency list; / are O(V + E) on a list (each vertex and edge touched once); on an unweighted graph gives the shortest path (fewest edges) because vertices leave the in non-decreasing distance order; a visited set is mandatory or cycles loop forever. Directed vs undirected and weighted vs unweighted change the build and the algorithm — clarify both before coding.
You are building a social network backend where users can have millions of connections, but the average user follows only a few hundred people out of the millions on the platform. You need to run BFS to find the shortest follow-path between two users. Which graph representation should you choose, and what will the BFS time complexity be?
3. Implementation
Build an adjacency list from an edge list, then the two canonical traversals:
from collections import deque, defaultdict def build_graph(n, edges, directed=False): # edges: list of (u, v) g = defaultdict(list) for u, v in edges: g[u].append(v) if not directed: # undirected => add both ways g[v].append(u) return g # O(V + E) space def bfs(g, start): # O(V + E); shortest path on UNWEIGHTED graph visited = {start} # mark on ENQUEUE, not on dequeue q = deque([start]) order = [] while q: node = q.popleft() order.append(node) for nxt in g[node]: if nxt not in visited: # visited set => no infinite loop on cycles visited.add(nxt) q.append(nxt) return order def _dfs_visit(g, node, visited, out): # recursion stack O(V) worst case visited.add(node) out.append(node) for nxt in g[node]: if nxt not in visited: _dfs_visit(g, nxt, visited, out) def dfs(g, start): # O(V + E) visited = set() out = [] _dfs_visit(g, start, visited, out) return out # Adjacency matrix (use only when dense or you need O(1) edge checks): def build_matrix(n, edges): M = [[0] * n for _ in range(n)] # O(V^2) space for u, v in edges: M[u][v] = M[v][u] = 1 return M
In a real interview, don't define a graph class — use collections.defaultdict(list) for the adjacency list and collections.deque for the ; both are the expected idioms. A 2D grid problem is an implicit graph: cells are vertices, the 4 (or 8) neighbors are edges — no explicit graph build needed.
You are implementing BFS to find the shortest path between two users in a social network graph (an unweighted, undirected graph). A colleague suggests marking nodes as visited when they are dequeued rather than when they are enqueued. What is the consequence of this change?
4. When to Reach for It
- Language signals: "network," "connections," "dependencies," "prerequisites," "reachable," "shortest path," "islands/regions," "cycle."
- Grid problems (number of islands, shortest path in a maze, rotting oranges) → treat the grid as an implicit graph, / over neighbor cells.
- Shortest path, fewest steps, minimum moves on an unweighted graph → (it gives the minimum-edge path for free).
- Connectivity / components / cycle detection / topological order → or BFS (or for pure connectivity).
- Choose matrix only when the graph is dense or you need many
O(1)"edge exists?" queries; otherwise adjacency list.
A backend service needs to find the fewest number of API hops between two microservices in a distributed system, where every hop between services has equal cost. Which algorithm and data structure combination is most appropriate for this task?
5. Common Pitfalls
- No visited set: traversing a graph with a cycle without marking visited loops forever — the single most common graph bug.
- Marking visited on dequeue instead of enqueue (): the same node gets enqueued multiple times → blows up time/space and can break shortest-path correctness. Mark when you add to the .
- Forgetting undirected edges go both ways: adding only
g[u].append(v)for an undirected graph silently loses half the connections. - Using for shortest path on a weighted graph: BFS only minimizes edge count; weighted graphs need (non-negative) or Bellman-Ford.
- Defaulting to an adjacency matrix:
O(V^2)space TLEs/MLEs on large sparse graphs — list is the default. - depth:
O(V)on a long path can hit Python's limit — mention an explicit for large inputs. - Disconnected graph: one BFS/ from a single source misses other components — loop over all unvisited vertices when the whole graph matters.
A backend engineer implements BFS to find the shortest path between two nodes in a service dependency graph. The graph has weighted edges representing latency values (e.g., 50ms, 200ms). After testing, the engineer notices the reported 'shortest path' often has high total latency despite having few hops. What is the most likely cause?
6. Interview Cheat Sheet
- "I default to an adjacency list —
O(V+E)space — and only use a matrix for dense graphs orO(1)edge-existence checks." - " and are both
O(V+E); gives the shortest path on an unweighted graph because nodes dequeue in non-decreasing distance order." - "A visited set is mandatory to handle cycles, and in BFS I mark visited on enqueue, not dequeue."
- "A 2D grid is an implicit graph: cells are vertices and the 4 neighbors are edges — no explicit build needed."
Follow-ups:
- "Adjacency list vs matrix?" — List:
O(V+E)space,O(deg)neighbor iteration — best for sparse graphs. Matrix:O(V^2)space,O(1)edge check — only for dense graphs. - "BFS vs for shortest path?" — BFS gives the fewest-edges path on an unweighted graph. DFS does not (it finds a path, not the shortest). Weighted → /Bellman-Ford.
- "Detect a cycle?" — Undirected: DFS and find an already-visited node that isn't the parent (or ). Directed: DFS with a /"in-progress" set — a back edge to a gray node is a cycle.
Glossary History
Click dotted jargon to save explanations here.
Glossary History
Click dotted jargon to save explanations here.