Graphs and Representations

5 min read

Reading Progress0%
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming
Data Structures & Algorithms Index
Tier 1 -- Foundations
Tier 2 -- Patterns
Tier 3 -- Dynamic Programming

Graphs and Representations

1. What It Is and Why It Exists

A graph is a set of vertices (nodes) connected by edges. It exists because an enormous class of problems is fundamentally "things and relationships between them": road maps, social networks, course prerequisites, task dependencies, state machines. Trees and linked lists are just restricted graphs; the moment a problem has arbitrary connections, cycles, or "reachability/shortest path/dependency" language, it is a graph problem and you model it as one.

The representation you choose decides your complexity, so it's the first decision you state. An adjacency list maps each vertex to its neighbors — O(V + E) space, and you iterate a vertex's neighbors in O(degree). An adjacency matrix is a V×V grid where M[u][v] marks an edge — O(V^2) space but O(1) edge existence checks. The "why it works" intuition: real graphs are usually sparse (E ≪ V^2), so an adjacency list wastes no space and makes traversal O(V + E); you only prefer a matrix when the graph is dense or you need constant-time "is there an edge u→v?" checks. Most interview graphs are given as edge lists you convert to an adjacency list first.

QUICK CHECK

You're building a backend service that models a social network where users can follow each other. The network has 10 million users but each user follows an average of only 300 others. You need to frequently traverse a user's list of followers. Which graph representation is most appropriate, and why?

Choose one answer

2. Operations and Complexity

OperationAdjacency ListAdjacency MatrixNotes
SpaceO(V + E)O(V^2)List wins on sparse graphs (the usual case)
Add edgeO(1)O(1)
Check edge u–v existsO(deg(u))O(1)Matrix's one advantage
Iterate neighbors of uO(deg(u))O(V)List wins — only real neighbors
Full traversal (BFS/DFS)O(V + E)O(V^2)List is why traversal is O(V+E)

Most-probed facts: default to an adjacency list; / are O(V + E) on a list (each vertex and edge touched once); on an unweighted graph gives the shortest path (fewest edges) because vertices leave the in non-decreasing distance order; a visited set is mandatory or cycles loop forever. Directed vs undirected and weighted vs unweighted change the build and the algorithm — clarify both before coding.

QUICK CHECK

You are building a social network backend where users can have millions of connections, but the average user follows only a few hundred people out of the millions on the platform. You need to run BFS to find the shortest follow-path between two users. Which graph representation should you choose, and what will the BFS time complexity be?

Choose one answer

3. Implementation

Build an adjacency list from an edge list, then the two canonical traversals:

from collections import deque, defaultdict

def build_graph(n, edges, directed=False):       # edges: list of (u, v)
    g = defaultdict(list)
    for u, v in edges:
        g[u].append(v)
        if not directed:                          # undirected => add both ways
            g[v].append(u)
    return g                                       # O(V + E) space

def bfs(g, start):                                # O(V + E); shortest path on UNWEIGHTED graph
    visited = {start}                              # mark on ENQUEUE, not on dequeue
    q = deque([start])
    order = []
    while q:
        node = q.popleft()
        order.append(node)
        for nxt in g[node]:
            if nxt not in visited:                 # visited set => no infinite loop on cycles
                visited.add(nxt)
                q.append(nxt)
    return order

def _dfs_visit(g, node, visited, out):            # recursion stack O(V) worst case
    visited.add(node)
    out.append(node)
    for nxt in g[node]:
        if nxt not in visited:
            _dfs_visit(g, nxt, visited, out)

def dfs(g, start):                                # O(V + E)
    visited = set()
    out = []
    _dfs_visit(g, start, visited, out)
    return out

# Adjacency matrix (use only when dense or you need O(1) edge checks):
def build_matrix(n, edges):
    M = [[0] * n for _ in range(n)]                # O(V^2) space
    for u, v in edges:
        M[u][v] = M[v][u] = 1
    return M

In a real interview, don't define a graph class — use collections.defaultdict(list) for the adjacency list and collections.deque for the ; both are the expected idioms. A 2D grid problem is an implicit graph: cells are vertices, the 4 (or 8) neighbors are edges — no explicit graph build needed.

QUICK CHECK

You are implementing BFS to find the shortest path between two users in a social network graph (an unweighted, undirected graph). A colleague suggests marking nodes as visited when they are dequeued rather than when they are enqueued. What is the consequence of this change?

Choose one answer

4. When to Reach for It

  • Language signals: "network," "connections," "dependencies," "prerequisites," "reachable," "shortest path," "islands/regions," "cycle."
  • Grid problems (number of islands, shortest path in a maze, rotting oranges) → treat the grid as an implicit graph, / over neighbor cells.
  • Shortest path, fewest steps, minimum moves on an unweighted graph (it gives the minimum-edge path for free).
  • Connectivity / components / cycle detection / topological order or BFS (or for pure connectivity).
  • Choose matrix only when the graph is dense or you need many O(1) "edge exists?" queries; otherwise adjacency list.
QUICK CHECK

A backend service needs to find the fewest number of API hops between two microservices in a distributed system, where every hop between services has equal cost. Which algorithm and data structure combination is most appropriate for this task?

Choose one answer

5. Common Pitfalls

  • No visited set: traversing a graph with a cycle without marking visited loops forever — the single most common graph bug.
  • Marking visited on dequeue instead of enqueue (): the same node gets enqueued multiple times → blows up time/space and can break shortest-path correctness. Mark when you add to the .
  • Forgetting undirected edges go both ways: adding only g[u].append(v) for an undirected graph silently loses half the connections.
  • Using for shortest path on a weighted graph: BFS only minimizes edge count; weighted graphs need (non-negative) or Bellman-Ford.
  • Defaulting to an adjacency matrix: O(V^2) space TLEs/MLEs on large sparse graphs — list is the default.
  • depth: O(V) on a long path can hit Python's limit — mention an explicit for large inputs.
  • Disconnected graph: one BFS/ from a single source misses other components — loop over all unvisited vertices when the whole graph matters.
QUICK CHECK

A backend engineer implements BFS to find the shortest path between two nodes in a service dependency graph. The graph has weighted edges representing latency values (e.g., 50ms, 200ms). After testing, the engineer notices the reported 'shortest path' often has high total latency despite having few hops. What is the most likely cause?

Choose one answer

6. Interview Cheat Sheet

  • "I default to an adjacency list — O(V+E) space — and only use a matrix for dense graphs or O(1) edge-existence checks."
  • " and are both O(V+E); gives the shortest path on an unweighted graph because nodes dequeue in non-decreasing distance order."
  • "A visited set is mandatory to handle cycles, and in BFS I mark visited on enqueue, not dequeue."
  • "A 2D grid is an implicit graph: cells are vertices and the 4 neighbors are edges — no explicit build needed."

Follow-ups:

  • "Adjacency list vs matrix?" — List: O(V+E) space, O(deg) neighbor iteration — best for sparse graphs. Matrix: O(V^2) space, O(1) edge check — only for dense graphs.
  • "BFS vs for shortest path?"BFS gives the fewest-edges path on an unweighted graph. DFS does not (it finds a path, not the shortest). Weighted → /Bellman-Ford.
  • "Detect a cycle?" — Undirected: DFS and find an already-visited node that isn't the parent (or ). Directed: DFS with a /"in-progress" set — a back edge to a gray node is a cycle.