Graphs That Run the World

Q: PageRank treats the web as what kind of graph?

A directed graph where edges are links from one page to another

Q: Why does a package manager need to detect cycles in the dependency graph?

A cycle means A depends on B, B depends on C, and C depends on A, so nothing can be installed

The pattern you already know

In Phase 1 you learned that a graph is nodes and edges. In Phase 2 you learned to walk through graphs, find shortest paths, and detect cycles. Now you are going to see those same nodes and edges running the world.

The pattern is always the same:

Identify the things (people, web pages, packages, routers).
Identify the connections between them (follows, links, dependencies, cables).
Draw it as a graph.
Run graph algorithms on it to find something useful.

That is Google search. That is your package manager resolving dependencies. That is the internet routing your email. That is the social network suggesting who to follow.

PageRank: the graph that made Google

In the late 1990s, the web was a mess of pages linking to other pages. The question was: which pages are important?

Larry Page and Sergey Brin realized that a link from one page to another is a vote. But not all votes are equal. A vote from an important page should count more than a vote from an unimportant one.

So they built a graph. Each web page is a node. Each link is a directed edge from the linking page to the linked page. Then they assigned an importance score to each node and let the scores flow along the edges.

The rule is simple: a page's importance is the sum of the importance of all pages that link to it, divided by the number of links on each of those pages. A link from a page with high importance and few outgoing links is worth more than a link from a page with low importance and many outgoing links.

Run that rule over and over, like water flowing through pipes, and the scores settle down. The result is PageRank. The math is graph theory: a directed graph of links, and an iterative transformation that redistributes importance along the edges.

You do not need to implement PageRank to use the insight. Every time you search for something and the "right" answer appears near the top, you are seeing graph theory at work.

Social networks: the graph of who knows whom

Your social network is a graph. You are a node. Your friends are nodes connected to you by edges. Their friends are nodes connected to them. The whole structure stretches out to billions of nodes and trillions of edges.

When a social network suggests "people you may know," it is running a graph algorithm. The most common approach is something like: "you have 12 friends in common, and none of them are connected to each other." That is a graph pattern: two nodes with many common neighbors and no direct edge.

When a feed algorithm decides what to show you, it is traversing the graph of your interactions. The posts from nodes you interact with most heavily are weighted more. The graph structure determines what you see.

Package managers: the dependency graph

When you run npm install or pip install, the tool is solving a graph problem. Each package is a node. Each dependency is a directed edge: "this package needs that package." The tool must find an ordering of the nodes where every edge points forward - a topological sort.

If the graph has a cycle, the tool cannot proceed. Package A needs B, B needs C, and C needs A. Nothing can be installed because everything is waiting for something else. The error message "circular dependency detected" is a graph algorithm telling you that the dependency graph is not a valid partial order.

This is why lock files exist. They record the exact version of every package that was chosen when the graph was solved. If you delete the lock file and reinstall, you might get a different solution, because the graph has multiple valid orderings.

Network routing: the graph of the internet

The internet is a graph. The nodes are routers. The edges are the cables and wireless links between them. When you send a packet from your computer to a server, the packet travels through a sequence of routers. The path it takes is determined by routing protocols that run graph algorithms in real time.

OSPF (Open Shortest Path First) is a protocol where each router knows the weight of its own edges (usually based on speed or cost) and shares that information with its neighbors. Each router then runs Dijkstra's algorithm to compute the shortest path to every destination. The result is a routing table: for every possible destination, the next hop to send the packet toward.

This happens millions of times per second across the globe. The internet works because every router is running a graph algorithm, all the time.

For builders

This is the part where graph theory stops being abstract and starts being the reason your software works.

Dependency resolution - Every build system, package manager, and container orchestrator solves a graph problem. Understanding the graph structure helps you debug "why is this taking so long" and "why did this cycle appear."
Social features - "Friends of friends," "suggested connections," "community detection" are all graph algorithms. If you are building any feature that involves relationships between users, you are building a graph feature.
Data pipelines - A data pipeline with branching and merging is a directed acyclic graph. Tools like Airflow and Prefect let you define pipelines as DAGs (directed acyclic graphs) and execute them in the correct order.
Recommendation systems - Collaborative filtering often builds a bipartite graph of users and items, then uses graph algorithms to find similar nodes.
Infrastructure - Your network topology, your service mesh, your deployment pipeline: all graphs. Understanding the graph structure helps you reason about failure modes, bottlenecks, and blast radius.

The key insight: any time you have things and relationships between them, you have a graph. The moment you can draw it as dots and lines, you can apply graph algorithms to find shortest paths, detect cycles, rank nodes, or order tasks.

What we have built

A graph is nodes connected by edges, representing relationships between things.
BFS explores level by level and finds the shortest path in an unweighted graph.
DFS follows one path as far as it can go, then backtracks.
Dijkstra's algorithm finds the shortest weighted path by always expanding the closest node.
Cycle detection finds loops in a graph, which is essential for dependency validation.
Topological sort orders nodes so that every edge points forward, or reports that no such order exists.
PageRank uses a directed graph of links and iterative importance flow to rank web pages.
Social networks use graph traversal to suggest connections and rank content.
Package managers use topological sort to resolve dependencies and detect cycles.
Network routing uses Dijkstra's algorithm to compute shortest paths across the internet.

You started this guide with a simple question: "what is connected to what?" You ended with the algorithms that run Google, npm, the internet, and your social feed. The same dots and lines you drew on a whiteboard in Phase 1 are the structure behind most of the digital world.

A quick check before you go:

[
  {
    "q": "PageRank treats the web as what kind of graph?",
    "choices": ["An undirected graph of web pages", "A directed graph where edges are links from one page to another", "A weighted graph where weights are page sizes", "A tree structure of categories"],
    "answer": 1,
    "explain": "PageRank models the web as a directed graph. Each web page is a node, and each hyperlink is a directed edge from the linking page to the linked page. Importance flows along these directed edges."
  },
  {
    "q": "Why does a package manager need to detect cycles in the dependency graph?",
    "choices": ["Cycles make the graph look messy", "A cycle means A depends on B, B depends on C, and C depends on A, so nothing can be installed", "Cycles slow down the installation", "Cycles are not a problem; they are ignored"],
    "answer": 1,
    "explain": "A circular dependency means every package in the cycle is waiting for another package in the same cycle. Nothing can be resolved until the cycle is broken, so the package manager reports an error."
  },
  {
    "q": "What algorithm do most internet routers use to compute the shortest path to every destination?",
    "choices": ["BFS", "DFS", "Dijkstra's algorithm", "Topological sort"],
    "answer": 2,
    "explain": "Routers run Dijkstra's algorithm (or a variant) on the network graph, where edge weights represent cost, speed, or delay. The result is a routing table that tells each router where to send packets next."
  }
]

← Phase 2: Finding the Shortest Path and Detecting Cycles · Guide overview

Check your understanding 3 questions

1. PageRank treats the web as what kind of graph?

2. Why does a package manager need to detect cycles in the dependency graph?

3. What algorithm do most internet routers use to compute the shortest path to every destination?