Updated Jun 19, 2026

Designing for Scale (Load Balancing & Statelessness)

Your app works. Then it works too well — traffic climbs, the one server you've been running starts to sweat, response times creep up, and you get the message every engineer eventually gets: "it needs to handle more load." The panic move is to buy a bigger box and hope. That buys you a little time and teaches you nothing, and one day there is no bigger box to buy.

This guide is about the calm alternative: designing a system that grows by adding machines instead of replacing them — and the one property that makes that possible. Almost all of scaling comes down to a single idea, and once you see it, the architecture diagrams stop looking like magic. The idea is this: if any server can handle any request, you can add servers freely. Everything else here — load balancers, stateless services, shared session stores — is in service of that one sentence.
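To make that sentence concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a real web framework: the class names (SharedStore, StatefulServer, StatelessServer) and the shopping-cart example are hypothetical, and the in-memory SharedStore merely stands in for whatever external store every server could reach.

```python
class SharedStore:
    """Hypothetical stand-in for an external store that every server can reach."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


class StatefulServer:
    """Keeps each user's cart in its own memory, so only this server knows it."""

    def __init__(self):
        self.carts = {}

    def add_item(self, user, item):
        self.carts.setdefault(user, []).append(item)
        return list(self.carts[user])


class StatelessServer:
    """Keeps nothing between requests; the cart lives in the shared store."""

    def __init__(self, store):
        self.store = store

    def add_item(self, user, item):
        cart = self.store.get(user, []) + [item]
        self.store.set(user, cart)
        return cart


# The same user sends two requests that land on two different servers.
a, b = StatefulServer(), StatefulServer()
a.add_item("ana", "book")
stateful_result = b.add_item("ana", "pen")    # server b never saw the book

store = SharedStore()
c, d = StatelessServer(store), StatelessServer(store)
c.add_item("ana", "book")
stateless_result = d.add_item("ana", "pen")   # server d sees the whole cart

print(stateful_result)
print(stateless_result)
```

With the stateful pair, the second request only works if it returns to the server that handled the first. With the stateless pair it does not matter which server answers, which is exactly what lets you add more of them.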

How to read this

  • Need the mental model fast? Read Phase 1: Scale Up vs Scale Out, and Why Statelessness Matters. Statelessness is the whole game, and it's explained there first.
  • Want it to finally make sense? Read in order. Statelessness explains why load balancing works, and load balancing raises the question the last phase answers: what about the stateful bits?

The phases

  1. Scale Up vs Scale Out, and Why Statelessness Matters — bigger box (simple, capped) vs more boxes (the real answer for big scale), and the property that unlocks the second one: statelessness.
  2. Load Balancing — spreading requests across many identical servers: what a load balancer actually does, health checks, and the sticky-session trap.
  3. Scaling the Stateful Bits — the parts you can't just clone: sessions (move them to a shared store), the database (the usual bottleneck), and caching to shed load.
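
As a preview of Phase 2, the core of "spreading requests across many identical servers" can be sketched in a few lines of Python. This is a toy under assumptions, not how any particular load balancer is implemented: RoundRobinBalancer, mark_down, and mark_up are hypothetical names, and a real health check would probe servers over the network rather than being told their status.

```python
class RoundRobinBalancer:
    """Toy balancer: hands out servers in rotation, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # Assumed to be called when a health check fails.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Assumed to be called when a health check passes again.
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
before = [lb.pick() for _ in range(4)]

lb.mark_down("app-2")            # pretend app-2 failed its health check
after = [lb.pick() for _ in range(3)]

print(before)
print(after)
```

The rotation only works because the servers are interchangeable. Once one of them holds state the others lack, the balancer has to start remembering who talked to whom, which is the sticky-session trap.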

Deliberately deferred to follow-up guides: scaling the database itself (replication and sharding) lives in Scaling a Database; the mechanics of caching live in Caching, Explained; and what to do when a machine dies, rather than just gets busy, is covered in Designing for Failure. This guide is about handling more load. Those are about handling everything around it.