Auto-Scaling, Explained
You provision servers for the traffic you expect, and then real traffic shows up and does whatever it wants: a product launch, a viral post, a Monday-morning login rush, a 3am lull where almost nobody's around. Buy for the busiest moment and you're paying for idle machines the other 20 hours a day. Buy for the average and the site falls over when the busy moment arrives. Auto-scaling is the answer to that exact bind: capacity that grows and shrinks with actual demand instead of a guess made once and left alone. This guide covers why you'd want it, how it decides to act, and the sharp edges that show up the first time it kicks in for real.
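To put rough numbers on that bind, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the hourly traffic figures (four busy hours, twenty quiet ones), the per-server capacity, and the function names. It also assumes capacity can be resized perfectly every hour, which real auto-scaling only approximates.

```python
import math

# Hypothetical request rates (requests/second) for each hour of one day.
# Four busy hours at 400 rps, twenty quieter ones; the numbers are invented.
HOURLY_RPS = [50] * 9 + [400] * 2 + [100] * 7 + [400] * 2 + [50] * 4

# Assumed capacity of a single server, in requests/second.
RPS_PER_SERVER = 100


def servers_needed(rps: float) -> int:
    """Smallest number of servers that can carry the load (never below one)."""
    return max(1, math.ceil(rps / RPS_PER_SERVER))


def fixed_server_hours(hourly_rps: list[int]) -> int:
    """Server-hours used when you provision for the busiest hour, all day."""
    return servers_needed(max(hourly_rps)) * len(hourly_rps)


def scaled_server_hours(hourly_rps: list[int]) -> int:
    """Server-hours used when capacity is resized each hour to match demand."""
    return sum(servers_needed(rps) for rps in hourly_rps)


def average_provisioned_servers(hourly_rps: list[int]) -> int:
    """Server count you would buy if you sized for the daily average load."""
    return servers_needed(sum(hourly_rps) / len(hourly_rps))


if __name__ == "__main__":
    print("peak-provisioned server-hours:", fixed_server_hours(HOURLY_RPS))
    print("demand-following server-hours:", scaled_server_hours(HOURLY_RPS))
    print("servers if sized for the average:", average_provisioned_servers(HOURLY_RPS))
```

With these made-up figures, sizing for the peak uses 96 server-hours a day, following demand uses 36, and sizing for the average buys 2 servers that can carry only half of the 400 rps peak. The exact ratios depend entirely on the assumed traffic shape; the point is only the gap between the three strategies.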
The phases
- Why you'd want this at all — the peak-vs-average traffic problem, and what over- and under-provisioning each cost you.
- How it actually decides to scale — metrics, thresholds, cooldowns, and the policies that turn a number into an action.
- The gotchas — cold starts, the thundering herd, and why auto-scaling needs a load balancer to actually work.
Phase 1: Why you'd want this at all →