The ELK Stack

It's 2am, the alert fired, and the incident spans four services on a dozen machines. The old move is to SSH into a box, tail a file, guess which box, SSH into the next one, and grep until your eyes blur. That doesn't scale past a handful of servers, and it falls apart the moment a container dies and takes its logs with it. ELK fixes the shape of the problem: every log lands in one searchable place, and you ask questions across the whole fleet from a browser.

How to read this

Three phases, in order. Phase 1 builds the mental model - the four pieces (Elasticsearch, Logstash, Kibana, and Beats), what each one does, and why centralizing logs beats logging into boxes. Phase 2 is the everyday core: shipping logs with Beats, structuring them, and searching in Kibana. Phase 3 is production reality - the cost of indexing everything, index lifecycle and retention, and the failure modes that page you. Read Phase 1 even if you're impatient; the model makes the rest obvious.

The phases

Phase 1: What ELK actually is - the four pieces and why centralized logs win.
Phase 2: Shipping, structuring, and searching - Beats, parsing, index patterns, and Kibana queries.
Phase 3: Cost, retention, and production reality - index lifecycle, the price of indexing, and what breaks.

Phase 1: What ELK actually is →