Regular Expressions, Explained
You've seen them: a wall of slashes, backslashes, and dollar signs like ^\d{4}-\d{2}-\d{2}$,
and your stomach sank a little. Regular expressions have a reputation for being write-only magic
that only wizards understand. That reputation is undeserved. A regex is one small idea wearing a
scary costume, and once you see the idea, the costume stops working on you.
This guide makes regex readable instead of terrifying. We'll start with what a regex actually is (a pattern that describes a shape of text), learn the handful of pieces you'll reach for almost every time, and then meet the real-world traps - greedy matching, escaping, and regex that turns into gibberish - so they don't bite you.
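To make "a pattern that describes a shape of text" concrete, here is a minimal Python sketch that reads the date-like pattern from the opening piece by piece and tries it against a few strings. The helper name looks_like_date and the sample strings are illustrative assumptions, not part of the guide.

```python
import re

# The pattern from the opening paragraph, read piece by piece:
#   ^        start of the text
#   \d{4}    exactly four digits
#   -        a literal hyphen
#   \d{2}    exactly two digits (appears twice, separated by another hyphen)
#   $        end of the text (Python also allows one trailing newline here)
DATE_SHAPE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def looks_like_date(text: str) -> bool:
    """Hypothetical helper: True if text has the shape NNNN-NN-NN."""
    return DATE_SHAPE.match(text) is not None


# Sample inputs chosen for illustration only.
samples = ["2024-03-15", "2024-3-15", "15/03/2024", "2024-99-99"]
results = {s: looks_like_date(s) for s in samples}

for text, matched in results.items():
    print(f"{text!r}: {'match' if matched else 'no match'}")
```

Note that "2024-99-99" matches: the pattern checks only the shape (four digits, two digits, two digits), not whether the date exists on a calendar.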
How to read this
- Just need to recognize the pieces? Skim Phase 2: The Core Toolkit - it's a tour of every symbol you'll actually use, each with a tiny example.
- Want it to finally make sense? Read in order. Each phase builds on the last, and the whole thing rests on the one idea in Phase 1.
The phases
- What a Regex Actually Is - the mental model: you're describing the shape of text, not writing code. Includes a tiny first example, shown against text it matches and text it doesn't.
- The Core Toolkit - the pieces you'll use 90% of the time: literals, character classes, quantifiers, anchors, and groups, built up to matching something real, plus an honest word on why "the perfect email regex" is a trap.
- Using Regex for Real (and the Gotchas) - where you meet regex (editors, grep, code), and the classic traps: greedy vs lazy matching, escaping special characters, and regex becoming unreadable - with the cure for each.
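As a small preview of the traps named in the last phase, the sketch below shows greedy vs lazy matching and escaping in Python's re module. The sample strings are made up for illustration, and the re.VERBOSE example is one common way to keep a pattern readable, assumed here rather than taken from the guide.

```python
import re

# Sample text chosen for illustration only.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so the single match runs
# from the first "<" all the way to the last ">".
greedy = re.findall(r"<.*>", html)

# Lazy: .*? grabs as little as it can, so each tag is matched on its own.
lazy = re.findall(r"<.*?>", html)

# Escaping: an unescaped "." means "any character", so it also accepts "3x14".
candidates = ["3.14", "3x14"]
unescaped = [s for s in candidates if re.fullmatch(r"3.14", s)]
escaped = [s for s in candidates if re.fullmatch(r"3\.14", s)]

# Readability: with re.VERBOSE, whitespace in the pattern is ignored
# and "#" starts a comment, so the pattern can explain itself.
date = re.compile(
    r"""
    ^\d{4}    # year: four digits
    -\d{2}    # month: hyphen, then two digits
    -\d{2}$   # day: hyphen, then two digits, then end of text
    """,
    re.VERBOSE,
)

print(greedy)     # one long match
print(lazy)       # four separate tags
print(unescaped)  # both strings
print(escaped)    # only "3.14"
```

The only difference between the greedy and lazy patterns is a single "?", which is why this trap is easy to miss when reading someone else's regex.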
Deeper material - lookahead/lookbehind, backreferences, the differences between regex flavors (PCRE vs JavaScript vs POSIX), and catastrophic backtracking - is deliberately left for a follow-up guide. You can do an enormous amount of real work with only what's here.
Related reading: Programming From Zero for the basics underneath this, and The Terminal and Shell for using regex with tools like grep.