ML Basics for Data People
You already work with data. You write SQL, you build dashboards, you know what a clean table looks like and you can smell a bad join from across the room. And now machine learning keeps coming up — in standups, in job posts, in that one meeting where someone said "can't we just throw a model at it?" — and it feels like a different world with its own priesthood and its own vocabulary.
Here's the reassuring truth: ML is not a different world. It's a different technique applied to the same raw material you already handle every day — data. The hard part of ML is almost never the math or the model. It's the data work you already understand. This guide gives you enough of a mental model to follow the conversation, ask the right questions, and recognize where your existing skills are exactly what a project needs.
How to read this
- Want the one-paragraph version? ML learns patterns from historical data to make predictions on new data, instead of you hand-writing the rules. Everything else is detail. Read Phase 1 and you'll have the core idea.
- Want it to finally make sense? Read in order. Each phase builds on the last: what ML is, how a project actually flows, and where you — the data person — fit and why you matter more than you think.
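If the one-paragraph version still feels abstract, here is a minimal sketch of the difference between writing a rule and learning one. Everything in it is made up for illustration: the `history` data, the 90-day guess, and the function names are assumptions, and the "learning" is nothing more than trying each cutoff and keeping the one that fits the past best. Real models are more sophisticated, but the shape of the idea is the same.

```python
# Hypothetical historical data: (days_since_last_login, churned)
history = [
    (2, False), (5, False), (9, False), (14, False),
    (21, True), (30, True), (45, True), (60, True),
]

def hand_written_rule(days_since_last_login):
    # A rule someone typed in from intuition; the 90 is a guess.
    return days_since_last_login > 90

def learn_threshold(examples):
    """Pick the cutoff that classifies the most historical examples correctly."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({days for days, _ in examples}):
        correct = sum((days >= threshold) == churned for days, churned in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

# "Training": the cutoff comes from the data, not from a person.
threshold = learn_threshold(history)

def learned_rule(days_since_last_login):
    return days_since_last_login >= threshold

# A new customer the data has never seen: 40 days since last login.
print("learned threshold:", threshold)
print("hand-written rule says churn:", hand_written_rule(40))
print("learned rule says churn:", learned_rule(40))
```

The hand-written rule is only as good as the guess behind it. The learned rule is only as good as the history it was fit to, which is why the data work matters so much.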
The phases
- What ML Actually Is (for Data People): learning patterns from examples instead of writing rules by hand, and the difference between supervised and unsupervised learning, grounded in a churn example.
- The Workflow: choosing features, splitting the data into training and test sets (and why), training, and evaluating, including why "99% accurate" can still be a useless model.
- Where Data People Fit: the unglamorous work of clean inputs, good features, leak-free splits, and reliable pipelines. The model is the easy part.
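As a small preview of the "99% accurate" trap, here is a hedged sketch in plain Python. The numbers are invented: a hypothetical test set of 1,000 customers where only 10 churned, and a "model" that predicts "no churn" for everyone. The helper names `accuracy` and `recall` are just illustrative functions written here, not a library API.

```python
# Hypothetical test set: 1,000 customers, 10 of whom churned (a 1% churn rate).
actual = [True] * 10 + [False] * 990

# A "model" that has learned nothing: it predicts "no churn" for everyone.
predicted = [False] * len(actual)

def accuracy(actual, predicted):
    """Share of all predictions that match what really happened."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def recall(actual, predicted):
    """Share of real churners the model caught."""
    true_positives = sum(a and p for a, p in zip(actual, predicted))
    positives = sum(actual)
    return true_positives / positives if positives else 0.0

acc = accuracy(actual, predicted)
rec = recall(actual, predicted)

print(f"accuracy: {acc:.0%}")  # 990 of 1,000 predictions are right
print(f"recall:   {rec:.0%}")  # 0 of 10 churners were caught
```

The model is right 99% of the time and still finds none of the customers you actually care about. When the thing you are predicting is rare, accuracy alone tells you very little, so ask what share of the real cases the model catches.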
This guide stops at the basics on purpose. Deep learning, neural networks, and large language models (the "AI" everyone's talking about) are their own territory — we'll point you toward a future ai-ml category for that, rather than cram it in here. The foundations below are what make that material make sense later.
Related reading: "What Is Data Engineering" and "Data Quality and Observability", the disciplines that feed ML its lifeblood.