The Session & Unit of Work
In Phase 3 you taught SQLAlchemy the shape of your world — Author, Book,
Tag — and even built the tables with create_all. But a mapping just sits there. Something has to
actually use it: insert a new author, fetch one back, change a title and have that change reach the
database. That something is the Session, and it's the single most important object in the whole ORM.
This is the phase. If you take one idea from this entire guide, take this one. Almost every SQLAlchemy surprise you'll ever hit — a change that saved without you calling save, a query that ran once instead of twice, a lazy load that worked here but exploded there — traces straight back to what's in this file. So we're going slow and making it land, because once the Session clicks, the rest of SQLAlchemy stops being magic and starts being obvious.
The mental model: a workbench, not a pipe
📝 People picture an ORM as a pipe: Python object goes in one end, SQL comes out the other, a row lands in the table. That picture will mislead you for years. The Session isn't a pipe — it's a workbench. When you add or load objects, the Session lays them out on a bench it keeps for the duration of your work, watches them, and only sends SQL to the database when it decides it's time.
The Session is your handle to that workbench. It holds your objects, tracks every change you make to
them, and talks to the database through the engine you built in Phase 2.
You open one with the modern context-managed pattern:
=
# do all your work here — add, load, modify
...
What just happened: Session(engine) created a workbench bound to your engine — that's how it knows
which database to talk to. The with block is the modern, recommended way to use it: when the block ends
(normally or via an exception), the Session is closed and cleaned up automatically — no dangling
connections. Everything you do with your models happens inside this block.
💡 If you've come from Java's Hibernate/JPA, this is the exact same idea wearing Python clothes.
SQLAlchemy's Session is Hibernate's EntityManager/persistence context: a per-unit-of-work, in-memory
area that manages your objects and syncs them to the database on its schedule. If the workbench framing
feels familiar, that's why — see /guides/hibernate-and-jpa-from-zero
for the Java framing. The concepts transfer cleanly in both directions.
Persisting: add + commit
Let's save an Author and a Book. Two steps: hand the object to the Session with add, then make it
permanent with commit.
=
=
# author is now on the workbench (pending)
# None — no INSERT has run yet
# NOW the INSERTs fire
# 1 — the database assigned it, SQLAlchemy read it back
INSERT INTO authors (name) VALUES ('Ursula K. Le Guin');
INSERT INTO books (title) VALUES ('A Wizard of Earthsea');
What just happened: Author(name=...) made a plain Python object — at that moment the Session knows
nothing about it. session.add(author) placed it on the workbench, but no SQL ran yet, which is why
author.id is still None. The INSERTs didn't fire on the add lines — they fired at commit. That's
when SQLAlchemy sent the pending work to the database, the database assigned each row a primary key, and
SQLAlchemy read those ids back and populated author.id (now 1). That gap between "I told the Session
about this object" and "the SQL actually ran" is the whole story of this phase.
⚠️ Nothing hits the database permanently until commit. Until then your changes live only on the
workbench. If the with block exits without a commit — including because an exception was raised — the
work is rolled back and never reaches the database. add schedules; commit saves.
The identity map & unit of work
This is the core idea, and it surprises everyone the first time. 📝 Within a single Session, each database
row maps to exactly one Python object — this is the identity map. Ask the Session for author 1
ten times and you get the same instance back every time, and after the first lookup, no further
queries.
Watch how many SELECTs come out:
= # runs a SELECT, puts the Author on the bench
= # same id, same Session
# True — not just equal, the SAME object
SELECT authors.id, authors.name FROM authors WHERE authors.id = 1;
True
What just happened: two get calls, but only one SELECT. The first call ran the query, built an
Author from the row, and stored it on the workbench. The second call found it already there and handed
it straight back from memory — no database round trip. And first is second is True: not two objects
with equal data, the very same object. That's the identity map guaranteeing one row, one object, per
Session.
💡 The other half of this idea is the unit of work. The Session doesn't send your changes to the
database one at a time as you make them. It collects everything — new objects from add, modifications to
existing ones, deletions — and flushes them together as a single coordinated batch, in the right order, at
the right moment. You describe what the data should look like; the Session figures out the minimal set of
INSERT/UPDATE/DELETE statements to get there and runs them as one unit. The identity map is what
makes this possible: one object per row means there's exactly one source of truth on the bench to watch
and synchronize.
Flush vs commit & dirty tracking
Two words get thrown around a lot, and conflating them causes real confusion. 📝 Flush = send the
pending SQL to the database (the INSERTs, UPDATEs, DELETEs), but inside the still-open transaction.
📝 Commit = make it all permanent.
Here's the relationship: commit always flushes first, then commits. And the Session also autoflushes
automatically right before it runs a query — so that any pending changes are visible to that query. Most of
the time you never call flush() yourself; it happens for you. You think in terms of commit. But knowing
flush exists explains when your SQL actually runs.
Now the behavior that feels like magic until you understand the workbench: dirty tracking (also called
dirty checking). 📝 When you change an attribute on an object the Session is managing, the Session
notices — and issues an UPDATE at the next flush. There is no explicit save call.
= # load it — now managed by the Session
# 'A Wizard of Earthsea'
= # just set the attribute
# the Session noticed — UPDATE fires
SELECT books.id, books.title FROM books WHERE books.id = 1;
UPDATE books SET title='A Wizard of Earthsea (Illustrated)' WHERE books.id = 1;
What just happened: you loaded a Book, which put it on the workbench under the Session's watch. Then you
just assigned a new value to book.title — no session.add, no session.save, no session.update
(there is no such method). At commit, the Session compared the object's current state to what it loaded,
saw title had changed, and emitted exactly one UPDATE for that one column. This is the unit of work
doing its job: you mutate plain Python objects, and the Session translates your mutations into the right
SQL. (Java/Hibernate developers know this exact behavior — it's the same dirty checking, see
/guides/hibernate-and-jpa-from-zero.)
Object states & the session-scope rule
Every object, from the Session's point of view, is always in exactly one of four states. Learn these names — error messages, docs, and your own debugging all speak this language.
📝 The four states:
- Transient — a brand-new object you made with
Book(...). The Session has never heard of it; it's not on the bench and has no row. (Book(title="...")before anyadd.) - Pending — you've called
add, so it's on the bench and scheduled forINSERT, but the flush hasn't happened yet. (Afteradd, beforecommit/flush.) - Persistent — on the bench, tracked by the Session, and tied to a real database row. This is the
state where dirty tracking works. (After commit, or anything
get/a query returns.) - Detached — was persistent, but its Session has closed. It still holds its data, but nobody's
watching it; changes go nowhere. (An object you loaded, after the
withblock ended.)
stateDiagram-v2
[*] --> Transient: Book(...)
Transient --> Pending: session.add()
Pending --> Persistent: flush / commit
[*] --> Persistent: get() / query loads it
Persistent --> Detached: session closes
⚠️ Here's the trap that bites everyone eventually: a detached object can't lazy-load. If you load an
Author inside a with Session(...) block, let the block end, and then try to walk to a relationship
that wasn't fetched yet (say author.books), there's no open Session to run the query through — and you
get a DetachedInstanceError. You don't have relationships yet (they arrive in
Phase 6), and we'll meet this error properly in Phase 7.
But you already understand why it happens: no open Session, no workbench, nothing to do the lazy load.
That's the payoff of learning states first.
💡 So how should you scope a Session? The pattern is one Session per unit of work: open it, do a
coherent chunk of work, commit (or rollback on error), close it — exactly what the with block gives
you. In a web app this becomes one Session per request. Keep them short-lived. ⚠️ And do not share a
single Session across threads or across concurrent requests — a Session is not thread-safe, and sharing one
is a classic source of corrupted state and baffling bugs. One unit of work, one Session, then let it go.
💡 Step back, because this is the lens for everything that follows. Nearly every ORM behavior in the rest of this guide is one of these Session ideas wearing a costume:
- "I changed a field and it saved without calling save" → the object was persistent, and dirty tracking caught the change.
- "The same query ran once instead of twice" → the identity map served the second lookup from memory.
- "
isreturnedTruefor two loads" → the identity map gave you one object per row. - "Why did this
DetachedInstanceErrorblow up?" → you touched a detached object after its Session closed.
Master the Session, and the rest of SQLAlchemy is just details. Next, in Phase 5,
you'll use this Session to run real queries with select().
Recap
- The
Sessionis your handle to the ORM — a workbench that holds your objects, tracks their changes, and talks to the database via the engine. Open it withwith Session(engine) as session:. It's SQLAlchemy's equivalent of Hibernate'sEntityManager/persistence context. session.add(obj)schedules an insert;session.commit()makes it permanent. The primary key is populated only after the flush/commit — nothing hits the database permanently untilcommit.- The Session is an identity map: within one Session, a row maps to exactly one object, so
getthe same id twice → oneSELECTand the same instance (isisTrue). It batches all changes and applies them as one unit of work. - Flush sends pending SQL inside the transaction (it autoflushes before queries and on commit);
commit makes it permanent. Dirty tracking means changing an attribute on a persistent object
issues an
UPDATEat flush — with no explicit save call. - Every object is transient (new, unknown), pending (added, not flushed), persistent (tracked
- in the DB), or detached (Session closed, unwatched). A detached object can't lazy-load — that's the
DetachedInstanceErroryou'll meet in Phase 7.
- in the DB), or detached (Session closed, unwatched). A detached object can't lazy-load — that's the
- 💡 Scope a Session as one unit of work (one per request in web apps); keep it short-lived and never share one across threads. Nearly every ORM behavior traces back to the Session — it's the lens for the rest of the guide.
Quick check
The three ideas that explain the most future bugs:
[
{
"q": "You call `session.add(author)` and then immediately print `author.id`. What do you see, and why?",
"choices": [
"The real primary key — `add` runs the INSERT immediately",
"None — `add` only schedules the insert; the id isn't populated until the flush/commit, when the database assigns it",
"A randomly generated UUID assigned by SQLAlchemy",
"It raises an error because the object isn't committed yet"
],
"answer": 1,
"explain": "`add` places the object on the workbench in the pending state but runs no SQL. Nothing hits the database until commit (or a flush), so the database hasn't assigned a primary key yet and `author.id` is None. After commit, SQLAlchemy reads the assigned id back and populates it."
},
{
"q": "Inside one Session, you call `session.get(Author, 1)` twice. How many SELECT queries run, and is the result the same object?",
"choices": [
"One SELECT; both calls return the same object instance (`is` is True) — the identity map serves the second call from memory",
"Two SELECTs; you get two separate objects with equal data",
"Two SELECTs, but SQLAlchemy returns the same object both times",
"Zero SELECTs; get never touches the database"
],
"answer": 0,
"explain": "The Session is an identity map. The first get runs the SELECT and stores the Author on the workbench; the second get finds it already there and returns that same instance from memory with no new query, so `is` is True. One row maps to exactly one object per Session."
},
{
"q": "You load a Book (it's now persistent), set `book.title = \"New Title\"`, and call `session.commit()`. There's no `session.save()` call. What happens?",
"choices": [
"Nothing is saved — you must call session.save() or session.update() to persist the change",
"It raises an error because you modified an object without re-adding it",
"An UPDATE fires at commit — dirty tracking noticed the changed attribute, so no explicit save call is needed",
"The whole row is re-inserted as a new record"
],
"answer": 2,
"explain": "While an object is persistent, the Session watches it. Changing an attribute marks it dirty, and at the next flush (which commit triggers) the Session emits an UPDATE for the changed column. This is dirty tracking — there is no session.save()/session.update() method; mutating the object is the save."
}
]
← Phase 3: Defining Models · Guide overview · Phase 5: Querying with select() →
Check your understanding
1. You call `session.add(author)` and then immediately print `author.id`. What do you see, and why?
2. Inside one Session, you call `session.get(Author, 1)` twice. How many SELECT queries run, and is the result the same object?
3. You load a Book (it's now persistent), set `book.title = "New Title"`, and call `session.commit()`. There's no `session.save()` call. What happens?