Foreign Keys & Referential Integrity

Q: What is a foreign key?

A column in one table that holds a primary key value from another table - a pointer the database can protect

Q: What is referential integrity?

The database's refusal to create orphans - every foreign key value must point at a row that actually exists

Q: How do you model a many-to-many relationship (students and courses)?

Add a junction table whose rows are the pairings, each holding two foreign keys

In Phase 1 you put a number in each order — customer 1 — to say which customer it belonged to. In Phase 2 you made sure customer 1 was a stable, unique name. But so far that number is just a number. Nothing stops you from typing customer 999 into an order when there is no customer 999. The link is a convention, not a guarantee.

The foreign key is what upgrades the convention into a guarantee. It's the second half of a relationship — and the moment you declare one, the database stops trusting you to keep your links honest and starts enforcing it itself.

What a foreign key actually is

What it actually is. A foreign key is a column in one table whose job is to hold a primary key value from another table. It's a pointer — orders.customer_id holds the id of a row over in customers — and you tell the database it's a pointer so the database can protect it.

erDiagram
  CUSTOMERS ||--o{ ORDERS : "referenced by"
  CUSTOMERS {
    int id PK
  }
  ORDERS {
    int customer_id FK
  }

The customers.id is the primary key (the name); orders.customer_id is the foreign key (a pointer at a customer's id). Orders 1001, 1002, and 1004 all carry customer_id = 1, so they all point at Ada.

📝 Terminology. The table being pointed at (customers) is the referenced or parent table. The table doing the pointing (orders) is the referencing or child table. The foreign key always lives on the child.

You declare it when you create the child table:

CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    product     TEXT NOT NULL,
    amount      NUMERIC NOT NULL
);

What just happened: REFERENCES customers(id) is the line that matters. It tells the database "customer_id is not a free-floating number — it must always equal some real id in customers." From this point on, the database itself stands guard over that promise.

What "referential integrity" means — the database refuses orphans

📝 Referential integrity is the rule that every foreign key value must point at a row that actually exists. No order may belong to a customer who isn't there. A child with a missing parent is called an orphan, and referential integrity is the database's standing refusal to create one.

Watch it enforce the rule. We try to add an order for customer 999, who doesn't exist:

INSERT INTO orders (customer_id, product, amount)
VALUES (999, 'Desk lamp', 35.00);

  ERROR:  insert or update on table "orders" violates foreign key constraint
  DETAIL: Key (customer_id)=(999) is not present in table "customers".

What just happened: the database checked customers for an id of 999, found nothing, and rejected the order before it existed. Without the foreign key, that bad row would have slipped in silently and shown up months later as a crash or a blank name in a report. With it, the mistake is caught at the exact moment it's made. The database is doing your data-quality checking for you, every single write, forever.

How foreign keys model the two relationship shapes

Almost every relationship in real data is one of two shapes, and foreign keys express both.

One-to-many — the everyday case

One customer has many orders; each order has exactly one customer. That's the shape we've had all along, and it's the most common relationship there is. The pattern is simple:

Put the foreign key on the "many" side, pointing at the "one" side.

Each order carries one customer_id. A customer can be pointed at by any number of orders. That asymmetry — many pointers in, one pointer out — is one-to-many.

flowchart RL
  O1[order 1001] --> C[customer 1 - Ada]
  O2[order 1002] --> C
  O4[order 1004] --> C

Many-to-many — when both sides multiply

Now a harder shape. A student takes many courses; a course has many students. You can't put the foreign key on either side — a single student_id column on courses could only hold one student per course, and vice versa. Neither table has room.

The fix is a third table that exists purely to hold the pairings:

📝 Junction table (also: join table, bridge table, association table). A small table whose rows are the connections themselves. Each row holds two foreign keys — one to each side — and each row means "this student is in this course."

erDiagram
  STUDENTS ||--o{ ENROLLMENTS : has
  COURSES ||--o{ ENROLLMENTS : has
  ENROLLMENTS {
    int student_id FK
    int course_id FK
  }

One row per pairing in enrollments: "Ada is in Calculus," "Ada is in Logic," "Grace is in Calculus." Each row holds two foreign keys — one to each side.

CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(id),
    course_id  INTEGER NOT NULL REFERENCES courses(id),
    PRIMARY KEY (student_id, course_id)
);

What just happened: the junction table turns one many-to-many relationship into two ordinary one-to-many relationships (students→enrollments and courses→enrollments). The PRIMARY KEY (student_id, course_id) — a primary key made of two columns together — also quietly prevents enrolling the same student in the same course twice. Whenever you hear "many-to-many," reach for a junction table; there is essentially always one hiding in the middle.

The gotcha that bites everyone: what happens on delete?

You've protected the creation of orphans — no order can point at a nonexistent customer. But there's a back door. What if the customer exists when the order is created, and you delete the customer later? All their orders would instantly become orphans.

The database won't allow that silently. When you declare a foreign key, you also choose what should happen if someone tries to delete a parent that still has children. The two choices you'll meet first:

⚠️ ON DELETE RESTRICT — block the delete (the safe default). If a customer still has orders, the database refuses to delete the customer at all. You must deal with the orders first. This is the cautious choice, and it's what you get by default if you don't specify — the database would rather stop you than let data quietly disappear.

DELETE FROM customers WHERE id = 1;

  ERROR:  update or delete on table "customers" violates foreign key constraint
          on table "orders"
  DETAIL: Key (id)=(1) is still referenced from table "orders".

What just happened: Ada still has orders pointing at her, so the database blocked the deletion entirely. Nothing was orphaned because nothing was deleted. The error is the database protecting you from a mistake.

⚠️ ON DELETE CASCADE — delete the children too. If you delete the customer, the database automatically deletes all their orders in the same breath. Convenient — and genuinely dangerous. One DELETE on a parent can wipe out thousands of child rows you never mentioned, with no second confirmation. Reach for CASCADE only when the children truly have no meaning without the parent (e.g. deleting a user should remove their draft posts), and never on data you'd grieve.

CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
    product     TEXT NOT NULL,
    amount      NUMERIC NOT NULL
);

What just happened: with this declaration, DELETE FROM customers WHERE id = 1 would succeed and silently delete orders 1001, 1002, and 1004 along with Ada. That's the behavior you want for a user and their drafts; it's a catastrophe for a customer and their financial order history. The whole risk lives in that one ON DELETE clause — read it carefully on any table you didn't write.

(There are gentler options too, like ON DELETE SET NULL, which leaves the child but blanks its pointer. The two above are the ones to understand first.)

Why this all sets up JOINs

Step back and look at what you've built. Your data is split into clean tables (Phase 1). Every row has a stable name (Phase 2). And those names are connected by enforced, trustworthy links (this phase). You have a small web of related tables where the relationships are guaranteed correct — no orphans, no dangling pointers.

That guarantee is precisely what makes the next step safe. When you want "every order with its customer's name," you follow the foreign keys back to their primary keys and stitch the tables together. That stitching is the JOIN, and because referential integrity ensures every customer_id really does match a customer, your joins return whole, sensible rows instead of gaps. Here's that payoff on a built-in pair of tables — books each carry an author_id foreign key pointing at authors.id:

SELECT books.title, authors.name AS author
FROM books
JOIN authors ON books.author_id = authors.id;

What just happened: Each book carried an author_id; the JOIN followed that foreign key back to the matching authors row and stitched the two tables into one result — every book shown beside its author's name. Because each author_id really points at a real author, no row came back with a gap.

You're ready for SQL JOINs Explained — it's the natural payoff of everything here.

Recap

A foreign key is a column holding another table's primary key — a pointer from a child row to a parent row.
Referential integrity is the database refusing orphans — every foreign key value must point at a row that actually exists; bad pointers are rejected at write time.
One-to-many: put the foreign key on the "many" side, pointing at the "one." (Orders → customer.)
Many-to-many: add a junction table whose rows are the pairings, each holding two foreign keys.
On delete, you choose: RESTRICT blocks deleting a parent that still has children (the safe default); CASCADE deletes the children with it (convenient and dangerous — read it carefully).
This is the foundation JOINs stand on — trustworthy links are what let you reassemble the tables correctly.

← Phase 2: Primary Keys · Guide overview · Next guide: SQL JOINs Explained →

Check your understanding 3 questions

1. What is a foreign key?

2. What is referential integrity?

3. How do you model a many-to-many relationship (students and courses)?