Why Relations Need Separate Tables
The Many-to-Many Problem
Section titled “The Many-to-Many Problem”Consider people living at addresses from our hackathon data:
“The same person can live at different addresses. And at the same address, different people can live.”
This is a many-to-many relationship. You cannot represent it by adding columns.
flowchart LR
subgraph People
P1["Person A"]
P2["Person B"]
P3["Person C"]
end
subgraph Addresses
A1["123 Main St"]
A2["456 Oak Ave"]
end
P1 --> A1
P1 --> A2
P2 --> A1
P3 --> A2
Person A lives at two addresses. Address 123 Main St has two residents. How do you store this in a single table?
Why Single Tables Fail
Section titled “Why Single Tables Fail”Attempt 1: Add address columns to people
person_id | name | address_1 | address_2 | address_3 | ...How many columns? What if someone has 10 addresses over time?
Attempt 2: Add person columns to addresses
address_id | street | person_1 | person_2 | person_3 | ...Same problem in reverse.
Attempt 3: One row per combination
person_id | address_id | name | streetA | 1 | Alice| Main StA | 2 | Alice| Oak AveB | 1 | Bob | Main StNow “Alice” appears twice. Is birth_year the same in both rows? What if one says 1985 and one says NULL?
The Solution: Junction Tables
Section titled “The Solution: Junction Tables”Create a third table that connects the two entities:
erDiagram
PEOPLE {
string person_id PK
string sex
int birth_year
}
ADDRESSES {
string address_id PK
string settlement
}
PERSON_ADDRESS {
string person_id FK
string address_id FK
date valid_from
date valid_till
}
PEOPLE ||--o{ PERSON_ADDRESS : "lives at"
ADDRESSES ||--o{ PERSON_ADDRESS : "is home of"
The Actual Code
Section titled “The Actual Code”Here’s the SQL we wrote during the hackathon to create the person_address junction table:
-- code/create/person_address.sqlCREATE TABLE person_address ASSELECT manager_id AS person_id, address_id, valid_from, valid_till, 'manager' AS relation_typeFROM read_csv_auto('input/motherlode-opten_20250104/manage.csv')WHERE manager_id IS NOT NULL AND address_id IS NOT NULLUNION ALLSELECT owner_id AS person_id, address_id, valid_from, valid_till, 'owner' AS relation_typeFROM read_csv_auto('input/motherlode-opten_20250104/own.csv')WHERE owner_id IS NOT NULL AND address_id IS NOT NULL;
COPY person_address TO 'temp/scd/person_address.parquet' (FORMAT PARQUET);Notice:
- We extract
address_idfrom bothmanage.csvandown.csv - These are person home addresses (not company addresses!)
- We track
relation_typeto know the source
When You Can Embed (One-to-One)
Section titled “When You Can Embed (One-to-One)”Some relationships are genuinely one-to-one:
“Everybody has one mother. At least in Hungary. And so then you could directly write it in the entity table.”
In our data, we embed sex and birth_year directly in the people table because each person has (at most) one sex and one birth year.
-- code/create/people.sqlCREATE TABLE people ASSELECT person_id, MODE(sex) AS sex, -- embedded attribute MODE(birth_year) AS birth_year -- embedded attributeFROM (...)GROUP BY person_id;The Bug We Found
Section titled “The Bug We Found”We initially put person addresses in the wrong table! The address_id in manage.csv and own.csv looked like it belonged with organization locations.
flowchart TD
subgraph Wrong["Initial Design (WRONG)"]
loc1["location table<br/>org → address<br/>including person home addresses!"]
end
subgraph Right["Fixed Design (CORRECT)"]
loc2["location table<br/>org → address<br/>HQ, branch, site only"]
pa["person_address table<br/>person → address<br/>home addresses"]
end
Wrong -->|"FK test failed!"| Right
The FK integrity test caught this:
person_address.address_id -> addresses: FAILWe had to update addresses.sql to include addresses from manage.csv and own.csv.
The Rule
Section titled “The Rule”If there’s any chance the relationship could be many-to-many, make it a separate table from the start.
In our hackathon:
location: org ↔ address (many-to-many over time)manage: org ↔ person (many-to-many)own: org ↔ person (many-to-many with share %)person_address: person ↔ address (many-to-many over time)
All four needed separate junction tables.