---
id: wiki-2026-0508-relational-algebra-in-databases
title: Relational Algebra in Databases
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Relational Algebra, RA, SQL Algebra]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [database, theory, sql, query-optimization]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: sql
  framework: postgres
---
# Relational Algebra in Databases
## One-line summary
> **"SQL is syntactic sugar over an algebra."** Codd's (1970) relational algebra is a closed system of set-based operators (σ, π, ⋈, ∪, −, ×): every operator takes relations and returns a relation. The plan trees of modern query optimizers (Postgres, DuckDB, Snowflake) map almost directly onto RA expressions.
## Core concepts
### The 6 primitive operators
- **σ (Selection)**: row filter. `σ_{age>30}(R)` → `WHERE age > 30`.
- **π (Projection)**: column subset. `π_{name,age}(R)` → `SELECT name, age`.
- **⋈ (Join)**: theta/equi/natural. `R ⋈_{R.id=S.rid} S`. Strictly, it is derivable as σ over ×, but it is treated as basic in practice.
- **∪ / − / ∩**: set operations on union-compatible relations (∩ is derivable as `R − (R − S)`).
- **× (Cartesian product)**: `R × S`. Expensive: the result has |R|·|S| tuples.
- **ρ (Rename)**: alias (`AS` in SQL).
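
The operators above can be sketched in plain SQL. This is a minimal illustration, assuming two hypothetical toy relations `r` and `s` with made-up rows; the table and column names are not from any real schema.

```sql
-- Hypothetical toy relations, for illustration only.
CREATE TEMP TABLE r (id int, name text, age int);
CREATE TEMP TABLE s (rid int, city text);
INSERT INTO r VALUES (1, 'Ann', 34), (2, 'Bob', 28), (3, 'Cho', 41);
INSERT INTO s VALUES (1, 'Seoul'), (3, 'Busan');

-- σ_{age>30}(r)
SELECT * FROM r WHERE age > 30;

-- π_{name,age}(r); DISTINCT restores set semantics
SELECT DISTINCT name, age FROM r;

-- r ⋈_{r.id = s.rid} s
SELECT * FROM r JOIN s ON r.id = s.rid;

-- The same join using primitives only: σ_{r.id = s.rid}(r × s)
SELECT * FROM r CROSS JOIN s WHERE r.id = s.rid;

-- ρ: rename the relation and one attribute
SELECT p.name AS person_name FROM r AS p;

-- ∪ and − over union-compatible inputs
SELECT id FROM r UNION SELECT rid FROM s;
SELECT id FROM r EXCEPT SELECT rid FROM s;
```

Because every query returns a relation, any of these can be nested inside another, which is the closure property in practice.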
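### Derived and extended operators
- **Outer joins** (⟕, ⟖, ⟗): unmatched tuples are kept and padded with NULLs.
- **Division** (÷): the "for all" quantifier. `R ÷ S` = "tuples in R related to every tuple in S".
- **Aggregation** (γ): `_{dept}γ_{avg(salary)}(Emp)`. An extension to the classical algebra, not derivable from the primitives.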
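
To make "derived" concrete, the sketch below rebuilds a left outer join and an intersection from simpler operators. It assumes hypothetical `emp` and `dept` temp tables with made-up rows; on this data each derived form should return the same rows as its built-in counterpart.

```sql
-- Hypothetical toy relations, for illustration only.
CREATE TEMP TABLE emp (name text, dept_id int);
CREATE TEMP TABLE dept (id int, title text);
INSERT INTO emp VALUES ('Ann', 1), ('Bob', NULL), ('Cho', 9);
INSERT INTO dept VALUES (1, 'Eng');

-- Built-in left outer join: emp ⟕ dept
SELECT e.name, d.title
FROM emp e LEFT JOIN dept d ON e.dept_id = d.id;

-- Derived form: (emp ⋈ dept) ∪ (unmatched emp rows padded with NULL)
SELECT e.name, d.title
FROM emp e JOIN dept d ON e.dept_id = d.id
UNION ALL
SELECT e.name, NULL::text
FROM emp e
WHERE NOT EXISTS (SELECT 1 FROM dept d WHERE d.id = e.dept_id);

-- Built-in intersection
SELECT dept_id FROM emp INTERSECT SELECT id FROM dept;

-- Derived form: R − (R − S)
SELECT dept_id FROM emp
EXCEPT
(SELECT dept_id FROM emp EXCEPT SELECT id FROM dept);
```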
### Applications
1. Query optimizers rewrite the RA tree (predicate pushdown, join reordering).
2. View materialization relies on algebraic equivalence.
3. Incremental engines such as Datalog and Differential Dataflow.
## 💻 Patterns
### Selection pushdown
```sql
-- Logical: π_{name}(σ_{age>30}(Emp ⋈ Dept))
-- Physical: σ pushed below ⋈, giving a smaller intermediate result
SELECT e.name FROM Emp e JOIN Dept d ON e.dept_id=d.id WHERE e.age > 30;
-- The optimizer pushes σ_{age>30} down onto the Emp scan.
```
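
The same rewrite can be done by hand, which is a useful way to see what the optimizer is doing. This is a minimal sketch, assuming hypothetical `Emp(name, age, dept_id)` and `Dept(id, title)` temp tables with made-up rows. Both queries should return the same names, and the plan for the first is expected to show the `age > 30` filter on the Emp scan rather than above the join.

```sql
-- Hypothetical toy relations, for illustration only.
CREATE TEMP TABLE Emp (name text, age int, dept_id int);
CREATE TEMP TABLE Dept (id int, title text);
INSERT INTO Emp VALUES ('Ann', 34, 1), ('Bob', 28, 1), ('Cho', 41, 2);
INSERT INTO Dept VALUES (1, 'Eng'), (2, 'Ops');

-- σ above ⋈, as typically written
SELECT e.name
FROM Emp e JOIN Dept d ON e.dept_id = d.id
WHERE e.age > 30;

-- σ pushed below ⋈ by hand: filter Emp first, then join
SELECT e.name
FROM (SELECT name, dept_id FROM Emp WHERE age > 30) AS e
JOIN Dept d ON e.dept_id = d.id;

-- Inspect where the planner placed the filter
EXPLAIN (COSTS OFF)
SELECT e.name
FROM Emp e JOIN Dept d ON e.dept_id = d.id
WHERE e.age > 30;
```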
### Projection pushdown
```sql
-- π_{name,salary}(Emp ⋈ Dept): Dept's non-key columns are unused
EXPLAIN (VERBOSE, FORMAT TEXT)
SELECT e.name, e.salary FROM Emp e JOIN Dept d ON e.dept_id=d.id;
-- Only e.name, e.salary, e.dept_id and d.id are needed; VERBOSE prints each node's Output column list.
```
### Join reordering (⋈ associative + commutative)
```sql
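-- (A ⋈ B) ⋈ C ≡ A ⋈ (B ⋈ C), but the costs differ
SET join_collapse_limit = 12;  -- let the planner reorder explicit JOINs across up to 12 relations (default 8)
EXPLAIN ANALYZE
SELECT * FROM small s JOIN big b ON s.k=b.k JOIN huge h ON b.k=h.k;
-- The planner will typically choose small as the hash build side.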
```
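
To compare join orders directly, Postgres can be told to keep the order as written by setting `join_collapse_limit` to 1. The sketch below assumes hypothetical `small`, `big` and `huge` temp tables filled with `generate_series`; the row counts are arbitrary, and the plans you see will depend on your version and settings.

```sql
-- Hypothetical tables with very different sizes, for illustration only.
CREATE TEMP TABLE small (k int);
CREATE TEMP TABLE big (k int);
CREATE TEMP TABLE huge (k int);
INSERT INTO small SELECT g FROM generate_series(1, 10) AS g;
INSERT INTO big SELECT g FROM generate_series(1, 10000) AS g;
INSERT INTO huge SELECT g FROM generate_series(1, 100000) AS g;
ANALYZE small;
ANALYZE big;
ANALYZE huge;

-- Forced order: (huge ⋈ big) ⋈ small, exactly as written
BEGIN;
SET LOCAL join_collapse_limit = 1;
EXPLAIN
SELECT * FROM huge h JOIN big b ON h.k = b.k JOIN small s ON b.k = s.k;
ROLLBACK;

-- Default setting: the planner is free to reorder the same query
EXPLAIN
SELECT * FROM huge h JOIN big b ON h.k = b.k JOIN small s ON b.k = s.k;
```

Even with the order pinned, the planner still chooses the join method and which input becomes the build side, so only the pairing of relations is fixed.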
### Division via NOT EXISTS
```sql
-- "students who took every required course"
-- Took ÷ Required
SELECT s.id FROM Students s
WHERE NOT EXISTS (
    SELECT 1 FROM Required r
    WHERE NOT EXISTS (
        SELECT 1 FROM Took t
        WHERE t.student_id=s.id AND t.course_id=r.course_id
    )
);
```
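
Division is also commonly written as a counting query. This is a minimal sketch of that alternative, assuming hypothetical `Students`, `Required` and `Took` temp tables with made-up rows. It is not identical to the double `NOT EXISTS` form: when `Required` is empty, the `NOT EXISTS` version returns every student, while the counting version returns none.

```sql
-- Hypothetical toy relations, for illustration only.
CREATE TEMP TABLE Students (id int PRIMARY KEY);
CREATE TEMP TABLE Required (course_id int PRIMARY KEY);
CREATE TEMP TABLE Took (student_id int, course_id int);
INSERT INTO Students VALUES (1), (2);
INSERT INTO Required VALUES (10), (20);
INSERT INTO Took VALUES (1, 10), (1, 20), (1, 30), (2, 10);

-- Took ÷ Required by counting: keep students whose number of
-- distinct required courses taken equals the size of Required.
SELECT t.student_id
FROM Took t
JOIN Required r ON r.course_id = t.course_id
GROUP BY t.student_id
HAVING COUNT(DISTINCT t.course_id) = (SELECT COUNT(*) FROM Required);
-- returns student 1 only (student 2 is missing course 20)
```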
### Aggregation (γ)
```sql
-- _{dept_id}γ_{count(*),avg(salary)}(Emp)
SELECT dept_id, COUNT(*), AVG(salary)
FROM Emp
GROUP BY dept_id;
```
### Set operations
```sql
-- A − B (set difference)
SELECT id FROM ActiveUsers
EXCEPT
SELECT id FROM BannedUsers;
-- A ∩ B
SELECT id FROM Premium INTERSECT SELECT id FROM Annual;
```
### Equivalence rewriting
```sql
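-- σ_{p∧q}(R) ≡ σ_p(σ_q(R))   (a conjunctive selection can be split)
-- σ_p(R ⋈ S) ≡ σ_p(R) ⋈ S    if p references only attributes of R
-- π_L(R ⋈ S) ≡ π_L(π_{L∪join}(R) ⋈ π_{L∪join}(S))   join = join attributes; each inner π keeps only that relation's columns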
```
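
One way to sanity-check a rewrite on sample data is to compute the symmetric difference of the two forms. The sketch below does this for the second rule, assuming hypothetical `rel_r` and `rel_s` temp tables with made-up rows. An empty result only shows that the two forms agree on this data; it is not a proof of equivalence.

```sql
-- Hypothetical toy relations, for illustration only.
CREATE TEMP TABLE rel_r (id int, age int);
CREATE TEMP TABLE rel_s (rid int, city text);
INSERT INTO rel_r VALUES (1, 34), (2, 28), (3, 41);
INSERT INTO rel_s VALUES (1, 'Seoul'), (2, 'Daegu'), (3, 'Busan');

WITH lhs AS (
    -- σ_{age>30}(R ⋈ S)
    SELECT * FROM rel_r r JOIN rel_s s ON r.id = s.rid
    WHERE r.age > 30
),
rhs AS (
    -- σ_{age>30}(R) ⋈ S
    SELECT * FROM (SELECT * FROM rel_r WHERE age > 30) AS r
    JOIN rel_s s ON r.id = s.rid
)
-- Symmetric difference under bag semantics; empty means same rows
(SELECT * FROM lhs EXCEPT ALL SELECT * FROM rhs)
UNION ALL
(SELECT * FROM rhs EXCEPT ALL SELECT * FROM lhs);
```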
## Decision criteria
| Situation | Operator |
|---|---|
| Filter rows | σ |
| Pick columns | π |
| Combine relations on key | ⋈ |
| Union-compatible merge | ∪ |
| All-quantifier | ÷ |
| Group + aggregate | γ |
| Preserve unmatched | ⟕/⟖/⟗ |
**Default**: σ/π/⋈ cover the vast majority of queries (roughly 95% as a rule of thumb).
## 🔗 Graph
- Parents: [[Database Theory]] · [[SQL]]
- Variants: [[Relational Calculus]] · [[Datalog]] · [[Tuple Calculus]]
- Applications: [[Query Optimizer]] · [[Materialized Views]] · [[Differential Dataflow]]
- Adjacent: [[Codd's 12 Rules]] · [[Normalization]] · [[ACID]]
## 🤖 LLM usage
**When to use**: explaining SQL → RA tree conversion, suggesting query rewrites, derivations for learning.
**When not to use**: production query plans. Use EXPLAIN ANALYZE instead.
## ❌ Anti-patterns
- **Careless Cartesian products**: a missing JOIN condition yields N×M rows.
- **σ above ⋈**: when the optimizer cannot push the selection down, rewrite the query manually.
- **`SELECT *` in a subquery**: hinders π pushdown.
- **Confusing bags and sets**: SQL uses bag (multiset) semantics, so UNION ALL ≠ ∪.
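
The bag-versus-set difference is easiest to see on a table that contains duplicates. A minimal sketch, assuming two hypothetical single-column temp tables `a` and `b`; row order in the results is unspecified.

```sql
-- Hypothetical toy relations, for illustration only.
CREATE TEMP TABLE a (x int);
CREATE TEMP TABLE b (x int);
INSERT INTO a VALUES (1), (1), (2);
INSERT INTO b VALUES (1), (3);

-- Set union (∪): duplicates removed, rows 1, 2, 3
SELECT x FROM a UNION SELECT x FROM b;

-- Bag union: all 5 rows kept (1, 1, 2, 1, 3)
SELECT x FROM a UNION ALL SELECT x FROM b;

-- Set difference (−): row 2 only
SELECT x FROM a EXCEPT SELECT x FROM b;

-- Bag difference: one copy of 1 survives, plus 2
SELECT x FROM a EXCEPT ALL SELECT x FROM b;
```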
## 🧪 Verification / Duplicates
- Verified (Codd 1970; Garcia-Molina *Database Systems* ch.2.4; Postgres planner docs).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full content (operators + 7 patterns) |