[G1-Sync] Manual knowledge update
---
id: db-oltp-olap-htap
title: "OLTP vs OLAP vs HTAP: choosing a DB by workload"
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, oltp, olap, vibe-coding]
tech_stack: { language: "SQL", applicable_to: ["Database"] }
applied_in: []
aliases: [OLTP, OLAP, HTAP, transactional, analytical, columnar, row-based, mixed workload]
---

# OLTP / OLAP / HTAP

> Different workloads call for different databases. **OLTP (many small reads/writes) vs OLAP (few large queries) vs HTAP (both)**. Choosing the wrong one leads to large differences in performance and cost.

## 📖 Key concepts
- OLTP: Online Transaction Processing.
- OLAP: Online Analytical Processing.
- HTAP: Hybrid Transactional/Analytical Processing (both in one system).
- Row vs columnar storage.

## 💻 Code patterns

### OLTP characteristics
```
Workload:
- Each query is small (< 100 rows)
- Many queries (1000+ QPS)
- Mix of reads and writes
- Latency matters (< 50ms)
- ACID transactions

Examples:
- User login
- Order creation
- Cart update
- Profile read

DB:
- Postgres / MySQL (default)
- CockroachDB / Spanner (distributed)
- DynamoDB (NoSQL)
```

### OLAP characteristics
```
Workload:
- Each query is large (millions of rows or more)
- Few queries (one every few minutes or hours)
- Almost entirely reads
- Throughput matters
- Scans make up most of the work

Examples:
- Daily revenue
- User cohort analysis
- A/B test report
- Dashboard

DB:
- ClickHouse / DuckDB
- Snowflake / BigQuery / Redshift
- Druid / Pinot
```

### Row vs Columnar
```
Row (OLTP):
[id, name, email, age]
[1, 'Alice', 'a@x', 25]
[2, 'Bob', 'b@x', 30]

→ Reading one whole row is fast. SELECT * WHERE id=1.

Columnar (OLAP):
id   : [1, 2, 3, ...]
name : ['Alice', 'Bob', ...]
age  : [25, 30, ...]

→ Reading a single column is fast. SELECT AVG(age).
+ Compresses well (values of the same type are stored together).
```
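
The two layouts above can be sketched in plain Python. This is only an illustration of the access patterns, not how any real engine stores data; the helper names (`fetch_row`, `avg_age`, `run_length_encode`) are made up for this sketch, and run-length encoding is just one example of a compression scheme that benefits from similar values sitting together.

```python
# Row layout: one tuple per record, the way a row store keeps fields together.
rows = [
    (1, "Alice", "a@x", 25),
    (2, "Bob", "b@x", 30),
]

# Columnar layout: one list per column, the way a column store keeps values together.
columns = {
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "email": ["a@x", "b@x"],
    "age": [25, 30],
}


def fetch_row(rows, wanted_id):
    """Point lookup (SELECT * WHERE id = ?): one record already holds every field."""
    for row in rows:
        if row[0] == wanted_id:
            return row
    return None


def avg_age(columns):
    """Aggregate (SELECT AVG(age)): only the 'age' list is read."""
    ages = columns["age"]
    return sum(ages) / len(ages)


def run_length_encode(values):
    """Collapse runs of equal values, which are common inside a single column."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded
```

In the row layout, `avg_age` would have to walk every tuple and skip three of the four fields; in the columnar layout it touches one list only.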

### Postgres (OLTP default)
```sql
-- Fast single-row lookup
SELECT * FROM users WHERE id = 123;

-- ACID transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```
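
To make the "all or nothing" part of the transfer concrete, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for Postgres. The `CHECK (balance >= 0)` constraint, the `transfer` helper, and the credit-before-debit order are assumptions added for illustration, chosen so that a failure in the second statement visibly rolls back the first.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 150), (2, 0)])
conn.commit()

ok = transfer(conn, 1, 2, 100)        # 150 -> 50 and 0 -> 100
rejected = transfer(conn, 1, 2, 100)  # debit would go negative, so nothing changes
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the rejected transfer, account 2 does not keep the extra credit: both updates succeed together or neither does.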

### ClickHouse (OLAP)
```sql
CREATE TABLE events (
  ts DateTime,
  user_id UInt32,
  event String,
  value Float32
) ENGINE = MergeTree
ORDER BY (ts, user_id);

-- Fast aggregation
SELECT
  toStartOfHour(ts) AS hour,
  count() AS cnt,
  avg(value) AS avg_val
FROM events
WHERE ts > now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
```

→ Aggregations over 1B+ rows finish in milliseconds to seconds.
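
For readers less familiar with `toStartOfHour` plus `GROUP BY`, the same bucketing logic can be sketched in plain Python. This only mirrors what the query computes, not how ClickHouse executes it; `hourly_stats` and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_stats(events):
    """events: iterable of (ts, value). Returns {hour_start: (count, average)}."""
    buckets = defaultdict(list)
    for ts, value in events:
        # Same idea as toStartOfHour(ts): truncate the timestamp to the hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Sorted by hour, like ORDER BY hour.
    return {
        hour: (len(values), sum(values) / len(values))
        for hour, values in sorted(buckets.items())
    }


sample_events = [
    (datetime(2026, 5, 9, 10, 5), 1.0),
    (datetime(2026, 5, 9, 10, 59), 3.0),
    (datetime(2026, 5, 9, 11, 0), 5.0),
]
stats = hourly_stats(sample_events)
```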

### DuckDB (single-node OLAP)
```sql
-- Query Parquet files directly (S3 paths need the httpfs extension)
SELECT date, SUM(amount) AS total
FROM 's3://bucket/sales/*.parquet'
WHERE date > '2026-01-01'
GROUP BY date;

-- Query Postgres as well (via the postgres extension)
INSTALL postgres;
LOAD postgres;
ATTACH 'postgres://...' AS pg (TYPE postgres);
SELECT * FROM pg.public.users LIMIT 10;
```

→ Handles up to TB-scale data on a single VM. Simpler than Spark.

### Snowflake (cloud OLAP)
```sql
CREATE TABLE events ... CLUSTER BY (ts);

-- Compute scaling
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'X-LARGE';

-- Time travel (the AT clause expects a timestamp value, so cast the string)
SELECT * FROM events AT(TIMESTAMP => '2026-05-01'::TIMESTAMP_LTZ);
```

→ Compute and storage are separated. You pay for the time a warehouse runs (compute), not per query.

### BigQuery
```sql
-- _TABLE_SUFFIX only works on a wildcard table (date-sharded events_YYYYMMDD)
SELECT
  DATE(ts) AS day,
  COUNT(*) AS events
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260509'
GROUP BY day;
```

→ Pay per scanned bytes. Partitioning and clustering reduce cost.
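
A rough way to see why pruning matters under scan-based billing: cost is proportional to the bytes a query has to read, so skipping partitions (or shards) outside the date filter lowers the bill directly. The partition sizes and the price below are made-up numbers for the arithmetic, not BigQuery's actual pricing.

```python
GIB = 2**30
TIB = 2**40


def bytes_scanned(partition_bytes, first_day=None, last_day=None):
    """Total bytes read; partitions outside [first_day, last_day] are pruned."""
    total = 0
    for day, size in partition_bytes.items():
        if first_day is not None and day < first_day:
            continue
        if last_day is not None and day > last_day:
            continue
        total += size
    return total


def scan_cost(num_bytes, price_per_tib):
    """Scan-based billing: cost grows linearly with bytes read."""
    return num_bytes / TIB * price_per_tib


# Hypothetical daily partitions, keyed by YYYYMMDD.
partitions = {
    "20260429": 200 * GIB,
    "20260430": 200 * GIB,
    "20260501": 300 * GIB,
    "20260502": 324 * GIB,
}

HYPOTHETICAL_PRICE_PER_TIB = 5.0  # placeholder value, not a quoted price

full_scan = bytes_scanned(partitions)
pruned_scan = bytes_scanned(partitions, "20260501", "20260502")
full_cost = scan_cost(full_scan, HYPOTHETICAL_PRICE_PER_TIB)
pruned_cost = scan_cost(pruned_scan, HYPOTHETICAL_PRICE_PER_TIB)
```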

### HTAP (Hybrid)
```
A single DB serves both OLTP and OLAP.

Technologies:
- TiDB (TiKV row store + TiFlash columnar replicas, replicated via Raft)
- SingleStore (formerly MemSQL)
- CockroachDB (vectorized, column-oriented execution; on-disk storage remains row-oriented)
- Postgres + Citus columnar

→ In practice, the common architecture treats ClickHouse as the analytical "read replica" of the OLTP DB.
```

### Postgres + Citus (HTAP)
```sql
-- Convert to columnar
SELECT alter_table_set_access_method('events', 'columnar');

-- Rows are inserted as usual; storage and scans are column-oriented
```

### Architectures
```
1. Pure OLTP:
   App → Postgres

2. Separate OLAP:
   App → Postgres → CDC → Snowflake / ClickHouse
                              ↓
                      BI tool / dashboard

3. HTAP:
   App → TiDB (writes go to OLTP, reads to OLAP)
```

### CDC (Change Data Capture)
```
Postgres → Debezium → Kafka → ClickHouse / Snowflake.

Real-time:
- Order placed (Postgres)
- Dashboard updates about 1 second later (ClickHouse)
```
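
The consuming end of such a pipeline boils down to applying a stream of change events to an analytical copy. The sketch below uses a deliberately simplified, hypothetical event shape (`op`, `key`, `row`), not Debezium's actual message format, and a plain dict in place of ClickHouse.

```python
def apply_change(replica, event):
    """Apply one change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # Upsert keeps replays harmless: re-applying an event gives the same state.
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op}")


def total_revenue(replica):
    """The kind of aggregate a dashboard would read from the analytical copy."""
    return sum(row["amount"] for row in replica.values())


# Hypothetical change stream for an `orders` table.
events = [
    {"op": "insert", "key": 1, "row": {"amount": 100}},
    {"op": "insert", "key": 2, "row": {"amount": 250}},
    {"op": "update", "key": 1, "row": {"amount": 120}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in events:
    apply_change(replica, event)
```

Using upserts rather than blind inserts is a common choice because message brokers may deliver an event more than once.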

### Star schema (OLAP)
```sql
-- Fact table (large)
CREATE TABLE fact_sales (
  date_id INT,
  product_id INT,
  customer_id INT,
  amount DECIMAL,
  quantity INT
);

-- Dimension tables (small)
CREATE TABLE dim_date (date_id INT, date DATE, year INT, month INT);
CREATE TABLE dim_product (product_id INT, name VARCHAR, category VARCHAR);
CREATE TABLE dim_customer (customer_id INT, country VARCHAR, segment VARCHAR);

-- Join
SELECT d.year, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d USING (date_id)
JOIN dim_product p USING (product_id)
GROUP BY d.year, p.category;
```

→ The standard schema for OLAP.
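
A small end-to-end run of the fact/dimension join, using Python's built-in `sqlite3` so it can be executed anywhere. The sample rows are invented, the customer dimension is left out for brevity, and SQLite types replace the ones in the DDL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales (date_id INTEGER, product_id INTEGER, amount REAL, quantity INTEGER);

    -- Invented sample data
    INSERT INTO dim_date VALUES (1, 2025, 12), (2, 2026, 1);
    INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics'), (11, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES
        (1, 10, 1000.0, 1),
        (2, 10, 1200.0, 1),
        (2, 10, 800.0, 1),
        (2, 11, 300.0, 2);
    """
)

# The fact table carries the numbers; dimensions supply the labels to group by.
result = conn.execute(
    """
    SELECT d.year, p.category, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_date d USING (date_id)
    JOIN dim_product p USING (product_id)
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
    """
).fetchall()
```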

### Snowflake schema (normalized)
```
dim_product → dim_category → dim_dept

→ Star is simpler; snowflake is more normalized.
```

### Data warehouse vs lake
```
Warehouse: schema-on-write (structured, SQL-friendly).
Lake: schema-on-read (raw files, flexible).

Lakehouse: both (Iceberg, Delta, Hudi).
```

### Materialized view (OLAP boost)
```sql
-- Dashboard query that runs every day
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date, SUM(amount) AS revenue FROM orders GROUP BY date;

-- Fast query
SELECT * FROM daily_revenue;

-- Refresh
REFRESH MATERIALIZED VIEW daily_revenue;
```

→ Pre-compute the result. Dashboard latency ↓.
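
The trade-off of pre-computing is staleness: the stored result does not change until it is refreshed. SQLite has no materialized views, so the sketch below emulates one with a summary table that is rebuilt on demand; `refresh_daily_revenue` and the sample orders are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2026-05-01", 100), ("2026-05-01", 50), ("2026-05-02", 70)],
)


def refresh_daily_revenue(conn):
    """Rebuild the summary table from scratch, like REFRESH MATERIALIZED VIEW."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS daily_revenue")
        conn.execute(
            "CREATE TABLE daily_revenue AS "
            "SELECT date, SUM(amount) AS revenue FROM orders GROUP BY date"
        )


def read_daily_revenue(conn):
    """Cheap read of the pre-computed result."""
    return dict(conn.execute("SELECT date, revenue FROM daily_revenue"))


refresh_daily_revenue(conn)
fresh = read_daily_revenue(conn)

# A new order arrives; the pre-computed table does not see it yet.
conn.execute("INSERT INTO orders VALUES ('2026-05-02', 30)")
stale = read_daily_revenue(conn)

refresh_daily_revenue(conn)
refreshed = read_daily_revenue(conn)
```

How often to refresh is the design decision: more frequent refreshes cost compute, less frequent ones serve older numbers.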

### Pre-aggregation (Cube.dev)
```javascript
cube('Sales', {
  sql: 'SELECT * FROM orders',
  measures: {
    total: { sql: 'amount', type: 'sum' },
  },
  dimensions: {
    date: { sql: 'created_at', type: 'time' },
  },
  preAggregations: {
    daily: {
      type: 'rollup',
      measures: [Sales.total],
      timeDimension: Sales.date,
      granularity: 'day',
      refreshKey: { every: '1 hour' },
    },
  },
});
```

→ The API layer builds and uses pre-aggregations automatically.

### When OLTP, when OLAP?
```
Rules of thumb.

Query frequency:
  > 100 qps → OLTP db
  < 1 qps → OLAP db

Query size:
  < 1k rows → OLTP
  > 100k rows → OLAP

Latency:
  < 50ms → OLTP
  > 1s OK → OLAP

Mixed: two DBs + CDC.
```
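
The thresholds above can be written down as a small helper. This is a sketch of the rules of thumb exactly as listed, not a sizing tool; the function name, the voting scheme, and the `"unclear"` result for values that fall between thresholds are assumptions.

```python
def suggest_store(qps, rows_per_query, latency_budget_ms):
    """Vote on each rule of thumb; return 'OLTP', 'OLAP', 'mixed', or 'unclear'."""
    votes = set()

    # Query frequency
    if qps > 100:
        votes.add("OLTP")
    elif qps < 1:
        votes.add("OLAP")

    # Query size
    if rows_per_query < 1_000:
        votes.add("OLTP")
    elif rows_per_query > 100_000:
        votes.add("OLAP")

    # Latency budget
    if latency_budget_ms < 50:
        votes.add("OLTP")
    elif latency_budget_ms >= 1_000:
        votes.add("OLAP")

    if len(votes) == 1:
        return votes.pop()
    # Conflicting signals suggest two DBs + CDC; no signal at all is inconclusive.
    return "mixed" if votes else "unclear"
```

For example, `suggest_store(2000, 10, 20)` describes a typical checkout path, while `suggest_store(0.01, 5_000_000, 30_000)` describes a nightly report.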

### Can Postgres do OLAP?
```
Fine up to GB scale (with proper indexes + partitioning).
TB+ = difficult.
PB = not feasible.

→ Postgres's row store is weak at large scans.
  Citus columnar / pg_duckdb / hydra provide a hybrid option.
```

### Cost comparison
```
Postgres RDS:         $$
Snowflake:            $$$ (compute hours)
BigQuery:             $$ (scanned bytes)
ClickHouse self-host: $$
Redshift:             $$$
DuckDB self-host:     $ (1 VM)
```

### Pinot / Druid / StarRocks (real-time OLAP)
```
ClickHouse: powerful, batch-friendly.
Pinot / Druid: real-time + low-latency aggregation.
StarRocks: ClickHouse alternative, MySQL-compatible.

→ User-facing OLAP (customer dashboards): Pinot / Druid.
  Internal BI: ClickHouse / Snowflake.
```

### Having an LLM write SQL
```ts
// `llm` stands for whatever LLM client is in use
const sql = await llm.complete({
  system: 'Generate ClickHouse SQL.',
  prompt: `Schema: ${schema}\nQuestion: How many users active in last 7 days?`,
});

// Validate + run
```

→ Text-to-SQL. ClickHouse syntax differs slightly from standard SQL, so evaluation is a must.
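
One possible shape for the "validate" step is a conservative guard that only lets a single read-only statement through before anything reaches the database. This is a sketch, not a SQL parser: the keyword list and the `is_safe_select` name are assumptions, and the check will also reject harmless queries whose string literals happen to contain a blocked keyword.

```python
import re

# Keywords that should never appear in a generated read-only query.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|attach)\b",
    re.IGNORECASE,
)


def is_safe_select(sql):
    """Accept only a single statement that starts with SELECT or WITH."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(?i)(select|with)\b", statement):
        return False
    return FORBIDDEN.search(statement) is None
```

A guard like this complements, rather than replaces, running generated SQL under a read-only database user and evaluating answers against known-good queries.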

## 🤔 Decision criteria
| Situation | DB |
|---|---|
| General-purpose app | Postgres / MySQL |
| Analytics dashboard | ClickHouse / Snowflake |
| TB-scale on a single VM | DuckDB |
| User-facing analytics | Pinot / Druid / StarRocks |
| HTAP | TiDB / SingleStore / Citus |
| Data < 100 GB | Postgres is fine |
| 1+ TB of analytics data | Specialized OLAP |
| Real-time + analytical | Materialize / RisingWave |

## ❌ Anti-patterns
- **Postgres for everything (including BI)**: large scans are slow.
- **Row-by-row updates in OLAP**: inefficient.
- **Ignoring indexes (OLTP)**: queries fall back to full scans.
- **Assuming transactions in OLAP**: transactional guarantees are weak.
- **No ETL**: every query hits raw data.
- **Stale materialized view**: returns wrong answers.
- **OLAP without a star schema**: queries become hard to write.

## 🤖 Hints for LLM use
- OLTP = Postgres. OLAP = ClickHouse / Snowflake / DuckDB.
- HTAP is appealing, but few mature technologies exist.
- CDC is the answer for getting data from OLTP to OLAP.
- Star schema is the canonical OLAP schema.

## 🔗 Related documents
- [[DB_ClickHouse_OLAP]]
- [[DB_DuckDB_Embedded]]
- [[Data_Eng_Lakehouse]]