| id | title | category | status | source_trust_level | verification_status | created_at | updated_at | tags | tech_stack | applied_in | aliases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| db-oltp-olap-htap | OLTP vs OLAP vs HTAP: DB per workload | Coding | draft | B | conceptual | 2026-05-09 | 2026-05-09 |  |  |  |  |
OLTP / OLAP / HTAP
Different workload = different DB. OLTP (many small reads/writes) vs OLAP (few large queries) vs HTAP (both). Picking the wrong one = a big performance / cost gap.
📖 Key concepts
- OLTP: Online Transaction Processing.
- OLAP: Online Analytical Processing.
- HTAP: Hybrid Transactional/Analytical Processing (both).
- Row vs Columnar storage.
💻 Code patterns
OLTP characteristics
Workload:
- Each query is small (< 100 rows)
- Many queries (1000+ QPS)
- Read + write mix
- Latency matters (< 50 ms)
- ACID transactions
Examples:
- User login
- Order creation
- Cart update
- Profile read
DB:
- Postgres / MySQL (default)
- CockroachDB / Spanner (distributed)
- DynamoDB (NoSQL)
OLAP characteristics
Workload:
- Each query is large (millions of rows or more)
- Few queries (per minute / per hour)
- Almost read-only
- Throughput matters
- Scans cover a large portion of the table
Examples:
- Daily revenue
- User cohort analysis
- A/B test report
- Dashboard
DB:
- ClickHouse / DuckDB
- Snowflake / BigQuery / Redshift
- Druid / Pinot
Row vs Columnar
Row (OLTP):
[id, name, email, age]
[1, 'Alice', 'a@x', 25]
[2, 'Bob', 'b@x', 30]
→ Reading one whole row is fast: SELECT * FROM users WHERE id = 1.
Columnar (OLAP):
id : [1, 2, 3, ...]
name : ['Alice', 'Bob', ...]
age : [25, 30, ...]
→ Reading just one column is fast: SELECT AVG(age).
+ Strong compression (values of the same type are stored together).
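A minimal TypeScript sketch of the two layouts above, using the same `id`/`name`/`age` columns (`email` dropped for brevity; the third user "Carol" is made-up sample data). The row layout is an array of records; the columnar layout keeps one array per column, so `AVG(age)` touches only the `age` array. The run-length encoding at the end is just one simple illustration of why a column of same-typed, repeated values compresses well; real engines use more elaborate codecs.

```typescript
// Row layout: one record per row (OLTP-style, good for "fetch one whole row").
interface UserRow {
  id: number;
  name: string;
  age: number;
}

const rowStore: UserRow[] = [
  { id: 1, name: "Alice", age: 25 },
  { id: 2, name: "Bob", age: 30 },
  { id: 3, name: "Carol", age: 30 }, // hypothetical extra row
];

// Columnar layout: one array per column (OLAP-style, good for "scan one column").
interface UserColumns {
  id: number[];
  name: string[];
  age: number[];
}

function toColumns(rows: UserRow[]): UserColumns {
  return {
    id: rows.map((r) => r.id),
    name: rows.map((r) => r.name),
    age: rows.map((r) => r.age),
  };
}

const columnStore = toColumns(rowStore);

// SELECT * FROM users WHERE id = ?: the row store returns the whole record at once.
function findById(rows: UserRow[], id: number): UserRow | undefined {
  return rows.find((r) => r.id === id);
}

// SELECT AVG(age): the column store reads only the age array.
function avgAge(cols: UserColumns): number {
  return cols.age.reduce((sum, a) => sum + a, 0) / cols.age.length;
}

// Run-length encoding: adjacent repeated values collapse to [value, count] pairs.
function runLengthEncode<T>(values: T[]): Array<[T, number]> {
  const runs: Array<[T, number]> = [];
  for (const v of values) {
    const last = runs[runs.length - 1];
    if (last !== undefined && last[0] === v) {
      last[1] += 1;
    } else {
      runs.push([v, 1]);
    }
  }
  return runs;
}

console.log(findById(rowStore, 1));
console.log(avgAge(columnStore));
console.log(runLengthEncode(columnStore.age));
```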
Postgres (OLTP default)
-- Fast single-row lookup (by primary key)
SELECT * FROM users WHERE id = 123;
-- ACID transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
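To make the all-or-nothing behavior of the `BEGIN ... COMMIT` block concrete, here is an in-memory TypeScript sketch: both balance updates become visible together, or neither does. It only illustrates atomicity; the "no negative balance" rule is an added assumption (think of a `CHECK (balance >= 0)` constraint), and isolation and durability, which Postgres provides via MVCC and the WAL, are not modeled.

```typescript
// account id -> balance (a stand-in for the accounts table)
type Accounts = Map<number, number>;

// All-or-nothing transfer: build the new state on a copy and only return it
// if every check passes. The caller's map is never modified in place.
function transfer(accounts: Accounts, from: number, to: number, amount: number): Accounts {
  if (from === to || amount <= 0) {
    throw new Error("invalid transfer");
  }
  const draft = new Map(accounts); // like BEGIN: changes stay private
  const fromBalance = draft.get(from);
  const toBalance = draft.get(to);
  if (fromBalance === undefined || toBalance === undefined) {
    throw new Error("unknown account"); // like ROLLBACK: nothing was applied
  }
  if (fromBalance < amount) {
    // Assumed rule, comparable to a CHECK (balance >= 0) constraint.
    throw new Error("insufficient funds");
  }
  draft.set(from, fromBalance - amount); // UPDATE ... WHERE id = from
  draft.set(to, toBalance + amount); // UPDATE ... WHERE id = to
  return draft; // like COMMIT: both updates become visible together
}

let balances: Accounts = new Map([
  [1, 500],
  [2, 100],
]);
balances = transfer(balances, 1, 2, 100);
console.log(balances.get(1), balances.get(2));
```

Returning a new map instead of mutating in place is the design choice that makes "rollback" free here: a failed transfer simply never replaces the caller's state.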
ClickHouse (OLAP)
CREATE TABLE events (
ts DateTime,
user_id UInt32,
event String,
value Float32
) ENGINE = MergeTree
ORDER BY (ts, user_id);
-- Fast aggregation
SELECT
toStartOfHour(ts) AS hour,
count() AS cnt,
avg(value) AS avg_val
FROM events
WHERE ts > now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
→ Aggregations over 1B+ rows take milliseconds to seconds.
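What the ClickHouse query computes can be sketched in plain TypeScript: keep rows from the last 7 days, truncate each timestamp to the hour (the job `toStartOfHour` does), then compute `count()` and `avg(value)` per bucket and sort by hour. The `EventRow` shape mirrors the `events` table above, the sample rows are invented, and hour truncation is done in UTC for simplicity. A columnar engine does the same logical work but reads only the `ts` and `value` columns.

```typescript
interface EventRow {
  ts: Date;
  userId: number;
  eventName: string;
  value: number;
}

interface HourlyStat {
  hour: Date;
  cnt: number;
  avgVal: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Equivalent of toStartOfHour(ts), computed in UTC for simplicity.
function startOfHour(ts: Date): Date {
  return new Date(Math.floor(ts.getTime() / HOUR_MS) * HOUR_MS);
}

// WHERE ts > now - 7 days, GROUP BY hour, ORDER BY hour, with count() and avg(value).
function hourlyStats(events: EventRow[], now: Date): HourlyStat[] {
  const cutoff = now.getTime() - 7 * 24 * HOUR_MS;
  const buckets = new Map<number, { cnt: number; sum: number }>();
  for (const e of events) {
    if (e.ts.getTime() <= cutoff) continue; // strict ">" like the SQL
    const key = startOfHour(e.ts).getTime();
    const bucket = buckets.get(key) ?? { cnt: 0, sum: 0 };
    bucket.cnt += 1;
    bucket.sum += e.value;
    buckets.set(key, bucket);
  }
  return Array.from(buckets.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([key, stat]) => ({ hour: new Date(key), cnt: stat.cnt, avgVal: stat.sum / stat.cnt }));
}

const nowTs = new Date("2026-05-09T12:00:00Z");
const sampleEvents: EventRow[] = [
  { ts: new Date("2026-05-09T10:05:00Z"), userId: 1, eventName: "click", value: 2 },
  { ts: new Date("2026-05-09T10:55:00Z"), userId: 2, eventName: "click", value: 4 },
  { ts: new Date("2026-05-09T11:10:00Z"), userId: 1, eventName: "view", value: 1 },
  { ts: new Date("2026-04-01T00:00:00Z"), userId: 3, eventName: "view", value: 9 }, // older than 7 days
];
console.log(hourlyStats(sampleEvents, nowTs));
```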
DuckDB (single-node OLAP)
-- Query Parquet files directly (S3 reads go through the httpfs extension)
SELECT date, SUM(amount) AS total
FROM 's3://bucket/sales/*.parquet'
WHERE date > '2026-01-01'
GROUP BY date;
-- Query Postgres too (DuckDB's postgres extension, not a Postgres foreign data wrapper)
ATTACH 'postgres://...' AS pg (TYPE postgres);
SELECT * FROM pg.public.users LIMIT 10;
→ Handles up to TB-scale data on a single VM. Simpler than Spark.
Snowflake (cloud OLAP)
CREATE TABLE events ... CLUSTER BY (ts);
-- Compute scaling
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'X-LARGE';
-- Time travel
SELECT * FROM events AT(TIMESTAMP => '2026-05-01'::TIMESTAMP_TZ);
→ Compute and storage are separated. Compute is billed by warehouse size × running time (credits), not per query.
BigQuery
SELECT
DATE(ts) AS day,
COUNT(*) AS events
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260509'
GROUP BY day;
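→ On-demand pricing bills per bytes scanned. Partitioning + clustering cut the bytes scanned, and therefore the cost. (`_TABLE_SUFFIX` only works with date-sharded wildcard tables; on a partitioned table, filter on the partitioning column instead.)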
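Because on-demand cost follows bytes scanned, the payoff of pruning is simple arithmetic. The TypeScript sketch below assumes a hypothetical table of equal-sized daily partitions (10 GiB each, 2026-01-01 through 2026-05-09) and takes the price per TiB as a parameter; the value used below is a placeholder, not a real rate, so check current pricing. A 9-day filter reads 9 of 129 partitions.

```typescript
interface Partition {
  day: string; // "YYYYMMDD", like a shard suffix or partition id
  bytes: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const GIB = 2 ** 30;
const TIB = 2 ** 40;

// Hypothetical table: `days` consecutive daily partitions of equal size.
function dailyPartitions(start: Date, days: number, bytesPerDay: number): Partition[] {
  const parts: Partition[] = [];
  for (let i = 0; i < days; i++) {
    const d = new Date(start.getTime() + i * DAY_MS);
    parts.push({ day: d.toISOString().slice(0, 10).replace(/-/g, ""), bytes: bytesPerDay });
  }
  return parts;
}

// Partition pruning: only partitions inside [fromDay, toDay] are read (inclusive, like BETWEEN).
// Fixed-width "YYYYMMDD" strings compare correctly as plain strings.
function bytesScanned(parts: Partition[], fromDay: string, toDay: string): number {
  return parts
    .filter((p) => p.day >= fromDay && p.day <= toDay)
    .reduce((sum, p) => sum + p.bytes, 0);
}

// Cost = TiB scanned x price per TiB.
function scanCost(bytes: number, pricePerTiB: number): number {
  return (bytes / TIB) * pricePerTiB;
}

const table = dailyPartitions(new Date("2026-01-01T00:00:00Z"), 129, 10 * GIB); // 2026-01-01 .. 2026-05-09
const prunedBytes = bytesScanned(table, "20260501", "20260509"); // 9 partitions
const fullBytes = bytesScanned(table, "00000000", "99999999"); // no filter: all 129
const pricePerTiB = 1; // placeholder: substitute the current on-demand rate
console.log(scanCost(prunedBytes, pricePerTiB), scanCost(fullBytes, pricePerTiB));
```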
HTAP (Hybrid)
One DB serves both OLTP and OLAP.
Options:
- TiDB (TiKV row store + TiFlash columnar replicas, kept in sync via Raft)
- SingleStore (MemSQL)
- CockroachDB (columnar, vectorized execution engine; storage itself is row-oriented)
- Postgres + Citus columnar
→ In practice, the common architecture is a separate OLAP store (e.g. ClickHouse) acting as an analytical "read replica" of the OLTP DB.
Postgres + Citus (HTAP)
-- Convert to columnar
SELECT alter_table_set_access_method('events', 'columnar');
-- INSERT as usual (row-at-a-time SQL); data is stored and scanned column by column
-- Best suited to append-mostly data
Architectures
1. Pure OLTP:
App → Postgres
2. Separate OLAP:
App → Postgres → CDC → Snowflake / ClickHouse → BI tool / dashboard
3. HTAP:
App → TiDB (writes land in the TiKV row store; analytical reads are served from TiFlash columnar replicas)
CDC (Change Data Capture)
Postgres → Debezium → Kafka → ClickHouse / Snowflake.
Near-real-time example:
- Order placed (Postgres)
- About 1 second later, the dashboard updates (ClickHouse)
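A simplified TypeScript sketch of the consumer end of that pipeline: change events shaped like Debezium's envelope (`op` = `c`/`u`/`d`/`r` for create/update/delete/snapshot read, with `before`/`after` row images) are applied in order to an analytical copy that keeps a running revenue total. Kafka, the ClickHouse sink, and the rest of the real event metadata are left out, and the `OrderRow` shape and `OrdersReplica` class are hypothetical.

```typescript
interface OrderRow {
  id: number;
  amount: number;
}

// Simplified Debezium-style change event (real events carry more fields).
interface ChangeEvent {
  op: "c" | "u" | "d" | "r"; // create, update, delete, snapshot read
  before: OrderRow | null;
  after: OrderRow | null;
}

class OrdersReplica {
  private rows = new Map<number, OrderRow>();
  private revenue = 0;

  // Treat every event as upsert/delete by primary key, so a redelivered event is harmless.
  apply(ev: ChangeEvent): void {
    const key = (ev.after ?? ev.before)?.id;
    if (key === undefined) return; // nothing to apply
    const existing = this.rows.get(key);
    if (existing) {
      this.revenue -= existing.amount; // undo the previous version of this row
      this.rows.delete(key);
    }
    if (ev.op !== "d" && ev.after) {
      this.rows.set(key, ev.after);
      this.revenue += ev.after.amount;
    }
  }

  totalRevenue(): number {
    return this.revenue;
  }

  rowCount(): number {
    return this.rows.size;
  }
}

const replica = new OrdersReplica();
const changeStream: ChangeEvent[] = [
  { op: "c", before: null, after: { id: 1, amount: 100 } },
  { op: "c", before: null, after: { id: 2, amount: 50 } },
  { op: "u", before: { id: 1, amount: 100 }, after: { id: 1, amount: 120 } },
  { op: "d", before: { id: 2, amount: 50 }, after: null },
];
for (const ev of changeStream) replica.apply(ev);
console.log(replica.totalRevenue(), replica.rowCount());
```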
Star schema (OLAP)
-- Fact table (large)
CREATE TABLE fact_sales (
date_id INT,
product_id INT,
customer_id INT,
amount DECIMAL,
quantity INT
);
-- Dimension tables (small)
CREATE TABLE dim_date (date_id INT, date DATE, year INT, month INT);
CREATE TABLE dim_product (product_id INT, name VARCHAR, category VARCHAR);
CREATE TABLE dim_customer (customer_id INT, country VARCHAR, segment VARCHAR);
-- Join
SELECT d.year, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d USING (date_id)
JOIN dim_product p USING (product_id)
GROUP BY d.year, p.category;
→ The standard OLAP schema.
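The star-schema query can be traced with a small in-memory TypeScript version: each fact row looks up its dimension rows by key (the joins), then `SUM(amount)` accumulates per `(year, category)`. All table contents are invented sample data, the dimensions are trimmed to the columns the query needs, and `dim_customer` is omitted because the query does not use it.

```typescript
interface FactSale {
  dateId: number;
  productId: number;
  customerId: number;
  amount: number;
  quantity: number;
}
interface DimDate {
  dateId: number;
  year: number;
  month: number;
}
interface DimProduct {
  productId: number;
  productName: string;
  category: string;
}

// Small dimension tables, keyed for lookup.
const dimDate = new Map<number, DimDate>([
  [1, { dateId: 1, year: 2025, month: 12 }],
  [2, { dateId: 2, year: 2026, month: 1 }],
]);
const dimProduct = new Map<number, DimProduct>([
  [10, { productId: 10, productName: "Laptop", category: "Electronics" }],
  [20, { productId: 20, productName: "Novel", category: "Books" }],
]);

// Large fact table: keys into the dimensions plus numeric measures.
const factSales: FactSale[] = [
  { dateId: 1, productId: 10, customerId: 100, amount: 1200, quantity: 1 },
  { dateId: 2, productId: 10, customerId: 101, amount: 800, quantity: 1 },
  { dateId: 2, productId: 20, customerId: 100, amount: 30, quantity: 2 },
  { dateId: 2, productId: 10, customerId: 102, amount: 900, quantity: 1 },
];

// SELECT d.year, p.category, SUM(f.amount) ... GROUP BY d.year, p.category
function revenueByYearCategory(facts: FactSale[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const f of facts) {
    const d = dimDate.get(f.dateId); // JOIN dim_date USING (date_id)
    const p = dimProduct.get(f.productId); // JOIN dim_product USING (product_id)
    if (!d || !p) continue; // inner join: drop facts without a match
    const key = `${d.year}|${p.category}`;
    totals.set(key, (totals.get(key) ?? 0) + f.amount);
  }
  return totals;
}

console.log(revenueByYearCategory(factSales));
```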
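Snowflake schema (normalized; unrelated to the Snowflake product)
dim_product → dim_category → dim_dept
→ Star is simpler (fewer joins); snowflake is more normalized (less duplication in dimensions, more joins).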
Data warehouse vs lake
Warehouse: schema-on-write (structured, SQL-friendly).
Lake: schema-on-read (raw files, flexible).
Lakehouse: both (table formats such as Iceberg, Delta Lake, Hudi on top of lake files).
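The schema-on-write vs schema-on-read difference comes down to *when* the schema check runs. In the TypeScript sketch below, the same hypothetical sale schema (`date` string + numeric `amount`) rejects bad records at load time for the "warehouse", while the "lake" keeps every raw line and applies the schema only when a query reads them. The sample lines are invented.

```typescript
interface SaleRecord {
  date: string;
  amount: number;
}

// Shared schema check (hypothetical schema).
function parseSale(line: string): SaleRecord | null {
  let value: unknown;
  try {
    value = JSON.parse(line);
  } catch {
    return null; // not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const rec = value as Record<string, unknown>;
  const date = rec["date"];
  const amount = rec["amount"];
  if (typeof date !== "string" || typeof amount !== "number") return null;
  return { date, amount };
}

// Warehouse (schema-on-write): validate while loading; bad records never land in the table.
function loadWarehouse(rawLines: string[]): { rows: SaleRecord[]; rejected: number } {
  const rows: SaleRecord[] = [];
  let rejected = 0;
  for (const line of rawLines) {
    const sale = parseSale(line);
    if (sale) rows.push(sale);
    else rejected += 1;
  }
  return { rows, rejected };
}

// Lake (schema-on-read): raw lines stay untouched; each query applies the schema itself.
function lakeTotal(rawLines: string[]): number {
  let total = 0;
  for (const line of rawLines) {
    const sale = parseSale(line);
    if (sale) total += sale.amount; // this query chooses to skip unparseable lines
  }
  return total;
}

const rawLines = [
  '{"date":"2026-05-01","amount":10}',
  '{"date":"2026-05-01","amount":"oops"}',
  "not json",
  '{"date":"2026-05-02","amount":5}',
];
const loaded = loadWarehouse(rawLines);
console.log(loaded.rows.length, loaded.rejected, lakeTotal(rawLines));
```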
Materialized view (OLAP boost)
-- Query the dashboard runs every day
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date, SUM(amount) AS revenue FROM orders GROUP BY date;
-- Fast query (reads precomputed rows)
SELECT * FROM daily_revenue;
-- Refresh
REFRESH MATERIALIZED VIEW daily_revenue;
→ Precomputed results cut dashboard latency, but the data is only as fresh as the last REFRESH.
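The trade-off (fast reads, but only as fresh as the last refresh) shows up in a tiny TypeScript model: `refresh()` reruns `SUM(amount) GROUP BY date` over the base table and swaps in the result, and reads between refreshes return the stored snapshot. The `OrderRecord` shape and `DailyRevenueView` class are hypothetical; Postgres similarly stores the view's result and recomputes it in full on `REFRESH MATERIALIZED VIEW`.

```typescript
interface OrderRecord {
  date: string; // "YYYY-MM-DD"
  amount: number;
}

// Toy model of daily_revenue: a stored snapshot that changes only on refresh().
class DailyRevenueView {
  private readonly baseTable: OrderRecord[];
  private snapshot = new Map<string, number>();
  private refreshedAt: Date | null = null;

  constructor(baseTable: OrderRecord[]) {
    this.baseTable = baseTable;
  }

  // REFRESH MATERIALIZED VIEW: rerun the whole aggregation, then swap in the result.
  refresh(now: Date): void {
    const next = new Map<string, number>();
    for (const o of this.baseTable) {
      next.set(o.date, (next.get(o.date) ?? 0) + o.amount);
    }
    this.snapshot = next;
    this.refreshedAt = now;
  }

  // SELECT ... FROM daily_revenue: answered from the snapshot, no scan of orders.
  revenueOn(date: string): number | undefined {
    return this.snapshot.get(date);
  }

  lastRefresh(): Date | null {
    return this.refreshedAt;
  }
}

const orderRows: OrderRecord[] = [
  { date: "2026-05-08", amount: 100 },
  { date: "2026-05-08", amount: 50 },
];
const dailyRevenue = new DailyRevenueView(orderRows);
dailyRevenue.refresh(new Date("2026-05-08T23:00:00Z"));
orderRows.push({ date: "2026-05-08", amount: 25 }); // late write after the refresh
console.log(dailyRevenue.revenueOn("2026-05-08")); // 150: stale, the 25 is missing
dailyRevenue.refresh(new Date("2026-05-09T00:00:00Z"));
console.log(dailyRevenue.revenueOn("2026-05-08")); // 175 after the refresh
```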
Pre-aggregation (Cube.dev)
cube('Sales', {
sql: 'SELECT * FROM orders',
measures: {
total: { sql: 'amount', type: 'sum' },
},
dimensions: {
date: { sql: 'created_at', type: 'time' },
},
preAggregations: {
daily: {
type: 'rollup',
measures: [Sales.total],
timeDimension: Sales.date,
granularity: 'day',
refreshKey: { every: '1 hour' },
},
},
});
→ Cube builds and refreshes the rollup itself; matching API queries are served from it automatically.
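The idea behind a `rollup` pre-aggregation can be sketched without Cube: build a daily rollup once, then answer queries at the same or a coarser granularity (monthly here) by re-aggregating the much smaller rollup instead of scanning `orders`. This works for additive measures like `sum`; an `avg` would need sum and count stored separately. The names below are hypothetical and not Cube's internals, and days are cut at UTC boundaries.

```typescript
interface OrderFact {
  createdAt: string; // ISO timestamp, e.g. "2026-05-01T08:00:00Z"
  amount: number;
}

// Build the rollup: one entry per day with SUM(amount).
function buildDailyRollup(orders: OrderFact[]): Map<string, number> {
  const daily = new Map<string, number>();
  for (const o of orders) {
    const day = o.createdAt.slice(0, 10); // "YYYY-MM-DD"
    daily.set(day, (daily.get(day) ?? 0) + o.amount);
  }
  return daily;
}

// Answer a coarser query (monthly total) from the rollup alone; valid because sum is additive.
function monthlyFromDaily(daily: Map<string, number>): Map<string, number> {
  const monthly = new Map<string, number>();
  daily.forEach((total, day) => {
    const month = day.slice(0, 7); // "YYYY-MM"
    monthly.set(month, (monthly.get(month) ?? 0) + total);
  });
  return monthly;
}

const rawOrders: OrderFact[] = [
  { createdAt: "2026-04-30T22:10:00Z", amount: 40 },
  { createdAt: "2026-05-01T08:00:00Z", amount: 10 },
  { createdAt: "2026-05-01T09:30:00Z", amount: 15 },
  { createdAt: "2026-05-02T12:00:00Z", amount: 5 },
];
const dailyRollup = buildDailyRollup(rawOrders);
const monthlyTotals = monthlyFromDaily(dailyRollup);
console.log(dailyRollup, monthlyTotals);
```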
When OLTP, when OLAP?
Query frequency:
> 100 qps → OLTP db
< 1 qps → OLAP db
Query size:
< 1k rows → OLTP
> 100k rows → OLAP
Latency:
< 50ms → OLTP
> 1s OK → OLAP
Mixed: two DBs + CDC.
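The rules of thumb above can be written down as a small TypeScript classifier. The thresholds are copied from the list; values between them count as no signal, and returning `"mixed"` when signals conflict is an added interpretation, not a rule from the list. Treat it as a checklist, not a sizing tool.

```typescript
type Verdict = "OLTP" | "OLAP" | "mixed" | "unclear";

interface WorkloadProfile {
  qps: number; // queries per second
  rowsPerQuery: number; // rows touched per query
  latencyBudgetMs: number; // acceptable latency
}

function classifyWorkload(w: WorkloadProfile): Verdict {
  const votes: Array<"OLTP" | "OLAP"> = [];
  // Query frequency: > 100 qps -> OLTP, < 1 qps -> OLAP
  if (w.qps > 100) votes.push("OLTP");
  else if (w.qps < 1) votes.push("OLAP");
  // Query size: < 1k rows -> OLTP, > 100k rows -> OLAP
  if (w.rowsPerQuery < 1000) votes.push("OLTP");
  else if (w.rowsPerQuery > 100000) votes.push("OLAP");
  // Latency: < 50 ms -> OLTP, ~1 s or more acceptable -> OLAP
  if (w.latencyBudgetMs < 50) votes.push("OLTP");
  else if (w.latencyBudgetMs >= 1000) votes.push("OLAP");

  const oltp = votes.filter((v) => v === "OLTP").length;
  const olap = votes.length - oltp;
  if (oltp > 0 && olap > 0) return "mixed"; // candidate for two DBs + CDC
  if (oltp > 0) return "OLTP";
  if (olap > 0) return "OLAP";
  return "unclear";
}

console.log(classifyWorkload({ qps: 2000, rowsPerQuery: 1, latencyBudgetMs: 20 }));
console.log(classifyWorkload({ qps: 0.01, rowsPerQuery: 5000000, latencyBudgetMs: 10000 }));
```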
Can Postgres do OLAP?
Fine up to the GB range (with proper indexes + partitioning).
TB+ = hard.
PB = not feasible.
→ Postgres's row store is weak at large scans.
Citus columnar / pg_duckdb / Hydra add columnar (hybrid) options.
Cost comparison
Postgres RDS: $$
Snowflake: $$$ (compute hour)
BigQuery: $$ (scan bytes)
ClickHouse self-host: $$
Redshift: $$$
DuckDB self-host: $ (1 VM)
Pinot / Druid / StarRocks (real-time OLAP)
ClickHouse: powerful, batch-friendly.
Pinot / Druid: real-time ingestion + low-latency aggregation.
StarRocks: ClickHouse alternative, MySQL-protocol compatible.
→ User-facing OLAP (dashboards shown to end users): Pinot / Druid.
Internal BI: ClickHouse / Snowflake.
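LLM writes SQL
// `llm` (an LLM client) and `schema` (table DDL as text) are placeholders
const sql = await llm.complete({
system: 'Generate ClickHouse SQL. Return only the query.',
prompt: `Schema: ${schema}\nQuestion: How many users were active in the last 7 days?`,
});
// Validate (read-only, known tables only), then run
→ Text-to-SQL. ClickHouse's dialect differs from standard SQL in places (e.g. count(), toStartOfHour()) → evals are a must.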
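The validate step might look like the guard below: accept a single read-only `SELECT`/`WITH` statement that references only known tables, and reject everything else before it reaches ClickHouse. This is a hypothetical, regex-level TypeScript sketch; the keyword deny-list and table allow-list are assumptions, it is deliberately conservative (CTE names and table functions are rejected unless allow-listed), and it does not replace running queries as a read-only database user.

```typescript
interface ValidationResult {
  ok: boolean;
  reason?: string;
}

// Assumed deny-list of write/DDL/admin keywords (conservative: may reject harmless text containing them).
const FORBIDDEN_KEYWORDS =
  /\b(INSERT|UPDATE|DELETE|ALTER|DROP|TRUNCATE|CREATE|RENAME|GRANT|REVOKE|ATTACH|DETACH|OPTIMIZE|SYSTEM|KILL)\b/i;

function validateGeneratedSql(sql: string, knownTables: Set<string>): ValidationResult {
  // Allow one trailing semicolon; any other ";" means more than one statement.
  const body = sql.trim().replace(/;\s*$/, "");
  if (body.includes(";")) return { ok: false, reason: "multiple statements" };
  if (!/^(SELECT|WITH)\b/i.test(body)) return { ok: false, reason: "not a read query" };
  if (FORBIDDEN_KEYWORDS.test(body)) return { ok: false, reason: "write/DDL keyword" };

  // Every plain identifier after FROM/JOIN must be a known table (subqueries are skipped).
  const tableRef = /\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)/gi;
  let match: RegExpExecArray | null;
  while ((match = tableRef.exec(body)) !== null) {
    const table = match[1];
    if (!knownTables.has(table)) return { ok: false, reason: `unknown table: ${table}` };
  }
  return { ok: true };
}

const allowedTables = new Set(["events"]);
console.log(validateGeneratedSql("SELECT count(DISTINCT user_id) FROM events WHERE ts > now() - INTERVAL 7 DAY;", allowedTables));
console.log(validateGeneratedSql("DROP TABLE events", allowedTables));
```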
🤔 Decision criteria
| Situation | DB |
|---|---|
| General app | Postgres / MySQL |
| Analytics dashboard | ClickHouse / Snowflake |
| TB scale on a single VM | DuckDB |
| User-facing analytics | Pinot / Druid / StarRocks |
| HTAP | TiDB / SingleStore / Citus |
| Data < 100 GB | Postgres is fine |
| Analytics on 1+ TB of data | Specialized OLAP DB |
| Real-time + analytical | Materialize / RisingWave |
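❌ Anti-patterns
- Postgres for everything (including BI): large scans are slow.
- Per-row updates in an OLAP store: inefficient (columnar stores favor batch inserts).
- Ignoring indexes (OLTP): full table scans.
- Assuming OLAP stores have OLTP-grade transactions: support is weak.
- No ETL: every query runs against raw data.
- Stale materialized views: wrong answers.
- OLAP without a star schema: queries become hard to write.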
🤖 Hints for LLM use
- OLTP = Postgres. OLAP = ClickHouse / Snowflake / DuckDB.
- HTAP is attractive, but few mature options exist.
- CDC is the standard answer for moving data from OLTP to OLAP.
- Star schema is the canonical OLAP schema.