[G1-Sync] Manual knowledge update

Antigravity Agent
2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
---
id: db-oltp-olap-htap
title: "OLTP vs OLAP vs HTAP: choosing a DB by workload"
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, oltp, olap, vibe-coding]
tech_stack: { language: "SQL", applicable_to: ["Database"] }
applied_in: []
aliases: [OLTP, OLAP, HTAP, transactional, analytical, columnar, row-based, mixed workload]
---
# OLTP / OLAP / HTAP
> Different workloads call for different databases. **OLTP (many small reads/writes) vs OLAP (few large queries) vs HTAP (both)**. Picking the wrong one causes large differences in performance and cost.
## 📖 Core concepts
- OLTP: Online Transaction Processing.
- OLAP: Online Analytical Processing.
- HTAP: Hybrid Transactional/Analytical Processing (both).
- Row vs columnar storage.
## 💻 Code patterns
### OLTP characteristics
```
Workload:
- Each query touches few rows (< 100)
- High query volume (1000+ QPS)
- Read + write mix
- Latency matters (< 50 ms)
- ACID transaction
Examples:
- User login
- Order creation
- Cart update
- Profile read
DB:
- Postgres / MySQL (default)
- CockroachDB / Spanner (distributed)
- DynamoDB (NoSQL)
```
### OLAP characteristics
```
Workload:
- Each query scans many rows (millions+)
- Few queries (per minute / per hour)
- Almost read-only
- Throughput matters
- Scans a large portion of the table
Examples:
- Daily revenue
- User cohort analysis
- A/B test report
- Dashboard
DB:
- ClickHouse / DuckDB
- Snowflake / BigQuery / Redshift
- Druid / Pinot
```
### Row vs Columnar
```
Row (OLTP):
[id, name, email, age]
[1, 'Alice', 'a@x', 25]
[2, 'Bob', 'b@x', 30]
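→ Reading a whole row is fast: SELECT * FROM users WHERE id = 1.
Columnar (OLAP):
id    : [1, 2, 3, ...]
name  : ['Alice', 'Bob', ...]
email : ['a@x', 'b@x', ...]
age   : [25, 30, ...]
→ Reading a single column is fast: SELECT AVG(age) FROM users.
+ Compresses well (values of the same type are stored together).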
```
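A minimal in-memory sketch of the two layouts, assuming a toy `User` record with the same fields (the third row, `Carol`, is made up). The row layout keeps each record together; the columnar layout keeps one array per column, so `AVG(age)` touches only the `age` array. Real engines add compression, encodings and disk pages on top of this idea.

```typescript
// Hypothetical toy record matching the table above.
interface User {
  id: number;
  name: string;
  email: string;
  age: number;
}

// Row layout: one object per row (good for "fetch the whole row by id").
const rows: User[] = [
  { id: 1, name: "Alice", email: "a@x", age: 25 },
  { id: 2, name: "Bob", email: "b@x", age: 30 },
  { id: 3, name: "Carol", email: "c@x", age: 35 },
];

// Columnar layout: one array per column (good for "aggregate one column").
interface UserColumns {
  id: number[];
  name: string[];
  email: string[];
  age: number[];
}

function toColumns(input: User[]): UserColumns {
  return {
    id: input.map((u) => u.id),
    name: input.map((u) => u.name),
    email: input.map((u) => u.email),
    age: input.map((u) => u.age),
  };
}

// OLTP-style access: SELECT * FROM users WHERE id = 1 (reads one full row).
function findById(input: User[], id: number): User | undefined {
  return input.find((u) => u.id === id);
}

// OLAP-style access: SELECT AVG(age) FROM users (reads only the age column).
function avgAge(cols: UserColumns): number {
  return cols.age.reduce((sum, a) => sum + a, 0) / cols.age.length;
}

const columns = toColumns(rows);
console.log(findById(rows, 1)?.name); // Alice
console.log(avgAge(columns)); // 30
```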
### Postgres (OLTP default)
```sql
-- Fast single-row lookup (by primary key)
SELECT * FROM users WHERE id = 123;
-- ACID transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```
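The transfer above relies on atomicity: either both UPDATEs commit or neither does. Below is a toy in-memory sketch of that all-or-nothing behavior. The `transfer` helper and its insufficient-funds check are hypothetical additions (the SQL above has no such check), and real databases implement this with a write-ahead log and locking rather than by copying state.

```typescript
// Toy account store: id -> balance.
type Accounts = Map<number, number>;

// Work on a copy and hand it back only if every step succeeds,
// a stand-in for BEGIN ... COMMIT (or ROLLBACK when an error is thrown).
function transfer(accounts: Accounts, from: number, to: number, amount: number): Accounts {
  const draft = new Map(accounts); // uncommitted working state
  const fromBal = draft.get(from);
  const toBal = draft.get(to);
  if (fromBal === undefined || toBal === undefined) {
    throw new Error("unknown account"); // ROLLBACK: caller's map is untouched
  }
  if (fromBal < amount) {
    throw new Error("insufficient funds"); // ROLLBACK
  }
  draft.set(from, fromBal - amount); // UPDATE accounts ... WHERE id = from
  draft.set(to, toBal + amount); // UPDATE accounts ... WHERE id = to
  return draft; // COMMIT: caller swaps in the new state
}

let state: Accounts = new Map([[1, 500], [2, 50]]);
state = transfer(state, 1, 2, 100);
console.log(state.get(1), state.get(2)); // 400 150

try {
  state = transfer(state, 2, 1, 1000); // fails, so state keeps 400 / 150
} catch (e) {
  console.log((e as Error).message); // insufficient funds
}
```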
### ClickHouse (OLAP)
```sql
CREATE TABLE events (
    ts      DateTime,
    user_id UInt32,
    event   String,
    value   Float32
) ENGINE = MergeTree
ORDER BY (ts, user_id);
-- Fast aggregation
SELECT
    toStartOfHour(ts) AS hour,
    count()           AS cnt,
    avg(value)        AS avg_val
FROM events
WHERE ts > now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
```
→ Aggregations over 1B+ rows can return in milliseconds to seconds, depending on hardware and data layout.
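What the ClickHouse query computes, written out as a plain in-memory sketch: truncate each timestamp to the hour (like `toStartOfHour`), then keep a count and a running sum per bucket. The `EventRow` shape mirrors the table above; the sample data and the UTC truncation are assumptions (ClickHouse uses the server or column time zone).

```typescript
// Hypothetical in-memory mirror of the `events` table above.
interface EventRow {
  ts: Date;
  userId: number;
  event: string;
  value: number;
}

interface HourlyRow {
  hour: string; // start of the hour bucket, ISO-8601 UTC
  cnt: number; // count()
  avgVal: number; // avg(value)
}

// Analogue of toStartOfHour(ts), evaluated in UTC; returns epoch millis.
function startOfHourUtc(d: Date): number {
  const t = new Date(d.getTime());
  t.setUTCMinutes(0, 0, 0);
  return t.getTime();
}

function hourlyStats(events: EventRow[], since: Date): HourlyRow[] {
  const buckets = new Map<number, { cnt: number; sum: number }>();
  for (const e of events) {
    if (e.ts.getTime() <= since.getTime()) continue; // WHERE ts > since
    const key = startOfHourUtc(e.ts); // GROUP BY hour
    const b = buckets.get(key) ?? { cnt: 0, sum: 0 };
    b.cnt += 1;
    b.sum += e.value;
    buckets.set(key, b);
  }
  return Array.from(buckets.entries())
    .sort((x, y) => x[0] - y[0]) // ORDER BY hour
    .map(([key, b]) => ({
      hour: new Date(key).toISOString(),
      cnt: b.cnt,
      avgVal: b.sum / b.cnt,
    }));
}

const sample: EventRow[] = [
  { ts: new Date("2026-05-09T10:05:00Z"), userId: 1, event: "click", value: 2 },
  { ts: new Date("2026-05-09T10:40:00Z"), userId: 2, event: "click", value: 4 },
  { ts: new Date("2026-05-09T11:10:00Z"), userId: 1, event: "view", value: 1 },
];
const stats = hourlyStats(sample, new Date("2026-05-02T00:00:00Z"));
console.log(stats); // 10:00 bucket: cnt 2, avgVal 3; 11:00 bucket: cnt 1, avgVal 1
```

A columnar engine does the same per-bucket accumulation, but only over the `ts` and `value` columns and in vectorized batches.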
### DuckDB (single-node OLAP)
```sql
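-- Query Parquet files directly (S3 access goes through the httpfs extension + credentials)
SELECT date, SUM(amount) AS total
FROM 's3://bucket/sales/*.parquet'
WHERE date > '2026-01-01'
GROUP BY date;
-- Query Postgres too (via DuckDB's postgres extension)
INSTALL postgres;
LOAD postgres;
ATTACH 'postgres://...' AS pg (TYPE postgres);
SELECT * FROM pg.public.users LIMIT 10;
```
→ Handles data up to roughly TB scale on a single VM; much simpler to run than Spark.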
### Snowflake (cloud OLAP)
```sql
CREATE TABLE events ... CLUSTER BY (ts);
-- Compute scaling
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'X-LARGE';
-- Time travel
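SELECT * FROM events AT(TIMESTAMP => '2026-05-01'::TIMESTAMP_LTZ);
```
→ Compute and storage are separated. Compute is billed by warehouse running time (credits), not per query; storage is billed separately.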
### BigQuery
```sql
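SELECT
  DATE(ts) AS day,
  COUNT(*) AS events
FROM `project.dataset.events_*`  -- _TABLE_SUFFIX only works on a wildcard table
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260509'
GROUP BY day;
```
→ On-demand pricing bills by bytes scanned. Partitioning and clustering (or date-sharded tables filtered by `_TABLE_SUFFIX`, as here) cut the bytes read and therefore the cost.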
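Why partitioning cuts on-demand cost: the bill tracks bytes read, and a filter on the partition key lets the engine skip whole partitions. A toy sketch with made-up sizes; the `Partition` type and the pruning logic are illustrative, not BigQuery's actual planner.

```typescript
// Hypothetical model of a table split into daily partitions (or date shards).
interface Partition {
  day: string; // "YYYYMMDD", like the _TABLE_SUFFIX of events_YYYYMMDD
  bytes: number; // bytes stored for the columns the query reads
}

// No partition filter: every partition is read (and billed).
function bytesFullScan(parts: Partition[]): number {
  return parts.reduce((sum, p) => sum + p.bytes, 0);
}

// Filter on the partition key: partitions outside [from, to] are skipped.
// String comparison works because days are zero-padded YYYYMMDD.
function bytesPruned(parts: Partition[], from: string, to: string): number {
  return parts
    .filter((p) => p.day >= from && p.day <= to)
    .reduce((sum, p) => sum + p.bytes, 0);
}

// Made-up example: 30 daily partitions of 1 GiB each in May 2026.
const GiB = 1024 ** 3;
const may: Partition[] = [];
for (let d = 1; d <= 30; d++) {
  may.push({ day: "202605" + (d < 10 ? "0" : "") + d, bytes: GiB });
}
console.log(bytesFullScan(may) / GiB); // 30
console.log(bytesPruned(may, "20260501", "20260509") / GiB); // 9
```

Clustering works on the same principle at a finer grain, skipping blocks inside a partition.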
### HTAP (Hybrid)
```
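One DB serves both OLTP and OLAP.
Systems:
- TiDB (TiKV row store + TiFlash columnar replicas, synced via Raft)
- SingleStore (formerly MemSQL)
- CockroachDB (column store added)
- Postgres + Citus columnar
→ In practice, the more common architecture treats ClickHouse as an analytical "read replica" of the OLTP DB.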
```
### Postgres + Citus (HTAP)
```sql
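-- Convert an existing table to Citus columnar storage
SELECT alter_table_set_access_method('events', 'columnar');
-- Scans now read only the referenced columns, compressed.
-- Columnar suits append-mostly data: keep hot, frequently updated rows in row (heap) tables.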
```
### Architectures
```
1. Pure OLTP:
   App → Postgres
2. Separate OLAP store:
   App → Postgres → CDC → Snowflake / ClickHouse → BI tool / dashboard
3. HTAP:
   App → TiDB (transactional queries hit the row store; analytical queries can be routed to columnar replicas)
```
### CDC (Change Data Capture)
```
Postgres → Debezium → Kafka → ClickHouse / Snowflake.
Near real-time example:
- Order placed (Postgres)
- About 1 second later, the dashboard updates (ClickHouse)
```
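On the consumer side, CDC boils down to replaying row-level change events in log order. A simplified sketch, assuming Debezium-style envelopes (an `op` of `c`/`u`/`d`/`r` with `before`/`after` row images). The `OrderRow` shape, keying by `id`, and the in-memory `Map` sink are hypothetical stand-ins for a ClickHouse or Snowflake table.

```typescript
interface OrderRow {
  id: number;
  status: string;
  amount: number;
}

// Simplified Debezium-style change event; real envelopes also carry
// source metadata, timestamps and schema information.
interface ChangeEvent {
  op: "c" | "u" | "d" | "r"; // create, update, delete, snapshot read
  before: OrderRow | null;
  after: OrderRow | null;
}

// Replay events in order into a target keyed by primary key.
function applyChanges(target: Map<number, OrderRow>, events: ChangeEvent[]): void {
  for (const ev of events) {
    if (ev.op === "d") {
      if (ev.before) target.delete(ev.before.id); // delete carries the old row image
    } else if (ev.after) {
      target.set(ev.after.id, ev.after); // c / u / r: upsert the new row image
    }
  }
}

const sink = new Map<number, OrderRow>();
applyChanges(sink, [
  { op: "c", before: null, after: { id: 1, status: "placed", amount: 30 } },
  { op: "u", before: { id: 1, status: "placed", amount: 30 }, after: { id: 1, status: "paid", amount: 30 } },
  { op: "c", before: null, after: { id: 2, status: "placed", amount: 12 } },
  { op: "d", before: { id: 2, status: "placed", amount: 12 }, after: null },
]);
console.log(Array.from(sink.values())); // only order 1 remains, with status "paid"
```

Column stores generally avoid in-place updates, so in ClickHouse this upsert is commonly modeled with a ReplacingMergeTree table that keeps the latest version per key.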
### Star schema (OLAP)
```sql
-- Fact table (large)
CREATE TABLE fact_sales (
    date_id     INT,
    product_id  INT,
    customer_id INT,
    amount      DECIMAL,
    quantity    INT
);
-- Dimension tables (small)
CREATE TABLE dim_date     (date_id INT, date DATE, year INT, month INT);
CREATE TABLE dim_product  (product_id INT, name VARCHAR, category VARCHAR);
CREATE TABLE dim_customer (customer_id INT, country VARCHAR, segment VARCHAR);
-- Join facts to dimensions, then aggregate
SELECT d.year, p.category, SUM(f.amount) AS revenue
FROM fact_sales f
JOIN dim_date d USING (date_id)
JOIN dim_product p USING (product_id)
GROUP BY d.year, p.category;
```
→ The standard OLAP schema: one large fact table surrounded by small dimension tables.
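The star-schema query above, evaluated by hand: look up each fact row's dimension attributes by key, then group. A toy in-memory sketch with made-up data; OLAP engines typically run this as hash joins with the small dimension tables on the build side, which is the shape the code mimics.

```typescript
// Hypothetical in-memory versions of the tables above.
interface FactSale {
  dateId: number;
  productId: number;
  customerId: number;
  amount: number;
  quantity: number;
}
interface DimDate {
  dateId: number;
  year: number;
  month: number;
}
interface DimProduct {
  productId: number;
  name: string;
  category: string;
}

// SELECT d.year, p.category, SUM(f.amount) ... GROUP BY d.year, p.category
function revenueByYearCategory(
  facts: FactSale[],
  dates: DimDate[],
  products: DimProduct[],
): Map<string, number> {
  // Index the small dimension tables by key (the "build" side of a hash join).
  const dateById = new Map<number, DimDate>();
  for (const d of dates) dateById.set(d.dateId, d);
  const productById = new Map<number, DimProduct>();
  for (const p of products) productById.set(p.productId, p);

  // Stream the large fact table and aggregate per (year, category).
  const sums = new Map<string, number>();
  for (const f of facts) {
    const d = dateById.get(f.dateId);
    const p = productById.get(f.productId);
    if (!d || !p) continue; // inner JOIN: drop facts without a matching dimension row
    const key = `${d.year}|${p.category}`;
    sums.set(key, (sums.get(key) ?? 0) + f.amount);
  }
  return sums;
}

const totals = revenueByYearCategory(
  [
    { dateId: 1, productId: 10, customerId: 100, amount: 50, quantity: 1 },
    { dateId: 1, productId: 11, customerId: 101, amount: 20, quantity: 2 },
    { dateId: 2, productId: 10, customerId: 100, amount: 30, quantity: 1 },
  ],
  [
    { dateId: 1, year: 2025, month: 12 },
    { dateId: 2, year: 2026, month: 1 },
  ],
  [
    { productId: 10, name: "Widget", category: "tools" },
    { productId: 11, name: "Gadget", category: "toys" },
  ],
);
console.log(totals.get("2025|tools"), totals.get("2025|toys"), totals.get("2026|tools")); // 50 20 30
```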
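### Snowflake schema (normalized)
```
dim_product → dim_category → dim_dept
→ Star keeps each dimension in one denormalized table (simpler, fewer joins); snowflake normalizes dimensions into chains of tables (less redundancy, more joins).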
```
### Data warehouse vs lake
```
Warehouse: schema-on-write (structured, SQL-friendly).
Lake: schema-on-read (raw files, flexible).
Lakehouse: both (open table formats such as Iceberg, Delta Lake, Hudi).
```
### Materialized view (OLAP boost)
```sql
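-- Query the dashboard runs every day
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date, SUM(amount) AS revenue FROM orders GROUP BY date;
-- Fast read of the precomputed result
SELECT * FROM daily_revenue;
-- Refresh (results stay stale until this runs)
REFRESH MATERIALIZED VIEW daily_revenue;
```
→ Precompute once so dashboard latency drops; the trade-off is freshness.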
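A toy model of what `REFRESH MATERIALIZED VIEW` does, and why a stale view gives wrong answers: the view stores a snapshot of the aggregate, and reads return that snapshot until the next refresh. The `DailyRevenueView` class is a hypothetical illustration, not Postgres internals (Postgres recomputes the whole query on refresh, which this mimics).

```typescript
// Hypothetical order rows feeding the view.
interface Order {
  date: string;
  amount: number;
}

// Stored snapshot of an aggregate that only changes when refresh() runs.
class DailyRevenueView {
  private readonly source: Order[];
  private snapshot = new Map<string, number>();

  constructor(source: Order[]) {
    this.source = source;
    this.refresh(); // CREATE MATERIALIZED VIEW populates the snapshot once
  }

  // Recompute everything: SELECT date, SUM(amount) ... GROUP BY date
  refresh(): void {
    const next = new Map<string, number>();
    for (const o of this.source) {
      next.set(o.date, (next.get(o.date) ?? 0) + o.amount);
    }
    this.snapshot = next;
  }

  // SELECT revenue FROM daily_revenue WHERE date = ...: a cheap lookup.
  revenue(date: string): number {
    return this.snapshot.get(date) ?? 0;
  }
}

const orders: Order[] = [
  { date: "2026-05-08", amount: 100 },
  { date: "2026-05-09", amount: 40 },
];
const view = new DailyRevenueView(orders);
orders.push({ date: "2026-05-09", amount: 60 }); // arrives after the snapshot
console.log(view.revenue("2026-05-09")); // 40 (stale)
view.refresh();
console.log(view.revenue("2026-05-09")); // 100
```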
### Pre-aggregation (Cube.dev)
```javascript
cube('Sales', {
  sql: 'SELECT * FROM orders',
  measures: {
    total: { sql: 'amount', type: 'sum' },
  },
  dimensions: {
    date: { sql: 'created_at', type: 'time' },
  },
  preAggregations: {
    daily: {
      type: 'rollup',
      measures: [Sales.total],
      timeDimension: Sales.date,
      granularity: 'day',
      refreshKey: { every: '1 hour' },
    },
  },
});
```
→ Cube's API layer answers matching queries from the daily rollup automatically and rebuilds it every hour.
### When OLTP, when OLAP?
```
Query frequency:
> 100 qps → OLTP db
< 1 qps → OLAP db
Query size:
< 1k row → OLTP
> 100k row → OLAP
Latency:
< 50ms → OLTP
> 1s OK → OLAP
Mixed: two DBs, kept in sync via CDC.
```
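The thresholds above written as a tiny classifier. It is a hypothetical helper that encodes only these rules of thumb (values between thresholds count as "no signal"); real decisions also weigh data volume, team skills and cost. Each query class is judged on its own, and a workload containing both kinds is "mixed", matching the two-DBs-plus-CDC advice.

```typescript
// Hypothetical profile of one class of queries in an app.
interface QueryProfile {
  qps: number; // queries per second
  rowsPerQuery: number; // rows read per query
  latencyBudgetMs: number; // acceptable response time
}

type QueryKind = "oltp" | "olap" | "unclear";

// Majority vote over the three rules of thumb.
function classifyQuery(p: QueryProfile): QueryKind {
  let oltp = 0;
  let olap = 0;
  if (p.qps > 100) oltp++;
  else if (p.qps < 1) olap++;
  if (p.rowsPerQuery < 1000) oltp++;
  else if (p.rowsPerQuery > 100000) olap++;
  if (p.latencyBudgetMs < 50) oltp++;
  else if (p.latencyBudgetMs >= 1000) olap++;
  if (oltp > olap) return "oltp";
  if (olap > oltp) return "olap";
  return "unclear"; // tie or no signal
}

// A workload that contains both kinds of query classes is "mixed".
function classifyWorkload(queries: QueryProfile[]): "oltp" | "olap" | "mixed" | "unclear" {
  const kinds = new Set(queries.map((q) => classifyQuery(q)));
  if (kinds.has("oltp") && kinds.has("olap")) return "mixed"; // two DBs + CDC
  if (kinds.has("oltp")) return "oltp";
  if (kinds.has("olap")) return "olap";
  return "unclear";
}

const login: QueryProfile = { qps: 2000, rowsPerQuery: 1, latencyBudgetMs: 10 };
const revenueReport: QueryProfile = { qps: 0.01, rowsPerQuery: 50000000, latencyBudgetMs: 5000 };
console.log(classifyWorkload([login])); // oltp
console.log(classifyWorkload([login, revenueReport])); // mixed
```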
### Can Postgres do OLAP?
```
Up to GB scale: OK (with proper indexes and partitioning).
TB+: difficult.
PB: not feasible.
→ Postgres's row store is weak at large scans.
  Citus columnar / pg_duckdb / Hydra add columnar (hybrid) capabilities.
```
### Cost comparison
```
Rough relative cost (depends heavily on workload):
Postgres RDS: $$
Snowflake: $$$ (warehouse compute time)
BigQuery: $$ (bytes scanned, on-demand)
ClickHouse self-host: $$
Redshift: $$$
DuckDB self-host: $ (1 VM)
```
### Pinot / Druid / StarRocks (real-time OLAP)
```
ClickHouse: powerful, batch-friendly.
Pinot / Druid: real-time ingestion + low-latency aggregation.
StarRocks: ClickHouse alternative, MySQL-protocol compatible.
→ User-facing OLAP (dashboards shown to end users): Pinot / Druid.
Internal BI = ClickHouse / Snowflake.
```
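### LLM-written SQL
```ts
// `llm` and `schema` are placeholders: your LLM client and the table DDL.
const sql = await llm.complete({
  system: 'Generate ClickHouse SQL.',
  prompt: `Schema: ${schema}\nQuestion: How many users were active in the last 7 days?`,
});
// Validate (single read-only statement) before running it against ClickHouse
```
→ Text-to-SQL. ClickHouse's SQL dialect differs slightly from standard SQL, so evaluating the generated queries is essential.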
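The "validate" step deserves more than a comment. Below is a deliberately naive sketch of a pre-execution guard, assuming generated SQL should be a single read-only statement. The `isSafeReadOnlySql` name and the keyword list are hypothetical; a regex check is no substitute for a real SQL parser plus a read-only database user (ClickHouse, for example, has a `readonly` setting).

```typescript
// Write/DDL/admin keywords to reject. Naive: a string literal containing one
// of these words is also rejected (false positive, but fails safe).
const FORBIDDEN =
  /\b(insert|update|delete|alter|drop|truncate|create|rename|grant|revoke|attach|detach|optimize|system|kill)\b/i;

function isSafeReadOnlySql(sql: string): boolean {
  // Allow one trailing semicolon; any other ';' means multiple statements.
  const trimmed = sql.trim().replace(/;\s*$/, "");
  if (trimmed.includes(";")) return false;
  // Must start as a query (SELECT or a WITH ... SELECT).
  if (!/^(select|with)\b/i.test(trimmed)) return false;
  return !FORBIDDEN.test(trimmed);
}

console.log(isSafeReadOnlySql("SELECT uniq(user_id) FROM events WHERE ts > now() - INTERVAL 7 DAY")); // true
console.log(isSafeReadOnlySql("SELECT 1; DROP TABLE events")); // false
console.log(isSafeReadOnlySql("ALTER TABLE events DELETE WHERE 1")); // false
```

Rejected queries can be sent back to the model with the reason, which pairs naturally with the eval loop mentioned above.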
## 🤔 Decision criteria
| Situation | DB |
|---|---|
| General app | Postgres / MySQL |
| Analytics dashboard | ClickHouse / Snowflake |
| TB-scale data on a single VM | DuckDB |
| User-facing analytics | Pinot / Druid / StarRocks |
| HTAP | TiDB / SingleStore / Citus |
| Data < 100 GB | Postgres OK |
| 1+ TB of analytical data | Specialized OLAP |
| Real-time + analytical | Materialize / RisingWave |
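## ❌ Anti-patterns
- **Postgres for everything (including BI)**: large scans are slow.
- **Per-row updates in OLAP**: inefficient; columnar stores favor batch inserts.
- **Ignoring indexes (OLTP)**: queries fall back to full scans.
- **Assuming OLAP stores give OLTP-grade transactions**: their transactional guarantees are weak.
- **No ETL**: every query hits raw data.
- **Stale materialized views**: wrong answers.
- **OLAP without a star schema**: queries become hard to write.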
## 🤖 Hints for LLM use
- OLTP = Postgres. OLAP = ClickHouse / Snowflake / DuckDB.
- HTAP is attractive, but few mature systems exist.
- CDC is the standard way to move data from OLTP to OLAP.
- Star schema is the canonical OLAP schema.
## 🔗 Related docs
- [[DB_ClickHouse_OLAP]]
- [[DB_DuckDB_Embedded]]
- [[Data_Eng_Lakehouse]]