---
id: db-oltp-olap-htap
title: OLTP vs OLAP vs HTAP - databases by workload
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, oltp, olap, vibe-coding]
tech_stack: { language: "SQL", applicable_to: ["Database"] }
applied_in: []
aliases: [OLTP, OLAP, HTAP, transactional, analytical, columnar, row-based, mixed workload]
---

# OLTP / OLAP / HTAP

> Different workloads call for different databases. **OLTP (many small reads/writes) vs OLAP (few large queries) vs HTAP (both)**. The wrong choice means a large gap in performance and cost.

## 📖 Core concepts

- OLTP: Online Transaction Processing.
- OLAP: Online Analytical Processing.
- HTAP: Hybrid (both).
- Row vs columnar storage.

## 💻 Code patterns

### OLTP characteristics

```
Workload:
- Each query is small (< 100 rows)
- Many queries (1,000+ QPS)
- Mix of reads and writes
- Latency matters (< 50 ms)
- ACID transactions

Examples:
- User login
- Order creation
- Cart update
- Profile read

DB:
- Postgres / MySQL (default)
- CockroachDB / Spanner (distributed)
- DynamoDB (NoSQL)
```

### OLAP characteristics

```
Workload:
- Each query is large (millions of rows or more)
- Few queries (one every few minutes or hours)
- Almost read-only
- Throughput matters
- Scans make up a large share of the work

Examples:
- Daily revenue
- User cohort analysis
- A/B test report
- Dashboard

DB:
- ClickHouse / DuckDB
- Snowflake / BigQuery / Redshift
- Druid / Pinot
```

### Row vs Columnar

```
Row (OLTP):
  [id, name, email, age]
  [1, 'Alice', 'a@x', 25]
  [2, 'Bob', 'b@x', 30]
  → Reading one whole row is fast. SELECT * FROM users WHERE id = 1.

Columnar (OLAP):
  id   : [1, 2, 3, ...]
  name : ['Alice', 'Bob', ...]
  age  : [25, 30, ...]
  → Reading a single column is fast. SELECT AVG(age) FROM users.
  + Strong compression (values of the same type are stored together).
```

### Postgres (OLTP default)

```sql
-- Fast single-row lookup
SELECT * FROM users WHERE id = 123;

-- ACID transaction: both updates commit together or not at all
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```

### ClickHouse (OLAP)

```sql
CREATE TABLE events (
    ts DateTime,
    user_id UInt32,
    event String,
    value Float32
) ENGINE = MergeTree
ORDER BY (ts, user_id);

-- Fast aggregation
SELECT
    toStartOfHour(ts) AS hour,
    count() AS cnt,
    avg(value) AS avg_val
FROM events
WHERE ts > now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
```

→ Aggregations over 1B+ rows typically finish in milliseconds to seconds.

### DuckDB (single-node OLAP)

```sql
-- Query Parquet files directly
SELECT date, SUM(amount) AS total
FROM 's3://bucket/sales/*.parquet'
WHERE date > '2026-01-01'
GROUP BY date;

-- Query Postgres too (via DuckDB's postgres extension, not a foreign data wrapper)
ATTACH 'postgres://...' AS pg (TYPE postgres);
SELECT * FROM pg.public.users LIMIT 10;
```

→ Handles up to TB-scale data on a single VM. Simpler than Spark.

### Snowflake (cloud OLAP)

```sql
-- Column definitions elided
CREATE TABLE events ... CLUSTER BY (ts);

-- Compute scaling
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'X-LARGE';

-- Time travel: the timestamp must be a TIMESTAMP value, so cast the literal
SELECT * FROM events AT(TIMESTAMP => '2026-05-01 00:00:00'::timestamp_ltz);
```

→ Compute and storage are separated. You pay for the time a warehouse is running, not per query.

### BigQuery

```sql
-- _TABLE_SUFFIX only works with wildcard (date-sharded) tables such as events_YYYYMMDD
SELECT DATE(ts) AS day, COUNT(*) AS events
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260509'
GROUP BY day;
```

→ Pay per scanned bytes. Partitioning and clustering reduce cost.

### HTAP (Hybrid)

```
One DB serves both OLTP and OLAP.

Technologies:
- TiDB (Raft + columnar replicas)
- SingleStore (formerly MemSQL)
- CockroachDB (vectorized, column-oriented execution; storage stays row-oriented)
- Postgres + Citus columnar

→ In practice, an architecture like "ClickHouse as the read replica of the OLTP DB" is more common.
```

### Postgres + Citus (HTAP)

```sql
-- Convert to columnar
SELECT alter_table_set_access_method('events', 'columnar');
-- Rows are inserted as usual; storage and scans are columnar
```

### Architectures

```
1. Pure OLTP:
   App → Postgres

2. Separate OLAP:
   App → Postgres → CDC → Snowflake / ClickHouse
                              ↓
                     BI tool / dashboard

3. HTAP:
   App → TiDB (writes go to the row store, analytical reads go to the columnar store)
```

### CDC (Change Data Capture)

```
Postgres → Debezium → Kafka → ClickHouse / Snowflake.

Real-time:
- Order placed (Postgres)
- Dashboard updated about a second later (ClickHouse)
```

### Star schema (OLAP)

```sql
-- Fact table (large)
CREATE TABLE fact_sales (
    date_id INT,
    product_id INT,
    customer_id INT,
    amount DECIMAL,
    quantity INT
);

-- Dimension tables (small)
CREATE TABLE dim_date (date_id INT, date DATE, year INT, month INT);
CREATE TABLE dim_product (product_id INT, name VARCHAR, category VARCHAR);
CREATE TABLE dim_customer (customer_id INT, country VARCHAR, segment VARCHAR);

-- Join facts to dimensions, then aggregate
SELECT d.year, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d USING (date_id)
JOIN dim_product p USING (product_id)
GROUP BY d.year, p.category;
```

→ The standard schema for OLAP.

### Snowflake schema (normalized)

```
dim_product → dim_category → dim_dept

→ Star is simpler; snowflake normalizes the dimensions further.
```

### Data warehouse vs lake

```
Warehouse: schema-on-write (structured, SQL-friendly).
Lake: schema-on-read (raw files, flexible).
Lakehouse: both (Iceberg, Delta, Hudi).
```

### Materialized view (OLAP boost)

```sql
-- Dashboard query that runs every day
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date, SUM(amount) AS revenue
FROM orders
GROUP BY date;

-- Fast query against the precomputed result
SELECT * FROM daily_revenue;

-- Refresh (results are stale until this runs)
REFRESH MATERIALIZED VIEW daily_revenue;
```

→ Pre-compute. Dashboard latency ↓.

### Pre-aggregation (Cube.dev)

```javascript
cube('Sales', {
  sql: 'SELECT * FROM orders',
  measures: {
    total: { sql: 'amount', type: 'sum' },
  },
  dimensions: {
    date: { sql: 'created_at', type: 'time' },
  },
  preAggregations: {
    daily: {
      type: 'rollup',
      measures: [Sales.total],
      timeDimension: Sales.date,
      granularity: 'day',
      refreshKey: { every: '1 hour' },
    },
  },
});
```

→ The API layer builds and uses the pre-aggregation automatically.

### When OLTP, when OLAP?

```
Query frequency:
  > 100 qps → OLTP DB
  < 1 qps   → OLAP DB

Query size:
  < 1k rows   → OLTP
  > 100k rows → OLAP

Latency:
  < 50 ms  → OLTP
  > 1 s OK → OLAP

Mixed: two DBs + CDC.
```

### Can Postgres do OLAP?
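As a rough illustration of why a row store struggles with large scans, the sketch below estimates the bytes touched by `SELECT AVG(age)` under each layout. It reuses the `id, name, email, age` columns from the Row vs Columnar example; the byte widths, the row count, and the function names are assumptions made up for illustration, and the model ignores compression, indexes, and page overhead.

```ts
// Hypothetical fixed-width column sizes in bytes (illustrative, not measured).
type Column = { name: string; bytes: number };

const columns: Column[] = [
  { name: "id", bytes: 8 },
  { name: "name", bytes: 32 },
  { name: "email", bytes: 64 },
  { name: "age", bytes: 4 },
];

// Row store: a full scan reads every column of every row.
function rowStoreScanBytes(cols: Column[], rowCount: number): number {
  const rowWidth = cols.reduce((sum, c) => sum + c.bytes, 0);
  return rowWidth * rowCount;
}

// Column store: only the columns the query references are read.
function columnStoreScanBytes(cols: Column[], rowCount: number, used: string[]): number {
  const usedWidth = cols
    .filter((c) => used.indexOf(c.name) !== -1)
    .reduce((sum, c) => sum + c.bytes, 0);
  return usedWidth * rowCount;
}

const rowCount = 1e9; // assumed table size: one billion rows
const rowBytes = rowStoreScanBytes(columns, rowCount);
const colBytes = columnStoreScanBytes(columns, rowCount, ["age"]);

console.log(`row store:    ${rowBytes / 1e9} GB`);
console.log(`column store: ${colBytes / 1e9} GB`);
console.log(`ratio:        ${rowBytes / colBytes}x`);
```

Under these assumed widths the row layout reads 27 times more data for the same aggregate, and the gap widens as tables get wider. Real engines narrow or widen it through compression, indexes, and partition pruning.
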
```
Up to GB scale: OK (with proper indexes + partitioning).
TB+: hard.
PB: not feasible.

→ The Postgres row store is weak at large scans.
  Citus columnar / pg_duckdb / hydra offer a hybrid.
```

### Cost comparison

```
Postgres RDS: $$
Snowflake: $$$ (compute hours)
BigQuery: $$ (scanned bytes)
ClickHouse self-host: $$
Redshift: $$$
DuckDB self-host: $ (1 VM)
```

### Pinot / Druid / StarRocks (real-time OLAP)

```
ClickHouse: powerful, batch-friendly.
Pinot / Druid: real-time ingestion + low-latency aggregation.
StarRocks: ClickHouse alternative, MySQL-compatible.

→ User-facing OLAP (customer-facing dashboards): Pinot / Druid.
  Internal BI: ClickHouse / Snowflake.
```

### LLM-written SQL

```ts
// `llm` is a placeholder for whatever LLM client you use
const sql = await llm.complete({
  system: 'Generate ClickHouse SQL.',
  prompt: `Schema: ${schema}\nQuestion: How many users active in last 7 days?`,
});
// Validate + run
```

→ Text-to-SQL. ClickHouse syntax differs slightly from standard SQL, so evaluation is a must.

## 🤔 Decision criteria

| Situation | DB |
|---|---|
| General app | Postgres / MySQL |
| Analytics dashboard | ClickHouse / Snowflake |
| TB-scale on a single VM | DuckDB |
| User-facing analytics | Pinot / Druid / StarRocks |
| HTAP | TiDB / SingleStore / Citus |
| Data < 100 GB | Postgres is fine |
| 1+ TB of analytics data | Specialized OLAP |
| Real-time + analytical | Materialize / RisingWave |

## ❌ Anti-patterns

- **Postgres for everything (including BI)**: large scans are slow.
- **Updating individual rows in OLAP**: inefficient.
- **Ignoring indexes (OLTP)**: leads to full scans.
- **Assuming transactions in OLAP**: transactional guarantees are weak.
- **No ETL**: every query hits raw data.
- **Stale materialized views**: wrong answers.
- **OLAP without a star schema**: queries become hard to write.

## 🤖 LLM usage hints

- OLTP = Postgres. OLAP = ClickHouse / Snowflake / DuckDB.
- HTAP is attractive, but few mature options exist.
- CDC is the answer for moving data from OLTP to OLAP.
- Star schema is the canonical OLAP layout.

## 🔗 Related documents

- [[DB_ClickHouse_OLAP]]
- [[DB_DuckDB_Embedded]]
- [[Data_Eng_Lakehouse]]