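---
id: db-clickhouse-olap
title: ClickHouse - OLAP / columnar / fast aggregation
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, clickhouse, olap, analytics, vibe-coding]
tech_stack: { language: "SQL / ClickHouse", applicable_to: ["Backend"] }
applied_in: []
aliases: [ClickHouse, OLAP, columnar, MergeTree, materialized view, aggregating]
---

# ClickHouse

> Analytics, metrics, and logs belong in a columnar database. **A GROUP BY over billions of rows can finish in seconds.** Postgres cannot keep up with that, but the advantage is for analytics only: ClickHouse handles updates and small single-row operations poorly.

## 📖 Core concepts

- Columnar: data is stored column by column, so GROUP BY and aggregates read only the columns they need.
- MergeTree: the standard engine. Rows are sorted by the ORDER BY key (often time-based) and compressed automatically.
- Materialized view: inserted rows stream through the view and are aggregated ahead of time.
- Distributed: sharding is built in.

## 💻 Code patterns

### Table (MergeTree)

```sql
CREATE TABLE events (
  ts        DateTime64(3),
  event     LowCardinality(String),
  user_id   UUID,
  country   LowCardinality(String),
  revenue   Decimal64(2),
  metadata  Map(String, String)
)
ENGINE = MergeTree()
ORDER BY (event, ts, user_id)          -- sort key
PARTITION BY toYYYYMM(ts)              -- monthly partitions
-- Rows older than 90 days are dropped automatically.
-- toDateTime() keeps the TTL valid on versions that reject DateTime64 here.
TTL toDateTime(ts) + INTERVAL 90 DAY;
```

### Insert (bulk recommended)

```sql
-- Send many rows per statement; the "..." stands for further rows.
INSERT INTO events VALUES
  (now64(3), 'page_view', generateUUIDv4(), 'KR', 0, {}),
  ...;
```

```ts
// HTTP interface: one POST carries the whole batch.
// `rows` is an array of objects whose keys match the events columns.
await fetch('http://clickhouse:8123/', {
  method: 'POST',
  body:
    'INSERT INTO events FORMAT JSONEachRow\n' +
    rows.map((r) => JSON.stringify(r)).join('\n'),
});
```

### Aggregate (the core strength)

```sql
-- Daily revenue
SELECT
  toDate(ts)   AS day,
  sum(revenue) AS rev,
  count()      AS events
FROM events
WHERE ts >= now() - INTERVAL 30 DAY
  AND event = 'purchase'
GROUP BY day
ORDER BY day;

-- User cohorts: find each user's first week, then count users per week.
-- (Grouping by user_id alone would return one row per user, not per cohort.)
SELECT cohort_week, count() AS users
FROM (
  SELECT user_id, toMonday(toDate(min(ts))) AS cohort_week
  FROM events
  GROUP BY user_id
)
GROUP BY cohort_week
ORDER BY cohort_week;
```

→ With a sort key that matches the filter, queries like these over 100M+ rows can return in under a second.
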
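The daily revenue query above can be issued over the same HTTP interface used in the Insert section. The sketch below is one possible shape, not a definitive client: `buildDailyRevenueRequest` and `parseJsonEachRow` are hypothetical helper names, and the base URL is whatever your deployment uses. It relies on the HTTP interface's `param_<name>` URL arguments to bind `{name:Type}` placeholders, so the event name is never concatenated into the SQL text.

```typescript
// Hypothetical helpers for illustration; none of these names come from ClickHouse itself.
interface QueryRequest {
  url: string;  // endpoint plus bound query parameters
  body: string; // SQL text sent as the POST body
}

function buildDailyRevenueRequest(
  baseUrl: string,
  event: string,
  days: number,
): QueryRequest {
  if (!Number.isInteger(days) || days <= 0) {
    throw new RangeError("days must be a positive integer");
  }
  // {event:String} and {days:UInt32} in the SQL are filled from these URL arguments.
  const url =
    `${baseUrl}?param_event=${encodeURIComponent(event)}` +
    `&param_days=${days}`;
  const body = [
    "SELECT toDate(ts) AS day, sum(revenue) AS rev, count() AS events",
    "FROM events",
    "WHERE ts >= now() - toIntervalDay({days:UInt32})",
    "  AND event = {event:String}",
    "GROUP BY day",
    "ORDER BY day",
    "FORMAT JSONEachRow",
  ].join("\n");
  return { url, body };
}

// A JSONEachRow response is one JSON object per line.
function parseJsonEachRow<T>(text: string): T[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as T);
}
```

A caller would pass `url` and `body` to `fetch` with `method: 'POST'` and feed the response text to `parseJsonEachRow`. How 64-bit integers and decimals are rendered in the JSON depends on server output settings, so the row type is left to the caller.
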
### LowCardinality

```sql
-- Few unique values (status, country) → dictionary encoding + smaller storage.
-- Column definition fragment:
status LowCardinality(String)
```

### Materialized view (automatic aggregation)

```sql
CREATE MATERIALIZED VIEW events_daily
ENGINE = SummingMergeTree()
ORDER BY (day, event)
AS SELECT
  toDate(ts)   AS day,
  event,
  count()      AS cnt,
  sum(revenue) AS rev
FROM events
GROUP BY day, event;

-- Each INSERT into events also writes to events_daily automatically.
-- Rows are summed at merge time, so still query with
-- sum(cnt), sum(rev) ... GROUP BY day, event.
```

### AggregatingMergeTree (states such as uniq)

```sql
CREATE MATERIALIZED VIEW events_daily_users
ENGINE = AggregatingMergeTree()
ORDER BY day
AS SELECT
  toDate(ts)         AS day,
  uniqState(user_id) AS users_state
FROM events
GROUP BY day;

-- Merge the stored states at query time
SELECT day, uniqMerge(users_state) AS users
FROM events_daily_users
GROUP BY day;
```

### Funnel (windowFunnel)

```sql
-- Per-user funnel depth, then the number of users at each depth.
-- toDateTime(ts) makes the 3600 window count seconds (1 hour).
SELECT step, count() AS users
FROM (
  SELECT
    user_id,
    windowFunnel(3600)(
      toDateTime(ts),
      event = 'page_view',
      event = 'add_to_cart',
      event = 'purchase'
    ) AS step
  FROM events
  GROUP BY user_id
)
GROUP BY step
ORDER BY step;
-- step 0 = no step matched, 1 = first step only, 2 = two steps, 3 = completed
```

### Probabilistic (uniq, quantile)

```sql
-- Assumes a latency_ms column, which the events table above does not define.
SELECT
  toDate(ts)                   AS day,
  uniq(user_id)                AS dau,        -- approximate (adaptive sampling; uniqHLL12 uses HyperLogLog)
  uniqExact(user_id)           AS dau_exact,  -- exact, uses more memory
  quantile(0.95)(latency_ms)   AS p95         -- approximate quantile
FROM events
GROUP BY day;
```

### CDC ingestion (Debezium → Kafka → ClickHouse)

```sql
-- Column list elided in the source; it must match the message fields.
CREATE TABLE events_kafka (...)
ENGINE = Kafka()
SETTINGS
  kafka_broker_list = 'kafka:9092',
  kafka_topic_list  = 'events',
  kafka_group_name  = 'ch-consumer',
  kafka_format      = 'JSONEachRow';

-- The materialized view moves consumed messages into the events table.
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM events_kafka;
```

### Compression / disk usage

```
ClickHouse compresses data automatically (LZ4 / ZSTD).
Typical ratio: 10-100x (sorted timestamps + LowCardinality), depending on the data.
1B rows = roughly 10-100 GB.
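```

### TTL / expiry

```sql
-- toDateTime() keeps the TTL valid on versions that reject DateTime64 here.
ALTER TABLE events MODIFY TTL toDateTime(ts) + INTERVAL 90 DAY;
-- Rows older than 90 days are dropped automatically.
```

## 🤔 Decision criteria

| Data | Recommendation |
|---|---|
| Analytics / logs / metrics | ClickHouse |
| OLTP (transactions) | Postgres / MySQL |
| Time series, small | TimescaleDB |
| Time series, huge | ClickHouse |
| Real-time analytics | ClickHouse + Kafka |
| Data warehouse | Snowflake / BigQuery (managed) |

## ❌ Anti-patterns

- **Row-level UPDATE**: a weak spot for ClickHouse. Use a replacement pattern instead (insert a new row version, e.g. with ReplacingMergeTree).
- **Single-row INSERT**: every insert creates a part, so this produces too many parts. Batch rows (1000+ per insert).
- **Using it like OLTP**: locking and transaction behavior differ from an OLTP database. Keep it to analytics.
- **Wrong sort key**: every query becomes a full scan. Put the columns you filter on most often in the sort key.
- **Partitions too fine-grained**: too many parts. Monthly or weekly is about right.
- **JOIN between large tables**: keep one side small and put it on the right.
- **No TTL on unbounded data**: disk usage grows without limit.

## 🤖 LLM usage hints

- Always batch INSERTs.
- Always specify the sort key, partition, and TTL.
- Precompute with materialized views.

## 🔗 Related docs

- [[DB_Time_Series_Patterns]]
- [[DB_Partitioning_Patterns]]
- [[DB_Change_Data_Capture]]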
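
The batching advice repeated in this note (bulk inserts, the single-row INSERT anti-pattern) can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions: `toBatches` and `buildInsertBody` are hypothetical helper names, and the default of 1000 rows simply mirrors the "1000+" guidance rather than a tuned value.

```typescript
// Hypothetical helpers for illustration only.

// Split rows into chunks so each HTTP request carries one batch.
function toBatches<T>(rows: readonly T[], batchSize = 1000): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize));
  }
  return batches;
}

// Build a JSONEachRow insert body: the statement, then one JSON object per line.
// The table name is interpolated as-is, so it must come from trusted code.
function buildInsertBody(table: string, rows: readonly object[]): string {
  return (
    `INSERT INTO ${table} FORMAT JSONEachRow\n` +
    rows.map((r) => JSON.stringify(r)).join("\n")
  );
}
```

Each batch from `toBatches` would become one POST whose body is `buildInsertBody('events', batch)`, so 10,000 buffered rows turn into 10 inserts instead of 10,000.
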