[G1-Sync] Manual knowledge update
@@ -0,0 +1,205 @@
---
id: db-clickhouse-olap
title: ClickHouse - OLAP / columnar / fast aggregation
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, clickhouse, olap, analytics, vibe-coding]
tech_stack: { language: "SQL / ClickHouse", applicable_to: ["Backend"] }
applied_in: []
aliases: [ClickHouse, OLAP, columnar, MergeTree, materialized view, aggregating]
---
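
# ClickHouse

> Analytics, metrics, and logs belong in a columnar DB. **A GROUP BY over billions of rows finishes in seconds.** Postgres cannot keep up with that workload, but use ClickHouse for analytics only: it handles updates and small single-row operations poorly.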
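
## 📖 Core concepts
- Columnar: data is stored per column, so GROUP BY and aggregates read only the columns they need and run fast.
- MergeTree: the standard engine. Data is kept sorted by the ORDER BY key (typically time-based) and compressed automatically.
- Materialized view: processes the stream of inserts and pre-computes results.
- Distributed: sharding is a natural fit.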
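
To make the columnar idea concrete, here is a minimal TypeScript sketch. It is an illustration only, not how ClickHouse is implemented; the `RowEvent` and `ColumnarEvents` types and the `toColumns` and `sumRevenueColumnar` helpers are hypothetical names chosen for this example.

```typescript
// Row layout: one object per event (how a row store conceptually keeps data).
type RowEvent = { event: string; country: string; revenue: number };

// Column layout: one array per column (the columnar idea).
type ColumnarEvents = { event: string[]; country: string[]; revenue: number[] };

// Convert rows to columns. Hypothetical helper, for illustration only.
function toColumns(rows: RowEvent[]): ColumnarEvents {
  return {
    event: rows.map((r) => r.event),
    country: rows.map((r) => r.country),
    revenue: rows.map((r) => r.revenue),
  };
}

// Aggregating one column touches only that column's array;
// the event and country arrays are never read.
function sumRevenueColumnar(cols: ColumnarEvents): number {
  let total = 0;
  for (const v of cols.revenue) total += v;
  return total;
}

const sampleRows: RowEvent[] = [
  { event: "purchase", country: "KR", revenue: 10 },
  { event: "page_view", country: "US", revenue: 0 },
  { event: "purchase", country: "KR", revenue: 5 },
];
const sampleColumns = toColumns(sampleRows);
const sampleTotal = sumRevenueColumnar(sampleColumns);
```

In a real columnar store each column also compresses well, because values of the same type and similar content sit next to each other.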

## 💻 Code patterns

### Table (MergeTree)
```sql
CREATE TABLE events (
    ts        DateTime64(3),
    event     LowCardinality(String),
    user_id   UUID,
    country   LowCardinality(String),
    revenue   Decimal64(2),
    metadata  Map(String, String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(ts)              -- monthly partitions
ORDER BY (event, ts, user_id)          -- sort key
TTL toDateTime(ts) + INTERVAL 90 DAY;  -- delete rows after 90 days (cast: some versions reject DateTime64 in TTL)
```

### Insert (bulk recommended)
```sql
INSERT INTO events VALUES
    (now64(3), 'page_view', generateUUIDv4(), 'KR', 0, {}),
    ...;
```

```ts
// HTTP interface: the statement goes on the first line, then one JSON object per row.
// `rows` is an array of objects whose keys match the column names.
const res = await fetch('http://clickhouse:8123/', {
  method: 'POST',
  body: 'INSERT INTO events FORMAT JSONEachRow\n' +
        rows.map(r => JSON.stringify(r)).join('\n'),
});
// Errors come back as a non-2xx status with a plain-text message.
if (!res.ok) throw new Error(`ClickHouse insert failed: ${await res.text()}`);
```
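
Because bulk inserts are recommended, a client usually groups rows before sending them. The following is a minimal sketch of that step, assuming a batch size of 1000 rows; `chunkRows` and `buildInsertBody` are hypothetical helper names, and the code only builds request bodies without sending anything.

```typescript
// Split rows into fixed-size batches so each INSERT carries many rows.
function chunkRows<T>(rows: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize));
  }
  return batches;
}

// Build the HTTP body for one batch: the statement line, then one JSON object per line.
function buildInsertBody<T>(table: string, batch: T[]): string {
  const lines = batch.map((row) => JSON.stringify(row));
  return `INSERT INTO ${table} FORMAT JSONEachRow\n${lines.join("\n")}`;
}

// Example data: 2500 rows become batches of 1000, 1000, and 500.
const demoRows = Array.from({ length: 2500 }, (_, i) => ({ event: "page_view", n: i }));
const demoBatches = chunkRows(demoRows, 1000);
const demoBody = buildInsertBody("events", demoBatches[2]);
```

Each element of `demoBatches` would then be posted as one request, so 2500 rows cost three inserts instead of 2500.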

### Aggregate (the core strength)
```sql
-- Daily revenue
SELECT
    toDate(ts)   AS day,
    sum(revenue) AS rev,
    count()      AS events
FROM events
WHERE ts >= now() - INTERVAL 30 DAY
  AND event = 'purchase'
GROUP BY day
ORDER BY day;

-- User cohorts: find each user's first week, then count users per week
SELECT
    cohort_week,
    count() AS users
FROM (
    SELECT user_id, toMonday(min(ts)) AS cohort_week
    FROM events
    GROUP BY user_id
)
GROUP BY cohort_week
ORDER BY cohort_week;
```

→ Queries like these can finish in under a second even at 100M+ rows.

### LowCardinality
```sql
-- Few unique values (status, country) → dictionary encoding + smaller storage
status LowCardinality(String)
```
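
Dictionary encoding can be sketched in a few lines. This is a conceptual model only, not ClickHouse's storage format; `DictEncoded`, `dictEncode`, and `dictDecode` are hypothetical names for this example.

```typescript
// A dictionary-encoded column: each distinct string is stored once,
// and the column itself becomes a list of small integer codes.
type DictEncoded = { dictionary: string[]; codes: number[] };

function dictEncode(values: string[]): DictEncoded {
  const index = new Map<string, number>();
  const dictionary: string[] = [];
  const codes: number[] = [];
  for (const v of values) {
    let code = index.get(v);
    if (code === undefined) {
      // First time we see this value: assign the next free code.
      code = dictionary.length;
      index.set(v, code);
      dictionary.push(v);
    }
    codes.push(code);
  }
  return { dictionary, codes };
}

// Decoding looks each code up in the dictionary again.
function dictDecode(enc: DictEncoded): string[] {
  return enc.codes.map((c) => enc.dictionary[c]);
}

const encodedCountries = dictEncode(["KR", "US", "KR", "KR", "JP", "US"]);
const decodedCountries = dictDecode(encodedCountries);
```

With only a handful of distinct values, the codes stay small and repetitive, which is why the approach pays off for columns like status or country and not for high-cardinality columns like user IDs.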

### Materialized view (automatic aggregation)
```sql
CREATE MATERIALIZED VIEW events_daily
ENGINE = SummingMergeTree()
ORDER BY (day, event)
AS
SELECT
    toDate(ts)   AS day,
    event,
    count()      AS cnt,
    sum(revenue) AS rev
FROM events
GROUP BY day, event;

-- Each INSERT into events also writes aggregated rows into events_daily.
-- Rows are summed during background merges, so read with
-- sum(cnt), sum(rev) ... GROUP BY day, event.
```
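
The behaviour of a summing materialized view can be modelled roughly as follows. This is a simplified sketch under the assumption that each insert is aggregated on its own and rows with the same key are summed later; `aggregateBlock` and `sumByKey` are hypothetical names, not ClickHouse APIs.

```typescript
type RawEvent = { day: string; event: string; revenue: number };
type DailyRow = { day: string; event: string; cnt: number; rev: number };

// What the view's SELECT ... GROUP BY yields for ONE inserted block.
function aggregateBlock(block: RawEvent[]): DailyRow[] {
  const byKey = new Map<string, DailyRow>();
  for (const e of block) {
    const key = `${e.day}|${e.event}`;
    const row = byKey.get(key) ?? { day: e.day, event: e.event, cnt: 0, rev: 0 };
    row.cnt += 1;
    row.rev += e.revenue;
    byKey.set(key, row);
  }
  return Array.from(byKey.values());
}

// What a later merge, or sum() ... GROUP BY at query time, does:
// rows that share the sort key (day, event) are added together.
function sumByKey(rows: DailyRow[]): DailyRow[] {
  const byKey = new Map<string, DailyRow>();
  for (const r of rows) {
    const key = `${r.day}|${r.event}`;
    const acc = byKey.get(key) ?? { day: r.day, event: r.event, cnt: 0, rev: 0 };
    acc.cnt += r.cnt;
    acc.rev += r.rev;
    byKey.set(key, acc);
  }
  return Array.from(byKey.values());
}

// Two separate inserts for the same day and event.
const insert1 = aggregateBlock([
  { day: "2026-05-01", event: "purchase", revenue: 10 },
  { day: "2026-05-01", event: "purchase", revenue: 5 },
]);
const insert2 = aggregateBlock([{ day: "2026-05-01", event: "purchase", revenue: 7 }]);

const unmergedRows = [...insert1, ...insert2]; // one partial row per insert
const mergedRows = sumByKey(unmergedRows);     // a single row with the totals
```

Until the partial rows are merged, the view can hold more than one row per key, which is why reads should aggregate again instead of selecting rows directly.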

### AggregatingMergeTree (states such as uniq)
```sql
CREATE MATERIALIZED VIEW events_daily_users
ENGINE = AggregatingMergeTree()
ORDER BY day
AS
SELECT
    toDate(ts)         AS day,
    uniqState(user_id) AS users_state
FROM events
GROUP BY day;

-- Merge the states at query time
SELECT day, uniqMerge(users_state) AS users
FROM events_daily_users
GROUP BY day;
```
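
Why store a state instead of a finished count? Unique counts are not additive, as this small sketch shows. It uses an exact `Set` as a stand-in for the aggregate state; the TypeScript functions `uniqState` and `uniqMerge` here are hypothetical models named after the SQL functions, not bindings to them.

```typescript
// Stand-in for an aggregate state: the set of user ids seen so far.
type UniqState = Set<string>;

// Build a state from one batch of user ids.
function uniqState(userIds: string[]): UniqState {
  return new Set(userIds);
}

// Combine several states and return the final unique count.
function uniqMerge(states: UniqState[]): number {
  const all = new Set<string>();
  for (const s of states) s.forEach((id) => all.add(id));
  return all.size;
}

// Two inserts on the same day with overlapping users.
const morningState = uniqState(["u1", "u2", "u3"]);
const eveningState = uniqState(["u2", "u3", "u4"]);

const naiveSum = morningState.size + eveningState.size;   // double-counts u2 and u3
const mergedUsers = uniqMerge([morningState, eveningState]); // counts each user once
```

Adding the two per-insert counts overstates the result, while merging the states gives the true number of distinct users.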

### Funnel (windowFunnel)
```sql
-- Users per furthest funnel step reached within a 1-hour window
SELECT step, count() AS users
FROM (
    SELECT
        user_id,
        windowFunnel(3600)(toDateTime(ts),  -- cast so the window is in seconds
            event = 'page_view',
            event = 'add_to_cart',
            event = 'purchase'
        ) AS step
    FROM events
    GROUP BY user_id
)
GROUP BY step
ORDER BY step;
-- step 0 = never viewed, 1 = first step only, 2 = reached step 2, 3 = completed
```
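
The step values returned by the funnel query can be understood with a simplified model. The sketch below is an assumption-laden approximation of the default behaviour (ordered steps, all within the window measured from the first step), not ClickHouse's actual implementation; `FunnelEvent` and `funnelLevel` are hypothetical names.

```typescript
type FunnelEvent = { ts: number; event: string }; // ts in seconds

// Returns how many funnel steps were reached in order, with every step
// falling within `windowSec` of the chain's first step.
function funnelLevel(events: FunnelEvent[], steps: string[], windowSec: number): number {
  const sorted = [...events].sort((a, b) => a.ts - b.ts);
  let best = 0;
  for (let i = 0; i < sorted.length; i++) {
    // A chain can only start at an event matching the first step.
    if (sorted[i].event !== steps[0]) continue;
    const start = sorted[i].ts;
    let level = 1;
    for (let j = i + 1; j < sorted.length && level < steps.length; j++) {
      if (sorted[j].ts > start + windowSec) break; // outside the window
      if (sorted[j].event === steps[level]) level++;
    }
    best = Math.max(best, level);
  }
  return best;
}

const funnelSteps = ["page_view", "add_to_cart", "purchase"];

const completedLevel = funnelLevel(
  [{ ts: 0, event: "page_view" }, { ts: 600, event: "add_to_cart" }, { ts: 1800, event: "purchase" }],
  funnelSteps, 3600);
const tooSlowLevel = funnelLevel(
  [{ ts: 0, event: "page_view" }, { ts: 600, event: "add_to_cart" }, { ts: 4000, event: "purchase" }],
  funnelSteps, 3600);
const neverViewedLevel = funnelLevel([{ ts: 0, event: "purchase" }], funnelSteps, 3600);
```

A purchase that arrives after the one-hour window does not count, so that user stays at step 2, and a user with no page view stays at step 0.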

### Probabilistic (uniq, quantile)
```sql
SELECT
    toDate(ts)                 AS day,
    uniq(user_id)              AS dau,        -- approximate (adaptive sampling; uniqCombined / uniqHLL12 use HyperLogLog)
    uniqExact(user_id)         AS dau_exact,  -- exact, uses more memory
    quantile(0.95)(latency_ms) AS p95         -- approximate; assumes a latency_ms column (not in the events table above)
FROM events
GROUP BY day;
```

### CDC ingestion (Debezium → Kafka → ClickHouse)
```sql
-- Kafka engine table: a consumer, not storage
CREATE TABLE events_kafka (...)
ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list  = 'events',
    kafka_group_name  = 'ch-consumer',
    kafka_format      = 'JSONEachRow';

-- The materialized view moves each consumed batch into events
CREATE MATERIALIZED VIEW events_mv TO events
AS SELECT * FROM events_kafka;
```

### Compression / disk usage
```
ClickHouse compresses data automatically with LZ4 or ZSTD.
Compression of roughly 10-100x is typical (time-ordered data + LowCardinality help).
1B rows = roughly 10-100 GB.
```
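
The figures above can be turned into a back-of-the-envelope calculation. The sketch below assumes about 1 KB of raw data per row, a hypothetical value chosen because it makes the 10-100x ratio and the 10-100 GB range line up; `estimateCompressedGB` is a hypothetical helper, and real ratios depend heavily on the data.

```typescript
// Estimate on-disk size in GB (1 GB = 1e9 bytes) from row count,
// raw bytes per row, and an assumed compression ratio.
function estimateCompressedGB(rows: number, rawBytesPerRow: number, compressionRatio: number): number {
  if (compressionRatio <= 0) throw new Error("compressionRatio must be positive");
  const rawBytes = rows * rawBytesPerRow;
  return rawBytes / compressionRatio / 1e9;
}

const billionRows = 1_000_000_000;
const assumedRawBytesPerRow = 1000; // assumption: about 1 KB per row before compression

const bestCaseGB = estimateCompressedGB(billionRows, assumedRawBytesPerRow, 100); // 100x ratio
const worstCaseGB = estimateCompressedGB(billionRows, assumedRawBytesPerRow, 10); // 10x ratio
```

Narrower rows shrink both ends of the range proportionally, so measuring the actual compressed size on a sample of real data is the more reliable check.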

### TTL / expiry
```sql
ALTER TABLE events MODIFY TTL toDateTime(ts) + INTERVAL 90 DAY;
-- Rows older than 90 days are dropped automatically
```

## 🤔 Decision criteria
| Data | Recommendation |
|---|---|
| Analytics / logs / metrics | ClickHouse |
| OLTP (transactions) | Postgres / MySQL |
| Time-series, small | TimescaleDB |
| Time-series, huge | ClickHouse |
| Real-time analytics | ClickHouse + Kafka |
| Data warehouse | Snowflake / BigQuery (managed) |
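
## ❌ Anti-patterns
- **Row-level UPDATE**: a weak spot for ClickHouse. Use a replacement pattern (such as ReplacingMergeTree) instead.
- **Single-row INSERT**: creates too many parts. Batch inserts (1000+ rows).
- **Using it like OLTP**: locking and deadlock behaviour differ. Use it for analytics only.
- **Wrong sort key**: every query becomes a full scan. Put frequently filtered columns in the sort key.
- **Partitions too fine-grained**: creates too many parts. Monthly or weekly is about right.
- **JOIN on large tables**: keep one side small, and put it on the right.
- **No TTL with unbounded growth**: the disk fills up.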
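
The partition-granularity point can be checked with simple counting. The sketch below compares daily and monthly partition keys over two years of data; `countPartitions` is a hypothetical helper, and the date range is an arbitrary example.

```typescript
// Count distinct partition keys over an inclusive date range (UTC),
// for daily (YYYY-MM-DD) or monthly (YYYY-MM) partitioning.
function countPartitions(startISO: string, endISO: string, granularity: "day" | "month"): number {
  const keys = new Set<string>();
  const end = new Date(endISO + "T00:00:00Z").getTime();
  const dayMs = 86_400_000;
  for (let t = new Date(startISO + "T00:00:00Z").getTime(); t <= end; t += dayMs) {
    const iso = new Date(t).toISOString(); // e.g. 2026-01-31T00:00:00.000Z
    keys.add(granularity === "day" ? iso.slice(0, 10) : iso.slice(0, 7));
  }
  return keys.size;
}

const dailyPartitions = countPartitions("2025-01-01", "2026-12-31", "day");
const monthlyPartitions = countPartitions("2025-01-01", "2026-12-31", "month");
```

Every partition holds at least one part once it receives data, so the daily scheme starts with roughly thirty times as many parts as the monthly one for the same data.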

## 🤖 LLM usage hints
- Always batch INSERTs.
- Always define a sort key, a partition key, and a TTL.
- Pre-compute with materialized views.

## 🔗 Related documents
- [[DB_Time_Series_Patterns]]
- [[DB_Partitioning_Patterns]]
- [[DB_Change_Data_Capture]]