
Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00

7.8 KiB



OLTP / OLAP / HTAP

Different workload = different DB. OLTP (many small reads/writes) vs OLAP (few large queries) vs HTAP (both). Wrong choice = large performance / cost penalty.

📖 Key concepts

OLTP: Online Transaction Processing.
OLAP: Online Analytical Processing.
HTAP: Hybrid Transactional/Analytical Processing (both).
Row vs Columnar storage.

💻 Code patterns

OLTP characteristics

Workload:
- Each query is small (< 100 rows)
- Many queries (1000+ QPS)
- Read + write mix
- Latency matters (< 50ms)
- ACID transactions

Examples:
- User login
- Order creation
- Cart update
- Profile read

DB:
- Postgres / MySQL (default)
- CockroachDB / Spanner (distributed)
- DynamoDB (NoSQL)

OLAP characteristics

Workload:
- Each query is large (millions+ rows)
- Few queries (per minute / per hour)
- Almost read-only
- Throughput matters
- Scans a large portion of the data

Examples:
- Daily revenue
- User cohort analysis
- A/B test report
- Dashboard

DB:
- ClickHouse / DuckDB
- Snowflake / BigQuery / Redshift
- Druid / Pinot

Row vs Columnar

Row (OLTP):
[id, name, email, age]
[1,  'Alice', 'a@x', 25]
[2,  'Bob',   'b@x', 30]

→ Reading one whole row is fast: SELECT * FROM users WHERE id = 1.

Columnar (OLAP):
id    : [1, 2, 3, ...]
name  : ['Alice', 'Bob', ...]
email : ['a@x', 'b@x', ...]
age   : [25, 30, ...]

→ Reading just one column is fast: SELECT AVG(age) FROM users.
+ Strong compression (values of the same type are stored together).
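The sketch below is a toy in-memory model of the two layouts. It uses plain JavaScript arrays and is not how any engine actually lays out pages. The row store answers a lookup by id with one complete record, while the column store answers AVG(age) by touching only the `age` array. Run-length encoding is one simple illustration of why same-typed, often repeated column values compress well; real engines combine several encodings. The third user and the country values are made-up sample data.

```javascript
// Row-oriented layout: each record's fields are stored together.
const rowStore = [
  { id: 1, name: "Alice", email: "a@x", age: 25 },
  { id: 2, name: "Bob", email: "b@x", age: 30 },
  { id: 3, name: "Carol", email: "c@x", age: 35 }, // hypothetical extra row
];

// Column-oriented layout: one array per column.
function toColumnar(rows) {
  const columns = {};
  for (const key of Object.keys(rows[0])) {
    columns[key] = rows.map((r) => r[key]);
  }
  return columns;
}

const columnStore = toColumnar(rowStore);

// SELECT * FROM users WHERE id = 2: the row store returns one complete record.
function getRowById(rows, id) {
  return rows.find((r) => r.id === id);
}

// SELECT AVG(age) FROM users: the column store reads only the "age" array.
function avgColumn(columns, name) {
  const values = columns[name];
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Run-length encoding: consecutive repeated values collapse into [value, count] pairs.
function runLengthEncode(values) {
  const runs = [];
  for (const v of values) {
    const last = runs[runs.length - 1];
    if (last && last[0] === v) last[1] += 1;
    else runs.push([v, 1]);
  }
  return runs;
}

console.log(getRowById(rowStore, 2).name); // Bob
console.log(avgColumn(columnStore, "age")); // 30
console.log(runLengthEncode(["KR", "KR", "KR", "US", "US"])); // two runs: KR x3, US x2
```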

Postgres (OLTP default)

-- Fast single-row lookup
SELECT * FROM users WHERE id = 123;

-- ACID transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
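This is a minimal in-memory sketch of what BEGIN / COMMIT provides in the transfer above: either both balance updates become visible or neither does. It models atomicity only, with no isolation, durability or locking. The insufficient-funds check is a hypothetical rule added to force a rollback. In Postgres, a rule like this would typically be a CHECK constraint, which makes the failing UPDATE abort the transaction.

```javascript
// Toy in-memory "database": account id -> balance.
const accounts = new Map([
  [1, 500],
  [2, 100],
]);

// Runs `work` on a copy of the state and publishes it only if `work` does not throw.
// Models atomicity only: no isolation, durability, or locking.
function transaction(db, work) {
  const draft = new Map(db); // snapshot taken at BEGIN
  try {
    work(draft);
  } catch (err) {
    return { committed: false, error: err.message }; // ROLLBACK: discard the draft
  }
  db.clear();
  for (const [id, balance] of draft) db.set(id, balance); // COMMIT: apply all changes together
  return { committed: true };
}

function transfer(db, from, to, amount) {
  return transaction(db, (tx) => {
    tx.set(from, tx.get(from) - amount);
    // Hypothetical business rule, similar to a CHECK (balance >= 0) constraint.
    if (tx.get(from) < 0) throw new Error("insufficient funds");
    tx.set(to, tx.get(to) + amount);
  });
}

console.log(transfer(accounts, 1, 2, 100).committed); // true
console.log(transfer(accounts, 2, 1, 1000).committed); // false: rolled back
console.log(accounts.get(1), accounts.get(2)); // 400 200
```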

ClickHouse (OLAP)

CREATE TABLE events (
  ts DateTime,
  user_id UInt32,
  event String,
  value Float32
) ENGINE = MergeTree
ORDER BY (ts, user_id);

-- Fast aggregation
SELECT
  toStartOfHour(ts) AS hour,
  count() AS cnt,
  avg(value) AS avg_val
FROM events
WHERE ts > now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;

→ Aggregations over 1B+ rows take milliseconds to seconds (depending on hardware and data).
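A reference implementation of the ClickHouse query above can be handy for checking results on small samples, for example in tests. This sketch reproduces the query's semantics in plain JavaScript: the 7-day filter, hourly buckets, `count()` and `avg(value)`, ordered by hour. The sample events are made up. Buckets are computed in UTC, whereas ClickHouse's `toStartOfHour` follows the server or column time zone. The sketch says nothing about how ClickHouse actually executes the query.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Like toStartOfHour(ts), but always in UTC.
function startOfHour(date) {
  return new Date(Math.floor(date.getTime() / HOUR_MS) * HOUR_MS);
}

// WHERE ts > now - 7 days GROUP BY hour, with count() and avg(value), ORDER BY hour.
function hourlyStats(events, now) {
  const cutoff = now.getTime() - 7 * DAY_MS;
  const groups = new Map(); // hour start (ms) -> { cnt, sum }
  for (const e of events) {
    if (e.ts.getTime() <= cutoff) continue; // strict ">" as in the SQL
    const hour = startOfHour(e.ts).getTime();
    const g = groups.get(hour) ?? { cnt: 0, sum: 0 };
    g.cnt += 1;
    g.sum += e.value;
    groups.set(hour, g);
  }
  return [...groups.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([hour, g]) => ({ hour: new Date(hour).toISOString(), cnt: g.cnt, avg_val: g.sum / g.cnt }));
}

// Made-up sample rows shaped like the events table.
const now = new Date("2026-05-10T12:00:00Z");
const events = [
  { ts: new Date("2026-05-10T10:15:00Z"), user_id: 1, event: "click", value: 2 },
  { ts: new Date("2026-05-10T10:45:00Z"), user_id: 2, event: "click", value: 4 },
  { ts: new Date("2026-05-10T11:05:00Z"), user_id: 1, event: "view", value: 1 },
  { ts: new Date("2026-04-01T00:00:00Z"), user_id: 3, event: "view", value: 9 }, // outside 7 days
];

console.log(hourlyStats(events, now));
// 10:00 bucket: cnt 2, avg_val 3; 11:00 bucket: cnt 1, avg_val 1
```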

DuckDB (single-node OLAP)

-- Query Parquet files directly (S3 access via the httpfs extension)
SELECT date, SUM(amount) AS total
FROM 's3://bucket/sales/*.parquet'
WHERE date > '2026-01-01'
GROUP BY date;

-- Query Postgres too (via DuckDB's postgres extension)
ATTACH 'postgres://...' AS pg (TYPE postgres);
SELECT * FROM pg.public.users LIMIT 10;

→ Handles up to TB-scale data on a single VM. Simpler than Spark.

Snowflake (cloud OLAP)

CREATE TABLE events ... CLUSTER BY (ts);

-- Compute scaling
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'X-LARGE';

-- Time travel
SELECT * FROM events AT(TIMESTAMP => '2026-05-01'::TIMESTAMP_LTZ);

→ Compute and storage are separated. Billed for warehouse running time (credits) plus storage, not per query.

BigQuery

SELECT
  DATE(ts) AS day,
  COUNT(*) AS events
FROM `project.dataset.events_*`  -- wildcard table: _TABLE_SUFFIX selects the date shards
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260509'
GROUP BY day;

→ Billed per bytes scanned (on-demand pricing). Partitioning + clustering reduce cost.
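The rough sketch below shows why partitioning and column choice drive on-demand cost. Only the partitions (here, date-suffixed shards) that match the filter are counted, and only the columns the query reads. The partition metadata and the price are placeholders, not real figures. Billing details such as minimum charges are ignored. BigQuery's dry-run option reports the actual number of bytes a query would process.

```javascript
const TIB = 1024 ** 4;

// On-demand cost is driven by bytes read: only partitions that pass the filter
// (partition pruning) and only the columns the query references are counted.
function estimateScanBytes(partitions, { from, to }, columns) {
  let bytes = 0;
  for (const p of partitions) {
    if (p.day < from || p.day > to) continue; // pruned, never read
    for (const c of columns) bytes += p.columnBytes[c] ?? 0;
  }
  return bytes;
}

function estimateCost(bytes, pricePerTiB) {
  return (bytes / TIB) * pricePerTiB;
}

// Hypothetical metadata: daily shards (suffix YYYYMMDD) with per-column sizes.
const partitions = [
  { day: "20260430", columnBytes: { ts: 0.5 * TIB, payload: 2 * TIB } },
  { day: "20260501", columnBytes: { ts: 0.5 * TIB, payload: 2 * TIB } },
  { day: "20260502", columnBytes: { ts: 0.5 * TIB, payload: 2 * TIB } },
];
const PLACEHOLDER_PRICE_PER_TIB = 5; // placeholder, not a real price

// The BigQuery example reads only `ts` (DATE(ts) and COUNT(*)).
const scanned = estimateScanBytes(partitions, { from: "20260501", to: "20260509" }, ["ts"]);
console.log(scanned / TIB, estimateCost(scanned, PLACEHOLDER_PRICE_PER_TIB)); // 1 5
```

Selecting an extra wide column such as `payload` would multiply the bytes scanned, even with the same date filter.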

HTAP (Hybrid)

The same DB handles both OLTP and OLAP.

Technologies:
- TiDB (TiKV row store + TiFlash columnar replicas, synced via Raft)
- SingleStore (formerly MemSQL)
- CockroachDB (column store added)
- Postgres + Citus columnar

→ In practice, architectures where ClickHouse acts as an analytical "read replica" of the OLTP DB are more common.

Postgres + Citus (HTAP)

-- Convert to columnar
SELECT alter_table_set_access_method('events', 'columnar');

-- Common hybrid: keep recent partitions as row (heap) tables for inserts/updates, convert older partitions to columnar for scans

Architectures

1. Pure OLTP:
   App → Postgres
   
2. OLAP 분리:
   App → Postgres → CDC → Snowflake / ClickHouse
                                ↓
                              BI tool / dashboard
   
3. HTAP:
   App → TiDB (transactional reads/writes on the row store; analytical reads can be served from columnar replicas)

CDC (Change Data Capture)

Postgres → Debezium → Kafka → ClickHouse / Snowflake.

Real-time:
- Order placed (Postgres)
- Dashboard updated ~1 second later (ClickHouse)
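This is a minimal sketch of the sink side of the pipeline. Change events in a simplified Debezium-style envelope are applied incrementally to a revenue-per-day aggregate. In that envelope, `op` is `c`/`u`/`d`/`r` and each event carries `before` and `after` row images. The flat `day` field and the sample rows are hypothetical. For the Postgres connector, `before` only contains every column when the table uses `REPLICA IDENTITY FULL`; otherwise the sink would have to look up the old row itself.

```javascript
// Sink-side aggregate kept in sync with the source table: day -> SUM(amount).
// Assumes the source table uses REPLICA IDENTITY FULL, so `before` has every column.
function applyChange(revenueByDay, { op, before, after }) {
  const add = (row, sign) =>
    revenueByDay.set(row.day, (revenueByDay.get(row.day) ?? 0) + sign * row.amount);

  // Updates and deletes: remove the old row's contribution.
  if ((op === "u" || op === "d") && before) add(before, -1);
  // Creates, snapshot reads and updates: add the new row's contribution.
  if ((op === "c" || op === "r" || op === "u") && after) add(after, 1);
  return revenueByDay;
}

// Hypothetical change stream for an orders table (flattened `day` field for brevity).
const changes = [
  { op: "c", before: null, after: { id: 1, day: "2026-05-10", amount: 100 } },
  { op: "c", before: null, after: { id: 2, day: "2026-05-10", amount: 50 } },
  { op: "u", before: { id: 2, day: "2026-05-10", amount: 50 }, after: { id: 2, day: "2026-05-10", amount: 70 } },
  { op: "d", before: { id: 1, day: "2026-05-10", amount: 100 }, after: null },
];

const revenue = new Map();
for (const change of changes) applyChange(revenue, change);
console.log(revenue.get("2026-05-10")); // 70
```

A real sink must also cope with redelivered events, because Debezium's delivery is at-least-once. One way is to track the last applied offset.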

Star schema (OLAP)

-- Fact table (large)
CREATE TABLE fact_sales (
  date_id INT,
  product_id INT,
  customer_id INT,
  amount DECIMAL,
  quantity INT
);

-- Dimension tables (small)
CREATE TABLE dim_date (date_id INT, date DATE, year INT, month INT);
CREATE TABLE dim_product (product_id INT, name VARCHAR, category VARCHAR);
CREATE TABLE dim_customer (customer_id INT, country VARCHAR, segment VARCHAR);

-- Join
SELECT d.year, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d USING (date_id)
JOIN dim_product p USING (product_id)
GROUP BY d.year, p.category;

→ The standard OLAP schema.
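The star-schema query above is essentially a hash join. The small dimension tables become lookup maps, and the large fact table is streamed once and aggregated by (year, category). The sketch below is a minimal in-memory version with made-up rows. Many OLAP engines use a similar broadcast-style join when dimensions are small, though their planners make that choice themselves.

```javascript
// Hypothetical tiny fact and dimension tables mirroring fact_sales / dim_date / dim_product.
const dimDate = [
  { date_id: 1, year: 2025 },
  { date_id: 2, year: 2026 },
];
const dimProduct = [
  { product_id: 10, category: "book" },
  { product_id: 20, category: "toy" },
];
const factSales = [
  { date_id: 1, product_id: 10, amount: 30 },
  { date_id: 2, product_id: 10, amount: 20 },
  { date_id: 2, product_id: 20, amount: 15 },
  { date_id: 2, product_id: 10, amount: 5 },
];

// Build lookup maps on the small dimensions, then stream the large fact table once.
function salesByYearCategory(facts, dates, products) {
  const yearById = new Map(dates.map((d) => [d.date_id, d.year]));
  const categoryById = new Map(products.map((p) => [p.product_id, p.category]));
  const totals = new Map(); // "year|category" -> SUM(amount)
  for (const f of facts) {
    const year = yearById.get(f.date_id);
    const category = categoryById.get(f.product_id);
    if (year === undefined || category === undefined) continue; // inner join drops unmatched rows
    const key = `${year}|${category}`;
    totals.set(key, (totals.get(key) ?? 0) + f.amount);
  }
  return totals;
}

console.log(salesByYearCategory(factSales, dimDate, dimProduct));
// 2025|book -> 30, 2026|book -> 25, 2026|toy -> 15
```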

Snowflake schema (normalized)

dim_product → dim_category → dim_dept

→ Star is simpler (fewer joins); snowflake is normalized (less redundancy, more joins).

Data warehouse vs lake

Warehouse: schema-on-write (structured, SQL-friendly).
Lake: schema-on-read (raw file, flexible).

Lakehouse: both (open table formats such as Iceberg, Delta Lake, Hudi).

Materialized view (OLAP boost)

-- Daily dashboard query
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date, SUM(amount) AS revenue FROM orders GROUP BY date;

-- Fast query
SELECT * FROM daily_revenue;

-- Refresh
REFRESH MATERIALIZED VIEW daily_revenue;

→ Pre-compute. Dashboard latency ↓.
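The sketch below is a toy model of the materialized-view trade-off. Reads return a stored result without aggregating anything, but that result only changes when `refresh()` runs, which is where the staleness risk comes from. The class and the sample orders are hypothetical. As with a plain Postgres `REFRESH MATERIALIZED VIEW`, the refresh recomputes the whole result.

```javascript
// Toy materialized view: stores a precomputed result that only changes on refresh().
class MaterializedView {
  constructor(compute) {
    this.compute = compute; // runs the "defining query" over the base data
    this.rows = compute(); // CREATE MATERIALIZED VIEW populates it once
    this.refreshedAt = Date.now(); // lets callers see how old the data is
  }
  read() {
    return this.rows; // fast: no aggregation at read time, but possibly stale
  }
  refresh() {
    this.rows = this.compute(); // full recompute
    this.refreshedAt = Date.now();
  }
}

// Hypothetical base table.
const orders = [
  { date: "2026-05-09", amount: 100 },
  { date: "2026-05-10", amount: 40 },
];

function dailyRevenue() {
  const totals = {};
  for (const o of orders) totals[o.date] = (totals[o.date] ?? 0) + o.amount;
  return totals;
}

const view = new MaterializedView(dailyRevenue);
orders.push({ date: "2026-05-10", amount: 60 }); // new order arrives after the view was built
console.log(view.read()["2026-05-10"]); // 40 (stale)
view.refresh();
console.log(view.read()["2026-05-10"]); // 100 (fresh)
```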

Pre-aggregation (Cube.dev)

cube('Sales', {
  sql: 'SELECT * FROM orders',
  measures: {
    total: { sql: 'amount', type: 'sum' },
  },
  dimensions: {
    date: { sql: 'created_at', type: 'time' },
  },
  preAggregations: {
    daily: {
      type: 'rollup',
      measures: [Sales.total],
      timeDimension: Sales.date,
      granularity: 'day',
      refreshKey: { every: '1 hour' },
    },
  },
});

→ The API layer builds the pre-aggregations and serves matching queries from them automatically.
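A daily rollup like the `daily` pre-aggregation above can serve more than daily queries. SUM is additive, so a monthly total can be computed from the daily rollup without rescanning raw orders. This minimal sketch uses hypothetical order rows and only illustrates the idea; it is not how Cube matches queries to pre-aggregations.

```javascript
// Build the rollup once: one row per day instead of one row per order.
function buildDailyRollup(orders) {
  const byDay = new Map(); // "YYYY-MM-DD" -> SUM(amount)
  for (const o of orders) {
    const day = o.created_at.slice(0, 10); // assumes ISO-8601 UTC timestamps
    byDay.set(day, (byDay.get(day) ?? 0) + o.amount);
  }
  return byDay;
}

// SUM is additive, so a coarser grain (month) can be derived from the daily rollup.
function monthlyFromDaily(dailyRollup) {
  const byMonth = new Map(); // "YYYY-MM" -> SUM(amount)
  for (const [day, total] of dailyRollup) {
    const month = day.slice(0, 7);
    byMonth.set(month, (byMonth.get(month) ?? 0) + total);
  }
  return byMonth;
}

// Hypothetical raw orders.
const sampleOrders = [
  { created_at: "2026-04-30T23:00:00Z", amount: 10 },
  { created_at: "2026-05-01T01:00:00Z", amount: 20 },
  { created_at: "2026-05-01T09:00:00Z", amount: 5 },
  { created_at: "2026-05-02T12:00:00Z", amount: 7 },
];

const daily = buildDailyRollup(sampleOrders); // 3 rows instead of 4
console.log(monthlyFromDaily(daily).get("2026-05")); // 32
```

Non-additive measures need more care. An average, for example, has to be rolled up as a sum and a count, then divided at query time.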

When OLTP, when OLAP?

Query frequency:
> 100 qps → OLTP db
< 1 qps → OLAP db

Query size:
< 1k row → OLTP
> 100k row → OLAP

Latency:
< 50ms → OLTP
> 1s OK → OLAP

Mixed: two DBs + CDC.
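The thresholds above can be turned into a small rule-of-thumb classifier. This sketch counts which side each signal falls on (QPS, rows per query, latency budget) and leaves values between thresholds undecided. When an app has both kinds of query, it recommends two DBs plus CDC. The function names and the majority-vote rule are assumptions for illustration, not a standard method.

```javascript
// Rule-of-thumb thresholds from the list above; values between thresholds cast no vote.
function classifyQuery({ qps, rowsPerQuery, latencyBudgetMs }) {
  let oltp = 0;
  let olap = 0;
  if (qps > 100) oltp++;
  else if (qps < 1) olap++;
  if (rowsPerQuery < 1000) oltp++;
  else if (rowsPerQuery > 100000) olap++;
  if (latencyBudgetMs < 50) oltp++;
  else if (latencyBudgetMs >= 1000) olap++; // "1 s or more is OK"
  if (oltp > olap) return "OLTP";
  if (olap > oltp) return "OLAP";
  return "unclear";
}

// An app usually has several query classes; both kinds present -> two DBs + CDC.
function suggestArchitecture(queryClasses) {
  const kinds = new Set(queryClasses.map((q) => classifyQuery(q)));
  if (kinds.has("OLTP") && kinds.has("OLAP")) return "OLTP DB + OLAP DB, connected by CDC";
  if (kinds.has("OLAP")) return "OLAP DB";
  if (kinds.has("OLTP")) return "OLTP DB";
  return "unclear: measure before choosing";
}

const checkout = { qps: 2000, rowsPerQuery: 1, latencyBudgetMs: 10 }; // hypothetical
const weeklyReport = { qps: 0.01, rowsPerQuery: 5e6, latencyBudgetMs: 5000 }; // hypothetical
console.log(suggestArchitecture([checkout, weeklyReport])); // OLTP DB + OLAP DB, connected by CDC
```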

Can Postgres do OLAP?

Up to GB scale: OK (with proper indexes + partitioning).
TB+: hard.
PB: not feasible.

→ Postgres's row store is weak at large scans.
Citus columnar / pg_duckdb / Hydra provide hybrid (columnar) options.

Cost comparison

Postgres RDS: $$
Snowflake: $$$ (compute hour)
BigQuery: $$ (scan bytes)
ClickHouse self-host: $$
Redshift: $$$
DuckDB self-host: $ (1 VM)

Pinot / Druid / StarRocks (real-time OLAP)

ClickHouse: powerful, batch-friendly.
Pinot / Druid: real-time + low-latency aggregation.
StarRocks: ClickHouse alternative, MySQL-compatible.

→ User-facing OLAP (dashboards shown to end users): Pinot / Druid.
Internal BI = ClickHouse / Snowflake.

LLM writes SQL

// `llm` is a placeholder for whichever LLM client is in use
const sql = await llm.complete({
  system: 'Generate ClickHouse SQL.',
  prompt: `Schema: ${schema}\nQuestion: How many users active in last 7 days?`,
});

// Validate (read-only, single SELECT) before running

→ Text-to-SQL. ClickHouse's SQL dialect differs slightly from standard SQL → evals are essential.
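The validation step could start with cheap checks: a single statement, SELECT/WITH only, no write or DDL keywords, and a row cap. This is a naive keyword-based sketch. It can be fooled and can reject valid queries, and the keyword list is a hypothetical, incomplete choice. A safer setup also runs generated SQL under a read-only database user (ClickHouse has a `readonly` setting) and scores it against a set of known question/answer pairs.

```javascript
// Naive pre-execution checks for LLM-generated SQL (keyword-based, not a real SQL parser).
// Hypothetical, incomplete deny-list; "system" blocks ClickHouse system tables.
const FORBIDDEN = /\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|attach|system)\b/i;

function validateGeneratedSql(sql, { maxRows = 1000 } = {}) {
  const text = sql.trim().replace(/;\s*$/, ""); // allow one trailing semicolon
  if (text.includes(";")) return { ok: false, reason: "multiple statements" };
  if (!/^(select|with)\b/i.test(text)) return { ok: false, reason: "only SELECT/WITH queries are allowed" };
  if (FORBIDDEN.test(text)) return { ok: false, reason: "forbidden keyword" };
  // Enforce a row cap if the query does not already end with a LIMIT.
  const limited = /\blimit\s+\d+\s*$/i.test(text) ? text : `${text}\nLIMIT ${maxRows}`;
  return { ok: true, sql: limited };
}

const check = validateGeneratedSql("SELECT uniq(user_id) FROM events WHERE ts > now() - INTERVAL 7 DAY;");
console.log(check.ok); // true
console.log(validateGeneratedSql("DROP TABLE events").reason); // only SELECT/WITH queries are allowed
```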

🤔 Decision criteria

Situation	DB
General app	Postgres / MySQL
Analytics dashboard	ClickHouse / Snowflake
TB-scale on a single VM	DuckDB
User-facing analytics	Pinot / Druid / StarRocks
HTAP	TiDB / SingleStore / Citus
Data < 100 GB	Postgres OK
1+ TB of data, analytics	Specialized OLAP
Real-time + analytical	Materialize / RisingWave

❌ Anti-patterns

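Postgres for everything (including BI): large scans are slow.
Per-row updates in OLAP: inefficient.
Ignoring indexes (OLTP): full scans.
Assuming OLAP has strong transactions: support is weak.
No ETL: every query runs on raw data.
Stale materialized views: wrong answers.
OLAP without a star schema: queries become hard to write.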

🤖 LLM usage hints

OLTP = Postgres. OLAP = ClickHouse / Snowflake / DuckDB.
HTAP is attractive, but few options are mature.
CDC is the answer for OLTP → OLAP.
Star schema is the canonical OLAP schema.
