---
id: db-duckdb-embedded
title: DuckDB — Embedded OLAP / Local Analytics
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, duckdb, olap, embedded, vibe-coding]
tech_stack: { language: "TS / Python / SQL", applicable_to: ["Backend", "Frontend"] }
applied_in: []
aliases: [DuckDB, MotherDuck, embedded analytics, columnar SQLite, Parquet query]
---

# DuckDB

> The OLAP counterpart of SQLite. **An embedded columnar DB: a single file, in-process, with very fast analytic queries.** It queries Parquet / CSV files directly and serves as a single-node alternative to ClickHouse / BigQuery.

## 📖 Core Concepts
- Embedded: runs inside your application process (no server).
- Columnar: data is stored column by column, which makes analytic scans and aggregates fast.
- Storage: a single `.duckdb` file, or fully in-memory.
- Federation: queries Parquet / CSV files and S3 objects directly, without importing them first.

## 💻 Code Patterns

### Node
```bash
yarn add @duckdb/node-api
```

```ts
import { randomUUID } from 'node:crypto';
import { DuckDBInstance } from '@duckdb/node-api';

const db = await DuckDBInstance.create('app.duckdb'); // file-backed database
const conn = await db.connect();

await conn.run(`
  CREATE TABLE orders (
    id UUID,
    user_id UUID,
    amount DECIMAL(10, 2),
    created_at TIMESTAMP
  )
`);

// Positional parameters: UUID strings are cast to UUID, now() fills the timestamp
const id = randomUUID();
const userId = randomUUID();
await conn.run(`INSERT INTO orders VALUES (?, ?, ?, now())`, [id, userId, 99.5]);

const result = await conn.run(`SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id`);
const rows = await result.getRows(); // one array of values per row
```

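### Python
```python
import uuid
from datetime import datetime

import duckdb

con = duckdb.connect('app.duckdb')  # file-backed; duckdb.connect() with no path is in-memory

con.execute('CREATE TABLE orders (id UUID, user_id UUID, amount DECIMAL(10, 2), created_at TIMESTAMP)')
con.execute('INSERT INTO orders VALUES (?, ?, ?, ?)',
            [str(uuid.uuid4()), str(uuid.uuid4()), 99.50, datetime.now()])

df = con.execute('SELECT * FROM orders').df()  # pandas DataFrame (requires pandas)
```
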
### Querying Parquet directly
```sql
-- Query the file in place (no import step)
SELECT * FROM 'data.parquet';

-- Multiple files via a glob
SELECT * FROM 'data/*.parquet';

-- Hive-partitioned layout: year / month become queryable columns
SELECT * FROM read_parquet('data/year=*/month=*/data.parquet', hive_partitioning = true);

-- Aggregate across files
SELECT date, count(*) FROM 'events_*.parquet' GROUP BY date;
```

### S3 / HTTP
```sql
INSTALL httpfs; LOAD httpfs;

SELECT * FROM 's3://bucket/data.parquet';
SELECT * FROM 'https://example.com/data.csv';

-- Credentials via settings
SET s3_region = 'us-east-1';
SET s3_access_key_id = '...';
SET s3_secret_access_key = '...';

-- Credentials via the Secrets Manager (DuckDB 0.10+)
CREATE SECRET (
    TYPE S3,
    KEY_ID '...',
    SECRET '...',
    REGION 'us-east-1'
);
```

### Iceberg / Delta
```sql
INSTALL iceberg; LOAD iceberg;
SELECT * FROM iceberg_scan('s3://bucket/orders');

INSTALL delta; LOAD delta;
SELECT * FROM delta_scan('s3://bucket/orders');
```

→ Query lakehouse tables directly. A good fit for small-scale workloads or development.

### Postgres (federation)
```sql
INSTALL postgres; LOAD postgres;
ATTACH 'postgresql://user:pw@host/db' AS pg (TYPE POSTGRES);

SELECT * FROM pg.public.users WHERE created_at > '2026-01-01';

-- DuckDB pushes filters like this one down to Postgres where it can.
```

→ Combine data that lives in Postgres with DuckDB's analytic engine in one query.

### CSV / JSON
```sql
SELECT * FROM read_csv('data.csv', header = true);
SELECT * FROM read_csv_auto('data.csv');  -- infer the schema automatically

SELECT * FROM read_json('data.json');
SELECT * FROM read_json_auto('data.ndjson');
```

### Window / analytic
```sql
SELECT
    user_id,
    amount,
    SUM(amount) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total,
    LAG(amount) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_amount,
    RANK() OVER (ORDER BY amount DESC) AS rank
FROM orders;
```

→ Window functions are fully supported.

### MotherDuck (managed)
```sql
ATTACH 'md:my_database' AS cloud;

SELECT * FROM cloud.orders WHERE date > '2026-05-01';

-- Mix local and cloud data
SELECT * FROM local_orders UNION ALL SELECT * FROM cloud.orders;
```

→ Managed DuckDB in the cloud: local queries combined with cloud-hosted, synced data.

### Use cases
```
1. ETL / data prep:
   - Converting data to Parquet
   - Computing aggregates
   - Integrating with dbt

2. Local analytics:
   - Analyzing large CSV files
   - Inside notebooks

3. Embedded analytics in an app:
   - Small / medium datasets (~100GB)
   - Fast queries (BigQuery-like, but local)

4. Test fixtures:
   - Local dbt development
   - Validating production analytics models

5. Edge analytics:
   - An analytics-oriented alternative to Cloudflare D1
```

### Poor fits
```
- OLTP (transactional, write-heavy, concurrent writes)
- Very large data (TB+): use Snowflake / BigQuery
- Workloads that need a distributed cluster
```

### Performance
```
1B-row aggregate: ~10 s on a laptop.
Complex query over 10M rows: milliseconds.

vs Postgres: 5-50x faster on analytic queries.
vs Pandas: more memory-efficient, and runs in parallel.
```

### vs SQLite
```
SQLite: OLTP, row-oriented.
DuckDB: OLAP, columnar.

DuckDB can read SQLite database files directly.
```

```sql
INSTALL sqlite; LOAD sqlite;
ATTACH 'app.sqlite' AS s (TYPE SQLITE);
SELECT * FROM s.users;
```

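### Concurrent access
```
One process can open the database file read-write; threads inside
that process can read and write concurrently.
Multiple processes can open the same file only in READ_ONLY mode.
A second read-write process fails to acquire the file lock.

→ Keep writes in a single process, or synchronize access externally.
```
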
### React (browser)
```ts
import * as duckdb from '@duckdb/duckdb-wasm';

const bundles = duckdb.getJsDelivrBundles();
const bundle = await duckdb.selectBundle(bundles);

// Browsers refuse cross-origin worker scripts, so wrap the CDN worker in a same-origin Blob URL
const workerUrl = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker!}");`], { type: 'text/javascript' }),
);
const worker = new Worker(workerUrl);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(workerUrl);

const conn = await db.connect();
// Returns an Apache Arrow table; the remote server must allow CORS
const table = await conn.query(`SELECT * FROM 'https://example.com/data.parquet'`);
const rows = table.toArray();
```

→ SQL analytics inside the browser, via the WASM build.

### Dataframe-like API
```python
con.sql('SELECT * FROM orders').df()     # pandas
con.sql('SELECT * FROM orders').arrow()  # PyArrow
con.sql('SELECT * FROM orders').pl()     # polars
```

### CLI
```bash
duckdb my.duckdb
> SELECT * FROM 'data.parquet';
> .mode json
> SELECT * FROM users LIMIT 5;
> .schema users
```

### Persistent vs in-memory
```ts
import { DuckDBInstance } from '@duckdb/node-api';

const memDb = await DuckDBInstance.create(':memory:'); // in-memory, gone when the process exits
const fileDb = await DuckDBInstance.create('app.db');  // persisted to a file
```

### Migration / schema
```sql
-- DuckDB supports regular DDL
ALTER TABLE orders ADD COLUMN status VARCHAR;
CREATE INDEX orders_user ON orders(user_id);

-- Constraints
CREATE TABLE users (
    id UUID PRIMARY KEY,
    email VARCHAR UNIQUE NOT NULL CHECK (email LIKE '%@%')
);
```

### Backup
```bash
# Simple: copy the file (after closing it, so pending WAL changes are checkpointed)
cp app.duckdb app.duckdb.bak

# Or export / import through SQL
duckdb app.duckdb -c "EXPORT DATABASE 'export_dir'"
duckdb restored.duckdb -c "IMPORT DATABASE 'export_dir'"
```

## 🤔 Decision Criteria
| Situation | Recommendation |
|---|---|
| Embedded analytics | DuckDB |
| Large CSV / Parquet analysis | DuckDB |
| ETL / local dbt | DuckDB |
| OLTP | Postgres / SQLite |
| Distributed / TB+ | Snowflake / BigQuery / ClickHouse |
| Browser analytics | DuckDB-wasm |
| Edge | Cloudflare D1 (SQLite) |

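## ❌ Anti-patterns
- **Write-heavy OLTP**: SQLite is the better choice.
- **Concurrent writer processes**: only one process can hold the file read-write, so the others fail on the lock.
- **Large dataset held entirely in an in-memory database**: risks OOM. Use a file-backed database.
- **Schema drift (auto-detecting the schema on every read)**: pin a fixed schema.
- **Large joins without an index**: add a composite index.
- **No backups**: lose the file and the data is gone for good.
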
## 🤖 Hints for LLM Use
- ETL / analytics / local dbt = DuckDB.
- Query Parquet / S3 / Iceberg directly.
- Postgres + DuckDB federation.
- WASM = SQL inside the browser.

## 🔗 Related Docs
- [[DB_ClickHouse_OLAP]]
- [[Data_Eng_Lakehouse]]
- [[Data_Eng_dbt]]