---
id: db-duckdb-embedded
title: DuckDB - Embedded OLAP / Local Analytics
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [database, duckdb, olap, embedded, vibe-coding]
tech_stack: { language: "TS / Python / SQL", applicable_to: ["Backend", "Frontend"] }
applied_in: []
aliases: [DuckDB, MotherDuck, embedded analytics, columnar SQLite, Parquet query]
---

# DuckDB

> The OLAP counterpart to SQLite. **An embedded columnar database: single file, in-process, very fast analytic queries.** Queries Parquet and CSV files directly. A single-node alternative to ClickHouse or BigQuery.

## 📖 Core Concepts

- Embedded: runs inside the host process (no server).
- Columnar: column-oriented storage, fast for analytic queries.
- File: a single `.duckdb` file, or in-memory.
- Federation: queries Parquet / CSV / S3 directly.

## 💻 Code Patterns

### Node usage

```bash
yarn add @duckdb/node-api
```

```ts
import { randomUUID } from 'node:crypto';
import { DuckDBInstance } from '@duckdb/node-api';

// Open (or create) a file-backed database and connect to it.
const db = await DuckDBInstance.create('app.duckdb');
const conn = await db.connect();

await conn.run(`
  CREATE TABLE IF NOT EXISTS orders (
    id UUID,
    user_id UUID,
    amount DECIMAL(10, 2),
    created_at TIMESTAMP
  )
`);

// Bind values positionally; DuckDB stamps the row itself via now().
await conn.run(
  `INSERT INTO orders VALUES (?, ?, ?, now()::TIMESTAMP)`,
  [randomUUID(), randomUUID(), 99.5],
);

// Aggregate per user and read all result rows.
const reader = await conn.runAndReadAll(
  `SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id`,
);
const rows = reader.getRows();
```

### Python

```python
import uuid
from datetime import datetime

import duckdb

con = duckdb.connect('app.duckdb')
con.execute('''
    CREATE TABLE IF NOT EXISTS orders (
        id UUID, user_id UUID, amount DECIMAL(10, 2), created_at TIMESTAMP
    )
''')
con.execute('INSERT INTO orders VALUES (?, ?, ?, ?)',
            [str(uuid.uuid4()), str(uuid.uuid4()), 99.50, datetime.now()])
df = con.execute('SELECT * FROM orders').df()  # pandas DataFrame (requires pandas)
```

### Querying Parquet directly

```sql
-- Query the file in place (no import step)
SELECT * FROM 'data.parquet';

-- Multiple files via glob
SELECT * FROM 'data/*.parquet';

-- Hive-partitioned layout; partition keys become columns
SELECT * FROM read_parquet('data/year=*/month=*/*.parquet', hive_partitioning = true);

-- Aggregate across files
SELECT date, count(*) FROM 'events_*.parquet' GROUP BY date;
```

### S3 / HTTP

```sql
INSTALL httpfs;
LOAD httpfs;

SELECT * FROM 's3://bucket/data.parquet';
SELECT * FROM 'https://example.com/data.csv';

-- Credentials (set these before querying private buckets)
SET s3_region = 'us-east-1';
SET s3_access_key_id = '...';
SET s3_secret_access_key = '...';
-- Recent DuckDB versions also support CREATE SECRET for S3 credentials.
```

### Iceberg / Delta

```sql
INSTALL iceberg;
LOAD iceberg;
SELECT * FROM iceberg_scan('s3://bucket/orders');

INSTALL delta;
LOAD delta;
SELECT * FROM delta_scan('s3://bucket/orders');
```

→ Query the lakehouse directly. Suits small clusters or dev environments.

### Postgres direct (federation)

```sql
INSTALL postgres;
LOAD postgres;

ATTACH 'postgresql://user:pw@host/db' AS pg (TYPE postgres);

SELECT * FROM pg.public.users WHERE created_at > '2026-01-01';
-- Filters are pushed down to Postgres where the extension supports it.
```

→ Combine data that lives in Postgres with DuckDB's analytic engine.

### CSV / JSON

```sql
SELECT * FROM read_csv('data.csv', header = true);
SELECT * FROM read_csv_auto('data.csv');  -- automatic schema detection

SELECT * FROM read_json('data.json');
SELECT * FROM read_json_auto('data.ndjson');
```

### Window / analytic

```sql
SELECT
  user_id,
  amount,
  SUM(amount) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total,
  LAG(amount) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_amount,
  RANK() OVER (ORDER BY amount DESC) AS amount_rank
FROM orders;
```

→ Window functions are fully supported.

### MotherDuck (managed)

```sql
ATTACH 'md:my_database' AS cloud;

SELECT * FROM cloud.orders WHERE date > '2026-05-01';

-- Mix local and cloud data
SELECT * FROM local_orders
UNION ALL
SELECT * FROM cloud.orders;
```

→ DuckDB in the cloud: local queries combined with cloud-hosted data.

### Use case

```
1. ETL / data prep:
   - Convert to Parquet
   - Compute aggregates
   - Integrate with dbt
2. Local analytics:
   - Analyze large CSV files
   - Inside notebooks
3. Embedded analytics in an app:
   - Small to medium datasets (up to ~100 GB)
   - Fast queries (BigQuery-like, but local)
4. Test fixture:
   - dbt local dev
   - Validate production analytic models
5. Edge analytics:
   - Cloudflare D1 alternative (for analytics)
```

### Unsuitable use cases

```
- OLTP (transactional, write-heavy, concurrent)
- Very large data (TB+): use Snowflake / BigQuery
- Workloads that need a distributed cluster
```

### Performance

```
Rough figures; they depend on hardware, data, and query shape.
1B-row aggregate: ~10 s on a laptop.
10M-row complex query: milliseconds.
vs Postgres: 5-50x faster on analytic queries.
vs Pandas: more memory-efficient, parallel execution.
```

### vs SQLite

```
SQLite: OLTP, row-oriented.
DuckDB: OLAP, columnar.
DuckDB can read SQLite files.
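```

```sql
INSTALL sqlite;
LOAD sqlite;

ATTACH 'app.sqlite' AS s (TYPE SQLITE);
SELECT * FROM s.users;
```

### Concurrent access

```
A database file can be opened read-write by only one process at a time.
Several processes can share it only when all of them open it read-only.
A second writer process fails on the file lock.
→ Use a single writer process, or coordinate access externally.
```

### React (browser)

```ts
import * as duckdb from '@duckdb/duckdb-wasm';

// Pick the best bundle for this browser from the jsDelivr CDN.
const bundles = duckdb.getJsDelivrBundles();
const bundle = await duckdb.selectBundle(bundles);

// Browsers block cross-origin worker scripts, so wrap the CDN worker in a same-origin Blob.
const workerUrl = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker!}");`], { type: 'text/javascript' }),
);
const worker = new Worker(workerUrl);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(workerUrl);

const conn = await db.connect();
const table = await conn.query(`SELECT * FROM 'https://example.com/data.parquet'`); // Arrow table
```

→ SQL analytics inside the browser, using the WASM build.

### Dataframe-like API

```python
import duckdb

con = duckdb.connect('app.duckdb')
con.sql('SELECT * FROM orders').df()     # pandas
con.sql('SELECT * FROM orders').arrow()  # PyArrow
con.sql('SELECT * FROM orders').pl()     # polars
```

### CLI

```bash
duckdb my.duckdb
# Inside the DuckDB shell (the prompt is "D"):
D SELECT * FROM 'data.parquet';
D .mode json
D SELECT * FROM users LIMIT 5;
D .schema users
```

### Persistent vs in-memory

```ts
import { DuckDBInstance } from '@duckdb/node-api';

const memDb = await DuckDBInstance.create(':memory:'); // in-memory, lost on exit
const fileDb = await DuckDBInstance.create('app.db'); // persisted to a file
```

### Migration / schema

```sql
-- DuckDB supports ordinary DDL
ALTER TABLE orders ADD COLUMN status VARCHAR;
CREATE INDEX orders_user ON orders(user_id);

-- Constraints
CREATE TABLE users (
  id UUID PRIMARY KEY,
  email VARCHAR UNIQUE NOT NULL CHECK (email LIKE '%@%')
);
```

### Backup

```bash
# Simple: copy the file while no process is writing to it
cp app.duckdb app.duckdb.bak

# Or export to a directory and import into a fresh database
# (restored.duckdb is an example name)
duckdb app.duckdb -c "EXPORT DATABASE 'export_dir';"
duckdb restored.duckdb -c "IMPORT DATABASE 'export_dir';"
```

## 🤔 Decision Criteria

| Situation | Recommendation |
|---|---|
| Embedded analytics | DuckDB |
| Large CSV / Parquet analysis | DuckDB |
| ETL / dbt local | DuckDB |
| OLTP | Postgres / SQLite |
| Distributed / TB+ | Snowflake / BigQuery / ClickHouse |
| Browser analytics | DuckDB-wasm |
| Edge | Cloudflare D1 (SQLite) |

## ❌ Anti-patterns

- **Write-heavy OLTP**: SQLite or Postgres is a better fit.
- **Concurrent writers**: lock contention.
- **Everything in-memory with a large dataset**: risks OOM. Use a file-backed database.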
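- **Schema drift (auto-detecting the schema on every read)**: pin an explicit schema.
- **Expecting indexes to speed up large joins**: DuckDB's indexes mainly help highly selective lookups and constraint checks, not joins. Filter and project early instead.
- **No backup**: losing the file means losing the data for good.

## 🤖 LLM Usage Hints

- ETL / analytics / dbt local = DuckDB.
- Query Parquet / S3 / Iceberg directly.
- Postgres + DuckDB federation.
- WASM = SQL inside the browser.

## 🔗 Related Documents

- [[DB_ClickHouse_OLAP]]
- [[Data_Eng_Lakehouse]]
- [[Data_Eng_dbt]]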
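
One way to avoid the schema-drift anti-pattern listed above is to declare column names and types once, rather than letting the CSV reader infer them on every read. A minimal sketch, assuming a hypothetical `events.csv` with three columns; the file name, table name, and column list are illustrative and not taken from a real dataset:

```sql
-- Hypothetical file and columns; adjust to the real data.
-- Passing `columns` fixes names and types up front instead of inferring them per read.
CREATE TABLE events_raw AS
SELECT *
FROM read_csv(
    'events.csv',
    header = true,
    columns = {
        'event_date': 'DATE',
        'user_id': 'VARCHAR',
        'amount': 'DOUBLE'
    }
);
```

With the types pinned, a file whose values no longer fit the declared schema should fail at load time instead of silently changing the table's column types.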