[G1-Sync] Manual knowledge update
@@ -0,0 +1,330 @@
---
id: data-eng-dbt
title: dbt — SQL Transform / Test / Doc
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [data-engineering, dbt, sql, vibe-coding]
tech_stack: { language: "SQL / Jinja", applicable_to: ["Data Engineering"] }
applied_in: []
aliases: [dbt, dbt-core, model, source, seed, snapshot, macro]
---

# dbt (data build tool)

> The modern standard for SQL transformation. **Model = SELECT, ref / source = lineage, test = data quality, docs = automatic**. dbt is the T in ELT (Extract, Load, Transform).

## 📖 Core Concepts

- Model: a `.sql` file containing one SELECT, materialized as a table or view.
- Source: an external raw table.
- Ref: a reference to another model.
- Test: a data quality check.
- Snapshot: SCD Type 2 history.

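To make the `ref` idea concrete, here is a minimal Python sketch of how lineage could be derived from `ref()` calls and turned into a build order. It is not dbt's actual parser: the regex, the `models` dict, and the helper names `find_refs` and `build_order` are assumptions made for illustration.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model sources keyed by model name (not read from a real project).
models = {
    "stg_orders": "select * from {{ source('raw', 'orders') }}",
    "stg_users": "select * from {{ source('raw', 'users') }}",
    "dim_users": "select * from {{ ref('stg_users') }}",
    "fct_orders": (
        "select * from {{ ref('stg_orders') }} o "
        "left join {{ ref('dim_users') }} u using (user_id)"
    ),
}

# Matches {{ ref('model_name') }} with single-quoted names only.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the model names referenced through ref() in one SQL string."""
    return set(REF_PATTERN.findall(sql))


def build_order(sources: dict[str, str]) -> list[str]:
    """Order models so that every model comes after the models it refs."""
    graph = {name: find_refs(sql) for name, sql in sources.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(models)
print(order)
```

Because every dependency is declared through `ref`, the build order falls out of the SQL itself, and nothing has to be scheduled by hand.
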
## 💻 Code Patterns

### Folder structure

```
dbt_project/
├── dbt_project.yml
├── models/
│   ├── staging/
│   │   ├── stg_orders.sql
│   │   └── stg_users.sql
│   ├── marts/
│   │   ├── core/
│   │   │   ├── dim_users.sql
│   │   │   └── fct_orders.sql
│   │   └── finance/
│   │       └── revenue_daily.sql
│   └── schema.yml
├── macros/
├── tests/
├── seeds/
└── snapshots/
```

### Model

```sql
-- models/staging/stg_orders.sql
{{ config(materialized='view') }}

select
    id as order_id,
    user_id,
    amount,
    status,
    created_at
from {{ source('raw', 'orders') }}
where status != 'cancelled'
```

```sql
-- models/marts/core/fct_orders.sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    on_schema_change='fail'
) }}

select
    o.order_id,
    o.user_id,
    u.email,
    o.amount,
    o.created_at
from {{ ref('stg_orders') }} o
left join {{ ref('dim_users') }} u using (user_id)

{% if is_incremental() %}
-- Incremental runs only pick up rows newer than what the target table already holds
where o.created_at > (select max(created_at) from {{ this }})
{% endif %}
```

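The incremental pattern above can be sketched outside dbt. The following Python example uses an in-memory SQLite database to imitate the two branches: a full load when the target is empty, then a filtered load keyed on `max(created_at)`. The table layout and the `run_incremental` helper are assumptions for illustration; real adapters generate their own merge or delete+insert statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table stg_orders (order_id integer, amount real, created_at text);
    create table fct_orders (order_id integer primary key, amount real, created_at text);
    """
)


def run_incremental(conn: sqlite3.Connection) -> int:
    """Copy new rows from stg_orders into fct_orders; return rows processed."""
    high_water_mark = conn.execute("select max(created_at) from fct_orders").fetchone()[0]
    if high_water_mark is None:
        # First run: the target is empty, so take everything (full load).
        rows = conn.execute("select order_id, amount, created_at from stg_orders").fetchall()
    else:
        # Later runs: the same filter as the is_incremental() branch.
        rows = conn.execute(
            "select order_id, amount, created_at from stg_orders where created_at > ?",
            (high_water_mark,),
        ).fetchall()
    # unique_key='order_id' is emulated by replacing rows with a matching primary key.
    conn.executemany("insert or replace into fct_orders values (?, ?, ?)", rows)
    return len(rows)


conn.executemany(
    "insert into stg_orders values (?, ?, ?)",
    [(1, 10.0, "2026-05-01"), (2, 20.0, "2026-05-02")],
)
first = run_incremental(conn)   # full load of both rows

conn.execute("insert into stg_orders values (3, 30.0, '2026-05-03')")
second = run_incremental(conn)  # only the new row

total = conn.execute("select count(*) from fct_orders").fetchone()[0]
print(first, second, total)  # prints: 2 1 3
```

One consequence of this filter is worth noting: a row that arrives late with a `created_at` at or before the high-water mark is never picked up, so the filter column should be chosen with that in mind.
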
### sources / schema

```yaml
# models/staging/sources.yml
version: 2

sources:
  - name: raw
    database: production
    schema: public
    tables:
      - name: orders
        loaded_at_field: _ingested_at
        freshness:
          warn_after: { count: 1, period: hour }
          error_after: { count: 4, period: hour }
      - name: users
```

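The `freshness` block boils down to comparing the age of the newest `loaded_at_field` value against two thresholds. A minimal Python sketch of that comparison follows, assuming the thresholds are exclusive bounds; the `freshness_status` function and the `PERIODS` table are hypothetical names, not dbt internals.

```python
from datetime import datetime, timedelta, timezone

# Length of one unit for each period name used in the YAML.
PERIODS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def freshness_status(max_loaded_at, now, warn_after, error_after):
    """Classify a source as 'pass', 'warn', or 'error' from the age of its newest row."""
    age = now - max_loaded_at
    if age > error_after["count"] * PERIODS[error_after["period"]]:
        return "error"
    if age > warn_after["count"] * PERIODS[warn_after["period"]]:
        return "warn"
    return "pass"


# Same thresholds as the orders source above.
warn = {"count": 1, "period": "hour"}
error = {"count": 4, "period": "hour"}
now = datetime(2026, 5, 9, 12, 0, tzinfo=timezone.utc)

for minutes_old in (30, 120, 300):
    newest = now - timedelta(minutes=minutes_old)
    print(minutes_old, freshness_status(newest, now, warn, error))
```
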
### Tests

```yaml
# models/marts/core/schema.yml
version: 2

models:
  - name: dim_users
    description: User dimension
    columns:
      - name: user_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - not_null
          - unique
      - name: plan
        tests:
          - accepted_values:
              values: ['free', 'pro', 'enterprise']

  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: user_id
        tests:
          - relationships:
              to: ref('dim_users')
              field: user_id
```

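Each generic test above can be read as a query that returns the offending rows, and the test passes when the query returns nothing. The Python sketch below runs hand-written equivalents against an in-memory SQLite database. The SQL is an approximation written for this example, not the SQL dbt generates, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table dim_users (user_id integer, email text, plan text);
    create table fct_orders (order_id integer, user_id integer);
    insert into dim_users values (1, 'a@example.com', 'free'), (2, 'b@example.com', 'pro');
    insert into fct_orders values (10, 1), (11, 2), (12, 99);
    """
)

# Each query returns the rows that violate the rule; zero rows means the test passes.
TESTS = {
    "unique_dim_users_user_id": """
        select user_id from dim_users
        where user_id is not null
        group by user_id having count(*) > 1
    """,
    "not_null_dim_users_email": "select * from dim_users where email is null",
    "accepted_values_dim_users_plan": """
        select plan from dim_users
        where plan not in ('free', 'pro', 'enterprise')
    """,
    "relationships_fct_orders_user_id": """
        select o.user_id from fct_orders o
        left join dim_users u on o.user_id = u.user_id
        where o.user_id is not null and u.user_id is null
    """,
}


def run_tests(conn):
    """Return {test_name: number_of_failing_rows}."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in TESTS.items()}


results = run_tests(conn)
for name, failures in results.items():
    print(name, "PASS" if failures == 0 else f"FAIL ({failures})")
```

In this sample data the order with `user_id = 99` has no matching user, so only the relationships check reports a failing row.
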
### Custom test

```sql
-- tests/positive_amount.sql
-- A singular test fails if this query returns any rows
select * from {{ ref('fct_orders') }}
where amount <= 0
```

### Macro (reuse)

```sql
-- macros/clean_string.sql
{% macro clean_string(col) %}
    trim(lower({{ col }}))
{% endmacro %}

-- Usage
select {{ clean_string('email') }} as email_clean
from {{ ref('stg_users') }}
```

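A macro produces SQL text at compile time; it does not touch the data itself. The Python sketch below imitates that with a plain function that returns a SQL fragment, then runs the "compiled" statement in SQLite. The function is a stand-in for the Jinja macro, and the table and sample row are assumptions for illustration.

```python
import sqlite3


def clean_string(col: str) -> str:
    """Emulate the macro: return a SQL fragment, not a cleaned value."""
    return f"trim(lower({col}))"


# The "compiled" SQL after the macro call has been replaced by its output.
compiled_sql = f"select {clean_string('email')} as email_clean from stg_users"

conn = sqlite3.connect(":memory:")
conn.execute("create table stg_users (email text)")
conn.execute("insert into stg_users values ('  Alice@Example.COM ')")
email_clean = conn.execute(compiled_sql).fetchone()[0]

print(compiled_sql)  # select trim(lower(email)) as email_clean from stg_users
print(email_clean)   # alice@example.com
```
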
### Materialization types

```
view:        re-evaluated on every query; small or frequently changing models
table:       full rebuild on every run; small to medium models
incremental: adds only new or changed rows; large fact tables
ephemeral:   inlined as a CTE; temporary transformations
snapshot:    SCD Type 2 (history); defined in snapshots/ and run with dbt snapshot
```

### Snapshot (SCD Type 2 history)

```sql
-- snapshots/users_snapshot.sql
{% snapshot users_snapshot %}
{{ config(
    target_schema='snapshots',
    unique_key='user_id',
    strategy='check',
    check_cols=['email', 'plan'],
) }}
select user_id, email, plan, updated_at from {{ source('raw', 'users') }}
{% endsnapshot %}
```

```bash
dbt snapshot
```

→ Changes are tracked through `dbt_valid_from` and `dbt_valid_to`. By default the current version of a row has `dbt_valid_to` set to null.

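The `check` strategy can be pictured as follows: on each run, compare the checked columns of the source row with the currently open history row, close the old version if they differ, and append a new one. This is a simplified Python sketch under that reading. `apply_snapshot` is a hypothetical helper, and it ignores deleted rows and the extra metadata columns dbt maintains.

```python
from datetime import datetime


def apply_snapshot(history, current_rows, check_cols, now):
    """Apply one snapshot run using a 'check' style comparison.

    history: list of dicts carrying dbt_valid_from / dbt_valid_to.
    current_rows: the source rows as of this run, one dict per user_id.
    """
    # Open rows are the current versions (dbt_valid_to is None).
    open_rows = {r["user_id"]: r for r in history if r["dbt_valid_to"] is None}
    for row in current_rows:
        existing = open_rows.get(row["user_id"])
        if existing is not None and all(existing[c] == row[c] for c in check_cols):
            continue  # nothing changed in the checked columns
        if existing is not None:
            existing["dbt_valid_to"] = now  # close the old version
        history.append({**row, "dbt_valid_from": now, "dbt_valid_to": None})
    return history


history = []
t1 = datetime(2026, 5, 1)
t2 = datetime(2026, 5, 2)
cols = ["email", "plan"]

apply_snapshot(history, [{"user_id": 1, "email": "a@example.com", "plan": "free"}], cols, t1)
apply_snapshot(history, [{"user_id": 1, "email": "a@example.com", "plan": "pro"}], cols, t2)

for r in history:
    print(r["user_id"], r["plan"], r["dbt_valid_from"], r["dbt_valid_to"])
```

After the second run the `free` row is closed at `t2` and the `pro` row is open, which is the shape a point-in-time query relies on.
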
### Seed (small lookup table)

`seeds/country_codes.csv` (CSV has no comment syntax, so the file starts with the header row):

```csv
country_code,country_name
US,United States
KR,Korea
JP,Japan
```

```bash
dbt seed
```

```sql
select * from {{ ref('country_codes') }}
```

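Conceptually, `dbt seed` turns the CSV into a table that models can then `ref`. A rough Python sketch of that step, using SQLite and an inline copy of the CSV, is shown below. The `load_seed` helper is hypothetical and stores every column as text, whereas dbt infers column types.

```python
import csv
import io
import sqlite3

# Inline stand-in for seeds/country_codes.csv.
SEED_CSV = """country_code,country_name
US,United States
KR,Korea
JP,Japan
"""


def load_seed(conn, table, csv_text):
    """Create a table from CSV text, using the header row as column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    columns = ", ".join(f'"{name}" text' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'create table "{table}" ({columns})')
    conn.executemany(f'insert into "{table}" values ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_seed(conn, "country_codes", SEED_CSV)
name = conn.execute(
    "select country_name from country_codes where country_code = 'KR'"
).fetchone()[0]
print(loaded, name)  # prints: 3 Korea
```
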
### Run / test

```bash
dbt run                        # build all models
dbt run --select stg_orders+   # stg_orders and everything downstream
dbt run --select +dim_users    # dim_users and everything upstream

dbt test
dbt test --select dim_users

# An incremental model's first build is a full load; later builds are incremental
dbt build                      # run + test in one command (also seeds and snapshots)
```

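The `+` selectors are graph traversals over the lineage. The Python sketch below resolves `model+` and `+model` against a small hand-written dependency map. The `PARENTS` dict and the helper functions are assumptions for illustration and cover only this simple form of the selector syntax.

```python
# Hypothetical lineage: each model maps to the models it depends on (its parents).
PARENTS = {
    "stg_orders": set(),
    "stg_users": set(),
    "dim_users": {"stg_users"},
    "fct_orders": {"stg_orders", "dim_users"},
    "revenue_daily": {"fct_orders"},
}


def upstream(model):
    """All ancestors of a model (the '+model' part of a selector)."""
    seen = set()
    stack = list(PARENTS[model])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def downstream(model):
    """All descendants of a model (the 'model+' part of a selector)."""
    return {name for name in PARENTS if model in upstream(name)}


def select(selector):
    """Resolve 'model', '+model', 'model+' or '+model+' to a set of model names."""
    name = selector.strip("+")
    selected = {name}
    if selector.startswith("+"):
        selected |= upstream(name)
    if selector.endswith("+"):
        selected |= downstream(name)
    return selected


print(sorted(select("stg_orders+")))  # ['fct_orders', 'revenue_daily', 'stg_orders']
print(sorted(select("+dim_users")))   # ['dim_users', 'stg_users']
```
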
### Docs

```bash
dbt docs generate
dbt docs serve   # web UI

# Lineage graph + column descriptions
```

### CI

```yaml
- run: dbt deps
- run: dbt seed --target ci
- run: dbt run --target ci
- run: dbt test --target ci
- run: dbt source freshness   # source freshness check
```

```yaml
# .github/workflows/dbt.yml
- run: dbt build --select state:modified+ --defer --state ./prod-manifest
```

→ Builds only the modified models and their downstream dependents. `--state` points at a directory holding the production `manifest.json`.

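The idea behind `state:modified+` can be sketched as a diff between two manifests followed by a downstream walk. The Python example below uses plain dicts as stand-ins for the per-node checksums in `manifest.json`. The helper names and checksum values are made up, and real state comparison looks at more than file contents.

```python
def modified_nodes(prod_checksums, current_checksums):
    """Nodes that are new or whose checksum differs from the production state."""
    return {
        name
        for name, checksum in current_checksums.items()
        if prod_checksums.get(name) != checksum
    }


def with_downstream(nodes, children):
    """Expand a set of nodes with everything that depends on them."""
    selected = set(nodes)
    stack = list(nodes)
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in selected:
                selected.add(child)
                stack.append(child)
    return selected


# Hypothetical checksums standing in for the per-node checksums in manifest.json.
prod = {"stg_orders": "aaa", "stg_users": "bbb", "dim_users": "ccc", "fct_orders": "ddd"}
current = {"stg_orders": "aaa", "stg_users": "bbb", "dim_users": "ccc-edited", "fct_orders": "ddd"}

# Hypothetical lineage: each model maps to the models that depend on it.
children = {"stg_orders": ["fct_orders"], "stg_users": ["dim_users"], "dim_users": ["fct_orders"]}

to_build = with_downstream(modified_nodes(prod, current), children)
print(sorted(to_build))  # ['dim_users', 'fct_orders']
```

With `--defer`, unselected upstream models such as `stg_users` are resolved against the production environment instead of being rebuilt in CI.
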
### Adapter (warehouse)

```
dbt-postgres, dbt-snowflake, dbt-bigquery, dbt-redshift, dbt-databricks, dbt-duckdb
```

→ The same project code can target a different warehouse by swapping the adapter. Warehouse-specific SQL and configs still need attention.

### profiles.yml (connection)

```yaml
my_project:
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dev
      password: "{{ env_var('DB_PW') }}"
      dbname: dev
      schema: dbt_dev
    prod:
      type: postgres
      host: prod.example.com
      user: dbt_prod
      schema: analytics
      # port, password, dbname: same shape as dev (omitted here)
  target: dev
```

### Performance

```sql
-- Incremental + clustering / partition
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    partition_by={'field': 'date', 'data_type': 'date'},
    cluster_by=['user_id'],
) }}
```

→ `cluster_by` applies to both BigQuery and Snowflake; `partition_by` in this dict form is a BigQuery config.

### dbt Cloud / Mesh (large organizations)

- Managed dbt runs.
- Cross-project references (dbt Mesh).
- Browser-based IDE.
- Scheduling.

→ Alternatively, have Airflow / Dagster invoke dbt.

### Dagster + dbt

```python
from pathlib import Path

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets


@dbt_assets(manifest=Path('target/manifest.json'))
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Run `dbt build` and stream its events back to Dagster as they arrive
    yield from dbt.cli(['build'], context=context).stream()
```

→ Each dbt model is automatically exposed as a Dagster asset.

## 🤔 Decision Criteria

| Task | Recommendation |
|---|---|
| SQL transform | dbt |
| Python ML pipeline | Dagster / Airflow |
| Streaming | Flink / Spark Structured Streaming |
| Schema migration | dbt-snapshot or dbt-history |
| Small / single SQL | Run it directly |
| Large organization | dbt Cloud / Dagster |

## ❌ Anti-patterns

- **Every model `materialized='table'`**: full rebuilds are expensive. Use incremental for large tables.
- **Models without tests**: data quality goes unchecked.
- **No source freshness checks**: stale data goes unnoticed.
- **No schema.yml**: no column descriptions.
- **Macro overuse**: hurts readability.
- **Running `dbt run` directly against production**: no scheduler in control.
- **Large seeds (>10MB)**: load them through an ETL pipeline instead.

## 🤖 LLM Usage Hints

- Separate staging / marts as the standard layout.
- Always use ref / source.
- Tests (unique, not_null, relationships) are mandatory.
- Use incremental for large tables.

## 🔗 Related Documents

- [[Data_Eng_Airflow_Dagster]]
- [[Data_Eng_Lakehouse]]
- [[DB_ClickHouse_OLAP]]