---
id: wiki-2026-0508-snowflake-data-warehousing
title: Snowflake Data Warehousing
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Snowflake, Snowflake DW, Snowflake Cloud Data Platform]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [database, data-warehouse, cloud, analytics]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: sql
  framework: snowflake
---

# Snowflake Data Warehousing

## In one line

> **"Storage is separated; compute is elastic."** Snowflake is a cloud data warehouse built on a multi-cluster, shared-data architecture: micro-partitioned columnar storage, virtual warehouses, zero-copy clones, Time Travel, and native Iceberg tables (2026). Its main competitors are Databricks, BigQuery, and Redshift.

## Core concepts

### Architecture (3 layers)

- **Storage**: micro-partitions (50–500 MB of uncompressed data each) kept in S3/GCS/Azure Blob in a compressed columnar format (FDN).
- **Compute (Virtual Warehouses)**: independent compute clusters sized X-Small through 6X-Large, billed per second (with a 60-second minimum each time a warehouse starts).
- **Cloud Services**: metadata, query optimization, and authentication; the stateless "brain" of the system.

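The per-second billing of virtual warehouses can be made concrete with a small cost estimate. The sketch below assumes standard warehouses whose credit rate doubles with each size step starting from 1 credit per hour at X-Small, plus a 60-second minimum per start; the function name `credits_used` is hypothetical, and actual rates should be checked against current Snowflake pricing.

```python
# Rough credit estimate for a virtual warehouse run.
# Assumption: standard warehouses, credits/hour double with each size step.
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large",
         "2X-Large", "3X-Large", "4X-Large", "5X-Large", "6X-Large"]
CREDITS_PER_HOUR = {size: 2 ** i for i, size in enumerate(SIZES)}

MIN_BILLED_SECONDS = 60  # each start or resume is billed for at least 60 seconds


def credits_used(size: str, run_seconds: float, clusters: int = 1) -> float:
    """Estimate credits for one continuous run of `clusters` clusters of `size`."""
    billed_seconds = max(run_seconds, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


if __name__ == "__main__":
    print(credits_used("Medium", 3600))   # one hour on a Medium warehouse
    print(credits_used("X-Small", 10))    # a 10-second query still bills 60 seconds
```

Under these assumptions one hour on a Medium warehouse costs 4 credits, which is why short idle timeouts and right-sized warehouses matter more than raw query speed.
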
### Key features

- **Zero-copy clone**: instant copy of a database, schema, or table via metadata only.
- **Time Travel**: query data as of a past point in time; retention defaults to 1 day and can be extended up to 90 days on Enterprise Edition or higher.
- **Streams + Tasks**: change data capture (CDC) plus scheduled SQL, which together form a native pipeline.
- **Snowpark**: Python/Scala/Java compute that runs inside the database.
- **Iceberg tables (2026)**: support for the open Apache Iceberg table format on external storage.
- **Cortex AI**: built-in LLM functions.

### Applications

1. Analytical workloads (OLAP, BI).
2. Data sharing (Secure Data Sharing, with no data copied).
3. ELT with dbt.
4. ML feature engineering (Snowpark + Cortex).

## 💻 Patterns

### Warehouse sizing & auto-suspend

```sql
CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60            -- suspend after 60 seconds idle
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1        -- multi-cluster settings require Enterprise Edition or higher
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';
```

### Copy from S3 (bulk load)

```sql
-- External stage over the S3 path, authenticated through a storage integration
CREATE STAGE my_stage URL='s3://bucket/path/' STORAGE_INTEGRATION = my_int;

-- Load Parquet files, mapping file columns to table columns by name
COPY INTO orders
  FROM @my_stage/orders/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
  ON_ERROR = 'CONTINUE';
```

### Zero-copy clone for testing

```sql
-- Instant; no additional storage until data changes (copy-on-write)
CREATE DATABASE prod_clone CLONE prod;
-- Common pattern for dbt CI
```

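The copy-on-write behaviour behind zero-copy clones can be illustrated with a toy model. This is a conceptual sketch only, not Snowflake's implementation: the `Partition` and `Table` classes and the `storage_units` helper are hypothetical names, and a table is modelled simply as a list of references to immutable partitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Partition:
    """An immutable chunk of rows (stand-in for a micro-partition)."""
    rows: tuple


@dataclass
class Table:
    partitions: list = field(default_factory=list)

    def clone(self) -> "Table":
        # Copy only the list of references; no partition data is duplicated.
        return Table(partitions=list(self.partitions))

    def update_partition(self, index: int, new_rows) -> None:
        # Writes never modify a partition in place; they swap in a new one.
        self.partitions[index] = Partition(tuple(new_rows))


def storage_units(*tables: Table) -> int:
    """Count distinct partition objects referenced by the given tables."""
    return len({id(p) for t in tables for p in t.partitions})


if __name__ == "__main__":
    prod = Table([Partition((1, 2)), Partition((3, 4))])
    dev = prod.clone()
    print(storage_units(prod, dev))   # clone shares every partition with prod
    dev.update_partition(0, [9, 9])
    print(storage_units(prod, dev))   # only the rewritten partition is new
```

In this model the clone costs nothing until one side writes, and a write on the clone leaves the original table untouched, which is the property that makes clones convenient for CI and testing.
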
### Time travel + undrop

```sql
SELECT * FROM orders AT (OFFSET => -60*5);            -- 5 minutes ago
SELECT * FROM orders BEFORE (STATEMENT => '01a...');  -- state just before the given query ID ran
UNDROP TABLE orders;                                  -- works only within the retention period
```

### Streams + Tasks (CDC pipeline)

```sql
CREATE STREAM orders_stream ON TABLE orders;

CREATE TASK orders_etl
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  -- Note: SELECT * on a stream also returns the METADATA$ columns,
  -- so orders_silver needs matching columns (or list the columns explicitly).
  INSERT INTO orders_silver
  SELECT *, CURRENT_TIMESTAMP() AS ingest_ts
  FROM orders_stream;

ALTER TASK orders_etl RESUME;  -- tasks are created in a suspended state
```

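The reason the task above can run repeatedly without reprocessing rows is that a stream behaves like an offset into the table's change history, and consuming it in a DML statement advances that offset. The toy model below illustrates the idea; it is a simplified sketch, not Snowflake's actual mechanism, and the `ChangeLog` and `Stream` classes are hypothetical.

```python
class ChangeLog:
    """Append-only list of row changes on a source table."""

    def __init__(self):
        self.entries = []

    def append(self, row: dict) -> None:
        self.entries.append(row)


class Stream:
    """Tracks an offset into a ChangeLog, loosely like a stream on a table."""

    def __init__(self, log: ChangeLog):
        self.log = log
        self.offset = len(log.entries)  # starts at the current table version

    def has_data(self) -> bool:
        # Analogous in spirit to SYSTEM$STREAM_HAS_DATA
        return self.offset < len(self.log.entries)

    def consume(self) -> list:
        # Returning the pending changes also advances the offset,
        # so the same change is never handed out twice.
        rows = self.log.entries[self.offset:]
        self.offset = len(self.log.entries)
        return rows


if __name__ == "__main__":
    log = ChangeLog()
    stream = Stream(log)
    log.append({"order_id": 1, "amount": 120})
    if stream.has_data():
        print(stream.consume())
    print(stream.has_data())
```

The `WHEN` guard in the task plays the role of `has_data()`: the scheduled run is skipped when no changes are pending, so no warehouse time is spent on it.
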
### Snowpark Python (in-DB compute)

```python
from snowflake.snowpark import Session, functions as F

# cfg: dict of connection parameters (account, user, warehouse, ...), defined elsewhere
sess = Session.builder.configs(cfg).create()

# Total order amount per customer, counting only orders above 100
df = (
    sess.table('orders')
    .filter(F.col('amount') > 100)
    .group_by('customer_id')
    .agg(F.sum('amount').alias('total'))
)
df.write.save_as_table('top_customers', mode='overwrite')
```

### Cortex AI (LLM in SQL)

```sql
SELECT order_id,
       SNOWFLAKE.CORTEX.SUMMARIZE(review_text) AS summary,
       SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM reviews;

-- Free-text classify
SELECT SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
         ticket_body,
         ['billing','technical','refund','other']
       ) FROM tickets;
```

### Iceberg external table (2026)

```sql
CREATE ICEBERG TABLE events
  CATALOG = my_glue
  EXTERNAL_VOLUME = my_s3_vol
  CATALOG_TABLE_NAME = 'analytics.events';
-- Snowflake, Spark, and Trino all read the same data.
```

### Cost optimization (Resource Monitor)

```sql
CREATE RESOURCE MONITOR rm_dev
  WITH CREDIT_QUOTA = 100
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = rm_dev;
```

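The trigger logic of the resource monitor above amounts to comparing the percentage of quota used against each threshold. The following is a minimal sketch of that evaluation, assuming the same quota and thresholds as the SQL example; `triggered_actions` is a hypothetical helper, not part of any Snowflake API.

```python
# Thresholds mirror the SQL example: notify at 80%, suspend at 100%.
TRIGGERS = [(80, "NOTIFY"), (100, "SUSPEND")]


def triggered_actions(credit_quota: float, credits_used: float,
                      triggers=TRIGGERS) -> list:
    """Return the actions whose percentage threshold has been reached."""
    percent_used = 100 * credits_used / credit_quota
    return [action for threshold, action in sorted(triggers)
            if percent_used >= threshold]


if __name__ == "__main__":
    for used in (50, 85, 100):
        print(used, triggered_actions(100, used))
```

Running the numbers this way before choosing a quota helps confirm that the notify threshold leaves enough headroom to react before the suspend threshold is hit.
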
## Decision criteria

| Situation | Choice |
|---|---|
| BI / dashboards | Snowflake + dbt |
| Open lakehouse | Iceberg + Snowflake/Databricks |
| Spark-heavy ML | Databricks |
| GCP-native | BigQuery |
| Sub-second OLAP | ClickHouse / Druid |
| Tiny data (<100 GB) | Postgres + DuckDB |

**Default**: Snowflake + dbt + Iceberg (open + managed).

## 🔗 Graph

- Parent: [[Data Warehouse]] · [[Cloud Native]]
- Variant: [[ClickHouse]]
- Application: [[Feature Store]]
- Adjacent: [[Apache Iceberg]] · [[dbt]] · [[Principles of Data Connect]]

## 🤖 LLM usage

**When**: SQL tuning suggestions, dbt model scaffolding, Cortex function selection.
**When not**: running LLM-generated queries directly in production; review them with EXPLAIN and a governance check first.

## ❌ Anti-patterns

- **Always-on warehouse**: without AUTO_SUSPEND set, costs balloon.
- **SELECT * on a wide table**: throws away the benefit of columnar storage.
- **One huge warehouse**: no workload isolation, so ETL and BI contend for the same resources.
- **No clustering on a huge table**: partition pruning becomes ineffective on the filter columns, which leads to full scans.
- **Copying data instead of using a Data Share**: incurs governance and cost penalties.

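The clustering anti-pattern is easiest to see with a small simulation of min/max pruning. The sketch below is an illustrative model, not Snowflake's optimizer: each partition keeps the minimum and maximum of one column, and a range filter can skip any partition whose range does not overlap the predicate. The helper names `build_partitions` and `partitions_scanned` are hypothetical.

```python
def build_partitions(values, size):
    """Split values into fixed-size chunks and record (min, max) per chunk."""
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    return [(min(chunk), max(chunk)) for chunk in chunks]


def partitions_scanned(partitions, lo, hi):
    """Count partitions whose [min, max] range overlaps the filter [lo, hi]."""
    return sum(1 for pmin, pmax in partitions if pmax >= lo and pmin <= hi)


# 100 distinct values in a scattered order (37 is coprime with 100)
scattered = [(i * 37) % 100 for i in range(100)]

unclustered = build_partitions(scattered, 10)
clustered = build_partitions(sorted(scattered), 10)

if __name__ == "__main__":
    # Equivalent of: WHERE col BETWEEN 20 AND 29
    print("clustered:  ", partitions_scanned(clustered, 20, 29))
    print("unclustered:", partitions_scanned(unclustered, 20, 29))
```

With sorted data the filter touches a single partition, while the scattered layout forces far more partitions to be read because almost every partition's min/max range overlaps the predicate.
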
## 🧪 Verification / duplicates

- Verified (Snowflake docs 2026; Dageville et al. SIGMOD 2016; *Snowflake: The Definitive Guide* 2nd ed).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (architecture + 9 patterns) |