[G1-Sync] Manual knowledge update
@@ -2,24 +2,179 @@
id: wiki-2026-0508-snowflake-data-warehousing
title: Snowflake Data Warehousing
category: 10_Wiki/Topics
status: merged
redirect_to: 데이터_엔지니어링_및_가상_인프라_표준
canonical_id: wiki-2026-0508-001
aliases: []
status: verified
canonical_id: self
aliases: [Snowflake, Snowflake DW, Snowflake Cloud Data Platform]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
tags: [uncategorized]
confidence_score: 0.9
verification_status: applied
tags: [database, data-warehouse, cloud, analytics]
raw_sources: []
last_reinforced: 2026-05-08
last_reinforced: 2026-05-10
github_commit: pending
inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
tech_stack:
  language: unspecified
  framework: unspecified
  language: sql
  framework: snowflake
---

# Redirect
# Snowflake Data Warehousing

This document has been merged into the canonical document [[데이터_엔지니어링_및_가상_인프라_표준]].
See the link above for all current knowledge and details.

## One-liner
> **"Storage separated, compute elastic."** Snowflake is a cloud data warehouse built on a multi-cluster shared-data architecture: columnar micro-partition storage, virtual warehouses, zero-copy clone, Time Travel, and native Iceberg support (2026). It competes with the big three alternatives: Databricks, BigQuery, and Redshift.

## Core

### Architecture (3 layers)
- **Storage**: compressed, columnar micro-partitions (50–500 MB of uncompressed data each, FDN format) on S3/GCS/Azure Blob.
- **Compute (Virtual Warehouses)**: independent compute clusters, X-Small through 6X-Large, billed per second (60-second minimum each time a warehouse starts).
- **Cloud Services**: metadata, query optimization, and authentication; the stateless "brain".

### Key features
- **Zero-copy clone**: instant copy of a database, schema, or table via metadata only.
- **Time Travel**: query data as of a past point in time; retention defaults to 1 day and can be raised to 90 days on Enterprise Edition or higher.
- **Streams + Tasks**: CDC plus scheduled SQL form a native pipeline.
- **Snowpark**: Python/Scala/Java compute inside the database.
- **Iceberg tables (2026)**: open table format on external storage.
- **Cortex AI**: built-in LLM functions.

### Applications
1. Analytical workloads (OLAP, BI).
2. Data sharing (Secure Data Sharing, no data copied).
3. ELT with dbt.
4. ML feature engineering (Snowpark + Cortex).

## 💻 Patterns

### Warehouse sizing & auto-suspend
```sql
CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60          -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4      -- multi-cluster requires Enterprise Edition or higher
  SCALING_POLICY = 'STANDARD';
```
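To make the sizing trade-off concrete, the sketch below estimates the credits consumed by one resume, work, auto-suspend cycle. It assumes standard warehouses, where credits per hour double with each size step starting from 1 for X-Small, and per-second billing with a 60-second minimum per start. `credits_for_run` and its parameters are hypothetical names used for illustration, not a Snowflake API.

```python
# Credits per hour for standard warehouses; each size doubles the previous one.
CREDITS_PER_HOUR = {
    "XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16,
    "2XLARGE": 32, "3XLARGE": 64, "4XLARGE": 128, "5XLARGE": 256, "6XLARGE": 512,
}


def credits_for_run(size, busy_seconds, idle_seconds=0, clusters=1):
    """Estimate credits for one resume -> work -> auto-suspend cycle (illustrative)."""
    # Billing is per second, but never less than 60 seconds per start.
    billed_seconds = max(busy_seconds + idle_seconds, 60)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


# A 30-second query on MEDIUM, followed by the 60-second AUTO_SUSPEND window.
print(credits_for_run("MEDIUM", busy_seconds=30, idle_seconds=60))
# One hour with all 4 clusters of the multi-cluster warehouse running.
print(credits_for_run("MEDIUM", busy_seconds=3600, clusters=4))
```

Under these assumptions, the idle tail before auto-suspend can cost more than a short query itself, which is why a low `AUTO_SUSPEND` matters for bursty workloads.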

### Copy from S3 (bulk load)
```sql
CREATE STAGE my_stage URL='s3://bucket/path/' STORAGE_INTEGRATION = my_int;
COPY INTO orders
  FROM @my_stage/orders/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE  -- map Parquet columns to table columns by name
  ON_ERROR = 'CONTINUE';
```

### Zero-copy clone for testing
```sql
-- Instant; no additional storage until the data diverges (copy-on-write)
CREATE DATABASE prod_clone CLONE prod;
-- Common pattern for dbt CI
```
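The copy-on-write behaviour behind the clone can be pictured with a toy model. This is a simplified sketch, not Snowflake's implementation: it only assumes that micro-partitions are immutable and that a clone initially references the same partitions as its source. The `Table` class and `storage_units` helper are hypothetical.

```python
class Table:
    """Toy table: an ordered list of references to immutable partitions."""

    def __init__(self, partitions=None):
        self.partitions = list(partitions or [])

    def clone(self):
        # Metadata-only copy: the new table points at the same partition objects.
        return Table(self.partitions)

    def append(self, rows):
        # New data always lands in a new immutable partition.
        self.partitions.append(tuple(rows))


def storage_units(*tables):
    """Count distinct partitions stored across all given tables."""
    return len({id(p) for t in tables for p in t.partitions})


prod = Table()
prod.append([1, 2])
prod.append([3, 4])

prod_clone = prod.clone()
shared = storage_units(prod, prod_clone)   # clone adds no storage

prod_clone.append([5])                     # only the divergence is stored
after_write = storage_units(prod, prod_clone)
print(shared, after_write, len(prod.partitions))
```

Writes to the clone never touch the source table, which is what makes the pattern safe for CI runs against production-shaped data.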

### Time travel + undrop
```sql
SELECT * FROM orders AT (OFFSET => -60*5);            -- 5 minutes ago
SELECT * FROM orders BEFORE (STATEMENT => '01a...');  -- state just before the given query ID
UNDROP TABLE orders;                                  -- only within the retention period
```

### Streams + Tasks (CDC pipeline)
```sql
CREATE STREAM orders_stream ON TABLE orders;
CREATE TASK orders_etl
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  -- Note: SELECT * on a stream also returns the METADATA$ columns;
  -- list columns explicitly if orders_silver does not have them.
  INSERT INTO orders_silver
  SELECT *, CURRENT_TIMESTAMP() AS ingest_ts
  FROM orders_stream;
ALTER TASK orders_etl RESUME;  -- tasks are created suspended
```
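The interplay between the stream and the task can be sketched offline. The model below is a deliberate simplification: it treats a stream as an offset into a table's change log that advances when the changes are consumed by DML. The `Stream` class is hypothetical and only mirrors the roles of `SYSTEM$STREAM_HAS_DATA` and the `INSERT ... SELECT` in the task above.

```python
class Stream:
    """Toy model: a stream is an offset into the source table's change log."""

    def __init__(self, change_log):
        self.change_log = change_log
        # A new stream starts at the current table version, so it begins empty.
        self.offset = len(change_log)

    def has_data(self):
        return self.offset < len(self.change_log)

    def consume(self):
        # Reading the stream in a committed DML statement advances the offset.
        rows = self.change_log[self.offset:]
        self.offset = len(self.change_log)
        return rows


changes = ["order-1"]            # change that predates the stream
stream = Stream(changes)
changes.extend(["order-2", "order-3"])

silver = []
if stream.has_data():            # the task's WHEN condition
    silver.extend(stream.consume())
print(silver, stream.has_data())
```

Because the offset only moves on consumption, a scheduled run that finds no new changes can be skipped entirely, which is what the `WHEN` clause achieves.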

### Snowpark Python (in-DB compute)
```python
from snowflake.snowpark import Session, functions as F

# cfg: dict of connection parameters, assumed to be defined elsewhere
sess = Session.builder.configs(cfg).create()
df = (
    sess.table('orders')
    .filter(F.col('amount') > 100)
    .group_by('customer_id')
    .agg(F.sum('amount').alias('total'))
)
df.write.save_as_table('top_customers', mode='overwrite')
```

### Cortex AI (LLM in SQL)
```sql
SELECT order_id,
       SNOWFLAKE.CORTEX.SUMMARIZE(review_text) AS summary,
       SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM reviews;

-- Free-text classification
SELECT SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
  ticket_body,
  ['billing','technical','refund','other']
) FROM tickets;
```

### Iceberg external table (2026)
```sql
CREATE ICEBERG TABLE events
  CATALOG = 'my_glue'
  EXTERNAL_VOLUME = 'my_s3_vol'
  CATALOG_TABLE_NAME = 'analytics.events';
-- Snowflake, Spark, and Trino all read the same data.
```

### Cost optimization (Resource Monitor)
```sql
CREATE RESOURCE MONITOR rm_dev
  WITH CREDIT_QUOTA = 100
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = rm_dev;
```
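As a rough illustration of how the trigger thresholds behave, the sketch below evaluates which actions would have fired for a given credit usage. It is a simplified model with hypothetical names (`monitor_actions`, `TRIGGERS`), not Snowflake's evaluation logic; the thresholds mirror the monitor defined above.

```python
# (threshold percent, action) pairs mirroring the rm_dev monitor above.
TRIGGERS = [(80, "NOTIFY"), (100, "SUSPEND")]


def monitor_actions(used_credits, quota, triggers=TRIGGERS):
    """Return the actions whose threshold has been reached, lowest threshold first."""
    pct_used = 100 * used_credits / quota
    return [action for threshold, action in sorted(triggers) if pct_used >= threshold]


for used in (50, 85, 100):
    print(used, monitor_actions(used, quota=100))
```

Keeping a notify threshold below the suspend threshold gives the team time to react before the warehouse is actually stopped.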

## Decision criteria
| Situation | Choice |
|---|---|
| BI / dashboards | Snowflake + dbt |
| Open lakehouse | Iceberg + Snowflake/Databricks |
| Spark-heavy ML | Databricks |
| GCP-native | BigQuery |
| Sub-second OLAP | ClickHouse / Druid |
| Tiny data (<100 GB) | Postgres + DuckDB |

**Default**: Snowflake + dbt + Iceberg (open + managed).

## 🔗 Graph
- Parent: [[Data Warehouse]] · [[Cloud Native]]
- Variants: [[BigQuery]] · [[Databricks]] · [[Redshift]] · [[ClickHouse]]
- Applications: [[ELT Pattern]] · [[Data Sharing]] · [[Feature Store]]
- Adjacent: [[Apache Iceberg]] · [[dbt]] · [[Snowpark]] · [[Principles of Data Connect]]

## 🤖 LLM usage
**When**: SQL tuning suggestions, dbt model scaffolding, Cortex function selection.
**When not**: running production queries directly; review with EXPLAIN and a governance check first.

## ❌ Anti-patterns
- **Always-on warehouse**: without AUTO_SUSPEND, cost explodes.
- **SELECT * on wide table**: throws away the benefit of columnar storage.
- **One huge warehouse**: no workload isolation, so ETL and BI contend for the same compute.
- **No clustering on huge table**: partition pruning cannot work, so queries fall back to full scans.
- **Copy data instead of Data Share**: governance and cost penalty.
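
The clustering anti-pattern can be illustrated with a toy pruning model. The sketch assumes only that each micro-partition records the minimum and maximum value of the filter column and that a partition is skipped when the target falls outside that range; the partition size, data, and function names are hypothetical and far smaller than anything realistic.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    lo: int  # minimum value of the filter column in this partition
    hi: int  # maximum value of the filter column in this partition


def build_partitions(values, size):
    """Split values into fixed-size chunks and keep only min/max metadata."""
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    return [Partition(min(c), max(c)) for c in chunks]


def partitions_scanned(partitions, target):
    """Count partitions whose [lo, hi] range could contain the target value."""
    return sum(1 for p in partitions if p.lo <= target <= p.hi)


# 1000 distinct keys in a scattered (unclustered) order, then the same keys sorted.
unclustered = [(i * 37) % 1000 for i in range(1000)]
clustered = sorted(unclustered)

scans_unclustered = partitions_scanned(build_partitions(unclustered, 100), 500)
scans_clustered = partitions_scanned(build_partitions(clustered, 100), 500)
print(scans_unclustered, scans_clustered)
```

When the keys are scattered, every partition's range covers the target and nothing can be skipped; once the data is ordered on the filter column, the same lookup touches a single partition.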

## 🧪 Verification / duplicates
- Verified (Snowflake docs 2026; Dageville et al. SIGMOD 2016; *Snowflake: The Definitive Guide* 2nd ed).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (architecture + 9 patterns) |