2nd/10_Wiki/Topics/Backend/Snowflake-Data-Warehousing.md

---
id: wiki-2026-0508-snowflake-data-warehousing
title: Snowflake Data Warehousing
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Snowflake, Snowflake DW, Snowflake Cloud Data Platform]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [database, data-warehouse, cloud, analytics]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: sql
  framework: snowflake
---

# Snowflake Data Warehousing

## One-liner
> **"Storage is separated; compute is elastic."** Snowflake is a cloud data warehouse built on a multi-cluster, shared-data architecture: micro-partitioned columnar storage, virtual warehouses, zero-copy cloning, Time Travel, and native Iceberg support (2026). Its three main competitors are Databricks, BigQuery, and Redshift.

## Core concepts

### Architecture (3 layers)
- **Storage**: micro-partitions (50–500 MB of uncompressed data each) held in S3/GCS/Azure Blob, in a compressed columnar format (FDN).
- **Compute (Virtual Warehouses)**: independent compute clusters, sized X-Small through 6X-Large, billed per second (60-second minimum each time a warehouse starts).
- **Cloud Services**: metadata, query optimization, and authentication; the stateless "brain" of the system.
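
The per-second billing of virtual warehouses can be made concrete with a little arithmetic. The sketch below is a rough estimator, not Snowflake's billing logic: it assumes the standard credit table in which each warehouse size doubles the previous one (X-Small = 1 credit/hour) and a 60-second minimum per start. The function name `billed_credits` and the `clusters` parameter are illustrative.

```python
# Assumed credits per hour by warehouse size; each size doubles the previous one.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8, "X-Large": 16,
    "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
    "5X-Large": 256, "6X-Large": 512,
}

# Assumption: a warehouse is billed for at least 60 seconds each time it starts.
MIN_BILLED_SECONDS = 60


def billed_credits(size: str, run_seconds: float, clusters: int = 1) -> float:
    """Estimate credits for one continuous run of a warehouse (hypothetical helper)."""
    billed_seconds = max(run_seconds, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600
```

Under these assumptions, a Medium warehouse running for 30 minutes costs `billed_credits("Medium", 1800)`, that is 4 × 1800 / 3600 = 2 credits, while a 10-second query on a freshly resumed warehouse is still billed as 60 seconds.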

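### Key features
- **Zero-copy clone**: instant copy of a database, schema, or table via metadata only.
- **Time Travel**: query data as of a past point within the retention period (default 1 day; up to 90 days on Enterprise Edition).
- **Streams + Tasks**: change data capture (CDC) plus scheduled SQL, which together form a native pipeline.
- **Snowpark**: Python/Scala/Java compute that runs inside the database.
- **Iceberg tables (2026)**: tables in the open Apache Iceberg format, stored on external storage.
- **Cortex AI**: built-in LLM functions.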
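
To illustrate why a zero-copy clone is instant and initially free, here is a toy copy-on-write model. It is only a sketch of the idea (a table as a list of references to immutable partitions); the `Storage`, `Table`, `clone`, and `append` names are hypothetical and do not mirror Snowflake internals.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table: an ordered list of references to immutable partitions."""
    partition_ids: list = field(default_factory=list)


class Storage:
    """Toy storage layer holding immutable partitions keyed by id."""

    def __init__(self) -> None:
        self.partitions = {}
        self._next_id = 0

    def write(self, rows: tuple) -> int:
        # Every write creates a new partition; existing ones are never modified.
        pid = self._next_id
        self.partitions[pid] = rows
        self._next_id += 1
        return pid


def clone(table: Table) -> Table:
    # Metadata-only operation: copy the references, not the partitions.
    return Table(list(table.partition_ids))


def append(storage: Storage, table: Table, rows: tuple) -> None:
    # New data lands in a new partition referenced by this table only.
    table.partition_ids.append(storage.write(rows))


storage = Storage()
prod = Table()
append(storage, prod, (1, 2, 3))
dev = clone(prod)             # no new partition is written
append(storage, dev, (4, 5))  # only the clone diverges
```

After these steps the toy storage holds two partitions in total: `prod` still references only the first, and `dev` references both, so extra storage is consumed only for the data that diverged.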

### Applications
1. Analytical workloads (OLAP, BI).
2. Data sharing (Secure Data Share — no copy).
3. ELT with dbt.
4. ML feature engineering (Snowpark + Cortex).

## 💻 Patterns

### Warehouse sizing & auto-suspend
```sql
CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';
```

### Copy from S3 (bulk load)
```sql
CREATE STAGE my_stage URL='s3://bucket/path/' STORAGE_INTEGRATION = my_int;
COPY INTO orders
  FROM @my_stage/orders/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
  ON_ERROR = 'CONTINUE';
```

### Zero-copy clone for testing
```sql
-- Instant; no additional storage until the clone diverges (copy-on-write)
CREATE DATABASE prod_clone CLONE prod;
-- Common pattern for dbt CI
```

### Time travel + undrop
```sql
SELECT * FROM orders AT (OFFSET => -60*5);          -- 5 minutes ago
SELECT * FROM orders BEFORE (STATEMENT => '01a...');
UNDROP TABLE orders;                                 -- only within the retention period
```

### Streams + Tasks (CDC pipeline)
```sql
CREATE STREAM orders_stream ON TABLE orders;
CREATE TASK orders_etl
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
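AS
  -- Note: SELECT * on a stream also returns the METADATA$ change-tracking
  -- columns, so orders_silver must have matching columns.
  INSERT INTO orders_silver
  SELECT *, CURRENT_TIMESTAMP() AS ingest_ts
  FROM orders_stream;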
ALTER TASK orders_etl RESUME;
```

### Snowpark Python (in-DB compute)
```python
from snowflake.snowpark import Session, functions as F

# cfg: dict of connection parameters (account, user, ...), assumed to be defined elsewhere
sess = Session.builder.configs(cfg).create()
df = sess.table('orders') \
  .filter(F.col('amount') > 100) \
  .group_by('customer_id') \
  .agg(F.sum('amount').alias('total'))
df.write.save_as_table('top_customers', mode='overwrite')
```

### Cortex AI (LLM in SQL)
```sql
SELECT order_id,
  SNOWFLAKE.CORTEX.SUMMARIZE(review_text) AS summary,
  SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM reviews;

-- Free-text classify
SELECT SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
  ticket_body,
  ['billing','technical','refund','other']
) FROM tickets;
```

### Iceberg external table (2026)
```sql
CREATE ICEBERG TABLE events
  CATALOG = 'my_glue'
  EXTERNAL_VOLUME = 'my_s3_vol'
  CATALOG_TABLE_NAME = 'analytics.events';
-- Snowflake, Spark, and Trino all read the same data.
```

### Cost optimization (Resource Monitor)
```sql
CREATE RESOURCE MONITOR rm_dev
  WITH CREDIT_QUOTA = 100
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = rm_dev;
```
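
The threshold behavior of the resource monitor above can be sketched as simple percentage arithmetic. This is a toy model for reasoning about the configuration, not how Snowflake evaluates monitors; `fired_actions` and the `triggers` mapping are hypothetical names that mirror the `80 PERCENT` / `100 PERCENT` settings.

```python
def fired_actions(credit_quota: float, used_credits: float, triggers: dict) -> list:
    """Return the actions whose percentage threshold has been reached, lowest first."""
    pct_used = 100 * used_credits / credit_quota
    return [action for pct, action in sorted(triggers.items()) if pct_used >= pct]


# Mirrors the monitor above: notify at 80% of the quota, suspend at 100%.
triggers = {80: "NOTIFY", 100: "SUSPEND"}
```

With a quota of 100 credits, `fired_actions(100, 85, triggers)` yields only the notify action, and the suspend action joins it once usage reaches the full quota.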

## Decision criteria
| Situation | Choice |
|---|---|
| BI / dashboards | Snowflake + dbt |
| Open lakehouse | Iceberg + Snowflake/Databricks |
| Spark-heavy ML | Databricks |
| GCP-native | BigQuery |
| Sub-second OLAP | ClickHouse / Druid |
| Tiny data <100GB | Postgres + DuckDB |

**Default**: Snowflake + dbt + Iceberg (open format, managed service).

## 🔗 Graph
- Parent: [[Data Warehouse]] · [[Cloud Native]]
- Variant: [[ClickHouse]]
- Application: [[Feature Store]]
- Adjacent: [[Apache Iceberg]] · [[dbt]] · [[Principles of Data Connect]]

## 🤖 Using LLMs
**When**: SQL tuning suggestions, dbt model scaffolding, Cortex function selection.
**When not**: running production queries directly; go through EXPLAIN and a governance review first.

## ❌ Anti-patterns
- **Always-on warehouse**: AUTO_SUSPEND is not set, so cost explodes.
- **SELECT * on wide table**: throws away the benefit of columnar storage.
- **One huge warehouse**: no workload isolation; ETL and BI contend for the same compute.
- **No clustering on huge table**: partition pruning does not work, so queries fall back to full scans.
- **Copy data instead of Data Share**: governance and cost penalty.
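
The clustering anti-pattern can be illustrated with a toy model of min/max pruning. The sketch assumes each partition keeps only the minimum and maximum of the filter column and that a query must scan every partition whose range overlaps the filter; `PartitionMeta`, `build_partitions`, and `partitions_scanned` are hypothetical names, and real micro-partition metadata is richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMeta:
    min_val: int
    max_val: int


def build_partitions(values: list, size: int) -> list:
    """Split values into fixed-size chunks and record each chunk's min/max."""
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    return [PartitionMeta(min(c), max(c)) for c in chunks]


def partitions_scanned(parts: list, lo: int, hi: int) -> int:
    """Count partitions whose [min, max] range overlaps the filter [lo, hi]."""
    return sum(1 for p in parts if p.max_val >= lo and p.min_val <= hi)


values = list(range(1000))

# Well clustered: rows are stored in order of the filter column.
clustered = build_partitions(values, size=100)

# Poorly clustered: values are interleaved, so every chunk spans most of the range.
interleaved = [v for r in range(100) for v in values[r::100]]
scattered = build_partitions(interleaved, size=100)
```

For a filter on the range 250 to 259, the clustered layout overlaps a single partition, while in the scattered layout every partition's min/max range overlaps the filter, which is the toy equivalent of a full scan.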

## 🧪 Verification / duplicates
- Verified (Snowflake docs 2026; Dageville et al. SIGMOD 2016; *Snowflake: The Definitive Guide* 2nd ed).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full content (architecture + 9 patterns) |