[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -2,24 +2,179 @@
 id: wiki-2026-0508-snowflake-data-warehousing
 title: Snowflake Data Warehousing
 category: 10_Wiki/Topics
-status: merged
-redirect_to: 데이터_엔지니어링_및_가상_인프라_표준
-canonical_id: wiki-2026-0508-001
-aliases: []
+status: verified
+canonical_id: self
+aliases: [Snowflake, Snowflake DW, Snowflake Cloud Data Platform]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 0.92
-tags: [uncategorized]
+confidence_score: 0.9
+verification_status: applied
+tags: [database, data-warehouse, cloud, analytics]
 raw_sources: []
-last_reinforced: 2026-05-08
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
 tech_stack:
-  language: unspecified
-  framework: unspecified
+  language: sql
+  framework: snowflake
 ---

-# Redirect
+# Snowflake Data Warehousing

-This document has been merged into the canonical document [[데이터_엔지니어링_및_가상_인프라_표준]].
-See the link above for all current knowledge and details.
+## One-line summary
+> **"Storage separated, compute elastic."** Snowflake is a cloud data warehouse built on a multi-cluster, shared-data architecture: columnar micro-partition storage, virtual warehouses, zero-copy cloning, Time Travel, and native Iceberg table support (2026). Its main competitors are Databricks, BigQuery, and Redshift.
+
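+## Core concepts
+
+### Architecture (3 layers)
+- **Storage**: compressed, columnar micro-partitions (50–500 MB of uncompressed data each) in Snowflake's FDN format, stored on S3, GCS, or Azure Blob Storage.
+- **Compute (Virtual Warehouses)**: independent compute clusters sized X-Small through 6X-Large, billed per second with a 60-second minimum each time a warehouse starts or resumes.
+- **Cloud Services**: metadata, query optimization, and authentication; the stateless "brain" of the platform.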
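+
+As a rough illustration of the compute and cloud-services split, the credits consumed by each layer can be read from the `SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY` view. This is a sketch, assuming the current role can read the shared `SNOWFLAKE` database; the 7-day window is an arbitrary choice.
+
+```sql
+-- Daily credits per warehouse, split by layer (sketch; 7-day window is arbitrary)
+SELECT warehouse_name,
+       DATE_TRUNC('day', start_time)    AS usage_day,
+       SUM(credits_used_compute)        AS compute_credits,
+       SUM(credits_used_cloud_services) AS cloud_services_credits
+FROM snowflake.account_usage.warehouse_metering_history
+WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
+GROUP BY warehouse_name, usage_day
+ORDER BY usage_day, warehouse_name;
+```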
+
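+### Key features
+- **Zero-copy clone**: instant copy of a database, schema, or table, made by duplicating metadata only.
+- **Time Travel**: query data as it existed at a past point within the retention period (default 1 day; up to 90 days on Enterprise Edition and above).
+- **Streams + Tasks**: change data capture (CDC) plus scheduled SQL, forming a native pipeline.
+- **Snowpark**: Python, Scala, and Java compute that runs inside Snowflake.
+- **Iceberg tables (2026)**: tables stored in the open Apache Iceberg format on external storage.
+- **Cortex AI**: built-in LLM functions.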
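+
+The Time Travel retention window listed above is a per-object parameter. A minimal sketch, assuming an existing `orders` table and an edition that allows more than 1 day of retention; the 30-day value is illustrative only.
+
+```sql
+-- Extend Time Travel retention for one table (30 is an illustrative value)
+ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 30;
+-- Check the effective setting
+SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE orders;
+```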
+
+### Applications
+1. Analytical workloads (OLAP, BI).
+2. Data sharing (Secure Data Sharing, with no data copied).
+3. ELT with dbt.
+4. ML feature engineering (Snowpark + Cortex).
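+
+Data sharing (item 2) can be sketched on the provider side as follows. The share name `sales_share`, the object path `prod.public.orders`, and the consumer account identifier `myorg.consumer_acct` are hypothetical placeholders.
+
+```sql
+-- Provider side: expose one table through a share without copying data
+CREATE SHARE sales_share;
+GRANT USAGE ON DATABASE prod TO SHARE sales_share;
+GRANT USAGE ON SCHEMA prod.public TO SHARE sales_share;
+GRANT SELECT ON TABLE prod.public.orders TO SHARE sales_share;
+-- Hypothetical consumer account identifier
+ALTER SHARE sales_share ADD ACCOUNTS = myorg.consumer_acct;
+```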
+
+## 💻 Patterns
+
+### Warehouse sizing & auto-suspend
+```sql
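+CREATE WAREHOUSE etl_wh
+  WAREHOUSE_SIZE = 'MEDIUM'
+  AUTO_SUSPEND = 60            -- suspend after 60 seconds idle
+  AUTO_RESUME = TRUE           -- resume automatically on the next query
+  MIN_CLUSTER_COUNT = 1        -- multi-cluster settings require Enterprise Edition or higher
+  MAX_CLUSTER_COUNT = 4
+  SCALING_POLICY = 'STANDARD';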
+```
+
+### Copy from S3 (bulk load)
+```sql
+CREATE STAGE my_stage URL='s3://bucket/path/' STORAGE_INTEGRATION = my_int;
+COPY INTO orders
+  FROM @my_stage/orders/
+  FILE_FORMAT = (TYPE = PARQUET)
+  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
+  ON_ERROR = 'CONTINUE';
+```
+
+### Zero-copy clone for testing
+```sql
+-- Instant, and consumes no extra storage until the data diverges (copy-on-write)
+CREATE DATABASE prod_clone CLONE prod;
+-- Common pattern for dbt CI
+```
+
+### Time travel + undrop
+```sql
+SELECT * FROM orders AT (OFFSET => -60*5);          -- 5 minutes ago
+SELECT * FROM orders BEFORE (STATEMENT => '01a...');
+UNDROP TABLE orders;                                 -- only within the retention period
+```
+
+### Streams + Tasks (CDC pipeline)
+```sql
+CREATE STREAM orders_stream ON TABLE orders;
+CREATE TASK orders_etl
+  WAREHOUSE = etl_wh
+  SCHEDULE = '5 MINUTE'
+  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
+AS
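+  -- SELECT * on a stream also returns METADATA$ACTION, METADATA$ISUPDATE, and
+  -- METADATA$ROW_ID, so orders_silver needs columns for them (or list columns explicitly).
+  INSERT INTO orders_silver
+  SELECT *, CURRENT_TIMESTAMP() AS ingest_ts
+  FROM orders_stream;
+ALTER TASK orders_etl RESUME;  -- tasks are created suspended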
+```
+
+### Snowpark Python (in-DB compute)
+```python
+from snowflake.snowpark import Session, functions as F
+
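+# cfg is assumed to be a dict of connection parameters (account, user, ...) defined elsewhere
+sess = Session.builder.configs(cfg).create()
+# Lazily built query: total amount per customer for orders over 100
+df = sess.table('orders') \
+  .filter(F.col('amount') > 100) \
+  .group_by('customer_id') \
+  .agg(F.sum('amount').alias('total'))
+# Runs inside Snowflake and materializes the result as a table
+df.write.save_as_table('top_customers', mode='overwrite')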
+```
+
+### Cortex AI (LLM in SQL)
+```sql
+SELECT order_id,
+  SNOWFLAKE.CORTEX.SUMMARIZE(review_text) AS summary,
+  SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
+FROM reviews;
+
+-- Classify free text into one of the given categories
+SELECT SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
+  ticket_body,
+  ['billing','technical','refund','other']
+) FROM tickets;
+```
+
+### Iceberg external table (2026)
+```sql
+CREATE ICEBERG TABLE events
+  CATALOG = my_glue
+  EXTERNAL_VOLUME = my_s3_vol
+  CATALOG_TABLE_NAME = 'analytics.events';
+-- Snowflake, Spark, and Trino can all read the same data.
+```
+
+### Cost optimization (Resource Monitor)
+```sql
+CREATE RESOURCE MONITOR rm_dev
+  WITH CREDIT_QUOTA = 100
+  TRIGGERS
+    ON 80 PERCENT DO NOTIFY
+    ON 100 PERCENT DO SUSPEND;
+ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = rm_dev;
+```
+
+## Decision criteria
+| Situation | Choice |
+|---|---|
+| BI / dashboards | Snowflake + dbt |
+| Open lakehouse | Iceberg + Snowflake/Databricks |
+| Spark-heavy ML | Databricks |
+| GCP-native | BigQuery |
+| Sub-second OLAP | ClickHouse / Druid |
+| Tiny data <100GB | Postgres + DuckDB |
+
+**Default**: Snowflake + dbt + Iceberg (open + managed).
+
+## 🔗 Graph
+- Parent: [[Data Warehouse]] · [[Cloud Native]]
+- Variants: [[BigQuery]] · [[Databricks]] · [[Redshift]] · [[ClickHouse]]
+- Applications: [[ELT Pattern]] · [[Data Sharing]] · [[Feature Store]]
+- Adjacent: [[Apache Iceberg]] · [[dbt]] · [[Snowpark]] · [[Principles of Data Connect]]
+
+## 🤖 LLM usage
+**When to use**: SQL tuning suggestions, dbt model scaffolding, Cortex function selection.
+**When not to use**: running production queries directly; review the EXPLAIN plan and pass a governance review first.
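+
+One way to apply that review step is to inspect the plan of an LLM-suggested query with `EXPLAIN` before running it. A sketch; the query and its `order_date` column are hypothetical.
+
+```sql
+-- Compiles the query and shows its plan (including partition counts) without executing it
+EXPLAIN USING TEXT
+SELECT customer_id, SUM(amount) AS total
+FROM orders
+WHERE order_date >= '2026-01-01'
+GROUP BY customer_id;
+```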
+
+## ❌ Anti-patterns
+- **Always-on warehouse**: AUTO_SUSPEND not set, so costs balloon.
+- **SELECT * on wide table**: throws away the benefit of columnar storage.
+- **One huge warehouse**: no workload isolation, so ETL and BI contend for the same resources.
+- **No clustering on huge table**: partition pruning stops working, forcing full scans.
+- **Copy data instead of Data Share**: governance and cost penalty.
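+
+Several of these can be addressed directly in SQL. A sketch, assuming `orders` has `order_id`, `amount`, and `order_date` columns and that `bi_wh` is a new warehouse name (all hypothetical):
+
+```sql
+-- Workload isolation: give BI its own warehouse instead of sharing etl_wh
+CREATE WAREHOUSE bi_wh
+  WAREHOUSE_SIZE = 'SMALL'
+  AUTO_SUSPEND = 60
+  AUTO_RESUME = TRUE;
+
+-- Clustering: define a clustering key on a large table so pruning can work
+ALTER TABLE orders CLUSTER BY (order_date);
+SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');
+
+-- Projection: select only the columns needed instead of SELECT *
+SELECT order_id, amount FROM orders WHERE order_date >= '2026-01-01';
+```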
+
+## 🧪 Verification / duplicates
+- Verified (Snowflake docs 2026; Dageville et al. SIGMOD 2016; *Snowflake: The Definitive Guide* 2nd ed).
+- Trust level: A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup — full content (architecture + 9 patterns) |