| Field | Value |
| --- | --- |
| id | wiki-2026-0508-snowflake-data-warehousing |
| title | Snowflake Data Warehousing |
| category | 10_Wiki/Topics |
| status | verified |
| canonical_id | self |
| aliases | Snowflake, Snowflake DW, Snowflake Cloud Data Platform |
| duplicate_of | none |
| source_trust_level | A |
| confidence_score | 0.9 |
| verification_status | applied |
| tags | database, data-warehouse, cloud, analytics |
| raw_sources | |
| last_reinforced | 2026-05-10 |
| github_commit | pending |
| tech_stack | language: sql; framework: snowflake |
Snowflake Data Warehousing
One-line summary
"Storage is separated; compute is elastic." Snowflake is a cloud data warehouse built on a multi-cluster, shared-data architecture: micro-partitioned columnar storage, virtual warehouses, zero-copy cloning, Time Travel, and native Iceberg tables (2026). Its main competitors are the big three: Databricks, BigQuery, and Redshift.
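Core concepts
Architecture (3 layers)
- Storage: data lives in cloud object storage (S3, GCS, or Azure Blob Storage) as micro-partitions of 50–500 MB of uncompressed data each, stored compressed in a columnar format (FDN).
- Compute (Virtual Warehouses): independent compute clusters sized X-Small through 6X-Large, billed per second with a 60-second minimum each time a warehouse starts or resumes.
- Cloud Services: metadata, query optimization, and authentication; the coordinating "brain" of the system.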
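Key features
- Zero-copy clone: instant copy of a database, schema, or table made through metadata only; no data is duplicated until the clone diverges.
- Time Travel: query data as it was at a past point in time. Retention defaults to 1 day and can be raised to as much as 90 days on Enterprise Edition and above.
- Streams + Tasks: change data capture (Streams) plus scheduled SQL (Tasks) form a native pipeline.
- Snowpark: Python, Scala, and Java code executed inside the database.
- Iceberg tables (2026): tables in the open Apache Iceberg format, with data files kept in external cloud storage.
- Cortex AI: built-in LLM functions callable from SQL.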
Applications
- Analytical workloads (OLAP, BI).
- Data sharing (Secure Data Sharing, with no data copied).
- ELT with dbt.
- ML feature engineering (Snowpark + Cortex).
💻 Patterns
Warehouse sizing & auto-suspend
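A minimal sketch of this pattern, assuming a hypothetical warehouse named `bi_wh`; the size and the 60-second timeout are illustrative values, not recommendations from the source.

```sql
-- Hypothetical warehouse; start small and let it suspend when idle.
CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60            -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE           -- wake up automatically on the next query
  INITIALLY_SUSPENDED = TRUE;  -- do not burn credits at creation time

-- Resize for a heavy job, then return to the baseline size.
ALTER WAREHOUSE bi_wh SET WAREHOUSE_SIZE = 'MEDIUM';
ALTER WAREHOUSE bi_wh SET WAREHOUSE_SIZE = 'XSMALL';
```

Each size step roughly doubles the credits consumed per hour, so resizing only for the duration of a heavy job tends to be cheaper than keeping a large warehouse as the default.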
Copy from S3 (bulk load)
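One way this could look, assuming a hypothetical bucket `s3://example-bucket/orders/`, a pre-existing storage integration `s3_int`, and a target table `orders` whose columns match the CSV files. All of these names are placeholders.

```sql
-- Reusable description of the file layout (hypothetical CSV with a header row).
CREATE OR REPLACE FILE FORMAT csv_fmt
  TYPE = CSV
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';

-- External stage pointing at the bucket; s3_int is assumed to exist already.
CREATE OR REPLACE STAGE raw_stage
  URL = 's3://example-bucket/orders/'
  STORAGE_INTEGRATION = s3_int
  FILE_FORMAT = (FORMAT_NAME = 'csv_fmt');

-- Bulk load; files already loaded are skipped thanks to load metadata.
COPY INTO orders
  FROM @raw_stage
  PATTERN = '.*[.]csv'
  ON_ERROR = 'ABORT_STATEMENT';
```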
Zero-copy clone for testing
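A short sketch, assuming a hypothetical database `analytics` and table `orders`. Clones are metadata-only at creation and consume extra storage only as the clone and the source diverge.

```sql
-- Clone a whole database for a test environment (hypothetical names).
CREATE DATABASE analytics_dev CLONE analytics;

-- Clone a single table.
CREATE TABLE orders_test CLONE orders;

-- Clone the table as it was 24 hours ago (needs Time Travel retention >= 1 day).
CREATE TABLE orders_yesterday CLONE orders AT (OFFSET => -86400);
```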
Time travel + undrop
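A sketch of the common Time Travel statements, assuming a hypothetical `orders` table. The timestamp is illustrative and `<query_id>` is a placeholder to be replaced with a real query ID.

```sql
-- Row count as of one hour ago.
SELECT COUNT(*) FROM orders AT (OFFSET => -3600);

-- Row count as of a specific point in time.
SELECT COUNT(*) FROM orders AT (TIMESTAMP => '2026-05-01 09:00:00'::TIMESTAMP_LTZ);

-- State just before a given statement ran (placeholder query ID).
SELECT COUNT(*) FROM orders BEFORE (STATEMENT => '<query_id>');

-- Recover an accidentally dropped table while it is still within retention.
DROP TABLE orders;
UNDROP TABLE orders;

-- Extend retention (values above 1 day require Enterprise Edition or higher).
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 30;
```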
Streams + Tasks (CDC pipeline)
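A minimal CDC sketch, assuming hypothetical tables `orders` (source) and `orders_clean` (target) keyed by a unique `order_id`, and a hypothetical warehouse `etl_wh`. In a standard stream an update appears as a DELETE/INSERT pair flagged with `METADATA$ISUPDATE`, so the delete half of each update is filtered out before merging.

```sql
-- Track row-level changes on the source table.
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

-- Every 5 minutes, apply pending changes, but only if the stream has data.
CREATE OR REPLACE TASK merge_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  MERGE INTO orders_clean AS t
  USING (
    SELECT order_id, amount, METADATA$ACTION AS action
    FROM orders_stream
    -- Drop the DELETE half of update pairs; keep true deletes and inserts.
    WHERE NOT (METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE)
  ) AS s
  ON t.order_id = s.order_id
  WHEN MATCHED AND s.action = 'DELETE' THEN DELETE
  WHEN MATCHED THEN UPDATE SET t.amount = s.amount
  WHEN NOT MATCHED AND s.action = 'INSERT' THEN
    INSERT (order_id, amount) VALUES (s.order_id, s.amount);

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK merge_orders RESUME;
```

Consuming the stream inside the MERGE advances its offset, so each change is applied once.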
Snowpark Python (in-DB compute)
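A sketch of in-database Python, written as a SQL stored procedure so it stays in the same language as the other patterns. The procedure name, the `orders` table, and its `status`, `order_date`, and `amount` columns are assumptions for illustration.

```sql
CREATE OR REPLACE PROCEDURE build_daily_revenue()
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.11'
  PACKAGES = ('snowflake-snowpark-python')
  HANDLER = 'run'
AS
$$
from snowflake.snowpark.functions import col, sum as sum_

def run(session):
    # DataFrame operations are translated to SQL and run in the warehouse;
    # no rows are pulled into the procedure itself.
    df = (
        session.table("orders")
        .filter(col("status") == "COMPLETE")
        .group_by("order_date")
        .agg(sum_("amount").alias("revenue"))
    )
    df.write.mode("overwrite").save_as_table("daily_revenue")
    return "ok"
$$;

CALL build_daily_revenue();
```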
Cortex AI (LLM in SQL)
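A sketch of calling Cortex functions from SQL, assuming a hypothetical `product_reviews` table with `review_id` and `review_text` columns. The model name passed to `COMPLETE` is an assumption; model availability varies by region and over time.

```sql
-- Task-specific functions: sentiment score (roughly -1 to 1) and a summary.
SELECT
  review_id,
  SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment,
  SNOWFLAKE.CORTEX.SUMMARIZE(review_text) AS summary
FROM product_reviews
LIMIT 10;

-- General-purpose completion with an explicitly chosen model (assumed name).
SELECT SNOWFLAKE.CORTEX.COMPLETE(
  'mistral-large2',
  'Explain micro-partition pruning in one sentence.'
) AS answer;
```

Keeping a `LIMIT` on exploratory calls helps, since these functions are billed by tokens processed.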
Iceberg external table (2026)
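A sketch of a Snowflake-managed Iceberg table whose data files live in external cloud storage. The bucket URL, IAM role ARN, volume name, and table definition are all placeholders, and the exact options may differ depending on whether Snowflake or an external catalog manages the table.

```sql
-- External volume: where the Iceberg data and metadata files are written.
CREATE OR REPLACE EXTERNAL VOLUME iceberg_vol
  STORAGE_LOCATIONS = ((
    NAME = 'lake-s3'
    STORAGE_PROVIDER = 'S3'
    STORAGE_BASE_URL = 's3://example-bucket/iceberg/'
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/example-role'
  ));

-- Iceberg table using Snowflake as the catalog (hypothetical schema).
CREATE OR REPLACE ICEBERG TABLE orders_iceberg (
  order_id   NUMBER(38,0),
  amount     NUMBER(12,2),
  order_date DATE
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_vol'
  BASE_LOCATION = 'orders/';
```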
Cost optimization (Resource Monitor)
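A minimal sketch of a monthly credit cap, assuming a hypothetical monitor `monthly_cap` and warehouse `bi_wh`. The quota and thresholds are illustrative, and creating monitors typically requires the ACCOUNTADMIN role.

```sql
CREATE OR REPLACE RESOURCE MONITOR monthly_cap
  WITH CREDIT_QUOTA = 100          -- credits per period (illustrative)
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY              -- warn early
    ON 100 PERCENT DO SUSPEND            -- let running queries finish
    ON 110 PERCENT DO SUSPEND_IMMEDIATE; -- cancel running queries

-- Attach the monitor to a warehouse.
ALTER WAREHOUSE bi_wh SET RESOURCE_MONITOR = monthly_cap;
```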
Decision criteria
| Situation | Choice |
| --- | --- |
| BI / dashboards | Snowflake + dbt |
| Open lakehouse | Iceberg + Snowflake/Databricks |
| Spark-heavy ML | Databricks |
| GCP-native | BigQuery |
| Sub-second OLAP | ClickHouse / Druid |
| Tiny data (<100 GB) | Postgres + DuckDB |
Default: Snowflake + dbt + Iceberg (open + managed).
🔗 Graph
🤖 LLM usage
When to use: SQL tuning suggestions, dbt model scaffolding, Cortex function selection.
When not to use: running LLM-generated queries directly in production. Check them with EXPLAIN and a governance review first.
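❌ Anti-patterns
- Always-on warehouse: with AUTO_SUSPEND unset, idle compute keeps billing and costs balloon.
- SELECT * on a wide table: reads every column and throws away the benefit of columnar storage.
- One huge warehouse: no workload isolation, so ETL and BI contend for the same compute.
- No clustering on a huge table: partition pruning does not work, so queries fall back to full scans.
- Copying data instead of using a Data Share: adds governance and cost overhead.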
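For the clustering anti-pattern above, a short sketch assuming a hypothetical large table `events` that is mostly filtered by `event_date`. Whether a clustering key pays off depends on table size and query patterns, since automatic reclustering consumes credits.

```sql
-- Define a clustering key on the column most queries filter by.
ALTER TABLE events CLUSTER BY (event_date);

-- Inspect how well micro-partitions line up with that key.
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date)');

-- Filters on the clustering key let the optimizer prune micro-partitions.
SELECT COUNT(*) FROM events WHERE event_date = '2026-05-01';
```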
🧪 Verification / duplicates
- Verified (Snowflake docs 2026; Dageville et al. SIGMOD 2016; Snowflake: The Definitive Guide 2nd ed).
- Trust level: A.
🕓 Changelog
| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (architecture + 9 patterns) |