---
id: wiki-2026-0508-snowflake-data-warehousing
title: Snowflake Data Warehousing
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Snowflake, Snowflake DW, Snowflake Cloud Data Platform]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [database, data-warehouse, cloud, analytics]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: sql
  framework: snowflake
---

# Snowflake Data Warehousing

## One-line summary

> **"Storage separated · compute elastic."** Snowflake is a cloud data warehouse built on a multi-cluster, shared-data architecture: micro-partitioned columnar storage, virtual warehouses, zero-copy cloning, Time Travel, and native Iceberg support (2026). Its three main competitors are Databricks, BigQuery, and Redshift.

## Core

### Architecture (3 layers)

- **Storage**: micro-partitions on S3/GCS/Azure Blob, each holding 50–500 MB of uncompressed data, stored compressed in a columnar format (FDN).
- **Compute (Virtual Warehouses)**: independent compute clusters sized X-Small to 6X-Large, billed per second (with a 60-second minimum each time a warehouse starts or resumes).
- **Cloud Services**: metadata, query optimization, and authentication; the stateless "brain" of the platform.

### Key features

- **Zero-copy clone**: instant copy of a database, schema, or table through metadata only.
- **Time Travel**: query data as it existed at a past point within the retention period (default 1 day; up to 90 days on Enterprise Edition and higher).
- **Streams + Tasks**: CDC plus scheduled SQL, which together form a native pipeline.
- **Snowpark**: Python/Scala/Java compute inside the database.
- **Iceberg tables (2026)**: external open table format.
- **Cortex AI**: built-in LLM functions.

### Applications

1. Analytical workloads (OLAP, BI).
2. Data sharing (Secure Data Share, no copy).
3. ELT with dbt.
4. ML feature engineering (Snowpark + Cortex).
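
The per-second billing of virtual warehouses described under Architecture can be made concrete with a rough estimator. This is a minimal sketch, not Snowflake's billing engine: it assumes standard warehouses whose hourly credit rate doubles with each size step, a 60-second minimum per start or resume, and (as a simplification) that every cluster of a multi-cluster warehouse runs for the whole interval. The function names and the credit price passed in are hypothetical, chosen only for illustration.

```python
# Credits per hour for one cluster of a standard warehouse;
# each size step doubles the rate of the previous one.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8, "X-Large": 16,
    "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
    "5X-Large": 256, "6X-Large": 512,
}

# Each start or resume is billed for at least this many seconds.
MIN_BILLED_SECONDS = 60


def billed_seconds(run_seconds: float) -> float:
    """Seconds billed for one run interval after a start or resume."""
    if run_seconds < 0:
        raise ValueError("run_seconds must be non-negative")
    return max(run_seconds, MIN_BILLED_SECONDS)


def estimate_credits(size: str, run_seconds: float, clusters: int = 1) -> float:
    """Estimate credits for one run interval.

    Assumption: all `clusters` run for the full interval, which
    overestimates usage when auto-scaling starts clusters later.
    """
    rate_per_second = CREDITS_PER_HOUR[size] / 3600
    return rate_per_second * billed_seconds(run_seconds) * clusters


def estimate_cost(size: str, run_seconds: float,
                  price_per_credit: float, clusters: int = 1) -> float:
    """Convert credits to currency; price_per_credit is a caller-supplied assumption."""
    return estimate_credits(size, run_seconds, clusters) * price_per_credit


if __name__ == "__main__":
    # A 10-second query on a freshly resumed X-Small is still billed as 60 seconds.
    print(estimate_credits("X-Small", 10))
    # Ten minutes on a Medium warehouse.
    print(estimate_credits("Medium", 600))
```

The estimator shows why a short `AUTO_SUSPEND` matters: idle seconds are billed at the same rate as busy ones, and every resume pays at least one minute.
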
## 💻 Patterns

### Warehouse sizing & auto-suspend

```sql
-- MIN/MAX_CLUSTER_COUNT (multi-cluster) requires Enterprise Edition or higher.
CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60          -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';
```

### Copy from S3 (bulk load)

```sql
CREATE STAGE my_stage
  URL = 's3://bucket/path/'
  STORAGE_INTEGRATION = my_int;

COPY INTO orders
  FROM @my_stage/orders/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
  ON_ERROR = 'CONTINUE';
```

### Zero-copy clone for testing

```sql
-- Instant; no extra storage until the clone diverges (copy-on-write).
CREATE DATABASE prod_clone CLONE prod;
-- Common pattern for dbt CI.
```

### Time travel + undrop

```sql
SELECT * FROM orders AT (OFFSET => -60*5);            -- 5 minutes ago
SELECT * FROM orders BEFORE (STATEMENT => '01a...');
UNDROP TABLE orders;                                  -- only within the retention period
```

### Streams + Tasks (CDC pipeline)

```sql
CREATE STREAM orders_stream ON TABLE orders;

CREATE TASK orders_etl
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  -- SELECT * on a stream also returns METADATA$ACTION, METADATA$ISUPDATE,
  -- and METADATA$ROW_ID, so orders_silver needs matching columns.
  INSERT INTO orders_silver
  SELECT *, CURRENT_TIMESTAMP() AS ingest_ts
  FROM orders_stream;

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK orders_etl RESUME;
```

### Snowpark Python (in-DB compute)

```python
from snowflake.snowpark import Session, functions as F

# cfg: dict of connection parameters (account, user, ...), defined elsewhere.
sess = Session.builder.configs(cfg).create()

df = (
    sess.table('orders')
    .filter(F.col('amount') > 100)
    .group_by('customer_id')
    .agg(F.sum('amount').alias('total'))
)
df.write.save_as_table('top_customers', mode='overwrite')
```

### Cortex AI (LLM in SQL)

```sql
SELECT
  order_id,
  SNOWFLAKE.CORTEX.SUMMARIZE(review_text) AS summary,
  SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM reviews;

-- Classify free text into one of the given categories
SELECT SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
  ticket_body,
  ['billing','technical','refund','other']
)
FROM tickets;
```

### Iceberg external table (2026)

```sql
CREATE ICEBERG TABLE events
  CATALOG = my_glue
  EXTERNAL_VOLUME = my_s3_vol
  CATALOG_TABLE_NAME = 'analytics.events';
-- Snowflake, Spark, and Trino all read the same data.
```

### Cost optimization (Resource Monitor)

```sql
CREATE RESOURCE MONITOR rm_dev WITH
  CREDIT_QUOTA = 100
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = rm_dev;
```

## Decision criteria

| Situation | Choice |
|---|---|
| BI / dashboards | Snowflake + dbt |
| Open lakehouse | Iceberg + Snowflake/Databricks |
| Spark-heavy ML | Databricks |
| GCP-native | BigQuery |
| Sub-second OLAP | ClickHouse / Druid |
| Tiny data (<100 GB) | Postgres + DuckDB |

**Default**: Snowflake + dbt + Iceberg (open + managed).

## 🔗 Graph

- Parent: [[Data Warehouse]] · [[Cloud Native]]
- Variants: [[ClickHouse]]
- Applications: [[Feature Store]]
- Adjacent: [[Apache Iceberg]] · [[dbt]] · [[Principles of Data Connect]]

## 🤖 LLM usage

**When**: SQL tuning suggestions, dbt model scaffolding, Cortex function selection.

**When not**: running production queries directly. Go through EXPLAIN and a governance review first.

## ❌ Anti-patterns

- **Always-on warehouse**: AUTO_SUSPEND is not set, so costs explode.
- **SELECT * on wide table**: throws away the benefit of columnar storage.
- **One huge warehouse**: no workload isolation, so ETL and BI contend for the same compute.
- **No clustering on huge table**: partition pruning is ineffective, which leads to full scans.
- **Copy data instead of Data Share**: governance and cost penalty.

## 🧪 Verification / Duplicates

- Verified (Snowflake docs 2026; Dageville et al. SIGMOD 2016; *Snowflake: The Definitive Guide* 2nd ed).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full content (architecture + 9 patterns) |