Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization

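Major cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirects)
- Normalized link names: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections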

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 23:52:15 +09:00



Snowflake Data Warehousing

In one line

"Storage is separated; compute is elastic." Snowflake is a cloud data warehouse built on a multi-cluster, shared-data architecture: micro-partitioned columnar storage, virtual warehouses, zero-copy clones, Time Travel, and native Iceberg table support (2026). Its main competitors are Databricks, BigQuery, and Redshift.

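Core concepts

Architecture (3 layers)

Storage: data lives in S3/GCS/Azure Blob as micro-partitions (50–500 MB of uncompressed data each), stored compressed in a columnar format (FDN).
Compute (Virtual Warehouses): independent compute clusters, sized X-Small through 6X-Large. Billed per second, with a 60-second minimum each time a warehouse starts or resumes.
Cloud Services: metadata, query optimization, and authentication; the stateless "brain" of the system.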

Key features

Zero-copy clone: instant DB/schema/table copy via metadata.
Time Travel: query data as it existed at a past point in time, up to 90 days back on Enterprise Edition and above (default retention is 1 day).
Streams + Tasks: CDC + scheduled SQL = native pipeline.
Snowpark: Python/Scala/Java in-DB compute.
Iceberg tables (2026): external open-table format.
Cortex AI: built-in LLM functions.

Applications

Analytical workloads (OLAP, BI).
Data sharing (Secure Data Share — no copy).
ELT with dbt.
ML feature engineering (Snowpark + Cortex).

💻 Patterns

Warehouse sizing & auto-suspend

CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';
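
To make the cost effect of warehouse size and AUTO_SUSPEND concrete, here is a rough Python sketch. It assumes the standard credit rates (X-Small = 1 credit/hour, doubling with each size) and the 60-second minimum per start or resume. The helper names `credits_for_run` and `daily_credits` are hypothetical, and the model assumes bursts are far enough apart that the warehouse suspends between them.

```python
# Credits per hour for standard warehouses; each size doubles the previous one.
CREDITS_PER_HOUR = {
    "X-SMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "X-LARGE": 16,
    "2X-LARGE": 32, "3X-LARGE": 64, "4X-LARGE": 128, "5X-LARGE": 256,
    "6X-LARGE": 512,
}
MIN_BILLED_SECONDS = 60  # each start/resume is billed for at least one minute


def credits_for_run(size: str, running_seconds: float, clusters: int = 1) -> float:
    """Credits consumed by one continuous run of a warehouse (hypothetical helper)."""
    billed = max(running_seconds, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size.upper()] * clusters * billed / 3600


def daily_credits(size: str, burst_seconds: float, bursts_per_day: int,
                  auto_suspend_seconds=None) -> float:
    """Estimate daily credits; auto_suspend_seconds=None models an always-on warehouse."""
    if auto_suspend_seconds is None:
        return credits_for_run(size, 24 * 3600)
    # After each burst the warehouse idles for the suspend window, then stops.
    return bursts_per_day * credits_for_run(size, burst_seconds + auto_suspend_seconds)
```

Under these assumptions, a MEDIUM warehouse serving 24 five-minute bursts per day costs 96 credits when always on and about 9.6 credits with a 60-second auto-suspend.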

Copy from S3 (bulk load)

CREATE STAGE my_stage URL='s3://bucket/path/' STORAGE_INTEGRATION = my_int;
COPY INTO orders
  FROM @my_stage/orders/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
  ON_ERROR = 'CONTINUE';

Zero-copy clone for testing

-- Instant; no additional storage until the clone diverges (copy-on-write)
CREATE DATABASE prod_clone CLONE prod;
-- Common pattern for dbt CI runs
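
Why a clone is instant and initially free can be sketched with a toy model in Python. This is an illustration of copy-on-write over immutable partitions, not Snowflake's actual implementation; the `Table` class and `storage_used` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # Ids of immutable micro-partitions this table currently references.
    partitions: list = field(default_factory=list)

    def clone(self) -> "Table":
        # A clone copies only the list of references, never the partition data.
        return Table(list(self.partitions))

    def rewrite(self, old_id: str, new_id: str) -> None:
        # Copy-on-write: a change writes a new partition and swaps the reference;
        # other tables that still point at old_id are unaffected.
        self.partitions = [new_id if p == old_id else p for p in self.partitions]


def storage_used(*tables: Table) -> int:
    """Number of distinct partitions physically stored across the given tables."""
    return len({p for t in tables for p in t.partitions})
```

In this model, cloning a three-partition table leaves storage at three partitions; storage grows only when the clone or the source rewrites a partition.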

Time travel + undrop

SELECT * FROM orders AT (OFFSET => -60*5);          -- 5 minutes ago (offset is in seconds)
SELECT * FROM orders BEFORE (STATEMENT => '01a...'); -- state just before the given query ID ran
UNDROP TABLE orders;                                 -- only within the retention period
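
When Time Travel queries are generated from application code, it can help to check the requested age against the retention period before sending SQL. A minimal sketch, assuming a hypothetical helper `at_offset_clause` that only builds the clause text and does not talk to Snowflake:

```python
from datetime import timedelta


def at_offset_clause(age: timedelta, retention_days: int = 1) -> str:
    """Build an AT (OFFSET => ...) clause for data as of `age` ago.

    retention_days is an assumption supplied by the caller (default 1 day);
    the real value depends on the table's retention setting.
    """
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    if age > timedelta(days=retention_days):
        raise ValueError("requested point is outside the retention period")
    # OFFSET is expressed in seconds relative to now, so it is negative.
    return f"AT (OFFSET => -{int(age.total_seconds())})"
```

For example, `f"SELECT * FROM orders {at_offset_clause(timedelta(minutes=5))}"` produces the same five-minute query as the SQL above.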

Streams + Tasks (CDC pipeline)

CREATE STREAM orders_stream ON TABLE orders;
CREATE TASK orders_etl
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
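AS
  -- SELECT * on a stream also returns its METADATA$ columns, so orders_silver
  -- is assumed to have matching columns (plus ingest_ts).
  INSERT INTO orders_silver
  SELECT *, CURRENT_TIMESTAMP() AS ingest_ts
  FROM orders_stream;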
ALTER TASK orders_etl RESUME;
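
The pipeline above relies on the stream keeping an offset: the task's INSERT consumes the pending changes and advances the offset, so the next run sees only newer changes. A toy Python model of that behavior follows. `ToyStream` is hypothetical and greatly simplified; it treats the table as an append-only change log and ignores how updates and deletes are netted out.

```python
class ToyStream:
    """Tracks an offset into a table's append-only change log (toy model)."""

    def __init__(self, change_log: list):
        self._log = change_log
        # A new stream starts at the table's current version: nothing pending yet.
        self._offset = len(change_log)

    def has_data(self) -> bool:
        # Plays the role of SYSTEM$STREAM_HAS_DATA in the task's WHEN clause.
        return self._offset < len(self._log)

    def consume(self) -> list:
        # Plays the role of a DML statement that reads the stream:
        # return pending changes and advance the offset past them.
        pending = self._log[self._offset:]
        self._offset = len(self._log)
        return pending
```

In Snowflake itself, a plain SELECT on a stream does not advance the offset; only a DML statement that reads the stream in a committed transaction does.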

Snowpark Python (in-DB compute)

from snowflake.snowpark import Session, functions as F

# cfg: dict of connection parameters (account, user, ...), assumed to be defined elsewhere
sess = Session.builder.configs(cfg).create()
df = sess.table('orders') \
  .filter(F.col('amount') > 100) \
  .group_by('customer_id') \
  .agg(F.sum('amount').alias('total'))
df.write.save_as_table('top_customers', mode='overwrite')

Cortex AI (LLM in SQL)

SELECT order_id,
  SNOWFLAKE.CORTEX.SUMMARIZE(review_text) AS summary,
  SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM reviews;

-- Free-text classify
SELECT SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
  ticket_body,
  ['billing','technical','refund','other']
) FROM tickets;

Iceberg external table (2026)

CREATE ICEBERG TABLE events
  CATALOG = 'my_glue'
  EXTERNAL_VOLUME = 'my_s3_vol'
  CATALOG_TABLE_NAME = 'analytics.events';
-- Snowflake, Spark, and Trino all read the same data.

Cost optimization (Resource Monitor)

CREATE RESOURCE MONITOR rm_dev
  WITH CREDIT_QUOTA = 100
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = rm_dev;
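
The trigger logic in the monitor above can be sketched in a few lines of Python. This is only a model of which actions fire at a given usage level; the function `triggered_actions` and the dict shape are hypothetical.

```python
def triggered_actions(credits_used: float, credit_quota: float, triggers: dict) -> list:
    """Return the actions whose percentage threshold has been reached.

    triggers maps a percent-of-quota threshold to an action name,
    e.g. {80: "NOTIFY", 100: "SUSPEND"} for the monitor defined above.
    """
    percent_used = 100 * credits_used / credit_quota
    return [action for threshold, action in sorted(triggers.items())
            if percent_used >= threshold]


rm_dev_triggers = {80: "NOTIFY", 100: "SUSPEND"}
```

With a quota of 100 credits, 85 credits used fires only the notification, while 100 credits fires both. SUSPEND lets running statements finish before the warehouse stops; SUSPEND_IMMEDIATE is the variant that cancels them.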

Decision criteria

Situation	Choice
BI / dashboards	Snowflake + dbt
Open lakehouse	Iceberg + Snowflake/Databricks
Spark-heavy ML	Databricks
GCP-native	BigQuery
Sub-second OLAP	ClickHouse / Druid
Tiny data <100GB	Postgres + DuckDB

Default: Snowflake + dbt + Iceberg (open + managed).

🔗 Graph

Parent: Data Warehouse · Cloud Native
Variant: ClickHouse
Application: Feature Store
Adjacent: Apache Iceberg · dbt · Principles of Data Connect

🤖 LLM usage

When to use: SQL tuning suggestions, dbt model scaffolding, Cortex function selection. When not to use: running production queries directly. Check the plan with EXPLAIN and put the query through governance review first.

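❌ Anti-patterns

Always-on warehouse: leaving AUTO_SUSPEND unset makes costs balloon.
SELECT * on a wide table: throws away the benefit of columnar storage, since every column must be read.
One huge warehouse: no workload isolation, so ETL and BI contend for the same compute.
No clustering key on a very large table: partition pruning becomes ineffective and queries fall back to full scans.
Copying data instead of using a Data Share: incurs governance and cost penalties.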
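
The clustering anti-pattern can be illustrated with a toy pruning model in Python. It assumes each partition keeps only min/max metadata for one column and that a range filter must scan every partition whose range overlaps it. The function names are hypothetical and the model is far simpler than Snowflake's real pruning.

```python
import random


def build_partitions(values: list, rows_per_partition: int) -> list:
    """Split values into fixed-size partitions and keep (min, max) metadata for each."""
    parts = []
    for i in range(0, len(values), rows_per_partition):
        chunk = values[i:i + rows_per_partition]
        parts.append((min(chunk), max(chunk)))
    return parts


def partitions_scanned(parts: list, lo: int, hi: int) -> int:
    """Count partitions whose [min, max] range overlaps the filter range [lo, hi]."""
    return sum(1 for pmin, pmax in parts if pmax >= lo and pmin <= hi)


values = list(range(10_000))
clustered = build_partitions(values, 100)   # rows arrive sorted on the filter column

shuffled_values = values[:]
random.Random(0).shuffle(shuffled_values)   # rows arrive in random order
unclustered = build_partitions(shuffled_values, 100)
```

Comparing `partitions_scanned(clustered, 500, 599)` with `partitions_scanned(unclustered, 500, 599)` shows the effect: the sorted layout touches a single partition, while the shuffled layout touches nearly all of them because almost every partition's min/max range spans the filter.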

🧪 Verification / duplicates

Verified (Snowflake docs 2026; Dageville et al. SIGMOD 2016; Snowflake: The Definitive Guide 2nd ed).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup — full content (architecture + 9 patterns)
