[G1-Sync] Manual knowledge update

2026-05-09 22:47:42 +09:00
parent 93ec7e9056
commit 21ac3ed255
56 changed files with 22043 additions and 43 deletions
@@ -0,0 +1,317 @@
+---
+id: cs-time-series-algorithms
+title: Time-Series Algorithms — downsample / detect / forecast
+category: Coding
+status: draft
+source_trust_level: B
+verification_status: conceptual
+created_at: 2026-05-09
+updated_at: 2026-05-09
+tags: [cs, time-series, vibe-coding]
+tech_stack: { language: "TS / Python", applicable_to: ["Backend", "Data"] }
+applied_in: []
+aliases: [time-series, downsample, LTTB, anomaly detection, forecast, Prophet, ARIMA]
+---
+
+# Time-Series Algorithms
+
+> Metrics, IoT data, and logs all have a time dimension. The core operations are **downsampling (graphs), aggregation (rollups), anomaly detection (alerts), and forecasting (capacity planning)**. Typical stores: TimescaleDB / VictoriaMetrics / Prometheus.
+
+## 📖 Core concepts
+- A series = time (one dimension) + one or more values per timestamp.
+- Regularly sampled (equally spaced) vs irregular timestamps.
+- Aggregation (sum / avg / p99) over a time window.
+- Storage: progressively downsample older data (1s → 1min → 1hr).
+
+## 💻 Code patterns
+
+### Downsample (LTTB — Largest Triangle Three Buckets)
+```ts
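+// Goal: e.g. 1M points → ~1000 points for a UI graph.
+// Naive "every Nth point" drops spikes.
+// LTTB keeps, per bucket, the point that forms the largest triangle
+// with the previously kept point and the next bucket's average.
+type Point = { x: number; y: number };
+
+function lttb(data: Point[], threshold: number): Point[] {
+  if (threshold < 3 || data.length <= threshold) return data;
+
+  // First and last points are always kept; the rest is split into threshold - 2 buckets.
+  const bucketSize = (data.length - 2) / (threshold - 2);
+  const out: Point[] = [data[0]];
+
+  for (let i = 0; i < threshold - 2; i++) {
+    // Current bucket: [bucketStart, bucketEnd)
+    const bucketStart = Math.floor(i * bucketSize) + 1;
+    const bucketEnd = Math.floor((i + 1) * bucketSize) + 1;
+
+    // Average of the next bucket (for the last bucket this is just the final point).
+    const nextStart = bucketEnd;
+    const nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, data.length);
+    let avgX = 0, avgY = 0;
+    for (let j = nextStart; j < nextEnd; j++) { avgX += data[j].x; avgY += data[j].y; }
+    avgX /= nextEnd - nextStart;
+    avgY /= nextEnd - nextStart;
+
+    // Pick the point in the current bucket with the largest triangle area (x2, sign dropped).
+    const a = out[out.length - 1];
+    let maxArea = -1, maxIdx = bucketStart;
+    for (let j = bucketStart; j < bucketEnd; j++) {
+      const area = Math.abs(
+        (a.x - avgX) * (data[j].y - a.y)
+        - (a.x - data[j].x) * (avgY - a.y)
+      );
+      if (area > maxArea) { maxArea = area; maxIdx = j; }
+    }
+    out.push(data[maxIdx]);
+  }
+  out.push(data[data.length - 1]);
+  return out;
+}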
+```
+
+→ Shrinks large series for graphing while preserving the overall shape and spikes.
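+
+The "every Nth point" failure mode is easy to reproduce. Here is a minimal sketch that assumes a synthetic series, flat at 1 with a single spike. The data and the `everyNth` helper are illustrative only:
+
+```ts
+type Point = { x: number; y: number };
+
+// Naive decimation: keep every `step`-th point and nothing else.
+function everyNth(data: Point[], step: number): Point[] {
+  return data.filter((_, i) => i % step === 0);
+}
+
+// Synthetic series: flat at 1 with one spike at x = 503.
+const series: Point[] = Array.from({ length: 1000 }, (_, i) => ({
+  x: i,
+  y: i === 503 ? 100 : 1,
+}));
+
+const naive = everyNth(series, 10); // 100 points: x = 0, 10, 20, ...
+const naiveMax = Math.max(...naive.map((p) => p.y)); // 1: the spike is gone
+```
+
+LTTB run on the same series with a threshold of 100 keeps the spike. In the spike's bucket, every other candidate is collinear with the previously kept point and the next bucket's average, so its triangle area is 0.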
+
+### Time-bucketing (rollup)
+```sql
+-- TimescaleDB
+SELECT
+  time_bucket('1 minute', ts) AS bucket,
+  AVG(value), MAX(value), MIN(value), COUNT(*)
+FROM metrics
+WHERE ts > NOW() - INTERVAL '1 hour'
+GROUP BY bucket
+ORDER BY bucket;
+```
+
+```sql
+-- Postgres native
+SELECT
+  date_trunc('minute', ts) AS bucket,
+  AVG(value)
+FROM metrics
+GROUP BY bucket;
+```
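+
+Sometimes the rollup has to happen in application code rather than in the database. The idea is the same: floor each timestamp to the bucket width, then aggregate per bucket. This is a minimal sketch that assumes epoch-millisecond timestamps and fixed-width buckets (no time zones or calendar months). `timeBucket` and `rollup` are hypothetical helpers that only mirror what `time_bucket` / `date_trunc` do:
+
+```ts
+type Sample = { ts: number; value: number }; // ts = epoch milliseconds
+type Agg = { bucket: number; avg: number; max: number; min: number; count: number };
+
+// Floor a timestamp to the start of its fixed-width bucket.
+function timeBucket(widthMs: number, ts: number): number {
+  return Math.floor(ts / widthMs) * widthMs;
+}
+
+// GROUP BY bucket with AVG / MAX / MIN / COUNT, ordered by bucket.
+function rollup(samples: Sample[], widthMs: number): Agg[] {
+  const acc = new Map<number, { sum: number; max: number; min: number; count: number }>();
+  for (const { ts, value } of samples) {
+    const b = timeBucket(widthMs, ts);
+    const a = acc.get(b);
+    if (a) {
+      a.sum += value;
+      a.max = Math.max(a.max, value);
+      a.min = Math.min(a.min, value);
+      a.count += 1;
+    } else {
+      acc.set(b, { sum: value, max: value, min: value, count: 1 });
+    }
+  }
+  return [...acc.entries()]
+    .sort(([x], [y]) => x - y)
+    .map(([bucket, a]) => ({ bucket, avg: a.sum / a.count, max: a.max, min: a.min, count: a.count }));
+}
+
+const MINUTE = 60_000;
+const rows = rollup(
+  [
+    { ts: 0, value: 1 },
+    { ts: 30_000, value: 3 },
+    { ts: 61_000, value: 10 },
+  ],
+  MINUTE,
+);
+// rows[0]: bucket 0, avg 2, max 3, min 1, count 2
+// rows[1]: bucket 60000, avg 10, max 10, min 10, count 1
+```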
+
+### Continuous aggregate (TimescaleDB)
+```sql
+CREATE MATERIALIZED VIEW metrics_1min
+WITH (timescaledb.continuous) AS
+SELECT
+  time_bucket('1 minute', ts) AS bucket,
+  AVG(value), COUNT(*)
+FROM metrics
+GROUP BY bucket;
+
+-- Auto refresh
+SELECT add_continuous_aggregate_policy('metrics_1min',
+  start_offset => INTERVAL '1 hour',
+  end_offset   => INTERVAL '1 minute',
+  schedule_interval => INTERVAL '1 minute'
+);
+```
+
+→ Pre-aggregated, so queries read the rollup instead of raw rows. Combined with a retention policy on the raw table, it also saves storage.
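+
+Rollups can be chained (1s → 1min → 1hr) because they can keep mergeable partial aggregates. An average of averages is wrong when buckets hold different counts. A rollup should therefore keep `sum` and `count` (plus `min` / `max`) and derive the average at read time. Below is a minimal sketch of merging finer buckets into coarser ones. The `RollupPart` shape is an illustrative assumption, not TimescaleDB's internal format:
+
+```ts
+// Mergeable partial aggregate for one bucket (avg is derived, never stored).
+type RollupPart = { bucket: number; sum: number; count: number; min: number; max: number };
+
+// Combine two partials for the same coarse bucket; every field merges exactly.
+function mergeParts(a: RollupPart, b: RollupPart): RollupPart {
+  return {
+    bucket: a.bucket,
+    sum: a.sum + b.sum,
+    count: a.count + b.count,
+    min: Math.min(a.min, b.min),
+    max: Math.max(a.max, b.max),
+  };
+}
+
+// Re-bucket fine partials (e.g. 1 min) into a coarser width (e.g. 1 hr).
+function coarsen(parts: RollupPart[], widthMs: number): RollupPart[] {
+  const out = new Map<number, RollupPart>();
+  for (const p of parts) {
+    const b = Math.floor(p.bucket / widthMs) * widthMs;
+    const moved = { ...p, bucket: b };
+    const prev = out.get(b);
+    out.set(b, prev ? mergeParts(prev, moved) : moved);
+  }
+  return [...out.values()].sort((x, y) => x.bucket - y.bucket);
+}
+
+// Two 1-minute buckets inside the same hour (illustrative numbers).
+const minuteParts: RollupPart[] = [
+  { bucket: 0, sum: 10, count: 1, min: 10, max: 10 }, // one sample of 10
+  { bucket: 60_000, sum: 0, count: 9, min: 0, max: 0 }, // nine samples of 0
+];
+const [hourPart] = coarsen(minuteParts, 3_600_000);
+const trueAvg = hourPart.sum / hourPart.count; // 10 / 10 = 1
+const avgOfAvgs = (10 / 1 + 0 / 9) / 2; // 5: wrong, ignores the counts
+```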
+
+### Retention / hot-cold
+```sql
+-- Drop raw (1-second) data after 7 days; only the 1-minute rollup remains
+SELECT add_retention_policy('metrics', INTERVAL '7 days');
+
+-- Or compress instead (TimescaleDB)
+ALTER TABLE metrics SET (timescaledb.compress);
+SELECT add_compression_policy('metrics', INTERVAL '1 day');
+```
+
+### Moving average
+```ts
+function sma(data: number[], window: number): number[] {
+  const out: number[] = [];
+  let sum = 0;
+  for (let i = 0; i < data.length; i++) {
+    sum += data[i];
+    if (i >= window) sum -= data[i - window]; // drop the value leaving the window
+    if (i >= window - 1) out.push(sum / window);
+  }
+  return out;
+}
+
+// EWMA (exponentially weighted moving average), 0 < alpha <= 1
+function ewma(data: number[], alpha: number): number[] {
+  if (data.length === 0) return [];
+  const out = [data[0]];
+  for (let i = 1; i < data.length; i++) {
+    out.push(alpha * data[i] + (1 - alpha) * out[i - 1]);
+  }
+  return out;
+}
+```
+
+→ Both smooth out noise. EWMA gives recent points more weight (a larger alpha reacts faster).
+
+### Anomaly detection (simple)
+```ts
+// Z-score (assumes roughly normally distributed data)
+function zScoreAnomalies(data: number[], threshold = 3) {
+  const mean = data.reduce((a, b) => a + b, 0) / data.length;
+  const variance = data.reduce((a, b) => a + (b - mean) ** 2, 0) / data.length;
+  const std = Math.sqrt(variance);
+  // std === 0 (constant series): nothing deviates, so nothing is flagged
+  return data.map((v, i) => ({ i, v, isAnomaly: std > 0 && Math.abs((v - mean) / std) > threshold }));
+}
+
+// Robust: median + MAD (Median Absolute Deviation), modified z-score > 3.5
+// 1.4826 scales MAD to the std of a normal distribution.
+function madAnomalies(data: number[]) {
+  const byValue = (a: number, b: number) => a - b; // default sort() is lexicographic
+  const sorted = [...data].sort(byValue);
+  const median = sorted[Math.floor(sorted.length / 2)];
+  const mad = data.map(v => Math.abs(v - median)).sort(byValue)[Math.floor(data.length / 2)];
+  return data.map((v, i) => ({ i, v, isAnomaly: Math.abs(v - median) / (mad * 1.4826) > 3.5 }));
+}
+```
+
+→ Z-score is sensitive to outliers: a large outlier inflates the mean and std and can hide itself. Median + MAD is robust to this.
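+
+Here is a concrete case of z-score failing on outliers. With the population standard deviation (as in `zScoreAnomalies` above), no point in an n-point sample can have |z| greater than √(n − 1). In a 10-point window that cap is exactly 3, so a threshold of `> 3` can never fire, however large the outlier. The following self-contained sketch compares both scores on illustrative data:
+
+```ts
+// Plain |z| with population std (same formula as zScoreAnomalies above).
+function zScores(data: number[]): number[] {
+  const mean = data.reduce((a, b) => a + b, 0) / data.length;
+  const std = Math.sqrt(data.reduce((a, b) => a + (b - mean) ** 2, 0) / data.length);
+  return data.map((v) => Math.abs(v - mean) / std);
+}
+
+// Modified z-score: |v - median| / (1.4826 * MAD), as in madAnomalies above.
+function modifiedZ(data: number[]): number[] {
+  const median = (xs: number[]) => {
+    const s = [...xs].sort((a, b) => a - b);
+    return s[Math.floor(s.length / 2)]; // upper median, matching madAnomalies
+  };
+  const med = median(data);
+  const mad = median(data.map((v) => Math.abs(v - med)));
+  return data.map((v) => Math.abs(v - med) / (1.4826 * mad));
+}
+
+// 10 points, one gross outlier (illustrative data).
+const window10 = [10, 11, 9, 10, 12, 10, 11, 9, 10, 1000];
+const z = zScores(window10)[9]; // just under 3: not flagged by "> 3"
+const mz = modifiedZ(window10)[9]; // about 668: flagged by "> 3.5"
+```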
+
+### Seasonality (hour of day / day of week)
+```python
+# Python — pandas
+import pandas as pd
+
+s = pd.Series(values, index=pd.to_datetime(times))  # values / times: your samples and timestamps
+hourly = s.groupby(s.index.hour).mean()
+# Hour-of-day pattern
+
+dow = s.groupby(s.index.dayofweek).mean()
+# Day-of-week pattern
+```
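+
+Here is the same hour-of-day profile in TypeScript, as a minimal sketch. It assumes `Date` objects and groups by UTC hour. pandas' `s.index.hour` uses the index's own time zone, so choose the zone that matches your traffic. `hourOfDayProfile` is a hypothetical helper:
+
+```ts
+// Mean value per hour of day (0-23, UTC). Hours with no data are NaN.
+function hourOfDayProfile(times: Date[], values: number[]): number[] {
+  const sum = new Array<number>(24).fill(0);
+  const count = new Array<number>(24).fill(0);
+  times.forEach((t, i) => {
+    const h = t.getUTCHours();
+    sum[h] += values[i];
+    count[h] += 1;
+  });
+  return sum.map((s, h) => (count[h] > 0 ? s / count[h] : NaN));
+}
+
+// Two days of hourly data where 09:00 UTC is always busy (illustrative).
+const start = Date.UTC(2026, 0, 5); // 2026-01-05 00:00 UTC, arbitrary
+const times = Array.from({ length: 48 }, (_, i) => new Date(start + i * 3_600_000));
+const values = times.map((t) => (t.getUTCHours() === 9 ? 50 : 10));
+const profile = hourOfDayProfile(times, values);
+// profile[9] = 50, every other hour = 10
+```
+
+For the day-of-week profile, group by `getUTCDay()` (0 = Sunday) into 7 slots instead of 24.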
+
+### STL decomposition
+```python
+from statsmodels.tsa.seasonal import STL
+
+stl = STL(s, period=24).fit()  # hourly data with a daily cycle
+trend = stl.trend
+seasonal = stl.seasonal
+residual = stl.resid
+
+# Anomalies = points with a large residual
+```
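+
+STL itself (iterative LOESS smoothing) is too long to sketch here. A cruder, swapped-in technique, classical additive decomposition, shows the same idea: value = trend + seasonal + residual, with anomalies standing out in the residual. This minimal sketch assumes an even `period` (e.g. 24 for hourly data), no gaps, and at least two full periods of data:
+
+```ts
+type Decomposition = { trend: number[]; seasonal: number[]; residual: number[] };
+
+// Classical additive decomposition (NOT STL): y = trend + seasonal + residual.
+function decompose(y: number[], period: number): Decomposition {
+  const n = y.length;
+  const half = period / 2;
+  // 1) Trend: centred 2x`period` moving average (NaN where the window doesn't fit).
+  const trend = y.map((_, i) => {
+    if (i < half || i + half >= n) return NaN;
+    let s = 0.5 * y[i - half] + 0.5 * y[i + half];
+    for (let k = i - half + 1; k < i + half; k++) s += y[k];
+    return s / period;
+  });
+  // 2) Seasonal: mean detrended value per phase, centred so it averages to 0.
+  const sums = new Array<number>(period).fill(0);
+  const counts = new Array<number>(period).fill(0);
+  y.forEach((v, i) => {
+    if (Number.isNaN(trend[i])) return;
+    sums[i % period] += v - trend[i];
+    counts[i % period] += 1;
+  });
+  const phase = sums.map((s, p) => s / counts[p]);
+  const phaseMean = phase.reduce((a, b) => a + b, 0) / period;
+  const seasonal = y.map((_, i) => phase[i % period] - phaseMean);
+  // 3) Residual: what trend + seasonality don't explain (NaN at the edges).
+  const residual = y.map((v, i) => v - trend[i] - seasonal[i]);
+  return { trend, seasonal, residual };
+}
+
+// Linear trend + period-4 pattern + one injected spike at index 21 (illustrative).
+const pattern = [0, 10, 0, -10];
+const y = Array.from({ length: 40 }, (_, i) => i + pattern[i % 4] + (i === 21 ? 50 : 0));
+const { residual } = decompose(y, 4);
+// The largest |residual| is at index 21, the injected spike.
+```
+
+Unlike STL, the moving-average trend is smeared by the spike itself, which dilutes the residual. That is one reason STL (with its robust option) is preferred in practice.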
+
+### Forecast: Prophet (simple)
+```python
+from prophet import Prophet
+
+df = pd.DataFrame({'ds': times, 'y': values})
+m = Prophet(yearly_seasonality=True, daily_seasonality=True).fit(df)
+future = m.make_future_dataframe(periods=24, freq='H')
+forecast = m.predict(future)
+```
+
+→ Prophet is Facebook's (Meta's) library. It models weekly and yearly seasonality automatically, and holiday effects can be added.
+
+### ARIMA (classical)
+```python
+from statsmodels.tsa.arima.model import ARIMA
+
+model = ARIMA(s, order=(1, 1, 1)).fit()
+forecast = model.forecast(24)
+```
+
+→ Requires tuning the (p, d, q) orders. Prophet is simpler to get started with.
+
+### Holt-Winters (smoothing)
+```python
+from statsmodels.tsa.holtwinters import ExponentialSmoothing
+
+m = ExponentialSmoothing(s, seasonal_periods=24, trend='add', seasonal='add').fit()
+forecast = m.forecast(24)
+```
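+
+Holt-Winters extends simple exponential smoothing (the `ewma` above) with a trend term and a seasonal term. Below is a minimal sketch of only the level + trend part (Holt's linear method, no seasonality), to show the kind of recurrences `statsmodels` fits for you. `alpha` / `beta` are fixed here rather than optimised. The initialisation (level = first value, trend = first difference) is one simple choice among several:
+
+```ts
+// Holt's linear method: level + trend, no seasonal component.
+function holtForecast(y: number[], alpha: number, beta: number, horizon: number): number[] {
+  if (y.length < 2) throw new Error("need at least 2 points");
+  let level = y[0];
+  let trend = y[1] - y[0];
+  for (let t = 1; t < y.length; t++) {
+    const prevLevel = level;
+    // Level: blend the new observation with the previous one-step forecast.
+    level = alpha * y[t] + (1 - alpha) * (prevLevel + trend);
+    // Trend: blend the latest level change with the previous trend.
+    trend = beta * (level - prevLevel) + (1 - beta) * trend;
+  }
+  // h-step-ahead forecast extends the last level along the last trend.
+  return Array.from({ length: horizon }, (_, h) => level + (h + 1) * trend);
+}
+
+// On a perfectly linear series the forecast continues the line.
+const line = [5, 7, 9, 11, 13];
+const fc = holtForecast(line, 0.5, 0.5, 3); // [15, 17, 19]
+```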
+
+### Prometheus PromQL
+```promql
+# Per-second rate over the last 5 minutes (counter)
+rate(http_requests_total[5m])
+
+# p99 from a histogram
+histogram_quantile(0.99, rate(http_request_duration_bucket[5m]))
+
+# 1-hour average (gauge)
+avg_over_time(cpu_usage[1h])
+
+# Anomaly: current rate differs from its 7-day average by more than 3 std
+abs(rate(traffic[5m]) - avg_over_time(rate(traffic[5m])[7d:1h]))
+  > 3 * stddev_over_time(rate(traffic[5m])[7d:1h])
+```
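+
+`rate()` works on counters. Counters only go up, except when a process restarts and the counter resets to 0. The simplified sketch below shows the core idea: sum the increases inside the window, treat any drop as a reset, and divide by the covered time span. Prometheus also extrapolates to the window edges, so its numbers differ slightly. `counterRate` is a hypothetical helper, not the Prometheus implementation:
+
+```ts
+type CounterSample = { t: number; v: number }; // t in seconds, v = counter value
+
+// Per-second rate across the samples, handling counter resets.
+function counterRate(samples: CounterSample[]): number {
+  if (samples.length < 2) return NaN; // a rate needs at least two samples
+  let increase = 0;
+  for (let i = 1; i < samples.length; i++) {
+    const delta = samples[i].v - samples[i - 1].v;
+    // A drop means the counter reset to 0, so the new value is the whole increase.
+    increase += delta >= 0 ? delta : samples[i].v;
+  }
+  const span = samples[samples.length - 1].t - samples[0].t;
+  return increase / span;
+}
+
+// Scraped every 15 s; the process restarts between t=30 and t=45 (illustrative).
+const scraped: CounterSample[] = [
+  { t: 0, v: 100 },
+  { t: 15, v: 130 },
+  { t: 30, v: 160 },
+  { t: 45, v: 20 }, // reset: 0 -> 20
+  { t: 60, v: 50 },
+];
+const r = counterRate(scraped); // (30 + 30 + 20 + 30) / 60 ≈ 1.83 per second
+```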
+
+### Cardinality (important)
+```
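+The enemy of a time-series DB: high cardinality.
+- Labels (host, path, status, user_id): user_id has millions of values, so the number of series explodes.
+
+→ Don't put unbounded IDs like user_id into metric labels. Put them in logs.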
+```
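+
+A quick way to see the explosion: the worst-case number of series for one metric is the product of the distinct values of each label. Here is a minimal sketch with made-up label counts:
+
+```ts
+// Worst-case series count = product of distinct values per label.
+function maxSeries(labelCardinality: Record<string, number>): number {
+  return Object.values(labelCardinality).reduce((acc, n) => acc * n, 1);
+}
+
+const withoutUser = { host: 50, path: 200, status: 5 }; // illustrative counts
+const withUser = { ...withoutUser, user_id: 1_000_000 };
+
+const seriesWithout = maxSeries(withoutUser); // 50 * 200 * 5 = 50,000
+const seriesWith = maxSeries(withUser); // 50,000 * 1,000,000 = 5e10
+```
+
+Real counts are usually lower because not every label combination occurs. The multiplication is still why a single unbounded label dominates everything else.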
+
+### Time-series storage 비교
+```
+Prometheus:      pull-based, Kubernetes-friendly, limited by single-instance scaling
+VictoriaMetrics: Prometheus-compatible, more resource-efficient
+InfluxDB:        push-based, SQL-like queries
+TimescaleDB:     Postgres-based, full SQL
+ClickHouse:      OLAP, handles high cardinality
+Mimir / Cortex:  Prometheus HA / multi-tenant
+```
+
+### Window functions
+```ts
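+// Rolling window (partial windows at the start of the series)
+function rolling<T>(data: T[], window: number, fn: (w: T[]) => T): T[] {
+  const out: T[] = [];
+  for (let i = 0; i < data.length; i++) {
+    const w = data.slice(Math.max(0, i - window + 1), i + 1);
+    out.push(fn(w));
+  }
+  return out;
+}
+
+// Nearest-rank quantile of a non-empty window, q in [0, 1]
+function quantile(w: number[], q: number): number {
+  const s = [...w].sort((a, b) => a - b);
+  return s[Math.max(0, Math.ceil(q * s.length) - 1)];
+}
+
+// values: number[] is the metric series; p99 over the last 60 samples
+const p99 = rolling(values, 60, w => quantile(w, 0.99));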
+```
+
+### Gap-filling
+```sql
+-- TimescaleDB
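+SELECT
+  time_bucket_gapfill('1 minute', ts) AS bucket,
+  COALESCE(AVG(value), 0)  -- alternatives: locf(AVG(value)), interpolate(AVG(value))
+FROM metrics
+WHERE ts > NOW() - INTERVAL '1 hour' AND ts < NOW()  -- gapfill needs both time bounds
+GROUP BY bucket
+ORDER BY bucket;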
+```
+
+→ Emits a row for every bucket in the range, including empty ones.
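+
+Here is gap filling in application code, as a minimal sketch. It walks every bucket in `[start, end)` and emits either the stored value or a fill. It assumes stored buckets are aligned to `start + k * width`. It covers a constant fill (like `COALESCE(..., 0)` above), last-observation-carried-forward (what `locf()` does), and leaving the gap as `null`. `gapfill` and its fill modes are hypothetical:
+
+```ts
+type BucketRow = { bucket: number; value: number | null };
+
+// One row per bucket in [start, end); missing buckets get a fill value.
+function gapfill(
+  rows: { bucket: number; value: number }[],
+  start: number,
+  end: number,
+  widthMs: number,
+  fill: "zero" | "locf" | "null",
+): BucketRow[] {
+  const byBucket = new Map(rows.map((r) => [r.bucket, r.value] as const));
+  const out: BucketRow[] = [];
+  let last: number | null = null; // last observed value, for locf
+  for (let b = start; b < end; b += widthMs) {
+    const v = byBucket.get(b);
+    if (v !== undefined) {
+      last = v;
+      out.push({ bucket: b, value: v });
+    } else {
+      const filled = fill === "zero" ? 0 : fill === "locf" ? last : null;
+      out.push({ bucket: b, value: filled });
+    }
+  }
+  return out;
+}
+
+const MIN = 60_000;
+const stored = [
+  { bucket: 0, value: 5 },
+  { bucket: 3 * MIN, value: 8 }, // buckets at 1 and 2 minutes are missing
+];
+const zeroFilled = gapfill(stored, 0, 4 * MIN, MIN, "zero").map((r) => r.value); // [5, 0, 0, 8]
+const locfFilled = gapfill(stored, 0, 4 * MIN, MIN, "locf").map((r) => r.value); // [5, 5, 5, 8]
+```
+
+Zero-filling a gauge can draw fake drops on a graph. `null` (a visible gap) or `locf` is often the safer default, but the choice depends on the metric.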
+
+### Real-time anomaly (streaming)
+```
+Update an EWMA (mean and variance) per point and check a threshold.
+Or compute a z-score over a small window (1-5 min).
+
+At larger scale: a separate process or a Flink job.
+```
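+
+Here is a minimal sketch of the streaming EWMA approach. It keeps an exponentially weighted mean and variance and scores each new point against them before updating. A point is flagged when its deviation exceeds `k` standard deviations. `EwmaDetector`, `alpha`, `k`, and the warm-up length are assumptions to tune per metric:
+
+```ts
+// Streaming anomaly detector: EW mean and variance, O(1) memory per series.
+class EwmaDetector {
+  private mean = 0;
+  private variance = 0;
+  private seen = 0;
+  private readonly alpha: number; // weight of the newest point
+  private readonly k: number; // flag when |x - mean| > k * std
+  private readonly warmup: number; // don't flag until this many points were seen
+
+  constructor(alpha = 0.1, k = 3, warmup = 10) {
+    this.alpha = alpha;
+    this.k = k;
+    this.warmup = warmup;
+  }
+
+  /** Returns true if `x` looks anomalous, then folds `x` into the state. */
+  update(x: number): boolean {
+    if (this.seen === 0) {
+      this.mean = x;
+      this.seen = 1;
+      return false;
+    }
+    const diff = x - this.mean;
+    const isAnomaly = this.seen >= this.warmup && Math.abs(diff) > this.k * Math.sqrt(this.variance);
+    // Incremental exponentially weighted mean and variance.
+    const incr = this.alpha * diff;
+    this.mean += incr;
+    this.variance = (1 - this.alpha) * (this.variance + diff * incr);
+    this.seen += 1;
+    return isAnomaly;
+  }
+}
+
+const detector = new EwmaDetector(0.1, 3, 10);
+for (const x of [10, 12, 10, 12, 10, 12, 10, 12, 10, 12, 10, 12, 100]) {
+  if (detector.update(x)) console.log(`anomaly: ${x}`); // logs "anomaly: 100"
+}
+```
+
+This design scores each point before folding it in, so a spike cannot raise its own baseline. A spike still inflates the variance afterwards, which briefly makes the detector less sensitive.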
+
+## 🤔 Decision criteria
+| Task | Recommendation |
+|---|---|
+| Metric storage | Prom / VictoriaMetrics / Timescale |
+| High cardinality | ClickHouse |
+| Forecast | Prophet (simple), ARIMA (classical statistics) |
+| Anomaly | EWMA + z-score / MAD |
+| Graph downsample | LTTB |
+| Aggregate | Continuous aggregate / window |
+| Real-time | Flink / Materialize / Bytewax |
+
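+## ❌ Anti-patterns
+- **Keeping all raw data forever**: storage explodes. Downsample.
+- **High-cardinality metrics (user_id)**: kills the TSDB.
+- **Naive downsampling (every Nth point)**: loses spikes. Use LTTB.
+- **Z-score on non-Gaussian data**: false positives and masked outliers. Use MAD.
+- **Ignoring seasonality**: a normal day-of-week pattern gets flagged as an "anomaly".
+- **No continuous aggregates**: every query scans raw data.
+- **No gap filling**: graphs break at missing buckets.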
+
+## 🤖 Hints for LLM use
+- LTTB is the standard for graph downsampling.
+- Continuous aggregates / pre-rollups are almost always worth it.
+- Watch cardinality (the enemy of a TSDB).
+- Prophet for simple forecasting.
+
+## 🔗 Related documents
+- [[DB_Time_Series_Patterns]]
+- [[Observability_RED_USE_Metrics]]
+- [[CS_Cache_Eviction]]