---
id: cs-time-series-algorithms
title: Time-Series Algorithms — downsample / detect / forecast
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: [cs, time-series, vibe-coding]
tech_stack: { language: "TS / Python", applicable_to: ["Backend", "Data"] }
applied_in: []
aliases: [time-series, downsample, LTTB, anomaly detection, forecast, Prophet, ARIMA]
---

# Time-Series Algorithms

> Metrics, IoT readings, and logs all have a time dimension. The core operations are **downsampling (graphs), aggregation (rollups), anomaly detection (alerts), and forecasting (capacity planning)**. Typical stores: TimescaleDB / VictoriaMetrics / Prometheus.

## 📖 Core Concepts

- A time series = one time dimension + one or more values.
- Equally spaced (fixed sampling interval) vs irregular timestamps.
- Aggregation (sum / avg / p99) over a time window.
- Storage: downsample older data progressively (1s → 1min → 1hr).

## 💻 Code Patterns

### Downsample (LTTB: Largest Triangle Three Buckets)

```ts
// Reducing 1M points to ~1000 is enough for a UI graph.
// Naive: keep every Nth point, which can drop spikes.
// LTTB: keep the most "characteristic" point from each bucket.
type Point = { x: number; y: number };

function lttb(data: Point[], threshold: number): Point[] {
  // Nothing to reduce, or the budget is too small to form buckets
  if (data.length <= threshold || threshold < 3) return data;

  // First and last points are always kept; the rest is split into (threshold - 2) buckets.
  const bucketSize = (data.length - 2) / (threshold - 2);
  const out: Point[] = [data[0]];

  for (let i = 0; i < threshold - 2; i++) {
    // Average of the NEXT bucket (for the last bucket this is just the final point)
    const nextStart = Math.floor((i + 1) * bucketSize) + 1;
    const nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, data.length);
    let avgX = 0, avgY = 0;
    for (let j = nextStart; j < nextEnd; j++) {
      avgX += data[j].x;
      avgY += data[j].y;
    }
    avgX /= nextEnd - nextStart;
    avgY /= nextEnd - nextStart;

    // CURRENT bucket: pick the point forming the largest triangle with
    // the previously kept point and the next bucket's average
    const bucketStart = Math.floor(i * bucketSize) + 1;
    const bucketEnd = Math.floor((i + 1) * bucketSize) + 1;
    const prev = out[out.length - 1];
    let maxArea = -1, maxIdx = bucketStart;
    for (let j = bucketStart; j < bucketEnd; j++) {
      // Twice the triangle area; the 1/2 factor doesn't change which point wins
      const area = Math.abs(
        (prev.x - avgX) * (data[j].y - prev.y) -
        (prev.x - data[j].x) * (avgY - prev.y)
      );
      if (area > maxArea) { maxArea = area; maxIdx = j; }
    }
    out.push(data[maxIdx]);
  }
  out.push(data[data.length - 1]);
  return out;
}
```

→ On large series, the result stays visually smooth while spikes are preserved.
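The LTTB comments contrast it with naive every-Nth decimation. A minimal sketch of why the naive approach loses spikes; the `everyNth` helper and the synthetic series are hypothetical, for illustration only:

```typescript
type Point = { x: number; y: number };

// Naive decimation (hypothetical helper): keep every `step`-th point, ignore the rest.
function everyNth(data: Point[], step: number): Point[] {
  return data.filter((_, i) => i % step === 0);
}

// A flat series of 1000 points with a single one-sample spike at x = 503.
const series: Point[] = Array.from({ length: 1000 }, (_, i) => ({
  x: i,
  y: i === 503 ? 100 : 1,
}));

const decimated = everyNth(series, 10); // keeps x = 0, 10, ..., 990 (100 points)
const peak = Math.max(...decimated.map(p => p.y));
console.log(decimated.length, peak); // 100 1: the spike at x = 503 was skipped
```

With the same 100-point budget, LTTB should keep the spike: every other point in that bucket sits on the flat line at y = 1, as do the previously kept point and the next bucket's average, so they form zero-area triangles and the spike wins the area comparison.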
### Time-bucketing (rollup)

```sql
-- TimescaleDB
SELECT time_bucket('1 minute', ts) AS bucket,
       AVG(value), MAX(value), MIN(value), COUNT(*)
FROM metrics
WHERE ts > NOW() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket;
```

```sql
-- Plain Postgres: date_trunc only truncates to whole units (minute, hour, day, ...);
-- for arbitrary widths such as 5 minutes, Postgres 14+ has date_bin
SELECT date_trunc('minute', ts) AS bucket, AVG(value)
FROM metrics
GROUP BY bucket;
```

### Continuous aggregate (TimescaleDB)

```sql
CREATE MATERIALIZED VIEW metrics_1min
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', ts) AS bucket,
       AVG(value) AS avg_value,
       COUNT(*)   AS n
FROM metrics
GROUP BY bucket;

-- Automatic refresh: every minute, re-materialize the window from 1 hour ago to 1 minute ago
SELECT add_continuous_aggregate_policy('metrics_1min',
  start_offset      => INTERVAL '1 hour',
  end_offset        => INTERVAL '1 minute',
  schedule_interval => INTERVAL '1 minute'
);
```

→ Results are pre-aggregated: queries are fast, and storage shrinks once the raw data can be dropped.

### Retention / hot-cold

```sql
-- Drop raw (1s) data older than 7 days; only the 1-minute rollup remains
SELECT add_retention_policy('metrics', INTERVAL '7 days');

-- Or compress instead (TimescaleDB): compress chunks older than 1 day
ALTER TABLE metrics SET (timescaledb.compress);
SELECT add_compression_policy('metrics', INTERVAL '1 day');
```

### Moving average

```ts
// Simple moving average: one output per full window (length = data.length - window + 1)
function sma(data: number[], window: number): number[] {
  const out: number[] = [];
  let sum = 0;
  for (let i = 0; i < data.length; i++) {
    sum += data[i];
    if (i >= window) sum -= data[i - window]; // drop the point leaving the window
    if (i >= window - 1) out.push(sum / window);
  }
  return out;
}

// EWMA (exponentially weighted moving average); alpha in (0, 1], higher = reacts faster
function ewma(data: number[], alpha: number): number[] {
  if (data.length === 0) return [];
  const out = [data[0]];
  for (let i = 1; i < data.length; i++) {
    out.push(alpha * data[i] + (1 - alpha) * out[i - 1]);
  }
  return out;
}
```

→ Smoothing. EWMA weights recent points most heavily; older points' weights decay exponentially.
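The `time_bucket` / `date_trunc` rollups earlier in this section can also be computed in application code, for example on data already fetched into memory. A minimal sketch, assuming samples carry millisecond Unix timestamps and buckets are aligned to the Unix epoch (TimescaleDB's `time_bucket` uses its own default origin, which can differ for larger widths such as weeks); `rollup`, `Sample`, and `Bucket` are hypothetical names:

```typescript
// A sample: timestamp in milliseconds since the Unix epoch, plus a value.
type Sample = { ts: number; value: number };

type Bucket = { bucket: number; avg: number; max: number; min: number; count: number };

// Group samples into fixed-width, epoch-aligned buckets and compute
// AVG / MAX / MIN / COUNT per bucket, similar to the SQL rollup above.
function rollup(samples: Sample[], widthMs: number): Bucket[] {
  const acc = new Map<number, { sum: number; max: number; min: number; count: number }>();
  for (const { ts, value } of samples) {
    const bucket = Math.floor(ts / widthMs) * widthMs; // start of the bucket
    const b = acc.get(bucket);
    if (b) {
      b.sum += value;
      b.max = Math.max(b.max, value);
      b.min = Math.min(b.min, value);
      b.count += 1;
    } else {
      acc.set(bucket, { sum: value, max: value, min: value, count: 1 });
    }
  }
  return Array.from(acc.entries())
    .sort(([a], [b]) => a - b) // ORDER BY bucket
    .map(([bucket, b]) => ({ bucket, avg: b.sum / b.count, max: b.max, min: b.min, count: b.count }));
}

const minute = 60_000;
const buckets = rollup(
  [
    { ts: 0, value: 1 },
    { ts: 30_000, value: 3 },
    { ts: 61_000, value: 10 },
  ],
  minute,
);
console.log(buckets); // two buckets: start 0 (avg 2, count 2) and start 60000 (avg 10, count 1)
```

A `Map` keyed by bucket start keeps this to a single pass over the data; the final sort plays the role of `ORDER BY bucket`.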
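### Anomaly detection (simple)

```ts
// Z-score (assumes a roughly normal distribution)
function zScoreAnomalies(data: number[], threshold = 3) {
  const mean = data.reduce((a, b) => a + b, 0) / data.length;
  const variance = data.reduce((a, b) => a + (b - mean) ** 2, 0) / data.length;
  const std = Math.sqrt(variance);
  return data.map((v, i) => ({ i, v, isAnomaly: std > 0 && Math.abs((v - mean) / std) > threshold }));
}

// Median with a numeric sort (the default sort() compares values as strings)
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Robust: median + MAD (Median Absolute Deviation).
// 1.4826 * MAD estimates the std dev for normal data; 3.5 is a common cutoff.
// If MAD is 0, any value different from the median scores Infinity and is flagged.
function madAnomalies(data: number[], threshold = 3.5) {
  const med = median(data);
  const mad = median(data.map(v => Math.abs(v - med)));
  return data.map((v, i) => ({ i, v, isAnomaly: Math.abs(v - med) / (mad * 1.4826) > threshold }));
}
```

→ Z-score is sensitive to outliers, because they inflate the mean and standard deviation it depends on. MAD is robust.

### Seasonality (hour of day / day of week)

```python
# Python / pandas; `values` and `times` are your samples and their timestamps
import pandas as pd

s = pd.Series(values, index=pd.DatetimeIndex(times))
hourly = s.groupby(s.index.hour).mean()     # hour-of-day pattern
dow = s.groupby(s.index.dayofweek).mean()   # day-of-week pattern (0 = Monday)
```

### STL decomposition

```python
from statsmodels.tsa.seasonal import STL

stl = STL(s, period=24).fit()  # hourly data with a daily period
trend = stl.trend
seasonal = stl.seasonal
residual = stl.resid
# Anomalies = points with a large residual
```

### Forecast: Prophet (simple)

```python
import pandas as pd
from prophet import Prophet

df = pd.DataFrame({'ds': times, 'y': values})  # Prophet expects columns named ds and y
m = Prophet(yearly_seasonality=True, daily_seasonality=True).fit(df)
future = m.make_future_dataframe(periods=24, freq='H')  # next 24 hours
forecast = m.predict(future)  # columns include yhat, yhat_lower, yhat_upper
```

→ A library from Facebook (Meta). Weekly and yearly seasonality are handled automatically, and holiday effects can be added.

### ARIMA (classical)

```python
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(s, order=(1, 1, 1)).fit()  # order = (p, d, q)
forecast = model.forecast(24)            # next 24 steps
```

→ Requires tuning p, d, q. Prophet is simpler to use.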
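The seasonality and STL ideas above combine naturally with the robust MAD detector: subtract a per-hour-of-day baseline first, then score the residuals, so a normal daily peak is not mistaken for an anomaly. A minimal sketch, assuming hourly data whose first sample falls at 00:00 and a 24-sample daily period; `seasonalResidualAnomalies` is a hypothetical name, and the median baseline is a simple stand-in for a proper STL fit:

```typescript
// Median with a numeric sort (average of the two middle values for even lengths).
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Flag points whose deviation from their hour-of-day baseline is unusual.
// Assumes data[i] was sampled at hour (i % period).
function seasonalResidualAnomalies(data: number[], period = 24, threshold = 3.5) {
  // 1. Seasonal baseline: median value for each phase (hour of day), robust to spikes.
  const baseline: number[] = [];
  for (let p = 0; p < period; p++) {
    baseline.push(median(data.filter((_, i) => i % period === p)));
  }
  // 2. Residual = observation minus its seasonal baseline.
  const resid = data.map((v, i) => v - baseline[i % period]);
  // 3. Robust score on the residuals: |r - median| / (1.4826 * MAD).
  const med = median(resid);
  const mad = median(resid.map(r => Math.abs(r - med)));
  const scale = 1.4826 * mad || 1e-9; // avoid dividing by zero when MAD is 0
  return resid.map((r, i) => ({ i, v: data[i], isAnomaly: Math.abs(r - med) / scale > threshold }));
}

// Example: one week of hourly data, a 09:00-17:00 plateau every day,
// and one spike at index 75 (day 3, 03:00).
const hourly = Array.from({ length: 24 * 7 }, (_, i) => {
  const hour = i % 24;
  if (i === 75) return 200;
  return hour >= 9 && hour <= 17 ? 60 : 10;
});
const flagged = seasonalResidualAnomalies(hourly).filter(r => r.isAnomaly).map(r => r.i);
console.log(flagged); // only index 75; the daily plateau is absorbed by the baseline
```

The daily plateau becomes part of the baseline, so only the spike stands out in the residuals; this is the "ignoring seasonality" failure mode avoided by construction.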
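### Holt-Winters (smoothing)

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

m = ExponentialSmoothing(s, seasonal_periods=24, trend='add', seasonal='add').fit()
forecast = m.forecast(24)
```

### Prometheus PromQL

```promql
# Per-second rate over the last 5 minutes
rate(http_requests_total[5m])

# p99 from a histogram (sum by le to aggregate across instances)
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_bucket[5m])))

# 1-hour average
avg_over_time(cpu_usage[1h])

# Anomaly: the current rate deviates from its 7-day average by more than 3 std devs
abs(rate(traffic[5m]) - avg_over_time(rate(traffic[5m])[7d:1h]))
  > 3 * stddev_over_time(rate(traffic[5m])[7d:1h])
```

### Cardinality (important)

```
The enemy of a time-series DB: high cardinality.
- Labels (host, path, status, user_id) → millions of user_ids = series explosion.
→ Never put things like user_id in metric labels. Put them in logs.
```

### Time-series storage comparison

```
Prometheus:       pull-based, Kubernetes-friendly, scaling limited to a single instance
VictoriaMetrics:  Prometheus-compatible, more resource-efficient
InfluxDB:         push-based, SQL-like query language
TimescaleDB:      built on Postgres, plain SQL
ClickHouse:       OLAP, copes well with high cardinality
Mimir / Cortex:   Prometheus with HA / multi-tenancy
```

### Window functions

```ts
// Rolling window: apply fn to the trailing `window` elements at each position
// (the first window - 1 outputs use a shorter, partial window)
function rolling<T, R>(data: T[], window: number, fn: (w: T[]) => R): R[] {
  const out: R[] = [];
  for (let i = 0; i < data.length; i++) {
    const w = data.slice(Math.max(0, i - window + 1), i + 1);
    out.push(fn(w));
  }
  return out;
}

// Nearest-rank quantile (a simple choice; interpolating definitions also exist)
function quantile(xs: number[], q: number): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.max(0, Math.ceil(q * sorted.length) - 1));
  return sorted[idx];
}

const p99 = rolling(values, 60, w => quantile(w, 0.99)); // values: number[] (your series)
```

### Gap-filling

```sql
-- TimescaleDB
SELECT time_bucket_gapfill('1 minute', ts) AS bucket,
       COALESCE(AVG(value), 0) AS avg_value  -- or locf(AVG(value)) / interpolate(AVG(value))
FROM metrics
WHERE ts > NOW() - INTERVAL '1 hour' AND ts < NOW()
GROUP BY bucket
ORDER BY bucket;
```

→ Produces a row even for empty buckets.

### Real-time anomaly detection (streaming)

```
Update an EWMA on each new point and check the deviation against a threshold.
Or: z-score over a small window (1-5 min).
Large systems: a separate process / Flink job.
```

## 🤔 Decision Criteria

| Task | Recommendation |
|---|---|
| Metric storage | Prometheus / VictoriaMetrics / TimescaleDB |
| High cardinality | ClickHouse |
| Forecast | Prophet (simple), ARIMA (classical statistics) |
| Anomaly detection | EWMA + z-score / MAD |
| Graph downsampling | LTTB |
| Aggregation | Continuous aggregates / window functions |
| Real-time | Flink / Materialize / Bytewax |

## ❌ Anti-patterns

- **Keeping all raw data forever**: storage explodes.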
  Downsample older data instead.
- **High-cardinality metrics (user_id)**: kill the TSDB.
- **Naive downsampling (every Nth)**: loses spikes. Use LTTB.
- **Z-score on non-Gaussian data**: false positives. Use MAD.
- **Ignoring seasonality**: normal weekday patterns get flagged as "anomalies".
- **No continuous aggregate**: every query scans raw data.
- **No gap filling**: graphs break.

## 🤖 Hints for LLM Use

- LTTB is the standard for graph downsampling.
- Continuous aggregates / pre-rollups are almost always worth it.
- Watch cardinality (the TSDB's enemy).
- Prophet for simple forecasts.

## 🔗 Related Documents

- [[DB_Time_Series_Patterns]]
- [[Observability_RED_USE_Metrics]]
- [[CS_Cache_Eviction]]