id: cs-time-series-algorithms
title: Time-Series Algorithms: downsample / detect / forecast
category: Coding
status: draft
source_trust_level: B
verification_status: conceptual
created_at: 2026-05-09
updated_at: 2026-05-09
tags: cs, time-series, vibe-coding
tech_stack: language: TS / Python; applicable_to: Backend, Data
applied_in: (none listed)
aliases: time-series, downsample, LTTB, anomaly detection, forecast, Prophet, ARIMA

Time-Series Algorithms

Metrics, IoT readings, and logs all carry a time dimension. The core tasks are downsampling (for graphs), aggregation (rollups), anomaly detection (alerting), and forecasting (capacity planning). Typical stores: TimescaleDB, VictoriaMetrics, Prometheus.

📖 Core concepts

  • A time series is one time dimension plus one or more values.
  • Samples are either equally spaced (regular sampling) or irregular.
  • Aggregation (sum / avg / p99) is computed over a window.
  • Storage: downsample older data (1s → 1min → 1hr).
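
To make the step from irregular samples to windowed aggregates concrete, here is a minimal sketch that groups samples into fixed-width buckets and averages each one. The function name `bucket_mean`, the sample data, and the use of integer seconds as timestamps are assumptions made for illustration.

```python
from collections import defaultdict


def bucket_mean(samples, width):
    """Group (timestamp, value) pairs into fixed-width buckets and average each.

    `samples` may be irregularly spaced; `width` is the bucket size in the
    same unit as the timestamps (seconds here, by assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in samples:
        start = (ts // width) * width  # start of the bucket this sample falls into
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Irregular timestamps in seconds (made-up data).
samples = [(0, 1.0), (7, 3.0), (59, 2.0), (61, 10.0), (185, 4.0)]
per_minute = bucket_mean(samples, 60)
```

Note that the bucket starting at 120 is absent from the result because no sample fell into it; buckets only exist where data exists.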

💻 Code patterns

Downsample (LTTB — Largest Triangle Three Buckets)

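// Goal: shrink e.g. 1M points to ~1,000 points for a UI graph.
// Naive "every Nth point" sampling loses spikes.
// LTTB keeps, per bucket, the most visually significant point: the one forming
// the largest triangle with the previously kept point and the next bucket's average.

type Point = { x: number; y: number };

function lttb(data: Point[], threshold: number): Point[] {
  // Fewer than 3 output points cannot form buckets; short series need no work.
  if (threshold < 3 || data.length <= threshold) return data;

  const bucketSize = (data.length - 2) / (threshold - 2);
  const out: Point[] = [data[0]]; // always keep the first point

  for (let i = 0; i < threshold - 2; i++) {
    // Current bucket covers indices [bucketStart, bucketEnd).
    const bucketStart = Math.floor(i * bucketSize) + 1;
    const bucketEnd = Math.floor((i + 1) * bucketSize) + 1;

    // Average of the next bucket [bucketEnd, nextEnd), clamped to the array.
    const nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, data.length);
    let avgX = 0;
    let avgY = 0;
    for (let j = bucketEnd; j < nextEnd; j++) {
      avgX += data[j].x;
      avgY += data[j].y;
    }
    avgX /= nextEnd - bucketEnd;
    avgY /= nextEnd - bucketEnd;

    // Pick the point in the current bucket with the largest triangle area
    // (the constant factor 1/2 is omitted; it does not change the argmax).
    const prev = out[out.length - 1];
    let maxArea = -1;
    let maxIdx = bucketStart;
    for (let j = bucketStart; j < bucketEnd; j++) {
      const area = Math.abs(
        (prev.x - avgX) * (data[j].y - prev.y)
        - (prev.x - data[j].x) * (avgY - prev.y)
      );
      if (area > maxArea) { maxArea = area; maxIdx = j; }
    }
    out.push(data[maxIdx]);
  }
  out.push(data[data.length - 1]); // always keep the last point
  return out;
}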

→ A large series shrinks to a small one that keeps its visual shape and its spikes.
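
The difference between naive sampling and the triangle criterion can be seen on a tiny series. The sketch below uses made-up data with a single spike; `triangle_area` is an illustrative helper, and the bucket boundaries are chosen by hand rather than computed as in a full LTTB pass.

```python
def triangle_area(a, b, c):
    """Area of the triangle spanned by three (x, y) points."""
    return abs((a[0] - c[0]) * (b[1] - a[1]) - (a[0] - b[0]) * (c[1] - a[1])) / 2


values = [0, 0, 0, 0, 0, 0, 0, 50, 0, 0, 0, 0]  # one spike at index 7

# Naive: every 5th value keeps indices 0, 5, 10 and never sees the spike.
naive = values[::5]

# Triangle criterion for one hand-picked bucket (indices 5..8):
previous = (4, values[4])   # point kept from the previous bucket
next_avg = (10.0, 0.0)      # average of the next bucket (indices 9..11)
candidates = [(x, values[x]) for x in range(5, 9)]
best = max(candidates, key=lambda p: triangle_area(previous, p, next_avg))
```

Every flat candidate yields a zero-area triangle, so the spike is the only point that can win its bucket.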

Time-bucketing (rollup)

-- TimescaleDB
SELECT
  time_bucket('1 minute', ts) AS bucket,
  AVG(value), MAX(value), MIN(value), COUNT(*)
FROM metrics
WHERE ts > NOW() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket;
-- Postgres native
SELECT
  date_trunc('minute', ts) AS bucket,
  AVG(value)
FROM metrics
GROUP BY bucket;

Continuous aggregate (TimescaleDB)

CREATE MATERIALIZED VIEW metrics_1min
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 minute', ts) AS bucket,
  AVG(value), COUNT(*)
FROM metrics
GROUP BY bucket;

-- Auto refresh
SELECT add_continuous_aggregate_policy('metrics_1min',
  start_offset => INTERVAL '1 hour',
  end_offset   => INTERVAL '1 minute',
  schedule_interval => INTERVAL '1 minute'
);

→ Pre-aggregated: faster queries and less storage.
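
One detail matters when rolling a pre-aggregate up further (1min → 1hr): an average of averages is wrong unless every bucket holds the same number of samples, which is why the view above keeps COUNT(*) next to AVG(value). A minimal sketch of a mergeable bucket summary follows; the `Rollup` class is hypothetical and not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Rollup:
    """Mergeable summary of one bucket: enough state to re-aggregate later."""
    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value

    def merge(self, other: "Rollup") -> "Rollup":
        # Sums and counts combine exactly; means do not.
        return Rollup(self.count + other.count, self.total + other.total)

    @property
    def mean(self) -> float:
        return self.total / self.count


minute_a, minute_b = Rollup(), Rollup()
for v in [1.0, 1.0, 1.0, 1.0]:
    minute_a.add(v)
minute_b.add(9.0)

merged = minute_a.merge(minute_b)
mean_of_means = (minute_a.mean + minute_b.mean) / 2  # ignores bucket sizes
```

Here the merged mean is 13 / 5, while the mean of the two bucket means is 5, because the single-sample bucket is given the same weight as the four-sample one.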

Retention / hot-cold

-- Drop raw 1-second data after 7 days (only the 1-minute rollup remains)
SELECT add_retention_policy('metrics', INTERVAL '7 days');

-- Or compress instead (TimescaleDB)
ALTER TABLE metrics SET (timescaledb.compress);
SELECT add_compression_policy('metrics', INTERVAL '1 day');

Moving average

function sma(data: number[], window: number): number[] {
  const out: number[] = [];
  let sum = 0;
  for (let i = 0; i < data.length; i++) {
    sum += data[i];
    if (i >= window) sum -= data[i - window]; // drop the value leaving the window
    if (i >= window - 1) out.push(sum / window); // emit once the window is full
  }
  return out;
}

// EWMA (exponentially weighted moving average); alpha in (0, 1], higher = more reactive
function ewma(data: number[], alpha: number): number[] {
  if (data.length === 0) return [];
  const out = [data[0]]; // seed with the first value
  for (let i = 1; i < data.length; i++) {
    out.push(alpha * data[i] + (1 - alpha) * out[i - 1]);
  }
  return out;
}

→ Both smooth the series; EWMA weights recent values more heavily.

Anomaly detection (simple)

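// Z-score (assumes roughly normally distributed values)
function zScoreAnomalies(data: number[], threshold = 3) {
  const mean = data.reduce((a, b) => a + b, 0) / data.length;
  const variance = data.reduce((a, b) => a + (b - mean) ** 2, 0) / data.length;
  const std = Math.sqrt(variance);
  // std === 0 means a constant series: nothing is anomalous.
  return data.map((v, i) => ({ i, v, isAnomaly: std > 0 && Math.abs((v - mean) / std) > threshold }));
}

// Robust alternative: median + MAD (Median Absolute Deviation)
function median(values: number[]): number {
  // Array.prototype.sort compares as strings by default, so pass a numeric comparator.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function madAnomalies(data: number[], threshold = 3.5) {
  const med = median(data);
  const mad = median(data.map(v => Math.abs(v - med)));
  // 1.4826 scales MAD so it is comparable to a standard deviation under normality.
  return data.map((v, i) => ({ i, v, isAnomaly: mad > 0 && Math.abs(v - med) / (mad * 1.4826) > threshold }));
}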

→ The z-score is itself distorted by outliers, because they inflate the mean and the standard deviation. MAD is robust.
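
The weakness of the z-score shows up clearly on a short window: a single extreme value inflates the standard deviation so much that it masks itself. The sketch below uses made-up data and Python's `statistics` module; it assumes a non-constant series (no guard for a zero standard deviation or zero MAD).

```python
import statistics


def z_scores(values):
    """Classic z-score using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def modified_z_scores(values):
    """Median/MAD-based score; 1.4826 rescales MAD to a std-like unit."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [(v - med) / (1.4826 * mad) for v in values]


data = [10, 11, 9, 10, 12, 10, 11, 1000]  # the last value is an obvious outlier

z = z_scores(data)
mz = modified_z_scores(data)

flagged_by_z = [v for v, score in zip(data, z) if abs(score) > 3]
flagged_by_mad = [v for v, score in zip(data, mz) if abs(score) > 3.5]
```

With 8 points the largest possible z-score of a single outlier is about 2.65, so the threshold of 3 is never reached, while the median-based score flags only the value 1000.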

Seasonality (day of week / hour of day)

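# Python (pandas)
import pandas as pd

# `times` must be datetime-like so that .hour / .dayofweek are available
s = pd.Series(values, index=pd.DatetimeIndex(times))

# Hour-of-day pattern
hourly = s.groupby(s.index.hour).mean()

# Day-of-week pattern (0 = Monday)
dow = s.groupby(s.index.dayofweek).mean()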

STL decomposition

from statsmodels.tsa.seasonal import STL

stl = STL(s, period=24).fit()  # hourly data with a daily cycle
trend = stl.trend
seasonal = stl.seasonal
residual = stl.resid

# Anomaly = large residual
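
The idea of judging the residual instead of the raw value can be sketched without any library. The example below is a crude stand-in, not STL: it only subtracts the mean of the same hour of day, with no trend component, and the function name and data are made up for illustration.

```python
from collections import defaultdict
import statistics


def hourly_residuals(samples):
    """samples: list of (hour_of_day, value). Returns value minus that hour's mean."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    baseline = {hour: statistics.fmean(vals) for hour, vals in by_hour.items()}
    return [value - baseline[hour] for hour, value in samples]


# Three days, two readings per day: quiet at 03:00, busy at 12:00.
# On day three the 03:00 reading is 40, unusual for that hour but below a normal noon.
samples = [(3, 10), (12, 100), (3, 10), (12, 100), (3, 40), (12, 100)]
residuals = hourly_residuals(samples)
largest = max(range(len(samples)), key=lambda i: abs(residuals[i]))
```

A fixed threshold on the raw values would either miss the 40 or flag every noon reading; relative to its own hour, the 40 has the largest residual.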

Forecast: Prophet (simple)

import pandas as pd
from prophet import Prophet

df = pd.DataFrame({'ds': times, 'y': values})
m = Prophet(yearly_seasonality=True, daily_seasonality=True).fit(df)
future = m.make_future_dataframe(periods=24, freq='H')
forecast = m.predict(future)

→ A library from Facebook (Meta). It models weekly and yearly seasonality automatically and supports holiday effects.

ARIMA (classical)

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(s, order=(1, 1, 1)).fit()
forecast = model.forecast(24)

→ Requires tuning p, d, and q. Prophet is simpler to get started with.
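
Of the three orders, d is the easiest to build intuition for: it is the number of times the series is differenced before the AR and MA parts are fitted. The sketch below shows only that differencing step on made-up data; the helper names are illustrative and this is not a fitted ARIMA model.

```python
def difference(values):
    """First difference: what d=1 applies before fitting the AR/MA terms."""
    return [b - a for a, b in zip(values, values[1:])]


def undifference(first, diffs):
    """Invert the first difference by cumulative summation."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out


trend = [100, 103, 106, 109, 112]   # steadily growing series
diffs = difference(trend)           # constant, so the trend is gone
restored = undifference(trend[0], diffs)

# Naive one-step forecast: last value plus the mean difference
# (a random walk with drift, used here only to show the integration step).
next_value = trend[-1] + sum(diffs) / len(diffs)
```

If the differenced series still trends, a second difference (d=2) is the usual next thing to try.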

Holt-Winters (smoothing)

from statsmodels.tsa.holtwinters import ExponentialSmoothing

m = ExponentialSmoothing(s, seasonal_periods=24, trend='add', seasonal='add').fit()
forecast = m.forecast(24)

Prometheus PromQL

# 5-minute rate
rate(http_requests_total[5m])

# Quantile (aggregate the buckets by le before computing it)
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_bucket[5m])))

# 1-hour average
avg_over_time(cpu_usage[1h])

# Anomaly: current value deviates from the 7-day mean by more than 3 standard deviations
abs(rate(traffic[5m]) - avg_over_time(rate(traffic[5m])[7d:1h]))
  > 3 * stddev_over_time(rate(traffic[5m])[7d:1h])
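
To show what a counter rate means, here is a simplified sketch of the per-second increase over a window, including counter resets. It is an approximation for intuition only: Prometheus's actual `rate()` also extrapolates to the window boundaries, which is omitted here, and `simple_rate` is a hypothetical name.

```python
def simple_rate(samples):
    """Per-second increase of a counter over the span covered by `samples`.

    `samples` is a time-ordered list of (timestamp_seconds, counter_value).
    A drop in value is treated as a counter reset (the counter restarted from 0).
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Counter scraped every 60 s; the process restarts between the 2nd and 3rd scrape.
scrapes = [(0, 100), (60, 160), (120, 20), (180, 80)]
per_second = simple_rate(scrapes)
```

The increases are 60, then 20 after the reset, then 60 again, for a total of 140 over 180 seconds.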

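Cardinality (important)

The enemy of a time-series DB: high cardinality.
- Labels (host, path, status, user_id): with millions of user_id values, the number of series explodes.

→ Do not put values like user_id into metric labels. Put them in logs.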
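
The explosion is simple multiplication: the upper bound on the number of series is the product of the distinct values per label. The label counts below are made-up numbers for illustration, and in practice not every combination occurs.

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on distinct series: product of distinct values per label."""
    return prod(label_cardinalities.values())


without_user = max_series({"host": 50, "path": 200, "status": 5})
with_user = max_series({"host": 50, "path": 200, "status": 5, "user_id": 1_000_000})
```

Under these assumed counts the bound goes from 50,000 series to 50 billion once user_id becomes a label.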

Time-series storage comparison

Prometheus:       pull-based, Kubernetes-friendly, limited by single-instance scaling
VictoriaMetrics:  Prometheus-compatible, more resource-efficient
InfluxDB:         push-based, SQL-like query language
TimescaleDB:      built on Postgres, SQL
ClickHouse:       OLAP, handles high cardinality
Mimir / Cortex:   HA / multi-tenant Prometheus

Window functions

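// Rolling window: applies fn to the trailing `window` elements at each index
// (the first window - 1 results use a shorter, partial window).
function rolling<T, R>(data: T[], window: number, fn: (w: T[]) => R): R[] {
  const out: R[] = [];
  for (let i = 0; i < data.length; i++) {
    const w = data.slice(Math.max(0, i - window + 1), i + 1);
    out.push(fn(w));
  }
  return out;
}

// Nearest-rank quantile; an illustrative helper, not a library function.
function quantile(w: number[], q: number): number {
  const sorted = [...w].sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil(q * sorted.length) - 1)];
}

// `values` is assumed to be a number[] defined elsewhere.
const p99 = rolling(values, 60, w => quantile(w, 0.99));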

Gap-filling

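-- TimescaleDB
-- time_bucket_gapfill needs both a lower and an upper time bound in WHERE.
SELECT
  time_bucket_gapfill('1 minute', ts) AS bucket,
  COALESCE(AVG(value), 0) AS avg_value
FROM metrics
WHERE ts > NOW() - INTERVAL '1 hour' AND ts < NOW()
GROUP BY bucket
ORDER BY bucket;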

→ Empty buckets also produce a row.
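
The same idea can be sketched application-side. The example below fills a fixed grid of buckets either with a constant or by carrying the last observation forward; the function names and data are assumptions for illustration. Filling with 0 suits counts, while a gauge such as temperature usually reads better with the carried-forward variant.

```python
def gapfill(buckets, start, end, step, fill=0.0):
    """One (bucket_start, value) row per step in [start, end); missing buckets get `fill`."""
    return [(t, buckets.get(t, fill)) for t in range(start, end, step)]


def gapfill_locf(buckets, start, end, step):
    """Same grid, but missing buckets repeat the last observed value (None before the first)."""
    rows, last = [], None
    for t in range(start, end, step):
        last = buckets.get(t, last)
        rows.append((t, last))
    return rows


observed = {0: 2.0, 60: 10.0, 180: 4.0}   # the bucket at 120 s has no data
zero_filled = gapfill(observed, 0, 240, 60)
carried = gapfill_locf(observed, 0, 240, 60)
```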

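Real-time anomaly (streaming)

Update an EWMA incrementally and check each new value against a threshold.
Alternatively, compute a z-score over a small window (1-5 min).

Large systems: run it as a separate process or a Flink job.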
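
A minimal sketch of the EWMA approach, keeping only constant state per series. The class name, the default parameters, the warm-up period, and the choice to exclude flagged points from the running state are all assumptions made for illustration, not a reference implementation.

```python
import math


class EwmaDetector:
    """Streaming detector: exponentially weighted mean and variance, O(1) state."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=10):
        self.alpha = alpha
        self.threshold = threshold
        self.warmup = warmup      # number of points to observe before flagging
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def update(self, value):
        """Return True if `value` deviates strongly from the state built so far."""
        self.seen += 1
        if self.mean is None:
            self.mean = value     # seed with the first observation
            return False
        deviation = value - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (
            self.seen > self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )
        if not is_anomaly:
            # Only normal points move the baseline, so a spike does not widen it.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return is_anomaly


detector = EwmaDetector()
stream = [10, 11] * 10 + [50]
flags = [detector.update(v) for v in stream]
```

Because flagged points are skipped, a genuine level shift would keep alerting until the state is reset; updating the state on every point is the alternative trade-off.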

🤔 Decision criteria

Task              Recommendation
Metric storage    Prom / VictoriaMetrics / Timescale
High cardinality  ClickHouse
Forecast          Prophet (simple), ARIMA (math)
Anomaly           EWMA + z-score / MAD
Graph downsample  LTTB
Aggregate         Continuous aggregate / window
Real-time         Flink / Materialize / Bytewax

Anti-patterns

  • Keeping all raw data forever: storage explodes. Downsample.
  • High-cardinality metrics (user_id): kills the TSDB.
  • Naive downsampling (every Nth point): loses spikes. Use LTTB.
  • Z-score on non-Gaussian data: false positives. Use MAD.
  • Ignoring seasonality: a normal day-of-week pattern gets flagged as an "anomaly".
  • No continuous aggregate: every query scans raw data.
  • No gap filling: graphs break.

🤖 Hints for LLM use

  • LTTB is the standard choice for graph downsampling.
  • Use continuous aggregates / pre-rollups almost always.
  • Watch cardinality (the enemy of a TSDB).
  • Prophet is the simple option for forecasting.

🔗 Related documents