2nd/10_Wiki/Topics/Computer_Science_and_Theory/Standard-Deviation-and-Variance.md
2026-05-10 22:08:15 +09:00


id: wiki-2026-0508-standard-deviation-and-variance
title: Standard Deviation and Variance
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: SD, Variance, Sigma
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: statistics, variance, dispersion, foundations
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: python; framework: numpy-scipy

Standard Deviation and Variance

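One-line summary

"Spread = √E[(X − μ)²]." Variance is the expected squared deviation from the mean; the standard deviation (SD) is its square root, which expresses spread in the same units as X. Karl Pearson coined the term "standard deviation" in 1894. It is the most fundamental dispersion measure in statistics.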

Core concepts

Definitions

  • Variance: σ² = E[(X − μ)²] = E[X²] − (E[X])².
  • Standard deviation: σ = √Var(X), in the same units as X.
  • Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1). Dividing by n − 1 (Bessel's correction) makes s² an unbiased estimator of σ².
  • Population variance: σ² = Σ(xᵢ − μ)² / N.
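The definitions above can be checked numerically. The following is a minimal sketch using only the Python standard library; the data values and variable names are illustrative assumptions, not part of the original note.

```python
import math
import statistics

# Illustrative data (an assumption for this example).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mu = sum(data) / n  # mean of the data

# Population variance from the definition: mean squared deviation.
pop_var = sum((x - mu) ** 2 for x in data) / n

# Equivalent shortcut form: E[X^2] - (E[X])^2.
shortcut_var = sum(x * x for x in data) / n - mu ** 2

# Sample variance: same sum of squares, divided by n - 1 (Bessel's correction).
sample_var = sum((x - mu) ** 2 for x in data) / (n - 1)

# The standard deviation is the square root of the variance.
pop_sd = math.sqrt(pop_var)

# Cross-check the hand-written formulas against the standard library.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))
```

The shortcut form agrees with the definition here because the numbers are small; for data with a large mean and small variance it can lose precision, which is why the definition-based (two-pass) form is generally preferred in floating point.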

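Properties

  • Var(aX + b) = a²·Var(X): invariant to shifts, and scales with the square of a.
  • Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y): if X and Y are independent, Cov(X, Y) = 0 and the variances simply add.
  • SD is not robust: outliers have a large influence on it. Robust alternatives: MAD (Median Absolute Deviation), IQR.
  • Chebyshev: P(|X−μ| ≥ kσ) ≤ 1/k² for any k > 0. The bound is distribution-free; it only requires a finite variance.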
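The properties listed above can be verified empirically. This is a rough sketch, assuming synthetic Gaussian data and a hypothetical helper `pcov` for the population covariance; none of these names come from the original note.

```python
import random
import statistics

random.seed(0)

# Synthetic, correlated data (an assumption for illustration).
xs = [random.gauss(10, 3) for _ in range(1000)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]


def pcov(a, b):
    """Population covariance of two equal-length sequences (hypothetical helper)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


# Var(aX + b) = a^2 * Var(X): the shift b drops out.
a, b = 2.5, 7.0
var_x = statistics.pvariance(xs)
var_scaled = statistics.pvariance([a * x + b for x in xs])

# Var(X + Y) = Var(X) + Var(Y) + 2 * Cov(X, Y).
var_sum = statistics.pvariance([x + y for x, y in zip(xs, ys)])
var_sum_rhs = var_x + statistics.pvariance(ys) + 2 * pcov(xs, ys)

# Chebyshev: the fraction of points at least k SDs from the mean is at most 1/k^2.
mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
k = 2
tail_fraction = sum(abs(x - mu) >= k * sigma for x in xs) / len(xs)
```

For Gaussian data the observed tail fraction is far below the Chebyshev bound of 0.25; the bound is loose precisely because it makes no distributional assumption.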

Applications

  1. Risk (finance): volatility = SD of returns.
  2. Quality control: Six Sigma (process σ).
  3. Z-score normalization: (x − μ) / σ.
  4. Confidence interval for the mean: x̄ ± z·(σ/√n).
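Two of the applications above, z-scores and a confidence interval for the mean, can be sketched with the standard library. The sample values are illustrative assumptions, and the sample SD stands in for the unknown σ; with a sample this small a t-quantile would normally be preferred over the normal quantile used here.

```python
import math
import statistics

# Illustrative sample (an assumption for this example).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)  # sample SD (n - 1 denominator)

# Z-score normalization: (x - mean) / sd.
z_scores = [(x - mean) / sd for x in sample]

# 95% confidence interval for the mean using the normal quantile z ~ 1.96.
z = statistics.NormalDist().inv_cdf(0.975)
standard_error = sd / math.sqrt(n)
ci_low = mean - z * standard_error
ci_high = mean + z * standard_error
```

After normalization the z-scores have mean 0 and sample SD 1, which is what makes features measured on different scales comparable.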

💻 Patterns

1. NumPy — sample vs population

import numpy as np
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])
np.var(x)            # population variance (ddof=0) → 4.0
np.var(x, ddof=1)    # sample variance (Bessel's correction) → 4.571
np.std(x, ddof=1)    # sample SD → 2.138

2. Welford's online algorithm

# Streaming, numerically stable, single-pass
def welford_update(state, x):
    n, mean, M2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    M2 += delta * (x - mean)
    return n, mean, M2

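stream = [2, 4, 4, 4, 5, 5, 7, 9]  # example data; any iterable of numbers works
n, mean, M2 = 0, 0.0, 0.0
for x in stream:
    n, mean, M2 = welford_update((n, mean, M2), x)
var = M2 / (n - 1) if n > 1 else float("nan")  # sample variance (ddof=1)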
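One way to package the Welford update for reuse is a small accumulator class. This is a sketch only: the class name `RunningStats` and its methods are hypothetical, and the result is cross-checked against the standard library rather than taken on trust.

```python
import math
import statistics


class RunningStats:
    """Single-pass mean/variance accumulator using Welford's update (hypothetical wrapper)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self, ddof=1):
        # Undefined until there are more observations than ddof.
        if self.n - ddof <= 0:
            return math.nan
        return self.m2 / (self.n - ddof)


readings = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
stats = RunningStats()
for value in readings:
    stats.push(value)

sample_var = stats.variance()         # n - 1 denominator
pop_var = stats.variance(ddof=0)      # n denominator
```

Returning NaN for too few observations avoids a division-by-zero error when the stream is empty or has a single element.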

3. Z-score normalization

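import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # example data: rows = samples, columns = features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # column-wise (x − μ) / σ, where σ is the population SD (ddof=0)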

4. Rolling SD (pandas)

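import numpy as np
import pandas as pd

# prices: a sequence of daily closing prices, assumed to be defined elsewhere
returns = pd.Series(prices).pct_change()
volatility_30d = returns.rolling(30).std() * np.sqrt(252)  # annualized, assuming 252 trading days per year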

5. Pooled variance (two groups)

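import numpy as np

def pooled_var(x1, x2):
    """Pooled sample variance of two groups, weighted by degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)  # per-group sample variances
    return ((n1-1)*s1 + (n2-1)*s2) / (n1 + n2 - 2)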

6. Bias-variance decomposition

# E[(y - ŷ)²] = Bias² + Variance + σ²_noise
# Central to ML model selection
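The decomposition can be illustrated by simulation. The sketch below assumes a deliberately biased estimator (a shrunken sample mean) predicting a fixed true value under Gaussian noise; every constant and name in it is an illustrative assumption rather than something taken from the note.

```python
import random
import statistics

random.seed(42)

TRUE_VALUE = 3.0   # quantity being predicted (assumed)
NOISE_SD = 1.0     # SD of the irreducible noise (assumed)
N_TRAIN = 10       # training points per trial
SHRINK = 0.8       # shrinkage factor that introduces bias
TRIALS = 20_000

predictions = []
squared_errors = []
for _ in range(TRIALS):
    # Draw a fresh training set and fit the shrunken-mean estimator.
    train = [TRUE_VALUE + random.gauss(0, NOISE_SD) for _ in range(N_TRAIN)]
    y_hat = SHRINK * statistics.fmean(train)
    # Evaluate on one fresh noisy observation.
    y_new = TRUE_VALUE + random.gauss(0, NOISE_SD)
    predictions.append(y_hat)
    squared_errors.append((y_new - y_hat) ** 2)

bias_sq = (statistics.fmean(predictions) - TRUE_VALUE) ** 2
variance = statistics.pvariance(predictions)
noise_var = NOISE_SD ** 2

mse = statistics.fmean(squared_errors)
decomposed = bias_sq + variance + noise_var
```

Under these assumptions the theoretical values are Bias² = 0.36, Variance = 0.064, and noise = 1.0, so both `mse` and `decomposed` should land near 1.424 up to simulation error.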

7. Robust alternatives — MAD

from scipy.stats import median_abs_deviation
mad = median_abs_deviation(x, scale="normal")  # ~1.4826 * raw MAD ≈ σ for Gaussian
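To see why MAD is considered robust, compare how SD and raw MAD react to a single outlier. This is a minimal standard-library sketch; the helper `mad` and the data are illustrative assumptions, and the raw MAD here is not rescaled by 1.4826.

```python
import statistics


def mad(values):
    """Raw median absolute deviation, without Gaussian scaling (hypothetical helper)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


clean = [10, 11, 12, 13, 14, 15, 16]  # illustrative data
dirty = clean + [1000]                # same data plus one extreme outlier

sd_clean = statistics.stdev(clean)
sd_dirty = statistics.stdev(dirty)    # inflated by the outlier

mad_clean = mad(clean)
mad_dirty = mad(dirty)                # essentially unaffected
```

A single extreme value inflates the SD by more than two orders of magnitude here, while the MAD does not move.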

8. Numerically safe two-pass variance

# Naive E[X²] − (E[X])² risks catastrophic cancellation when the mean is large and the variance is small
# Two-pass: compute μ first, then Σ(x − μ)²
n = len(x)
mean = x.sum() / n
var = ((x - mean) ** 2).sum() / (n - 1)
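The cancellation problem is easy to reproduce in plain Python floats. The sketch below uses a small data set shifted by 1e9; the function names and data are illustrative assumptions.

```python
def naive_variance(values):
    """Sample variance via sum of squares minus squared sum (numerically fragile)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(values):
    """Sample variance via mean first, then squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


small = [4.0, 7.0, 13.0, 16.0]        # true sample variance is 30
shifted = [1e9 + v for v in small]    # same spread, very large mean

naive_small = naive_variance(small)        # fine when the mean is small
naive_shifted = naive_variance(shifted)    # loses all significant digits
stable_shifted = two_pass_variance(shifted)
```

Shifting the data does not change its variance, so both functions should return 30 for `shifted`. The naive form subtracts two numbers near 4e18, where adjacent doubles are hundreds apart, so its result cannot be close to 30 and may even be negative.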

Decision criteria

| Situation | Approach |
| --- | --- |
| Sample (inference) | ddof=1 (divide by n − 1) |
| Population (descriptive) | ddof=0 |
| Streaming / online | Welford |
| Outlier-heavy data | MAD, IQR (not SD) |
| Heavy-tailed distribution | Quantile-based (P95/P5) |
| Volatility (finance) | Rolling SD × √(periods/year) |

Default: np.std(x, ddof=1), the sample SD. It is built on the unbiased variance estimator s²; note that s itself is still a slightly biased (low) estimator of σ.

🔗 Graph

🤖 LLM usage

When to use: explaining spread to a non-technical audience; choosing an appropriate measure for a given distribution. When not to use: numerical computation; use NumPy for that.

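Anti-patterns

  • ddof confusion: NumPy defaults to ddof=0 (population); pass ddof=1 explicitly for inference.
  • SD on non-numeric / ordinal data: SD has no meaning on an ordinal scale.
  • Reporting SD without n: readers cannot derive the standard error SE = σ/√n, which is more informative about the precision of the mean.
  • Catastrophic cancellation: naive Σx² − (Σx)²/n loses precision; use Welford or the two-pass method.
  • Assuming Gaussian-like data: for power-law (heavy-tailed) data the SD is unstable; use quantiles instead.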

🧪 Verification / duplicates

  • Verified (Knuth TAOCP vol 2, Welford 1962, NIST/SEMATECH stats handbook).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: variance fundamentals, Welford, ddof, robust alternatives. |