---
id: wiki-2026-0508-standard-deviation-and-variance
title: Standard Deviation and Variance
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [SD, Variance, Sigma]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [statistics, variance, dispersion, foundations]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: numpy-scipy
---

# Standard Deviation and Variance

## One-line summary

> **"Spread = the square root of E[(X−μ)²]."** Variance is the expected squared deviation; SD is the spread expressed in the same units as the data. Karl Pearson introduced the term "standard deviation" in 1894. It is the most fundamental dispersion measure in statistics.

## Core concepts

### Definitions

- **Variance**: σ² = E[(X − μ)²] = E[X²] − (E[X])².
- **Standard deviation**: σ = √Var(X), in the same unit as X.
- **Sample variance**: s² = Σ(xᵢ − x̄)² / (n − 1). Bessel's correction (dividing by n − 1) makes s² an unbiased estimator of σ².
- **Population variance**: σ² = Σ(xᵢ − μ)² / N.

### Properties

- **Var(aX + b) = a²·Var(X)**: invariant to shifts, scales with the square of a.
- **Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y)**: if X and Y are independent, Cov(X, Y) = 0.
- **SD is not robust**: outliers have a large influence on it. Robust alternatives: MAD (Median Absolute Deviation), IQR.
- **Chebyshev**: P(|X − μ| ≥ kσ) ≤ 1/k² for any k > 0. Distribution-free; it only requires a finite variance.

### Applications

1. Risk (finance): volatility = SD of returns.
2. Quality control: Six Sigma (process σ).
3. Z-score normalization: (x − μ) / σ.
4. Confidence interval for the mean: x̄ ± z·(σ/√n).

## 💻 Patterns

### 1. NumPy: sample vs population

```python
import numpy as np
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])
np.var(x)          # population variance (ddof=0) → 4.0
np.var(x, ddof=1)  # sample variance (Bessel) → 4.571
np.std(x, ddof=1)  # sample SD → 2.138
```

### 2. Welford's online algorithm

```python
# Streaming, numerically stable, single-pass
def welford_update(state, x):
    n, mean, M2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    M2 += delta * (x - mean)  # second factor uses the already-updated mean
    return n, mean, M2

stream = [2, 4, 4, 4, 5, 5, 7, 9]  # placeholder; any iterable of numbers
n, mean, M2 = 0, 0.0, 0.0
for x in stream:
    n, mean, M2 = welford_update((n, mean, M2), x)
var = M2 / (n - 1)  # sample variance; requires n >= 2
```

### 3. Z-score normalization

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])  # placeholder feature matrix
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # column-wise (x − μ) / σ; σ is the population SD (ddof=0)
```

### 4. Rolling SD (pandas)

```python
import numpy as np
import pandas as pd
prices = 100 * np.cumprod(1 + np.random.default_rng(0).normal(0, 0.01, 300))  # synthetic placeholder for daily closes
returns = pd.Series(prices).pct_change()
volatility_30d = returns.rolling(30).std() * np.sqrt(252)  # annualized, assuming 252 trading days per year
```

### 5. Pooled variance (two groups)

```python
import numpy as np

def pooled_var(x1, x2):
    n1, n2 = len(x1), len(x2)
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)  # sample variances, not SDs
    return ((n1-1)*s1 + (n2-1)*s2) / (n1 + n2 - 2)
```

### 6. Bias-variance decomposition

```python
# E[(y - ŷ)²] = Bias² + Variance + σ²_noise
# Central to ML model selection
```

### 7. Robust alternatives: MAD

```python
import numpy as np
from scipy.stats import median_abs_deviation
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])
mad = median_abs_deviation(x, scale="normal")  # ~1.4826 * raw MAD ≈ σ for Gaussian data
```

### 8. Two-pass safe variance (numerically)

```python
import numpy as np
x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
n = len(x)
# The naive E[X²] - E[X]² form risks catastrophic cancellation (large mean, small variance).
# Two-pass: compute μ first, then Σ(x-μ)².
mean = x.sum() / n
var = ((x - mean) ** 2).sum() / (n - 1)
```

## Decision criteria

| Situation | Approach |
|---|---|
| Sample (inference) | ddof=1 (divide by n−1) |
| Population (descriptive) | ddof=0 |
| Streaming / online | Welford |
| Outlier-heavy data | MAD, IQR (not SD) |
| Heavy-tailed distribution | Quantile-based (P95/P5) |
| Volatility (finance) | Rolling SD × √(periods/year) |

**Default**: `np.std(x, ddof=1)`, the sample SD. Its square s² is an unbiased estimator of σ²; s itself slightly underestimates σ on average.
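
The numerical-stability concern behind the Welford and two-pass recommendations can be checked with a small experiment. The sketch below compares the naive one-pass formula with the two-pass and Welford approaches on data that has a large mean and a small variance. The function names (`naive_var`, `two_pass_var`, `welford_var`) and the sample values are illustrative assumptions, not part of any library.

```python
import numpy as np

def naive_var(x):
    # One-pass textbook form: (Σx² − (Σx)²/n) / (n − 1). Prone to cancellation.
    n = len(x)
    return (np.sum(x * x) - np.sum(x) ** 2 / n) / (n - 1)

def two_pass_var(x):
    # First pass computes the mean, second pass sums squared deviations.
    n = len(x)
    mean = np.sum(x) / n
    return np.sum((x - mean) ** 2) / (n - 1)

def welford_var(x):
    # Single-pass running update of the mean and the sum of squared deviations.
    n, mean, m2 = 0, 0.0, 0.0
    for v in x:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    return m2 / (n - 1)

# Illustrative data (an assumption): a small spread on top of a huge offset.
base = np.array([4.0, 7.0, 13.0, 16.0])  # sample variance is 30
shifted = base + 1e9                     # same spread, mean near 1e9

print(two_pass_var(base))     # → 30.0
print(two_pass_var(shifted))  # → 30.0
print(welford_var(shifted))   # → 30.0
print(naive_var(shifted))     # far from 30 in float64
```

Shifting the data does not change the true variance, so every method should return 30. The naive formula fails here because the squared values are near 10¹⁸, where float64 cannot resolve the difference of 90 that the numerator needs.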

## 🔗 Graph

- Parents: [[Statistics]] · [[Probability Theory]]
- Variants: [[Mutual-Information]] · [[Information-Entropy]]
- Applications: [[Regression-Analysis-Foundations]] · [[Statistical-Power]]
- Adjacent: [[Sampling-Techniques]] · [[Multivariate-Analysis]]

## 🤖 LLM usage

**When to use**: explaining spread to a non-technical audience, choosing an appropriate measure for a given distribution.
**When not to use**: numerical computation. Use `numpy` for that.

## ❌ Anti-patterns

- **ddof confusion**: NumPy defaults to `ddof=0` (population). Specify `ddof=1` explicitly for inference.
- **SD on non-numeric / ordinal data**: SD has no meaning on an ordinal scale.
- **Reporting SD without n**: without n, readers cannot derive the standard error SE = σ/√n, which is more informative about the precision of the mean.
- **Catastrophic cancellation**: the naive Σx² − (Σx)²/n form loses precision. Use Welford or the two-pass method.
- **SD assumes Gaussian-like data**: for power-law data the SD is unstable. Use quantiles instead.

## 🧪 Verification / Duplicates

- Verified (Knuth TAOCP vol 2, Welford 1962, NIST/SEMATECH stats handbook).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: variance fundamentals, Welford, ddof, robust alternatives. |