---
id: wiki-2026-0508-inferential-statistics
title: Inferential Statistics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Statistical Inference, Hypothesis Testing, Confidence Intervals]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [statistics, inference, hypothesis-testing, ab-testing, sre]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scipy
---

# Inferential Statistics

## One-line summary

> **"Estimate a population parameter from a sample and quantify the uncertainty."** The frequentist framework of Fisher, Neyman, and Pearson dates to the early 20th century. As of 2026 it is the backbone of A/B testing, SRE alerting, and ML evaluation, with a modern hybrid of Bayesian methods and the bootstrap as the default.

## Core concepts

### Frequentist vs Bayesian

- **Frequentist**: the parameter is fixed and the data are random. Tools: p-value, confidence interval (CI).
- **Bayesian**: the parameter is random (it has a prior) and the observed data are fixed. Tools: posterior, credible interval.
- **Bootstrap**: distribution-free; simulates the sampling distribution by resampling the observed data with replacement many times.

### Test taxonomy

- **Parametric**: t-test, ANOVA, Z-test (assume normality).
- **Non-parametric**: Mann-Whitney U, Kruskal-Wallis, permutation tests.
- **Sequential**: Always Valid Inference, mSPRT (safe under peeking).

### Applications

1. A/B testing: measuring conversion lift.
2. SRE: statistical significance of an SLO breach.
3. ML: comparing model A vs model B on a holdout set.
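
The first application above (conversion lift in a fixed-N A/B test) can be sketched with a chi-squared test of independence on a 2x2 table. This is a minimal illustration, not a prescribed method: the helper name `conversion_lift_test` and the conversion counts are assumptions chosen for the example, and `correction=False` (no Yates continuity correction) is one reasonable choice for large samples rather than a requirement.

```python
import numpy as np
from scipy.stats import chi2_contingency


def conversion_lift_test(conv_a, n_a, conv_b, n_b):
    """Chi-squared test of independence on a 2x2 conversion table.

    Hypothetical helper: returns (absolute lift, chi2 statistic, p-value).
    """
    # Rows are arms (A, B); columns are (converted, not converted).
    table = np.array([
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ])
    # correction=False turns off the Yates continuity correction.
    chi2, p, _dof, _expected = chi2_contingency(table, correction=False)
    lift = conv_b / n_b - conv_a / n_a
    return lift, chi2, p


# Illustrative counts (assumed): 520/10,000 vs 580/10,000 conversions.
lift, chi2, p_value = conversion_lift_test(520, 10_000, 580, 10_000)
print(f"absolute lift={lift:.4f} chi2={chi2:.3f} p={p_value:.4f}")
```

Reporting the lift next to the p-value keeps the effect size visible, so a significant result on a very large sample is not mistaken for a large effect.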

## 💻 Patterns

### Two-sample t-test

```python
import scipy.stats as st

control = [12, 14, 11, 13, 12, 15, 13]
treat = [16, 18, 15, 17, 19, 16, 18]

# equal_var=False selects Welch's t-test (no equal-variance assumption)
res = st.ttest_ind(control, treat, equal_var=False)
print(f"t={res.statistic:.3f} p={res.pvalue:.4f}")

# CI for the difference in means (control - treat); needs a recent SciPy release
ci = res.confidence_interval(0.95)
print(f"95% CI: [{ci.low:.2f}, {ci.high:.2f}]")
```

### Bootstrap CI

```python
import numpy as np

def bootstrap_mean_ci(x, n=10_000, alpha=0.05):
    """Percentile bootstrap CI for the mean, using n resamples."""
    rng = np.random.default_rng(42)
    # Each row is one resample of len(x) points drawn with replacement
    boots = rng.choice(x, size=(n, len(x)), replace=True).mean(axis=1)
    return np.quantile(boots, [alpha/2, 1-alpha/2])

control = np.array([12, 14, 11, 13, 12, 15, 13])  # same data as the t-test example
ci = bootstrap_mean_ci(control)
print(f"Bootstrap 95% CI: {ci}")
```

### Sample size calculation (power)

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# effect_size is the standardized difference (Cohen's d); solves for nobs1
n = analysis.solve_power(effect_size=0.3, power=0.8, alpha=0.05)
print(f"n per group = {int(np.ceil(n))}")
```

### Sequential test (mSPRT, peek-safe)

```python
import numpy as np

def msprt_log_likelihood(x, mu0=0, sigma=1, theta=0.1):
    """Log mixture likelihood ratio for H0: mean == mu0 with known sigma.

    theta is the standard deviation of the normal mixing prior
    centered on mu0.
    """
    n = len(x); xbar = np.mean(x); v = sigma**2
    tau2 = theta**2
    log_bf = 0.5*np.log(v/(v+n*tau2)) + (n**2 * (xbar-mu0)**2 * tau2) / (2*v*(v+n*tau2))
    return log_bf  # reject H0 once this exceeds log(1/alpha)
```

### Bayesian A/B (PyMC)

```python
import pymc as pm

with pm.Model() as m:
    # Uniform Beta(1, 1) priors on each arm's conversion rate
    p_a = pm.Beta("p_a", 1, 1)
    p_b = pm.Beta("p_b", 1, 1)
    pm.Binomial("y_a", n=10_000, p=p_a, observed=520)
    pm.Binomial("y_b", n=10_000, p=p_b, observed=580)
    diff = pm.Deterministic("diff", p_b - p_a)
    idata = pm.sample(2000, chains=4, random_seed=42)

# Posterior probability that B converts better than A
print(f"P(B > A) = {(idata.posterior['diff'] > 0).mean().item():.3f}")
```

### Permutation test

```python
import numpy as np

def permutation_test(a, b, n=10_000):
    """Two-sided permutation p-value for the difference in means."""
    diff_obs = np.mean(a) - np.mean(b)
    pool = np.concatenate([a, b])
    rng = np.random.default_rng(0)
    diffs = []
    for _ in range(n):
        rng.shuffle(pool)  # relabel the groups at random (in place)
        diffs.append(np.mean(pool[:len(a)]) - np.mean(pool[len(a):]))
    # Share of shuffled differences at least as extreme as the observed one
    return np.mean(np.abs(diffs) >= abs(diff_obs))
```

### SRE: Welch's test on latency p99

```python
# Compare latency p99 before and after a deploy
import numpy as np
from scipy.stats import ttest_ind

before_p99 = np.array([124, 130, 128, 132, 125])  # ms
after_p99 = np.array([142, 138, 145, 140, 144])   # ms

t, p = ttest_ind(before_p99, after_p99, equal_var=False)
if p < 0.01:
    print("Regression detected: roll back")
```

## Decision criteria

| Situation | Approach |
|---|---|
| Fixed-N A/B | t-test or chi-squared |
| Continuous monitoring | mSPRT or always-valid CI |
| Small N, non-normal | Bootstrap or permutation |
| Multi-arm + prior | Bayesian (Beta-Binomial) |

**Default**: bootstrap CI plus a sequential test for production A/B.

## 🔗 Graph

- Parents: [[Statistics & Data Analysis]] · [[Probability Theory]]
- Variants: [[Bayesian_Inference|Bayesian Inference]]
- Applications: [[SRE]] · [[Anomaly-Detection]]
- Adjacent: [[Type 1 vs Type 2 Errors]] · [[Power Analysis]]

## 🤖 LLM usage

**When to use**: advice on test selection (data shape → test type) and interpretation of results.
**When not to use**: do not automate multiple-comparison correction; it requires domain knowledge.

## ❌ Anti-patterns

- **p-hacking**: running multiple tests and cherry-picking the significant ones.
- **Peeking**: checking a fixed-N test every day, which inflates α.
- **Single point**: reporting only the mean without a CI.
- **Huge N: significance ≠ effect size**: with a very large N even a tiny effect becomes significant, so report Cohen's d alongside the p-value.

## 🧪 Verification / duplicates

- Verified (Casella & Berger "Statistical Inference", scipy/statsmodels docs).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: frequentist + Bayesian + sequential patterns |