---
id: wiki-2026-0508-inferential-statistics
title: Inferential Statistics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Statistical Inference, Hypothesis Testing, Confidence Intervals]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [statistics, inference, hypothesis-testing, ab-testing, sre]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scipy
---

# Inferential Statistics

## One-line summary

> **"From a sample, estimate population parameters and quantify the uncertainty."** The frequentist framework built by Fisher, Neyman, and Pearson in the 20th century is still the backbone of A/B testing, SRE alerting, and ML evaluation in 2026; the default in modern practice is a hybrid that adds Bayesian and bootstrap methods.

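## Core ideas

### Frequentist vs Bayesian

- **Frequentist**: the parameter is fixed and the data are random. Reports p-values and confidence intervals (CIs).
- **Bayesian**: the parameter is random (it has a prior) and the data are treated as fixed once observed. Reports a posterior distribution and credible intervals.
- **Bootstrap**: nonparametric; approximates the sampling distribution of a statistic by resampling the observed data with replacement many times (the Monte Carlo error shrinks as the number of resamples grows).
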
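To make the three views concrete, here is a minimal sketch that computes one interval of each kind for a single conversion rate. It assumes 520 conversions out of 10,000 users (the variant-A counts from the PyMC example), a flat Beta(1, 1) prior, and the normal-approximation (Wald) interval as the frequentist CI; `three_intervals` is a hypothetical helper, not a library function.

```python
import numpy as np
from scipy import stats

def three_intervals(successes: int, n: int, level: float = 0.95, seed: int = 0):
    """Return (frequentist, Bayesian, bootstrap) intervals for a proportion."""
    p_hat = successes / n
    alpha = 1 - level

    # Frequentist (Wald): p is fixed; the interval is random and covers the
    # true p in about `level` of repeated experiments.
    z = stats.norm.ppf(1 - alpha / 2)
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    wald = (p_hat - z * se, p_hat + z * se)

    # Bayesian: Beta(1, 1) prior + binomial likelihood -> Beta posterior;
    # the credible interval holds `level` of the posterior probability.
    post = stats.beta(1 + successes, 1 + n - successes)
    credible = (post.ppf(alpha / 2), post.ppf(1 - alpha / 2))

    # Bootstrap: resampling n 0/1 outcomes with replacement is equivalent to
    # drawing Binomial(n, p_hat), so each resampled mean is one draw / n.
    rng = np.random.default_rng(seed)
    boots = rng.binomial(n, p_hat, size=10_000) / n
    bootstrap = tuple(np.quantile(boots, [alpha / 2, 1 - alpha / 2]))

    return wald, credible, bootstrap

for name, (lo, hi) in zip(["Wald CI", "Credible", "Bootstrap"], three_intervals(520, 10_000)):
    print(f"{name:9s} [{lo:.4f}, {hi:.4f}]")
```

At this sample size the three intervals nearly coincide; they diverge mainly for small n or rates close to 0 or 1, which is where the choice of approach matters.
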
### Test taxonomy

- **Parametric**: t-test, ANOVA, Z-test (assume normality of the data or of the test statistic's sampling distribution).
- **Non-parametric**: Mann-Whitney U, Kruskal-Wallis, permutation tests.
- **Sequential**: always-valid inference, mSPRT (safe to peek at results at any time).

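Most of the fixed-sample tests above are a single SciPy call. A minimal sketch on three small, made-up groups (illustration only): the parametric tests compare means under a normality assumption, while the rank-based ones make no distributional assumption.

```python
import numpy as np
from scipy import stats

# Made-up response times (ms) for three groups, for illustration only.
a = np.array([12.1, 13.4, 11.8, 12.9, 13.0, 12.4])
b = np.array([13.9, 14.2, 13.1, 14.8, 13.6, 14.0])
c = np.array([12.0, 12.6, 13.3, 12.2, 12.8, 13.1])

pvalues = {
    # Parametric: Welch's t-test (two groups), one-way ANOVA (k groups).
    "welch_t": stats.ttest_ind(a, b, equal_var=False).pvalue,
    "anova": stats.f_oneway(a, b, c).pvalue,
    # Non-parametric, rank-based counterparts.
    "mann_whitney": stats.mannwhitneyu(a, b, alternative="two-sided").pvalue,
    "kruskal": stats.kruskal(a, b, c).pvalue,
}
for name, p in pvalues.items():
    print(f"{name:12s} p = {p:.4f}")
```

Note that `mannwhitneyu` and `kruskal` test for a shift in distribution (one group tending to produce larger values), not strictly for a difference in means.
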
### Applications

1. A/B tests: measure conversion lift.
2. SRE: decide whether an SLO breach is statistically significant rather than noise.
3. ML: compare model A vs model B on a holdout set.

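For the A/B case, a minimal sketch of measuring conversion lift with a two-proportion z-test and a Wald CI on the absolute difference. The counts reuse the 520/10,000 vs 580/10,000 example from the Bayesian section; `conversion_lift` is a hypothetical helper, and the normal approximation assumes reasonably large counts in each arm.

```python
import math
from statistics import NormalDist

def conversion_lift(conv_a, n_a, conv_b, n_b, level=0.95):
    """Two-proportion z-test plus a Wald CI for p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # The test statistic uses the pooled rate under H0: p_a == p_b.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The CI uses the unpooled standard error.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    return {"lift_abs": diff, "lift_rel": diff / p_a, "z": z, "p": p_value, "ci": ci}

res = conversion_lift(520, 10_000, 580, 10_000)
print(f"abs lift={res['lift_abs']:.4f} rel lift={res['lift_rel']:.1%} p={res['p']:.3f}")
```

With these counts the relative lift is about 11.5%, yet the CI on the absolute lift straddles zero (p ≈ 0.06), so a fixed-N test at α = 0.05 would not declare a winner.
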
## 💻 Patterns

### Two-sample t-test

```python
import scipy.stats as st

control = [12, 14, 11, 13, 12, 15, 13]
treat = [16, 18, 15, 17, 19, 16, 18]

# equal_var=False -> Welch's t-test (no equal-variance assumption).
res = st.ttest_ind(control, treat, equal_var=False)
print(f"t={res.statistic:.3f} p={res.pvalue:.4f}")

# CI for mean(control) - mean(treat); confidence_interval() needs a recent SciPy (1.11+).
ci = res.confidence_interval(0.95)
print(f"95% CI: [{ci.low:.2f}, {ci.high:.2f}]")
```

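`equal_var=False` turns this into Welch's t-test. A minimal sketch of the same statistic computed by hand, to show what the SciPy call computes: the mean difference over the unpooled standard error, with Welch-Satterthwaite degrees of freedom. `welch_t` is a hypothetical helper; the last lines check it against SciPy.

```python
import numpy as np
from scipy import stats

def welch_t(x, y):
    """Welch's t statistic, degrees of freedom, and two-sided p-value."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    vx, vy = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)  # squared standard errors
    t = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p

control = [12, 14, 11, 13, 12, 15, 13]
treat = [16, 18, 15, 17, 19, 16, 18]
t, df, p = welch_t(control, treat)
ref = stats.ttest_ind(control, treat, equal_var=False)
print(f"manual t={t:.3f} df={df:.1f} p={p:.4f} | scipy t={ref.statistic:.3f}")
```
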
### Bootstrap CI

```python
import numpy as np

def bootstrap_mean_ci(x, n=10_000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean."""
    rng = np.random.default_rng(seed)
    # n resamples of len(x) draws with replacement; one mean per resample.
    boots = rng.choice(x, size=(n, len(x)), replace=True).mean(axis=1)
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

control = np.array([12, 14, 11, 13, 12, 15, 13])
ci = bootstrap_mean_ci(control)
print(f"Bootstrap 95% CI: {ci}")
```

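SciPy (1.7+) also ships `scipy.stats.bootstrap`, whose default method is the bias-corrected and accelerated (BCa) interval. A minimal sketch for cross-checking the hand-rolled percentile interval above on the same `control` values; with only seven observations, expect the endpoints to depend somewhat on the method.

```python
import numpy as np
from scipy import stats

control = np.array([12, 14, 11, 13, 12, 15, 13])

# scipy expects a sequence of samples, hence the 1-tuple (control,).
res = stats.bootstrap(
    (control,),
    np.mean,
    confidence_level=0.95,
    n_resamples=9_999,
    method="BCa",
)
ci = res.confidence_interval
print(f"BCa 95% CI: [{ci.low:.2f}, {ci.high:.2f}]")
```
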
### Sample size calculation (power)

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# effect_size is Cohen's d; solves for nobs1 with equal group sizes (ratio=1).
analysis = TTestIndPower()
n = analysis.solve_power(effect_size=0.3, power=0.8, alpha=0.05)
print(f"n per group = {int(np.ceil(n))}")
```

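As a sanity check on `solve_power`, a minimal sketch of the usual normal-approximation formula for a two-sided, two-sample test with equal group sizes, where d is Cohen's d (the `effect_size` above):

$$n_{\text{per group}} = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}$$

It uses only the standard library; `n_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z(power)           # about 0.84 for power = 0.8
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size**2)

print(n_per_group(0.3))  # -> 175
```

The t-based solver in statsmodels typically returns a slightly larger n than this formula, because the t-distribution has heavier tails than the normal.
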
### Sequential test (mSPRT, peek-safe)

```python
import numpy as np

def msprt_log_likelihood(x, mu0=0, sigma=1, theta=0.1):
    # Log mixture likelihood ratio for H0: mean == mu0, normal data with known sigma.
    # theta is the std of the N(mu0, theta^2) mixing distribution over the mean.
    n, xbar, v = len(x), np.mean(x), sigma**2
    tau2 = theta**2
    log_bf = 0.5 * np.log(v / (v + n * tau2)) + (n**2 * (xbar - mu0) ** 2 * tau2) / (2 * v * (v + n * tau2))
    return log_bf  # reject H0 as soon as log_bf >= log(1/alpha)
```

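The function above scores one snapshot of the data; in use it is recomputed after every new observation, and the test stops the first time it crosses log(1/α). A minimal vectorized sketch of that monitoring loop, assuming i.i.d. normal data with known σ; `msprt_path` and `first_rejection` are hypothetical helpers, `tau` plays the role of `theta` above, and the simulated stream (true mean 0.5) is made up.

```python
import numpy as np

def msprt_path(x, mu0=0.0, sigma=1.0, tau=0.1):
    """Log mixture likelihood ratio after each of the first 1..N observations."""
    x = np.asarray(x, dtype=float)
    n = np.arange(1, len(x) + 1)
    xbar = np.cumsum(x) / n  # running mean
    v, tau2 = sigma**2, tau**2
    return 0.5 * np.log(v / (v + n * tau2)) + (
        n**2 * tau2 * (xbar - mu0) ** 2 / (2 * v * (v + n * tau2))
    )

def first_rejection(x, alpha=0.05, **kw):
    """Sample size n at the first peek where H0 is rejected, or None."""
    hits = np.flatnonzero(msprt_path(x, **kw) >= np.log(1 / alpha))
    return int(hits[0]) + 1 if hits.size else None

rng = np.random.default_rng(7)
stream = rng.normal(loc=0.5, scale=1.0, size=2_000)  # simulated data under H1
print("stopped at n =", first_rejection(stream))
```

Under H0 the mixture likelihood ratio is a nonnegative martingale with mean 1, so Ville's inequality bounds the probability that it ever exceeds 1/α by α; that is why checking after every observation does not inflate the false-positive rate.
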
### Bayesian A/B (PyMC)

```python
import pymc as pm

with pm.Model() as m:
    # Flat Beta(1, 1) priors on each variant's conversion rate.
    p_a = pm.Beta("p_a", 1, 1)
    p_b = pm.Beta("p_b", 1, 1)
    # Observed: 520 / 10,000 conversions for A, 580 / 10,000 for B.
    pm.Binomial("y_a", n=10_000, p=p_a, observed=520)
    pm.Binomial("y_b", n=10_000, p=p_b, observed=580)
    diff = pm.Deterministic("diff", p_b - p_a)
    idata = pm.sample(2000, chains=4, random_seed=42)

print(f"P(B > A) = {(idata.posterior['diff'] > 0).mean().item():.3f}")
```

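Because the Beta prior is conjugate to the binomial likelihood, this particular model does not need MCMC: each posterior is Beta(1 + conversions, 1 + non-conversions). A minimal sketch that estimates P(B > A) from direct NumPy posterior draws with the same counts; it should agree with the PyMC estimate up to sampling noise. `prob_b_beats_a` is a hypothetical helper.

```python
import numpy as np

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=200_000, seed=42):
    """P(p_b > p_a) under independent Beta(1, 1) priors, via posterior sampling."""
    rng = np.random.default_rng(seed)
    # Conjugate update: Beta(1, 1) prior + binomial data -> Beta posterior.
    p_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=draws)
    p_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=draws)
    diff = p_b - p_a
    return {
        "p_b_gt_a": float((diff > 0).mean()),
        "diff_95_ci": tuple(np.quantile(diff, [0.025, 0.975])),
        "expected_lift": float(diff.mean()),
    }

print(prob_b_beats_a(520, 10_000, 580, 10_000))
```

For models without a conjugate form (hierarchical priors, covariates), the PyMC version remains the more general tool.
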
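### Permutation test

```python
import numpy as np

def permutation_test(a, b, n=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff_obs = a.mean() - b.mean()
    pool = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n):
        rng.shuffle(pool)  # random relabelling; labels are exchangeable under H0
        diff = pool[:len(a)].mean() - pool[len(a):].mean()
        count += abs(diff) >= abs(diff_obs) - 1e-12  # tolerance for float ties
    # +1 counts the observed labelling, so the p-value is never exactly 0.
    return (count + 1) / (n + 1)
```
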
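SciPy 1.8+ also provides `scipy.stats.permutation_test`. A minimal sketch on the t-test data: with 7 + 7 observations there are only C(14, 7) = 3,432 distinct splits, fewer than `n_resamples`, so SciPy enumerates all of them and the p-value is exact rather than Monte Carlo. With `vectorized=True` the statistic must accept an `axis` argument.

```python
import numpy as np
from scipy import stats

control = np.array([12, 14, 11, 13, 12, 15, 13])
treat = np.array([16, 18, 15, 17, 19, 16, 18])

def mean_diff(x, y, axis):
    # Difference in means along `axis`, so SciPy can score many resamples at once.
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

res = stats.permutation_test(
    (control, treat),
    mean_diff,
    permutation_type="independent",
    vectorized=True,
    n_resamples=9_999,
    alternative="two-sided",
)
print(f"observed diff={res.statistic:.3f} p={res.pvalue:.4f}")
```
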
### SRE: Welch's test on latency p99

```python
# Compare per-window p99 latency before and after a deploy.
import numpy as np
from scipy.stats import ttest_ind

before_p99 = np.array([124, 130, 128, 132, 125])  # ms
after_p99 = np.array([142, 138, 145, 140, 144])

t, p = ttest_ind(before_p99, after_p99, equal_var=False)
# A two-sided p alone would also flag an improvement; require latency to go up.
if p < 0.01 and after_p99.mean() > before_p99.mean():
    print("Regression detected: roll back")
```

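Per-window p99 values are few and often skewed, so a rank-based check is a useful companion. A minimal sketch using a one-sided Mann-Whitney U test on the same samples: `alternative="less"` asks whether `before` values tend to be smaller than `after` values, i.e. whether latency went up. The 0.01 threshold is carried over from the example above and is a policy choice.

```python
import numpy as np
from scipy.stats import mannwhitneyu

before_p99 = np.array([124, 130, 128, 132, 125])  # ms, one value per window
after_p99 = np.array([142, 138, 145, 140, 144])

# One-sided: H1 says "before" values tend to be smaller (latency increased).
res = mannwhitneyu(before_p99, after_p99, alternative="less", method="exact")
regressed = res.pvalue < 0.01
print(f"U={res.statistic} p={res.pvalue:.4f} regression={regressed}")
```

With 5 windows per side the smallest achievable one-sided p-value is 1/252 ≈ 0.004, so a 0.01 threshold is reachable; with 4 windows per side the floor is 1/70 ≈ 0.014 and the check could never fire.
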
## Decision criteria

| Situation | Approach |
|---|---|
| Fixed-N A/B | t-test or chi-squared |
| Continuous monitoring | mSPRT or always-valid CI |
| Small N, non-normal | Bootstrap or permutation |
| Multi-arm + prior | Bayesian (Beta-Binomial) |

**Default** for production A/B tests: bootstrap CIs plus a sequential test.

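For the "Fixed-N A/B" row with a binary outcome, a minimal sketch of the chi-squared test of independence on a 2×2 table of converted vs not converted, using SciPy's `chi2_contingency` (Yates' continuity correction is on by default for 2×2 tables). The counts reuse the 520/10,000 vs 580/10,000 example from the Bayesian section.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: variant A, variant B. Columns: converted, not converted.
table = np.array([
    [520, 10_000 - 520],
    [580, 10_000 - 580],
])

chi2, p, dof, expected = chi2_contingency(table)  # correction=True by default
print(f"chi2={chi2:.3f} dof={dof} p={p:.4f}")

# Without the continuity correction, chi2 equals the square of the
# pooled two-proportion z statistic.
chi2_raw, p_raw, _, _ = chi2_contingency(table, correction=False)
print(f"uncorrected chi2={chi2_raw:.3f} p={p_raw:.4f}")
```
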
## 🔗 Graph

- Parent: [[Statistics & Data Analysis]] · [[Probability Theory]]
- Variant: [[Bayesian_Inference|Bayesian Inference]]
- Applications: [[SRE]] · [[Anomaly-Detection]]
- Adjacent: [[Type 1 vs Type 2 Errors]] · [[Power Analysis]]

## 🤖 Using LLMs

**When**: advice on choosing a test (data shape → test type) and help interpreting results.

**When not**: do not let an LLM automate multiple-comparison correction; deciding which tests form a family and which correction fits requires domain knowledge.

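The arithmetic of a correction can be automated once a person has fixed the family of hypotheses. A minimal sketch of Holm's step-down procedure in NumPy, with made-up p-values; `holm_reject` is a hypothetical helper, and `statsmodels.stats.multitest.multipletests(pvals, method="holm")` is the library equivalent.

```python
import numpy as np

def holm_reject(pvals, alpha=0.05):
    """Holm step-down: boolean mask of hypotheses rejected at family-wise level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p)):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first failure; all larger p-values are retained
    return reject

# Hypothetical p-values from five metrics checked on one experiment.
pvals = [0.004, 0.030, 0.012, 0.200, 0.045]
print(holm_reject(pvals))
```

Uncorrected, four of the five p-values fall below 0.05; Holm rejects only two.
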
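## ❌ Anti-patterns

- **p-hacking**: running many tests and cherry-picking the significant ones.
- **Peeking**: checking a fixed-N test every day and stopping at the first significant result inflates α (the false-positive rate).
- **Single point estimate**: reporting only the mean without a CI.
- **N → ∞: significance ≠ effect size**: with enough data, trivially small effects become significant; report an effect size such as Cohen's d as well.
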
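Two of these anti-patterns are easy to see in simulation. A minimal sketch, assuming normally distributed data: `peeking_false_positive_rate` runs A/A experiments (no true difference), applies a t-test after every batch, and stops at the first p < 0.05; `cohens_d` computes a standardized effect size with a pooled standard deviation. Both names, the batch size, the number of peeks, and the simulation counts are illustrative choices.

```python
import numpy as np
from scipy import stats

def peeking_false_positive_rate(peeks=20, batch=50, sims=500, alpha=0.05, seed=0):
    """Share of A/A experiments declared 'significant' at any of `peeks` looks."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(sims):
        a = rng.normal(size=peeks * batch)
        b = rng.normal(size=peeks * batch)  # same distribution: H0 is true
        for k in range(1, peeks + 1):
            n = k * batch
            if stats.ttest_ind(a[:n], b[:n]).pvalue < alpha:
                hits += 1
                break
    return hits / sims

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

print(f"false-positive rate with 20 peeks: {peeking_false_positive_rate():.3f}")

# Huge N: a 0.02-SD shift is "highly significant" but practically negligible.
rng = np.random.default_rng(1)
big_a = rng.normal(0.00, 1.0, size=1_000_000)
big_b = rng.normal(0.02, 1.0, size=1_000_000)
p_big = stats.ttest_ind(big_b, big_a).pvalue
d_big = cohens_d(big_b, big_a)
print(f"N = 1e6 per group: p = {p_big:.1e}, Cohen's d = {d_big:.3f}")
```

With 20 looks the false-positive rate lands well above the nominal 5%, and the second example pairs a vanishingly small p-value with an effect of only about 0.02 standard deviations.
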
## 🧪 Verification / duplicates

- Verified against Casella & Berger, *Statistical Inference*, and the SciPy/statsmodels documentation.
- Confidence: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: frequentist + Bayesian + sequential patterns |