---
id: wiki-2026-0508-inferential-statistics
title: Inferential Statistics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Statistical Inference, Hypothesis Testing, Confidence Intervals]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [statistics, inference, hypothesis-testing, ab-testing, sre]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scipy
---
# Inferential Statistics
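## One-liner
> **"Estimate population parameters from a sample and quantify the uncertainty."** The frequentist framework of Fisher, Neyman, and Pearson dates to the early 1900s; as of 2026 it remains the backbone of A/B testing, SRE alerting, and ML evaluation, with a modern hybrid of Bayesian and bootstrap methods as the default.
## Core concepts
### Frequentist vs Bayesian
- **Frequentist**: the parameter is fixed and the data are random. Tools: p-value, confidence interval (CI).
- **Bayesian**: the parameter is a random variable with a prior, and the observed data are fixed. Tools: posterior, credible interval.
- **Bootstrap**: distribution-free; resamples the observed data with replacement many times to simulate the sampling distribution.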
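
As a minimal sketch of how the two interval types differ in practice, the example below computes a frequentist Wald confidence interval and a Bayesian credible interval for the same conversion rate. The counts (520 of 10,000), the flat Beta(1, 1) prior, and all variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

conversions, visitors = 520, 10_000  # illustrative counts (assumption)
p_hat = conversions / visitors

# Frequentist: Wald 95% CI, p_hat +/- z * standard error
z = stats.norm.ppf(0.975)
se = np.sqrt(p_hat * (1 - p_hat) / visitors)
wald_ci = (p_hat - z * se, p_hat + z * se)

# Bayesian: Beta(1, 1) prior + binomial data -> Beta posterior (conjugacy)
posterior = stats.beta(1 + conversions, 1 + visitors - conversions)
cred_ci = tuple(posterior.ppf([0.025, 0.975]))

print(f"Wald 95% CI:           [{wald_ci[0]:.4f}, {wald_ci[1]:.4f}]")
print(f"95% credible interval: [{cred_ci[0]:.4f}, {cred_ci[1]:.4f}]")
```

With this much data and a flat prior the two intervals nearly coincide; the difference lies in interpretation rather than in the numbers.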
### Test types
- **Parametric**: t-test, ANOVA, Z-test (assume approximate normality).
- **Non-parametric**: Mann-Whitney U, Kruskal-Wallis, permutation.
- **Sequential**: Always Valid Inference, mSPRT (peek-safe).
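
For the non-parametric family, a Mann-Whitney U test might look like the sketch below. It uses `scipy.stats.mannwhitneyu`; the sample values are illustrative assumptions and match the ones used in the t-test pattern.

```python
from scipy.stats import mannwhitneyu

# Illustrative samples (assumption), same values as the t-test pattern
control = [12, 14, 11, 13, 12, 15, 13]
treat = [16, 18, 15, 17, 19, 16, 18]

# Two-sided test of whether one group tends to yield larger values;
# it works on ranks, so no normality assumption is needed
res = mannwhitneyu(control, treat, alternative="two-sided")
print(f"U={res.statistic:.1f} p={res.pvalue:.4f}")
```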
### Applications
1. A/B testing: measuring conversion lift.
2. SRE: statistical significance of an SLO breach.
3. ML: comparing model A vs B on a holdout set.
## 💻 Patterns
### Two-sample t-test
```python
import scipy.stats as st

control = [12, 14, 11, 13, 12, 15, 13]
treat = [16, 18, 15, 17, 19, 16, 18]

# Welch's t-test: equal_var=False drops the equal-variance assumption
res = st.ttest_ind(control, treat, equal_var=False)
print(f"t={res.statistic:.3f} p={res.pvalue:.4f}")

# CI for mean(control) - mean(treat); needs a recent SciPy release
ci = res.confidence_interval(0.95)
print(f"95% CI: [{ci.low:.2f}, {ci.high:.2f}]")
```
### Bootstrap CI
```python
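import numpy as np

def bootstrap_mean_ci(x, n=10_000, alpha=0.05):
    """Percentile bootstrap CI for the mean, using n resamples."""
    rng = np.random.default_rng(42)  # fixed seed for reproducibility
    # Draw n resamples of len(x) with replacement, then take each resample's mean
    boots = rng.choice(x, size=(n, len(x)), replace=True).mean(axis=1)
    return np.quantile(boots, [alpha/2, 1-alpha/2])

# `control` is the sample defined in the t-test example
ci = bootstrap_mean_ci(np.array(control))
print(f"Bootstrap 95% CI: {ci}")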
```
### Sample size calculation (power)
```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Solve for n per group: Cohen's d = 0.3, 80% power, alpha = 0.05
n = analysis.solve_power(effect_size=0.3, power=0.8, alpha=0.05)
print(f"n per group = {int(np.ceil(n))}")
```
### Sequential test (mSPRT, peek-safe)
```python
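import numpy as np

def msprt_log_likelihood(x, mu0=0, sigma=1, theta=0.1):
    """Log mixture likelihood ratio; theta is the std of the normal mixing prior."""
    n = len(x)
    xbar = np.mean(x)
    v = sigma**2      # known observation variance
    tau2 = theta**2   # mixing-prior variance
    log_bf = 0.5*np.log(v/(v+n*tau2)) + (n**2 * (xbar-mu0)**2 * tau2) / (2*v*(v+n*tau2))
    return log_bf  # reject H0 when this exceeds log(1/alpha)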
```
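
To illustrate what "peek-safe" means here, the sketch below checks the statistic after every new observation and stops the first time it crosses `log(1/alpha)`. The helper `msprt_log_lr`, the parameter name `tau`, and the simulated stream with a true mean shift of 0.5 are assumptions made for illustration; the formula is the same one used in the pattern above.

```python
import numpy as np

def msprt_log_lr(x, mu0=0.0, sigma=1.0, tau=0.1):
    """Log mixture likelihood ratio for N(mu, sigma^2) data with a N(mu0, tau^2) mixing prior."""
    n = len(x)
    xbar = np.mean(x)
    v, tau2 = sigma**2, tau**2
    return 0.5 * np.log(v / (v + n * tau2)) + (n**2 * (xbar - mu0)**2 * tau2) / (2 * v * (v + n * tau2))

alpha = 0.05
threshold = np.log(1 / alpha)

# Simulated stream with a true mean shift of 0.5 (assumption, for illustration)
rng = np.random.default_rng(0)
stream = rng.normal(loc=0.5, scale=1.0, size=2000)

stopped_at = None
for n in range(1, len(stream) + 1):
    # Checking after every observation is allowed: the threshold is valid at any stopping time
    if msprt_log_lr(stream[:n]) > threshold:
        stopped_at = n
        break

print(f"H0 rejected after {stopped_at} observations")
```

Recomputing the mean over a growing slice is quadratic in the stream length; a production version would more likely keep a running sum.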
### Bayesian A/B (PyMC)
```python
import pymc as pm

with pm.Model() as m:
    # Uniform Beta(1, 1) priors on each arm's conversion rate
    p_a = pm.Beta("p_a", 1, 1)
    p_b = pm.Beta("p_b", 1, 1)
    # Observed conversions out of 10,000 visitors per arm
    pm.Binomial("y_a", n=10_000, p=p_a, observed=520)
    pm.Binomial("y_b", n=10_000, p=p_b, observed=580)
    diff = pm.Deterministic("diff", p_b - p_a)
    idata = pm.sample(2000, chains=4, random_seed=42)

print(f"P(B > A) = {(idata.posterior['diff'] > 0).mean().item():.3f}")
```
### Permutation test
```python
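import numpy as np

def permutation_test(a, b, n=10_000):
    """Two-sided permutation test on the difference in means."""
    diff_obs = np.mean(a) - np.mean(b)
    pool = np.concatenate([a, b])
    rng = np.random.default_rng(0)
    diffs = []
    for _ in range(n):
        rng.shuffle(pool)  # in-place relabeling under H0
        diffs.append(np.mean(pool[:len(a)]) - np.mean(pool[len(a):]))
    # Fraction of permuted differences at least as extreme as the observed one
    return np.mean(np.abs(diffs) >= abs(diff_obs))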
```
### SRE: Welch's test on latency p99
```python
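# Compare latency p99 before and after a deploy
import numpy as np
from scipy.stats import ttest_ind

before_p99 = np.array([124, 130, 128, 132, 125])  # ms
after_p99 = np.array([142, 138, 145, 140, 144])   # ms
# One-sided Welch test: H1 is mean(before) < mean(after), i.e. latency got worse
t, p = ttest_ind(before_p99, after_p99, equal_var=False, alternative="less")
if p < 0.01:
    print("regression detected: rollback")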
```
## Decision criteria
| Situation | Approach |
|---|---|
| Fixed-N A/B | t-test or chi-squared |
| Continuous monitoring | mSPRT or always-valid CI |
| Small N, non-normal | Bootstrap or permutation |
| Multi-arm + prior | Bayesian (Beta-Binomial) |
**Default**: bootstrap CI plus a sequential test for production A/B.
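
The table lists a chi-squared test for fixed-N A/B. A minimal sketch with `scipy.stats.chi2_contingency` follows, reusing the conversion counts from the Bayesian A/B pattern; the table layout and variable names are assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: arms A and B; columns: converted, not converted (illustrative counts)
table = np.array([[520, 10_000 - 520],
                  [580, 10_000 - 580]])

# correction=False disables the Yates continuity correction for the 2x2 table
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2={chi2:.3f} dof={dof} p={p:.4f}")
```

On these counts the p-value lands just above 0.05, so a fixed-N test at α = 0.05 would not declare a winner.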
## 🔗 Graph
- Parent: [[Statistics & Data Analysis]] · [[Probability Theory]]
- Variants: [[Bayesian Inference]] · [[Bootstrap]] · [[Permutation Test]]
- Applications: [[A/B Testing]] · [[SRE]] · [[Anomaly-Detection]]
- Adjacent: [[Type 1 vs Type 2 Errors]] · [[Power Analysis]]
## 🤖 LLM usage
**When to use**: advice on test selection (data shape → test type) and on interpreting results.
**When not to use**: do not automate multiple-comparison correction; it requires domain knowledge.
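
For reference, the mechanical part of a multiple-comparison correction is short; the judgment lies in deciding which hypotheses form the family. The sketch below implements the Holm step-down procedure by hand. The helper name `holm_reject` and the p-values are hypothetical.

```python
import numpy as np

def holm_reject(pvals, alpha=0.05):
    """Holm step-down: boolean mask of hypotheses rejected at family-wise level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    # Walk p-values from smallest to largest; the threshold loosens at each step
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values from four metrics
mask = holm_reject(pvals)
print(mask)
```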
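## ❌ Anti-patterns
- **p-hacking**: running multiple tests and cherry-picking the significant one.
- **Peeking**: checking a fixed-N test every day, which inflates α.
- **Single point**: reporting only the mean without a CI.
- **Significance ≠ effect size**: as N grows, even trivial effects become significant; report Cohen's d as well.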
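
Cohen's d can be reported next to the p-value with a few lines of NumPy. This is a sketch using the pooled-standard-deviation form; the helper `cohens_d` is hypothetical and the samples are the illustrative values from the t-test pattern.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d (mean of b minus mean of a) with a pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)

control = [12, 14, 11, 13, 12, 15, 13]
treat = [16, 18, 15, 17, 19, 16, 18]
d = cohens_d(control, treat)
print(f"Cohen's d = {d:.2f}")
```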
## 🧪 Verification / duplicates
- Verified (Casella & Berger "Statistical Inference", scipy/statsmodels docs).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — frequentist + Bayesian + sequential pattern |