


Inferential Statistics

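In one line

"Estimate population parameters from a sample and quantify the uncertainty." Built on the frequentist framework that Fisher, Neyman, and Pearson developed in the early 1900s, inferential statistics is, as of 2026, the backbone of A/B testing, SRE alerting, and ML evaluation; a hybrid that adds Bayesian and bootstrap methods is the usual default.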

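Key concepts

Frequentist vs Bayesian

Frequentist: the parameter is fixed and the data are random. Reports p-values and confidence intervals (CI).
Bayesian: the parameter is random (it has a prior) and the data are fixed. Reports a posterior and credible intervals.
Bootstrap: distribution-free; resamples the observed data with replacement many times to simulate the sampling distribution.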
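
The three views can be put side by side on one proportion. The sketch below is illustrative only: the counts (58 conversions out of 1,000) are hypothetical, the frequentist interval is the simple Wald interval, and the Bayesian interval assumes a flat Beta(1, 1) prior.

```python
import numpy as np
from scipy import stats

# Hypothetical data (not from the source): 58 conversions out of 1,000 visitors.
successes, trials = 58, 1_000
p_hat = successes / trials

# Frequentist: Wald 95% confidence interval (parameter fixed, interval random).
z = stats.norm.ppf(0.975)
se = np.sqrt(p_hat * (1 - p_hat) / trials)
wald_ci = (p_hat - z * se, p_hat + z * se)

# Bayesian: Beta(1, 1) prior + binomial data -> Beta posterior (conjugacy).
posterior = stats.beta(1 + successes, 1 + trials - successes)
credible = posterior.ppf([0.025, 0.975])

# Bootstrap: resample the observed 0/1 outcomes with replacement.
rng = np.random.default_rng(0)
outcomes = np.r_[np.ones(successes), np.zeros(trials - successes)]
boots = rng.choice(outcomes, size=(2_000, trials), replace=True).mean(axis=1)
boot_ci = np.quantile(boots, [0.025, 0.975])

print(f"Wald CI:           [{wald_ci[0]:.4f}, {wald_ci[1]:.4f}]")
print(f"Credible interval: [{credible[0]:.4f}, {credible[1]:.4f}]")
print(f"Bootstrap CI:      [{boot_ci[0]:.4f}, {boot_ci[1]:.4f}]")
```

With this much data the three intervals should land close together; they differ in interpretation more than in numbers.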

Test classification

Parametric: t-test, ANOVA, Z-test (assume normally distributed data or an approximately normal sampling distribution).
Non-parametric: Mann-Whitney U, Kruskal-Wallis, permutation.
Sequential: Always Valid Inference, mSPRT (peek-safe).
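
To make the parametric vs non-parametric split concrete, here is a minimal sketch that runs Welch's t-test and Mann-Whitney U on the same pair of samples. The lognormal samples are synthetic stand-ins for skewed data and are an assumption of this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic right-skewed samples (e.g. response times); illustrative only.
a = rng.lognormal(mean=0.0, sigma=1.0, size=40)
b = rng.lognormal(mean=0.5, sigma=1.0, size=40)

# Parametric: Welch's t-test compares means and leans on approximate normality.
t_res = stats.ttest_ind(a, b, equal_var=False)

# Non-parametric: Mann-Whitney U works on ranks, so outliers carry less weight.
u_res = stats.mannwhitneyu(a, b, alternative="two-sided")

print(f"Welch t-test   p={t_res.pvalue:.4f}")
print(f"Mann-Whitney U p={u_res.pvalue:.4f}")
```

The two tests answer slightly different questions (difference in means vs. tendency of one group to exceed the other), so disagreement between them is informative rather than a bug.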

Applications

A/B test: measuring conversion lift.
SRE: statistical significance of an SLO breach.
ML: comparing model A vs. B on a holdout set.
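
For the ML case, one common option is a paired bootstrap over holdout examples, since both models are scored on the same rows. This is a sketch under assumptions: `paired_bootstrap_diff` is a hypothetical helper, and the per-example correctness vectors are synthetic.

```python
import numpy as np

def paired_bootstrap_diff(correct_a, correct_b, n_boot=2_000, alpha=0.05, seed=0):
    """Percentile CI for accuracy(B) - accuracy(A) on the same holdout examples."""
    correct_a = np.asarray(correct_a, dtype=float)
    correct_b = np.asarray(correct_b, dtype=float)
    if correct_a.shape != correct_b.shape:
        raise ValueError("both models must be scored on the same examples")
    rng = np.random.default_rng(seed)
    n = len(correct_a)
    # Resample example indices once and reuse them for both models (pairing).
    idx = rng.integers(0, n, size=(n_boot, n))
    diffs = correct_b[idx].mean(axis=1) - correct_a[idx].mean(axis=1)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return diffs.mean(), (lo, hi)

# Synthetic per-example correctness flags for two hypothetical models.
rng = np.random.default_rng(7)
correct_a = rng.random(500) < 0.80
correct_b = rng.random(500) < 0.85

mean_diff, (lo, hi) = paired_bootstrap_diff(correct_a, correct_b)
print(f"accuracy lift = {mean_diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes 0, the holdout supports a real difference; reusing the same resampled indices for both models is what keeps the comparison paired.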

💻 Patterns

Two-sample t-test

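import scipy.stats as st
control = [12, 14, 11, 13, 12, 15, 13]
treat   = [16, 18, 15, 17, 19, 16, 18]
# equal_var=False selects Welch's t-test (no equal-variance assumption)
res = st.ttest_ind(control, treat, equal_var=False)
print(f"t={res.statistic:.3f} p={res.pvalue:.4f}")
# CI for mean(control) - mean(treat); needs a SciPy version that provides confidence_interval()
ci = res.confidence_interval(0.95)
print(f"95% CI: [{ci.low:.2f}, {ci.high:.2f}]")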

Bootstrap CI

import numpy as np
def bootstrap_mean_ci(x, n=10_000, alpha=0.05):
    # Percentile bootstrap: n resamples of len(x) drawn with replacement
    rng = np.random.default_rng(42)
    boots = rng.choice(x, size=(n, len(x)), replace=True).mean(axis=1)
    return np.quantile(boots, [alpha/2, 1-alpha/2])

# control is the sample defined in the t-test example
ci = bootstrap_mean_ci(np.array(control))
print(f"Bootstrap 95% CI: {ci}")

Sample size calculation (power)

import numpy as np
from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
# effect_size is the standardized effect (Cohen's d); solves for n per group
n = analysis.solve_power(effect_size=0.3, power=0.8, alpha=0.05)
print(f"n per group = {int(np.ceil(n))}")
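
The effect_size passed to the power calculation is a standardized effect (Cohen's d), so it has to come from somewhere, typically pilot data or a minimum effect worth detecting. A minimal sketch of the pooled-standard-deviation form, reusing the sample values from the t-test example; `cohens_d` is a hypothetical helper name.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)

control = [12, 14, 11, 13, 12, 15, 13]
treat = [16, 18, 15, 17, 19, 16, 18]
d = cohens_d(control, treat)
print(f"Cohen's d = {d:.2f}")
```

A d estimated from a handful of pilot observations is noisy, so it is usually safer to plan the sample size around a conservative (smaller) effect than the pilot suggests.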

Sequential test (mSPRT, peek-safe)

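import numpy as np
def msprt_log_likelihood(x, mu0=0, sigma=1, theta=0.1):
    # Log mixture likelihood ratio for N(mu, sigma^2) data under H0: mu = mu0,
    # with a N(mu0, theta^2) mixing prior on mu (theta is the prior std. dev.)
    n = len(x); xbar = np.mean(x); v = sigma**2
    tau2 = theta**2
    log_bf = 0.5*np.log(v/(v+n*tau2)) + (n**2 * (xbar-mu0)**2 * tau2) / (2*v*(v+n*tau2))
    return log_bf  # reject H0 once this exceeds log(1/alpha)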
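
What makes the statistic peek-safe is that it can be re-evaluated after every observation against the same log(1/alpha) threshold. The sketch below restates the formula so it runs on its own and applies it to two synthetic streams; `msprt_log_lr`, `first_rejection`, and the stream parameters are assumptions of this example.

```python
import numpy as np

def msprt_log_lr(x, mu0=0.0, sigma=1.0, tau=0.1):
    """Log mixture likelihood ratio: N(mu, sigma^2) data, N(mu0, tau^2) mixing prior."""
    n = len(x)
    xbar = np.mean(x)
    v, tau2 = sigma**2, tau**2
    return 0.5 * np.log(v / (v + n * tau2)) + (n**2 * (xbar - mu0) ** 2 * tau2) / (
        2 * v * (v + n * tau2)
    )

def first_rejection(stream, alpha=0.05, **kwargs):
    """Return the first n at which H0 is rejected, or None if it never is."""
    threshold = np.log(1 / alpha)
    for n in range(1, len(stream) + 1):
        if msprt_log_lr(stream[:n], **kwargs) > threshold:
            return n  # stopping here is valid even though we looked at every step
    return None

rng = np.random.default_rng(3)
null_stream = rng.normal(0.0, 1.0, size=2_000)     # H0 true
shifted_stream = rng.normal(0.3, 1.0, size=2_000)  # true mean is 0.3

print("H0 true:  stopped at", first_rejection(null_stream))
print("H0 false: stopped at", first_rejection(shifted_stream))
```

Under H0 the chance of ever crossing the threshold is bounded by alpha, so most null streams should run to the end without stopping.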

Bayesian A/B (PyMC)

import pymc as pm
with pm.Model() as m:
    p_a = pm.Beta("p_a", 1, 1)
    p_b = pm.Beta("p_b", 1, 1)
    pm.Binomial("y_a", n=10_000, p=p_a, observed=520)
    pm.Binomial("y_b", n=10_000, p=p_b, observed=580)
    diff = pm.Deterministic("diff", p_b - p_a)
    idata = pm.sample(2000, chains=4, random_seed=42)
print(f"P(B > A) = {(idata.posterior['diff'] > 0).mean().item():.3f}")
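
Because a Beta prior is conjugate to the binomial likelihood, the same posterior is available in closed form, so MCMC is optional for this particular model. A minimal NumPy-only sketch using the same counts and the same Beta(1, 1) priors as the PyMC model above:

```python
import numpy as np

rng = np.random.default_rng(42)
n_a, y_a = 10_000, 520
n_b, y_b = 10_000, 580

# Beta(1, 1) prior + binomial likelihood -> Beta(1 + successes, 1 + failures).
post_a = rng.beta(1 + y_a, 1 + n_a - y_a, size=200_000)
post_b = rng.beta(1 + y_b, 1 + n_b - y_b, size=200_000)

prob_b_better = (post_b > post_a).mean()
lift_ci = np.quantile(post_b - post_a, [0.025, 0.975])

print(f"P(B > A) = {prob_b_better:.3f}")
print(f"95% credible interval for p_b - p_a: [{lift_ci[0]:.4f}, {lift_ci[1]:.4f}]")
```

PyMC becomes worth its cost once the model stops being conjugate, for example with hierarchical priors or covariates.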

Permutation test

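import numpy as np
def permutation_test(a, b, n=10_000):
    # Two-sided Monte Carlo permutation test on the difference in means
    diff_obs = np.mean(a) - np.mean(b)
    pool = np.concatenate([a, b])
    rng = np.random.default_rng(0)
    diffs = []
    for _ in range(n):
        rng.shuffle(pool)  # in-place shuffle of the pooled sample
        diffs.append(np.mean(pool[:len(a)]) - np.mean(pool[len(a):]))
    # p-value: share of shuffled differences at least as extreme as the observed one
    return np.mean(np.abs(diffs) >= abs(diff_obs))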
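
SciPy also provides scipy.stats.permutation_test, which can serve as a cross-check on a hand-rolled loop. A hedged sketch on the sample values from the t-test example; `mean_diff` is a hypothetical statistic function, and with samples this small SciPy should enumerate every split rather than sample at random.

```python
import numpy as np
from scipy import stats

control = np.array([12, 14, 11, 13, 12, 15, 13])
treat = np.array([16, 18, 15, 17, 19, 16, 18])

def mean_diff(x, y):
    """Test statistic: difference in sample means."""
    return np.mean(x) - np.mean(y)

# permutation_type="independent" reshuffles group membership across both samples.
res = stats.permutation_test(
    (control, treat),
    mean_diff,
    permutation_type="independent",
    alternative="two-sided",
)
print(f"observed diff = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

When the number of distinct splits is below n_resamples (here C(14, 7) = 3,432 against the default 9,999), the test is exact, so the result does not depend on a random seed.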

SRE: Welch's test on latency p99

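# Compare latency p99 before and after a deploy
import numpy as np
from scipy.stats import ttest_ind
before_p99 = np.array([124, 130, 128, 132, 125])  # ms
after_p99  = np.array([142, 138, 145, 140, 144])
# Welch's test, one-sided: H1 is mean(before) < mean(after), i.e. latency got worse
t, p = ttest_ind(before_p99, after_p99, equal_var=False, alternative="less")
if p < 0.01: print("regression detected: roll back")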

Decision criteria

Situation	Approach
Fixed-N A/B	t-test or chi-squared
Continuous monitoring	mSPRT or always-valid CI
Small N, non-normal	Bootstrap or permutation
Multi-arm + prior	Bayesian (Beta-Binomial)

Default: bootstrap CI plus a sequential test for production A/B tests.
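
The table lists chi-squared for fixed-N A/B tests, which none of the patterns above demonstrate. A minimal sketch on a 2x2 conversion table, reusing the counts from the Bayesian example (520 and 580 conversions out of 10,000 each); turning off the continuity correction is a choice made for this illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: variants A and B. Columns: converted, not converted.
table = np.array([
    [520, 10_000 - 520],
    [580, 10_000 - 580],
])

# correction=False disables Yates' continuity correction.
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2={chi2:.3f} dof={dof} p={p:.4f}")
```

This is a fixed-N test: the p-value is only valid if the sample size was set in advance and the result is read once.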

🔗 Graph

Parent: Statistics & Data Analysis · Probability Theory
Variant: Bayesian_Inference
Applications: SRE · Anomaly-Detection
Adjacent: Type 1 vs Type 2 Errors · Power Analysis

🤖 LLM usage

When to use: advice on test selection (data shape → test type) and interpretation of results. When not to use: do not automate multiple-comparison correction; it requires domain knowledge.

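❌ Anti-patterns

p-hacking: running multiple tests and then cherry-picking the significant ones.
Peeking: checking a fixed-N test every day inflates α.
Single point: reporting only the mean without a CI.
Significance ≠ effect size: as N grows, even a negligible effect becomes significant, so report Cohen's d as well.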
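
The peeking problem is easy to see in simulation. The sketch below runs A/A experiments (no true difference) and compares the false-positive rate of stopping at the first significant look with that of a single test at the planned N. `false_positive_rates` and all of its parameters are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats

def false_positive_rates(n_sims=2_000, n_total=1_000, looks=10, alpha=0.05, seed=0):
    """Simulate A/A tests: return (rate with peeking, rate with one final test)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n_sims, n_total))
    b = rng.normal(size=(n_sims, n_total))  # same distribution, so H0 is true
    checkpoints = np.linspace(n_total // looks, n_total, looks, dtype=int)
    # One Welch p-value per simulation at every interim look.
    pvals = np.column_stack([
        stats.ttest_ind(a[:, :n], b[:, :n], axis=1, equal_var=False).pvalue
        for n in checkpoints
    ])
    peeking = (pvals.min(axis=1) < alpha).mean()  # stop at the first "significant" look
    fixed = (pvals[:, -1] < alpha).mean()         # test once, at the planned N
    return peeking, fixed

peeking_rate, fixed_rate = false_positive_rates()
print(f"false-positive rate with peeking: {peeking_rate:.3f}")
print(f"false-positive rate, fixed N:     {fixed_rate:.3f}")
```

The fixed-N rate should sit near the nominal α, while the peeking rate comes out several times higher; this is the gap that sequential methods such as mSPRT are designed to close.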

🧪 Verification / Duplicates

Verified (Casella & Berger "Statistical Inference", scipy/statsmodels docs).
Confidence: A.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup: frequentist + Bayesian + sequential pattern
