---
id: wiki-2026-0508-bayesian-statistics
title: Bayesian Statistics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [베이지안 통계, Bayes' theorem, posterior, prior, MCMC, variational inference, PyMC, Stan]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [bayesian, statistics, mcmc, variational-inference, pymc, stan, probabilistic-programming, uncertainty]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyMC / Stan / NumPyro / Pyro
---
# Bayesian Statistics
## 📌 One-line insight
> **"Probability = degree of belief."** Not a long-run frequency: prior + data → posterior update. Strong when data is scarce and prior knowledge is available. The result is a distribution, not a point estimate. Modern compute (MCMC / VI) has made it mainstream.
## 📖 Core concepts
### Bayes' theorem
$$P(\theta | D) = \frac{P(D | \theta) \cdot P(\theta)}{P(D)}$$
- **P(θ)**: prior, the belief about θ before seeing data.
- **P(D | θ)**: likelihood, the model of how the data arise given θ.
- **P(θ | D)**: posterior, the belief updated by the data.
- **P(D)**: evidence (normalizing constant).
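
To make the four terms concrete, here is a minimal sketch that applies Bayes' theorem on a discrete grid of θ values. The coin-flip data (8 heads, 2 tails), the flat prior, and the grid size are illustrative assumptions, not part of the formula itself.

```python
# Grid approximation of Bayes' theorem for a coin's head probability theta.
# Assumed for illustration: 8 heads and 2 tails, a flat prior, a 1001-point grid.
heads, tails = 8, 2
grid = [i / 1000 for i in range(1001)]                    # candidate theta values in [0, 1]

prior = [1.0 for _ in grid]                               # P(theta): flat, unnormalized
likelihood = [t**heads * (1 - t)**tails for t in grid]    # P(D | theta)
unnormalized = [l * p for l, p in zip(likelihood, prior)]
evidence = sum(unnormalized)                              # P(D), up to the grid spacing
posterior = [u / evidence for u in unnormalized]          # P(theta | D), sums to 1 on the grid

posterior_mean = sum(t * w for t, w in zip(grid, posterior))
posterior_mode = grid[posterior.index(max(posterior))]
print(f"posterior mean ~ {posterior_mean:.3f}, mode = {posterior_mode:.3f}")
```

With a flat prior the exact posterior is Beta(9, 3), so the grid result should sit close to a mean of 0.75 and a mode of 0.8.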
### vs Frequentist
| Aspect | Frequentist | Bayesian |
|---|---|---|
| Probability | Long-run frequency | Degree of belief |
| Parameter | Fixed unknown | Random variable |
| Result | Point estimate + CI | Posterior distribution |
| Small data | Fragile | Stabilized by the prior |
| Compute | Cheap | Expensive (practical with MCMC / VI) |
| Interpretation | "95% of such intervals contain θ" | "P(θ ∈ [a,b] \| D) = 0.95" |
### Conjugate priors (analytical)
| Likelihood | Prior | Posterior |
|---|---|---|
| Binomial | Beta | Beta |
| Poisson | Gamma | Gamma |
| Normal (known σ, unknown μ) | Normal | Normal |
| Normal (unknown μ, σ) | Normal-Gamma (on μ and precision 1/σ²) | Normal-Gamma |
| Multinomial | Dirichlet | Dirichlet |
→ Closed-form posterior, but only for a limited set of likelihood-prior pairs.
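
The first row of the table can be sketched in a few lines: with a Beta prior and Binomial data, the posterior is obtained by adding the observed counts to the prior parameters. The Beta(2, 2) prior and the 8 heads / 2 tails data are illustrative assumptions, and the helper names are hypothetical.

```python
# Conjugate Beta-Binomial update: the posterior is available in closed form.
# Assumed for illustration: a Beta(2, 2) prior and 8 successes / 2 failures.
def beta_binomial_update(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing the counts."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior_alpha, prior_beta = 2, 2
post_alpha, post_beta = beta_binomial_update(prior_alpha, prior_beta, successes=8, failures=2)
print(f"posterior: Beta({post_alpha}, {post_beta}), mean = {beta_mean(post_alpha, post_beta):.3f}")
# prints: posterior: Beta(10, 4), mean = 0.714
```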
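### Inference (modern)
#### MCMC (Markov Chain Monte Carlo)
- **Metropolis-Hastings**: random-walk proposals with an accept/reject step.
- **Hamiltonian MC (HMC)**: uses gradients of the log posterior to propose distant moves.
- **NUTS** (No-U-Turn Sampler): HMC with automatic tuning of the trajectory length.
- ✅ Asymptotically exact. ❌ Slow.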
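
The random-walk accept/reject idea behind Metropolis-Hastings can be sketched with the standard library alone. This is a toy sampler under stated assumptions, not how PyMC or Stan sample: the target is a coin-flip posterior (Beta(2, 2) prior, 8 heads, 2 tails), and the step size, seed, and warm-up length are arbitrary illustrative choices.

```python
import math
import random

def log_posterior(theta, heads=8, tails=2, alpha=2.0, beta=2.0):
    """Unnormalized log posterior: Bernoulli likelihood times a Beta(alpha, beta) prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (heads + alpha - 1) * math.log(theta) + (tails + beta - 1) * math.log(1 - theta)

def metropolis_hastings(n_samples, step=0.2, start=0.5, seed=0):
    rng = random.Random(seed)
    theta, current_lp = start, log_posterior(start)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)          # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); the proposal is symmetric,
        # so the Hastings correction cancels.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            theta, current_lp = proposal, proposal_lp
        samples.append(theta)                            # a rejected move repeats the old value
    return samples

samples = metropolis_hastings(20_000)
kept = samples[2_000:]                                   # discard warm-up draws
print(f"posterior mean ~ {sum(kept) / len(kept):.3f}")
```

The exact posterior here is Beta(10, 4) with mean 10/14, so the sample mean should land near 0.71. HMC and NUTS replace the blind random walk with gradient-guided proposals.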
#### Variational Inference (VI)
- Fits an approximating distribution q(θ) to the posterior.
- Minimizes the KL divergence from q(θ) to the posterior.
- ✅ Fast and scalable. ❌ Approximate.
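
The KL objective cannot be evaluated directly because the posterior is unknown, so in practice VI maximizes the evidence lower bound (ELBO). The standard identity, written with this page's symbols, shows why the two are equivalent: log P(D) does not depend on q, so raising the ELBO lowers the KL term by the same amount.

$$\log P(D) = \underbrace{\mathbb{E}_{q(\theta)}\left[\log P(D, \theta) - \log q(\theta)\right]}_{\mathrm{ELBO}(q)} + \mathrm{KL}\left(q(\theta) \,\|\, P(\theta | D)\right)$$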
#### Sequential Monte Carlo
- Particle filters.
- Works with streaming data.
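
A minimal sketch of a bootstrap particle filter shows why this family suits streaming data: each observation is processed once, in order. The 1-D random-walk state model, the noise levels, the particle count, and the constant observation stream are all illustrative assumptions.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=1000,
                              process_sd=0.5, obs_sd=0.5, seed=0):
    """Filtered means for x_t = x_{t-1} + noise, y_t = x_t + noise (both Gaussian)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]  # draws from the prior on x_0
    means = []
    for y in observations:
        # 1. Propagate each particle through the state transition.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight each particle by the observation likelihood P(y | x).
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        means.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # 3. Resample in proportion to the weights (multinomial resampling).
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means

# Observations arrive one at a time, as a stream would deliver them.
stream = [5.0] * 30
filtered = bootstrap_particle_filter(stream)
print(f"final filtered mean ~ {filtered[-1]:.2f}")
```

The filtered mean starts near the prior and moves toward the observed level as evidence accumulates. Production filters usually add log-space weights and resample only when the effective sample size drops.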
### Applications
1. **A/B testing**: results are easier to interpret than frequentist tests.
2. **Hyperparameter tuning** (Bayesian Optimization): GP surrogate + acquisition function.
3. **Hierarchical models**: group-level priors.
4. **Time series** (state-space): Kalman filter, particle filter.
5. **Causal inference** (Bayesian network): DAGs.
6. **Drug discovery / clinical**: small N + strong prior.
7. **Robotics** (SLAM): joint estimation of pose and map.
8. **Topic modeling** (LDA): Dirichlet priors.
### Modern stack
- **Stan**: NUTS, mature.
- **PyMC** (3 → 4 → 5): Python; the backend moved from Theano (v3) to Aesara (v4) to PyTensor (v5).
- **NumPyro**: JAX-based, fast.
- **Pyro**: PyTorch + VI.
- **TFP**: TensorFlow Probability.
- **Edward2 / BlackJAX**: modular.
## 💻 Patterns
### Coin flip (PyMC)
```python
import arviz as az
import numpy as np
import pymc as pm

# Data: 8 heads, 2 tails
data = np.array([1] * 8 + [0] * 2)

with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2)  # prior on the head probability
    obs = pm.Bernoulli('obs', p=p, observed=data)  # likelihood
    trace = pm.sample(2000, return_inferencedata=True)

# Posterior plot and summary
az.plot_posterior(trace)
print(az.summary(trace))
# Exact posterior is Beta(10, 4): mean = 10/14 ≈ 0.71; the 94% HDI is roughly 0.5 to 0.9
```
### Hierarchical (group-level)
```python
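import numpy as np
import pymc as pm

# Illustrative inputs (assumed): group_idx maps each observation to its group.
group_idx = np.array([0, 0, 1, 1, 2])
data = np.array([1.2, 0.8, 2.1, 1.9, -0.3])
n_groups = 3

with pm.Model() as h:
    # Hyperpriors shared by all groups
    mu = pm.Normal('mu', 0, 10)
    sigma = pm.HalfNormal('sigma', 5)
    # Group-level means drawn from the shared prior
    theta = pm.Normal('theta', mu, sigma, shape=n_groups)
    # Likelihood with known unit observation noise
    y = pm.Normal('y', theta[group_idx], 1, observed=data)
    trace = pm.sample(2000)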
```
→ Partial pooling: groups with small N borrow strength from the other groups.
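
How much a group is pulled toward the population mean can be sketched with the Normal-Normal conjugate formula. This holds only when the hyperparameters mu and sigma are treated as fixed, which is a simplifying assumption (the hierarchical model infers them too); the numbers and the helper name are hypothetical.

```python
def partial_pooling_mean(group_mean, n, mu, sigma, obs_sd=1.0):
    """Posterior mean of one group's theta for fixed hyperparameters mu and sigma."""
    data_precision = n / obs_sd**2       # precision contributed by the group's own data
    prior_precision = 1.0 / sigma**2     # precision contributed by the group-level prior
    weight = data_precision / (data_precision + prior_precision)
    return weight * group_mean + (1 - weight) * mu

# Hypothetical numbers: both groups average 3.0, the population mean is 0.0, sigma = 1.0.
small = partial_pooling_mean(group_mean=3.0, n=2, mu=0.0, sigma=1.0)
large = partial_pooling_mean(group_mean=3.0, n=50, mu=0.0, sigma=1.0)
print(f"n=2 -> {small:.2f}, n=50 -> {large:.2f}")
# prints: n=2 -> 2.00, n=50 -> 2.94
```

The group with only two observations is shrunk a third of the way toward the population mean, while the group with fifty observations barely moves.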
### Bayesian A/B test
```python
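import pymc as pm

# Illustrative counts (assumed): conversions out of visitors per variant
n_a, conv_a = 1000, 40
n_b, conv_b = 1000, 60

with pm.Model() as ab:
    p_a = pm.Beta('p_a', 1, 1)  # flat priors on both conversion rates
    p_b = pm.Beta('p_b', 1, 1)
    obs_a = pm.Binomial('obs_a', n=n_a, p=p_a, observed=conv_a)
    obs_b = pm.Binomial('obs_b', n=n_b, p=p_b, observed=conv_b)
    diff = pm.Deterministic('diff', p_b - p_a)
    trace = pm.sample(2000)

# P(B > A): share of posterior draws where the difference is positive
prob_b_better = (trace.posterior['diff'] > 0).mean().item()
print(f'P(B > A) = {prob_b_better:.3f}')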
```
→ Yields a direct probability that B beats A, which is often easier to act on than a frequentist p-value.
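
Because the model pairs Beta(1, 1) priors with Binomial likelihoods, each posterior is also available in closed form, so P(B > A) can be estimated without MCMC. A minimal sketch using only the standard library; the conversion counts are hypothetical and the function name is an assumption.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(p_b > p_a) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Conjugate posteriors: Beta(1 + conversions, 1 + non-conversions)
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += p_b > p_a
    return wins / draws

# Hypothetical counts: 40/1000 conversions for A, 60/1000 for B.
print(f"P(B > A) ~ {prob_b_beats_a(40, 1000, 60, 1000):.3f}")
```

The sampler-based version remains the better fit once the model grows beyond two independent rates, for example with covariates or a hierarchy over segments.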
### Variational inference (faster)
```python
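import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro import optim
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

# Same coin-flip data as in the PyMC example: 8 heads, 2 tails
data = jnp.array([1] * 8 + [0] * 2)

def model(data):
    p = numpyro.sample('p', dist.Beta(2, 2))  # prior
    numpyro.sample('obs', dist.Bernoulli(p), obs=data)  # likelihood

guide = AutoNormal(model)  # mean-field Gaussian in unconstrained space
svi = SVI(model, guide, optim.Adam(0.01), Trace_ELBO())
state = svi.init(jax.random.PRNGKey(0), data)
for step in range(2000):
    state, loss = svi.update(state, data)  # one gradient step on the negative ELBO
params = svi.get_params(state)  # fitted variational parameters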
```
### Bayesian Optimization (hyperparameter)
```python
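from skopt import gp_minimize
from skopt.space import Real, Integer

def objective(params):
    lr, depth = params
    # train_and_eval is a user-supplied function (not defined here)
    # that returns the validation loss to minimize.
    return train_and_eval(lr, depth)

result = gp_minimize(
    objective,
    [Real(1e-5, 1e-1, prior='log-uniform', name='lr'),
     Integer(1, 10, name='depth')],
    n_calls=50,
)
print(result.x, result.fun)  # best parameters found and their loss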
```
### Posterior predictive check
```python
with model:
    # Adds a posterior_predictive group to the existing InferenceData
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)
# Compare simulated data with the observed data: a visual check of model fit.
az.plot_ppc(trace)
```
## 🤔 Decision criteria
| Situation | Method |
|---|---|
| Small data + prior | Conjugate (analytical) |
| Complex model + accuracy | NUTS (PyMC / Stan) |
| Large data + speed | VI (Pyro / NumPyro) |
| Streaming | Particle filter |
| Hyperparameter tuning | BO (skopt / Optuna) |
| A/B test | Beta-Binomial + Bayes |
| Topic modeling | LDA |
| Causal | Bayesian network |
**Default**: PyMC + NUTS as the baseline. Move to NumPyro / VI when scale requires it.
## 🔗 Graph
- Parent: [[Statistics]] · [[Probability-Theory]]
- Variants: [[MCMC]] · [[Variational-Inference]] · [[Bayesian-Network]]
- Applications: [[Bayesian-Optimization]] · [[LDA]] · [[SLAM]]
- Tool: [[PyMC]] · [[Stan]]
- Adjacent: [[Bayes-Theorem]] · [[Bayesian-Updating]]
## 🤖 LLM usage
**When**: small data + prior knowledge; uncertainty quantification; hierarchical structure; hyperparameter tuning.
**When not**: large data where speed matters more than accuracy; cases where a simple frequentist analysis is enough.
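## ❌ Anti-patterns
- **Improper prior**: the posterior may be improper (not normalizable).
- **No PPC**: no way to tell whether the model fits the data.
- **Single MCMC chain**: non-convergence cannot be detected.
- **Ignoring burn-in**: estimates are biased by the starting point.
- **Forcing conjugacy**: a likelihood chosen for convenience ends up wrong for the data.
- **Over-confident VI** (mean-field): underestimates posterior uncertainty.
- **Ignoring R-hat**: non-convergence goes unnoticed.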
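
The single-chain and R-hat items can be illustrated with a minimal sketch of the classic (non-split) Gelman-Rubin statistic, which compares between-chain and within-chain variance. ArviZ's `az.rhat` defaults to a more robust rank-normalized split variant, so treat this as an illustration of the idea rather than a replacement; the synthetic chains are assumptions.

```python
import random
import statistics

def r_hat(chains):
    """Classic (non-split) Gelman-Rubin statistic for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])   # W
    between_over_n = statistics.variance(chain_means)                      # B / n
    pooled = (n - 1) / n * within + between_over_n                         # pooled variance estimate
    return (pooled / within) ** 0.5

rng = random.Random(0)
# Four chains sampling the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Two chains stuck around 0 and two around 3: R-hat should be well above 1.
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 0.0, 3.0, 3.0)]
print(f"well-mixed: {r_hat(mixed):.3f}, stuck: {r_hat(stuck):.3f}")
```

With one chain there is no between-chain variance to compare against, which is why a single chain cannot reveal this kind of failure.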
## 🧪 Verification / duplicates
- Verified (Gelman BDA, McElreath Statistical Rethinking, Stan/PyMC docs).
- Trust level: A.
- Related: [[Bayes-Theorem]] · [[MCMC]] · [[Bayesian-Optimization]] · [[Variational-Inference]].
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Bayes formula + MCMC / VI + PyMC / NumPyro / skopt code |