---
id: wiki-2026-0508-bayesian-statistics
title: Bayesian Statistics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [베이지안 통계, Bayes' theorem, posterior, prior, MCMC, variational inference, PyMC, Stan]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [bayesian, statistics, mcmc, variational-inference, pymc, stan, probabilistic-programming, uncertainty]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyMC / Stan / NumPyro / Pyro
---
# Bayesian Statistics
## 📌 One-Line Insight
> **"Probability = degree of belief."** Not a long-run frequency: a prior plus data is updated into a posterior. Strong when data is scarce and prior knowledge is available. The result is a distribution, not a point estimate. Modern compute (MCMC / VI) has made it mainstream.
## 📖 Core Concepts
### Bayes' theorem
$$P(\theta | D) = \frac{P(D | \theta) \cdot P(\theta)}{P(D)}$$
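- **P(θ)**: prior, the belief about θ before seeing the data.
- **P(D | θ)**: likelihood, the model of how the data are generated given θ.
- **P(θ | D)**: posterior, the belief about θ after updating on the data.
- **P(D)**: evidence, the normalizing constant.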
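The four terms can be made concrete with a grid approximation. The sketch below is only an illustration: it reuses the coin-flip numbers from this note (8 heads, 2 tails, Beta(2, 2) prior), and the function name `grid_posterior` and the grid size are assumptions introduced here.

```python
def grid_posterior(heads, tails, a=2.0, b=2.0, grid_size=101):
    """Approximate P(theta | D) on an evenly spaced grid over [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Prior P(theta): unnormalized Beta(a, b) density
    prior = [p ** (a - 1) * (1 - p) ** (b - 1) for p in grid]
    # Likelihood P(D | theta): Bernoulli trials with `heads` successes
    likelihood = [p ** heads * (1 - p) ** tails for p in grid]
    unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
    # Evidence P(D): the sum that makes the posterior add up to 1
    evidence = sum(unnormalized)
    posterior = [u / evidence for u in unnormalized]
    return grid, posterior

grid, posterior = grid_posterior(heads=8, tails=2)
posterior_mean = sum(p * w for p, w in zip(grid, posterior))
print(round(posterior_mean, 2))
```

The grid approach only scales to one or two parameters, which is why the sampling and variational methods later in this note exist.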
### vs Frequentist
| Aspect | Frequentist | Bayesian |
|---|---|---|
| Probability | Long-run frequency | Degree of belief |
| Parameter | Fixed but unknown | Random variable |
| Result | Point estimate + confidence interval | Posterior distribution |
| Small data | Fragile | Robust, given an informative prior |
| Compute | Cheap | Expensive (practical since MCMC / VI) |
| Interpretation | "95% of such intervals contain θ" | "P(θ ∈ [a,b]) = 0.95" |
### Conjugate priors (analytical)
| Likelihood | Prior | Posterior |
|---|---|---|
| Binomial | Beta | Beta |
| Poisson | Gamma | Gamma |
| Normal (known σ) | Normal | Normal |
| Normal (unknown μ, precision τ) | Normal-Gamma | Normal-Gamma |
| Multinomial | Dirichlet | Dirichlet |
→ The posterior has a closed form, but only for a limited set of models.
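For the Binomial/Beta row, the update is plain addition of counts. A minimal sketch, assuming the coin-flip numbers used elsewhere in this note (the helper name `beta_binomial_update` is introduced here for illustration):

```python
def beta_binomial_update(alpha, beta, heads, tails):
    """Beta(alpha, beta) prior + Binomial data -> Beta posterior parameters."""
    return alpha + heads, beta + tails

alpha_post, beta_post = beta_binomial_update(2, 2, heads=8, tails=2)
posterior_mean = alpha_post / (alpha_post + beta_post)
print(alpha_post, beta_post, round(posterior_mean, 3))  # 10 4 0.714
```

No sampler is needed here; the posterior Beta(10, 4) is exact.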
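### Inference (modern)
#### MCMC (Markov Chain Monte Carlo)
- **Metropolis-Hastings**: random-walk proposals with an accept/reject step.
- **Hamiltonian MC (HMC)**: uses gradients of the log posterior to make long, efficient moves.
- **NUTS** (No-U-Turn Sampler): HMC with automatic tuning of the trajectory length.
- ✅ Asymptotically exact. ❌ Slow.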
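The accept/reject idea behind Metropolis-Hastings fits in a few lines. The sketch below is a toy random-walk sampler for the coin-flip posterior used in this note (Beta(2, 2) prior, 8 heads, 2 tails); the function names, step size, and burn-in length are assumptions chosen for illustration, not tuned values.

```python
import math
import random

def log_posterior(p, heads=8, tails=2, a=2.0, b=2.0):
    """Unnormalized log posterior: Beta(a, b) prior times Bernoulli likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero density outside (0, 1)
    return (heads + a - 1) * math.log(p) + (tails + b - 1) * math.log(1 - p)

def metropolis_hastings(n_samples=20000, step=0.2, burn_in=2000, seed=0):
    rng = random.Random(seed)
    p = 0.5
    logp = log_posterior(p)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = p + rng.gauss(0.0, step)  # symmetric random-walk proposal
        logp_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            p, logp = proposal, logp_new
        if i >= burn_in:  # discard the warm-up draws
            samples.append(p)
    return samples

samples = metropolis_hastings()
print(sum(samples) / len(samples))
```

The sample mean should land close to the analytic posterior mean 10/14. HMC and NUTS replace the blind random-walk proposal with gradient-guided moves, which is what makes them practical in higher dimensions.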
#### Variational Inference (VI)
- Fit an approximating distribution q(θ) to the posterior.
- Minimize the KL divergence between q(θ) and the posterior.
- ✅ Fast and scalable. ❌ Approximate.
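The KL divergence to the posterior cannot be computed directly, because the posterior itself is unknown. In the notation of Bayes' theorem above, the standard decomposition shows what is optimized in practice:

$$\log P(D) = \underbrace{\mathbb{E}_{q(\theta)}\left[\log P(D, \theta) - \log q(\theta)\right]}_{\text{ELBO}(q)} + \mathrm{KL}\left(q(\theta) \,\|\, P(\theta | D)\right)$$

Since $\log P(D)$ does not depend on $q$ and the KL term is non-negative, maximizing the ELBO is equivalent to minimizing the KL divergence, and the ELBO only requires the joint $P(D, \theta) = P(D | \theta) \cdot P(\theta)$.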
#### Sequential Monte Carlo
- Particle filters: a weighted set of samples is propagated and resampled over time.
- Well suited to streaming data.
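A bootstrap particle filter illustrates the streaming update. The sketch below assumes a hypothetical 1-D model (random-walk state, Gaussian observation noise); the function name, noise levels, and particle count are illustrative choices, not values from this note.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500,
                              process_sd=0.5, obs_sd=1.0, seed=0):
    """Return the filtered state mean after each observation."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the random-walk dynamics
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight each particle by the Gaussian observation likelihood
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed: fall back to uniform
            weights, total = [1.0] * n_particles, float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # 3. Resample in proportion to the weights
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

estimates = bootstrap_particle_filter([1.0] * 20)
print(estimates[-1])
```

Each observation is processed once and then discarded, so memory stays constant as the stream grows.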
### Applications
1. **A/B testing**: results are easier to interpret than frequentist tests.
2. **Hyperparameter tuning** (Bayesian Optimization): Gaussian process surrogate + acquisition function.
3. **Hierarchical models**: group-level priors.
4. **Time series** (state-space): Kalman filter, particle filter.
5. **Causal inference** (Bayesian network): DAG-structured models.
6. **Drug discovery / clinical trials**: small N + strong prior.
7. **Robotics** (SLAM): joint inference over pose and map.
8. **Topic modeling** (LDA): Dirichlet priors.
### Modern stack
- **Stan**: NUTS, mature.
- **PyMC** (3 → 4 → 5): Python; the backend moved from Theano (v3) to Aesara (v4) to PyTensor (v5).
- **NumPyro**: JAX-based, fast.
- **Pyro**: PyTorch + VI.
- **TFP**: TensorFlow Probability.
- **Edward2 / BlackJAX**: modular.
## 💻 Patterns
### Coin flip (PyMC)
```python
import arviz as az
import numpy as np
import pymc as pm

# Data: 8 heads, 2 tails
data = np.array([1] * 8 + [0] * 2)

with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2)  # prior
    obs = pm.Bernoulli('obs', p=p, observed=data)  # likelihood
    trace = pm.sample(2000, return_inferencedata=True)

# Posterior summary
az.plot_posterior(trace)
print(az.summary(trace))
# The analytic posterior is Beta(10, 4), so expect mean ≈ 0.71;
# hdi_3% / hdi_97% bound ArviZ's default 94% HDI.
```
### Hierarchical (group-level)
```python
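import pymc as pm

# Assumed to be defined by the caller: n_groups (int), group_idx (integer
# array mapping each observation to its group), data (observed values).
with pm.Model() as h:
    # Hyperpriors
    mu = pm.Normal('mu', 0, 10)
    sigma = pm.HalfNormal('sigma', 5)
    # Group-level parameters
    theta = pm.Normal('theta', mu, sigma, shape=n_groups)
    # Likelihood (observation noise fixed at 1)
    y = pm.Normal('y', theta[group_idx], 1, observed=data)
    trace = pm.sample(2000)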
```
→ Partial pooling: groups with small N borrow strength from the other groups.
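The borrowing of strength has a closed form when the hyperparameters are held fixed. The sketch below is a simplification of the model above (there `mu` and `sigma` are inferred, here they are treated as known); `shrunk_mean` and its default values are hypothetical names and numbers chosen for illustration.

```python
def shrunk_mean(group_mean, n, mu=0.0, tau=1.0, sigma=1.0):
    """Posterior mean of theta_j for theta_j ~ Normal(mu, tau), y ~ Normal(theta_j, sigma)."""
    data_precision = n / sigma ** 2   # grows with the group's sample size
    prior_precision = 1.0 / tau ** 2  # strength of the group-level prior
    return ((data_precision * group_mean + prior_precision * mu)
            / (data_precision + prior_precision))

small = shrunk_mean(group_mean=2.0, n=1)    # pulled halfway toward mu
large = shrunk_mean(group_mean=2.0, n=100)  # stays near its own mean
print(small, round(large, 3))  # 1.0 1.98
```

A group with a single observation is shrunk strongly toward the population mean, while a group with 100 observations is barely moved.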
### Bayesian A/B test
```python
import pymc as pm

# Assumed inputs: n_a, n_b (visitors) and conv_a, conv_b (conversions).
with pm.Model() as ab:
    p_a = pm.Beta('p_a', 1, 1)  # uniform priors on both conversion rates
    p_b = pm.Beta('p_b', 1, 1)
    obs_a = pm.Binomial('obs_a', n=n_a, p=p_a, observed=conv_a)
    obs_b = pm.Binomial('obs_b', n=n_b, p=p_b, observed=conv_b)
    diff = pm.Deterministic('diff', p_b - p_a)
    trace = pm.sample(2000)

# P(B > A): share of posterior draws with a positive difference
prob_b_better = (trace.posterior['diff'] > 0).mean().item()
print(f'P(B > A) = {prob_b_better:.3f}')
```
→ More directly actionable than a frequentist p-value: the output is the probability that B beats A.
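Because Beta priors are conjugate to the Binomial likelihood, the same quantity can be estimated without MCMC by sampling the two Beta posteriors directly. A minimal standard-library sketch; the function name and the example counts are hypothetical:

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(p_b > p_a) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Beta(1, 1) prior + Binomial data -> Beta(1 + successes, 1 + failures)
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += p_b > p_a
    return wins / draws

# Hypothetical counts: 100/1000 conversions for A, 130/1000 for B
print(prob_b_beats_a(conv_a=100, n_a=1000, conv_b=130, n_b=1000))
```

The PyMC version remains the better starting point once the model needs covariates or hierarchical structure.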
### Variational inference (faster)
```python
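import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro import optim
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

# Same coin-flip data as above: 8 heads, 2 tails
data = jnp.array([1] * 8 + [0] * 2)

def model(data):
    p = numpyro.sample('p', dist.Beta(2, 2))
    numpyro.sample('obs', dist.Bernoulli(p), obs=data)

guide = AutoNormal(model)  # mean-field Gaussian in unconstrained space
svi = SVI(model, guide, optim.Adam(0.01), Trace_ELBO())
state = svi.init(jax.random.PRNGKey(0), data)
for step in range(2000):
    state, loss = svi.update(state, data)  # one gradient step on the ELBO loss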
```
### Bayesian Optimization (hyperparameter)
```python
from skopt import gp_minimize
from skopt.space import Real, Integer

def objective(params):
    lr, depth = params
    # train_and_eval is a user-supplied function returning the loss to minimize
    return train_and_eval(lr, depth)

result = gp_minimize(
    objective,
    [Real(1e-5, 1e-1, prior='log-uniform', name='lr'),
     Integer(1, 10, name='depth')],
    n_calls=50,
)
```
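A GP-based optimizer chooses each new evaluation point by maximizing an acquisition function over the surrogate. One common choice (not necessarily the default of any particular library) is Expected Improvement, written here for minimization:

$$\mathrm{EI}(x) = \left(f^{*} - \mu(x)\right)\Phi(z) + \sigma(x)\,\varphi(z), \qquad z = \frac{f^{*} - \mu(x)}{\sigma(x)}$$

Here $f^{*}$ is the best objective value observed so far, $\mu(x)$ and $\sigma(x)$ are the GP posterior mean and standard deviation at $x$, and $\Phi$, $\varphi$ are the standard normal CDF and PDF. The first term rewards points predicted to be good (exploitation), the second rewards points where the surrogate is uncertain (exploration).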
### Posterior predictive check
```python
with model:
    # PyMC >= 4: adds a posterior_predictive group to the existing InferenceData
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

# Compare simulated data with the observed data as a visual check of model fit
az.plot_ppc(trace)
```
## 🤔 Decision Criteria
| Situation | Method |
|---|---|
| Small data + prior | Conjugate (analytical) |
| Complex model + accuracy | NUTS (PyMC / Stan) |
| Large data + speed | VI (Pyro / NumPyro) |
| Streaming | Particle filter |
| Hyperparameter tuning | BO (skopt / Optuna) |
| A/B test | Beta-Binomial + Bayes |
| Topic modeling | LDA |
| Causal | Bayesian network |
**Default**: PyMC + NUTS as the baseline. Move to NumPyro / VI when scale demands it.
## 🔗 Graph
- Parent: [[Statistics]] · [[Probability Theory]]
- Variants: [[MCMC]] · [[Variational-Inference]] · [[Bayesian-Network]]
- Applications: [[Bayesian-Optimization]] · [[LDA]] · [[SLAM]]
- Tool: [[PyMC]] · [[Stan]]
- Adjacent: [[Bayes-Theorem]] · [[Bayesian-Updating]]
## 🤖 LLM Usage
**When to use**: small data plus prior knowledge; uncertainty quantification; hierarchical structure; hyperparameter tuning.
**When not to use**: large data where speed matters more than accuracy; cases where a simple frequentist analysis is sufficient.
## ❌ Anti-patterns
- **Improper prior**: can yield an improper (non-normalizable) posterior.
- **No PPC**: there is no way to know whether the model fits.
- **Single MCMC chain**: convergence problems cannot be detected.
- **Ignoring burn-in / warmup**: biased estimates.
- **Forcing conjugacy**: leads to choosing the wrong likelihood.
- **Over-confident VI** (mean-field): underestimates posterior uncertainty.
- **Ignoring R-hat**: non-convergence goes unnoticed.
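The single-chain and R-hat items can be illustrated with the classic Gelman-Rubin statistic. This is a minimal sketch of the original non-split formula on synthetic chains; libraries such as ArviZ report a more robust rank-normalized variant, and the function name here is an assumption.

```python
import random
import statistics

def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for a list of equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                       # B
    var_plus = (n - 1) / n * within + between / n  # pooled variance estimate
    return (var_plus / within) ** 0.5

rng = random.Random(0)
# Four chains drawn from the same distribution: R-hat close to 1
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Two chains stuck around 0 and two around 3: R-hat well above 1
stuck = [[rng.gauss(m, 1.0) for _ in range(1000)] for m in (0.0, 0.0, 3.0, 3.0)]
print(gelman_rubin_rhat(mixed), gelman_rubin_rhat(stuck))
```

With one chain the between-chain variance is undefined, which is why a single chain cannot reveal this kind of failure.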
## 🧪 Verification / Duplicates
- Verified (Gelman BDA, McElreath Statistical Rethinking, Stan/PyMC docs).
- Trust level: A.
- Related: [[Bayes-Theorem]] · [[MCMC]] · [[Bayesian-Optimization]] · [[Variational-Inference]].
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Bayes formula + MCMC / VI + PyMC / NumPyro / skopt code |