---
id: wiki-2026-0508-bayesian-inference
title: Bayesian Inference
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Bayesian Inference, Bayesian Statistics, Posterior Inference]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [statistics, ml, probabilistic, mcmc]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: pymc,numpyro,stan
---

# Bayesian Inference

## One-liner

> **"Prior × likelihood ∝ posterior: beliefs updated by evidence."** Originating with Bayes (1763) and overshadowed by frequentist methods for most of the 20th century, Bayesian inference is mainstream as of 2026 through NumPyro/PyMC, GPU MCMC, and variational inference, and is a foundation for LLM uncertainty quantification.

## Core

### Bayes rule

P(θ|D) = P(D|θ) P(θ) / P(D)

- **P(θ)**: prior, the belief about θ before seeing data.
- **P(D|θ)**: likelihood, how well the model with parameters θ explains the data.
- **P(θ|D)**: posterior, the updated belief after seeing data.
- **P(D)**: evidence (marginal likelihood), the normalizing constant.

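As a worked illustration of the rule above, here is a minimal sketch over a discrete set of hypotheses. The coin scenario, the hypothesis names, and the `posterior` helper are made up for illustration.

```python
def posterior(prior, likelihood):
    """Apply Bayes rule over a discrete set of hypotheses.

    prior:      dict mapping hypothesis -> P(theta)
    likelihood: dict mapping hypothesis -> P(D | theta)
    Returns (posterior dict, evidence P(D)).
    """
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(D), the marginal likelihood
    return {h: v / evidence for h, v in unnormalized.items()}, evidence


# Hypothetical setup: a coin is either fair (P(heads) = 0.5) or biased (P(heads) = 0.8).
prior = {"fair": 0.5, "biased": 0.5}
# Observed data D: 3 heads in 3 flips.
likelihood = {"fair": 0.5 ** 3, "biased": 0.8 ** 3}

post, evidence = posterior(prior, likelihood)
print(post, evidence)
```

Dividing by the evidence is what makes the posterior sum to 1. For continuous θ that sum becomes an integral, which is usually intractable and is the reason approximate inference methods exist.
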
### Four inference methods

- **Conjugate**: closed-form posterior (Beta-Bernoulli, Gaussian-Gaussian).
- **MCMC**: HMC, NUTS, Gibbs. Asymptotically exact, but slow.
- **Variational (VI)**: approximates the posterior with a simpler distribution family. Fast, but biased.
- **Sequential MC**: particle filters, suited to streaming data and state-space models.

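Of the four methods, Sequential MC is the only one without a pattern further down, so here is a minimal bootstrap particle filter sketch using only the standard library. The random-walk state model, the N(0, 1) initial prior, and all parameter names are assumptions chosen for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=1.0, obs_std=1.0, seed=0):
    """Return the filtered posterior mean of the latent state at each step.

    Assumed model (illustrative only):
        x_t = x_{t-1} + Normal(0, process_std)
        y_t = x_t     + Normal(0, obs_std)
    """
    rng = random.Random(seed)
    # Initial particles drawn from an assumed N(0, 1) prior on the state
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # 1. Propagate each particle through the transition model
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # 2. Weight each particle by the (unnormalized) Gaussian likelihood of y
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed; fall back to uniform weights
            weights = [1.0] * n_particles
            total = float(n_particles)
        # 3. Weighted average of the particles estimates the posterior mean
        means.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # 4. Resample in proportion to the weights to avoid degeneracy
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


filtered = bootstrap_particle_filter([2.0] * 30)
print(filtered[-1])
```

Each observation is processed once and then discarded, which is why this family fits streaming settings. Multinomial resampling at every step is the simplest choice, not the most efficient one.
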
### Applications

1. A/B testing (a Bayesian alternative to frequentist tests).
2. Hierarchical models (multi-level, grouped data).
3. ML calibration and uncertainty (BNNs, Gaussian processes).
4. LLM logit calibration and RAG confidence estimation.

## 💻 Patterns

### NumPyro: Bayesian linear regression (NUTS)

```python
import numpyro
import numpyro.distributions as dist
from numpyro.infer import NUTS, MCMC
import jax.numpy as jnp
import jax.random as random

# X: (n, d) design matrix, y: (n,) targets; both assumed to be defined by the caller.

def model(X, y=None):
    # Priors on intercept, coefficients, and noise scale
    alpha = numpyro.sample("alpha", dist.Normal(0., 10.))
    beta = numpyro.sample("beta", dist.Normal(jnp.zeros(X.shape[1]), 1.))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.))
    mu = alpha + X @ beta
    # Likelihood; with y=None this site is sampled instead of conditioned on
    numpyro.sample("obs", dist.Normal(mu, sigma), obs=y)

# Chains run in parallel only when enough devices are visible; on CPU, call
# numpyro.set_host_device_count(4) first, otherwise chains run sequentially.
mcmc = MCMC(NUTS(model), num_warmup=1000, num_samples=2000, num_chains=4)
mcmc.run(random.PRNGKey(0), X, y)
mcmc.print_summary()  # per-parameter mean, std, n_eff, and r_hat
```

### PyMC: Bayesian A/B test (Beta-Binomial)

```python
import pymc as pm

# n_a, n_b: visitors per variant; conv_a, conv_b: conversions per variant.
# All four are assumed to be defined by the caller.

with pm.Model():
    # Uniform Beta(1, 1) priors on each conversion rate
    p_a = pm.Beta("p_a", alpha=1, beta=1)
    p_b = pm.Beta("p_b", alpha=1, beta=1)
    pm.Binomial("obs_a", n=n_a, p=p_a, observed=conv_a)
    pm.Binomial("obs_b", n=n_b, p=p_b, observed=conv_b)
    # Track the absolute lift of B over A in the trace
    pm.Deterministic("lift", p_b - p_a)
    idata = pm.sample(2000, tune=1000, chains=4)

# Posterior probability that B converts better than A
prob_b_better = (idata.posterior["lift"] > 0).mean().item()
```

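Because a Beta prior with a Binomial likelihood is conjugate, the same comparison can also be sketched without MCMC by drawing directly from the two closed-form Beta posteriors. The function name and the example counts in the usage line are hypothetical.

```python
import random


def prob_b_better(n_a, conv_a, n_b, conv_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(p_b > p_a | data) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Exact posterior under a Beta(1, 1) prior:
        # Beta(1 + conversions, 1 + non-conversions)
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += p_b > p_a
    return wins / draws


# Hypothetical counts: 10/100 conversions for A, 30/100 for B
print(prob_b_better(100, 10, 100, 30))
```

This shortcut only applies while the model stays conjugate. Adding covariates or a hierarchical prior brings back the need for a sampler.
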
### Hierarchical model (varying intercept)

```python
import numpyro
import numpyro.distributions as dist

def hierarchical(X, group_idx, y=None, n_groups=10):
    # Population-level hyperpriors for the group intercepts
    mu_a = numpyro.sample("mu_a", dist.Normal(0., 5.))
    sigma_a = numpyro.sample("sigma_a", dist.HalfNormal(1.))
    # One intercept per group, partially pooled toward mu_a
    a = numpyro.sample("a", dist.Normal(mu_a, sigma_a).expand([n_groups]))
    beta = numpyro.sample("beta", dist.Normal(0., 1.))  # slope shared by all groups
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.))
    # X is a 1-D predictor; group_idx maps each row to its group
    mu = a[group_idx] + beta * X
    numpyro.sample("obs", dist.Normal(mu, sigma), obs=y)
```

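### Variational inference (SVI)

```python
import jax.random as random
import numpyro
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

# `model`, X, and y are assumed to be the regression model and data
# from the NumPyro linear regression pattern.
guide = AutoNormal(model)  # mean-field Gaussian approximation
svi = SVI(model, guide, numpyro.optim.Adam(1e-3), Trace_ELBO())
state = svi.init(random.PRNGKey(0), X, y)

for i in range(2000):
    # One optimizer step; loss is the negative ELBO
    state, loss = svi.update(state, X, y)

params = svi.get_params(state)  # fitted variational parameters of the guide
```
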
### Conjugate update (Beta-Bernoulli online)

```python
from scipy.stats import beta as beta_dist

class BetaBernoulli:
    def __init__(self, alpha=1, beta=1):
        # Beta(alpha, beta) prior; Beta(1, 1) is uniform
        self.alpha, self.beta = alpha, beta

    def update(self, success: bool):
        # Conjugate update: a success increments alpha, a failure increments beta
        self.alpha += int(success)
        self.beta += int(not success)

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def credible_interval(self, q=0.95):
        # Equal-tailed interval containing posterior mass q
        return beta_dist.interval(q, self.alpha, self.beta)
```

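The other conjugate pair named under the inference methods, Gaussian-Gaussian, can be sketched in the same spirit. This assumes a Gaussian prior on an unknown mean and a known observation variance. The function and parameter names are illustrative.

```python
def gaussian_update(prior_mean, prior_var, data, noise_var):
    """Posterior over an unknown mean, given a known observation variance.

    Returns (posterior mean, posterior variance).
    """
    n = len(data)
    # Precisions (inverse variances) add under conjugacy
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # Posterior mean is a precision-weighted blend of prior mean and data
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


post_mean, post_var = gaussian_update(prior_mean=0.0, prior_var=1.0,
                                      data=[2.0], noise_var=1.0)
print(post_mean, post_var)
```

With equal prior and noise variance and a single observation, the posterior mean lands halfway between the prior mean and the data point, and the variance halves.
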
### Bayesian neural net (Pyro Bayesian layer)

```python
import torch
import pyro
import pyro.distributions as dist  # Pyro's distributions, not NumPyro's
import pyro.nn as pnn

class BNN(pnn.PyroModule):
    def __init__(self, in_d, out_d):
        super().__init__()
        self.linear = pnn.PyroModule[torch.nn.Linear](in_d, out_d)
        # Replace point-estimate parameters with standard-normal priors
        self.linear.weight = pnn.PyroSample(dist.Normal(0., 1.).expand([out_d, in_d]).to_event(2))
        self.linear.bias = pnn.PyroSample(dist.Normal(0., 1.).expand([out_d]).to_event(1))

    def forward(self, x, y=None):
        # squeeze(-1) assumes out_d == 1 (scalar regression)
        mean = self.linear(x).squeeze(-1)
        with pyro.plate("data", x.shape[0]):
            # Fixed observation noise scale of 0.1
            pyro.sample("obs", dist.Normal(mean, 0.1), obs=y)
        return mean
```

## Decision criteria

| Situation | Method |
|---|---|
| Conjugate model · streaming updates | Conjugate update |
| Small model · accurate posterior needed | NUTS/HMC |
| Large data · fast approximation | SVI · ADVI |
| State-space · time series | Particle filter |
| Deep model · scale | BNN + variational · MC dropout |
| Hyperparameter optimization | Gaussian process + acquisition function |

**Default**: NumPyro + NUTS, which runs fast on GPU through JAX.

## 🔗 Graph

- Parent: [[Probability Theory]] · [[Statistical Inference]]
- Variants: [[MCMC]] · [[Variational Inference]]
- Applications: [[Belief-System]]
- Adjacent: [[Causal Inference]]

## 🤖 LLM usage

**When**: drafting prior and likelihood specifications for a model, interpreting posterior plots, generating NumPyro/PyMC code.

**When not**: convergence diagnostics (R-hat, ESS, trace plots). LLM statistical judgment is unreliable here, so a statistician's review is required.

## ❌ Anti-patterns

- **Always using a flat prior**: weak data + flat prior → unstable posterior. Use a weakly informative prior instead.
- **No convergence check**: R-hat > 1.01 or ESS < 400 means the posterior samples should not be trusted.
- **Single-chain MCMC**: running multiple chains and checking that they mix is mandatory.
- **Reporting only a posterior point estimate**: use the full distribution (credible intervals, posterior predictive).

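To make the convergence check concrete, here is a minimal sketch of the classic split R-hat for one scalar parameter, using only the standard library. It is not the rank-normalized variant that libraries such as ArviZ report, so treat it as an illustration of the idea rather than a replacement for library diagnostics. The synthetic chains are made up.

```python
import random
import statistics


def split_rhat(chains):
    """Classic split R-hat for one scalar parameter.

    chains: list of equal-length lists of draws, one list per chain.
    """
    half = len(chains[0]) // 2
    # Split every chain in half so within-chain drift also inflates R-hat
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = half
    chain_means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])  # W
    between = n * statistics.variance(chain_means)                       # B
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * within + between / n
    return (var_hat / within) ** 0.5


rng = random.Random(0)
# Four synthetic chains that agree, and four where two are stuck elsewhere
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(m, 1.0) for _ in range(1000)] for m in (0.0, 0.0, 3.0, 3.0)]
print(split_rhat(mixed), split_rhat(stuck))
```

Values near 1 indicate that between-chain and within-chain variance agree. The stuck example shows why a single chain cannot reveal this kind of failure.
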
## 🧪 Verification / duplicates

- Verified (Gelman *Bayesian Data Analysis* 3rd ed., NumPyro/PyMC/Stan docs, McElreath *Statistical Rethinking* 2nd ed.).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Bayes rule + 4 methods + NumPyro/PyMC patterns |