---
id: wiki-2026-0508-bayesian-statistics
title: Bayesian Statistics
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [베이지안 통계, Bayes' theorem, posterior, prior, MCMC, variational inference, PyMC, Stan]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [bayesian, statistics, mcmc, variational-inference, pymc, stan, probabilistic-programming, uncertainty]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyMC / Stan / NumPyro / Pyro
---

# Bayesian Statistics

## 📌 One-Line Insight

> **"Probability is a degree of belief."** It is not a long-run frequency: a prior combined with data is updated into a posterior. The approach is strongest when data are scarce and prior knowledge is available. The result is a distribution, not a point estimate. Modern computation (MCMC / VI) has made it mainstream.

## 📖 Core Concepts

### Bayes' theorem

$$P(\theta \mid D) = \frac{P(D \mid \theta) \cdot P(\theta)}{P(D)}$$

- **P(θ)**: prior, the belief about θ before seeing the data.
- **P(D | θ)**: likelihood, the model of how the data arise given θ.
- **P(θ | D)**: posterior, the updated belief after seeing the data.
- **P(D)**: evidence (normalizer), $P(D) = \int P(D \mid \theta)\, P(\theta)\, d\theta$.

### vs Frequentist

| Aspect | Frequentist | Bayesian |
|---|---|---|
| Probability | Long-run frequency | Degree of belief |
| Parameter | Fixed but unknown | Random variable |
| Result | Point estimate + CI | Posterior distribution |
| Small data | Fragile | Stabilized by the prior |
| Compute | Cheap | Expensive (made practical by MCMC / VI) |
| Interpretation | "95% of intervals contain θ" | "P(θ ∈ [a,b]) = 0.95" |

### Conjugate priors (analytical)

| Likelihood | Prior | Posterior |
|---|---|---|
| Binomial | Beta | Beta |
| Poisson | Gamma | Gamma |
| Normal (known σ) | Normal (on μ) | Normal |
| Normal (unknown μ, σ) | Normal-Gamma (on μ and precision 1/σ²) | Normal-Gamma |
| Multinomial | Dirichlet | Dirichlet |

→ Closed-form posteriors exist, but only for a limited set of models.

### Inference (modern)

#### MCMC (Markov Chain Monte Carlo)

- **Metropolis-Hastings**: random-walk proposals with an accept/reject step.
- **Hamiltonian MC (HMC)**: uses gradients of the log posterior to propose distant moves efficiently.
- **NUTS** (No-U-Turn Sampler): HMC with automatic tuning of the trajectory length.
- ✅ Asymptotically exact. ❌ Slow.
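
As a rough illustration of the Metropolis-Hastings item above, here is a minimal random-walk sampler in plain Python for a coin-flip posterior (Beta(2, 2) prior, 8 heads, 2 tails). The function names, step size, and burn-in length are illustrative assumptions rather than any library's API; for real models, NUTS in PyMC or Stan is the better choice.

```python
import math
import random


def log_posterior(p, heads, tails, a=2.0, b=2.0):
    """Unnormalized log posterior: Bernoulli likelihood times a Beta(a, b) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return (heads + a - 1.0) * math.log(p) + (tails + b - 1.0) * math.log(1.0 - p)


def metropolis_hastings(heads, tails, n_samples=20000, step=0.2, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    p = 0.5  # arbitrary starting point (assumption)
    current = log_posterior(p, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Symmetric proposal, so the acceptance probability is min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            p, current = proposal, candidate
        samples.append(p)
    return samples


samples = metropolis_hastings(heads=8, tails=2)
kept = samples[2000:]  # discard burn-in (length chosen by hand here)
posterior_mean = sum(kept) / len(kept)
print(f"MCMC posterior mean: {posterior_mean:.3f}")
print(f"Exact Beta(10, 4) mean: {10 / 14:.3f}")
```

Because this model is conjugate, the exact posterior is Beta(10, 4), which gives a convenient check on the sampler. The same accept/reject loop works for non-conjugate models, where no such closed form exists.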
#### Variational Inference (VI)

- Fits an approximating distribution q(θ) to the posterior.
- Minimizes the KL divergence between q(θ) and the posterior.
- ✅ Fast and scalable. ❌ Only approximate.

#### Sequential Monte Carlo

- The particle filter is the standard example.
- Suited to streaming data, since particles are updated as observations arrive.

### Applications

1. **A/B testing**: results are easier to interpret than frequentist p-values.
2. **Hyperparameter tuning** (Bayesian Optimization): Gaussian process surrogate + acquisition function.
3. **Hierarchical models**: group-level priors.
4. **Time series** (state-space): Kalman filter, particle filter.
5. **Causal inference** (Bayesian network): DAG structure.
6. **Drug discovery / clinical**: small N combined with a strong prior.
7. **Robotics** (SLAM): joint estimation of pose and map.
8. **Topic modeling** (LDA): Dirichlet priors.

### Modern stack

- **Stan**: NUTS, mature.
- **PyMC** (3 → 4 → 5): Python; the backend moved from Theano (v3) to Aesara (v4) to PyTensor (v5).
- **NumPyro**: JAX-based, fast.
- **Pyro**: PyTorch + VI.
- **TFP**: TensorFlow Probability.
- **Edward2 / blackjax**: modular.

## 💻 Patterns

### Coin flip (PyMC)

```python
import arviz as az
import numpy as np
import pymc as pm

# Data: 8 heads, 2 tails
data = np.array([1] * 8 + [0] * 2)

with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2)              # prior
    obs = pm.Bernoulli('obs', p=p, observed=data)  # likelihood
    trace = pm.sample(2000, return_inferencedata=True)

# Posterior plot and summary
az.plot_posterior(trace)
print(az.summary(trace))
# The exact posterior is Beta(10, 4), so the mean of p should be close to
# 10/14 ≈ 0.71, with a 94% HDI of roughly 0.5 to 0.9.
```

### Hierarchical (group-level)

```python
import numpy as np
import pymc as pm

# Hypothetical inputs for illustration: 4 groups with 5 observations each.
n_groups = 4
group_idx = np.repeat(np.arange(n_groups), 5)
data = np.random.default_rng(0).normal(group_idx * 0.5, 1.0)

with pm.Model() as h:
    # Hyperpriors shared by all groups
    mu = pm.Normal('mu', 0, 10)
    sigma = pm.HalfNormal('sigma', 5)
    # Group-level means drawn from the shared distribution
    theta = pm.Normal('theta', mu, sigma, shape=n_groups)
    # Likelihood with unit observation noise
    y = pm.Normal('y', theta[group_idx], 1, observed=data)
    trace = pm.sample(2000)
```

→ Partial pooling: groups with a small N borrow strength from the other groups through the shared hyperprior.
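
To make "borrowing strength" concrete, the sketch below computes the closed-form posterior mean of one group mean in a normal-normal model. It assumes the hyperparameters `mu` and `tau` and the observation noise `sigma` are known and fixed, which is a simplification: the PyMC model above infers `mu` and `sigma` from the data. The function name `partial_pooling_mean` and all numbers are hypothetical.

```python
def partial_pooling_mean(group_mean, n, mu, tau, sigma=1.0):
    """Posterior mean of a group mean theta_j in a normal-normal model.

    Assumes theta_j ~ Normal(mu, tau) and y_ij ~ Normal(theta_j, sigma),
    with mu, tau, and sigma all known. Returns (posterior mean, data weight).
    """
    data_precision = n / sigma**2    # information carried by n observations
    prior_precision = 1.0 / tau**2   # information carried by the group-level prior
    weight = data_precision / (data_precision + prior_precision)
    return weight * group_mean + (1.0 - weight) * mu, weight


# Same observed group mean (2.0), different group sizes.
for n in (1, 10, 100):
    estimate, weight = partial_pooling_mean(group_mean=2.0, n=n, mu=0.0, tau=1.0)
    print(f"n={n:>3}: weight on group mean = {weight:.3f}, estimate = {estimate:.3f}")
```

With these numbers a group with a single observation gets weight 0.5 and is pulled halfway toward the population mean, while a group with 100 observations gets weight about 0.99 and is left almost untouched.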
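### Bayesian A/B test

```python
import pymc as pm

# Hypothetical counts for illustration: visitors and conversions per variant.
n_a, conv_a = 1000, 40
n_b, conv_b = 1000, 55

with pm.Model() as ab:
    p_a = pm.Beta('p_a', 1, 1)  # uniform priors on both conversion rates
    p_b = pm.Beta('p_b', 1, 1)
    obs_a = pm.Binomial('obs_a', n=n_a, p=p_a, observed=conv_a)
    obs_b = pm.Binomial('obs_b', n=n_b, p=p_b, observed=conv_b)
    diff = pm.Deterministic('diff', p_b - p_a)
    trace = pm.sample(2000)

# P(B > A): fraction of posterior draws in which B's rate exceeds A's
prob_b_better = (trace.posterior['diff'] > 0).mean().item()
print(f'P(B > A) = {prob_b_better:.3f}')
```

→ P(B > A) is a direct decision quantity, which is often more actionable than a frequentist p-value.

### Variational inference (faster)

```python
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

# Coin-flip data: 8 heads, 2 tails
data = jnp.array([1] * 8 + [0] * 2)

def model(data):
    p = numpyro.sample('p', dist.Beta(2, 2))
    numpyro.sample('obs', dist.Bernoulli(probs=p), obs=data)

guide = AutoNormal(model)  # mean-field Gaussian approximation
svi = SVI(model, guide, numpyro.optim.Adam(0.01), Trace_ELBO())
state = svi.init(jax.random.PRNGKey(0), data)
for step in range(2000):
    state, loss = svi.update(state, data)  # one ELBO gradient step
```

### Bayesian Optimization (hyperparameter)

```python
import math

from skopt import gp_minimize
from skopt.space import Real, Integer

def train_and_eval(lr, depth):
    # Hypothetical stand-in for a real training run; returns a loss to minimize.
    return (math.log10(lr) + 3) ** 2 + (depth - 4) ** 2

def objective(params):
    lr, depth = params
    return train_and_eval(lr, depth)  # value to minimize

result = gp_minimize(
    objective,
    [Real(1e-5, 1e-1, prior='log-uniform', name='lr'),
     Integer(1, 10, name='depth')],
    n_calls=50,
)
print(result.x, result.fun)  # best parameters and best loss found
```

### Posterior predictive check

```python
import arviz as az
import pymc as pm

# Assumes `model` and `trace` from the coin-flip example.
with model:
    # Adds a posterior_predictive group to the existing InferenceData
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)
# Compare simulated data with the observed data as a visual check of model fit.
az.plot_ppc(trace)
```

## 🤔 Decision Criteria

| Situation | Method |
|---|---|
| Small data + prior | Conjugate (analytical) |
| Complex model + accuracy | NUTS (PyMC / Stan) |
| Large data + speed | VI (Pyro / NumPyro) |
| Streaming | Particle filter |
| Hyperparameter tuning | BO (skopt / Optuna) |
| A/B test | Beta-Binomial + Bayes |
| Topic modeling | LDA |
| Causal | Bayesian network |

**Default**: start with PyMC + NUTS as the baseline; move to NumPyro or VI when scale demands it.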
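
The "Small data + prior" and "A/B test" rows need no sampler at all when the model is Beta-Binomial, because the posterior is available in closed form. The following is a minimal sketch of that shortcut using only the standard library; the helper names and the conversion counts are hypothetical, and P(B > A) is estimated by simple Monte Carlo draws from the two exact posteriors.

```python
import random


def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior -> Beta posterior parameters."""
    return alpha + successes, beta + failures


def prob_b_beats_a(post_a, post_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(p_b > p_a) from two independent Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a) for _ in range(draws)
    )
    return wins / draws


# Hypothetical data: 40/1000 conversions for A, 55/1000 for B, uniform Beta(1, 1) priors.
post_a = beta_binomial_update(1, 1, successes=40, failures=960)
post_b = beta_binomial_update(1, 1, successes=55, failures=945)
p_b_better = prob_b_beats_a(post_a, post_b)

print(f"Posterior A: Beta{post_a}, Posterior B: Beta{post_b}")
print(f"P(B > A) ≈ {p_b_better:.3f}")
```

This runs in milliseconds and is a useful cross-check for a sampled model of the same data; once the likelihood or prior stops being conjugate, fall back to NUTS or VI.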
## 🔗 Graph

- Parent: [[Statistics]] · [[Probability Theory]]
- Variants: [[MCMC]] · [[Variational-Inference]] · [[Bayesian-Network]]
- Applications: [[Bayesian-Optimization]] · [[LDA]] · [[SLAM]]
- Tool: [[PyMC]] · [[Stan]]
- Adjacent: [[Bayes-Theorem]] · [[Bayesian-Updating]]

## 🤖 LLM Usage

**When to use**: small data with prior knowledge; uncertainty quantification; hierarchical structure; hyperparameter tuning.

**When not to use**: large data where speed matters more than accuracy; cases where a simple frequentist analysis is sufficient.

## ❌ Anti-patterns

- **Improper prior**: can produce an improper (invalid) posterior.
- **No PPC**: without a posterior predictive check, model fit is unknown.
- **Running a single MCMC chain**: convergence problems cannot be detected.
- **Ignoring burn-in**: early, unconverged draws bias the estimates.
- **Forcing conjugacy**: choosing a likelihood for convenience leads to the wrong model.
- **Over-confident VI** (mean-field): underestimates posterior uncertainty.
- **Ignoring R-hat**: non-convergence goes unnoticed.

## 🧪 Verification / Duplicates

- Verified (Gelman BDA, McElreath Statistical Rethinking, Stan/PyMC docs).
- Trust level: A.
- Related: [[Bayes-Theorem]] · [[MCMC]] · [[Bayesian-Optimization]] · [[Variational-Inference]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: Bayes formula + MCMC / VI + PyMC / NumPyro / skopt code |