"매 probability = 매 belief 의 degree". 매 frequency X — 매 prior + data → posterior 의 update. 매 small data + prior knowledge 의 strong. 매 result = 매 distribution (not point). 매 modern compute (MCMC / VI) 의 mainstream.
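MCMC (Markov chain Monte Carlo)
Metropolis-Hastings: random-walk proposals with an accept/reject step.
Hamiltonian MC (HMC): uses gradients of the log posterior to propose distant moves.
NUTS (No-U-Turn Sampler): HMC with automatic tuning, so the trajectory length is not set by hand.
✅ Asymptotically exact. ❌ Slow.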
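The accept/reject idea can be sketched with random-walk Metropolis, the symmetric-proposal special case of Metropolis-Hastings. The target here is an assumed coin-bias posterior (Beta(2, 2) prior, 8 heads, 2 tails); the step size, chain length, and burn-in are arbitrary illustrative choices, not tuned values.

```python
import math
import random


def log_post(p, heads=8, tails=2, a=2.0, b=2.0):
    """Unnormalised log posterior of a coin bias p under a Beta(a, b) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # outside the support: always rejected
    return (heads + a - 1) * math.log(p) + (tails + b - 1) * math.log(1 - p)


def metropolis(n_steps=20000, step=0.1, seed=0):
    rng = random.Random(seed)
    p = 0.5  # starting point
    samples = []
    for _ in range(n_steps):
        proposal = p + rng.gauss(0.0, step)  # symmetric random-walk proposal
        log_ratio = log_post(proposal) - log_post(p)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            p = proposal
        samples.append(p)  # a rejected step repeats the current value
    return samples


samples = metropolis()
kept = samples[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

Only the ratio of posterior densities is needed, so the normalising constant never has to be computed.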
Variational Inference (VI)
Fits an approximating distribution q(θ) to the posterior.
Minimizes KL(q(θ) ‖ p(θ | data)), which is equivalent to maximizing the ELBO.
✅ Fast and scalable. ❌ Approximate.
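What "approximate" costs can be seen in a case with a known answer. For a Gaussian target, the factorized (mean-field) q that minimizes KL(q ‖ p) has variances equal to the reciprocals of the diagonal of the target's precision matrix, a standard textbook result. The sketch below applies it to an assumed 2-D target with unit marginal variances and correlation 0.9; the function names are made up for this example.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mean_field_variances(cov):
    """Variances of the optimal factorized Gaussian q for a Gaussian target."""
    precision = inverse_2x2(cov)
    return [1.0 / precision[0][0], 1.0 / precision[1][1]]


target_cov = [[1.0, 0.9], [0.9, 1.0]]  # assumed target: true marginal variance is 1.0
q_var = mean_field_variances(target_cov)
print([round(v, 2) for v in q_var])  # [0.19, 0.19]
```

The approximation reports a variance of 0.19 where the true marginal variance is 1.0, so the stronger the posterior correlation, the more a mean-field fit understates uncertainty.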
Sequential Monte Carlo
Particle filter: a weighted set of samples is propagated and resampled as data arrive.
Works with streaming data.
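A bootstrap particle filter, the simplest Sequential Monte Carlo variant, can be sketched as below. The model is an assumption chosen for brevity: a 1-D random-walk state observed with Gaussian noise. The observations and all parameter values are illustrative.

```python
import math
import random


def bootstrap_filter(observations, n_particles=1000, process_sd=1.0,
                     obs_sd=1.0, seed=0):
    """Return the filtered state mean after each observation."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # initial belief
    means = []
    for y in observations:
        # 1. Propagate each particle through the random-walk dynamics
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight each particle by the likelihood of the new observation
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        means.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # 3. Resample in proportion to the weights
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


means = bootstrap_filter([1.0, 2.0, 3.0, 4.0, 5.0])
print([round(m, 2) for m in means])
```

Each observation is processed once and then discarded, which is why the method suits streaming data.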
Applications
A/B testing: more interpretable than the frequentist approach, e.g. a direct P(B > A).
Hyperparameter tuning (Bayesian optimization): GP surrogate + acquisition function.
Hierarchical models: group-level priors.
Time series (state-space): Kalman filter, particle filter.
Causal inference (Bayesian network): DAG.
Drug discovery / clinical trials: small N + strong prior.
Robotics (SLAM): joint estimation of pose and map.
Topic modeling (LDA): Dirichlet prior.
Modern stack
Stan: NUTS, mature.
PyMC (3 → 4 → 5): Python; the backend moved from Theano (v3) to Aesara (v4) to PyTensor (v5).
NumPyro: JAX-based, fast.
Pyro: PyTorch + VI.
TFP: TensorFlow Probability.
Edward2 / BlackJAX: modular.
💻 Patterns
Coin flip (PyMC)
    import arviz as az
    import numpy as np
    import pymc as pm

    # Data: 8 heads, 2 tails
    data = np.array([1] * 8 + [0] * 2)

    with pm.Model() as model:
        p = pm.Beta('p', alpha=2, beta=2)  # prior
        obs = pm.Bernoulli('obs', p=p, observed=data)
        trace = pm.sample(2000, return_inferencedata=True)

    # Posterior
    az.plot_posterior(trace)
    print(az.summary(trace))
    # p mean ≈ 0.71 (the exact posterior is Beta(10, 4)); the 94% HDI spans roughly 0.5 to 0.9
Hierarchical (group-level)
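    import pymc as pm

    # Assumes n_groups, group_idx (integer group index per observation) and data are already defined
    with pm.Model() as h:
        # Hyperpriors
        mu = pm.Normal('mu', 0, 10)
        sigma = pm.HalfNormal('sigma', 5)
        # Group-level effects
        theta = pm.Normal('theta', mu, sigma, shape=n_groups)
        # Likelihood (observation sd fixed at 1)
        y = pm.Normal('y', theta[group_idx], 1, observed=data)
        trace = pm.sample(2000)

→ Partial pooling: groups with small N borrow strength from the other groups.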
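How much a group borrows can be made concrete for a normal group effect with a normal likelihood. Conditional on the hyperparameters, the group's posterior mean is a precision-weighted average of its own sample mean and the population mean. The sketch below assumes known values for the population mean `mu`, the between-group sd `tau`, and the observation sd; these names and numbers are illustrative, and a full hierarchical fit would also average over uncertainty in `mu` and `tau`.

```python
def shrunk_mean(group_mean, n_obs, mu, tau, obs_sd=1.0):
    """Conditional posterior mean of a group effect given mu and tau."""
    data_precision = n_obs / obs_sd ** 2  # grows with the group's sample size
    prior_precision = 1.0 / tau ** 2      # strength of the group-level prior
    return (data_precision * group_mean + prior_precision * mu) / (
        data_precision + prior_precision
    )


# Same raw group mean (2.0), population mean 0.0, different sample sizes
small_group = shrunk_mean(group_mean=2.0, n_obs=1, mu=0.0, tau=1.0)
large_group = shrunk_mean(group_mean=2.0, n_obs=100, mu=0.0, tau=1.0)
print(small_group)            # 1.0
print(round(large_group, 2))  # 1.98
```

The group with one observation is pulled halfway to the population mean, while the group with 100 observations barely moves.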
Bayesian A/B test
    import pymc as pm

    # Assumes n_a, n_b (trials) and conv_a, conv_b (conversions) are already defined
    with pm.Model() as ab:
        p_a = pm.Beta('p_a', 1, 1)
        p_b = pm.Beta('p_b', 1, 1)
        obs_a = pm.Binomial('obs_a', n=n_a, p=p_a, observed=conv_a)
        obs_b = pm.Binomial('obs_b', n=n_b, p=p_b, observed=conv_b)
        diff = pm.Deterministic('diff', p_b - p_a)
        trace = pm.sample(2000)

    # P(B > A)
    prob_b_better = (trace.posterior['diff'] > 0).mean().item()
    print(f'P(B > A) = {prob_b_better:.3f}')
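Because a Beta prior is conjugate to the Binomial likelihood, the same quantity can also be estimated without MCMC by drawing directly from the two Beta posteriors. This is a sketch under the same Beta(1, 1) priors; the conversion counts are made-up numbers and the function name is hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(p_b > p_a) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm is Beta(1 + conversions, 1 + non-conversions)
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += p_b > p_a
    return wins / draws


# Illustrative data: 40/1000 conversions for A, 55/1000 for B
p_better = prob_b_beats_a(conv_a=40, n_a=1000, conv_b=55, n_b=1000)
print(f'P(B > A) = {p_better:.3f}')
```

For a two-arm test with conjugate priors this is usually enough; the sampler-based model becomes worthwhile once covariates or hierarchy are added.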
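Bayesian optimization (skopt)
    from skopt import gp_minimize
    from skopt.space import Integer, Real

    def objective(params):
        lr, depth = params
        # train_and_eval is a user-supplied function returning the loss to minimize
        return train_and_eval(lr, depth)

    result = gp_minimize(
        objective,
        [
            Real(1e-5, 1e-1, prior='log-uniform', name='lr'),
            Integer(1, 10, name='depth'),
        ],
        n_calls=50,
    )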
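The acquisition step of Bayesian optimization can be illustrated with expected improvement (EI), one common choice. The sketch below is a standalone formula for minimization, not the internals of any particular library; `mu` and `sigma` stand for the surrogate's predictive mean and standard deviation at a candidate point, and `best` for the lowest loss observed so far.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a minimization problem."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)  # no uncertainty: improvement is deterministic
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


# Two candidates with the same predicted loss but different uncertainty
certain = expected_improvement(mu=0.5, sigma=0.1, best=0.5)
uncertain = expected_improvement(mu=0.5, sigma=1.0, best=0.5)
print(certain < uncertain)  # True
```

EI rewards both a low predicted loss and high uncertainty, which is how the search balances exploitation against exploration.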
Posterior predictive check
    with model:
        pm.sample_posterior_predictive(trace, extend_inferencedata=True)

    # Compare simulated data with the actual data: a visual check of model fit.
    az.plot_ppc(trace)
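The same check can be done numerically by comparing a summary statistic of replicated data with the observed value. The sketch below assumes a coin model whose posterior is Beta(10, 4) (a Beta(2, 2) prior updated with 8 heads and 2 tails) and uses the number of heads as the statistic; both choices are illustrative.

```python
import random

rng = random.Random(0)
observed = [1] * 8 + [0] * 2
n_flips, observed_heads = len(observed), sum(observed)


def replicate(rng):
    """Simulate one replicated dataset and return its number of heads."""
    p = rng.betavariate(10, 4)  # one draw from the assumed posterior
    return sum(rng.random() < p for _ in range(n_flips))


replicated_heads = [replicate(rng) for _ in range(5000)]

# Posterior predictive p-value: share of replications at least as extreme as the data
p_value = sum(h >= observed_heads for h in replicated_heads) / len(replicated_heads)
print(round(p_value, 2))
```

A value near 0 or 1 would mean the model rarely reproduces the observed statistic; a value in the middle, as expected here, gives no sign of misfit on this statistic.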
🤔 Decision criteria
| Situation | Method |
| --- | --- |
| Small data + prior | Conjugate (analytical) |
| Complex model + accuracy | NUTS (PyMC / Stan) |
| Large data + speed | VI (Pyro / NumPyro) |
| Streaming | Particle filter |
| Hyperparameter tuning | BO (skopt / Optuna) |
| A/B test | Beta-Binomial + Bayes |
| Topic modeling | LDA |
| Causal | Bayesian network |
Default: PyMC + NUTS as the baseline. Move to NumPyro / VI when scale demands it.
When to use: small data plus a prior; uncertainty quantification; hierarchical structure; hyperparameter tuning.
When not to use: large data where speed matters more than accuracy; cases where a simple frequentist method is enough.
❌ Anti-patterns
Improper prior: can yield an improper (non-normalizable) posterior.
No PPC: no way to know how well the model fits.
Single MCMC chain: convergence problems cannot be detected.
Ignoring burn-in: biased estimates.
Forcing a conjugate model: the likelihood is wrong for the data.
Over-confident VI (mean-field): uncertainty is underestimated.
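Why a single chain hides convergence problems can be shown with the classic (non-split) Gelman-Rubin statistic, which compares between-chain and within-chain variance. This is a simplified sketch on synthetic draws; libraries such as ArviZ report a more refined rank-normalized split version, and the 1.05 and 1.2 figures below are only rough illustrative thresholds.

```python
import random
import statistics


def r_hat(chains):
    """Classic Gelman-Rubin R-hat for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between / n  # pooled variance estimate
    return (var_hat / within) ** 0.5


rng = random.Random(0)
# Synthetic stand-ins for MCMC output: four chains exploring the same region
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Same draws, but the last chain is stuck in a different region
stuck = mixed[:3] + [[x + 3.0 for x in mixed[3]]]

print(r_hat(mixed) < 1.05)  # True
print(r_hat(stuck) > 1.2)   # True
```

With only one chain there is no between-chain term at all, so a sampler stuck in the wrong region would look perfectly healthy.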