Antigravity Agent 504fd5fb42 [G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00

Bayesian Statistics

📌 One-line insight

"Probability is a degree of belief," not a long-run frequency: a prior combined with data is updated into a posterior. The approach is strongest when data are scarce and prior knowledge is available. The result is a distribution, not a point estimate. Modern computation (MCMC / VI) has made it mainstream.

📖 Core concepts

Bayes' theorem

$$P(\theta \mid D) = \frac{P(D \mid \theta) \, P(\theta)}{P(D)}$$

P(θ): prior, the belief about θ before seeing the data.
P(D | θ): likelihood, the model of how the data arise given θ.
P(θ | D): posterior, the updated belief after seeing the data.
P(D): evidence (the normalizing constant).
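
As a small worked instance of the formula, the sketch below plugs numbers into each of the four terms for a two-hypothesis case (a condition is present or absent, and D is a positive test). All three input probabilities are hypothetical values chosen for illustration; they do not come from this note.

```python
# Hypothetical numbers, chosen only to illustrate the formula.
prior = 0.01            # P(theta): belief that the condition is present
sensitivity = 0.95      # P(D | theta): positive test given the condition
false_positive = 0.05   # P(D | not theta): positive test without the condition

# Evidence P(D): total probability of observing a positive test.
evidence = sensitivity * prior + false_positive * (1 - prior)

# Posterior P(theta | D) by Bayes' theorem.
posterior = sensitivity * prior / evidence
print(f"P(theta | D) = {posterior:.3f}")  # 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small, which is the kind of update the formula is meant to capture.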

vs Frequentist

Aspect	Frequentist	Bayesian
Probability	Long-run frequency	Degree of belief
Parameter	Fixed unknown	Random variable
Result	Point estimate + CI	Posterior distribution
Small data	Fragile	More robust, thanks to the prior
Compute	Cheap	Expensive (made practical by MCMC)
Interpretation	"95% of intervals contain θ"	"P(θ ∈ [a,b] | D) = 0.95"
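
The Bayesian reading in the last row can be made concrete with posterior draws. The sketch below assumes a Beta(10, 4) posterior (the analytic result for a Beta(2, 2) prior with 8 heads and 2 tails) and computes an equal-tailed 95% credible interval from samples; the sample size and seed are arbitrary choices.

```python
import random

rng = random.Random(0)
# Posterior draws: Beta(10, 4) is the analytic posterior for a Beta(2, 2) prior
# and 8 heads / 2 tails (values chosen for illustration).
draws = sorted(rng.betavariate(10, 4) for _ in range(20000))

# Equal-tailed 95% credible interval: cut 2.5% of the draws from each tail.
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]
print(f"P({lower:.2f} <= theta <= {upper:.2f} | D) ~ 0.95")
```

The interval is a direct probability statement about θ given the data, which a frequentist confidence interval is not.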

Conjugate priors (analytical)

Likelihood	Prior	Posterior
Binomial	Beta	Beta
Poisson	Gamma	Gamma
Normal (known σ)	Normal	Normal
Normal (unknown μ,σ)	Normal-Gamma	Normal-Gamma
Multinomial	Dirichlet	Dirichlet

→ Closed-form posteriors exist, but only for a limited set of models.
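
For the Binomial/Beta row, the closed-form update is just addition of counts. A minimal sketch, using the same prior and data as the coin-flip pattern later in this note (Beta(2, 2) prior, 8 heads, 2 tails); the helper names are illustrative, not from any library.

```python
def beta_binomial_update(alpha, beta, heads, tails):
    """Return the Beta posterior parameters after observing coin flips."""
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))

post_alpha, post_beta = beta_binomial_update(2, 2, heads=8, tails=2)
print(post_alpha, post_beta)                       # 10 4
print(round(beta_mean(post_alpha, post_beta), 3))  # 0.714
```

No sampling is needed here, which is why conjugate models are a useful sanity check for MCMC output on the same problem.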

Inference (modern)

MCMC (Markov Chain Monte Carlo)

Metropolis-Hastings: random-walk proposals + accept/reject.
Hamiltonian MC (HMC): uses gradients of the log posterior.
NUTS (No-U-Turn Sampler): auto-tuned HMC.
✅ Asymptotically exact. ❌ Slow.
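
To make the accept/reject step concrete, here is a minimal random-walk Metropolis sketch for the coin-flip posterior (Beta(2, 2) prior, 8 heads, 2 tails). The step size, sample count, burn-in length, and seed are arbitrary assumptions, and this is a teaching sketch rather than a substitute for NUTS in PyMC or Stan.

```python
import math
import random

def log_posterior(p, heads=8, tails=2, alpha=2.0, beta=2.0):
    """Unnormalized log posterior: Beta(alpha, beta) prior times Bernoulli likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return (alpha - 1 + heads) * math.log(p) + (beta - 1 + tails) * math.log(1 - p)

def metropolis(n_samples=20000, step=0.2, start=0.5, seed=0):
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current)
    samples = []
    for _ in range(n_samples):
        # Symmetric random-walk proposal, so no Hastings correction is needed.
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis()
kept = samples[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean ~ {posterior_mean:.3f}")
```

The estimate can be compared with the analytic posterior mean 10/14 ≈ 0.714 from the conjugate Beta(10, 4) result.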

Variational Inference (VI)

Fits an approximating distribution q(θ) to the posterior.
Minimizes the KL divergence between q(θ) and the posterior.
✅ Fast and scalable. ❌ Approximate.
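
The link between "minimize the KL divergence" and what VI libraries actually optimize is the standard evidence decomposition, written here in the same symbols as Bayes' theorem above:

```latex
\log P(D)
  = \underbrace{\mathbb{E}_{q(\theta)}\big[\log P(D, \theta) - \log q(\theta)\big]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\big(q(\theta) \,\|\, P(\theta \mid D)\big)
```

Because log P(D) does not depend on q, maximizing the ELBO is equivalent to minimizing the KL divergence to the posterior, and the ELBO can be estimated without knowing P(D).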

Sequential Monte Carlo

Particle filters.
Works with streaming data.
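
A bootstrap particle filter can be sketched in a few lines. The model below (a 1-D random walk observed with Gaussian noise), the noise scales, the particle count, and the observations are all assumptions made for illustration.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500,
                              process_sd=1.0, obs_sd=1.0, seed=0):
    """Filter a 1-D random-walk state observed with Gaussian noise.

    Assumed model (illustrative): x_t = x_{t-1} + N(0, process_sd^2),
    y_t = x_t + N(0, obs_sd^2).
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]  # diffuse initial belief
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the process model.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight by the observation likelihood (shifted by the max for stability).
        log_w = [-0.5 * ((y - x) / obs_sd) ** 2 for x in particles]
        max_lw = max(log_w)
        weights = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Posterior mean estimate for this time step.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Multinomial resampling concentrates particles where the weight is.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

observations = [0.2, 0.9, 2.1, 2.8, 4.2, 5.1]  # made-up stream of measurements
estimates = bootstrap_particle_filter(observations)
print([round(e, 2) for e in estimates])
```

Each observation is processed once and then discarded, which is what makes the method suitable for streaming data.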

Applications

A/B testing: more interpretable than the frequentist approach.
Hyperparameter tuning (Bayesian optimization): Gaussian process surrogate + acquisition function.
Hierarchical models: group-level priors.
Time series (state-space models): Kalman filter, particle filter.
Causal inference (Bayesian networks): DAGs.
Drug discovery / clinical trials: small N + strong priors.
Robotics (SLAM): joint estimation of pose and map.
Topic modeling (LDA): Dirichlet priors.

Modern stack

Stan: NUTS, mature.
PyMC (3 → 4 → 5): Python; the backend moved from Theano (v3) to Aesara (v4) to PyTensor (v5).
NumPyro: JAX-based, fast.
Pyro: PyTorch + VI.
TFP: TensorFlow Probability.
Edward2 / BlackJAX: modular.

💻 Patterns

Coin flip (PyMC)

import arviz as az
import numpy as np
import pymc as pm

# Data: 8 heads, 2 tails
data = np.array([1]*8 + [0]*2)

with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2)  # prior
    obs = pm.Bernoulli('obs', p=p, observed=data)  # likelihood
    trace = pm.sample(2000, return_inferencedata=True)

# Posterior summary
az.plot_posterior(trace)
print(az.summary(trace))
# p mean ≈ 0.71 (the analytic posterior is Beta(10, 4), mean 10/14);
# the default 94% HDI runs from roughly 0.5 to roughly 0.9

Hierarchical (group-level)

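import numpy as np
import pymc as pm

# Illustrative synthetic data (assumption): 3 groups, 4 observations each
rng = np.random.default_rng(0)
n_groups = 3
group_idx = np.repeat(np.arange(n_groups), 4)
data = rng.normal(loc=np.array([0.0, 1.0, 2.0])[group_idx], scale=1.0)

with pm.Model() as h:
    # Hyperpriors shared by all groups
    mu = pm.Normal('mu', 0, 10)
    sigma = pm.HalfNormal('sigma', 5)

    # Group-level parameters
    theta = pm.Normal('theta', mu, sigma, shape=n_groups)

    # Likelihood (observation sd fixed at 1)
    y = pm.Normal('y', theta[group_idx], 1, observed=data)

    trace = pm.sample(2000)

→ Partial pooling: groups with small N borrow strength from the other groups.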
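
The "borrow strength" effect can be seen in closed form if the hyperparameters are held fixed. The sketch below treats mu and sigma as known (in the full model they are sampled) and uses hypothetical group means and sizes; the function name is illustrative.

```python
def partial_pooling_mean(group_mean, n_obs, mu, sigma, obs_sd=1.0):
    """Posterior mean of theta_j when mu and sigma are treated as known."""
    data_precision = n_obs / obs_sd ** 2   # information from the group's own data
    prior_precision = 1.0 / sigma ** 2     # information from the population
    return (data_precision * group_mean + prior_precision * mu) / (
        data_precision + prior_precision
    )

# Hypothetical values: two groups with the same sample mean but different sizes.
mu, sigma = 0.0, 1.0
small = partial_pooling_mean(group_mean=2.0, n_obs=2, mu=mu, sigma=sigma)
large = partial_pooling_mean(group_mean=2.0, n_obs=50, mu=mu, sigma=sigma)
print(round(small, 3), round(large, 3))  # 1.333 1.961
```

The group with only two observations is pulled much further toward the population mean than the group with fifty.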

Bayesian A/B test

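import pymc as pm

# Hypothetical counts for illustration: visitors (n) and conversions (conv) per variant
n_a, conv_a = 1000, 40
n_b, conv_b = 1000, 55

with pm.Model() as ab:
    p_a = pm.Beta('p_a', 1, 1)  # uniform priors on both conversion rates
    p_b = pm.Beta('p_b', 1, 1)

    obs_a = pm.Binomial('obs_a', n=n_a, p=p_a, observed=conv_a)
    obs_b = pm.Binomial('obs_b', n=n_b, p=p_b, observed=conv_b)

    diff = pm.Deterministic('diff', p_b - p_a)

    trace = pm.sample(2000)

# P(B > A): share of posterior draws in which B's rate exceeds A's
prob_b_better = (trace.posterior['diff'] > 0).mean().item()
print(f'P(B > A) = {prob_b_better:.3f}')

→ More directly actionable than a frequentist test result.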
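
Because Beta priors are conjugate to the Binomial likelihood, the same quantity can be estimated without MCMC by drawing from the two Beta posteriors directly. A minimal sketch using only the standard library; the conversion counts are hypothetical.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(p_b > p_a) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        # Conjugate posteriors: Beta(1 + conversions, 1 + non-conversions).
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += p_b > p_a
    return wins / n_draws

# Hypothetical counts, for illustration only.
prob = prob_b_beats_a(conv_a=40, n_a=1000, conv_b=55, n_b=1000)
print(f"P(B > A) = {prob:.3f}")
```

This shortcut only applies while the model stays conjugate; adding covariates or hierarchy brings back the need for a sampler.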

Variational inference (faster)

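import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro import optim
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

# Same coin-flip data as in the PyMC example: 8 heads, 2 tails
data = jnp.array([1]*8 + [0]*2)

def model(data):
    p = numpyro.sample('p', dist.Beta(2, 2))  # prior
    numpyro.sample('obs', dist.Bernoulli(p), obs=data)  # likelihood

guide = AutoNormal(model)  # mean-field Gaussian guide in unconstrained space
svi = SVI(model, guide, optim.Adam(0.01), Trace_ELBO())
state = svi.init(jax.random.PRNGKey(0), data)
for step in range(2000):
    state, loss = svi.update(state, data)  # one ELBO gradient step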

Bayesian Optimization (hyperparameter)

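from skopt import gp_minimize
from skopt.space import Real, Integer

def objective(params):
    lr, depth = params
    # train_and_eval is a placeholder for your own training routine;
    # it must return the score to minimize (for example, validation loss).
    return train_and_eval(lr, depth)

result = gp_minimize(
    objective,
    [Real(1e-5, 1e-1, prior='log-uniform', name='lr'),
     Integer(1, 10, name='depth')],
    n_calls=50,
)
print(result.x, result.fun)  # best parameters and best score found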

Posterior predictive check

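with model:  # the PyMC coin-flip model defined above
    # Appends a posterior_predictive group to the existing InferenceData (PyMC 4+)
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

# Compare simulated data with the actual observations: a visual check of model fit.
az.plot_ppc(trace)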

🤔 Decision criteria

Situation	Method
Small data + prior	Conjugate (analytical)
Complex model + accuracy	NUTS (PyMC / Stan)
Large data + speed	VI (Pyro / NumPyro)
Streaming	Particle filter
Hyperparameter tuning	BO (skopt / Optuna)
A/B test	Beta-Binomial + Bayes
Topic modeling	LDA
Causal	Bayesian network

Default: PyMC + NUTS as the baseline. When scale becomes a problem, move to NumPyro / VI.

🔗 Graph

Parent: Statistics · Probability-Theory
Variants: MCMC · Variational-Inference · Bayesian-Network · Hierarchical-Model
Applications: A-B-Testing · Bayesian-Optimization · Particle-Filter · LDA · SLAM
Tool: PyMC · Stan · NumPyro · Pyro
Adjacent: Bayes-Theorem · Bayesian-Updating · Conjugate-Prior · Frequentist

🤖 LLM usage

When to use: small data + prior knowledge; uncertainty quantification; hierarchical structure; hyperparameter tuning.
When not to use: large data where speed matters more than accuracy; cases where a simple frequentist analysis is sufficient.

❌ Anti-patterns

Improper prior: can yield an improper (invalid) posterior.
No PPC: no way to tell whether the model fits.
Running a single MCMC chain: convergence problems cannot be detected.
Ignoring burn-in: biased estimates.
Forcing conjugacy: ends up with the wrong likelihood.
Over-confident VI (mean-field): underestimates uncertainty.
Ignoring R-hat: non-convergence goes unnoticed.
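
To show why a single chain hides convergence problems, the sketch below computes the classic (non-split) Gelman-Rubin R-hat by comparing between-chain and within-chain variance. The chains are synthetic Gaussian draws standing in for sampler output, and ArviZ's `az.rhat` defaults to a more robust rank-normalized split variant, so treat this as an illustration of the idea only.

```python
import random
import statistics

def r_hat(chains):
    """Classic (non-split) Gelman-Rubin statistic for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                       # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(0)
# Synthetic draws standing in for MCMC output (assumption, not real sampler output).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 0.0, 3.0, 3.0)]
print(f"well-mixed chains:  R-hat = {r_hat(mixed):.3f}")
print(f"disagreeing chains: R-hat = {r_hat(stuck):.3f}")
```

Values close to 1 indicate that the chains agree; the second case, where half the chains sit in a different region, gives a value far above 1 and would go unnoticed with only one chain.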

🧪 Verification / duplicates

Verified (Gelman BDA, McElreath Statistical Rethinking, Stan/PyMC docs).
Confidence: A.
Related: Bayes-Theorem · MCMC · Bayesian-Optimization · Variational-Inference.

🕓 Changelog

Date	Change
2026-05-08	Phase 1
2026-05-10	Manual cleanup: Bayes formula + MCMC / VI + PyMC / NumPyro / skopt code
