---
id: wiki-2026-0508-bayes-theorem
title: Bayes' Theorem
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Bayes Rule, Bayes Law, Conditional Probability Inversion]
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
verification_status: applied
tags: [probability, statistics, inference, mathematics, decision-theory]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: SciPy / NumPy
---

# Bayes' Theorem

## One-line summary

> **"P(A|B) = P(B|A) × P(A) / P(B): inverting a conditional probability is the mathematical foundation of evidence-based belief revision."**

The result comes from Reverend Thomas Bayes's essay (published posthumously in 1763) and was generalized by Laplace (1774). It is central to the Bayesian toolkit in modern ML. Examples include diffusion models, where the forward-process posterior q(x_{t-1} | x_t, x_0) follows from Bayes' rule, as well as the Kalman filter and LLM uncertainty calibration.

## Core ideas

### Forms of the theorem

- **Standard**: `P(A|B) = P(B|A) × P(A) / P(B)`
- **Odds form**: `O(A|B) = O(A) × LR`, where `LR = P(B|A) / P(B|¬A)`
- **Discrete partition**: `P(H_i|E) = P(E|H_i)P(H_i) / Σⱼ P(E|H_j)P(H_j)`
- **Continuous**: `p(θ|D) = p(D|θ)p(θ) / ∫p(D|θ)p(θ)dθ`

### Terminology

- **Prior** P(A): belief before seeing the evidence
- **Likelihood** P(B|A): probability of the evidence given the hypothesis
- **Posterior** P(A|B): belief after seeing the evidence
- **Evidence / marginal likelihood** P(B): the normalizing constant

### Applications

1. Medical testing: base-rate-aware diagnosis (the classic mammography example).
2. Spam filtering: the Naive Bayes classifier.
3. Search and rescue: updating a posterior heatmap of the target's location after each sensor sweep.
4. LLM token sampling: a temperature-scaled distribution over the vocabulary, loosely read as a posterior given the context.

## 💻 Patterns

### Medical test (base-rate problem)

```python
def bayes_diagnosis(prevalence: float, sensitivity: float, specificity: float) -> dict:
    """Example: disease prevalence 1%, test 99% sensitive and 95% specific.
    Given a positive test, what is the actual probability of disease?"""
    p_disease = prevalence
    p_pos_given_disease = sensitivity      # true-positive rate
    p_pos_given_healthy = 1 - specificity  # false-positive rate
    # Evidence via the law of total probability: P(+) = P(+|D)P(D) + P(+|¬D)P(¬D)
    p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
    p_disease_given_pos = (p_pos_given_disease * p_disease) / p_pos
    return {
        "P(disease | +test)": p_disease_given_pos,
        "P(healthy | +test)": 1 - p_disease_given_pos,
    }


print(bayes_diagnosis(0.01, 0.99, 0.95))  # P(disease | +test) ≈ 0.167 (exactly 1/6): counter-intuitively low
```

### Naive Bayes spam (log-space)

```python
import numpy as np
from collections import Counter


class NaiveBayesSpam:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing

    def fit(self, docs, labels):
        labels = np.asarray(labels)  # `labels == c` below needs an array, not a list
        self.classes = np.unique(labels)
        # log P(class)
        self.log_prior = {c: np.log((labels == c).mean()) for c in self.classes}
        self.vocab = set(w for d in docs for w in d.split())
        V = len(self.vocab)
        self.log_lik = {}
        for c in self.classes:
            words = Counter(w for d, l in zip(docs, labels) if l == c for w in d.split())
            total = sum(words.values()) + self.alpha * V
            # Smoothed log P(word | class): words unseen in this class never give log(0)
            self.log_lik[c] = {w: np.log((words.get(w, 0) + self.alpha) / total)
                               for w in self.vocab}
        return self

    def predict(self, doc):
        # Sum log-probabilities instead of multiplying probabilities to avoid underflow;
        # words outside the training vocabulary contribute 0, i.e. they are ignored.
        scores = {c: self.log_prior[c] + sum(self.log_lik[c].get(w, 0) for w in doc.split())
                  for c in self.classes}
        return max(scores, key=scores.get)
```

### Bayesian A/B test (conjugate Beta-Binomial posterior, Monte Carlo comparison)

```python
from scipy import stats


def prob_b_beats_a(a_clicks, a_imp, b_clicks, b_imp, n_samples=100_000):
    # A uniform Beta(1, 1) prior with binomial clicks gives a closed-form Beta posterior.
    # P(B > A) is then estimated by sampling from both posteriors.
    a = stats.beta(1 + a_clicks, 1 + a_imp - a_clicks).rvs(n_samples)
    b = stats.beta(1 + b_clicks, 1 + b_imp - b_clicks).rvs(n_samples)
    return (b > a).mean()


print(f"P(B>A) = {prob_b_beats_a(73, 1000, 91, 1010):.3f}")
```

### Odds-form rapid update

```python
def odds_update(prior_odds: float, likelihood_ratio: float) -> float:
    """Posterior odds = prior odds × LR.
    Easy to do as mental arithmetic."""
    return prior_odds * likelihood_ratio


# DNA match: prior odds 1:1000, LR = 100,000
post_odds = odds_update(1 / 1000, 100_000)
print(post_odds)                    # posterior odds ≈ 100:1
print(post_odds / (1 + post_odds))  # P = 100/101 ≈ 0.99
```

### Kalman filter (1-D, Gaussian)

```python
def kalman_step(mu, sigma2, z, R, Q):
    """One predict + update step of a 1-D Kalman filter with a random-walk state model.

    The update is Bayes' theorem with a Normal prior and a Normal likelihood (conjugate),
    so the posterior is again Normal, with mean mu and variance sigma2.
    """
    # Predict: the state may drift, so add the process-noise variance Q
    sigma2 = sigma2 + Q
    # Update with measurement z (sensor-noise variance R); K is the Kalman gain
    K = sigma2 / (sigma2 + R)
    mu = mu + K * (z - mu)
    sigma2 = (1 - K) * sigma2
    return mu, sigma2
```

## Decision guide

| Situation | Approach |
|---|---|
| Prior is conjugate to the likelihood | Closed-form posterior |
| Discrete, small hypothesis space | Exact enumeration |
| Continuous, non-conjugate | MCMC (NUTS / HMC) |
| Streaming sensor data | Kalman / particle filter |
| Many features, imbalanced classes | Naive Bayes baseline |

**Default**: for probabilistic classification, start with a log-space Naive Bayes classifier with Laplace smoothing.

## 🔗 Graph

- Parent: [[Statistical-Analysis]]
- Variants: [[Bayesian-Updating]] · [[Belief-Revision]]
- Applications: [[Item-Item-Collaborative-Filtering]] · [[몬테카를로 시뮬레이션|Monte Carlo simulation]]
- Adjacent: [[Inference-Coupled Persistence]] · [[Multi-agent-System]]

## 🤖 Using with LLMs

**When**: explaining probabilistic reasoning, making base-rate-aware decisions, weighting evidence.

**When not**: when deterministic logic is sufficient, because the probabilistic machinery only adds overhead.

## ❌ Anti-patterns

- **Base-rate neglect / prosecutor's fallacy**: treating P(B|A) as if it were P(A|B), which ignores the prior P(A).
- **Naive equal prior**: defaulting to a uniform prior while ignoring available domain knowledge.
- **Evidence double-counting**: assuming conditional independence for pieces of evidence that are actually dependent, so the same information is counted more than once.
- **Improper normalization**: dropping the evidence integral in the continuous case and then reading the unnormalized product p(D|θ)p(θ) as a probability density.

## 🧪 Verification / duplicates

- Verified against Jaynes, *Probability Theory: The Logic of Science*, and Pearl, *Causality* (2nd ed.).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: full Bayes' theorem entry with medical, Naive Bayes, A/B, and Kalman patterns |
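## 💻 Pattern: continuous posterior on a grid

The continuous form `p(θ|D) = p(D|θ)p(θ) / ∫p(D|θ)p(θ)dθ` and the improper-normalization anti-pattern can be made concrete with a grid approximation. This is a minimal sketch under illustrative assumptions, not a general-purpose method:

- a Binomial likelihood (k successes in n trials);
- a uniform prior on θ;
- a midpoint grid.

`grid_posterior` is a hypothetical helper, not part of any library. It computes the unnormalized posterior in log space, estimates the evidence integral numerically, and divides by it. With a uniform prior the exact posterior is Beta(1 + k, 1 + n - k), so the grid result can be checked against it.

```python
import math


def grid_posterior(k: int, n: int, grid_size: int = 2000):
    """Grid approximation of p(theta | D) for k successes in n Bernoulli trials.

    Illustrative assumptions: Binomial likelihood, uniform prior on [0, 1],
    midpoint grid. Returns (thetas, posterior density values, grid step).
    """
    step = 1.0 / grid_size
    # Midpoints avoid log(0) at theta = 0 and theta = 1
    thetas = [(i + 0.5) * step for i in range(grid_size)]
    # log p(D|theta) + log p(theta); the uniform prior and the binomial
    # coefficient are constants, so they cancel after normalization
    log_unnorm = [k * math.log(t) + (n - k) * math.log1p(-t) for t in thetas]
    # Subtract the max before exponentiating to avoid underflow
    m = max(log_unnorm)
    unnorm = [math.exp(v - m) for v in log_unnorm]
    # Evidence integral (up to the same constant factor), midpoint rule
    evidence = step * sum(unnorm)
    density = [u / evidence for u in unnorm]
    return thetas, density, step


k, n = 73, 1000
thetas, density, step = grid_posterior(k, n)
post_mean = step * sum(t * d for t, d in zip(thetas, density))
exact_mean = (1 + k) / (2 + n)  # mean of the conjugate Beta(1 + k, 1 + n - k) posterior
print(f"grid mean {post_mean:.5f} vs exact {exact_mean:.5f}")
```

The example reuses the A/B numbers (73 clicks out of 1,000). The grid posterior mean should agree with the conjugate mean (1 + k) / (2 + n) to several decimal places. If you skip the division by `evidence`, the resulting curve does not have area 1, which is exactly the improper-normalization mistake.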