---
id: wiki-2026-0508-bayes-theorem
title: Bayes' Theorem
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Bayes Rule, Bayes Law, Conditional Probability Inversion]
duplicate_of: none
source_trust_level: A
confidence_score: 0.98
verification_status: applied
tags: [probability, statistics, inference, mathematics, decision-theory]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: SciPy / NumPy
---
# Bayes' Theorem
## One-line summary
> **"P(A|B) = P(B|A) × P(A) / P(B): inverting a conditional probability gives the mathematical foundation for revising beliefs on the basis of evidence."** The result comes from Reverend Thomas Bayes's essay (published posthumously in 1763) and was generalized by Laplace (1774). It sits at the core of modern ML's Bayesian stack: the posterior q(x_{t-1} | x_t, x_0) that diffusion models derive from their noise schedule, the Kalman filter, and LLM uncertainty calibration.
## Core
### Forms
- **Standard**: `P(A|B) = P(B|A) × P(A) / P(B)`
- **Odds form**: `O(A|B) = O(A) × LR` where `LR = P(B|A)/P(B|¬A)`
- **Discrete partition** (mutually exclusive, exhaustive hypotheses H_i): `P(H_i|E) = P(E|H_i)P(H_i) / Σ_j P(E|H_j)P(H_j)`
- **Continuous**: `p(θ|D) = p(D|θ)p(θ) / ∫p(D|θ)p(θ)dθ`
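The discrete-partition and odds forms can be checked against each other numerically. This is a minimal sketch with hypothetical priors and likelihoods; `posterior_over_partition` and `odds_form_posterior` are illustrative names, not from any library. For a two-hypothesis partition (H and ¬H) both forms should give the same posterior.

```python
import numpy as np

def posterior_over_partition(priors, likelihoods):
    """P(H_i | E) for mutually exclusive, exhaustive hypotheses H_i."""
    priors = np.asarray(priors, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    joint = likelihoods * priors       # P(E | H_i) P(H_i)
    evidence = joint.sum()             # P(E) = sum_j P(E | H_j) P(H_j)
    return joint / evidence

def odds_form_posterior(prior, lik_h, lik_not_h):
    """Binary case via O(H|E) = O(H) * LR, converted back to a probability."""
    prior_odds = prior / (1 - prior)
    lr = lik_h / lik_not_h             # LR = P(E|H) / P(E|not H)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)

# Hypothetical numbers: three hypotheses, one observed piece of evidence E
post = posterior_over_partition([0.5, 0.3, 0.2], [0.1, 0.4, 0.8])
print(post, post.sum())

# Binary check: standard form vs. odds form for H vs. not H
std = posterior_over_partition([0.2, 0.8], [0.9, 0.3])[0]
odds = odds_form_posterior(0.2, 0.9, 0.3)
print(std, odds)
```

Dividing by `evidence` is the only step that needs every hypothesis at once; the odds form avoids it by comparing just two.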
### Terminology
- **Prior** P(A): belief before seeing the evidence
- **Likelihood** P(B|A): probability of the evidence given the hypothesis
- **Posterior** P(A|B): belief after seeing the evidence
- **Evidence / Marginal** P(B): normalizing constant, Σ P(B|A_i)P(A_i) over a partition {A_i}
### Applications
1. Medical testing: base-rate-aware diagnosis (the mammography paradox).
2. Spam filtering: Naive Bayes classifier.
3. Search & rescue: updating a posterior heatmap over search areas after each sensor sweep.
4. LLM token sampling: drawing from a temperature-scaled conditional distribution over the vocabulary given the context.
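Item 3 reduces to a one-line Bayes update per sweep. A minimal sketch, assuming a flattened grid of cells with a prior probability for each and a single detection probability `p_detect` for the searched cell; `update_after_failed_search` is a hypothetical helper. An unsuccessful sweep lowers the searched cell's probability, and renormalization shifts that mass to the other cells.

```python
import numpy as np

def update_after_failed_search(prior, searched, p_detect):
    """Posterior over cells after searching `searched` and finding nothing.

    P(not found | target in searched cell) = 1 - p_detect
    P(not found | target in another cell)  = 1
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.ones_like(prior)
    likelihood[searched] = 1.0 - p_detect
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()   # divide by P(not found)

# Hypothetical 2x2 grid (flattened); the sensor detects with probability 0.8
prior = np.array([0.4, 0.3, 0.2, 0.1])
posterior = update_after_failed_search(prior, searched=0, p_detect=0.8)
print(posterior.round(3))
```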
## 💻 Patterns
### Medical test (base rate problem)
```python
def bayes_diagnosis(prevalence: float, sensitivity: float, specificity: float) -> dict:
    """Probability of disease given a positive test result.

    Example: prevalence 1%, test 99% sensitive and 95% specific.
    """
    p_disease = prevalence                    # prior P(D)
    p_pos_given_disease = sensitivity         # P(+ | D)
    p_pos_given_healthy = 1 - specificity     # false-positive rate P(+ | not D)
    # evidence P(+) by the law of total probability
    p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
    p_disease_given_pos = (p_pos_given_disease * p_disease) / p_pos
    return {
        "P(disease | +test)": p_disease_given_pos,
        "P(healthy | +test)": 1 - p_disease_given_pos,
    }

print(bayes_diagnosis(0.01, 0.99, 0.95))  # P(disease | +test) = 1/6 ≈ 16.7%, counter-intuitively low
```
### Naive Bayes spam (log-space)
```python
import numpy as np
from collections import Counter
class NaiveBayesSpam:
def __init__(self, alpha=1.0):
self.alpha = alpha # Laplace smoothing
def fit(self, docs, labels):
self.classes = np.unique(labels)
self.log_prior = {c: np.log((labels == c).mean()) for c in self.classes}
self.vocab = set(w for d in docs for w in d.split())
V = len(self.vocab)
self.log_lik = {}
for c in self.classes:
words = Counter(w for d, l in zip(docs, labels) if l == c for w in d.split())
total = sum(words.values()) + self.alpha * V
self.log_lik[c] = {w: np.log((words.get(w, 0) + self.alpha) / total)
for w in self.vocab}
return self
def predict(self, doc):
scores = {c: self.log_prior[c] + sum(self.log_lik[c].get(w, 0)
for w in doc.split())
for c in self.classes}
return max(scores, key=scores.get)
```
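### Bayesian A/B (conjugate Beta-Binomial posterior, Monte Carlo comparison)
```python
import numpy as np
from scipy import stats

def prob_b_beats_a(a_clicks, a_imp, b_clicks, b_imp, n_samples=100_000, seed=None):
    """Monte Carlo estimate of P(p_B > p_A) under uniform Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    # Conjugate posteriors: Beta(1 + clicks, 1 + impressions - clicks)
    a = stats.beta(1 + a_clicks, 1 + a_imp - a_clicks).rvs(n_samples, random_state=rng)
    b = stats.beta(1 + b_clicks, 1 + b_imp - b_clicks).rvs(n_samples, random_state=rng)
    return (b > a).mean()

print(f"P(B>A) = {prob_b_beats_a(73, 1000, 91, 1010, seed=0):.3f}")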
```
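Because both posteriors are closed-form Beta distributions, P(B > A) can also be computed without sampling. A sketch assuming the same uniform Beta(1, 1) prior and integer counts, using the exact series popularized by Evan Miller ("Formulas for Bayesian A/B Testing"); `prob_b_beats_a_exact` is an illustrative name. It should agree with the Monte Carlo estimate above up to sampling noise.

```python
import numpy as np
from scipy.special import betaln

def prob_b_beats_a_exact(a_clicks, a_imp, b_clicks, b_imp):
    """Exact P(p_B > p_A) for Beta posteriors; requires an integer alpha_B."""
    # Posterior parameters under a uniform Beta(1, 1) prior
    alpha_a, beta_a = 1 + a_clicks, 1 + a_imp - a_clicks
    alpha_b, beta_b = 1 + b_clicks, 1 + b_imp - b_clicks
    i = np.arange(alpha_b)  # series index i = 0 .. alpha_b - 1
    # Each term is computed in log space to avoid overflow/underflow
    log_terms = (betaln(alpha_a + i, beta_a + beta_b)
                 - np.log(beta_b + i)
                 - betaln(1 + i, beta_b)
                 - betaln(alpha_a, beta_a))
    return float(np.exp(log_terms).sum())

print(f"P(B>A) = {prob_b_beats_a_exact(73, 1000, 91, 1010):.3f}")
```

The series has alpha_B terms, so it is cheap for typical click counts; the Monte Carlo version remains useful for quantities with no closed form, such as expected loss.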
### Odds-form rapid update
```python
def odds_update(prior_odds: float, likelihood_ratio: float) -> float:
    """Posterior odds = prior odds × LR. Mental-arithmetic friendly."""
    return prior_odds * likelihood_ratio

# DNA match: prior odds 1:1000, LR = 100,000
print(odds_update(1/1000, 100_000))  # 100.0, i.e. P = 100/101 ≈ 99%
```
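The odds form chains naturally: when several pieces of evidence are conditionally independent given the hypothesis, their likelihood ratios multiply, or equivalently their logs add. A minimal sketch; the helper names and LR values are hypothetical, and the independence assumption is exactly the one the double-counting anti-pattern warns about.

```python
import math

def prob_to_odds(p: float) -> float:
    return p / (1 - p)

def odds_to_prob(o: float) -> float:
    return o / (1 + o)

def chain_updates(prior_prob: float, likelihood_ratios) -> float:
    """Posterior probability after conditionally independent evidence.

    Works in log-odds, where each independent LR contributes an additive term.
    """
    log_odds = math.log(prob_to_odds(prior_prob))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return odds_to_prob(math.exp(log_odds))

# Hypothetical: prior 1%, then two independent positive tests with LR 9 and LR 4
print(round(chain_updates(0.01, [9, 4]), 3))
```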
### Kalman filter (Bayesian, Gaussian)
```python
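def kalman_step(mu, sigma2, z, R, Q):
    """One predict + update step for a scalar random-walk state.

    Everything is Bayesian under the Normal-Normal conjugate pair:
    mu, sigma2 are the prior mean/variance, z is the measurement,
    R is the sensor-noise variance, Q is the process-noise variance.
    """
    # predict: x_t = x_{t-1} + w, w ~ N(0, Q), so the variance grows by Q
    sigma2 = sigma2 + Q
    # update: the Kalman gain K weighs the measurement against the prediction
    K = sigma2 / (sigma2 + R)
    mu = mu + K * (z - mu)
    sigma2 = (1 - K) * sigma2
    return mu, sigma2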
```
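The docstring's claim can be checked directly: the Kalman measurement update is the Normal-Normal conjugate posterior written in gain form. A sketch assuming a scalar state with prior N(mu, sigma2) and a measurement z ~ N(x, R); `kalman_update` mirrors the update half of `kalman_step` above, and `conjugate_normal_update` is a hypothetical helper. Both should return the same mean and variance.

```python
def kalman_update(mu, sigma2, z, R):
    """Gain form, as in the update half of kalman_step."""
    K = sigma2 / (sigma2 + R)
    return mu + K * (z - mu), (1 - K) * sigma2

def conjugate_normal_update(mu, sigma2, z, R):
    """Bayes' theorem with a Normal prior and a Normal likelihood:
    posterior precision = prior precision + measurement precision,
    posterior mean = precision-weighted average of mu and z."""
    post_precision = 1 / sigma2 + 1 / R
    post_var = 1 / post_precision
    post_mean = post_var * (mu / sigma2 + z / R)
    return post_mean, post_var

# Hypothetical numbers: prior N(0, 4), measurement z = 2 with noise variance 1
print(kalman_update(0.0, 4.0, 2.0, 1.0))
print(conjugate_normal_update(0.0, 4.0, 2.0, 1.0))
```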
## Decision criteria
| Situation | Approach |
|---|---|
| Conjugate prior available for the likelihood | closed-form posterior |
| Discrete + small | exact enumeration |
| Continuous + nonconjugate | MCMC (NUTS / HMC) |
| Streaming sensor data | Kalman / particle filter |
| Class imbalance + features | Naive Bayes baseline |

**Default**: for probabilistic classification, start with Naive Bayes (log-space) with Laplace smoothing.
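For the "continuous + nonconjugate" row, the key point is that MCMC needs only the unnormalized posterior p(D|θ)p(θ), so the evidence integral is never computed. A minimal random-walk Metropolis sketch, a simpler stand-in for the NUTS/HMC samplers named in the table; the model (Binomial data with a Normal(0, 1.5²) prior on the log-odds θ), the data, and the tuning values are illustrative assumptions.

```python
import numpy as np

def log_unnormalized_posterior(theta, successes, trials, prior_sd=1.5):
    """log p(D | theta) + log p(theta), up to an additive constant.

    theta is the log-odds of success. A Normal prior on theta is not
    conjugate to the Binomial likelihood, so there is no closed form.
    """
    log_p = -np.logaddexp(0.0, -theta)    # log sigmoid(theta), numerically stable
    log_1mp = -np.logaddexp(0.0, theta)   # log(1 - sigmoid(theta))
    log_lik = successes * log_p + (trials - successes) * log_1mp
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_lik + log_prior

def metropolis(successes, trials, n_steps=20_000, step=0.3, seed=0):
    """Random-walk Metropolis chain over theta."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    current = log_unnormalized_posterior(theta, successes, trials)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = theta + step * rng.normal()   # symmetric Gaussian proposal
        candidate = log_unnormalized_posterior(proposal, successes, trials)
        # Accept with probability min(1, ratio of unnormalized posteriors);
        # the evidence P(D) would cancel in this ratio, so it is never needed.
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        samples[t] = theta
    return samples

# Hypothetical data: 7 successes in 20 trials
draws = metropolis(successes=7, trials=20)
p_draws = 1.0 / (1.0 + np.exp(-draws[2_000:]))  # drop burn-in, map log-odds to p
print(f"posterior mean of p ≈ {p_draws.mean():.3f}")
```

In practice a library sampler such as NUTS adapts the step size automatically; the fixed `step` here is a hand-picked assumption.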
## 🔗 Graph
- Parent: [[Statistical-Analysis]]
- Variants: [[Bayesian-Updating]] · [[Belief-Revision]]
- Applications: [[Item-Item-Collaborative-Filtering]] · [[몬테카를로 시뮬레이션|Monte Carlo simulation]]
- Adjacent: [[Inference-Coupled Persistence]] · [[Multi-agent-System]]
## 🤖 Using with LLMs
**When**: explaining probabilistic reasoning, making base-rate-aware decisions, weighting evidence.
**When not**: when deterministic logic is sufficient; the extra overhead is not worth it.
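## ❌ Anti-patterns
- **Base-rate neglect**: confusing P(B|A) with P(A|B) and so ignoring the prior P(A) (the prosecutor's fallacy).
- **Naive equal prior**: defaulting to a uniform prior and ignoring available domain knowledge.
- **Evidence double-counting**: assuming conditional independence for evidence that is actually dependent, so the same information is counted more than once.
- **Improper normalization**: omitting the evidence integral in the continuous case when a normalized posterior is needed.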
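When a normalized posterior is needed, the evidence ∫p(D|θ)p(θ)dθ has to be computed. A sketch assuming a Binomial likelihood with a uniform prior on θ, where the exact evidence is known to be 1/(n+1), so the numerical integral from `scipy.integrate.quad` can be checked; `evidence` and `normalized_posterior_pdf` are illustrative names.

```python
from scipy import integrate, stats

def evidence(k, n):
    """P(D) = integral over theta of Binom(k | n, theta) * Uniform(theta | 0, 1)."""
    value, _abs_err = integrate.quad(lambda th: stats.binom.pmf(k, n, th), 0.0, 1.0)
    return value

def normalized_posterior_pdf(theta, k, n):
    """p(theta | D) = p(D | theta) p(theta) / P(D); the uniform prior density is 1."""
    return stats.binom.pmf(k, n, theta) / evidence(k, n)

k, n = 7, 20
print(evidence(k, n), 1 / (n + 1))
# With a uniform prior the posterior is Beta(k + 1, n - k + 1)
print(normalized_posterior_pdf(0.35, k, n), stats.beta(k + 1, n - k + 1).pdf(0.35))
```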
## 🧪 Verification / duplicates
- Verified against Jaynes, *Probability Theory: The Logic of Science*, and Pearl, *Causality* (2nd ed.).
- Confidence: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full Bayes' theorem with medical, NB, A/B, Kalman patterns |