---
id: wiki-2026-0508-expectation-maximization
title: Expectation Maximization
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [EM Algorithm, Expectation-Maximization, GMM-EM, Baum-Welch]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [statistics, machine-learning, latent-variables, optimization, probabilistic-models]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: scikit-learn / NumPy / PyTorch
---

# Expectation Maximization

## One-Line Summary

> **"Iterative maximum-likelihood estimation for models with latent variables, alternating an E-step (posterior) with an M-step (parameter update)."** Dempster, Laird, and Rubin (1977) unified the framework: GMMs, HMMs (Baum-Welch), LDA, factor analysis, and missing-data imputation are all instances. Modern variational autoencoders can be read as amortized EM.

## Core

### Algorithm

Goal: maximize log p(X|θ), where X is observed and Z is latent.

- **E-step**: compute the posterior q(Z) = p(Z|X, θ_old).
- **M-step**: θ_new = argmax_θ E_q[log p(X, Z | θ)].
- **Repeat**: until convergence (the likelihood plateaus).

### ELBO interpretation

log p(X|θ) ≥ E_q[log p(X,Z|θ)] - E_q[log q(Z)] = ELBO(q, θ).

- E-step: maximize the ELBO over q. Choosing the exact posterior gives KL(q || p(Z|X,θ)) = 0, so the bound is tight.
- M-step: maximize the ELBO over θ.
- Together these guarantee that the log-likelihood never decreases from one iteration to the next.

### Convergence

- EM converges only to a local optimum, because the likelihood is generally multimodal.
- Multiple random initializations are recommended.
- K-means is the hard-assignment limit of EM for a GMM (Gaussian variance → 0).

### Applications

1. **Gaussian Mixture Models**: clustering with soft assignments.
2. **Hidden Markov Models** (Baum-Welch): speech recognition, NLP, bioinformatics.
3. **Latent Dirichlet Allocation** (variational EM): topic modeling.
4. **Factor analysis / PPCA**: dimensionality reduction.
5. **Missing data imputation**: EM-based maximum-likelihood estimation with incomplete data. MICE, often mentioned alongside it, is a chained-equations method rather than an EM algorithm.
6. **VAE training** (amortized EM): modern deep generative models.
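
Returning to the ELBO interpretation above, the bound can be checked numerically. The sketch below is illustrative only: the 1-D two-component mixture, the parameter values, and the helper names (`log_joint`, `posterior`, `elbo`) are assumptions chosen for the demonstration, not part of any library. It evaluates the ELBO at fixed θ for two choices of q: the exact posterior (the E-step choice) and a uniform q.

```python
import numpy as np

def log_joint(x, pi, mu, var):
    """log p(x_i, z_i = k | theta) for a 1-D Gaussian mixture, shape (n, K)."""
    return (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)

def log_likelihood(x, pi, mu, var):
    """log p(X | theta), using a max-shifted log-sum-exp over components."""
    lj = log_joint(x, pi, mu, var)
    m = lj.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(lj - m).sum(axis=1))).sum())

def posterior(x, pi, mu, var):
    """Exact posterior p(z_i = k | x_i, theta), i.e. the E-step choice of q."""
    lj = log_joint(x, pi, mu, var)
    q = np.exp(lj - lj.max(axis=1, keepdims=True))
    return q / q.sum(axis=1, keepdims=True)

def elbo(x, q, pi, mu, var):
    """ELBO(q, theta) = E_q[log p(X, Z | theta)] - E_q[log q(Z)]."""
    lj = log_joint(x, pi, mu, var)
    safe_q = np.clip(q, 1e-300, None)  # avoid log(0); such terms are multiplied by q = 0
    return float((q * (lj - np.log(safe_q))).sum())

# Hypothetical data and a deliberately imperfect parameter guess theta.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 50)])
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

ll = log_likelihood(x, pi, mu, var)
q_exact = posterior(x, pi, mu, var)
q_uniform = np.full_like(q_exact, 0.5)

elbo_exact = elbo(x, q_exact, pi, mu, var)      # tight: equals log p(X | theta)
elbo_uniform = elbo(x, q_uniform, pi, mu, var)  # loose: strictly below log p(X | theta)

print(f"log-likelihood : {ll:.4f}")
print(f"ELBO (exact q) : {elbo_exact:.4f}")
print(f"ELBO (uniform) : {elbo_uniform:.4f}")
```

The gap between the log-likelihood and the ELBO is the KL divergence from q to the exact posterior, which is why the E-step closes it completely and any other q leaves the bound loose.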
## 💻 Patterns

### GMM-EM (from scratch, NumPy)

```python
import numpy as np
from scipy.special import logsumexp

class GaussianMixtureEM:
    def __init__(self, K, max_iter=100, tol=1e-6):
        self.K = K
        self.max_iter = max_iter
        self.tol = tol

    def fit(self, X):
        n, d = X.shape
        # Init: uniform mixing weights, means drawn from the data,
        # every component starts with the full-data covariance
        self.pi = np.ones(self.K) / self.K
        idx = np.random.choice(n, self.K, replace=False)
        self.mu = X[idx].copy()
        self.sigma = np.array([np.atleast_2d(np.cov(X.T)) for _ in range(self.K)])

        log_lik_old = -np.inf
        for it in range(self.max_iter):
            # E-step: posterior responsibilities γ_ik
            log_resp = self._log_responsibilities(X)  # (n, K), unnormalized
            resp = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
            resp /= resp.sum(axis=1, keepdims=True)

            # M-step: closed-form updates weighted by the responsibilities
            Nk = resp.sum(axis=0)  # (K,)
            self.pi = Nk / n
            self.mu = (resp.T @ X) / Nk[:, None]
            for k in range(self.K):
                diff = X - self.mu[k]
                self.sigma[k] = (resp[:, k:k+1] * diff).T @ diff / Nk[k]
                self.sigma[k] += 1e-6 * np.eye(d)  # regularization against singular covariance

            # Convergence: monitor the log-likelihood, not the parameters
            log_lik = self._log_likelihood(X)
            if abs(log_lik - log_lik_old) < self.tol:
                break
            log_lik_old = log_lik
        return self

    def _log_gaussian(self, X, mu, sigma):
        d = X.shape[1]
        diff = X - mu
        inv = np.linalg.inv(sigma)
        # slogdet is more stable than log(det(sigma)) for small determinants
        _, logdet = np.linalg.slogdet(sigma)
        return -0.5 * (d * np.log(2 * np.pi) + logdet +
                       np.einsum('ni,ij,nj->n', diff, inv, diff))

    def _log_responsibilities(self, X):
        # Unnormalized log responsibilities: log π_k + log N(x | μ_k, Σ_k)
        log_resp = np.zeros((X.shape[0], self.K))
        for k in range(self.K):
            log_resp[:, k] = np.log(self.pi[k] + 1e-12) + \
                self._log_gaussian(X, self.mu[k], self.sigma[k])
        return log_resp

    def _log_likelihood(self, X):
        log_resp = self._log_responsibilities(X)
        return logsumexp(log_resp, axis=1).sum()

# Demo
np.random.seed(42)
X1 = np.random.randn(100, 2) + np.array([5, 0])
X2 = np.random.randn(100, 2) + np.array([-5, 0])
X = np.vstack([X1, X2])
model = GaussianMixtureEM(K=2).fit(X)
print(f"Means:\n{model.mu}")
print(f"Mixing:\n{model.pi}")
```

### scikit-learn (production)

```python
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=3, covariance_type='full',
                      max_iter=100, n_init=10, random_state=42)
gmm.fit(X)
print(f"Converged: {gmm.converged_}")
print(f"BIC: {gmm.bic(X):.2f}")  # model selection
labels = gmm.predict(X)
proba = gmm.predict_proba(X)  # soft assignment
```

### Baum-Welch (HMM, an instance of EM)

```python
import numpy as np

def baum_welch(observations, n_states, n_iter=100):
    """Forward-backward (E-step) plus EM updates (M-step) for a discrete HMM."""
    observations = np.asarray(observations)  # the boolean mask below needs an array
    T = len(observations)
    n_symbols = observations.max() + 1
    pi = np.ones(n_states) / n_states
    A = np.random.rand(n_states, n_states)
    A /= A.sum(axis=1, keepdims=True)
    B = np.random.rand(n_states, n_symbols)
    B /= B.sum(axis=1, keepdims=True)

    for it in range(n_iter):
        # E-step: forward α and backward β, rescaled at every step to avoid
        # underflow on long sequences. The per-step scale factors cancel
        # when γ and ξ are normalized below.
        alpha = np.zeros((T, n_states))
        alpha[0] = pi * B[:, observations[0]]
        alpha[0] /= alpha[0].sum()
        for t in range(1, T):
            alpha[t] = (alpha[t-1] @ A) * B[:, observations[t]]
            alpha[t] /= alpha[t].sum()
        beta = np.zeros((T, n_states))
        beta[T-1] = 1
        for t in range(T-2, -1, -1):
            beta[t] = A @ (B[:, observations[t+1]] * beta[t+1])
            beta[t] /= beta[t].sum()

        # γ_t(i): state posterior; ξ_t(i,j): transition posterior
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((T-1, n_states, n_states))
        for t in range(T-1):
            num = alpha[t][:, None] * A * B[:, observations[t+1]] * beta[t+1]
            xi[t] = num / num.sum()

        # M-step: expected counts normalized into probabilities
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(B.shape[1]):
            mask = (observations == k)
            B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return pi, A, B
```

### VAE: amortized variational EM

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.enc_mu = nn.Linear(input_dim, latent_dim)
        self.enc_logvar = nn.Linear(input_dim, latent_dim)
        self.dec = nn.Linear(latent_dim, input_dim)

    def forward(self, x):
        # E-step approximation: q(z|x) = N(μ_φ(x), σ²_φ(x))
        mu = self.enc_mu(x)
        logvar = self.enc_logvar(x)
        # Reparameterization trick: z = μ + σ · ε keeps sampling differentiable
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        x_recon = torch.sigmoid(self.dec(z))
        return x_recon, mu, logvar

def vae_loss(x, x_recon, mu, logvar):
    # Negative ELBO: reconstruction term plus KL(q(z|x) || N(0, I)).
    # binary_cross_entropy assumes x takes values in [0, 1].
    recon = nn.functional.binary_cross_entropy(x_recon, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
# SGD optimizes encoder and decoder jointly, amortizing the E and M steps
```

### MAP-EM (with a prior, regularized)

```python
# With a prior added, each iteration monotonically increases the log-posterior
# instead of the log-likelihood.
# Example: a GMM with a Dirichlet prior on π and a NIW prior on (μ, Σ).
# sklearn's BayesianGaussianMixture places priors of this kind on the parameters,
# but fits them with variational inference rather than MAP-EM proper.
from sklearn.mixture import BayesianGaussianMixture

bgmm = BayesianGaussianMixture(n_components=10, weight_concentration_prior=1e-2)
bgmm.fit(X)
# A small weight_concentration_prior pushes the weights of unneeded components
# toward zero, so the effective K is sparsified automatically.
```

## Decision Criteria

| Situation | Variant |
|---|---|
| Standard mixture clustering | Vanilla EM (sklearn) |
| Sequential / temporal | Baum-Welch (HMM) |
| Topic modeling | Variational EM (LDA) |
| Scalable / online | Online EM, stochastic |
| Deep latent model | VAE (amortized) |
| Need MAP / regularization | MAP-EM, Bayesian-EM |
| Hard assignment baseline | K-means (degenerate limit of EM) |
| Discrete latent | Categorical EM |

**Default**: for GMM clustering, sklearn `GaussianMixture(n_init=10)`; for deep latent models, a VAE.

## 🔗 Graph

- Applications: [[VAE]]
- Adjacent: [[Variational-Inference]] · [[Maximum-A-Posteriori]] · [[K-Means-Clustering-Foundations]] · [[Baum-Welch]]

## 🤖 LLM Usage

**When**: walking through a derivation, explaining the ELBO, advising on model selection (BIC), troubleshooting (e.g., singular covariance).

**When not**: large-scale fitting (use sklearn or a dedicated library). Diagnosing a numerical issue requires inspecting the actual data.

## ❌ Anti-patterns

- **Single random init**: local-optimum trap; n_init=10 is recommended.
- **Ignoring singular covariance**: regularizing with sigma += εI is required.
- **Monitoring parameters instead of the likelihood for convergence**: wrong; monitor the likelihood / ELBO.
- **Choosing K arbitrarily**: use BIC / AIC / cross-validation.
- **Treating K-means and GMM results as comparable**: they differ; a GMM adds soft assignments and per-component covariance.
- **Assuming EM reaches the global optimum**: it reaches only a local optimum; multi-start is required.
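
The first and third anti-patterns above (a single random init, and monitoring parameters instead of the likelihood) can be illustrated together. The following is a minimal NumPy sketch under stated assumptions: the 1-D three-cluster data, the helper name `em_1d`, the `1e-6` variance floor, and the choice of ten seeds are all hypothetical and chosen only for the demonstration. Each run records its log-likelihood trace, and the restart with the highest final log-likelihood is kept.

```python
import numpy as np

def em_1d(x, K, seed, n_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture with EM from one random init.

    Returns (final log-likelihood, per-iteration log-likelihood trace).
    """
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = rng.choice(x, size=K, replace=False)  # means drawn from the data
    var = np.full(K, x.var())
    trace = []
    for _ in range(n_iter):
        # log p(x_i, z_i = k | theta), shape (n, K)
        log_joint = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                     - 0.5 * (x[:, None] - mu) ** 2 / var)
        m = log_joint.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        trace.append(float(log_norm.sum()))
        # Convergence is judged on the log-likelihood, not on parameter change
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        resp = np.exp(log_joint - log_norm)   # E-step: responsibilities
        Nk = resp.sum(axis=0)                 # M-step: weighted moments
        pi = Nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-6  # variance floor
    return trace[-1], trace

# Hypothetical data: three 1-D clusters.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4.0, 1.0, 100),
                    rng.normal(0.0, 0.5, 100),
                    rng.normal(5.0, 1.0, 100)])

# Multi-start: run EM from several seeds and keep the best final log-likelihood.
runs = [em_1d(x, K=3, seed=s) for s in range(10)]
final_lls = [ll for ll, _ in runs]
best_seed = int(np.argmax(final_lls))
best_ll = final_lls[best_seed]

print(f"final log-likelihoods: {np.round(final_lls, 2)}")
print(f"best seed: {best_seed}, best log-likelihood: {best_ll:.2f}")
```

Different seeds may or may not land in different local optima on a given dataset; the point of the restart loop is that choosing by final log-likelihood is safe either way. scikit-learn's `n_init` parameter serves the same purpose.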
## 🧪 Verification / Duplicates

- Verified (Dempster-Laird-Rubin 1977, Bishop "PRML" Ch9, Murphy "Probabilistic ML" Ch11).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: algorithm, ELBO, GMM/HMM/VAE applications, NumPy from-scratch |