---
id: wiki-2026-0508-boltzmann-machines
title: Boltzmann Machines
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [볼츠만 머신, RBM, restricted Boltzmann machine, deep belief network, energy-based model, contrastive divergence]
duplicate_of: none
source_trust_level: A
confidence_score: 0.88
verification_status: applied
tags: [boltzmann-machine, rbm, energy-based-model, deep-learning-history, hinton, contrastive-divergence]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyTorch / scikit-learn
---

# Boltzmann Machines

## 📌 One-Line Insight

> **"An energy model of the data distribution."** Inspired by the Boltzmann distribution of statistical mechanics. The spark for deep learning (Hinton's 2006 RBM pre-training). Today: the basis of energy-based models (EBMs), with connections to score matching and diffusion models.

## 📖 Core

### History

- 1985: Hinton & Sejnowski introduce the Boltzmann Machine.
- 2002: Hinton introduces Contrastive Divergence (CD) training.
- 2006: Hinton's "A Fast Learning Algorithm for Deep Belief Nets" revives deep learning.
- 2007-2012: layer-wise pre-training keeps deep networks trainable in the run-up to the 2012 ImageNet breakthrough.
- 2010s: superseded by end-to-end backprop with ReLU and dropout.
- 2020s: revival of energy-based models (Du, LeCun).

### Architecture

#### Vanilla Boltzmann Machine

- Every unit is connected to every other unit.
- Units are split into visible and hidden.
- Hard to train: exact learning is intractable.

#### RBM (Restricted)

- No connections within the same layer.
- Only visible ↔ hidden connections.
- Efficient sampling: given one layer, the units of the other layer are conditionally independent.

#### DBN (Deep Belief Network)

- A stack of RBMs.
- Layer-wise pre-training.

#### DBM (Deep Boltzmann Machine)

- All layers are connected bidirectionally.
- Hard to train.

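### Energy formulation

For an RBM with visible units $v$, hidden units $h$, visible biases $a$, hidden biases $b$, and weights $W$:

$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j$$

$$P(v, h) = \frac{e^{-E(v, h)}}{Z}$$

- $Z$ is the partition function, the sum of $e^{-E(v, h)}$ over all joint states. It is intractable for models of realistic size.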
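
To make the cost of the partition function concrete, here is a minimal NumPy sketch that computes $Z$ by brute force for a toy RBM. The helper names `energy` and `partition_function`, the sizes (3 visible, 2 hidden units), and the random parameters are all assumptions chosen for illustration. The double loop visits $2^{n_v + n_h}$ states, which is why this only works for tiny models.

```python
import itertools

import numpy as np


def energy(v, h, a, b, W):
    """E(v, h) = -a.v - b.h - v^T W h for binary vectors v and h."""
    return -(a @ v) - (b @ h) - (v @ W @ h)


def partition_function(a, b, W):
    """Sum exp(-E) over all 2^(n_v + n_h) joint states (tiny models only)."""
    n_v, n_h = W.shape
    total = 0.0
    for v_bits in itertools.product([0.0, 1.0], repeat=n_v):
        for h_bits in itertools.product([0.0, 1.0], repeat=n_h):
            total += np.exp(-energy(np.array(v_bits), np.array(h_bits), a, b, W))
    return total


# Hypothetical toy parameters: 3 visible units, 2 hidden units.
rng = np.random.default_rng(0)
n_v, n_h = 3, 2
a, b = rng.normal(size=n_v), rng.normal(size=n_h)
W = rng.normal(size=(n_v, n_h))

Z = partition_function(a, b, W)
v, h = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0])
p = np.exp(-energy(v, h, a, b, W)) / Z
print(f"Z = {Z:.4f}, P(v, h) = {p:.4f}")
```

With all parameters set to zero every state has energy 0, so $Z$ is simply the number of states (32 here), which is a handy sanity check.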

### Training: Contrastive Divergence (CD-k)

1. Start from a data vector v0.
2. Sample h0 from P(h | v0).
3. Run k steps of Gibbs sampling: sample v from P(v | h), then h from P(h | v), ending at (vk, hk). For CD-1 this is v1 and h1.
4. Update: ΔW = lr * (v0 h0^T - vk hk^T).
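
As background for the update rule above, the following worked equations sketch where it comes from, using the symbols of the energy function $E(v, h)$ ($a$, $b$, $W$). The learning rate symbol $\eta$ and the superscript notation for Gibbs steps are notational assumptions. Because the RBM has no within-layer connections, both conditionals factorize into sigmoids:

$$P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big)$$

The exact log-likelihood gradient is a difference of two expectations:

$$\frac{\partial \log P(v)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$$

The model expectation requires samples from the equilibrium distribution, which is the expensive part. CD-k replaces it with the state reached after $k$ Gibbs steps started at the data:

$$\Delta W_{ij} \approx \eta \left( v_i^{(0)} h_j^{(0)} - v_i^{(k)} h_j^{(k)} \right)$$

This is an approximation, not the true gradient, so the estimate is biased for any finite $k$.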

### Modern relevance

- **Energy-Based Model (EBM)**: advocated by LeCun.
- **Score matching**: learns the gradient of the log-density (the score) instead of the density itself; a basis for diffusion models.
- **Diffusion model** (DDPM): can be viewed as a variant of the EBM idea.
- **GAN**: can be read as an implicit EBM.
- **JEM** (Joint Energy Model): reframes a classifier as an EBM.
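
The link between energies and scores can be checked numerically. Since $\log p(x) = -E(x) - \log Z$ and $Z$ does not depend on $x$, the score is $\nabla_x \log p(x) = -\nabla_x E(x)$. The sketch below, assuming PyTorch and two hypothetical helpers (`score_from_energy`, `gaussian_energy`), computes the score by automatic differentiation for a standard-normal energy, where the answer is known to be $-x$.

```python
import torch


def score_from_energy(energy_fn, x):
    """Return -dE/dx, which equals grad_x log p(x) because Z does not depend on x."""
    x = x.detach().requires_grad_(True)
    total_energy = energy_fn(x).sum()
    (grad,) = torch.autograd.grad(total_energy, x)
    return -grad


def gaussian_energy(x):
    # Standard normal up to an additive constant: E(x) = 0.5 * ||x||^2
    return 0.5 * (x ** 2).sum(dim=-1)


x = torch.tensor([[1.0, -2.0], [0.5, 0.0]])
score = score_from_energy(gaussian_energy, x)
print(score)  # matches -x for this particular energy
```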

### Modern applications

- **Anomaly detection**: low energy = normal, so high energy flags an outlier.
- **Generative model** (legacy): collaborative filtering.
- **Recommender**: RBMs were used in the Netflix Prize.
- **Pre-training** (legacy, mostly replaced).
- **Quantum Boltzmann machines** (quantum computing).

### RBM vs modern alternatives

| Aspect | RBM | Modern |
|---|---|---|
| Density estimation | weak | Diffusion / Flow |
| Pre-training | weak | Self-supervised |
| Generation | OK | GAN / Diffusion |
| Tractability | hard | tractable (in specific model families) |

→ Historical importance outweighs current usage.

## 💻 Patterns

### RBM (scikit-learn)

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

# Scale pixel values from [0, 16] to [0, 1], the range BernoulliRBM expects.
X = load_digits().data / 16.0

rbm = BernoulliRBM(n_components=64, learning_rate=0.06, n_iter=20)
rbm.fit(X)

# Hidden-unit activation probabilities P(h = 1 | v) for the first sample.
hidden = rbm.transform(X[:1])
print(hidden.shape)  # (1, 64)

# Reconstruction: one Gibbs step v -> h -> v' returns a sampled binary vector.
recon = rbm.gibbs(X[:1]).astype(float)
print(np.mean((X[:1] - recon) ** 2))  # mean squared reconstruction error
```

### RBM (PyTorch from scratch)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RBM(nn.Module):
    def __init__(self, n_visible, n_hidden):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_hidden, n_visible) * 0.01)
        self.v_bias = nn.Parameter(torch.zeros(n_visible))
        self.h_bias = nn.Parameter(torch.zeros(n_hidden))

    def sample_h(self, v):
        # P(h = 1 | v) and a Bernoulli sample drawn from it.
        p_h = torch.sigmoid(F.linear(v, self.W, self.h_bias))
        return p_h, torch.bernoulli(p_h)

    def sample_v(self, h):
        # P(v = 1 | h) and a Bernoulli sample drawn from it.
        p_v = torch.sigmoid(F.linear(h, self.W.t(), self.v_bias))
        return p_v, torch.bernoulli(p_v)

    def free_energy(self, v):
        # F(v) = -v . v_bias - sum_j softplus(W v + h_bias)_j
        wx_b = F.linear(v, self.W, self.h_bias)
        return -torch.sum(F.softplus(wx_b), dim=1) - v @ self.v_bias


@torch.no_grad()
def cd_k(rbm, v0, k=1, lr=0.01):
    """One Contrastive Divergence (CD-k) update on a batch v0 (k >= 1)."""
    p_h0, h = rbm.sample_h(v0)

    # k steps of block Gibbs sampling, starting from the data.
    for _ in range(k):
        _, vk = rbm.sample_v(h)
        p_hk, h = rbm.sample_h(vk)

    # Negated CD estimate (positive phase minus negative phase), so that
    # a descent step increases the approximate log-likelihood.
    rbm.W.grad = -((p_h0.t() @ v0 - p_hk.t() @ vk) / v0.size(0))
    rbm.v_bias.grad = -((v0 - vk).mean(0))
    rbm.h_bias.grad = -((p_h0 - p_hk).mean(0))

    # Plain SGD step on the gradients set above.
    for p in rbm.parameters():
        p -= lr * p.grad
```
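
The free energy is also useful for training: the gradient of `free_energy(v0).mean() - free_energy(vk).mean()` with respect to the parameters equals the CD-k direction, so autograd and a standard optimizer can do the bookkeeping. The following self-contained sketch illustrates that variant. `TinyRBM`, `train_rbm`, the random toy data, and every hyperparameter are assumptions for illustration, not a tuned recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRBM(nn.Module):
    """Hypothetical minimal Bernoulli RBM used only for this sketch."""

    def __init__(self, n_visible, n_hidden):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_hidden, n_visible) * 0.01)
        self.v_bias = nn.Parameter(torch.zeros(n_visible))
        self.h_bias = nn.Parameter(torch.zeros(n_hidden))

    def free_energy(self, v):
        hidden_term = F.softplus(F.linear(v, self.W, self.h_bias)).sum(dim=1)
        return -hidden_term - v @ self.v_bias

    @torch.no_grad()
    def gibbs_step(self, v):
        # One block Gibbs step v -> h -> v', with no gradient tracking.
        h = torch.bernoulli(torch.sigmoid(F.linear(v, self.W, self.h_bias)))
        return torch.bernoulli(torch.sigmoid(F.linear(h, self.W.t(), self.v_bias)))


def train_rbm(data, n_hidden=16, k=1, epochs=20, batch_size=32, lr=0.05, seed=0):
    torch.manual_seed(seed)
    rbm = TinyRBM(data.size(1), n_hidden)
    opt = torch.optim.SGD(rbm.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        perm = torch.randperm(data.size(0))
        for start in range(0, data.size(0), batch_size):
            v0 = data[perm[start:start + batch_size]]
            vk = v0
            for _ in range(k):
                vk = rbm.gibbs_step(vk)
            # Surrogate loss whose gradient is the (negated) CD-k update.
            loss = rbm.free_energy(v0).mean() - rbm.free_energy(vk).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            losses.append(loss.item())
    return rbm, losses


# Toy data (assumption): random binary vectors standing in for real inputs.
data = torch.bernoulli(torch.full((256, 20), 0.3))
rbm, losses = train_rbm(data)
print(len(losses), rbm.W.shape)
```

The surrogate loss is not a likelihood and can be negative or noisy, so it is better monitored alongside reconstruction error than read as a convergence measure on its own.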

### Energy-Based Model (modern)

```python
import torch
import torch.nn as nn


class EBM(nn.Module):
    """Energy F(x) parameterized by an MLP."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def energy(self, x):
        # One scalar energy per sample: shape (batch,).
        return self.net(x).squeeze(-1)


def langevin_sample(ebm, x, n_steps=100, step_size=0.1, noise=0.01):
    """Draw approximate samples from the EBM with Langevin dynamics."""
    x = x.detach().requires_grad_()
    for _ in range(n_steps):
        e = ebm.energy(x).sum()
        grad = torch.autograd.grad(e, x)[0]
        # Step downhill in energy, then add Gaussian noise.
        x = x - step_size * grad + noise * torch.randn_like(x)
        x = x.detach().requires_grad_()
    return x.detach()
```
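
A quick way to build intuition for Langevin sampling is to run it on an energy whose distribution is known. The sketch below uses $E(x) = \tfrac{1}{2}\lVert x \rVert^2$ (a standard normal) and the textbook noise scale $\sqrt{2\,\text{step size}}$, which differs from the independent `noise` parameter used in `langevin_sample`. The function name `langevin_gaussian_demo`, the chain count, and the step size are assumptions for illustration.

```python
import torch


def langevin_gaussian_demo(n_chains=2000, n_steps=200, step_size=0.1, seed=0):
    """Unadjusted Langevin dynamics on E(x) = 0.5 * ||x||^2."""
    torch.manual_seed(seed)
    x = torch.full((n_chains, 2), 3.0)  # start every chain far from the mode
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        energy = 0.5 * (x ** 2).sum()
        (grad,) = torch.autograd.grad(energy, x)
        noise = torch.randn_like(x)
        # Gradient step on the energy plus noise scaled by sqrt(2 * step_size).
        x = x - step_size * grad + (2 * step_size) ** 0.5 * noise
    return x.detach()


samples = langevin_gaussian_demo()
print(samples.mean().item(), samples.var().item())
```

The sample mean should land near 0 and the variance near 1. The discretization leaves a small upward bias in the variance that shrinks as the step size is reduced.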
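
### Diffusion model (related to EBMs)

```python
import torch
import torch.nn.functional as F

# DDPM training sketch: noise a clean sample x0 to step t, then predict that noise.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear schedule (assumed)
noise_schedule = torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha_bar_t


def diffusion_train(model, x0, T=T):
    t = torch.randint(0, T, (x0.size(0),))
    noise = torch.randn_like(x0)
    # Reshape to (B, 1, ..., 1) so alpha_bar broadcasts over x0's feature dims.
    alpha_bar = noise_schedule[t].view(-1, *([1] * (x0.dim() - 1)))
    xt = torch.sqrt(alpha_bar) * x0 + torch.sqrt(1 - alpha_bar) * noise

    pred_noise = model(xt, t)
    return F.mse_loss(pred_noise, noise)
```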

### Anomaly detection (EBM)

```python
def is_anomaly(ebm, x, threshold):
    """High energy means the input is unusual under the model."""
    # Returns a boolean tensor with one flag per sample in the batch.
    return ebm.energy(x) > threshold
```
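
The `threshold` argument has to come from somewhere. One common choice, shown here as an assumption and not as a prescribed method, is a high quantile of the energies measured on data known to be normal. The helper names `fit_energy_threshold` and `flag_anomalies` are hypothetical, and the energies are stand-in values, where in practice they would come from `ebm.energy(x)`.

```python
import torch


def fit_energy_threshold(normal_energies, quantile=0.99):
    """Pick the threshold as a high quantile of energies on known-normal data."""
    return torch.quantile(normal_energies, quantile).item()


def flag_anomalies(energies, threshold):
    """Boolean mask: True where the energy exceeds the threshold."""
    return energies > threshold


# Stand-in energies (assumption): evenly spaced values in [0, 1].
normal_energies = torch.linspace(0.0, 1.0, steps=101)
threshold = fit_energy_threshold(normal_energies, quantile=0.95)
flags = flag_anomalies(torch.tensor([0.2, 0.99, 3.0]), threshold)
print(threshold, flags.tolist())
```

The quantile sets the false-positive rate on normal data directly: a 0.95 quantile flags roughly 5% of normal inputs.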

## 🤔 Decision Criteria

| Situation | Approach |
|---|---|
| Modern generative | Diffusion / GAN |
| Anomaly detection | EBM / Autoencoder |
| Historical study | RBM / DBN |
| Quantum | Quantum Boltzmann |
| Pre-training | Self-supervised (BERT, MAE) |
| Sparse coding | Sparse autoencoder |

**Default**: study Boltzmann machines for historical understanding; use a modern alternative in production.

## 🔗 Graph

- Parent: [[Generative-AI|Generative-Models]]
- Variant: [[RBM]]
- Applications: [[Diffusion-Model]] · [[Generative-Adversarial-Networks|GAN]] · [[Anomaly-Detection]]
- Adjacent: [[Contrastive-Divergence]] · [[Auto-Encoding]] · [[Bayesian-Brain-Hypothesis]]

## 🤖 LLM Usage

**When**: deep learning history, understanding EBMs, anomaly detection, the connection to diffusion models.

**When not**: production generative modeling (use diffusion), production pre-training (use SSL).

## ❌ Anti-patterns

- **Expecting an RBM to hold up in production**: outdated.
- **Using RBM pre-training in the BERT era**: use SSL instead.
- **Trying to compute Z (the partition function) exactly**: intractable.
- **Treating single-step CD (CD-1) as an exact gradient**: it is a biased estimator.
- **Feeding continuous data to a binary RBM**: wrong model, since Bernoulli visible units assume binary or [0, 1]-scaled inputs.

## 🧪 Verification / Duplicates

- Verified (Hinton 2002 CD, 2006 DBN paper).
- Trust level A.
- Related: [[Diffusion-Model]] · [[Energy-Based-Models]] · [[Auto-Encoding]] · [[Self-Supervised-Learning]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: RBM + EBM + diffusion connection + PyTorch / sklearn code |