"An energy-based model of a data distribution." Inspired by the Boltzmann distribution from statistical mechanics. It sparked the deep learning revival (Hinton's 2006 RBM pre-training). Today it matters mainly as the foundation of energy-based models (EBMs), with connections to score matching and diffusion.
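For reference, the link between energy and probability can be written out. This is the standard Boltzmann form together with the usual binary RBM energy and its free energy; the symbols ($v$, $h$, $W$, $b$, $c$, $Z$) are notation chosen for this sketch, not taken from the note.

```latex
\begin{aligned}
p(x) &= \frac{e^{-E(x)}}{Z}, \qquad Z = \sum_{x'} e^{-E(x')} \\
E(v, h) &= -b^\top v - c^\top h - h^\top W v \\
F(v) &= -\log \sum_{h} e^{-E(v, h)}
      = -b^\top v - \sum_{j} \log\left(1 + e^{\,c_j + W_{j:} v}\right)
\end{aligned}
```

Low energy means high probability. The partition function $Z$ is what makes exact likelihoods expensive, which is why training relies on approximations such as Contrastive Divergence.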
📖 Key points
History
1985: Hinton & Sejnowski's Boltzmann Machine.
2002: Hinton's Contrastive Divergence (CD) training.
2006: Hinton's "A Fast Learning Algorithm for Deep Belief Nets", the revival of deep learning.
2007-2012: layer-wise pre-training paves the way to the ImageNet breakthrough.
2010s: backprop + ReLU + dropout supersede pre-training.
2020s: revival of energy-based models (Du, LeCun).
Score matching: learns the gradient of the log-density (the score); the basis of diffusion models.
Diffusion model (DDPM): can be viewed as a variant of an EBM.
GAN: an implicit EBM.
JEM (Joint Energy-based Model): reframes a classifier as an EBM.
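The JEM reframing can be sketched in a few lines: the logits $f(x)$ of a classifier define an energy $E(x) = -\log \sum_y \exp f(x)[y]$, while a softmax over the same logits still gives $p(y \mid x)$. The function name `jem_energy` and the toy linear classifier are illustrative assumptions.

```python
import torch
import torch.nn as nn


def jem_energy(logits: torch.Tensor) -> torch.Tensor:
    """Energy implied by classifier logits: E(x) = -logsumexp_y f(x)[y]."""
    return -torch.logsumexp(logits, dim=-1)


torch.manual_seed(0)
classifier = nn.Linear(8, 3)          # toy stand-in for any classifier
x = torch.randn(4, 8)                 # batch of 4 inputs
logits = classifier(x)

energy = jem_energy(logits)           # shape (4,): unnormalized -log p(x)
class_probs = logits.softmax(dim=-1)  # shape (4, 3): the usual p(y | x)
```

Shifting all logits by a constant leaves `class_probs` unchanged but shifts the energy by the same amount; that unused degree of freedom is what JEM trains to model $p(x)$.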
Modern applications
Anomaly detection: low energy = normal, high energy = anomalous.
Generative model (legacy): collaborative filtering.
Recommender (the RBMs used in the Netflix Prize).
Pre-training (legacy, mostly replaced).
Quantum Boltzmann machines (quantum computing).
RBM vs. modern alternatives

| Aspect | RBM | Modern |
| --- | --- | --- |
| Density estimation | weak | Diffusion / Flow |
| Pre-training | weak | Self-supervised |
| Generation | OK | GAN / Diffusion |
| Tractability | hard | tractable (in specific cases) |

→ Historical importance > current usage.
💻 Patterns
RBM (scikit-learn)
    from sklearn.datasets import load_digits
    from sklearn.neural_network import BernoulliRBM

    # Scale pixel intensities from [0, 16] to [0, 1], the range BernoulliRBM expects
    X = load_digits().data / 16.0

    rbm = BernoulliRBM(n_components=64, learning_rate=0.06, n_iter=20)
    rbm.fit(X)

    # Hidden-unit activation probabilities for the first sample
    hidden = rbm.transform(X[:1])
    print(hidden.shape)  # (1, 64)
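A reconstruction step can be sketched with `BernoulliRBM.gibbs`, which runs one visible → hidden → visible sampling pass, and `score_samples`, which returns a pseudo-likelihood proxy. The small `n_iter`, the fixed `random_state`, and the variable names are choices made for this sketch only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

X = load_digits().data / 16.0  # pixel intensities scaled to [0, 1]

rbm = BernoulliRBM(n_components=64, learning_rate=0.06, n_iter=5, random_state=0)
rbm.fit(X)

# One Gibbs step: sample hidden units given v, then resample the visible units.
# The result is a binary sample, converted here to 0/1 floats.
recon = rbm.gibbs(X[:5]).astype(np.float64)

# Pseudo-likelihood per sample (a stochastic proxy; higher means more likely)
scores = rbm.score_samples(X[:5])

print(recon.shape, scores.shape)
```

Because `gibbs` samples rather than returning probabilities, repeated calls give different reconstructions; averaging several passes gives a smoother picture.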
RBM (PyTorch from scratch)
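    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class RBM(nn.Module):
        def __init__(self, n_visible, n_hidden):
            super().__init__()
            self.W = nn.Parameter(torch.randn(n_hidden, n_visible) * 0.01)
            self.v_bias = nn.Parameter(torch.zeros(n_visible))
            self.h_bias = nn.Parameter(torch.zeros(n_hidden))

        def sample_h(self, v):
            # p(h = 1 | v) and a Bernoulli sample drawn from it
            p_h = torch.sigmoid(F.linear(v, self.W, self.h_bias))
            return p_h, torch.bernoulli(p_h)

        def sample_v(self, h):
            # p(v = 1 | h) and a Bernoulli sample drawn from it
            p_v = torch.sigmoid(F.linear(h, self.W.t(), self.v_bias))
            return p_v, torch.bernoulli(p_v)

        def free_energy(self, v):
            # F(v) = -v . v_bias - sum_j softplus(W_j v + h_bias_j)
            wx_b = F.linear(v, self.W, self.h_bias)
            return -torch.sum(F.softplus(wx_b), dim=1) - v @ self.v_bias


    @torch.no_grad()
    def cd_k(rbm, v0, k=1):
        """Contrastive Divergence: write the CD-k gradient estimate into .grad.

        Apply it afterwards with an optimizer step (e.g. torch.optim.SGD).
        """
        p_h0, _ = rbm.sample_h(v0)
        vk = v0
        for _ in range(k):
            _, h = rbm.sample_h(vk)
            _, vk = rbm.sample_v(h)
        p_hk, _ = rbm.sample_h(vk)

        # Negated (data statistics - model statistics), because optimizers minimize
        n = v0.size(0)
        rbm.W.grad = -(p_h0.t() @ v0 - p_hk.t() @ vk) / n
        rbm.v_bias.grad = -(v0 - vk).mean(0)
        rbm.h_bias.grad = -(p_h0 - p_hk).mean(0)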
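The three gradient assignments in `cd_k` correspond to the usual CD-k approximation of the log-likelihood gradient, written here for a batch of $N$ samples. The notation is an assumption of this sketch: $v^{(0)}$ is a data vector, $v^{(k)}$ the sample after $k$ Gibbs steps, $\hat h = p(h = 1 \mid v)$, and $b$, $c$ are the visible and hidden biases.

```latex
\begin{aligned}
\Delta W &\propto \frac{1}{N} \sum_{n=1}^{N}
  \left( \hat h_n^{(0)} \, v_n^{(0)\top} - \hat h_n^{(k)} \, v_n^{(k)\top} \right) \\
\Delta b &\propto \frac{1}{N} \sum_{n=1}^{N} \left( v_n^{(0)} - v_n^{(k)} \right) \\
\Delta c &\propto \frac{1}{N} \sum_{n=1}^{N} \left( \hat h_n^{(0)} - \hat h_n^{(k)} \right)
\end{aligned}
```

The code stores the negatives of these quantities in `.grad`, since gradient-descent optimizers step against the stored gradient.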
Energy-Based Model (modern)
    import torch
    import torch.nn as nn


    class EBM(nn.Module):
        """Energy function E(x) parameterized by an MLP."""

        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )

        def energy(self, x):
            return self.net(x).squeeze(-1)  # shape: (batch,)


    def langevin_sample(ebm, x, n_steps=100, step_size=0.1, noise=0.01):
        """Draw approximate samples from the EBM with Langevin dynamics."""
        x = x.detach().requires_grad_()
        for _ in range(n_steps):
            e = ebm.energy(x).sum()
            grad = torch.autograd.grad(e, x)[0]
            # Step downhill in energy, then add Gaussian noise
            x = x - step_size * grad + noise * torch.randn_like(x)
            x = x.detach().requires_grad_()
        return x.detach()
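A quick sanity check for a Langevin sampler is to run it on an energy whose distribution is known. The sketch below uses $E(x) = \tfrac{1}{2}\lVert x \rVert^2$ (a standard Gaussian) and the textbook noise scale $\sqrt{2\epsilon}$ instead of the separate `noise` parameter used above; `quadratic_energy` and `langevin` are hypothetical names.

```python
import math
import torch


def quadratic_energy(x: torch.Tensor) -> torch.Tensor:
    """E(x) = 0.5 * ||x||^2, so exp(-E) is proportional to a standard Gaussian."""
    return 0.5 * (x ** 2).sum(dim=-1)


def langevin(energy_fn, x, n_steps=500, step_size=0.01):
    """Unadjusted Langevin: x <- x - eps * grad E(x) + sqrt(2 * eps) * noise."""
    x = x.detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        grad = torch.autograd.grad(energy_fn(x).sum(), x)[0]
        step = -step_size * grad + math.sqrt(2 * step_size) * torch.randn_like(x)
        x = (x + step).detach()
    return x


torch.manual_seed(0)
init = 3.0 * torch.randn(2000, 2)   # deliberately too spread out
samples = langevin(quadratic_energy, init)
print(samples.mean().item(), samples.var().item())  # should be near 0 and 1
```

If the mean and variance do not approach the known values, the step size or number of steps is likely off. The same check is not available for a learned energy, which is why practical EBM samplers are tuned empirically.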
Diffusion model (related EBM)
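    import torch
    import torch.nn.functional as F

    # DDPM sketch: add noise, then train the model to predict it (reverse process).
    # Assumption: linear beta schedule; noise_schedule[t] is the cumulative alpha_bar_t.
    noise_schedule = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)


    def diffusion_train(model, x0, T=1000):
        t = torch.randint(0, T, (x0.size(0),))
        noise = torch.randn_like(x0)
        # Reshape to (batch, 1, ..., 1) so alpha_bar broadcasts over data dimensions
        alpha_bar = noise_schedule[t].view(-1, *([1] * (x0.dim() - 1)))
        xt = torch.sqrt(alpha_bar) * x0 + torch.sqrt(1 - alpha_bar) * noise
        pred_noise = model(xt, t)
        return F.mse_loss(pred_noise, noise)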
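The EBM connection can be made explicit. Under the forward process used in `diffusion_train`, the predicted noise is a rescaled score, that is, the negative gradient of an implicit time-dependent energy. The symbols $\epsilon_\theta$, $s_\theta$, and $E_t$ are notation introduced for this sketch.

```latex
\begin{aligned}
x_t &= \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I) \\
\nabla_{x_t} \log q(x_t \mid x_0)
  &= -\frac{x_t - \sqrt{\bar\alpha_t}\, x_0}{1 - \bar\alpha_t}
   = -\frac{\epsilon}{\sqrt{1 - \bar\alpha_t}} \\
s_\theta(x_t, t) &\approx \nabla_{x_t} \log p_t(x_t) = -\nabla_{x_t} E_t(x_t)
  \approx -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1 - \bar\alpha_t}}
\end{aligned}
```

Read this way, the noise-prediction loss is a form of denoising score matching, and the reverse process resembles annealed Langevin sampling over a sequence of energies.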
Anomaly detection (EBM)
    def is_anomaly(ebm, x, threshold):
        """High energy means the input is unusual under the model."""
        # Returns one boolean per input in the batch
        return ebm.energy(x) > threshold
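The `threshold` has to come from somewhere. One common choice, sketched here as an assumption rather than a prescribed method, is a high quantile of the energies observed on known-normal data. The stand-in energy function `toy_energy`, the helper `fit_threshold`, and the 0.99 quantile are all hypothetical.

```python
import torch


def toy_energy(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for a trained EBM: squared distance from the origin."""
    return (x ** 2).sum(dim=-1)


def fit_threshold(energies: torch.Tensor, q: float = 0.99) -> float:
    """Use the q-th quantile of energies seen on normal data as the cutoff."""
    return torch.quantile(energies, q).item()


torch.manual_seed(0)
normal_data = torch.randn(5000, 2)
threshold = fit_threshold(toy_energy(normal_data))

queries = torch.tensor([[0.1, -0.2], [6.0, 6.0]])
flags = toy_energy(queries) > threshold   # one boolean per query
print(flags.tolist())
```

The quantile sets the false-positive rate on normal data directly (about 1% here), which is often easier to reason about than a raw energy value.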
🤔 Decision criteria

| Situation | Approach |
| --- | --- |
| Modern generative | Diffusion / GAN |
| Anomaly detection | EBM / Autoencoder |
| Historical study | RBM / DBN |
| Quantum | Quantum Boltzmann |
| Pre-training | Self-supervised (BERT, MAE) |
| Sparse coding | Sparse autoencoder |
Default: study RBMs for historical understanding; use a modern alternative in production.
When to use: studying deep learning history, understanding EBMs, anomaly detection, tracing the connection to diffusion.
When not to use: production generative models (use diffusion); production pre-training (use SSL).
❌ Anti-patterns
Expecting an RBM to hold up in production: it is outdated.
Pre-training with RBMs in the modern (BERT) era: use SSL instead.