---
id: wiki-2026-0508-variational-autoencoders-vae
title: Variational Autoencoders (VAE)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [VAE, Variational Autoencoder, β-VAE]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [generative-model, deep-learning, latent-variable, variational-inference]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: PyTorch 2.x
---
# Variational Autoencoders (VAE)
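## In one line
> **"The encoder maps an input to a latent distribution (μ, σ), a sample is drawn with the reparameterization trick, and the decoder reconstructs the input. Training maximizes the ELBO = expected reconstruction log-likelihood − KL(q(z|x) || p(z)); equivalently, it minimizes reconstruction loss + KL."** Introduced by Kingma & Welling, 2013 (Auto-Encoding Variational Bayes). Role in 2026: rarely used for standalone generation (diffusion models produce better samples), but it is the latent-space backbone of latent diffusion systems such as Stable Diffusion, FLUX, and Sora, which run diffusion in a compressed latent (8× spatial downsampling for SD-style image VAEs).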
## Core ideas
### Math (ELBO)
- **Goal**: maximize log p(x), which is intractable because it requires integrating over z.
- **Trick**: introduce a variational posterior q_φ(z|x) ≈ p(z|x). The ELBO is then a tractable lower bound on log p(x).
- **ELBO** = E_{z~q_φ(z|x)}[log p_θ(x|z)] − D_KL(q_φ(z|x) || p(z))
  - First term: reconstruction (decoder).
  - Second term: regularizes the latent posterior toward the prior (usually N(0, I)).
- **Reparameterization**: z = μ + σ ⊙ ε, ε ~ N(0, I), which lets gradients backpropagate through the stochastic sampling step.
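
As a sanity check on the KL term and the reparameterization step, here is a minimal sketch in plain Python (scalar latent, no PyTorch). It draws samples as z = μ + σ·ε and compares a Monte Carlo estimate of KL(q || p) with the closed-form Gaussian expression that `vae_loss` also uses. The function names, the sample count, and the example values of μ and logvar are illustrative assumptions.

```python
import math
import random

def kl_closed_form(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a scalar latent."""
    return -0.5 * (1.0 + logvar - mu * mu - math.exp(logvar))

def log_normal_pdf(z, mu, logvar):
    """log N(z; mu, exp(logvar))."""
    return -0.5 * (math.log(2.0 * math.pi) + logvar + (z - mu) ** 2 / math.exp(logvar))

def kl_monte_carlo(mu, logvar, n_samples=200_000, seed=0):
    """Estimate E_q[log q(z) - log p(z)] with reparameterized samples."""
    rng = random.Random(seed)
    sigma = math.exp(0.5 * logvar)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise drawn independently of mu and sigma
        z = mu + sigma * eps        # reparameterization: z is a deterministic function of eps
        total += log_normal_pdf(z, mu, logvar) - log_normal_pdf(z, 0.0, 0.0)
    return total / n_samples

mu, logvar = 0.7, -0.4              # hypothetical encoder outputs
exact = kl_closed_form(mu, logvar)
approx = kl_monte_carlo(mu, logvar)
print(f"closed form: {exact:.4f}  Monte Carlo: {approx:.4f}")
```

The two numbers should agree to about two decimal places. Because the noise ε is sampled separately from μ and σ, the same construction in an autodiff framework makes z differentiable with respect to both.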
### VAE vs. other generative models
- **GAN**: sharp samples, no likelihood, prone to mode collapse. VAE: blurrier samples, a likelihood bound, stable training.
- **Diffusion**: state-of-the-art sample quality. VAE: faster inference (a single forward pass).
- **Dominant role in 2026**: the front end of latent diffusion. The VAE compresses pixel space into latent space, and the diffusion model denoises in that latent space.
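### Variants
- **β-VAE**: multiplies the KL term by β; β > 1 encourages a disentangled latent.
- **VQ-VAE**: replaces the continuous latent with a discrete codebook (vector quantization). Discrete tokenizers of this kind underpin models such as the original DALL-E, which used a closely related discrete VAE.
- **Hierarchical VAE / NVAE**: multi-scale latents.
- **Conditional VAE (CVAE)**: conditional generation p(x|c).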
### Applications
1. Latent diffusion (Stable Diffusion / FLUX / Sora): in SD 1.x-style VAEs, each 8×8 pixel patch maps to one position of a 4-channel latent.
2. Anomaly detection: a high reconstruction error signals an anomaly.
3. Molecular generation: exploring a continuous latent space of chemical structures.
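
For the anomaly-detection use case, one common recipe is to calibrate a threshold on reconstruction errors measured on normal data and flag anything above it. The sketch below is plain Python under stated assumptions: the helper names, the 99th-percentile choice, and the toy error values are all hypothetical, and in practice the per-sample errors would come from a trained VAE.

```python
def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

def fit_threshold(normal_errors, q=99.0):
    """Calibrate the threshold on errors from known-normal samples."""
    return percentile(normal_errors, q)

def flag_anomalies(errors, threshold):
    """True where the reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]

# Toy reconstruction errors for normal data: 0.80, 0.81, ..., 1.80
normal_errors = [0.8 + 0.01 * i for i in range(101)]
threshold = fit_threshold(normal_errors, q=99.0)
flags = flag_anomalies([0.9, 1.5, 2.4], threshold)
print(threshold, flags)
```

The percentile sets the expected false-positive rate on normal data, so it is worth tuning on a held-out validation set rather than fixing it in advance.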
## 💻 Patterns
### Vanilla VAE (PyTorch 2.x)
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VAE(nn.Module):
    def __init__(self, in_dim=784, hidden=400, z_dim=20):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc_mu = nn.Linear(hidden, z_dim)
        self.fc_logvar = nn.Linear(hidden, z_dim)
        self.fc2 = nn.Linear(z_dim, hidden)
        self.fc3 = nn.Linear(hidden, in_dim)

    def encode(self, x):
        h = F.relu(self.fc1(x))
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        return torch.sigmoid(self.fc3(F.relu(self.fc2(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar


def vae_loss(recon, x, mu, logvar):
    bce = F.binary_cross_entropy(recon, x, reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```
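
`vae_loss` sums both terms over the batch, which keeps their relative weight at the ELBO's 1:1. The arithmetic sketch below (plain Python, hypothetical helper, toy sizes matching the defaults above) illustrates how the effective KL weight shifts if either term is switched to an element-wise mean, assuming `'mean'` divides by every element of the reduced tensor.

```python
def effective_beta(batch_size, n_pixels, z_dim, recon_reduction, kl_reduction):
    """Weight of the KL term relative to reconstruction, normalized so that
    'sum' on both terms gives 1.0.

    Assumes 'mean' divides by batch_size * feature_dim of the tensor being
    reduced (pixels for reconstruction, latent dims for the KL term).
    """
    recon_scale = 1.0 if recon_reduction == "sum" else 1.0 / (batch_size * n_pixels)
    kl_scale = 1.0 if kl_reduction == "sum" else 1.0 / (batch_size * z_dim)
    return kl_scale / recon_scale

for recon_red, kl_red in [("sum", "sum"), ("sum", "mean"), ("mean", "mean")]:
    beta = effective_beta(128, 784, 20, recon_red, kl_red)
    print(f"recon={recon_red:4s} kl={kl_red:4s} -> effective beta = {beta:g}")
```

With 784 pixels and 20 latent dimensions, mean-reducing both terms acts like β = 784 / 20 ≈ 39, while mean-reducing only the KL term makes it almost vanish. A common alternative is to sum over feature dimensions and average over the batch for both terms, which preserves the 1:1 ratio.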
### Training loop
```python
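# Assumes `loader` is a DataLoader yielding (image, label) batches with
# pixel values in [0, 1], e.g. MNIST with ToTensor().
device = "cuda" if torch.cuda.is_available() else "cpu"
model = VAE().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
model.train()
for epoch in range(50):
    for x, _ in loader:
        x = x.view(-1, 784).to(device)  # flatten 28x28 images
        recon, mu, logvar = model(x)
        loss = vae_loss(recon, x, mu, logvar)
        opt.zero_grad()
        loss.backward()
        opt.step()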
```
### β-VAE (disentanglement)
```python
def beta_vae_loss(recon, x, mu, logvar, beta=4.0):
    bce = F.binary_cross_entropy(recon, x, reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + beta * kld
```
### VQ-VAE (vector quantization)
```python
class VectorQuantizer(nn.Module):
    def __init__(self, num_embeddings=512, embedding_dim=64, commitment=0.25):
        super().__init__()
        self.embed = nn.Embedding(num_embeddings, embedding_dim)
        self.embed.weight.data.uniform_(-1.0 / num_embeddings, 1.0 / num_embeddings)
        self.commitment = commitment

    def forward(self, z_e):  # z_e: (B, C, H, W), with C == embedding_dim
        z_e_perm = z_e.permute(0, 2, 3, 1).contiguous()  # (B, H, W, C)
        flat = z_e_perm.view(-1, z_e_perm.size(-1))      # (B*H*W, C)
        # Squared distance to every codebook vector: |a|^2 - 2ab + |b|^2
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.embed.weight.t()
             + self.embed.weight.pow(2).sum(1))
        idx = d.argmin(1)  # nearest codebook index per spatial position
        z_q = self.embed(idx).view(z_e_perm.shape).permute(0, 3, 1, 2).contiguous()
        # Codebook loss pulls codes toward encoder outputs; the commitment
        # term (weighted by `commitment`) keeps the encoder close to its code.
        loss = F.mse_loss(z_q, z_e.detach()) + self.commitment * F.mse_loss(z_e, z_q.detach())
        # Straight-through estimator: forward pass uses z_q, gradients flow to z_e
        z_q = z_e + (z_q - z_e).detach()
        return z_q, loss, idx
```
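
The distance computation in `VectorQuantizer.forward` relies on the expansion ||a − b||² = ||a||² − 2a·b + ||b||². The tiny worked example below, in plain Python with a made-up 2-D codebook, checks that the expanded form selects the same code as the direct distance. All values and helper names are illustrative.

```python
def sq_dist_direct(a, b):
    """Squared Euclidean distance computed term by term."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sq_dist_expanded(a, b):
    """Same distance via |a|^2 - 2 a.b + |b|^2 (the form used for batched matmul)."""
    dot = sum(x * y for x, y in zip(a, b))
    return sum(x * x for x in a) - 2.0 * dot + sum(y * y for y in b)

def quantize(vec, codebook, dist=sq_dist_expanded):
    """Return (index, code) of the nearest codebook entry."""
    idx = min(range(len(codebook)), key=lambda k: dist(vec, codebook[k]))
    return idx, codebook[idx]

codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]  # hypothetical 3-entry codebook
z_e = (0.8, 1.3)                                  # hypothetical encoder output
idx, z_q = quantize(z_e, codebook)
print(idx, z_q)
```

The expanded form matters in the batched version because the cross term becomes a single matrix multiplication between all encoder vectors and the whole codebook.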
### Sample / generate
```python
model.eval()
with torch.no_grad():
    # Sample z ~ N(0, I) on the model's device (z_dim=20) and decode
    z = torch.randn(64, 20, device=next(model.parameters()).device)
    samples = model.decode(z).view(-1, 1, 28, 28).cpu()
```
### Latent diffusion VAE (SD-style — using diffusers)
```python
from diffusers import AutoencoderKL
import torch
vae = AutoencoderKL.from_pretrained('stabilityai/sd-vae-ft-mse').cuda()
# Encode 512x512 image → 4-ch 64x64 latent
img = torch.randn(1, 3, 512, 512).cuda()
latent = vae.encode(img).latent_dist.sample() * vae.config.scaling_factor
# Diffusion happens in latent space, then decode
recon = vae.decode(latent / vae.config.scaling_factor).sample
```
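
The shape arithmetic behind the "512x512 image → 4-ch 64x64 latent" comment can be sketched as follows. The helper names are hypothetical, and the defaults (8× spatial downsampling, 4 latent channels) are assumptions matching the SD-style VAE above; other autoencoders use different values.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """(C, H, W) of the latent for an image of the given size."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsample factor")
    return (latent_channels, height // downsample, width // downsample)

def compression_ratio(height, width, image_channels=3, downsample=8, latent_channels=4):
    """Number of image values divided by number of latent values."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (image_channels * height * width) / (c * h * w)

print(latent_shape(512, 512))        # shape of the latent tensor, without the batch axis
print(compression_ratio(512, 512))   # how many times fewer values the diffusion model sees
```

Under these assumptions the diffusion model works on 48 times fewer values than the pixel-space image, which is the main reason latent diffusion is cheaper than pixel-space diffusion.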
## Decision guide
| Goal | Choice |
|---|---|
| SOTA image generation (2026) | Diffusion (FLUX, Stable Diffusion 3.5), with the VAE only as the front end |
| Disentangled representation research | β-VAE |
| Discrete latents (similar to LLM tokenization) | VQ-VAE / VQ-GAN |
| Anomaly detection | Vanilla VAE with reconstruction error |
| Training latent diffusion | Reuse a pre-trained KL-regularized VAE (e.g. the SD VAE) |
| Molecular / structured generation | VAE (continuous latent), still competitive |
**Default**: do not use a VAE directly for image generation; use it as the autoencoder inside latent diffusion. Use a standalone VAE for disentanglement and anomaly detection.
## 🔗 Graph
- Parent: [[Variational Inference]]
- Variants: [[β-VAE]]
- Applications: [[Stable Diffusion]] · [[FLUX]]
- Adjacent: [[Diffusion Models]] · [[Generative-Adversarial-Networks|GAN]]
## 🤖 LLM usage
**When to use**: explaining the VAE component of latent diffusion, writing an anomaly-detection baseline, deriving the ELBO, implementing the reparameterization trick.
**When not to use**: standalone SOTA image generation (prefer diffusion), or cases where sharp outputs are required (prefer GAN or diffusion).
## ❌ Anti-patterns
- **Posterior collapse**: q(z|x) collapses to the prior p(z), so KL → 0 and the decoder ignores z. Mitigate with KL annealing or β scheduling.
- **Pixel-space VAE directly at high resolution**: outputs are blurry. Decouple instead: an 8×-downsampling VAE latent plus diffusion in that latent.
- **Outputting σ directly**: the raw network output can be negative. Output logvar and compute σ = exp(0.5 * logvar).
- **Mean-reduced KL**: a batch-mean KL is mismatched with a sum-reduced reconstruction term. Use the same reduction for both terms.
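
KL annealing, mentioned above as a mitigation for posterior collapse, can be as simple as ramping the KL weight from 0 to its final value over the first training steps. This is a minimal sketch of a linear warm-up. The function name, the 10,000-step default, and the linear shape are assumptions, and other schedules (for example cyclical ones) are also used.

```python
def linear_beta(step, warmup_steps=10_000, beta_max=1.0):
    """KL weight that grows linearly from 0 to beta_max over warmup_steps."""
    if warmup_steps <= 0:
        return beta_max
    return beta_max * min(1.0, max(0, step) / warmup_steps)

for step in (0, 2_500, 5_000, 10_000, 20_000):
    print(step, linear_beta(step))
```

In a training loop the schedule would multiply the KL term, for example `loss = bce + linear_beta(global_step) * kld`, so the decoder first learns to use z before the prior-matching pressure reaches full strength.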
## 🧪 Verification / duplicates
- Verified against Kingma & Welling (arXiv 2013, ICLR 2014), the Stable Diffusion paper, NVIDIA NVAE, and DeepMind β-VAE.
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — full VAE with ELBO, β-VAE, VQ-VAE, latent diffusion role, 6 patterns |