"매 information diet + restore". 매 input → 매 bottleneck (latent) → 매 input 의 reconstruct. 매 unsupervised representation. 매 PCA 의 deep version. 매 modern generative (Stable Diffusion VAE) / self-supervised (MAE) 의 backbone.
📖 Core
Architecture
Encoder: high-dimensional input → low-dimensional latent.
Bottleneck: the compressed representation.
Decoder: latent → reconstructed input.
Loss: reconstruction error (e.g., MSE between input and output).
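The encoder / bottleneck / decoder layout and the reconstruction loss can be sketched as a small PyTorch module. This is a minimal illustration, not a reference implementation: the class name `Autoencoder`, the helper `train_step`, and the layer sizes are all assumptions chosen for the example.

```python
import torch
from torch import nn


class Autoencoder(nn.Module):
    """Encoder -> bottleneck -> decoder. All sizes are illustrative."""

    def __init__(self, input_dim=784, hidden_dim=128, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),  # bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z  # (reconstruction, latent)


def train_step(model, optimizer, x):
    """One gradient step on the MSE reconstruction loss."""
    x_recon, _ = model(x)
    loss = nn.functional.mse_loss(x_recon, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Returning the latent alongside the reconstruction keeps the same model usable for both reconstruction-error scoring and embedding extraction.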
Variants
Vanilla AE
Deterministic encoder.
Plain MSE reconstruction loss.
Representations are usable, but generation is weak: the latent space has no prior to sample from.
Denoising AE (Vincent 2008)
Noisy input → clean output.
Improves robustness.
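The denoising objective differs from the vanilla one only in what goes into the model: the input is corrupted, but the target stays clean. A minimal sketch, assuming additive Gaussian noise and a model that returns just the reconstruction (the function name `denoising_loss` and the default `noise_std` are illustrative):

```python
import torch
from torch import nn


def denoising_loss(model, x_clean, noise_std=0.1):
    """Corrupt the input, then score the output against the clean input."""
    x_noisy = x_clean + noise_std * torch.randn_like(x_clean)
    x_recon = model(x_noisy)
    return nn.functional.mse_loss(x_recon, x_clean)
```

Other corruptions (masking, salt-and-pepper noise) slot into the same place as the Gaussian term.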
Sparse AE
Sparsity penalty on the latent activations (e.g., an L1 term).
Tends to yield more interpretable features.
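One simple way to express the sparsity penalty is an L1 term on the latent activations added to the reconstruction loss. The sketch below assumes that choice; the name `sparse_ae_loss` and the weight `l1_weight` are illustrative, and other penalties (such as a KL term on mean activation) are equally valid.

```python
import torch
from torch import nn


def sparse_ae_loss(x, x_recon, z, l1_weight=1e-3):
    """MSE reconstruction loss plus an L1 penalty on the latent activations z."""
    recon = nn.functional.mse_loss(x_recon, x)
    sparsity = z.abs().mean()
    return recon + l1_weight * sparsity
```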
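Variational AE (VAE, Kingma 2013)
Encoder outputs a distribution (μ, σ) rather than a point.
Reparameterization trick: z = μ + σ·ε with ε ~ N(0, I), so gradients flow through the sampling step.
ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)); the training loss is -ELBO, i.e. reconstruction error + KL.
Enables generation by sampling z from the prior.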
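The reparameterization trick and the negative-ELBO loss can be written in a few lines. This sketch assumes a diagonal Gaussian posterior parameterized by `mu` and `logvar`, a standard normal prior, and squared error as a stand-in for the Gaussian log-likelihood (up to scale and constants); the function names are illustrative.

```python
import torch


def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps so gradients reach mu and logvar."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps


def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: reconstruction error plus KL(q(z|x) || N(0, I))."""
    # Summed squared error per sample
    recon = ((x_recon - x) ** 2).flatten(1).sum(dim=1)
    # Closed-form KL between N(mu, sigma^2) and N(0, I), summed over latent dims
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
    return (recon + kl).mean(), recon.mean(), kl.mean()
```

Returning the two terms separately makes it easy to watch the KL value during training, which is where collapse problems first show up.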
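β-VAE (Higgins 2017)
Weights the KL term by β.
Encourages disentanglement.
Vector Quantized VAE (VQ-VAE)
Discrete latent (codebook).
Discrete image tokens of the kind used in DALL-E (whose dVAE is a close relative). Stable Diffusion itself uses a KL-regularized continuous VAE; the latent diffusion paper also describes a VQ-regularized variant.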
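The codebook lookup behind a discrete latent can be sketched as a nearest-neighbor search followed by a straight-through gradient. This is a simplified illustration under stated assumptions: flat `(N, D)` encoder outputs, a plain tensor as the codebook, and a commitment weight of 0.25; the function name `vector_quantize` is hypothetical.

```python
import torch


def vector_quantize(z_e, codebook, commitment_weight=0.25):
    """z_e: (N, D) encoder outputs; codebook: (K, D) embedding vectors."""
    # Squared Euclidean distance from every encoder output to every code
    sq_dist = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(dim=-1)
    indices = sq_dist.argmin(dim=1)  # (N,) discrete latent codes
    z_q = codebook[indices]
    # Codebook term pulls codes toward encoder outputs; commitment term does the reverse
    codebook_loss = ((z_q - z_e.detach()) ** 2).mean()
    commitment_loss = ((z_e - z_q.detach()) ** 2).mean()
    vq_loss = codebook_loss + commitment_weight * commitment_loss
    # Straight-through estimator: forward pass uses z_q, backward copies grads to z_e
    z_q_st = z_e + (z_q - z_e).detach()
    return z_q_st, indices, vq_loss
```

The `indices` tensor is the discrete representation; a downstream prior (autoregressive or otherwise) would model those integers rather than the continuous vectors.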
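Masked Autoencoder (MAE, He 2021)
Masks 75% of the image patches.
Self-supervised: the only objective is reconstructing the masked patches.
Strong pretraining method for ViT.
Adversarial AE (AAE)
Uses a GAN-style discriminator to push the latent distribution toward the chosen prior.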
Applications
Dimensionality reduction: nonlinear counterpart of PCA.
Denoising: image / audio cleanup.
Anomaly detection: anomalies show high reconstruction error.
Generative modeling: VAE → images / molecules.
Pretraining: MAE → ViT for downstream tasks.
Compression: neural codecs.
Recommender systems: user / item embeddings.
Style transfer: latent manipulation.
Bottleneck design
Linear: PCA-equivalent (with MSE loss it learns the same subspace as PCA).
Nonlinear (deep): captures curved manifolds.
Discrete (VQ): codebook.
Hierarchical (NVAE, VQ-VAE-2): multi-scale latents.
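To make the "linear bottleneck is PCA-equivalent" point concrete, the sketch below builds the optimal linear encoder and decoder directly from an SVD instead of training them. It is an illustration of the subspace a linear MSE autoencoder converges to, not a training recipe; the name `pca_autoencode` is an assumption.

```python
import torch


def pca_autoencode(x, latent_dim):
    """Encode and decode x (N, D) with the top `latent_dim` principal directions."""
    mean = x.mean(dim=0, keepdim=True)
    centered = x - mean
    # Rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = torch.linalg.svd(centered, full_matrices=False)
    components = vt[:latent_dim]        # (latent_dim, D)
    z = centered @ components.T         # linear encoder
    x_recon = z @ components + mean     # linear decoder
    return z, x_recon
```

If the data truly lies in a `latent_dim`-dimensional affine subspace, the reconstruction is exact; curved structure is what the nonlinear variants are for.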
Key modern uses
Stable Diffusion: the VAE compresses 8× per spatial side (H×W×3 → H/8 × W/8 × 4), which is about 48× fewer values overall.
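The shape arithmetic is easy to check by hand. The helper names below are illustrative; the defaults (downsample factor 8, 4 latent channels, 3 image channels) come from the shapes stated above.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Latent (H, W, C) for an image of the given size."""
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by the downsample factor")
    return height // downsample, width // downsample, latent_channels


def compression_ratio(height, width, channels=3, downsample=8, latent_channels=4):
    """How many times fewer values the latent holds than the image."""
    h, w, c = latent_shape(height, width, downsample, latent_channels)
    return (height * width * channels) / (h * w * c)


print(latent_shape(512, 512))       # (64, 64, 4)
print(compression_ratio(512, 512))  # 48.0
```

The 8× figure is per side; the 48× figure counts values (786,432 in the image versus 16,384 in the latent).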
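MAE forward pass (simplified)
    import torch

    # Simplified from He et al. 2021. image_to_patches and insert_mask_tokens
    # are helper functions assumed to be defined elsewhere.
    def mae_forward(image, encoder, decoder, mask_ratio=0.75):
        # Split the image into patches: (num_patches, patch_dim)
        patches = image_to_patches(image, patch_size=16)
        # Randomly keep 25% of the patches visible; the rest are masked
        n_visible = int(len(patches) * (1 - mask_ratio))
        perm = torch.randperm(len(patches))
        visible_idx, masked_idx = perm[:n_visible], perm[n_visible:]
        # Encode only the visible patches
        encoded = encoder(patches[visible_idx])
        # Insert mask tokens at the masked positions
        full = insert_mask_tokens(encoded, visible_idx, total=len(patches))
        # Reconstruct every patch
        reconstructed = decoder(full)
        # Loss is computed on the masked patches only
        loss = ((reconstructed[masked_idx] - patches[masked_idx]) ** 2).mean()
        return reconstructed, loss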
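The MAE snippet calls `image_to_patches` and `insert_mask_tokens` without defining them. One possible shape for those helpers is sketched below, assuming a single `(C, H, W)` image and a fixed all-zeros mask token; in the actual MAE the mask token is a learned parameter and position embeddings are added, both of which are omitted here.

```python
import torch


def image_to_patches(image, patch_size=16):
    """Split a (C, H, W) image into (num_patches, C * patch_size**2) rows."""
    c, h, w = image.shape
    p = patch_size
    # (C, H/p, p, W/p, p) -> (H/p, W/p, C, p, p) -> one row per patch
    patches = image.reshape(c, h // p, p, w // p, p).permute(1, 3, 0, 2, 4)
    return patches.reshape(-1, c * p * p)


def insert_mask_tokens(encoded, visible_idx, total, mask_token=None):
    """Place encoded visible patches at their positions; fill the rest with mask tokens."""
    dim = encoded.shape[-1]
    if mask_token is None:
        mask_token = torch.zeros(dim)  # placeholder for a learned token
    full = mask_token.expand(total, dim).clone()
    full[visible_idx] = encoded
    return full
```

Patches are ordered row by row, so patch index `i * (W // patch_size) + j` corresponds to grid cell `(i, j)`.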
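Anomaly detection
    def detect_anomaly(model, x, threshold):
        # Assumes the model returns a pair, e.g. (reconstruction, latent)
        x_recon, _ = model(x)
        # Per-sample mean squared error over all non-batch dimensions
        error = ((x_recon - x) ** 2).mean(dim=tuple(range(1, x.dim())))
        return error > threshold

    # Train on normal data only → anomalies show high reconstruction error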
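`detect_anomaly` takes the threshold as given. One common way to pick it, sketched here as an assumption rather than a prescribed method, is a high quantile of the reconstruction errors measured on held-out normal data; the name `calibrate_threshold` and the 0.99 default are illustrative.

```python
import torch


def calibrate_threshold(normal_errors, quantile=0.99):
    """Threshold = a high quantile of per-sample errors on held-out normal data."""
    return torch.quantile(normal_errors, quantile).item()
```

With a 0.99 quantile, roughly 1% of normal samples will be flagged, so the quantile is effectively the false-positive budget.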
Stable Diffusion VAE (latent)
    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained('runwayml/stable-diffusion-v1-5', subfolder='vae')

    # image: float tensor of shape (B, 3, 512, 512), scaled to [-1, 1]
    # Image (512x512x3) → latent (64x64x4): 8× smaller per side
    latent = vae.encode(image).latent_dist.sample() * 0.18215  # latent scaling factor
    # Latent → image
    image_recon = vae.decode(latent / 0.18215).sample
β-VAE (disentangle)
    # β > 1 → more disentanglement, worse reconstruction
    # recon and kl are the two terms of the VAE loss; typical β is around 4 to 10
    loss = recon + beta * kl
🤔 Decision criteria
| Application | Variant |
| --- | --- |
| Dimensionality reduction | Vanilla AE |
| Denoising | Denoising AE |
| Generation | VAE / VQ-VAE |
| Disentanglement | β-VAE |
| Self-supervised vision | MAE |
| Latent diffusion | VAE (continuous) / VQ-VAE (discrete) |
| Anomaly detection | Vanilla AE + reconstruction error |
| Compression | Neural codec (rate-distortion) |
Default: task-specific. Representation → AE. Generation → VAE. Vision pretraining → MAE.
When to use: representation learning, anomaly detection, generative latents, vision pretraining.
When not to use: supervised learning alone is sufficient; highly structured data where a specialized model fits better (e.g., a GNN for graphs).
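❌ Anti-patterns
Identity map (no bottleneck): the network just copies its input, so the representation is useless.
VAE posterior collapse: an over-strong KL term makes the decoder ignore the latent.
β-VAE with β too high: reconstruction quality is destroyed.
MAE with a low mask ratio: the task becomes trivial, since neighboring patches give the answer away.
Anomaly detection trained on mixed data: anomalies leak into the training set, so the model learns to reconstruct them too.
Latent dimension too large: the bottleneck stops constraining the model, which overfits or drifts toward an identity map.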
🧪 Verification / Duplicates
Verified (Hinton AE, Kingma VAE, He MAE, Stable Diffusion).