[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,95 +1,255 @@
 ---
 id: wiki-2026-0508-auto-encoding
-title: Auto Encoding
+title: Auto-Encoding
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [P-Reinforce-AUTO-AUEN-001]
+aliases: [autoencoder, AE, VAE, denoising AE, masked autoencoder, MAE, latent space, bottleneck]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 0.98
-tags: [auto-reinforced, auto-encoding, unSupervised-Learning, dimenstionality-reduction, neural-networks, feature-extraction]
+confidence_score: 0.93
+verification_status: applied
+tags: [autoencoder, vae, mae, dimensionality-reduction, anomaly-detection, generative, self-supervised, representation-learning]
 raw_sources: []
-last_reinforced: 2026-04-20
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
 tech_stack:
-  language: unspecified
-  framework: unspecified
+  language: Python
+  framework: PyTorch / Diffusers / TensorFlow
 ---

-# [[Auto-Encoding|Auto-Encoding]]
+# Auto-Encoding

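-## 📌 One-Line Insight (The Karpathy Summary)
-> "Information diet and restoration: a self-teaching method for AI that extracts only the essentials of large data, compresses them into a small bottleneck (latent space), and then restores the original, learning the data's intrinsic features along the way."
+## 📌 One-Line Insight
+> **"Information diet + restore."** Input → bottleneck (latent) → reconstruction of the input. An unsupervised way to learn representations; a deep, nonlinear generalization of PCA. It is a building block of modern generative models (the Stable Diffusion VAE) and of self-supervised pretraining (MAE).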

-## 📖 Structured Knowledge (Synthesized Content)
-Auto-encoding is an unsupervised-learning neural network architecture whose goal is to reproduce its input at its output.
+## 📖 Core

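-1.  **Structure and principle**:
-    *   **Encoder**: compresses the input into a low-dimensional vector (latent code / bottleneck).
-    *   **Bottleneck**: the layer where only the most important summary information remains; unnecessary noise is removed.
-    *   **Decoder**: uses the compressed information to reconstruct the original input as faithfully as possible.
-2.  **Uses**:
-    *   **Feature Extraction**: extracting only the key features of the data.
-    *   **Dimensionality Reduction**: mapping high-dimensional data to an easier-to-handle low dimension (a deep-learning version of PCA).
-    *   **Denoising**: removing noise from a corrupted image and restoring a clean one.
-    *   **Anomaly Detection**: data that an autoencoder trained on normal data fails to reconstruct is treated as an outlier.
+### Architecture
+- **Encoder**: maps the high-dimensional input to a low-dimensional latent.
+- **Bottleneck**: the compressed representation.
+- **Decoder**: reconstructs the input from the latent.
+- **Loss**: reconstruction error (e.g. MSE or binary cross-entropy).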

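-## ⚠️ Contradictions & Updates
- **Conflict with past data**: In the past this stopped at a simple 'data replication' policy, but modern generative AI has evolved into the 'Variational Autoencoder (VAE) policy', which creatively manipulates the latent space to produce new data (RL Update).
- **Policy change (RL Update)**: For efficient data transmission, an 'intelligent compression policy' that sends only the auto-encoded core vector instead of the heavy raw data and decodes it on the receiver side is being explored as a next-generation streaming and communication standard.
+### Variants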

-## 🔗 Knowledge Links (Graph)
- [[Variational Autoencoders (VAE)|Variational Autoencoders (VAE)]], [[Anomaly-Detection|Anomaly-Detection]], Pattern Recognition, Deep Learning, [[Visual-Effects-VFX|Visual-Effects-VFX]]
- **Modern Tech/Tools**: ConvAutoEncoder, [[BERT|BERT]] (Masked Autoencoder), Image compression AI.
---
+#### Vanilla AE
+- Deterministic encoder.
+- Plain MSE reconstruction loss.
+- Learns usable representations, but is weak as a generative model: the latent space has no imposed prior to sample from.

-## 🤖 LLM Usage Hints (How to Use This Knowledge)
+#### Denoising AE (Vincent et al. 2008)
+- Corrupt the input with noise and train the network to reconstruct the clean original.
+- Improves the robustness of the learned features.

-**When to use this knowledge:**
- *(TODO)*
+#### Sparse AE
+- Adds a sparsity penalty on the latent activations (e.g. an L1 term).
+- Tends to yield more interpretable features.

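-**When not to use it:**
- *(TODO)*
+#### Variational AE (VAE, Kingma & Welling 2013)
+- The encoder outputs a distribution (μ, σ) instead of a single point.
+- Reparameterization trick: z = μ + σ·ε with ε ~ N(0, I), so gradients can flow through the sampling step.
+- ELBO = E_q[log p(x|z)] − KL(q(z|x) || p(z)). The training loss is the negative ELBO, i.e. reconstruction error + KL.
+- Enables generation: sample z from the prior and decode.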
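
The two VAE ingredients listed above, sampling through the reparameterization trick and the closed-form KL term against a standard normal prior, can be checked by hand on a tiny example. This is a minimal standard-library sketch; the helper names `reparameterize` and `kl_to_standard_normal` are illustrative and not taken from any library.

```python
import math
import random

def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dim."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dims."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

# Two latent dims: the first has mean 1 (deviates from the prior), the second matches it.
mu, logvar = [1.0, 0.0], [0.0, 0.0]
z = reparameterize(mu, logvar)
kl = kl_to_standard_normal(mu, logvar)  # 0.5: contributed entirely by the first dim
```

A latent dimension whose posterior equals the prior contributes zero KL, which is why only the first dimension counts here.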

-## 🧪 Validation Status
+#### β-VAE (Higgins et al. 2017)
+- Weights the KL term by β.
+- β > 1 encourages disentangled latent factors, at some cost in reconstruction quality.

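- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 automatic normalization. Body needs verification.)*
+#### Vector Quantized VAE (VQ-VAE, van den Oord et al. 2017)
+- Discrete latent: each encoder output is snapped to its nearest codebook vector.
+- Related discrete latents: the dVAE in DALL-E 1. Note that the released Stable Diffusion autoencoder is a continuous, KL-regularized VAE rather than a VQ model.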
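
The codebook lookup at the heart of a VQ bottleneck is a nearest-neighbor search. The sketch below shows only that lookup on a toy codebook, in plain Python; `quantize` is a hypothetical helper name, and the training-time pieces (straight-through gradient, codebook and commitment losses) are deliberately left out.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry closest to `vector` in squared L2."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]

codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]  # toy codebook with K = 3 entries
encoder_output = [0.9, 1.2]
index, code = quantize(encoder_output, codebook)  # nearest entry is [1.0, 1.0]
```

The decoder then sees `code` rather than the raw encoder output, so the latent is fully described by the integer `index`.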

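-## 🧬 Duplicate Check
+#### Masked Autoencoder (MAE, He et al. 2021)
+- Masks a large fraction of image patches (75% by default).
+- Self-supervised: the only training signal is reconstructing the masked patches.
+- A strong pretraining recipe for ViT models.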

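- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: fill in old-template and missing fields.
+#### Adversarial AE (AAE)
+- Uses a GAN-style discriminator to push the latent distribution toward a chosen prior.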

-## 🕓 Changelog
+### Applications
+1. **Dimensionality reduction**: a nonlinear counterpart to PCA.
+2. **Denoising**: image / audio cleanup.
+3. **Anomaly detection**: flag inputs with high reconstruction error.
+4. **Generative modeling**: VAE → images / molecules.
+5. **Pretraining**: MAE → ViT for downstream tasks.
+6. **Compression**: neural codecs.
+7. **Recommender systems**: user / item embeddings.
+8. **Style transfer**: latent manipulation.

-| Date | Change | Handling | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
+### Bottleneck design
+- **Linear**: equivalent to PCA (with MSE loss, a linear AE learns the same subspace as the top principal components).
+- **Nonlinear (deep)**: can capture a curved data manifold.
+- **Discrete (VQ)**: codebook indices.
+- **Hierarchical** (NVAE, VQ-VAE-2): multi-scale latents.

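-## 💻 Code Patterns
+### Where it matters in modern systems
+- **Stable Diffusion**: a VAE downsamples images 8× per spatial side (H×W×3 → H/8 × W/8 × 4), and diffusion runs in that latent space.
+- **DALL-E 1**: a discrete VAE (dVAE) turns images into tokens.
+- **Whisper**: an encoder over log-mel spectrograms. It is trained on supervised transcripts, so it is an encoder-decoder relative rather than an autoencoder proper.
+- **MAE**: used to pretrain ViT-Huge.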
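
The 8× figure for Stable Diffusion is per spatial side, so the reduction in raw element count is larger than 8. A small arithmetic sketch makes this concrete; `sd_latent_shape` and `element_ratio` are illustrative helpers, with the factor 8 and the 4 latent channels taken from the shape mapping quoted above.

```python
def sd_latent_shape(height, width, downsample=8, latent_channels=4):
    """Latent (H, W, C) shape for an image, assuming 8x spatial downsampling."""
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsampling factor")
    return (height // downsample, width // downsample, latent_channels)

def element_ratio(height, width, image_channels=3):
    """How many times fewer elements the latent holds than the RGB image."""
    h, w, c = sd_latent_shape(height, width)
    return (height * width * image_channels) / (h * w * c)

shape = sd_latent_shape(512, 512)  # (64, 64, 4)
ratio = element_ratio(512, 512)    # 786432 / 16384 = 48.0
```

So a 512×512 RGB image becomes a 64×64×4 latent with 48× fewer elements, which is what makes diffusion in latent space cheaper than in pixel space.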

-**Pattern 1:** *(TODO: structural skeleton reflecting this project's conventions)*
+## 💻 Patterns

-```text
-# TODO
+### Vanilla AE (PyTorch)
+```python
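+import torch
+import torch.nn as nn
+
+class AutoEncoder(nn.Module):
+    def __init__(self, input_dim=784, latent_dim=32):
+        super().__init__()
+        self.encoder = nn.Sequential(
+            nn.Linear(input_dim, 256), nn.ReLU(),
+            nn.Linear(256, 64), nn.ReLU(),
+            nn.Linear(64, latent_dim),
+        )
+        self.decoder = nn.Sequential(
+            nn.Linear(latent_dim, 64), nn.ReLU(),
+            nn.Linear(64, 256), nn.ReLU(),
+            nn.Linear(256, input_dim), nn.Sigmoid(),
+        )
+
+    def forward(self, x):
+        z = self.encoder(x)
+        return self.decoder(z), z
+
+# One training step on a dummy batch (stand-in for flattened 28x28 images in [0, 1])
+model = AutoEncoder()
+optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
+x = torch.rand(64, 784)
+x_recon, _ = model(x)
+loss = ((x_recon - x) ** 2).mean()  # MSE reconstruction loss
+optimizer.zero_grad()
+loss.backward()
+optimizer.step()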
 ```

-## 🤔 Decision Criteria
+### VAE
+```python
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+class VAE(nn.Module):
+    def __init__(self, input_dim=784, latent_dim=32):
+        super().__init__()
+        self.enc = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
+        self.fc_mu = nn.Linear(256, latent_dim)
+        self.fc_logvar = nn.Linear(256, latent_dim)
+        self.dec = nn.Sequential(
+            nn.Linear(latent_dim, 256), nn.ReLU(),
+            nn.Linear(256, input_dim), nn.Sigmoid(),
+        )
+    
+    def reparameterize(self, mu, logvar):
+        std = torch.exp(0.5 * logvar)
+        eps = torch.randn_like(std)
+        return mu + eps * std
+    
+    def forward(self, x):
+        h = self.enc(x)
+        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
+        z = self.reparameterize(mu, logvar)
+        return self.dec(z), mu, logvar

-**When to choose option A:**
- *(TODO)*
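+def vae_loss(x, x_recon, mu, logvar, beta=1.0):
+    # Reconstruction term: summed BCE, which assumes x lies in [0, 1]
+    recon = F.binary_cross_entropy(x_recon, x, reduction='sum')
+    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over batch and latent dims
+    kl = -0.5 * torch.sum(1 + logvar - mu**2 - logvar.exp())
+    return recon + beta * kl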
+```

-**When to choose option B:**
- *(TODO)*
+### Denoising AE
+```python
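+import torch
+
+def train_denoising(model, x, noise_std=0.3):
+    # Corrupt the input with Gaussian noise, but score against the clean x
+    x_noisy = x + torch.randn_like(x) * noise_std
+    x_recon, _ = model(x_noisy)  # AutoEncoder.forward returns (reconstruction, latent)
+    return ((x_recon - x) ** 2).mean()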
+```

-**Default:**
-> *(TODO)*
+### MAE (vision)
+```python
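+# Simplified sketch after He et al. 2021 (no positional embeddings).
+# Assumes encoder: (N, patch_dim) -> (N, D) and decoder: (N, D) -> (N, patch_dim).
+import torch
+import torch.nn as nn
+
+def image_to_patches(image, patch_size=16):
+    # (C, H, W) -> (num_patches, C * patch_size * patch_size)
+    c = image.shape[0]
+    p = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
+    return p.permute(1, 2, 0, 3, 4).reshape(-1, c * patch_size * patch_size)
+
+def mae_forward(image, encoder, decoder, mask_token, mask_ratio=0.75):
+    patches = image_to_patches(image)
+    n = patches.shape[0]
+
+    # Keep a random 25% of the patches visible; the rest are masked
+    n_visible = int(n * (1 - mask_ratio))
+    perm = torch.randperm(n)
+    visible_idx, masked_idx = perm[:n_visible], perm[n_visible:]
+
+    # Encode only the visible patches
+    encoded = encoder(patches[visible_idx])
+
+    # Put the mask token at every position, then write the encoded patches back
+    full = mask_token.expand(n, -1).clone()
+    full[visible_idx] = encoded
+
+    # Reconstruct all patches
+    return decoder(full), patches, masked_idx

-## ❌ Anti-Patterns
+# Toy stand-ins: a 3x224x224 image gives 196 patches of dim 768; latent width 128
+image = torch.rand(3, 224, 224)
+encoder, decoder, mask_token = nn.Linear(768, 128), nn.Linear(128, 768), torch.zeros(1, 128)
+reconstructed, patches, masked_idx = mae_forward(image, encoder, decoder, mask_token)
+# Loss on the masked patches only
+loss = ((reconstructed[masked_idx] - patches[masked_idx]) ** 2).mean()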
+```

- **[Anti-pattern]:** *(TODO: what not to do + why + what to do instead)*
+### Anomaly detection
+```python
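+import torch
+
+@torch.no_grad()
+def detect_anomaly(model, x, threshold):
+    x_recon, _ = model(x)
+    # Per-sample MSE: average over every dimension except the batch dimension
+    error = ((x_recon - x) ** 2).mean(dim=tuple(range(1, x.dim())))
+    return error > threshold
+
+# Train on normal data only; anomalies then show up as high reconstruction error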
+```
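
The detector above takes `threshold` as given. One common way to pick it, sketched here as an assumption rather than as this project's method, is a high percentile of the reconstruction errors measured on held-out normal data. The helper `percentile_threshold` and the sample error values are made up for illustration.

```python
import math

def percentile_threshold(normal_errors, percent=99):
    """Nearest-rank percentile of reconstruction errors from normal validation data."""
    if not normal_errors:
        raise ValueError("need at least one reconstruction error")
    ordered = sorted(normal_errors)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical per-sample errors on a normal validation set
val_errors = [0.010, 0.012, 0.009, 0.015, 0.011, 0.013, 0.008, 0.014, 0.010, 0.050]
threshold = percentile_threshold(val_errors, percent=90)

# New samples are flagged when their error exceeds the threshold
flags = [err > threshold for err in [0.011, 0.2]]
```

A lower percentile flags more samples (more false alarms), a higher one flags fewer (more misses), so the choice depends on which error is costlier for the application.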
+
+### Stable Diffusion VAE (latent)
+```python
+from diffusers import AutoencoderKL
+
+vae = AutoencoderKL.from_pretrained('runwayml/stable-diffusion-v1-5', subfolder='vae')
+
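+# `image` is assumed to be a float tensor of shape (B, 3, 512, 512) scaled to [-1, 1]
+# Image (512x512x3) → latent (64x64x4): 8× downsampling per spatial side
+latent = vae.encode(image).latent_dist.sample() * 0.18215  # SD v1 latent scaling factor
+
+# Latent → image
+image_recon = vae.decode(latent / 0.18215).sample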
+```
+
+### β-VAE (disentangle)
+```python
+# β > 1 → more disentanglement, worse reconstruction (recon and kl as in vae_loss)
+loss = recon + beta * kl  # β is commonly set around 4 to 10
+```
+
+## 🤔 Decision criteria
+| Application | Variant |
+|---|---|
+| Dimensionality reduction | Vanilla AE |
+| Denoising | Denoising AE |
+| Generation | VAE / VQ-VAE |
+| Disentanglement | β-VAE |
+| Self-supervised vision | MAE |
+| Latent diffusion | VAE (continuous) / VQ-VAE (discrete) |
+| Anomaly detection | Vanilla AE + reconstruction error |
+| Compression | Neural codec (rate-distortion) |
+
+**Default**: task-specific. Representation learning → AE; generation → VAE; vision pretraining → MAE.
+
+## 🔗 Graph
+- Parents: [[Unsupervised-Learning]] · [[Representation-Learning]] · [[Generative-Models]]
+- Variants: [[VAE]] · [[VQ-VAE]] · [[β-VAE]] · [[MAE]] · [[Denoising-AE]] · [[Sparse-AE]]
+- Applications: [[Anomaly-Detection]] · [[Stable-Diffusion]] · [[DALL-E]] · [[Self-Supervised-Learning]]
+- Adjacent: [[PCA]] · [[GAN]] · [[Diffusion-Model]] · [[Latent-Space]]
+
+## 🤖 LLM usage
+**When to use**: representation learning, anomaly detection, generative latents, vision pretraining.
+**When not to use**: supervised learning alone is sufficient, or the data is highly structured and a dedicated architecture fits better (e.g. a GNN for graphs).
+
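+## ❌ Anti-patterns
+- **Identity map** (no bottleneck): the network can simply copy its input and learns nothing useful.
+- **VAE posterior collapse**: an over-strong KL term pushes q(z|x) onto the prior, and the decoder ends up ignoring the latent.
+- **β-VAE with β too high**: reconstruction quality is destroyed.
+- **MAE with a low mask ratio**: the task becomes trivial, since missing patches can be filled in from their neighbors.
+- **Anomaly detection trained on mixed data**: if anomalies are in the training set, the model learns to reconstruct them too.
+- **Latent dimension too large**: the bottleneck stops constraining the model, which then overfits.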
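
One way to spot the KL-related failure in the list above is to look at the KL contribution of each latent dimension separately: dimensions whose average KL is near zero carry no information about the input. The following standard-library sketch assumes a diagonal Gaussian posterior and a standard normal prior; the function names and the cutoff `eps=1e-2` are illustrative choices, not established defaults.

```python
import math

def per_dim_kl(mus, logvars):
    """Batch-averaged KL(q(z_j|x) || N(0, 1)), computed separately per latent dim j."""
    n, d = len(mus), len(mus[0])
    totals = [0.0] * d
    for mu, logvar in zip(mus, logvars):
        for j in range(d):
            totals[j] += 0.5 * (mu[j] ** 2 + math.exp(logvar[j]) - logvar[j] - 1.0)
    return [t / n for t in totals]

def collapsed_dims(mus, logvars, eps=1e-2):
    """Indices of latent dims whose average KL falls below the (assumed) cutoff eps."""
    return [j for j, kl in enumerate(per_dim_kl(mus, logvars)) if kl < eps]

# Toy encoder outputs for a batch of 2: dim 0 varies with the input, dim 1 sits on the prior
mus = [[1.5, 0.0], [-1.2, 0.0]]
logvars = [[-2.0, 0.0], [-2.0, 0.0]]
dead = collapsed_dims(mus, logvars)  # dim 1 has zero KL for every sample
```

If most dimensions show up in `dead`, lowering β or the latent width is a reasonable first thing to try.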
+
+## 🧪 Validation / duplicates
+- Verified (Hinton AE, Kingma VAE, He MAE, Stable Diffusion).
+- Trust level A.
+- Related: [[VAE]] · [[MAE]] · [[Stable-Diffusion]] · [[Anomaly-Detection]] · [[Self-Supervised-Learning]].
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: variants + PyTorch code (AE, VAE, MAE, anomaly, SD VAE) |