Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
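10_Wiki/Topics large-scale cleanup:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, concept mapping (~2,400 items)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections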

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


---
id: wiki-2026-0508-generalization-in-ai
title: Generalization in AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [generalization, OOD, distribution shift, robustness, double descent, scaling laws]
duplicate_of: none
source_trust_level: A
confidence_score: 0.96
verification_status: applied
tags: [ml, generalization, ood, robustness, scaling, double-descent, foundation-model]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  applicable_to: [ML Theory, Foundation Models, Robustness]
---
# Generalization in AI
## One-line summary
> **"Performing well on unseen data."** Generalization is about the gap between training and test performance. Modern topics include the over-parameterization paradox, double descent (Belkin), grokking, OOD robustness, and emergent generalization in foundation models.
## Key concepts
### Traditional view
- **Overfitting**: model capacity exceeds the complexity of the problem.
- **Underfitting**: model capacity falls short of the complexity of the problem.
- **Sweet spot**: the capacity that balances the bias-variance trade-off.
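The trade-off in the list above is usually stated through the standard decomposition of expected squared error. As a sketch, assume a regression target $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$ and a fitted predictor $\hat f$ (these symbols are introduced here for illustration and do not appear elsewhere in this note):

$$
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Underfitting corresponds to a large bias term, overfitting to a large variance term; the expectation is taken over training sets and noise.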
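### Modern view (deep learning)
- **Double descent** (Belkin 2019): past the interpolation threshold, heavily over-parameterized models can generalize well again.
- **Grokking** (Power 2022): generalization can emerge long after the model has overfit the training set.
- **Lottery ticket** (Frankle): a dense network contains a sparse subnetwork that can be trained to comparable accuracy.
- **Implicit regularization**: SGD itself biases training toward solutions that generalize.
- **Flat minima**: flatter minima tend to generalize better.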
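### Scaling laws
- **Kaplan 2020**: loss follows a power law in parameters (N), data (D), and compute (C).
- **Chinchilla** (Hoffmann 2022): compute-optimal training uses roughly D = 20·N tokens.
- **Llama 3 / 4**: a trend toward over-training, that is, using more tokens than the compute-optimal ratio.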
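To make the D = 20·N rule concrete, the sketch below splits a training FLOP budget into a parameter count and a token count. It assumes the common approximation C ≈ 6·N·D for dense transformer training compute, which is not stated in this note; the function name `chinchilla_optimal` and the example budget are illustrative.

```python
import math

def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Split a FLOP budget into (params N, tokens D) under C ~= 6*N*D and D = k*N."""
    # Assumption: C = 6 * N * D. Substituting D = k * N gives N = sqrt(C / (6 * k)).
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Illustrative budget only
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

With this budget the split comes out near 70 billion parameters and 1.4 trillion tokens. Over-training in the sense of the last bullet means choosing a `tokens_per_param` well above 20 for the same model size.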
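### OOD robustness
- **Distribution shift**: covariate, label, and concept shift.
- **Group robustness**: optimize and report worst-group performance.
- **Invariant features**: features whose relation to the label is causal and stable across environments.
- **Domain generalization**: train on several domains and evaluate on an unseen one.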
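The three shift types in the list above differ in which part of the joint distribution changes between training and deployment. A compact statement, using the usual textbook definitions (the notation $P_{\text{train}}$, $P_{\text{test}}$ is introduced here for illustration):

$$
\begin{aligned}
\text{covariate shift:}\quad & P_{\text{train}}(x) \neq P_{\text{test}}(x), \quad P_{\text{train}}(y \mid x) = P_{\text{test}}(y \mid x) \\
\text{label shift:}\quad & P_{\text{train}}(y) \neq P_{\text{test}}(y), \quad P_{\text{train}}(x \mid y) = P_{\text{test}}(x \mid y) \\
\text{concept shift:}\quad & P_{\text{train}}(y \mid x) \neq P_{\text{test}}(y \mid x)
\end{aligned}
$$

Input-only monitors such as PSI can surface covariate shift, while concept shift generally needs fresh labels to detect.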
### Applications
1. **Production ML monitoring**.
2. **Self-driving safety**.
3. **Medical AI**.
4. **Foundation model evals**.
5. **Few-shot transfer**.
## 💻 Patterns
### Train / val / test split
```python
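from sklearn.model_selection import train_test_split

# X, y: full feature matrix and label vector.
# 70% train, then the remaining 30% is split evenly into validation and test (15% each).
X_tr, X_temp, y_tr, y_temp = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=0)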
```
### Detect overfit
```python
def overfit_check(train_loss, val_loss, threshold=0.1):
    """Flag overfitting when val loss exceeds train loss by more than `threshold` (relative gap)."""
    gap = (val_loss - train_loss) / train_loss
    return gap > threshold
```
### Early stopping (val)
```python
class EarlyStop:
    def __init__(self, patience=5):
        self.patience = patience
        self.best = float('inf')
        self.bad = 0  # consecutive epochs without improvement
    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad = 0
            return False
        self.bad += 1
        return self.bad >= self.patience  # True: stop after `patience` bad epochs in a row
```
### Double descent visualization
```python
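def double_descent_curve(model_capacity_range, build_model, loss_fn,
                         X_train, y_train, X_val, y_val):
    """Validation loss per capacity: falls, peaks near the interpolation
    threshold, then falls again in the over-parameterized regime."""
    losses = []
    for cap in model_capacity_range:
        # build_model(cap) is a caller-supplied factory returning an sklearn-style estimator
        m = build_model(cap).fit(X_train, y_train)
        losses.append(loss_fn(m, X_val, y_val))
    return losses  # plot against capacity to look for the second descent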
```
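As a concrete instance of such a capacity sweep, the sketch below fits minimum-norm least squares on fixed random ReLU features of growing width. The setup (synthetic linear teacher, sizes, noise level, and the name `random_feature_errors`) is an assumption made for illustration. Test error in this kind of setup is expected to peak when the width is close to the number of training points and to fall again beyond it, but the exact numbers depend on the seed.

```python
import numpy as np

def random_feature_errors(widths, n_train=40, n_test=500, d=10, noise=0.5, seed=0):
    """Test MSE of min-norm least squares on random ReLU features, one value per width."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    X_tr = rng.normal(size=(n_train, d))
    X_te = rng.normal(size=(n_test, d))
    y_tr = X_tr @ w_true + noise * rng.normal(size=n_train)  # noisy training targets
    y_te = X_te @ w_true                                      # noise-free test targets
    errors = []
    for p in widths:
        W = rng.normal(size=(d, p)) / np.sqrt(d)   # fixed random first layer
        F_tr = np.maximum(X_tr @ W, 0.0)
        F_te = np.maximum(X_te @ W, 0.0)
        beta = np.linalg.pinv(F_tr) @ y_tr         # minimum-norm least-squares solution
        errors.append(float(np.mean((F_te @ beta - y_te) ** 2)))
    return errors

widths = [5, 10, 20, 40, 80, 160, 640]
for p, err in zip(widths, random_feature_errors(widths)):
    print(f"width={p:4d}  test_mse={err:.3f}")
```

Averaging over several seeds gives a smoother curve than a single run.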
### OOD detection (Mahalanobis)
```python
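import numpy as np

def ood_score(test_features, train_features):
    """Mahalanobis distance of each test row from the training feature distribution."""
    mu = train_features.mean(0)
    # pinv tolerates a singular covariance matrix
    cov_inv = np.linalg.pinv(np.cov(train_features.T))
    diff = test_features - mu
    return np.sqrt(np.einsum('bi,ij,bj->b', diff, cov_inv, diff))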
```
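A distance only becomes a detector once a threshold is chosen. One simple option, sketched here under the assumption that a set of in-distribution scores is available (ideally from held-out data rather than the rows used to fit the mean and covariance), is to take a high quantile of those scores. The helper names `fit_ood_threshold` and `flag_ood` and the synthetic scores are illustrative.

```python
import numpy as np

def fit_ood_threshold(in_dist_scores, quantile=0.95):
    """Score below which `quantile` of the in-distribution samples fall."""
    return float(np.quantile(np.asarray(in_dist_scores, dtype=float), quantile))

def flag_ood(scores, threshold):
    """Boolean mask: True where a sample's distance exceeds the threshold."""
    return np.asarray(scores, dtype=float) > threshold

# Synthetic stand-ins for distance scores (not real Mahalanobis outputs)
rng = np.random.default_rng(0)
in_dist = np.abs(rng.normal(0.0, 1.0, size=1000))
shifted = np.abs(rng.normal(6.0, 1.0, size=1000))

thr = fit_ood_threshold(in_dist)
print("flagged in-distribution:", flag_ood(in_dist, thr).mean())
print("flagged shifted:        ", flag_ood(shifted, thr).mean())
```

The quantile directly sets the false-alarm rate on in-distribution data, so it is a trade-off to tune per application rather than a fixed constant.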
### Distribution shift (PSI)
```python
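import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample (`expected`) and a new sample (`actual`)."""
    e_hist, edges = np.histogram(expected, bins=bins)
    a_hist, _ = np.histogram(actual, bins=edges)  # reuse the reference bin edges
    e_pct = e_hist / len(expected) + 1e-9  # small constant avoids log(0)
    a_pct = a_hist / len(actual) + 1e-9
    return ((a_pct - e_pct) * np.log(a_pct / e_pct)).sum()

# Rule of thumb: PSI < 0.1 is stable; PSI > 0.25 is a significant shift.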
```
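In monitoring, PSI is typically computed per feature and mapped to a status using the 0.1 / 0.25 rule of thumb. The sketch below repeats the PSI computation so that it runs on its own; `drift_report`, the status labels, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def psi(expected, actual, bins=10, eps=1e-9):
    """PSI between two 1-D samples, binned on the reference sample's edges."""
    e_hist, edges = np.histogram(expected, bins=bins)
    a_hist, _ = np.histogram(actual, bins=edges)
    e_pct = e_hist / len(expected) + eps
    a_pct = a_hist / len(actual) + eps
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

def drift_report(reference, current, names, warn=0.1, alert=0.25):
    """Map each column to (PSI, status) with status in {'stable', 'moderate', 'significant'}."""
    report = {}
    for j, name in enumerate(names):
        value = psi(reference[:, j], current[:, j])
        if value < warn:
            status = "stable"
        elif value <= alert:
            status = "moderate"
        else:
            status = "significant"
        report[name] = (value, status)
    return report

# Synthetic example: the first feature is unchanged, the second has a shifted mean
rng = np.random.default_rng(0)
ref = rng.normal(size=(5000, 2))
cur = np.column_stack([rng.normal(size=5000), rng.normal(loc=1.5, size=5000)])
report = drift_report(ref, cur, ["f_same", "f_shifted"])
for name, (value, status) in report.items():
    print(f"{name}: PSI={value:.3f} ({status})")
```

Values of `current` that fall outside the reference bin range are dropped by `np.histogram`, so a very large shift can be understated; adding open-ended edge bins is one way to handle that.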
### Group robustness (Worst-Group)
```python
import numpy as np

def worst_group_acc(predictions, labels, groups):
    """Return (worst per-group accuracy, dict of accuracy per group)."""
    group_accs = {}
    for g in np.unique(groups):
        mask = groups == g
        group_accs[g] = (predictions[mask] == labels[mask]).mean()
    return min(group_accs.values()), group_accs
```
### Domain generalization (DRO)
```python
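import numpy as np

def dro_loss(losses_per_group, eta=1.0):
    """Soft worst-group loss: (1/eta) * log(mean(exp(eta * loss))); approaches the max as eta grows."""
    losses = np.asarray(losses_per_group, dtype=float)
    shift = losses.max()  # subtract the max before exp for numerical stability
    return shift + np.log(np.exp(eta * (losses - shift)).mean()) / eta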
```
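Another way to emphasize the worst group is to keep explicit per-group weights and raise the weight of whichever groups currently have high loss. The sketch below follows the exponentiated-gradient style of update used in online group DRO; the function name `group_dro_step`, the step size, and the fixed example losses are assumptions for illustration, and in real training the model parameters would be updated on the weighted loss between weight updates.

```python
import numpy as np

def group_dro_step(group_weights, group_losses, step_size=0.01):
    """One exponentiated-gradient update of the group weights, plus the weighted loss."""
    losses = np.asarray(group_losses, dtype=float)
    q = np.asarray(group_weights, dtype=float) * np.exp(step_size * losses)
    q = q / q.sum()                      # keep the weights on the probability simplex
    robust_loss = float(np.dot(q, losses))
    return q, robust_loss

# With losses held fixed, the weights concentrate on the worst group (index 2)
q = np.full(3, 1.0 / 3.0)
for _ in range(200):
    q, loss = group_dro_step(q, [0.2, 0.5, 1.5], step_size=0.1)
print(q.round(4), round(loss, 4))
```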
### Augmentation (improve generalization)
```python
import torchvision.transforms as T
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.ColorJitter(0.2, 0.2, 0.2),
    T.AutoAugment(),
])
```
### Mixup (interpolation)
```python
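import numpy as np
import torch

def mixup(x, y, alpha=0.4):
    """Blend each sample with a randomly chosen partner from the same batch."""
    lam = np.random.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    idx = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[idx]
    y_a, y_b = y, y[idx]
    # Train with lam * loss(pred, y_a) + (1 - lam) * loss(pred, y_b)
    return x_mix, y_a, y_b, lam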
```
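The two label sets and `lam` returned by `mixup` are meant to be combined in the loss. A minimal sketch for classification with cross-entropy follows; the helper name `mixup_loss` and the random tensors are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mixup_loss(logits, y_a, y_b, lam):
    """Convex combination of the cross-entropy against both label sets."""
    return lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)

# Stand-in batch: 8 samples, 5 classes
torch.manual_seed(0)
logits = torch.randn(8, 5)
y_a = torch.randint(0, 5, (8,))
y_b = y_a[torch.randperm(8)]
loss = mixup_loss(logits, y_a, y_b, lam=0.7)
print(loss.item())
```

With `lam = 1` this reduces to the ordinary cross-entropy on the original labels.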
### SAM (Sharpness-Aware Minimization)
```python
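from torch.optim import Optimizer

class SAM(Optimizer):
    """Constructor skeleton only: the two-step SAM update is not implemented here."""
    def __init__(self, params, base_optim, rho=0.05):
        super().__init__(params, dict())
        self.base = base_optim  # wrapped optimizer that applies the actual update
        self.rho = rho          # radius of the neighborhood searched for the worst-case loss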
```
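The class above only stores its arguments. The update itself takes two forward-backward passes: one to find the ascent direction, and one at the perturbed weights to get the gradient that is actually applied. The function below is a minimal sketch of that procedure written as a standalone step rather than as an `Optimizer` subclass; `sam_step` and the toy regression problem are illustrative assumptions, not a reference implementation.

```python
import torch

def sam_step(model, loss_fn, X, y, base_optimizer, rho=0.05):
    """One SAM update: step to a nearby high-loss point, take the gradient there, update from the original weights."""
    # First pass: gradient at the current weights
    base_optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(2) for p in params]), 2)
    scale = rho / (grad_norm + 1e-12)

    # Move to w + e, with e = rho * g / ||g||
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = p.grad * scale
            p.add_(e)
            perturbations.append(e)

    # Second pass: gradient at the perturbed weights
    base_optimizer.zero_grad()
    loss_fn(model(X), y).backward()

    # Restore the original weights, then let the base optimizer apply the update
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    base_optimizer.step()
    return loss.item()

# Toy usage on a small linear regression problem
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
X, y = torch.randn(64, 4), torch.randn(64, 1)
losses = [sam_step(model, torch.nn.functional.mse_loss, X, y, opt) for _ in range(50)]
print(losses[0], losses[-1])
```

Each step costs roughly twice a plain SGD step because of the second forward-backward pass.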
### Flat-minima detection
```python
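import numpy as np
import torch

@torch.no_grad()
def flatness(model, loss_fn, X, y, eps=0.01, n_perturb=20):
    """Mean loss increase under random Gaussian weight perturbations (lower = flatter)."""
    base = loss_fn(model(X), y).item()
    perturbed = []
    for _ in range(n_perturb):
        noises = [eps * torch.randn_like(p) for p in model.parameters()]
        for p, n in zip(model.parameters(), noises):
            p.add_(n)
        perturbed.append(loss_fn(model(X), y).item())
        # Subtract the same noise that was added so the weights are restored exactly
        for p, n in zip(model.parameters(), noises):
            p.sub_(n)
    return np.mean(perturbed) - base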
```
### Scaling law extrapolation
```python
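from scipy.optimize import curve_fit

def power_law(N, alpha, beta, eps):
    """Loss model L(N) = alpha + beta / N**eps, with alpha the irreducible loss."""
    return alpha + beta / N ** eps

def fit_scaling(model_sizes, losses):
    """Fit (alpha, beta, eps) to observed (model size, loss) pairs."""
    return curve_fit(power_law, model_sizes, losses, p0=[1, 1, 0.5])[0]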
```
### Robustness eval
```python
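def robustness_eval(model, attacks, X_test, y_test):
    """Accuracy of `model` under each attack in `attacks` (name -> attack function)."""
    results = {}
    for name, attack_fn in attacks.items():
        # Each attack_fn returns adversarially perturbed inputs for (X_test, y_test)
        adv_X = attack_fn(model, X_test, y_test)
        results[name] = (model(adv_X).argmax(-1) == y_test).float().mean().item()
    return results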
```
### Calibration (ECE)
```python
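import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over equal-width confidence bins.

    probs: confidence of the predicted class; labels: 1 if the prediction was correct, else 0.
    """
    bin_edges = np.linspace(0, 1, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        upper = probs <= bin_edges[i + 1] if i == n_bins - 1 else probs < bin_edges[i + 1]
        mask = (probs >= bin_edges[i]) & upper  # last bin includes confidence == 1.0
        if mask.sum() == 0:
            continue
        bin_acc = labels[mask].mean()
        bin_conf = probs[mask].mean()
        ece += (mask.sum() / len(probs)) * abs(bin_acc - bin_conf)
    return ece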
```
### Transfer learning eval
```python
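from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def transfer_score(source_model, target_X, target_y):
    """Linear probe on frozen features, scored on a held-out split of the target data."""
    feats = source_model.encode(target_X)  # assumes the model exposes an encode() method
    f_tr, f_te, y_tr, y_te = train_test_split(feats, target_y, test_size=0.3, stratify=target_y)
    probe = LogisticRegression(max_iter=1000).fit(f_tr, y_tr)
    return probe.score(f_te, y_te)  # scoring on the fitting rows would overstate transfer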
```
## Decision criteria
| Situation | Approach |
|---|---|
| Overfit (small data) | Augment + early stop |
| Underfit | More capacity |
| Distribution shift | Monitoring + retrain |
| OOD robustness | Augment + DRO |
| Few-shot | Foundation model + transfer |
| Production | Add monitoring + calibration |
**Default**: augmentation + early stopping + flat minima (SAM/SWA) + OOD detection + PSI monitoring in production.
## 🔗 Graph
- Parent: [[Machine-Learning]]
- Variant: [[Double-Descent]]
- Applications: [[Foundation-Models]] · [[Domain-Adaptation]]
- Adjacent: [[Epistemic-Uncertainty]] · [[Concept-Drift]]
## 🤖 LLM usage
**When**: every ML deployment, monitoring, and robustness evaluation.
**When not**: train-only academic work.
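## ❌ Anti-patterns
- **Test set leak**: inflates scores artificially.
- **No OOD eval**: leads to production failures.
- **Always reducing capacity**: modern deep learning often shows the reverse, with larger models generalizing better.
- **No calibration**: confidence scores become misleading.
- **No drift monitor**: performance degrades silently.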
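A common source of test-set leakage is fitting preprocessing statistics on all rows before splitting. One way to avoid it, sketched below on synthetic data, is to split first and wrap preprocessing and model in a single scikit-learn pipeline so the scaler only ever sees training rows. The dataset and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary task: the label depends on the first feature plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)

# Split first; test rows must not influence any fitted statistic
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# fit() computes the scaler's mean and variance from X_tr only;
# score() reuses those training statistics on X_te
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_tr, y_tr)
test_acc = clf.score(X_te, y_te)
print(f"held-out accuracy: {test_acc:.3f}")
```

The same pattern extends to cross-validation: passing the whole pipeline to the cross-validation routine refits the scaler inside every fold.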
## 🧪 Verification / duplicates
- Verified (Belkin 2019, Power Grokking 2022, Hoffmann Chinchilla, Vapnik SLT).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-20 | Auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: bias-variance + double descent / OOD / DRO / SAM / scaling code |