[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -2,61 +2,218 @@
 id: wiki-2026-0508-l1-and-l2-regularization
 title: L1 and L2 Regularization
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [ML-REG-001]
+aliases: [L1, L2, Lasso, Ridge, ElasticNet, weight decay, regularization]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 1.0
-tags: [machine-learning, Regularization, l1-norm, l2-norm, Overfitting, Optimization]
+confidence_score: 0.97
+verification_status: applied
+tags: [machine-learning, regularization, l1, l2, lasso, ridge, weight-decay]
 raw_sources: []
-last_reinforced: 2026-04-26
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
+tech_stack:
+  language: Python
+  framework: scikit-learn / PyTorch
 ---

-# L1 and L2 Regularization
+# L1 and L2 Regularization

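-## 📌 One-line insight (The Karpathy Summary)
-> "Put a penalty on the model's greed (its weights) and escape the swamp of overfitting through the elegance of simplicity": a technique that adds the magnitude of the weights to the loss function as a penalty, keeping the model from fitting specific data too closely and improving generalization.
+## One-line summary
+> **"Penalize the magnitude of the weights."** L1 (Lasso) → sparsity. L2 (Ridge) → small weights. ElasticNet combines the two. In modern DL this is weight decay, preferably AdamW's decoupled form. Dropout is also a regularizer.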

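-## 📖 Structured knowledge (Synthesized Content)
- **Extracted pattern:** "Weight Decay": a numerical suppression pattern that raises the total loss as the weights grow, pushing the model toward the smallest weights it can use and thereby controlling its complexity.
- **Main types:**
-    - **L1 Regularization (Lasso):** Penalizes the sum of the absolute weight values. Drives unimportant weights to 0, which yields a feature-selection effect.
-    - **L2 Regularization (Ridge):** Penalizes the sum of the squared weights. Keeps weights small and evenly spread, suppressing abrupt changes.
- **Significance:** Stops the model from learning noise in high-dimensional data, giving stable predictive performance in practice.
+## Core concepts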

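-## ⚠️ Contradictions & Updates
- **Conflict with earlier data:** Shrinking the weights is not a cure-all on its own; depending on the data, the modern deep-learning standard is to find the best balance by combining L1 and L2 (Elastic Net) or by pairing regularization with dropout.
- **Policy change:** The Antigravity project's core reasoning models apply L2 regularization by default during training to keep weights from concentrating excessively, and use L1 regularization strategically in modules that must extract sparse knowledge features.
+### L1 (Lasso)
+- Penalty: λ Σ |wᵢ|.
+- Effect: sparse solutions (many weights exactly zero).
+- Use case: feature selection.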
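
Why L1 produces exact zeros while L2 does not can be seen from their proximal operators, the per-coefficient step used by coordinate-descent and proximal-gradient solvers. L1 gives soft-thresholding: any coefficient with |wᵢ| ≤ λ becomes exactly 0, and larger ones shrink by λ. L2 only rescales. A minimal NumPy sketch; `soft_threshold` and `ridge_shrink` are illustrative names, not library APIs.

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal operator of lam * sum|w_i| (L1): shrink by lam, zero if |w_i| <= lam."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def ridge_shrink(w, lam):
    """Proximal operator of lam * sum w_i^2 (L2): uniform rescaling, never exactly zero."""
    return w / (1.0 + 2.0 * lam)

w = np.array([3.0, -0.4, 0.2, -2.5, 0.05])
print(soft_threshold(w, lam=0.5))  # small entries become exactly 0, large ones shrink by 0.5
print(ridge_shrink(w, lam=0.5))    # every entry halved, none becomes 0
```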

-## 🔗 Knowledge links (Graph)
- [[Supervised-Learning-Foundations|Supervised-Learning-Foundations]], [[Generalization-in-AI|Generalization-in-AI]], HyperParameter-Optimization, [[Loss-Functions-Foundations|Loss-Functions-Foundations]]
- **Raw Source:** 10_Wiki/Topics/AI/L1-and-L2-Regularization.md
+### L2 (Ridge)
+- Penalty: λ Σ wᵢ².
+- Effect: weights shrink toward zero but stay non-zero.
+- Use case: multicollinearity (strongly correlated features).
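
For linear regression with penalty λ Σ wᵢ², Ridge has the closed form ŵ = (XᵀX + λI)⁻¹Xᵀy. Adding λI keeps the matrix well-conditioned even when two features are almost identical, which is why Ridge is the usual fix for multicollinearity. A NumPy sketch on synthetic data. `ridge_closed_form` is an illustrative helper, and the intercept is omitted for brevity.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 (no intercept)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-4 * rng.normal(size=200)     # nearly a copy of x1 -> multicollinearity
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=200)  # true weights: [1, 1]

print(np.linalg.cond(X.T @ X))                    # huge: the unpenalized system is ill-conditioned
print(np.linalg.cond(X.T @ X + 1.0 * np.eye(2)))  # orders of magnitude smaller with lam = 1
print(ridge_closed_form(X, y, lam=1.0))           # close to [1, 1], weight split evenly
```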

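-## 🤖 LLM usage hints (How to Use This Knowledge)
+### ElasticNet
+- Penalty: λ [ρ Σ |wᵢ| + (1 − ρ) Σ wᵢ²], a mix of L1 and L2 with mixing ratio ρ ∈ [0, 1] (`l1_ratio` in sklearn, which also scales the L2 term by ½).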
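
To make sklearn's two knobs concrete: its documented ElasticNet objective is `1/(2n)·||y − Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 − l1_ratio)·||w||²`. Here `alpha` sets the overall strength, and `l1_ratio=1` gives Lasso while `l1_ratio=0` gives a pure L2 penalty. The sketch below only evaluates that objective in NumPy. `elastic_net_objective` is an illustrative helper, not part of sklearn.

```python
import numpy as np

def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Objective minimized by sklearn's ElasticNet (intercept omitted)."""
    n = X.shape[0]
    data_fit = np.sum((y - X @ w) ** 2) / (2 * n)
    l1 = alpha * l1_ratio * np.sum(np.abs(w))
    l2 = 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2)
    return data_fit + l1 + l2

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, 2.0])  # perfect fit, so only the penalty term remains
print(elastic_net_objective(X, y, w, alpha=0.1, l1_ratio=1.0))  # pure L1: 0.1 * 3
print(elastic_net_objective(X, y, w, alpha=0.1, l1_ratio=0.0))  # pure L2: 0.5 * 0.1 * 5
```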

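-**When to use this knowledge:**
- *(TODO)*
+### Modern DL
+- **Weight decay**: equivalent to L2 for plain SGD (with λ rescaled), but not for adaptive optimizers such as Adam.
+- **AdamW**: decoupled weight decay (Loshchilov & Hutter 2019).
+- **Dropout**: implicit regularization.
+- **Batch norm**: implicit regularization.
+- **Early stopping**: implicit regularization.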
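
The "weight decay = L2" equivalence for plain SGD can be checked with one update step. Adding `wd·w` to the gradient, which is what PyTorch's `SGD(weight_decay=wd)` does, matches adding `(wd/2)·||w||²` to the loss. A NumPy sketch that assumes no momentum and uses a made-up task gradient.

```python
import numpy as np

lr, wd = 0.1, 0.01
w = np.array([2.0, -1.0])
task_grad = np.array([0.3, 0.2])   # hypothetical gradient of the data loss

# (a) L2 penalty in the loss: the gradient of (wd / 2) * ||w||^2 is wd * w
w_l2 = w - lr * (task_grad + wd * w)

# (b) Weight decay: shrink weights by (1 - lr * wd), then take the plain gradient step
w_decay = (1 - lr * wd) * w - lr * task_grad

print(np.allclose(w_l2, w_decay))  # identical for vanilla SGD
```

With Adam the two no longer coincide, because the L2 gradient in (a) would also be divided by Adam's adaptive denominator.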

-**When not to use it:**
- *(TODO)*
+### Applications
+1. **Linear regression**: Ridge, Lasso.
+2. **Logistic regression**: L2 (sklearn's default penalty), with `class_weight` as a separate option for imbalanced classes.
+3. **DL training**: weight decay.
+4. **Feature selection**: Lasso.

-## 🧪 Validation Status
+## 💻 Patterns

- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 automatic normalization. Body needs verification.)*
+### Ridge (sklearn)
+```python
+from sklearn.linear_model import Ridge
+model = Ridge(alpha=1.0).fit(X, y)
+```

-## 🧬 Duplicate Check
+### Lasso (sklearn)
+```python
+from sklearn.linear_model import Lasso
+model = Lasso(alpha=0.1).fit(X, y)
+print((model.coef_ == 0).sum(), 'zero coefficients')
+```

- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Action:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: upgrade the old template and fill in missing fields.
+### ElasticNet
+```python
+from sklearn.linear_model import ElasticNet
+model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
+```

-## 🕓 Changelog
+### Logistic + L2
+```python
+from sklearn.linear_model import LogisticRegression
+model = LogisticRegression(penalty='l2', C=1.0).fit(X, y)
+# C is the inverse of regularization strength: smaller C = stronger L2 penalty
+```

-| Date | Change | Action | Trust |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
+### PyTorch weight decay
+```python
+import torch
+optim = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
+# For plain SGD this equals adding (weight_decay / 2) * ||w||^2 to the loss
+```
+
+### AdamW (decoupled, recommended)
+```python
+optim = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
+# Preferred over Adam(weight_decay=...): the decay bypasses Adam's adaptive per-parameter scaling
+```
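
The reason AdamW is recommended shows up in a single step. With Adam, an L2 term added to the gradient is divided by the adaptive denominator `sqrt(v) + eps`, so how much each weight decays depends on its gradient history. AdamW instead multiplies the weights by `(1 - lr·wd)`, outside the adaptive scaling. A simplified NumPy sketch, assuming step t = 1 with bias correction and no learning-rate schedule. `adam_step` is a hypothetical helper that illustrates the idea; it is not a reimplementation of `torch.optim.AdamW`.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.0, decoupled=False):
    """One Adam update.
    decoupled=False: L2 added to the gradient (Adam + weight_decay).
    decoupled=True:  AdamW-style decay applied directly to the weights."""
    if not decoupled:
        g = g + wd * w                  # L2 gradient passes through the adaptive scaling
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)           # bias-corrected second moment
    if decoupled:
        w = w * (1 - lr * wd)           # decay does not see sqrt(v_hat)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w0 = np.array([1.0, 1.0])
g = np.array([10.0, 0.0])               # large gradient on w[0], none on w[1]
zeros = np.zeros(2)

base, _, _ = adam_step(w0, g, zeros, zeros, t=1, wd=0.0)
coupled, _, _ = adam_step(w0, g, zeros, zeros, t=1, wd=0.1)
decoupled, _, _ = adam_step(w0, g, zeros, zeros, t=1, wd=0.1, decoupled=True)

print(base - coupled)    # extra shrinkage from L2-in-Adam: uneven, ~0 for w[0]
print(base - decoupled)  # extra shrinkage from AdamW: lr * wd * w for both weights
```

The coupled version barely decays the weight with the large gradient and decays the idle weight by a full normalized step. The decoupled version shrinks both by the same `lr·wd·w`.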
+
+### Manual L1 in PyTorch
+```python
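+def l1_penalty(model, lam=1e-5):
+    # Sum of |w| over all parameters (biases included; filter them out if undesired)
+    return lam * sum(p.abs().sum() for p in model.parameters())
+
+loss = task_loss + l1_penalty(model)  # task_loss: the data loss, e.g. cross-entropy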
+```
+
+### CV-tune α (sklearn)
+```python
+from sklearn.linear_model import RidgeCV
+model = RidgeCV(alphas=[0.01, 0.1, 1, 10, 100]).fit(X, y)
+print(model.alpha_)
+```
+
+### LassoCV
+```python
+import numpy as np
+from sklearn.linear_model import LassoCV
+model = LassoCV(alphas=np.logspace(-4, 0, 50), cv=5).fit(X, y)
+```
+
+### Path plot (regularization strength sweep)
+```python
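+import numpy as np
+import matplotlib.pyplot as plt
+from sklearn.linear_model import Lasso
+
+alphas = np.logspace(-4, 1, 50)
+# One coefficient vector per alpha; each plotted line traces one feature's coefficient
+coefs = [Lasso(alpha=a, max_iter=10_000).fit(X, y).coef_ for a in alphas]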
+plt.plot(alphas, coefs)
+plt.xscale('log')
+plt.xlabel('alpha'); plt.ylabel('coefficient')
+```
+
+### Group L1 (group lasso)
+```python
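+import numpy as np
+
+def group_lasso_penalty(weights, groups, lam):
+    """lam * sum over groups of the L2 norm of that group's weights."""
+    weights = np.asarray(weights, dtype=float)
+    return lam * sum(np.linalg.norm(weights[list(group)]) for group in groups)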
+```
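
The group-lasso penalty above is usually optimized with group soft-thresholding, its proximal operator. Each group's weight vector is shrunk toward 0 by λ in Euclidean norm, and a group whose norm is at most λ is zeroed as a whole, so its features are dropped together. A NumPy sketch for non-overlapping groups. `group_soft_threshold` and the example groups are illustrative, and the common √(group size) weighting of λ is left out to match the unweighted penalty.

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 for non-overlapping groups."""
    out = np.zeros_like(w, dtype=float)
    for group in groups:
        idx = np.asarray(group)
        norm = np.linalg.norm(w[idx])
        if norm > lam:
            out[idx] = (1 - lam / norm) * w[idx]  # shrink the whole group toward 0
        # otherwise the whole group stays exactly 0
    return out

w = np.array([3.0, 4.0, 0.3, -0.4])
groups = [[0, 1], [2, 3]]                          # group norms: 5.0 and 0.5
print(group_soft_threshold(w, groups, lam=1.0))    # first group scaled by 0.8, second zeroed
```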
+
+### Different decay per layer (DL)
+```python
+optim = torch.optim.AdamW([
+    {'params': model.encoder.parameters(), 'weight_decay': 0.01},
+    {'params': model.head.parameters(), 'weight_decay': 0.001},
+])
+```
+
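+### Excluding biases and norm parameters from decay (best practice)
+```python
+def get_param_groups(model, weight_decay):
+    """Split parameters so biases and normalization scales/shifts get no decay."""
+    decay, no_decay = [], []
+    for p in model.parameters():
+        if not p.requires_grad:
+            continue
+        # Biases and norm-layer weights are 1-D; matching 'norm' in the name
+        # would miss layers named e.g. 'bn1' or 'ln_f'.
+        if p.ndim < 2:
+            no_decay.append(p)
+        else:
+            decay.append(p)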
+    return [
+        {'params': decay, 'weight_decay': weight_decay},
+        {'params': no_decay, 'weight_decay': 0},
+    ]
+
+optim = torch.optim.AdamW(get_param_groups(model, 0.01), lr=1e-3)
+```
+
+### Effect on bias-variance
+```python
+from sklearn.linear_model import Ridge
+
+def reg_effect(alphas, X_train, y_train, X_val, y_val):
+    """Train / validation MSE of Ridge for each alpha."""
+    train_err, val_err = [], []
+    for a in alphas:
+        m = Ridge(alpha=a).fit(X_train, y_train)
+        train_err.append(((m.predict(X_train) - y_train) ** 2).mean())
+        val_err.append(((m.predict(X_val) - y_val) ** 2).mean())
+    return train_err, val_err
+# Larger alpha: train error rises; validation error usually falls at first, then rises again once the model is over-regularized (U-shape).
+```
+
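+### Magnitude pruning (sparsity in modern DL)
+```python
+import torch
+
+@torch.no_grad()
+def magnitude_pruning(model, sparsity=0.5):
+    """Per layer, zero out the `sparsity` fraction of weights with the smallest magnitude."""
+    for p in model.parameters():
+        if p.ndim < 2:  # skip biases and norm parameters
+            continue
+        k = int(p.numel() * sparsity)
+        if k == 0:      # kthvalue needs k >= 1
+            continue
+        threshold = p.abs().flatten().kthvalue(k).values
+        p[p.abs() <= threshold] = 0  # ties at the threshold may prune slightly more than k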
+```
+
+## Decision criteria
+| Situation | Method |
+|---|---|
+| Linear | Ridge / Lasso / ElasticNet |
+| Feature selection | Lasso |
+| Multicollinearity | Ridge |
+| DL | AdamW weight decay |
+| Sparsity goal | Lasso / pruning |
+| Best DL practice | AdamW + exclude bias/norm |
+
+**Defaults**: DL = AdamW with weight decay around 0.01–0.1, excluding biases and norm parameters. Linear models = ElasticNet with α tuned by CV (`ElasticNetCV`). Sparsity goal = Lasso.
+
+## 🔗 Graph
+- Parent: [[Regularization]] · [[Optimization]]
+- Variants: [[Lasso]] · [[Ridge]] · [[ElasticNet]] · [[Weight-Decay]]
+- Applications: [[Feature-Selection]] · [[Pruning]]
+- Adjacent: [[Dropout]] · [[Early-Stopping]] · [[Generalization-in-AI]]
+
+## 🤖 LLM usage
+**When to use**: nearly every ML training run.
+**When not to**: when the model is underfitting; more regularization only adds bias.
+
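+## ❌ Anti-patterns
+- **Adam + weight_decay**: use AdamW; coupled L2 is distorted by Adam's adaptive scaling.
+- **Same decay for biases / norm parameters**: decaying them can hurt training; exclude them.
+- **α not tuned by CV**: the regularization strength is likely wrong.
+- **L1 in DL without a sparsity goal**: can make training unstable.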
+
+## 🧪 Validation / duplicates
+- Verified against Hastie, Tibshirani & Friedman (*The Elements of Statistical Learning*) and Loshchilov & Hutter (AdamW, 2019).
+- Source trust level: A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup: L1/L2 + sklearn / AdamW / param groups / pruning code |