---
id: wiki-2026-0508-l1-and-l2-regularization
title: L1 and L2 Regularization
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [L1, L2, Lasso, Ridge, ElasticNet, weight decay, regularization]
duplicate_of: none
source_trust_level: A
confidence_score: 0.97
verification_status: applied
tags: [machine-learning, regularization, l1, l2, lasso, ridge, weight-decay]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: scikit-learn / PyTorch
---
# L1 and L2 Regularization
## One-line summary
> **"Penalize the magnitude of the weights."** L1 (Lasso) → sparsity. L2 (Ridge) → small weights. ElasticNet combines both. Modern practice: weight decay in deep learning, decoupled in AdamW. Dropout is also a regularizer.
## Core concepts
### L1 (Lasso)
- Penalty: λ Σ |wᵢ|.
- Effect: sparse solutions (some weights become exactly zero).
- Use: feature selection.
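
Why the L1 penalty yields exact zeros can be seen in its proximal (soft-thresholding) step, which coordinate-descent Lasso solvers apply per weight. A minimal sketch in plain Python; the function name `soft_threshold` and the sample weights are illustrative assumptions, not taken from any library:

```python
def soft_threshold(w, lam):
    """Proximal step for lam * |w|: shrink w toward zero by lam,
    and return exactly 0.0 when |w| <= lam."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

weights = [3.0, 0.4, -0.2, -1.5]
shrunk = [soft_threshold(w, lam=0.5) for w in weights]
print(shrunk)  # [2.5, 0.0, 0.0, -1.0]
```

Weights whose magnitude is below the threshold are set to zero rather than merely reduced, which is the source of the sparsity.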
### L2 (Ridge)
- Penalty: λ Σ wᵢ².
- Effect: small but non-zero weights.
- Use: multicollinearity.
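
The "small but non-zero" effect is easiest to see in the one-feature case without an intercept, where minimizing Σ(y − w·x)² + λ·w² has the closed form w = Σxy / (Σx² + λ). A sketch under those assumptions; `ridge_1d` and the toy data are hypothetical:

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge weight for a single feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x, so the unpenalized weight is 2
for lam in (0.0, 1.0, 14.0, 1000.0):
    print(lam, ridge_1d(xs, ys, lam))
```

The weight shrinks as λ grows but reaches zero only in the limit, unlike the L1 case.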
### ElasticNet
- Penalty: α·L1 + (1-α)·L2, where α ∈ [0, 1] sets the mix.
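
The mix can be written out directly. This sketch follows the note's notation, with an overall strength `lam` added as an assumption. scikit-learn's `ElasticNet` parameterizes it differently: `alpha` is the overall strength, `l1_ratio` is the mix, and the L2 term carries a factor of 0.5.

```python
def elastic_net_penalty(weights, lam, alpha):
    """lam * (alpha * sum|w| + (1 - alpha) * sum w^2), with alpha in [0, 1]."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

w = [1.0, -2.0]
print(elastic_net_penalty(w, lam=1.0, alpha=1.0))  # pure L1: 3.0
print(elastic_net_penalty(w, lam=1.0, alpha=0.0))  # pure L2: 5.0
print(elastic_net_penalty(w, lam=1.0, alpha=0.5))  # even mix: 4.0
```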
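### Modern DL
- **Weight decay**: equivalent to L2 for plain SGD; the two differ under adaptive optimizers such as Adam.
- **AdamW**: decoupled weight decay (Loshchilov & Hutter, 2019).
- **Dropout**: implicit regularization.
- **Batch norm**: implicit regularization.
- **Early stopping**: implicit regularization.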
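
What "decoupled" means can be shown with a single Adam step on one scalar weight. This is a simplified sketch, not the PyTorch implementation: it covers only the first step (t = 1, zero-initialized moments), and `adam_step` and its defaults are illustrative assumptions.

```python
import math

def adam_step(w, grad, lr=0.1, wd=0.1, decoupled=False,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam step (t = 1) for one scalar weight."""
    if not decoupled:
        grad = grad + wd * w  # coupled: L2 term is folded into the gradient
    # Moments after one step, with bias correction for t = 1
    m_hat = (1 - beta1) * grad / (1 - beta1)
    v_hat = (1 - beta2) * grad ** 2 / (1 - beta2)
    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        w_new -= lr * wd * w  # decoupled: decay applied directly to the weight
    return w_new

coupled = adam_step(10.0, 1.0)
decoupled = adam_step(10.0, 1.0, decoupled=True)
print(coupled, decoupled)  # roughly 9.9 and 9.8
```

In the coupled step the decay term passes through Adam's normalization, so the update is about `lr` in size however large the weight is. In the decoupled step the weight is additionally shrunk by `lr * wd * w`.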
### Applications
1. **Linear regression**: Ridge, Lasso.
2. **Logistic regression**: `class_weight` + L2.
3. **DL training**: weight decay.
4. **Feature selection**: Lasso.
## 💻 Patterns
### Ridge (sklearn)
```python
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0).fit(X, y)
```
### Lasso (sklearn)
```python
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1).fit(X, y)
print((model.coef_ == 0).sum(), 'zero coefficients')
```
### ElasticNet
```python
from sklearn.linear_model import ElasticNet
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
```
### Logistic + L2
```python
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(penalty='l2', C=1.0).fit(X, y)
# C = 1/alpha (inverse regularization strength: smaller C means a stronger penalty)
```
### PyTorch weight decay
```python
import torch
optim = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
# With plain SGD, weight_decay is equivalent to an L2 penalty on the loss
```
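
The equivalence for plain SGD can be checked numerically: adding `wd * w` to the gradient gives the same step as differentiating the loss plus `(wd / 2) * w**2`. A plain-Python sketch with a hypothetical one-parameter loss `(w - 3)**2`; all function names here are illustrative:

```python
def loss_grad(w):
    """Gradient of the illustrative task loss (w - 3)**2."""
    return 2.0 * (w - 3.0)

def step_weight_decay(w, lr, wd):
    # Optimizer-style: add wd * w to the gradient
    return w - lr * (loss_grad(w) + wd * w)

def step_l2_loss(w, lr, wd, h=1e-6):
    # Loss-style: numerically differentiate loss + (wd / 2) * w**2
    def penalized(v):
        return (v - 3.0) ** 2 + 0.5 * wd * v ** 2
    grad = (penalized(w + h) - penalized(w - h)) / (2 * h)
    return w - lr * grad

a = step_weight_decay(1.0, lr=0.1, wd=0.01)
b = step_l2_loss(1.0, lr=0.1, wd=0.01)
print(abs(a - b) < 1e-6)  # True
```

Note the factor of 1/2: a `weight_decay` of `wd` corresponds to an L2 coefficient of `wd / 2` on the squared weights.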
### AdamW (decoupled, recommended)
```python
import torch
optim = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
# Decoupled decay; generally preferred over Adam with weight_decay (coupled L2)
```
### Manual L1 in PyTorch
```python
def l1_penalty(model, lam=1e-5):
    # Sum of absolute values over all parameters, scaled by lam
    return lam * sum(p.abs().sum() for p in model.parameters())

loss = task_loss + l1_penalty(model)
```
### CV-tune α (sklearn)
```python
from sklearn.linear_model import RidgeCV
model = RidgeCV(alphas=[0.01, 0.1, 1, 10, 100]).fit(X, y)
print(model.alpha_)
```
### LassoCV
```python
import numpy as np
from sklearn.linear_model import LassoCV
model = LassoCV(alphas=np.logspace(-4, 0, 50), cv=5).fit(X, y)
```
### Path plot (regularization strength sweep)
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

alphas = np.logspace(-4, 1, 50)
coefs = []
for a in alphas:
    coefs.append(Lasso(alpha=a).fit(X, y).coef_)
plt.plot(alphas, coefs)  # one curve per coefficient
plt.xscale('log')
plt.xlabel('alpha'); plt.ylabel('coefficient')
```
### Group L1 (group lasso)
```python
import numpy as np

def group_lasso_penalty(weights, groups, lam):
    total = 0.0
    for group in groups:
        # Unsquared L2 norm of each group's weights, summed over groups
        total += lam * np.sqrt(sum(weights[i]**2 for i in group))
    return total
```
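
A usage sketch of the same penalty, rewritten with only the standard library so it runs on its own. The weights and group indices are made up for illustration:

```python
import math

def group_lasso_penalty(weights, groups, lam):
    """Sum over groups of lam * (L2 norm of that group's weights)."""
    return sum(lam * math.sqrt(sum(weights[i] ** 2 for i in group))
               for group in groups)

weights = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]  # two groups of two weights each
print(group_lasso_penalty(weights, groups, lam=1.0))  # 5.0
print(sum(abs(w) for w in weights))                   # plain L1 for comparison: 7.0
```

A group contributes nothing only when all of its weights are zero, so the penalty tends to switch whole groups off together rather than individual weights.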
### Different decay per layer (DL)
```python
optim = torch.optim.AdamW([
    {'params': model.encoder.parameters(), 'weight_decay': 0.01},
    {'params': model.head.parameters(), 'weight_decay': 0.001},
])
```
### Exclude bias / norm parameters (best practice)
```python
def get_param_groups(model, weight_decay):
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # Name-based match: assumes norm layers have 'norm' in their module name
        if 'bias' in name or 'norm' in name:
            no_decay.append(p)
        else:
            decay.append(p)
    return [
        {'params': decay, 'weight_decay': weight_decay},
        {'params': no_decay, 'weight_decay': 0.0},
    ]

optim = torch.optim.AdamW(get_param_groups(model, 0.01), lr=1e-3)
```
### Effect on bias-variance
```python
from sklearn.linear_model import Ridge

def reg_effect(alphas, X_train, y_train, X_val, y_val):
    """Return training and validation MSE for each alpha."""
    train_err, val_err = [], []
    for a in alphas:
        m = Ridge(alpha=a).fit(X_train, y_train)
        train_err.append(((m.predict(X_train) - y_train) ** 2).mean())
        val_err.append(((m.predict(X_val) - y_val) ** 2).mean())
    return train_err, val_err

# As alpha grows, training error rises; validation error typically falls to a
# minimum, then rises again once the model is over-regularized.
```
### Magnitude pruning (sparsity in modern DL)
```python
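def magnitude_pruning(model, sparsity=0.5):
    """Zero out the smallest-magnitude fraction `sparsity` of each weight tensor."""
    for name, p in model.named_parameters():
        if 'weight' in name:
            k = int(p.numel() * sparsity)
            if k < 1:
                continue  # kthvalue needs k >= 1; nothing to prune here
            threshold = p.abs().flatten().kthvalue(k).values
            # <= so that the k-th smallest entry itself is also pruned
            p.data[p.abs() <= threshold] = 0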
```
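
The thresholding logic can be traced on a plain list. This sketch mirrors the k-th smallest magnitude rule with the standard library only; `prune_smallest` and the sample values are hypothetical:

```python
def prune_smallest(weights, sparsity):
    """Zero the k = int(len * sparsity) smallest-magnitude entries."""
    k = int(len(weights) * sparsity)
    if k < 1:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]  # k-th smallest magnitude
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = prune_smallest([0.9, -0.1, 0.3, -0.7], sparsity=0.5)
print(pruned)  # [0.9, 0.0, 0.0, -0.7]
```

With tied magnitudes at the threshold, more than k entries can be zeroed.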
## Decision criteria
| Situation | Method |
|---|---|
| Linear | Ridge / Lasso / ElasticNet |
| Feature selection | Lasso |
| Multicollinearity | Ridge |
| DL | AdamW weight decay |
| Sparsity goal | Lasso / pruning |
| Best DL practice | AdamW + exclude bias/norm |
**Defaults**: DL = AdamW + 0.01-0.1 weight decay + bias/norm excluded. Linear = ElasticNet with CV. Sparsity = Lasso.
## 🔗 Graph
- Parent: [[L1-and-L2-Regularization|Regularization]] · [[Optimization]]
- Variants: [[Lasso]] · [[Ridge]] · [[ElasticNet]] · [[Weight-Decay]]
- Adjacent: [[Generalization-in-AI]]
## 🤖 LLM usage
**When**: all ML training.
**When not**: the model already underfits (no regularization needed).
## ❌ Anti-patterns
- **Adam + weight_decay**: use AdamW instead.
- **Same decay for bias / norm**: can hurt training.
- **No CV for α**: risks choosing the wrong regularization strength.
- **L1 for DL** (without a sparsity goal): can make training unstable.
## 🧪 Verification / duplicates
- Verified (Hastie-Tibshirani-Friedman; Loshchilov & Hutter, AdamW, 2019).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: L1/L2 + sklearn / AdamW / param groups / pruning code |