---
id: wiki-2026-0508-bias-vs-variance
title: Bias vs Variance Trade-off
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [bias-variance tradeoff, underfitting vs overfitting, double descent, generalization, regularization]
duplicate_of: none
source_trust_level: A
confidence_score: 0.95
verification_status: applied
tags: [ml-fundamentals, generalization, overfitting, underfitting, regularization, double-descent, deep-learning]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: scikit-learn / PyTorch
---

# Bias vs Variance Trade-off

## 📌 One-Line Insight
> **"A model that is too simple underfits; a model that is too complex overfits."** Generalization is the search for the sweet spot between the two. In modern deep learning, **double descent** breaks the classical U-shaped curve: test error can fall again in the over-parameterized regime.

## 📖 Core

### Decomposition
$$E[(y - \hat{f}(x))^2] = (\mathrm{Bias}[\hat{f}(x)])^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2$$

- **Bias²**: systematic error caused by wrong model assumptions.
- **Variance**: sensitivity of the fitted model to the particular training sample.
- **Irreducible noise** σ²: cannot be reduced by any model.

### Symptoms
Arrows refer to the score (↑ = high, ↓ = low).

| Symptom | Bias | Variance | Diagnosis |
|---|---|---|---|
| Train↓ Test↓ | high | low | underfit |
| Train↑ Test↓ | low | high | overfit |
| Train↑ Test↑ | low | low | well-fit |
| Train↓ Test↑ | n/a | n/a | bug (data leak / wrong split) |

### Controls

#### Bias ↓ (increase model capacity)
- Use a larger model.
- Add features.
- Reduce regularization.
- Train longer.

#### Variance ↓ (prevent overfitting)
- Collect more data.
- Regularization (L1, L2).
- Dropout.
- Early stopping.
- Ensembling.
- Data augmentation.

### Modern surprise: Double Descent
- Classical U-shape: as capacity ↑, variance ↑, so test error eventually rises.
- Modern view: in the over-parameterized regime, test error falls again.
- Phenomenon: as model size and data grow, models can reach zero training loss and still generalize well.
- Implicit regularization from SGD is one proposed explanation.
- Often cited as part of the background for why very large models such as GPT and Vision Transformers generalize.

→ Belkin et al. 2019, Nakkiran et al. 2019.

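
The decomposition in the Core section can be checked numerically. The block below is a minimal sketch, assuming a synthetic sine target, Gaussian noise, a fixed training grid, and polynomial regressors; none of these come from the note itself, and `true_fn` and `bias_variance` are hypothetical helper names. It illustrates only the classical regime (bias falls and variance rises with capacity), not double descent.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    """Ground-truth signal (an assumption made for this demo)."""
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_trials=300, n_train=30, sigma=0.3, seed=0):
    """Monte Carlo estimate of bias^2 and variance for a polynomial fit."""
    rng = np.random.default_rng(seed)
    # Fixed design: only the noise is resampled between trials.
    x_train = np.linspace(0.0, 1.0, n_train)
    x_test = np.linspace(0.0, 1.0, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        y_train = true_fn(x_train) + rng.normal(0.0, sigma, n_train)
        preds[t] = Polynomial.fit(x_train, y_train, degree)(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    # Error against the noise-free target; add sigma**2 for noisy targets.
    mse = np.mean((preds - true_fn(x_test)) ** 2)
    return bias_sq, variance, mse


for degree in (1, 3, 9):
    b, v, mse = bias_variance(degree)
    print(f"degree={degree}: bias^2={b:.4f} variance={v:.4f} "
          f"bias^2+var={b + v:.4f} mse={mse:.4f}")
```

In this setup the low-degree fit should show the larger bias² and the high-degree fit the larger variance, while `bias^2 + var` matches `mse` in every row because the identity holds exactly for sample means.
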
### Tools

#### Validation
- **Train / val / test split**.
- **K-fold cross-validation**.
- **Stratified** splitting (for imbalanced classes).

#### Diagnostics
- **Learning curve** (training-set size vs. error).
- **Validation curve** (hyperparameter vs. error).
- **Residual plot**.

#### Regularization
- **L1 (Lasso)**: produces sparse weights.
- **L2 (Ridge)**: shrinks weights toward zero.
- **Elastic Net**: a mix of L1 and L2.
- **Dropout**: for neural networks.
- **Batch norm**: stabilizes training.
- **Weight decay**: as used in AdamW.

### Ensembles
- **Bagging**: reduces variance (Random Forest).
- **Boosting**: reduces bias (XGBoost, LightGBM).
- **Stacking**: combines several different models.

## 💻 Patterns

### Diagnostic: learning curve
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import learning_curve

# `model`, `X`, and `y` are assumed to be defined already
# (a scikit-learn classifier plus its feature matrix and labels).
train_sizes, train_scores, val_scores = learning_curve(
    estimator=model,
    X=X,
    y=y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5,
    scoring='accuracy',
)

# Plot the mean score across CV folds for each training-set size.
plt.plot(train_sizes, train_scores.mean(axis=1), label='train')
plt.plot(train_sizes, val_scores.mean(axis=1), label='val')
plt.legend()
# Large train/val gap = high variance.
# Both scores low = high bias.
```

### Validation curve (hyperparam)
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

# `X` and `y` are assumed to be a regression dataset defined elsewhere.
param_range = np.logspace(-3, 3, 7)
train_scores, val_scores = validation_curve(
    estimator=Ridge(),
    X=X,
    y=y,
    param_name='alpha',
    param_range=param_range,
    cv=5,
)
plt.semilogx(param_range, train_scores.mean(axis=1), label='train')
plt.semilogx(param_range, val_scores.mean(axis=1), label='val')
plt.legend()
# The sweet spot is where the validation score peaks.
```

### Regularization (PyTorch)
```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Dropout(0.3),  # reduces variance
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(128, 10),
)

# AdamW applies decoupled weight decay: L2-like shrinkage, but not
# identical to adding an L2 penalty to the loss.
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
```

### Early stopping
```python
class EarlyStopping:
    """Signal a stop once val loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float('inf')
        self.counter = 0

    def __call__(self, val_loss):
        # An improvement must beat the best loss by more than min_delta.
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.counter = 0
            return False
        self.counter += 1
        return self.counter >= self.patience


# `max_epochs`, `train_step`, and `evaluate` stand in for your own training loop.
stopper = EarlyStopping(patience=10)
for epoch in range(max_epochs):
    train_step()
    val_loss = evaluate()
    if stopper(val_loss):
        break
```

### Data augmentation (anti-overfit)
```python
from torchvision import transforms

aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(224, padding=4),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.RandAugment(),  # modern automated augmentation policy
    transforms.ToTensor(),
    # ImageNet channel means and standard deviations.
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```

### Cross-validation
```python
from sklearn.model_selection import cross_val_score

# `model`, `X`, and `y` are assumed to be a regressor and its dataset.
scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
print(f'MSE: {-scores.mean():.4f} ± {scores.std():.4f}')
# A large std across folds = unstable estimate / high variance.
```

## 🤔 Decision Criteria

| Diagnosis | Remedy |
|---|---|
| Underfit | bigger model / add features / less regularization |
| Overfit | more data / more regularization / simpler model / augmentation |
| Stuck | adjust the learning rate / different optimizer / change architecture |
| Train score ↑, val score ↓ (huge gap) | dropout / weight decay / early stopping |
| Both scores ↓ | more capacity / longer training / better features |

**Default**: start with a baseline and a learning curve. Regularize once overfitting is detected.

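
The decision table above can be turned into a rough triage helper. This is only a sketch: `diagnose` is a hypothetical function, it assumes scores where higher is better (such as accuracy), and the `good` and `max_gap` thresholds are illustrative assumptions that would need tuning per task.

```python
def diagnose(train_score, val_score, good=0.9, max_gap=0.05):
    """Map train/validation scores (higher is better) to a rough diagnosis.

    `good` and `max_gap` are illustrative thresholds, not recommended values.
    """
    gap = train_score - val_score
    if gap < -max_gap:
        # Validation far above train: check for leakage or a wrong split.
        return "suspect bug"
    if gap > max_gap:
        return "overfit"
    if train_score < good:
        return "underfit"
    return "well-fit"


print(diagnose(0.60, 0.58))  # underfit
print(diagnose(0.99, 0.70))  # overfit
print(diagnose(0.95, 0.93))  # well-fit
print(diagnose(0.60, 0.90))  # suspect bug
```

A single pair of scores is a noisy signal, so a helper like this works best on cross-validated means rather than on one split.
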
## 🔗 Graph
- Parent: [[Generalization]]
- Variants: [[Generalization-in-AI|Overfitting]] · [[Double-Descent]]
- Applications: [[L1-and-L2-Regularization|Regularization]] · [[Data-Augmentation]]
- Adjacent: [[Ensemble-Methods]] · [[Random-Forest]] · [[XGBoost]]

## 🤖 LLM Usage
**When**: model debugging, hyperparameter tuning, capacity decisions, choice of regularization.

**When not**: zero-shot LLM use (a different paradigm); RL (different metrics).

## ❌ Anti-patterns
- **Tuning hyperparameters on the test set**: leakage.
- **No validation set**: overfitting cannot be detected.
- **Data leakage**: makes variance look deceptively low.
- **Strict trust in the U-shape**: ignores modern double descent.
- **Single split**: noisy estimate.
- **K-fold without stratification** (imbalanced data): misleading scores.

## 🧪 Verification / Duplicates
- Verified (Hastie ESL, Belkin double descent).
- Trust level A.
- Related: [[L1-and-L2-Regularization|Regularization]] · [[Cross-Validation]] · [[Double-Descent]] · [[Generalization]].

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: decomposition + double descent + sklearn / pytorch code |