"Too simple a model underfits; too complex a model overfits." Generalization means finding the sweet spot in between. In modern deep learning, double descent breaks the classical U-shape: heavily over-parameterized models can reach low test error again.
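For squared-error loss, expected test error splits into three parts, E[(y - f̂(x))²] = Bias² + Variance + σ²:
Bias²: systematic error caused by wrong model assumptions.
Variance: sensitivity of the fitted model to which training sample was drawn.
Irreducible noise σ²: noise in the targets themselves; no model can reduce it.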
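The decomposition can be checked numerically. The sketch below is a toy Monte Carlo experiment with assumed settings: target sin(2πx), Gaussian noise with σ = 0.3, 30 training points, and polynomial fits via `np.polyfit`. The helper names `true_fn` and `bias_variance` are hypothetical. The sketch refits the model on many freshly drawn training sets. Bias² measures how far the average prediction lies from the truth, and Variance measures how much individual predictions scatter around that average.

```python
import numpy as np


def true_fn(x):
    """Assumed ground-truth function for this toy experiment."""
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_train=30, n_trials=200, noise_sd=0.3, seed=0):
    """Monte Carlo estimate of (Bias², Variance, σ²) for a polynomial fit."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # A fresh training set from the same distribution on every trial.
        x = rng.uniform(0, 1, n_train)
        y = true_fn(x) + rng.normal(0, noise_sd, n_train)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x_test)
    mean_pred = preds.mean(axis=0)
    # Bias²: squared gap between the average prediction and the truth.
    bias2 = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    # Variance: spread of individual predictions around the average prediction.
    variance = float(np.mean(preds.var(axis=0)))
    return bias2, variance, noise_sd ** 2


for degree in (1, 3, 9):
    bias2, variance, noise = bias_variance(degree)
    print(f"degree={degree}  bias²={bias2:.4f}  variance={variance:.4f}  σ²={noise:.2f}")
```

The degree-1 fit should show the larger Bias² and the degree-9 fit the larger Variance. The exact numbers depend on the seed and settings.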
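Symptoms
Scores are "higher is better" (e.g., accuracy): ↑ = high score, ↓ = low score.

| Symptom | Bias | Variance | Diagnosis |
|---|---|---|---|
| Train ↓ Test ↓ | high | low | underfit |
| Train ↑ Test ↓ | low | high | overfit |
| Train ↑ Test ↑ | low | low | well-fit |
| Train ↓ Test ↑ | n/a | n/a | likely bug (data leak / wrong split) |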
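The logic of the symptom table can be written as a small helper. `diagnose` is a hypothetical function, not a standard API, and its thresholds `target=0.9` and `max_gap=0.05` are made-up defaults. It assumes higher-is-better scores. It checks for a low training score before checking the gap, following the common heuristic of fixing bias before variance.

```python
def diagnose(train_score, test_score, target=0.9, max_gap=0.05):
    """Map (train, test) scores, higher = better, to a row of the symptom table.

    `target` and `max_gap` are hypothetical thresholds; tune them per task.
    """
    gap = train_score - test_score
    if gap < -max_gap:
        # Test clearly beats train: suspect leakage or a bad split.
        return "bug (data leak / wrong split)"
    if train_score < target:
        # The model cannot even fit the training data well: high bias.
        return "underfit"
    if gap > max_gap:
        # Fits the training data but fails to generalize: high variance.
        return "overfit"
    return "well-fit"


for train, test in [(0.60, 0.58), (0.99, 0.70), (0.95, 0.93), (0.70, 0.90)]:
    print(train, test, "->", diagnose(train, test))
```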
Controls
Reduce bias (increase model capacity)
Use a larger model.
Add features.
Use less regularization.
Train longer.
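Adding features is one way to raise capacity. Here is a minimal sketch that assumes a synthetic cubic target (y = x³ - 2x plus Gaussian noise) and uses scikit-learn's `PolynomialFeatures` with `LinearRegression` as the model. The helper `train_val_mse` is illustrative. A straight line underfits this data, and adding x² and x³ as features should lower both training and validation error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic nonlinear data that a straight line cannot capture.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(0, 0.3, size=300)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)


def train_val_mse(degree):
    """Fit a polynomial-feature linear model; return (train MSE, val MSE)."""
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    return (
        mean_squared_error(y_train, model.predict(X_train)),
        mean_squared_error(y_val, model.predict(X_val)),
    )


results = {degree: train_val_mse(degree) for degree in (1, 3)}
for degree, (train_mse, val_mse) in results.items():
    print(f"degree={degree}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```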
Reduce variance (prevent overfitting)
Get more data.
Regularization (L1, L2).
Dropout.
Early stopping.
Ensembling.
Data augmentation.
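Early stopping watches validation loss and halts training once the loss stops improving. The class below is a hypothetical, framework-agnostic sketch. The name `EarlyStopping` and its `patience`/`min_delta` parameters are illustrative and do not belong to any specific library's API. The validation losses in the usage are made up.

```python
class EarlyStopping:
    """Stop when validation loss hasn't improved by > min_delta for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0  # consecutive epochs without improvement

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improvement stops after epoch 2.
val_losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.71, 0.9]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
print(f"stopped at epoch {stopped_at}, best epoch {stopper.best_epoch}")
```

In a real training loop you would also save a checkpoint at `best_epoch` and restore it after stopping. That way the final model is the one with the lowest validation loss, not the last one trained.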
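A modern surprise: double descent
Classical view: test error follows a U-shape, because variance grows with capacity and eventually dominates.
Modern view: past the interpolation threshold (the capacity at which the model can fit the training set exactly), test error falls again in the over-parameterized regime.
Observed behavior: as model size (and data) grow, models reach zero training loss yet still generalize well.
One proposed explanation is the implicit regularization of SGD, which tends to pick well-behaved solutions among the many that interpolate the data. For example, in linear least squares, gradient descent started from zero converges to the minimum-norm solution.
This behavior is often cited as background for why very large models such as GPT and Vision Transformers generalize.
→ Belkin et al. 2019, Nakkiran et al. 2019.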
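The double-descent curve can be reproduced on a toy problem. This is an illustrative setup, not a reproduction of either paper. It assumes a linear target in 10 dimensions with Gaussian label noise and random ReLU features. It fits minimum-norm least squares via `np.linalg.pinv`, which gives ordinary least squares below the interpolation threshold and the minimum-norm interpolating solution above it. The helper name `random_features_test_mse` is hypothetical. Test error typically peaks when the number of features equals the number of training points (40 here) and then falls again as the model grows.

```python
import numpy as np


def random_features_test_mse(n_features, n_train=40, n_test=500, dim=10,
                             noise_sd=0.3, n_trials=20, seed=0):
    """Average test MSE of min-norm least squares on random ReLU features."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        # Linear ground truth with noisy training labels.
        beta = rng.normal(size=dim) / np.sqrt(dim)
        X_train = rng.normal(size=(n_train, dim))
        X_test = rng.normal(size=(n_test, dim))
        y_train = X_train @ beta + rng.normal(0, noise_sd, n_train)
        y_test = X_test @ beta  # noiseless targets: error is measured against the truth

        # Fixed random first layer; only the output weights are fitted.
        W = rng.normal(size=(dim, n_features)) / np.sqrt(dim)
        F_train = np.maximum(X_train @ W, 0.0)
        F_test = np.maximum(X_test @ W, 0.0)

        # pinv gives least squares if n_features < n_train
        # and minimum-norm interpolation if n_features > n_train.
        coef = np.linalg.pinv(F_train) @ y_train
        errors.append(np.mean((F_test @ coef - y_test) ** 2))
    return float(np.mean(errors))


# Sweep model size across the interpolation threshold (n_features == n_train == 40).
test_mse = {p: random_features_test_mse(p) for p in (5, 20, 40, 80, 400)}
for p, err in test_mse.items():
    print(f"features={p:4d}  test MSE={err:.3f}")
```

Averaging over several trials matters because the error near the threshold is heavy-tailed: a nearly square feature matrix can be badly conditioned. Exact values vary with the seed.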
Tools
Validation
Train / val / test split.
K-fold cross-validation.
Stratified k-fold (keeps class ratios per fold; for imbalanced data).
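Stratification keeps the class ratio the same in every split, which matters when one class is rare. Here is a minimal scikit-learn sketch that assumes made-up imbalanced labels (90 negatives, 10 positives) and dummy features. It does a stratified 60/20/20 train/val/test split with two `train_test_split` calls, then runs `StratifiedKFold` for 5-fold cross-validation.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Imbalanced toy labels: 90 negatives, 10 positives; features are dummies.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)

# Train / val / test = 60 / 20 / 20, preserving the 9:1 class ratio in each split.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0)

# Stratified 5-fold CV: every validation fold gets the same share of positives.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_positives = [int(y[val_idx].sum()) for _, val_idx in skf.split(X, y)]
print("positives per fold:", fold_positives)
```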
Diagnostic
Learning curve (data size vs error).
Validation curve (hyperparam vs error).
Residual plot.
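A residual plot shows the error left over after fitting. A visible pattern means the model is missing structure (bias), while patternless scatter around zero is what a good fit leaves behind. This is a minimal sketch assuming a synthetic quadratic target, with `np.polyfit` polynomials standing in for the models. The correlation of residuals with x² is an added numeric check for this toy case, not a standard diagnostic.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = x ** 2 + rng.normal(0, 0.5, 200)


def residuals(degree):
    """Residuals y - ŷ of a least-squares polynomial fit of the given degree."""
    return y - np.polyval(np.polyfit(x, y, degree), x)


res_linear = residuals(1)     # too simple: misses the curvature
res_quadratic = residuals(2)  # matches the true shape

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, res, title in zip(axes, (res_linear, res_quadratic), ("degree 1", "degree 2")):
    ax.scatter(x, res, s=8)
    ax.axhline(0, color="gray", linewidth=1)
    ax.set_title(title)
    ax.set_xlabel("x")
axes[0].set_ylabel("residual")
fig.tight_layout()

# The linear fit's residuals form a U-shape, so they track x² closely.
curvature_corr = {
    name: abs(np.corrcoef(x ** 2, res)[0, 1])
    for name, res in (("linear", res_linear), ("quadratic", res_quadratic))
}
print(curvature_corr)
```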
Regularization
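L1 (Lasso): sparse weights (drives some exactly to zero).
L2 (Ridge): shrinks weights toward zero.
Elastic Net: a mix of L1 and L2.
Dropout: for neural networks.
Batch norm: stabilizes training.
Weight decay: applied in decoupled form by AdamW.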
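The difference between L1 and L2 shows up directly in the fitted coefficients. This minimal sketch assumes synthetic data where only 3 of 20 features matter, and the `alpha` values are illustrative. Lasso should set most irrelevant coefficients exactly to zero, Ridge only shrinks them, and Elastic Net falls in between.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
n_samples, n_features = 200, 20
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]  # only the first 3 features are informative
y = X @ true_coef + rng.normal(0, 0.5, n_samples)

models = {
    "lasso (L1)": Lasso(alpha=0.1),
    "ridge (L2)": Ridge(alpha=1.0),
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
n_zero = {}
for name, model in models.items():
    model.fit(X, y)
    n_zero[name] = int(np.sum(model.coef_ == 0.0))  # count exactly-zero weights
print(n_zero)
```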
Ensembles
Bagging: reduces variance (e.g., Random Forest).
Boosting: reduces bias (e.g., XGBoost, LightGBM).
Stacking: combines several models' predictions with a meta-model.
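Bagging's variance reduction can be measured directly. Train the same kind of model on many independently drawn training sets and compare how much its predictions move. This minimal sketch assumes a noisy sine target, a fully grown `DecisionTreeRegressor` as the single model, and `RandomForestRegressor` as the bagged ensemble. With one input feature, the forest amounts to bagged trees. The helper `prediction_variance` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

x_grid = np.linspace(0, 6, 100).reshape(-1, 1)  # fixed points where predictions are compared


def prediction_variance(make_model, n_trials=20, n_train=100, seed=0):
    """Mean (over x_grid) variance of predictions across independent training sets."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, 6, size=(n_train, 1))
        y = np.sin(x[:, 0]) + rng.normal(0, 0.3, n_train)
        preds.append(make_model().fit(x, y).predict(x_grid))
    return float(np.mean(np.var(np.array(preds), axis=0)))


var_tree = prediction_variance(lambda: DecisionTreeRegressor(random_state=0))
var_forest = prediction_variance(
    lambda: RandomForestRegressor(n_estimators=50, random_state=0)
)
print(f"single tree: {var_tree:.4f}  bagged forest: {var_forest:.4f}")
```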
💻 Patterns
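Diagnostic: learning curve
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Example data and model; replace with your own.
X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=2000)

train_sizes, train_scores, val_scores = learning_curve(
    estimator=model,
    X=X,
    y=y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5,
    scoring='accuracy',
)

# Plot the mean score per training-set size.
plt.plot(train_sizes, train_scores.mean(axis=1), label='train')
plt.plot(train_sizes, val_scores.mean(axis=1), label='val')
plt.xlabel('training examples')
plt.ylabel('accuracy')
plt.legend()
# Large train/val gap = high variance.
# Both scores low = high bias.
```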
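Validation curve (hyperparameter)
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

# Example data; replace with your own.
X, y = load_diabetes(return_X_y=True)

param_range = np.logspace(-3, 3, 7)
train_scores, val_scores = validation_curve(
    estimator=Ridge(),
    X=X,
    y=y,
    param_name='alpha',
    param_range=param_range,
    cv=5,
)

plt.semilogx(param_range, train_scores.mean(axis=1), label='train')
plt.semilogx(param_range, val_scores.mean(axis=1), label='val')
plt.xlabel('alpha (regularization strength)')
plt.ylabel('R² (Ridge default score)')
plt.legend()
# The sweet spot is where the val curve peaks.
```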
Regularization (PyTorch)
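```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Dropout(0.3),  # randomly zeroes 30% of activations during training: variance ↓
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(128, 10),
)
# Dropout is active only in model.train() mode; call model.eval() before validating.

# AdamW applies decoupled weight decay: L2-like shrinkage, but not identical
# to adding an L2 penalty to the loss when the optimizer is Adam.
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
```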
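Data augmentation (torchvision)
```python
from torchvision import transforms

# Training-time augmentation for PIL images of roughly 224×224.
aug = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(224, padding=4),
    transforms.ColorJitter(0.2, 0.2, 0.2),  # brightness, contrast, saturation
    transforms.RandAugment(),               # modern automated augmentation policy
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet mean/std
])
```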
Cross-validation
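```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Example regressor and data; replace with your own.
X, y = load_diabetes(return_X_y=True)
model = Ridge(alpha=1.0)

scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
# scikit-learn maximizes scores, so MSE is returned negated.
print(f'MSE: {-scores.mean():.4f} ± {scores.std():.4f}')
# Large std across folds = unstable model / high variance.
```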
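🤔 Decision criteria

| Diagnosis | Remedy |
|---|---|
| Underfit | Bigger model / add features / less regularization |
| Overfit | More data / more regularization / simpler model / augmentation |
| Training stuck (loss plateaus) | Adjust learning rate / try a different optimizer / change architecture |
| Train score high, val score low (large gap) | Dropout / weight decay / early stopping |
| Train and val scores both low | More capacity / train longer / better features |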
Default: start with a baseline and a learning curve; add regularization once you detect overfitting.