"An L2 penalty that stabilizes OLS." Hoerl & Kennard (1970) introduced it as a fix for multicollinearity; in modern ML it serves as the baseline regularizer (sklearn Ridge, RidgeCV).
XᵀX + αI is invertible for any α > 0, even when X is multicollinear.
In OLS, XᵀX can be singular, so (XᵀX)⁻¹ may not exist; ridge fixes this.
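A minimal NumPy sketch of the invertibility claim, assuming synthetic data with two perfectly collinear columns (the data, seed, and `alpha = 1.0` are illustrative choices, not from the source). It compares the condition number of XᵀX with that of XᵀX + αI and solves the ridge normal equations directly.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
# Second column is an exact multiple of the first, so X has rank 1.
X = np.column_stack([x1, 2.0 * x1])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

gram = X.T @ X
cond_ols = np.linalg.cond(gram)  # enormous (or inf): XᵀX is singular

alpha = 1.0  # illustrative value
ridge_gram = gram + alpha * np.eye(X.shape[1])
cond_ridge = np.linalg.cond(ridge_gram)  # modest: adding αI lifts every eigenvalue by α

# Solve (XᵀX + αI) β = Xᵀy instead of forming the inverse explicitly.
beta_ridge = np.linalg.solve(ridge_gram, X.T @ y)
print(cond_ols, cond_ridge, beta_ridge)
```

With perfectly collinear columns ridge still returns a unique β; here it splits the weight across the two columns in proportion to their scale.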
Applications
Multicollinear features (correlated predictors).
p > n (more features than samples), e.g. gene expression, fMRI.
Baseline model in sklearn pipelines.
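For the p > n case, a rough sketch with random synthetic data (sizes, seed, and `alpha` are assumptions for illustration): XᵀX has rank at most n, so OLS has no unique solution, while ridge does. The sketch also checks the equivalent n × n "dual" system, which is cheaper to solve when p is much larger than n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100  # fewer samples than features
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
alpha = 1.0  # illustrative value

# XᵀX is p x p but has rank at most n, so it cannot be inverted.
gram_rank = np.linalg.matrix_rank(X.T @ X)

# Primal form: solve a p x p system.
beta_primal = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

# Dual form: solve an n x n system, then map back to feature space.
beta_dual = X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(n), y)

print(gram_rank, np.allclose(beta_primal, beta_dual))
```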
💻 Patterns
Sklearn Ridge
```python
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Always scale before ridge: the penalty is scale-dependent
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```
Cross-validated alpha
```python
from sklearn.linear_model import RidgeCV

# RidgeCV picks the best alpha from the grid
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
ridge.fit(X_train, y_train)
print(f"Best alpha: {ridge.alpha_}")
```
Kernel Ridge
```python
from sklearn.kernel_ridge import KernelRidge

# Non-linear ridge via the kernel trick
krr = KernelRidge(alpha=1.0, kernel='rbf', gamma=0.1)
krr.fit(X_train, y_train)
```
Bayesian view
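```python
from sklearn.linear_model import BayesianRidge

# Ridge as a zero-mean Gaussian prior on β; the prior and noise precisions are learned from data
br = BayesianRidge()
br.fit(X_train, y_train)
# alpha_ is the estimated noise precision, lambda_ the estimated weight precision;
# the equivalent ridge penalty is lambda_ / alpha_
print(br.coef_, br.alpha_, br.lambda_)
```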
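To make the link between the Bayesian view and plain ridge concrete, here is a hedged sketch on synthetic data (`make_regression` settings are illustrative). In scikit-learn's `BayesianRidge`, `alpha_` is the estimated noise precision and `lambda_` the estimated weight precision, so the matching ridge penalty should be their ratio; the sketch refits `Ridge` with that value and compares coefficients.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge, Ridge

# Illustrative synthetic data, not from the source.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

br = BayesianRidge().fit(X, y)

# Weight precision divided by noise precision gives the equivalent L2 penalty.
effective_alpha = br.lambda_ / br.alpha_

ridge = Ridge(alpha=effective_alpha).fit(X, y)
max_diff = np.max(np.abs(ridge.coef_ - br.coef_))
print(effective_alpha, max_diff)
```

The two coefficient vectors should agree up to the convergence tolerance of the Bayesian fit.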
Regularization path
```python
import numpy as np
from sklearn.linear_model import Ridge

alphas = np.logspace(-3, 3, 50)
coefs = []
for a in alphas:
    r = Ridge(alpha=a).fit(X, y)
    coefs.append(r.coef_)
# Plot coefs vs log(alpha) to see the shrinkage
```
Decision criteria

| Situation | Approach |
| --- | --- |
| Multicollinear features | Ridge |
| Need feature selection | Lasso (L1) |
| Mix of sparsity + grouping | Elastic Net |
| Non-linear pattern | Kernel Ridge |
| Bayesian uncertainty | BayesianRidge |
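The Ridge / Lasso / Elastic Net rows can be illustrated by counting exact zeros in the fitted coefficients. This is a sketch on synthetic data where only 5 of 20 features carry signal; the dataset and the penalty strengths (`alpha=1.0`, `l1_ratio=0.5`) are arbitrary assumptions, not tuned values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 5 informative features out of 20.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

n_zero = {}
for name, estimator in models.items():
    pipe = make_pipeline(StandardScaler(), estimator).fit(X, y)
    # pipe[-1] is the fitted regressor at the end of the pipeline.
    n_zero[name] = int(np.sum(pipe[-1].coef_ == 0.0))

print(n_zero)
```

Ridge shrinks every coefficient but leaves none exactly at zero, whereas the L1 term in Lasso removes uninformative features outright.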
Default: RidgeCV with a log-spaced alpha grid + StandardScaler.
When to use: As a strong baseline for tabular regression. When features are multicollinear (correlated predictors). In high-dimensional p > n settings. When you want to keep the interpretability of a linear model.
When not to use: When sparse feature selection is needed (use Lasso). With strong non-linearity (use trees/NN). When n >> p and the features are uncorrelated, OLS is sufficient.
❌ Anti-patterns
No scaling: The ridge penalty is scale-sensitive, so features on different scales are shrunk unevenly.
Manual alpha pick: Use RidgeCV instead of a magic number such as alpha=1.0.
Ridge for sparsity: L2 does not drive coefficients exactly to zero; use Lasso.
Ignoring the intercept: sklearn does not regularize the intercept by default, but watch for this in custom implementations.
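The intercept pitfall in a custom implementation can be sketched as follows, on assumed synthetic data with a large offset (all values illustrative). Centering X and y keeps the intercept out of the penalty and should reproduce sklearn's `Ridge`; the naive version penalizes a column of ones and shrinks the intercept toward zero.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 50.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
alpha = 10.0  # illustrative value

# Center X and y so the intercept drops out of the penalized problem.
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean
beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
intercept = y_mean - x_mean @ beta  # recovered afterwards, never penalized

# Naive version: a column of ones is penalized like any other feature.
X1 = np.column_stack([np.ones(len(X)), X])
beta_naive = np.linalg.solve(X1.T @ X1 + alpha * np.eye(X1.shape[1]), X1.T @ y)

sk = Ridge(alpha=alpha).fit(X, y)
print(intercept, sk.intercept_, beta_naive[0])
```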