---
id: wiki-2026-0508-ridge-regression
title: Ridge Regression
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [L2 Regularization, Tikhonov Regularization]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [machine-learning, regression, regularization, statistics]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: scikit-learn
---
# Ridge Regression

## One-liner

> **"Stabilize OLS with an L2 penalty."** Introduced by Hoerl & Kennard (1970) as a fix for multicollinearity, ridge is now a baseline regularizer across modern ML (sklearn `Ridge`, `RidgeCV`).

## Core

### Loss function
- OLS: `min ||y - Xβ||²`
- Ridge: `min ||y - Xβ||² + α||β||²`
- α (alpha) sets the regularization strength. α = 0 recovers OLS; as α → ∞, β → 0.

### Closed form
- `β̂ = (XᵀX + αI)⁻¹ Xᵀy`
- For any α > 0, `XᵀX + αI` is positive definite and therefore invertible, even when the columns of X are multicollinear.
- The OLS inverse `(XᵀX)⁻¹` may not exist (XᵀX is singular when p > n or when features are exactly collinear); ridge fixes this.

### Applications
1. Multicollinear features (correlated predictors).
2. p > n (more features than samples), e.g. gene expression, fMRI.
3. A baseline model in sklearn pipelines.
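The invertibility claim and the closed form can be checked numerically. The sketch below is illustrative, not a reference implementation: it uses synthetic Gaussian data with p > n (the sizes, seed, and the helper name `ridge_closed_form` are assumptions chosen for the example), shows that `XᵀX` is rank-deficient while `XᵀX + αI` is full rank, and compares the closed form with sklearn `Ridge`. Because sklearn leaves the intercept unpenalized, the helper centers X and y first and recovers the intercept afterwards.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: ridge via (XᵀX + αI)⁻¹Xᵀy with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # center features so the intercept drops out of the penalty
    yc = y - y_mean
    p = X.shape[1]
    # Solve (XcᵀXc + αI) β = Xcᵀyc instead of forming the inverse explicitly
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept


rng = np.random.default_rng(0)
n, p = 20, 50  # p > n: more features than samples (illustrative sizes)
X = rng.normal(size=(n, p))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)

gram = X.T @ X
# rank(XᵀX) = rank(X) <= n < p, so the OLS normal equations are singular
print("rank(XᵀX):", np.linalg.matrix_rank(gram), "of", p)
# Adding αI (here α = 1) makes every eigenvalue >= 1, so the matrix is full rank
print("rank(XᵀX + I):", np.linalg.matrix_rank(gram + np.eye(p)), "of", p)

beta, b0 = ridge_closed_form(X, y, alpha=1.0)
sk = Ridge(alpha=1.0).fit(X, y)
print("matches sklearn:", np.allclose(beta, sk.coef_), np.isclose(b0, sk.intercept_))
```

Using `np.linalg.solve` rather than `np.linalg.inv` is the usual choice: it is cheaper and numerically more accurate than building the inverse.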
## 💻 Patterns

### Sklearn Ridge
```python
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Always scale before ridge: the penalty is scale-dependent
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

### Cross-validated alpha
```python
from sklearn.linear_model import RidgeCV

# RidgeCV picks the best alpha from the grid via 5-fold CV
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
ridge.fit(X_train, y_train)
print(f"Best alpha: {ridge.alpha_}")
```

### Closed-form by hand
```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Assumes X and y are already centered (no intercept column);
    # otherwise the intercept would be penalized as well.
    p = X.shape[1]
    I = np.eye(p)
    # Solve (XᵀX + αI) β = Xᵀy instead of inverting the matrix
    return np.linalg.solve(X.T @ X + alpha * I, X.T @ y)

beta = ridge_fit(X_train, y_train, alpha=1.0)
```

### SVD-based ridge (numerically stable)
```python
import numpy as np

def ridge_svd(X, y, alpha):
    # X = U diag(s) Vᵀ  =>  β̂ = V diag(s / (s² + α)) Uᵀ y   (1-D y)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    d = s / (s**2 + alpha)  # shrunken inverse singular values
    return Vt.T @ (d * (U.T @ y))
```

### Kernel Ridge
```python
from sklearn.kernel_ridge import KernelRidge

# Non-linear ridge via the kernel trick (RBF kernel)
krr = KernelRidge(alpha=1.0, kernel='rbf', gamma=0.1)
krr.fit(X_train, y_train)
```

### Bayesian view
```python
from sklearn.linear_model import BayesianRidge

# Ridge = MAP estimate under a Gaussian prior β ~ N(0, (σ²/α) I), σ² = noise variance.
# BayesianRidge learns two precisions (note: its alpha_ is NOT ridge's alpha):
#   alpha_  = noise precision 1/σ²,  lambda_ = weight precision
# The equivalent ridge penalty is lambda_ / alpha_.
br = BayesianRidge()
br.fit(X_train, y_train)
print(br.coef_, br.alpha_, br.lambda_)
print("equivalent ridge alpha:", br.lambda_ / br.alpha_)
```

### Regularization path
```python
import numpy as np
from sklearn.linear_model import Ridge

alphas = np.logspace(-3, 3, 50)
coefs = []
for a in alphas:
    r = Ridge(alpha=a).fit(X, y)
    coefs.append(r.coef_)
coefs = np.array(coefs)  # shape (n_alphas, n_features)
# Plot coefs against log(alpha) to see each coefficient shrink toward 0
```

## Decision criteria

| Situation | Approach |
|---|---|
| Multicollinear features | Ridge |
| Need feature selection | Lasso (L1) |
| Mix of sparsity + grouping | Elastic Net |
| Non-linear pattern | Kernel Ridge |
| Bayesian uncertainty | BayesianRidge |

**Default**: `RidgeCV` with a log-spaced alpha grid + `StandardScaler`.
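A minimal end-to-end sketch of the recommended default (`RidgeCV` with a log-spaced grid inside a pipeline with `StandardScaler`). It assumes synthetic data from `make_regression` in place of a real dataset; the sample sizes, noise level, and 13-point grid are illustrative choices, not recommendations. With `cv` left at its default, `RidgeCV` selects alpha by efficient leave-one-out cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real tabular dataset (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

alphas = np.logspace(-3, 3, 13)  # log-spaced grid from 1e-3 to 1e3
model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
model.fit(X_train, y_train)  # scaler statistics come from the training split only

best_alpha = model[-1].alpha_  # alpha chosen by RidgeCV from the grid
test_r2 = model.score(X_test, y_test)
print(f"alpha={best_alpha:.3g}, test R^2={test_r2:.3f}")
```

Keeping the scaler inside the pipeline means the same transformation is applied at predict time, so `model.predict` can be called on raw features.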
## 🔗 Graph
- Parent: [[Linear-Regression]] · [[L1-and-L2-Regularization|Regularization]]
- Variants: [[Elastic-Net]]
- Applications: [[Feature Engineering|Feature-Engineering]] · [[Bias vs Variance|Bias-Variance-Tradeoff]]
- Adjacent: [[Singular-Value-Decomposition]]

## 🤖 LLM Usage

**When**: As a strong baseline for tabular regression; with multicollinear features (correlated predictors); in high-dimensional p > n settings; when you want to keep linear-model interpretability.

**When not**: When you need sparse feature selection (use Lasso); with strongly non-linear relationships (use trees/NNs); when n >> p and features are uncorrelated, plain OLS is usually sufficient.

## ❌ Anti-patterns
- **No scaling**: the ridge penalty is scale-sensitive, so features on different scales are shrunk unequally.
- **Manual alpha pick**: use `RidgeCV` instead of a magic-number `alpha=1.0`.
- **Ridge for sparsity**: L2 shrinks coefficients but never sets them exactly to zero; use Lasso.
- **Ignoring the intercept**: sklearn does not regularize the intercept by default, but custom implementations must handle it explicitly (e.g. center X and y, or exclude the intercept from the penalty).

## 🧪 Verification / Duplicates
- Verified (Hoerl & Kennard 1970, ESL Ch. 3, sklearn docs).
- Trust level: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: ridge regression with closed-form, SVD, kernel, and Bayesian variants |