---
id: wiki-2026-0508-linear-regression-mastery
title: Linear Regression Mastery
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Linear Regression, OLS, Ordinary Least Squares, Ridge, Lasso]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [machine-learning, regression, statistics, sklearn, ols, ridge, lasso]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scikit-learn/statsmodels
---

# Linear Regression Mastery

## In One Line

> **"The starting point of all ML: y = Xβ + ε"**. Linear regression models a linear relationship between features and target and estimates it with OLS. Its simplicity makes it strong on interpretability, speed, and as a baseline, and with regularization (Ridge/Lasso/Elastic Net) it holds up in high dimensions as well. In 2026, linear models remain a common choice for production tabular ML.

## Core

### OLS Formula

- Model: $y = X\beta + \varepsilon$.
- Objective: $\min_\beta \|y - X\beta\|^2$.
- Closed form: $\hat\beta = (X^TX)^{-1}X^Ty$ (when X has full column rank).
- Geometric view: the fit projects y onto the column space of X.

### Assumptions (LINE)

- **L**inearity: the relationship between y and X is linear.
- **I**ndependence: the errors are i.i.d.
- **N**ormality: errors ~ N(0, σ²) (needed for inference in small samples).
- **E**qual variance (homoscedasticity): the error variance is constant.
- **Additional**: no severe multicollinearity (features in X are not strongly correlated with each other).

### Regularized Variants

- **Ridge (L2)**: $\min \|y-X\beta\|^2 + \lambda\|\beta\|_2^2$; shrinks all coefficients toward zero.
- **Lasso (L1)**: $\min \|y-X\beta\|^2 + \lambda\|\beta\|_1$; produces sparsity (feature selection).
- **Elastic Net**: L1 + L2; handles groups of correlated features.

### Diagnostics

- R² / Adjusted R²: explanatory power.
- RMSE / MAE: prediction error.
- VIF > 10: suspect multicollinearity.
- Residual plot: a visible pattern suggests nonlinearity (a funnel shape suggests heteroscedasticity).
- QQ plot: normality check.
- Cook's distance: highly influential outliers.

### Applications

1. Tabular baseline (the first model for any ML task).
2. Interpreting feature effects (coefficients).
3. Time-series trend.
4. A/B test effect size.
5. Backbone of causal inference (DiD, IV).
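
### Worked sketch: closed-form OLS

The closed-form solution and the projection view from the OLS formula section can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper name `ols_closed_form` and the synthetic data (true intercept 1.5, true slopes 2 and -3) are assumptions made up for illustration. It solves the normal equations with `np.linalg.solve` instead of forming an explicit inverse, then cross-checks against `np.linalg.lstsq`.

```python
import numpy as np

def ols_closed_form(X, y):
    """Hypothetical helper: return (intercept, coefficients) from the normal equations."""
    n = X.shape[0]
    X_ = np.c_[np.ones(n), X]  # prepend a column of ones for the intercept
    # Solve (X^T X) b = X^T y; requires X_ to have full column rank.
    beta = np.linalg.solve(X_.T @ X_, X_.T @ y)
    return beta[0], beta[1:]

# Synthetic data (assumed for illustration): y = 1.5 + 2*x1 - 3*x2 + small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 1.5 + X_demo @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=200)

intercept, coef = ols_closed_form(X_demo, y_demo)

# Cross-check against NumPy's least-squares solver.
X_aug = np.c_[np.ones(len(X_demo)), X_demo]
beta_lstsq, *_ = np.linalg.lstsq(X_aug, y_demo, rcond=None)

# Projection view: the residual is orthogonal to every column of the design matrix.
resid = y_demo - X_aug @ beta_lstsq
orthogonality = X_aug.T @ resid  # numerically close to the zero vector

print("intercept:", intercept, "coef:", coef)
```

When X is ill-conditioned (strong multicollinearity), the normal equations square the condition number, so `np.linalg.lstsq` or a QR-based solver is usually the safer choice in practice.
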

## 💻 Patterns

### sklearn: basic OLS

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# X (feature DataFrame), y (target), and feature_names are assumed to be defined upstream.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
print("R²:", r2_score(y_te, pred), "RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
print("coef:", dict(zip(feature_names, model.coef_)))
```

### Ridge / Lasso / ElasticNet: choosing alpha by CV

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Scale inside the pipeline so the penalty treats all features equally.
ridge = Pipeline([
    ("sc", StandardScaler()),
    ("m", RidgeCV(alphas=np.logspace(-3, 3, 50), cv=5)),
]).fit(X_tr, y_tr)
lasso = Pipeline([
    ("sc", StandardScaler()),
    ("m", LassoCV(cv=5, max_iter=10000)),
]).fit(X_tr, y_tr)
en = Pipeline([
    ("sc", StandardScaler()),
    ("m", ElasticNetCV(l1_ratio=[.1, .5, .7, .9, .95, 1], cv=5)),
]).fit(X_tr, y_tr)

print("ridge alpha:", ridge.named_steps["m"].alpha_)
print("lasso non-zero:", (lasso.named_steps["m"].coef_ != 0).sum())
```

### statsmodels: statistical inference (p-value, CI)

```python
import statsmodels.api as sm

X_const = sm.add_constant(X_tr)  # statsmodels does not add an intercept by default
ols = sm.OLS(y_tr, X_const).fit()
print(ols.summary())  # coef, std-err, t, p, [95% CI]
# Rule of thumb: > 30 on scaled features suggests multicollinearity.
print("Cond no:", ols.condition_number)
```

### Diagnostics: VIF + residual plot

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
import matplotlib.pyplot as plt

# VIF is only meaningful when the design matrix includes an intercept,
# so add a constant and skip its column (index 0).
X_vif = sm.add_constant(X_tr)
vif = [variance_inflation_factor(X_vif.values, i) for i in range(1, X_vif.shape[1])]
print(dict(zip(X_tr.columns, vif)))  # > 10: drop the feature or apply PCA

# Residuals vs. fitted values; `model` is the LinearRegression fitted above.
fitted = model.predict(X_tr)
resid = y_tr - fitted
plt.scatter(fitted, resid, alpha=.4)
plt.axhline(0)
plt.xlabel("fitted")
plt.ylabel("residual")
plt.show()
```

### Polynomial features (handling nonlinearity)

```python
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import RidgeCV

poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
Xp_tr = poly.fit_transform(X_tr)
# Always pair with Ridge: the number of features explodes.
poly_ridge = Pipeline([("sc", StandardScaler()), ("m", RidgeCV())]).fit(Xp_tr, y_tr)
# At prediction time, apply poly.transform(X_te) before poly_ridge.predict.
```

### Bayesian linear regression: PyMC

```python
import pymc as pm

with pm.Model() as m:
    β = pm.Normal("β", 0, 10, shape=X_tr.shape[1])  # weakly informative priors
    α = pm.Normal("α", 0, 10)
    σ = pm.HalfNormal("σ", 5)
    mu = α + pm.math.dot(X_tr.values, β)  # linear predictor
    y_obs = pm.Normal("y", mu, σ, observed=y_tr)
    trace = pm.sample(1000, tune=1000, chains=4)

print(pm.summary(trace, hdi_prob=0.95))
```

### From scratch: gradient descent

```python
import numpy as np

def fit_gd(X, y, lr=1e-2, epochs=2000, l2=0.0):
    """Fit (optionally L2-penalized) linear regression by batch gradient descent."""
    n, d = X.shape
    X_ = np.c_[np.ones(n), X]  # intercept column
    w = np.zeros(d + 1)
    for _ in range(epochs):
        # Gradient of the MSE plus the L2 penalty; the intercept w[0] is not penalized.
        grad = -2 / n * X_.T @ (y - X_ @ w) + 2 * l2 * np.r_[0, w[1:]]
        w -= lr * grad
    return w[0], w[1:]  # (intercept, coefficients)
```

## Decision Criteria

| Situation | Approach |
|---|---|
| Quick baseline | OLS |
| Multicollinearity / p>>n | Ridge |
| Feature selection wanted | Lasso |
| Groups of correlated features | Elastic Net |
| Nonlinearity suspected | Polynomial + Ridge, or move to a tree-based model |
| Statistical inference (p-value) | statsmodels |

**Default**: StandardScaler + RidgeCV. Safe, interpretable, fast.

## 🔗 Graph

- Parent: [[Regression]]
- Variants: [[Ridge-Regression]], [[Elastic-Net]], [[Logistic-Regression-Foundations]]
- Applications: [[Time-Series-Analysis|Time-Series-Forecasting]], [[Causal-Inference]]
- Adjacent: [[L1-and-L2-Regularization]], [[Feature Engineering|Feature-Engineering]]

## 🤖 LLM Usage

**When**: feature engineering ideation, interpreting residual plots, explaining statsmodels output.
**When not**: judging outliers in the data itself, which requires domain knowledge.

## ❌ Anti-patterns

- **No scaling**: Ridge/Lasso penalties are sensitive to feature scale.
- **Ignoring VIF**: under multicollinearity, coefficient signs can flip.
- **Judging by R² alone**: training R² does not catch overfitting; use adjusted R² or CV.
- **Skipping the residual plot**: nonlinearity goes unnoticed.
- **Polynomial degree 5+ on small samples**: coefficients blow up and the model overfits.

## 🧪 Verification / Duplicates

- Verified (ESL Hastie/Tibshirani, sklearn 1.5+, statsmodels 0.14).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: added Ridge/Lasso/EN, diagnostics, Bayesian |