"매 conditional expectation 의 functional form 의 estimate". Galton/Pearson 의 origin 의 매 OLS, GLM, regularized, robust, quantile, mixed-effects 의 spectrum — 매 modern ML 의 baseline + interpretability tool.
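Key points
OLS basics
Model: y = Xβ + ε, ε ~ 𝓝(0, σ²I). Normality is needed only for exact finite-sample inference, not for the Gauss-Markov result.
Closed form: β̂ = (XᵀX)⁻¹Xᵀy.
Gauss-Markov: OLS is BLUE under assumptions 1-5 (linearity, exogeneity, no perfect multicollinearity, homoskedasticity, no autocorrelation).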
Inference: t-tests, F-test, R², adj-R².
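The closed form and the classical standard errors behind the t-tests can be sketched in plain NumPy. This is a minimal illustration on synthetic data: the seed, sample size, and `beta_true` values are assumptions made for the example, not taken from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is reproducible
n = 500
# Design matrix: intercept column plus two synthetic predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])   # hypothetical coefficients
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations: solve (X'X) beta = X'y instead of forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate via least squares, which is more stable when X is ill-conditioned
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical (homoskedastic) standard errors: sqrt(diag(sigma^2 * (X'X)^-1))
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se

print(beta_hat, se, t_stats)
```

In practice a library routine is usually preferable to hand-rolled normal equations. The sketch is only meant to show where the estimate and its standard errors come from.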
Extensions
GLM: g(E[y]) = Xβ, covering logistic, Poisson, Gamma, and Negative Binomial models.
Regularized: Ridge (L2), Lasso (L1), Elastic Net.
Robust: Huber, RANSAC, both resistant to outliers.
Quantile: estimates conditional quantiles rather than the conditional mean.
Mixed-effects: random plus fixed effects, for clustered / hierarchical data.
Diagnostics
Residual plots (linearity, homoskedasticity).
QQ-plot (normality).
VIF (multicollinearity).
Cook's distance (influence).
Durbin-Watson (autocorrelation).
Applications
Pricing / demand modeling.
A/B test analysis (regression-adjusted).
Causal inference (with assumptions).
ML baseline before deep models.
Ablation in scientific research.
💻 Patterns
Statsmodels OLS with full inference
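    import pandas as pd
    import statsmodels.api as sm

    # df is assumed to be a pandas DataFrame with columns y, x1, x2, x3
    X = sm.add_constant(df[["x1", "x2", "x3"]])
    model = sm.OLS(df["y"], X).fit(cov_type="HC3")  # heteroskedasticity-robust (HC3) standard errors
    print(model.summary())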
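Logistic regression with odds ratios
    import numpy as np
    import statsmodels.api as sm

    # y is assumed to be a binary (0/1) outcome and X the predictor matrix
    m = sm.Logit(y, sm.add_constant(X)).fit()
    print(m.summary())
    print(np.exp(m.params))  # exponentiated coefficients = odds ratios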
Poisson / Negative Binomial
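    import statsmodels.api as sm

    # y is assumed to be a count outcome and X the predictor matrix
    m_poi = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
    m_nb = sm.GLM(y, sm.add_constant(X), family=sm.families.NegativeBinomial(alpha=1.0)).fit()
    # Dispersion check: Pearson chi² / residual df ≈ 1 means Poisson is adequate; well above 1 suggests overdispersion
    print(m_poi.pearson_chi2 / m_poi.df_resid)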
When to use: when interpretability and uncertainty quantification are the priority. Try it before a deep model, and use it to establish a baseline.
When not to use: high-dimensional image / text / audio data, where learned deep features are superior.
❌ Anti-patterns
Unverified assumptions: trusting the inference while the Gauss-Markov assumptions are violated.
Over-fitting by chasing R²: throwing in more predictors to raise R². Use adjusted R² or cross-validation instead.
Missing standardization in regularized models: the penalty is unit-dependent.
Ignoring multicollinearity: coefficients become unstable. Check VIF.
Causal claims from observational regression: confounders remain unaddressed without a DAG / IV / RD / DiD design.
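The standardization anti-pattern can be made concrete with a small closed-form ridge example. The helper functions `ridge` and `standardize`, the penalty value `lam`, and the metre / kilometre scenario are all hypothetical and introduced only for illustration. The same predictor expressed in different units receives a very different amount of shrinkage unless the columns are standardized first.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge on centered data; the intercept is left unpenalized."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def standardize(X):
    """Scale each column to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(4)
n = 400
x_m = rng.normal(size=n)                     # a predictor measured in metres
z = rng.normal(size=n)
y = 3.0 * x_m + 2.0 * z + rng.normal(size=n)

lam = 200.0
X_m = np.column_stack([x_m, z])
X_km = np.column_stack([x_m / 1000.0, z])    # same predictor, now in kilometres

# Without scaling: express both fits as an effect per metre
effect_m = ridge(X_m, y, lam)[0]
effect_km_as_m = ridge(X_km, y, lam)[0] / 1000.0   # shrunk almost to zero

# With scaling: the fit no longer depends on the unit of measurement
b_std_m = ridge(standardize(X_m), y, lam)
b_std_km = ridge(standardize(X_km), y, lam)

print(effect_m, effect_km_as_m)
print(b_std_m, b_std_km)
```

The penalty acts on the raw coefficient size, so a predictor in kilometres needs a coefficient 1000 times larger for the same effect and is penalized far more heavily. After standardization the two fits coincide.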
🧪 Verification / Duplicates
Verified (Hastie ESL 2e Ch3; Wooldridge Econometrics 7e; Gelman & Hill Data Analysis Using Regression).
Confidence: A.
🕓 Changelog
Date | Change
2026-05-08 | Phase 1
2026-05-10 | Manual cleanup: full OLS/GLM/regularized/robust/quantile/mixed spec