---
id: wiki-2026-0508-logistic-regression-foundations
title: Logistic Regression Foundations
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Logistic Regression, Logit, Softmax Regression, Multinomial Logistic]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [machine-learning, classification, sklearn, mle, calibration]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scikit-learn/statsmodels
---

# Logistic Regression Foundations

## One-line summary

> **"The baseline for classification: linear logit + sigmoid + MLE."** Logistic regression is a classifier that models the log-odds with a linear function and maps them to probabilities with a sigmoid. Its interpretability, calibration, and speed keep it a first-choice baseline in production. Multi-class problems generalize via softmax (multinomial).

## Core

### Formulas

- Model: $P(y=1|x) = \sigma(x^T\beta) = \frac{1}{1+e^{-x^T\beta}}$.
- Logit (log-odds): $\log\frac{p}{1-p} = x^T\beta$.
- Loss (NLL/BCE): $-\sum_i [y_i\log p_i + (1-y_i)\log(1-p_i)]$, where $p_i = \sigma(x_i^T\beta)$.
- Estimation: MLE. There is no closed form, so it is solved iteratively with IRLS / L-BFGS / SGD.

### Interpretation

- A one-unit increase in $x_j$ raises the log-odds by $\beta_j$, holding the other features fixed.
- $e^{\beta_j}$ is the odds ratio (a value of 1.5 means the odds increase by 50%).
- Both sign and magnitude are meaningful, but magnitudes are comparable only when features share a common scale.

### Multinomial / softmax

- $P(y=k|x) = \frac{e^{x^T\beta_k}}{\sum_j e^{x^T\beta_j}}$.
- sklearn: multinomial is the default for multi-class targets with solvers other than `liblinear`. The explicit `multi_class="multinomial"` argument is deprecated since scikit-learn 1.5.
- One-vs-Rest can be advantageous when the class distribution is imbalanced.

### Regularization

- L2 (default): $C = 1/\lambda$, so a smaller `C` means stronger regularization.
- L1: feature selection.
- Elastic Net: requires the SAGA solver.

### Calibration

- LR is usually well calibrated, but drifts when the data are imbalanced and the model is regularized.
- Check with a reliability diagram; correct with `CalibratedClassifierCV`.

### Applications

1. CTR / conversion prediction.
2. Credit scoring (a domain where interpretability is mandatory).
3. Medical risk scores.
4. Text classification (TF-IDF + LR is a strong baseline).
5. Effect estimation in A/B tests (treatment dummy).
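
The formulas and interpretation rules in this section can be checked numerically. The following is a minimal sketch, not a reference implementation: the coefficient vector `beta` and the feature vector `x` are made-up values chosen only for illustration. It confirms that the logit inverts the sigmoid, that a one-unit increase in a feature multiplies the odds by $e^{\beta_j}$, and that a two-class softmax reduces to the sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # log-odds: the inverse of the sigmoid
    return np.log(p / (1.0 - p))

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical values (assumptions for illustration only):
# beta = [intercept, beta_1, beta_2]; x has a leading 1.0 for the intercept.
beta = np.array([-1.0, 0.405, -0.8])
x = np.array([1.0, 2.0, 0.5])

p = sigmoid(x @ beta)
log_odds = logit(p)  # recovers the linear predictor x @ beta

# Raise feature 1 by one unit and compare the odds before and after.
x_plus = x.copy()
x_plus[1] += 1.0
p_plus = sigmoid(x_plus @ beta)
odds_ratio = (p_plus / (1.0 - p_plus)) / (p / (1.0 - p))

# Two-class softmax with the second logit pinned at 0 equals the sigmoid.
p_softmax = softmax(np.array([x @ beta, 0.0]))[0]

print(f"p={p:.4f}  odds_ratio={odds_ratio:.4f}  exp(beta_1)={np.exp(beta[1]):.4f}")
```

The printed odds ratio matches `exp(beta_1)` regardless of the starting point `x`, which is why odds ratios, unlike probability differences, can be reported as a single number per feature.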
## 💻 Patterns

### sklearn: basics

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# X_tr, y_tr, X_te, y_te are assumed to be an existing train/test split.
clf = Pipeline([
    ("sc", StandardScaler()),  # scale first so the penalty treats features evenly
    ("lr", LogisticRegression(C=1.0, max_iter=1000)),
]).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]  # P(y=1 | x)
```

### Hyper-param search (C, penalty)

```python
from sklearn.model_selection import GridSearchCV

# saga supports both l1 and l2 penalties
grid = {
    "lr__C": [0.01, 0.1, 1, 10],
    "lr__penalty": ["l1", "l2"],
    "lr__solver": ["saga"],
}
gs = GridSearchCV(clf, grid, cv=5, scoring="roc_auc", n_jobs=-1).fit(X_tr, y_tr)
print(gs.best_params_, gs.best_score_)
```

### Multinomial (softmax)

```python
# Multi-class targets use the multinomial (softmax) loss by default with lbfgs;
# the explicit multi_class argument is deprecated since scikit-learn 1.5.
multi = LogisticRegression(
    solver="lbfgs", C=1.0, max_iter=2000,
).fit(X_tr, y_tr)
print(multi.classes_, multi.coef_.shape)  # (n_classes, n_features) for 3+ classes
```

### statsmodels: odds ratios + p-values

```python
import numpy as np
import statsmodels.api as sm

X_const = sm.add_constant(X_tr)  # statsmodels does not add an intercept itself
logit = sm.Logit(y_tr, X_const).fit(disp=False)
print(logit.summary())

print("Odds ratios:")
print(np.exp(logit.params))
```

### Imbalanced classes

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# 1) class_weight
LogisticRegression(class_weight="balanced")

# 2) threshold tuning (the default 0.5 is rarely the best cutoff under imbalance)
# Shown on (y_te, proba) for brevity; in practice pick the threshold on a
# held-out validation split, not on the test set.
prec, rec, thr = precision_recall_curve(y_te, proba)
f1 = 2 * prec * rec / (prec + rec + 1e-9)
best_thr = thr[f1[:-1].argmax()]  # prec/rec have one more entry than thr
pred = (proba >= best_thr).astype(int)
```

### Calibration: check + correct

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
import matplotlib.pyplot as plt

cal = CalibratedClassifierCV(clf, method="isotonic", cv=5).fit(X_tr, y_tr)
prob_cal = cal.predict_proba(X_te)[:, 1]

# Reliability diagram: observed frequency vs. mean predicted probability per bin
prob_obs, prob_pred = calibration_curve(y_te, prob_cal, n_bins=10)
plt.plot(prob_pred, prob_obs, marker="o")
plt.plot([0, 1], [0, 1], "--")  # perfect calibration
plt.show()
```

### From scratch: gradient descent

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_lr(X, y, lr=0.1, epochs=1000, l2=0.01):
    X = np.c_[np.ones(len(X)), X]  # prepend the intercept column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        # mean BCE gradient plus L2 penalty (intercept excluded)
        grad = X.T @ (p - y) / len(y) + l2 * np.r_[0, w[1:]]
        w -= lr * grad
    return w
```

### PyTorch: large-scale logistic regression

```python
import torch
import torch.nn as nn

class LR(nn.Module):
    def __init__(self, d, k):
        super().__init__()
        self.lin = nn.Linear(d, k)

    def forward(self, x):
        return self.lin(x)  # raw logits; CrossEntropyLoss applies log-softmax

# X, n_classes, and loader (a DataLoader of (features, labels)) are assumed to exist.
model = LR(X.shape[1], n_classes).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(model(xb.cuda()), yb.cuda())
        loss.backward()
        opt.step()
```

## Decision criteria

| Situation | Approach |
|---|---|
| Tabular baseline | sklearn LR + StandardScaler |
| Interpretation required (medical/finance) | statsmodels Logit + odds ratio |
| Imbalanced | class_weight + threshold tuning |
| Multi-class | multinomial + lbfgs |
| Sparse / feature selection | L1 + saga |
| Probability used as a score | CalibratedClassifierCV |
| Large scale | PyTorch SGD/Adam |

**Default**: StandardScaler + LR(C=1, L2) + threshold tuning.

## 🔗 Graph

- Variant: [[Linear-Discriminant-Analysis]]
- Adjacent: [[Linear-Regression-Mastery]], [[L1-and-L2-Regularization]]

## 🤖 LLM usage

**When to use**: interpreting results (explaining odds ratios), feature importance narratives, commentary on calibration plots.

**When not to use**: domain cutoffs (e.g. a financial risk threshold), which are decided by business and regulatory requirements.

## ❌ Anti-patterns

- **Regularizing without scaling**: large-scale features dominate the penalty.
- **Leaving the threshold at 0.5**: rarely the right choice on imbalanced data.
- **Trusting the probabilities as-is**: no calibration check was done.
- **Using only OvR for multi-class**: information shared across classes is lost.
- **Not raising max_iter when the solver fails to converge**: ignoring the warning leads to inaccurate coefficients.

## 🧪 Verification / Duplicates

- Verified (ESL Ch.4, sklearn 1.5+, statsmodels 0.14, Kaggle baselines).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: added calibration, threshold tuning, multinomial |