---
id: wiki-2026-0508-roc-auc-curves
title: ROC-AUC Curves
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [ROC, AUC, ROC AUC, Receiver Operating Characteristic, AUROC]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [classification, metric, evaluation, machine-learning]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scikit-learn
---

# ROC-AUC Curves

## In one line

> **"A threshold-free measure of a binary classifier's ranking quality."** ROC = TPR vs FPR across all thresholds. AUC = area under the ROC curve = P(score(positive) > score(negative)). 0.5 = random, 1.0 = perfect. As of 2026 it is still the standard for balanced binary problems, with PR-AUC preferred selectively for imbalanced data and one-vs-rest macro/weighted AUC for multi-class.

## Core

### Axes

- **TPR (Recall, Sensitivity)** = TP / (TP + FN).
- **FPR (1 − Specificity)** = FP / (FP + TN).
- ROC = parametric curve (FPR(t), TPR(t)) for threshold t ∈ ℝ.
- AUC = ∫ TPR d(FPR) ∈ [0, 1].

### Properties

- **Threshold-independent**: measures ranking quality rather than performance at a single operating point.
- **Probabilistic interpretation**: AUC = P(score(pos) > score(neg)), equivalent to the normalized Mann-Whitney U statistic.
- **Class-balance invariant**: AUC is unchanged (in expectation) when negatives are oversampled; this is both a strength and a weakness.
- **Insensitive to score calibration**: strictly monotonic transforms of the score do not change AUC.

### vs PR-AUC

- **ROC-AUC**: balanced or moderately imbalanced data.
- **PR-AUC (Average Precision)**: highly imbalanced data (e.g. fraud at 0.1%, rare disease), where ROC-AUC can look misleadingly high because true negatives dominate.
- Rule of thumb: positive rate <5% → prefer PR-AUC.

### Multi-class

- **OvR (one-vs-rest)**: treat each class as a binary problem, then take the macro or weighted average.
- **OvO (one-vs-one)**: pairwise, Hand & Till 2001.
- `sklearn.metrics.roc_auc_score(..., multi_class="ovr", average="macro")`.

### Applications

1. Medical diagnosis (sensitivity / specificity tradeoff).
2. Credit scoring (Gini = 2·AUC − 1).
3. Ad CTR / fraud detection (with PR-AUC as a complement).
4. LLM hallucination detector evaluation.
5. Offline evaluation of ranking systems.

## 💻 Patterns

### Basic ROC + AUC (sklearn)

```python
from sklearn.metrics import roc_curve, roc_auc_score, auc
import matplotlib.pyplot as plt

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]

fpr, tpr, thr = roc_curve(y_true, y_score)
roc_auc = roc_auc_score(y_true, y_score)  # equivalently: auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
plt.plot([0, 1], [0, 1], "--", color="gray")  # chance diagonal
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend()
plt.show()
```

### Optimal threshold (Youden's J)

```python
import numpy as np
from sklearn.metrics import roc_curve

# y_true, y_score as defined in the basic example
fpr, tpr, thr = roc_curve(y_true, y_score)
J = tpr - fpr  # Youden's J = sensitivity + specificity - 1
best = thr[np.argmax(J)]  # threshold that maximizes sensitivity + specificity
print(f"optimal threshold: {best:.3f}")
```

### Cost-sensitive threshold

```python
import numpy as np
from sklearn.metrics import roc_curve

# cost(FN) != cost(FP): pick the threshold that minimizes expected cost
def best_threshold(y_true, y_score, cost_fn=10, cost_fp=1, prior=None):
    if prior is None:
        prior = np.mean(y_true)  # positive-class prevalence
    fpr, tpr, thr = roc_curve(y_true, y_score)
    fnr = 1 - tpr
    cost = prior * cost_fn * fnr + (1 - prior) * cost_fp * fpr
    return thr[np.argmin(cost)]
```

### Bootstrap CI for AUC

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, n=1000, seed=0):
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    aucs = []
    for _ in range(n):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when a resample contains one class only
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.percentile(aucs, [2.5, 50, 97.5])  # lower bound, median, upper bound
```

### DeLong test (compare two AUCs)

```python
# Paired comparison of two classifiers' AUCs on the same data.
# scikit-learn does not ship a DeLong test; use a third-party package
# or a manual implementation.
import numpy as np
from sklearn.metrics import roc_auc_score

def delong_var(y, p):
    """DeLong variance of a single AUC, direct O(m*n) form.

    Sun & Xu 2014 describe a faster algorithm. The paired test additionally
    needs the covariance between the two models' components.
    """
    y, p = np.asarray(y), np.asarray(p)
    pos, neg = p[y == 1], p[y == 0]
    m, n = len(pos), len(neg)
    # psi[i, j] = 1 if pos_i > neg_j, 0.5 on a tie, 0 otherwise
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    v10, v01 = psi.mean(axis=1), psi.mean(axis=0)  # structural components
    return v10.var(ddof=1) / m + v01.var(ddof=1) / n
```

### Multi-class OvR macro AUC

```python
from sklearn.metrics import roc_auc_score

# y_true: shape (N,), y_score: shape (N, C) class probabilities (each row sums to 1)
auc_macro = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
auc_weighted = roc_auc_score(y_true, y_score, multi_class="ovr", average="weighted")
```

### PR-AUC for imbalanced data

```python
import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score, precision_recall_curve

ap = average_precision_score(y_true, y_score)
prec, rec, thr = precision_recall_curve(y_true, y_score)
plt.plot(rec, prec, label=f"AP = {ap:.3f}")
```

### Calibration-aware (ROC alone is misleading)

```python
from sklearn.calibration import CalibrationDisplay

# y_score must be predicted probabilities in [0, 1]
CalibrationDisplay.from_predictions(y_true, y_score, n_bins=10)
# A high AUC does not imply calibrated probabilities.
# Calibrate separately (isotonic regression / Platt scaling).
```

### sklearn RocCurveDisplay (modern API)

```python
from sklearn.metrics import RocCurveDisplay
import matplotlib.pyplot as plt

# model_a, model_b: fitted classifiers; X_test, y_test: held-out data
fig, ax = plt.subplots()
RocCurveDisplay.from_estimator(model_a, X_test, y_test, ax=ax, name="Model A")
RocCurveDisplay.from_estimator(model_b, X_test, y_test, ax=ax, name="Model B")
ax.plot([0, 1], [0, 1], "--", color="gray")  # chance diagonal
plt.show()
```

## Decision criteria

| Situation | Approach |
|---|---|
| Balanced binary (~50/50) | ROC-AUC |
| Imbalanced (<5% positive) | PR-AUC primary, ROC-AUC secondary |
| Cost-asymmetric | cost-weighted threshold, not raw AUC |
| Need probability calibration | Brier score / log-loss + calibration plot |
| Multi-class | OvR macro AUC (balanced classes) / weighted (imbalanced) |
| Compare 2 models | DeLong test, paired bootstrap |
| Production threshold | optimize on validation, monitor drift in prod |

**Default**: for binary classifier evaluation, report the trio of ROC-AUC + PR-AUC + calibration plot. A single AUC over-summarizes. On imbalanced data, make PR-AUC the primary metric.

## 🔗 Graph

## 🤖 LLM usage

**When**: explaining ROC / AUC intuition, generating sklearn evaluation boilerplate, interpreting the clinical / business meaning of an AUC value.

**When not**: as the metric itself. The computation is deterministic, so no LLM is needed, and an LLM will hallucinate AUC numbers if asked to "estimate" them.

## ❌ Anti-patterns

- **AUC on imbalanced fraud / disease data**: a 0.99 AUC is still useless if precision is 1%; use PR-AUC.
- **Defaulting the threshold to 0.5**: tune it on validation data according to cost.
- **Claiming calibrated probabilities from AUC**: AUC is invariant to monotonic transforms, so it says nothing about calibration.
- **Single AUC, no CI**: a bootstrap 95% CI is essential for small test sets.
- **Cherry-picking the threshold on the test set**: this is leakage; pick on validation, evaluate on test.
- **Ignoring class prior shift**: AUC stays stable, but the operating point shifts in production.

## 🧪 Verification / duplicates

- Verified (Fawcett 2006 "An introduction to ROC analysis", Hand & Till 2001 multi-class, Saito & Rehmsmeier 2015 PR vs ROC, sklearn 1.5+ docs 2026).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: ROC/AUC patterns + PR-AUC + calibration + multi-class |
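
## 💻 Worked check: AUC as pairwise ranking

The probabilistic interpretation listed under the properties (AUC = P(score(pos) > score(neg))), the Gini relation from the applications list, and the invariance to monotonic transforms can all be checked by hand on the toy data from the basic example. The following is a minimal pure-Python sketch, not part of scikit-learn: `pairwise_auc` is a hypothetical helper name, ties are assumed to count as 1/2, and the O(m·n) double loop is meant for illustration rather than large datasets.

```python
import math

def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0   # positive ranked above negative
            elif p == n:
                total += 0.5   # tie
    return total / (len(pos) * len(neg))

# Same toy data as the basic ROC example
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]

auc_pairs = pairwise_auc(y_true, y_score)  # 15 of 16 pairs ranked correctly
gini = 2 * auc_pairs - 1                   # Gini coefficient from AUC

# A strictly increasing transform keeps every pairwise ordering, hence the AUC
auc_transformed = pairwise_auc(y_true, [math.exp(5 * s) for s in y_score])

print(auc_pairs, gini, auc_transformed)  # 0.9375 0.875 0.9375
```

Only one pair is misordered (the positive scored 0.35 against the negative scored 0.4), which gives 15/16. Since the trapezoidal area under the ROC curve equals this pairwise statistic, `roc_auc_score` should report the same value on this data.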