"매 threshold-free binary classifier ranking quality". ROC = TPR vs FPR across all thresholds. AUC = area under ROC = P(score(positive) > score(negative)). 매 0.5 = random, 1.0 = perfect. 2026 현재 매 still standard for balanced binary, 매 PR-AUC 의 imbalanced 에 selectively, 매 multi-class 의 one-vs-rest macro/weighted AUC.
Core
Axes
TPR (Recall, Sensitivity) = TP / (TP + FN).
FPR (1 − Specificity) = FP / (FP + TN).
ROC = parametric curve (FPR(t), TPR(t)) for threshold t ∈ ℝ.
AUC = ∫ TPR d(FPR) ∈ [0, 1].
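The definitions above can be checked on a tiny example. The following is a minimal pure-Python sketch, not a replacement for `sklearn.metrics.roc_curve`; the helper names `roc_points`, `auc_trapezoid`, and `auc_pairwise` are made up for illustration. It builds the ROC points by sweeping the threshold, integrates TPR over FPR with the trapezoidal rule, and compares the result with the pairwise probability P(score(positive) > score(negative)), counting ties as 1/2.

```python
def roc_points(y_true, y_score):
    """Return ROC points (FPR, TPR), one per distinct threshold, from (0, 0) to (1, 1)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Sort by descending score: lowering the threshold admits one score group at a time.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs (tied scores share a threshold).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Integrate TPR over FPR with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(y_true, y_score):
    """P(score(positive) > score(negative)), counting ties as 1/2."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, y_score)
print(points)
print(auc_trapezoid(points), auc_pairwise(y_true, y_score))
```

Both numbers agree on this toy data, which is the equivalence stated in the summary: the area under the curve and the probability that a random positive outranks a random negative are the same quantity.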
Properties
Threshold-independent: a measure of ranking quality.
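Because AUC depends only on how the scores are ordered, any strictly increasing transform of the scores should leave it unchanged. A small sketch of that property, assuming a hand-rolled pairwise AUC (the helper `auc_pairwise` and the two transforms are illustrative choices, not part of any library):

```python
import math


def auc_pairwise(y_true, y_score):
    """P(score(positive) > score(negative)), counting ties as 1/2."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# Strictly increasing transforms change the score values but not their order.
squashed = [1 / (1 + math.exp(-10 * (s - 0.5))) for s in y_score]
shifted = [100 * s - 3 for s in y_score]

auc_raw = auc_pairwise(y_true, y_score)
auc_squashed = auc_pairwise(y_true, squashed)
auc_shifted = auc_pairwise(y_true, shifted)
print(auc_raw, auc_squashed, auc_shifted)
```

The three values are identical even though `shifted` is not a probability at all, which is why a ranking metric cannot by itself certify that scores are usable as probabilities.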
```python
# Paired comparison of two classifier AUCs on the same data
from sklearn.metrics import roc_auc_score
# Use scikit-posthocs / mlxtend, or a manual DeLong implementation
import numpy as np


def delong_var(y, p):
    pos, neg = p[y == 1], p[y == 0]
    m, n = len(pos), len(neg)
    # See Sun & Xu 2014 fast algorithm
    ...
```
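When a DeLong implementation is not at hand, a paired percentile bootstrap is a simpler alternative for comparing two classifiers scored on the same examples. It is not the DeLong test and can give a somewhat different interval. The sketch below is pure Python on synthetic data; the function name `paired_bootstrap_auc_diff`, the noise levels, and the seeds are assumptions made for illustration.

```python
import random


def auc_pairwise(y_true, y_score):
    """P(score(positive) > score(negative)), counting ties as 1/2."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(y_true, score_a, score_b, n_boot=1000, seed=0):
    """Approximate 95% percentile interval for AUC(a) - AUC(b)."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        # Resample example indices once and reuse them for both models (paired design).
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AUC is undefined when a resample contains a single class
        a = [score_a[i] for i in idx]
        b = [score_b[i] for i in idx]
        diffs.append(auc_pairwise(y, a) - auc_pairwise(y, b))
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[int(0.975 * n_boot) - 1]
    return lo, hi


# Synthetic data: model A is less noisy than model B by construction.
data_rng = random.Random(42)
y_true = [i % 2 for i in range(60)]
score_a = [y + data_rng.gauss(0, 0.7) for y in y_true]
score_b = [y + data_rng.gauss(0, 1.5) for y in y_true]

observed = auc_pairwise(y_true, score_a) - auc_pairwise(y_true, score_b)
lo, hi = paired_bootstrap_auc_diff(y_true, score_a, score_b)
print(f"observed AUC difference: {observed:.3f}, 95% interval: [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes 0 suggests the difference is unlikely to be resampling noise. Dropping `score_b` and bootstrapping a single AUC gives a confidence interval for one model in the same way.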
```python
from sklearn.calibration import CalibrationDisplay

CalibrationDisplay.from_predictions(y_true, y_score, n_bins=10)
# A high AUC does not imply calibrated probabilities.
# Calibrate separately with isotonic regression or Platt scaling.
```
Default: evaluate a binary classifier with the trio of ROC-AUC, PR-AUC, and a calibration plot. A single AUC over-summarizes. For imbalanced data, make PR-AUC the primary metric.
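To see why PR-AUC is the primary metric for imbalanced data, the sketch below contrasts ROC-AUC with average precision on a constructed dataset with 1% positives. It is pure Python; `average_precision` uses the step-wise sum of recall gain times precision, and the data are synthetic and chosen only to make the gap visible.

```python
def auc_pairwise(y_true, y_score):
    """P(score(positive) > score(negative)), counting ties as 1/2."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise area under the PR curve: sum of (recall gain) * precision."""
    n_pos = sum(y_true)
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    ap = 0.0
    tp = fp = 0
    prev_recall = 0.0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Evaluate once per distinct threshold so tied scores are handled together.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            recall = tp / n_pos
            precision = tp / (tp + fp)
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap


# 1000 negatives spread evenly over [0, 1); 10 positives all scored 0.9505.
neg_scores = [i / 1000 for i in range(1000)]
pos_scores = [0.9505] * 10
y_true = [0] * 1000 + [1] * 10
y_score = neg_scores + pos_scores

roc_auc = auc_pairwise(y_true, y_score)
pr_auc = average_precision(y_true, y_score)
print(f"ROC-AUC = {roc_auc:.3f}, average precision = {pr_auc:.3f}")
```

Here the positives outrank about 95% of the negatives, so ROC-AUC is roughly 0.95, yet 49 negatives still score above them, so precision at full recall is only 10/59 and average precision is about 0.17.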
🔗 Graph
🤖 LLM usage
When: explain ROC / AUC intuition, generate sklearn eval boilerplate, interpret the clinical / business meaning of an AUC value.
When not: as the metric itself. AUC is deterministic, so no LLM is needed, and an LLM will hallucinate AUC numbers if asked to "estimate" them.
❌ Anti-patterns
AUC on imbalanced fraud / disease data: a 0.99 AUC is still useless if precision = 1%; use PR-AUC.
Threshold pick = 0.5 by default: tune the threshold on validation data according to error costs.
Claiming calibrated probabilities from AUC: AUC is invariant to monotonic transforms of the score, so it says nothing about calibration.
Single AUC, no CI: a bootstrap 95% CI is essential for small test sets.
Cherry-picking the threshold on test: this is leakage; pick on val, evaluate on test.
Ignoring class prior shift: AUC is stable, but the operating point shifts in production.
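The threshold-related items in this list can be sketched as a two-step procedure: choose the threshold that minimizes cost on validation data, then report the cost once on the test set. The cost values (`COST_FP`, `COST_FN`), the helper names, and the toy data are all hypothetical and chosen for illustration.

```python
def expected_cost(y_true, y_score, threshold, cost_fp, cost_fn):
    """Average misclassification cost when predicting positive for score >= threshold."""
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    return (cost_fp * fp + cost_fn * fn) / len(y_true)


def pick_threshold(y_val, s_val, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest cost on the validation set."""
    # Candidates: every observed score, plus +inf for "never predict positive".
    candidates = sorted(set(s_val)) + [float("inf")]
    return min(candidates,
               key=lambda t: expected_cost(y_val, s_val, t, cost_fp, cost_fn))


# Hypothetical validation and test splits (labels, scores).
y_val = [0, 0, 0, 1, 0, 1, 1, 1]
s_val = [0.05, 0.2, 0.3, 0.35, 0.45, 0.6, 0.7, 0.9]
y_test = [0, 0, 1, 0, 1, 1]
s_test = [0.1, 0.25, 0.4, 0.55, 0.65, 0.8]

# Assumed costs: a missed positive is five times worse than a false alarm.
COST_FP, COST_FN = 1.0, 5.0

# Step 1: tune on validation only.
t_star = pick_threshold(y_val, s_val, COST_FP, COST_FN)

# Step 2: evaluate once on test, with the threshold frozen.
cost_tuned = expected_cost(y_test, s_test, t_star, COST_FP, COST_FN)
cost_default = expected_cost(y_test, s_test, 0.5, COST_FP, COST_FN)
print(f"threshold = {t_star}, test cost tuned = {cost_tuned:.3f}, default 0.5 = {cost_default:.3f}")
```

With these asymmetric costs the validation-selected threshold falls below 0.5, and the test set is touched only after the threshold is fixed, which avoids the leakage described above.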
🧪 Verification / duplicates
Verified (Fawcett 2006 "An introduction to ROC analysis", Hand & Till 2001 multi-class, Saito & Rehmsmeier 2015 PR vs ROC, sklearn 1.5+ docs 2026).