Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
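Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / incomplete stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, and mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections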

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-roc-auc-curves
title: ROC-AUC Curves
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: ROC, AUC, ROC AUC, Receiver Operating Characteristic, AUROC
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: classification, metric, evaluation, machine-learning
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language = python, framework = scikit-learn

ROC-AUC Curves

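One-line summary

ROC-AUC is a threshold-free measure of how well a binary classifier ranks positives above negatives. The ROC curve plots TPR against FPR across all thresholds. AUC is the area under that curve and equals P(score(positive) > score(negative)); 0.5 is random and 1.0 is perfect. As of 2026 it remains the standard metric for balanced binary classification, with PR-AUC preferred selectively for imbalanced data and one-vs-rest macro/weighted AUC used for multi-class problems.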

Core concepts

Axes

  • TPR (Recall, Sensitivity) = TP / (TP + FN).
  • FPR (1 − Specificity) = FP / (FP + TN).
  • ROC = parametric curve (FPR(t), TPR(t)) as the threshold t sweeps across the full range of scores.
  • AUC = ∫ TPR d(FPR) ∈ [0, 1].
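
The definitions above can be traced by hand. The following is a minimal sketch, assuming the same toy labels and scores used in the patterns section; the helper name `roc_point` is hypothetical. It computes (FPR, TPR) at each threshold from the confusion counts and integrates TPR over FPR with the trapezoidal rule.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3])

def roc_point(y_true, y_score, t):
    """Return (FPR, TPR) when predicting positive for score >= t."""
    pred = y_score >= t
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    return fp / (fp + tn), tp / (tp + fn)

# Sweep thresholds from high to low so the curve runs from (0, 0) to (1, 1).
thresholds = np.concatenate(([np.inf], np.sort(np.unique(y_score))[::-1]))
points = np.array([roc_point(y_true, y_score, t) for t in thresholds])
fpr, tpr = points[:, 0], points[:, 1]

# Trapezoidal rule for the integral of TPR with respect to FPR.
auc_manual = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
print(f"manual AUC = {auc_manual:.4f}")
```

On real data `sklearn.metrics.roc_curve` does the same sweep more efficiently; the loop here is only meant to make the definitions concrete.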

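Properties

  • Threshold-independent: it measures ranking quality, not performance at one cutoff.
  • Probabilistic interpretation: AUC = P(score(pos) > score(neg)), equivalent to the normalized Mann-Whitney U statistic.
  • Class-balance invariant: AUC is unchanged when negatives are oversampled, which is both a strength (stable across prevalence) and a weakness (it can hide poor precision on rare positives).
  • Insensitive to score calibration: strictly monotonic transforms of the scores do not change AUC.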
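
These properties are easy to check numerically. The sketch below, which assumes the same toy data as the patterns section, computes AUC as the fraction of (positive, negative) pairs ranked correctly, then confirms that a strictly increasing transform and tenfold duplication of the negatives both leave it unchanged.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3])

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# Compare every positive with every negative; ties count as half a win.
diff = pos[:, None] - neg[None, :]
auc_pairwise = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
auc_sklearn = roc_auc_score(y_true, y_score)

# A strictly increasing transform preserves the ranking, hence the AUC.
auc_transformed = roc_auc_score(y_true, np.log(y_score))

# Oversample negatives (each negative now appears 10 times in total).
y_os = np.concatenate([y_true, np.zeros(9 * len(neg), dtype=int)])
s_os = np.concatenate([y_score, np.repeat(neg, 9)])
auc_oversampled = roc_auc_score(y_os, s_os)

print(auc_pairwise, auc_sklearn, auc_transformed, auc_oversampled)
```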

ROC-AUC vs PR-AUC

  • ROC-AUC: balanced or moderately imbalanced data.
  • PR-AUC (Average Precision): highly imbalanced data (e.g. fraud at 0.1%, rare disease), where ROC-AUC can look misleadingly high because true negatives dominate the FPR denominator.
  • Rule of thumb: positive rate <5% → prefer PR-AUC.
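
A small simulation illustrates the gap. This is a sketch on synthetic data only: the Gaussian score distributions and the roughly 0.1% positive rate are assumptions chosen for illustration, not figures from any real fraud or medical dataset.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
n_neg, n_pos = 100_000, 100          # about 0.1% positives (synthetic)

y_true = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])
# Negatives score around 0, positives around 2: decent but imperfect separation.
y_score = np.concatenate([rng.normal(0.0, 1.0, n_neg), rng.normal(2.0, 1.0, n_pos)])

roc = roc_auc_score(y_true, y_score)
ap = average_precision_score(y_true, y_score)
baseline_ap = n_pos / (n_pos + n_neg)  # AP of a random ranker is roughly the prevalence

print(f"ROC-AUC = {roc:.3f}, PR-AUC (AP) = {ap:.3f}, random-AP baseline = {baseline_ap:.4f}")
```

The ROC-AUC looks strong while the average precision stays low, because even a small false positive rate applied to 100,000 negatives swamps the 100 positives.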

Multi-class

  • OvR (one-vs-rest): treat each class as a binary problem, then take the macro or weighted average.
  • OvO (one-vs-one): pairwise class comparisons, Hand & Till 2001.
  • sklearn.metrics.roc_auc_score(..., multi_class="ovr", average="macro").

Applications

  1. Medical diagnosis (sensitivity / specificity tradeoff).
  2. Credit scoring (Gini = 2·AUC − 1).
  3. Ad CTR / fraud detection (with PR-AUC as a complement).
  4. LLM hallucination detector evaluation.
  5. Ranking system offline evaluation.
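
For the credit-scoring case, the Gini relation is a one-line conversion. A minimal sketch, assuming the toy data from the patterns section; the function name `gini_from_auc` is hypothetical.

```python
from sklearn.metrics import roc_auc_score

def gini_from_auc(y_true, y_score):
    """Gini coefficient as used in credit scoring: 2 * AUC - 1."""
    return 2.0 * roc_auc_score(y_true, y_score) - 1.0

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]
gini = gini_from_auc(y_true, y_score)
print(f"Gini = {gini:.3f}")
```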

💻 Patterns

Basic ROC + AUC (sklearn)

from sklearn.metrics import roc_curve, roc_auc_score, auc
import matplotlib.pyplot as plt

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]

fpr, tpr, thr = roc_curve(y_true, y_score)
roc_auc = roc_auc_score(y_true, y_score)        # or auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
plt.plot([0, 1], [0, 1], "--", color="gray")
plt.xlabel("FPR"); plt.ylabel("TPR"); plt.legend(); plt.show()

Optimal threshold (Youden's J)

import numpy as np
from sklearn.metrics import roc_curve

fpr, tpr, thr = roc_curve(y_true, y_score)
J = tpr - fpr                # Youden's J = sensitivity + specificity - 1
best = thr[np.argmax(J)]     # threshold that maximizes J
print(f"optimal threshold: {best:.3f}")

Cost-sensitive threshold

# When cost(FN) != cost(FP), pick the threshold that minimizes expected cost.
import numpy as np
from sklearn.metrics import roc_curve

def best_threshold(y_true, y_score, cost_fn=10, cost_fp=1, prior=None):
    if prior is None:
        prior = np.mean(y_true)   # positive-class prevalence
    fpr, tpr, thr = roc_curve(y_true, y_score)
    fnr = 1 - tpr
    cost = prior * cost_fn * fnr + (1 - prior) * cost_fp * fpr
    return thr[np.argmin(cost)]

Bootstrap CI for AUC

import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, n=1000, seed=0):
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true); y_score = np.asarray(y_score)
    aucs = []
    for _ in range(n):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2: continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.percentile(aucs, [2.5, 50, 97.5])
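
The same resampling idea extends to comparing two models: resample the rows once per iteration and score both models on that same resample, so the difference is paired. The sketch below is one way to do it; the function name `paired_bootstrap_auc_diff` and the synthetic demo scores are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def paired_bootstrap_auc_diff(y_true, score_a, score_b, n=1000, seed=0):
    """Percentile CI for AUC(a) - AUC(b), resampling the same rows for both."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    score_a = np.asarray(score_a)
    score_b = np.asarray(score_b)
    diffs = []
    for _ in range(n):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined without both classes
        diffs.append(roc_auc_score(y_true[idx], score_a[idx])
                     - roc_auc_score(y_true[idx], score_b[idx]))
    return np.percentile(diffs, [2.5, 50, 97.5])

# Synthetic demo (illustrative only): model A separates the classes better than B.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, 500)
a = y * 1.5 + rng.normal(size=500)
b = y * 0.5 + rng.normal(size=500)
lo, med, hi = paired_bootstrap_auc_diff(y, a, b, n=500)
print(f"AUC(a) - AUC(b): median {med:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes zero, the data support a real difference between the two models at roughly the 5% level.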

DeLong test (compare two AUCs)

# Paired comparison of two classifiers' AUCs on the same data.
# scikit-learn has no built-in DeLong test; use a vetted third-party
# implementation or write one manually.
import numpy as np
from sklearn.metrics import roc_auc_score

def delong_var(y, p):
    y, p = np.asarray(y), np.asarray(p)
    pos, neg = p[y == 1], p[y == 0]
    m, n = len(pos), len(neg)
    # See Sun & Xu 2014 for the fast algorithm.
    ...
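
For reference, here is a sketch of the direct (quadratic-memory) form of the DeLong test. It is not the Sun & Xu fast algorithm, and it is not taken from any library: the function names `_placements` and `delong_test` and the synthetic demo data are assumptions. It builds the per-sample structural components, estimates the 2×2 covariance of the two AUCs, and returns a two-sided normal p-value.

```python
import math
import numpy as np

def _placements(pos, neg):
    """Structural components V10 (per positive) and V01 (per negative)."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)   # ties count as half
    return psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test for AUC(a) == AUC(b) on the same samples."""
    y_true = np.asarray(y_true)
    score_a = np.asarray(score_a, dtype=float)
    score_b = np.asarray(score_b, dtype=float)
    is_pos = y_true == 1
    m, n = is_pos.sum(), (~is_pos).sum()

    v10_a, v01_a = _placements(score_a[is_pos], score_a[~is_pos])
    v10_b, v01_b = _placements(score_b[is_pos], score_b[~is_pos])
    auc_a, auc_b = v10_a.mean(), v10_b.mean()

    # 2x2 covariance of (AUC_a, AUC_b): S10 / m + S01 / n.
    cov = (np.cov(np.vstack([v10_a, v10_b])) / m
           + np.cov(np.vstack([v01_a, v01_b])) / n)
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    # Note: var_diff is 0 if both models give identical rankings.
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return auc_a, auc_b, z, p_value

# Synthetic demo (illustrative only).
rng = np.random.default_rng(42)
y = rng.integers(0, 2, 500)
a = y * 1.5 + rng.normal(size=500)
b = y * 0.5 + rng.normal(size=500)
auc_a, auc_b, z, p_value = delong_test(y, a, b)
print(f"AUC a={auc_a:.3f}, b={auc_b:.3f}, z={z:.2f}, p={p_value:.2e}")
```

The pairwise matrix costs O(m·n) memory, which is fine for a few thousand samples; for larger test sets the rank-based algorithm cited above is the better choice.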

Multi-class OvR macro AUC

from sklearn.metrics import roc_auc_score
# y_true: shape (N,), y_score: shape (N, C) class probabilities (each row sums to 1)
auc_macro = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
auc_weighted = roc_auc_score(y_true, y_score, multi_class="ovr", average="weighted")
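
To see what the macro OvR number is made of, the sketch below computes it end to end on the iris dataset and then reproduces it by hand as the unweighted mean of one binary AUC per class. The dataset and the logistic regression model are assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)   # shape (N, C), rows sum to 1

auc_macro = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")

# Same number by hand: one binary AUC per class, then an unweighted mean.
per_class = [
    roc_auc_score((y_test == c).astype(int), proba[:, c])
    for c in range(proba.shape[1])
]
auc_manual = float(np.mean(per_class))
print(f"macro OvR AUC = {auc_macro:.4f}, manual = {auc_manual:.4f}")
```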

PR-AUC for imbalanced

import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score, precision_recall_curve

ap = average_precision_score(y_true, y_score)
prec, rec, thr = precision_recall_curve(y_true, y_score)
plt.plot(rec, prec, label=f"AP = {ap:.3f}")
plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend(); plt.show()

Calibration-aware (ROC alone misleading)

from sklearn.calibration import CalibrationDisplay
CalibrationDisplay.from_predictions(y_true, y_score, n_bins=10)
# A high AUC does not imply calibrated probabilities. Calibrate separately (isotonic regression or Platt scaling).
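
The point can be made concrete with a distortion that keeps the ranking but ruins the probabilities. In this sketch the synthetic probabilities are calibrated by construction (labels are drawn from them), and the `p ** 4` distortion is an arbitrary assumption; AUC stays the same while the Brier score gets worse.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.default_rng(0)
# Synthetic, well-calibrated probabilities: each label is drawn from its own p.
p = rng.uniform(0.0, 1.0, 5000)
y = (rng.uniform(0.0, 1.0, 5000) < p).astype(int)

# Strictly increasing distortion: identical ranking, systematically too-low probabilities.
p_distorted = p ** 4

auc_calibrated = roc_auc_score(y, p)
auc_distorted = roc_auc_score(y, p_distorted)
brier_calibrated = brier_score_loss(y, p)
brier_distorted = brier_score_loss(y, p_distorted)

print(f"AUC:   {auc_calibrated:.4f} vs {auc_distorted:.4f}")
print(f"Brier: {brier_calibrated:.4f} vs {brier_distorted:.4f}")
```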

sklearn RocCurveDisplay (modern API)

from sklearn.metrics import RocCurveDisplay
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
RocCurveDisplay.from_estimator(model_a, X_test, y_test, ax=ax, name="Model A")
RocCurveDisplay.from_estimator(model_b, X_test, y_test, ax=ax, name="Model B")
ax.plot([0, 1], [0, 1], "--", color="gray")
plt.show()

Decision criteria

| Situation | Approach |
|---|---|
| Balanced binary (~50/50) | ROC-AUC |
| Imbalanced (<5% positive) | PR-AUC primary, ROC-AUC secondary |
| Cost-asymmetric | Cost-weighted threshold, not raw AUC |
| Need probability calibration | Brier score / log-loss + calibration plot |
| Multi-class | OvR macro AUC (balanced classes) / weighted (imbalanced) |
| Compare 2 models | DeLong test, paired bootstrap |
| Production threshold | Optimize on validation, monitor drift in production |

Default: for binary classifier evaluation, report the trio of ROC-AUC, PR-AUC, and a calibration plot. A single AUC over-summarizes. For imbalanced data, make PR-AUC the primary metric.

🔗 Graph

🤖 LLM usage

When to use: explaining ROC / AUC intuition, generating sklearn evaluation boilerplate, interpreting the clinical or business meaning of an AUC value. When not to use: as the metric itself. The computation is deterministic and needs no LLM, and an LLM will hallucinate AUC numbers if asked to "estimate" them.

Anti-patterns

  • AUC on imbalanced fraud / disease data: a 0.99 AUC is still useless if precision is 1%; report PR-AUC.
  • Defaulting the threshold to 0.5: tune it on validation data according to costs.
  • Claiming calibrated probabilities from AUC: AUC is invariant to monotonic transforms, so it says nothing about calibration.
  • Single AUC with no CI: a bootstrap 95% CI is essential for small test sets.
  • Cherry-picking the threshold on the test set: this leaks information; pick on validation, evaluate on test.
  • Ignoring class prior shift: AUC stays stable, but the best operating point shifts in production.
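
The threshold-selection items above can be sketched as a three-way split: fit on train, choose the operating point on validation, and report it once on test. The synthetic dataset, the logistic regression model, and the use of Youden's J are all assumptions for illustration; a cost-based rule would slot into the same place.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Choose the operating point on the validation split only (Youden's J here).
fpr, tpr, thr = roc_curve(y_val, model.predict_proba(X_val)[:, 1])
threshold = thr[np.argmax(tpr - fpr)]

# Report the frozen threshold on the untouched test split.
test_pred = (model.predict_proba(X_test)[:, 1] >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test, test_pred).ravel()
test_tpr = tp / (tp + fn)
test_fpr = fp / (fp + tn)
print(f"threshold = {threshold:.3f}, test TPR = {test_tpr:.3f}, test FPR = {test_fpr:.3f}")
```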

🧪 Verification / duplicates

  • Verified against Fawcett 2006 "An introduction to ROC analysis", Hand & Till 2001 (multi-class), Saito & Rehmsmeier 2015 (PR vs ROC), and the sklearn 1.5+ docs (2026).
  • Confidence: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: ROC/AUC patterns, PR-AUC, calibration, multi-class |