2nd/10_Wiki/Topics/AI_and_ML/ROC-AUC-Curves.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
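Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, mapped concepts (~2,400 items)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections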

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00

---
id: wiki-2026-0508-roc-auc-curves
title: ROC-AUC Curves
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [ROC, AUC, ROC AUC, Receiver Operating Characteristic, AUROC]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [classification, metric, evaluation, machine-learning]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scikit-learn
---
# ROC-AUC Curves
## One-liner
> **"Threshold-free ranking quality of a binary classifier."** ROC = TPR vs. FPR across all thresholds. AUC = area under the ROC curve = P(score(positive) > score(negative)). 0.5 = random, 1.0 = perfect. As of 2026 it remains the standard metric for balanced binary problems; PR-AUC is used selectively for imbalanced data, and multi-class problems use one-vs-rest macro/weighted AUC.
## Core concepts
### Axes
- **TPR (Recall, Sensitivity)** = TP / (TP + FN).
- **FPR (1 − Specificity)** = FP / (FP + TN).
- ROC = parametric curve (FPR(t), TPR(t)) as the threshold t sweeps the full range of scores.
- AUC = ∫ TPR d(FPR) ∈ [0, 1].
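The definitions above can be traced by hand. The following is a minimal NumPy sketch, not how scikit-learn implements `roc_curve`; the helper names `roc_points` and `trapezoid_auc` are hypothetical, and the toy labels and scores are illustrative only.

```python
import numpy as np

def roc_points(y_true, y_score):
    """Return (fpr, tpr) arrays, one point per distinct threshold, starting at (0, 0)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    thresholds = np.unique(y_score)[::-1]           # distinct scores, descending
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    fpr, tpr = [0.0], [0.0]
    for t in thresholds:
        pred = y_score >= t                         # predict positive at or above t
        tpr.append((pred & y_true).sum() / n_pos)   # TP / (TP + FN)
        fpr.append((pred & ~y_true).sum() / n_neg)  # FP / (FP + TN)
    return np.array(fpr), np.array(tpr)

def trapezoid_auc(fpr, tpr):
    """Integrate TPR over FPR with the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]
fpr, tpr = roc_points(y_true, y_score)
manual_auc = trapezoid_auc(fpr, tpr)
print(f"AUC = {manual_auc:.4f}")
```

For this toy data, 15 of the 16 positive/negative pairs are ordered correctly, so the area comes out to 0.9375.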
### Properties
- **Threshold-independent**: measures ranking quality rather than performance at one operating point.
- **Probabilistic interpretation**: AUC = P(score(pos) > score(neg)), equivalent to the normalized Mann-Whitney U statistic.
- **Class-balance invariant**: AUC is unchanged when negatives are oversampled, which is both a strength and a weakness.
- **Insensitive to score calibration**: strictly monotonic transforms of the score do not change AUC.
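These properties can be checked numerically. The sketch below computes AUC directly as the fraction of correctly ordered positive/negative pairs (ties counted as one half, a common convention), then repeats the calculation after a logit transform and after oversampling the negatives. The function name `pairwise_auc` and the toy data are assumptions for illustration.

```python
import numpy as np

def pairwise_auc(y_true, y_score):
    """AUC as P(score(pos) > score(neg)), with ties counted as 0.5."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3])

auc_raw = pairwise_auc(y_true, y_score)

# Strictly increasing transform (logit): the ranking, and hence AUC, is unchanged.
auc_logit = pairwise_auc(y_true, np.log(y_score / (1 - y_score)))

# Oversample negatives 5x: every pairwise comparison is duplicated equally.
neg_mask = y_true == 0
y_over = np.concatenate([y_true, np.repeat(y_true[neg_mask], 4)])
s_over = np.concatenate([y_score, np.repeat(y_score[neg_mask], 4)])
auc_over = pairwise_auc(y_over, s_over)

print(auc_raw, auc_logit, auc_over)
```

All three values agree, which is why AUC alone cannot reveal a change in class prior or a miscalibrated score.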
### ROC-AUC vs PR-AUC
- **ROC-AUC**: balanced or moderately imbalanced data.
- **PR-AUC (Average Precision)**: highly imbalanced data (e.g. fraud at 0.1%, rare disease), where ROC-AUC can look misleadingly high because true negatives dominate the FPR denominator.
- Rule of thumb: positive rate < 5% → prefer PR-AUC.
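A short calculation shows why a fixed ROC operating point can hide poor precision at low prevalence. The sketch below holds TPR and FPR constant and varies only the positive rate; the function name `precision_at` and the chosen rates (TPR 0.90, FPR 0.05) are illustrative assumptions.

```python
def precision_at(tpr, fpr, prevalence):
    """Precision implied by a ROC operating point at a given positive rate."""
    tp = prevalence * tpr          # expected true-positive fraction
    fp = (1 - prevalence) * fpr    # expected false-positive fraction
    return tp / (tp + fp)

for prev in (0.5, 0.05, 0.001):
    print(f"prevalence={prev:<6} precision={precision_at(0.90, 0.05, prev):.3f}")
```

The same point on the ROC curve gives precision of roughly 0.947 at 50% prevalence, 0.486 at 5%, and 0.018 at 0.1%, which is the gap PR-AUC exposes and ROC-AUC does not.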
### Multi-class
- **OvR (one-vs-rest)**: treat each class as a binary problem, then take the macro or weighted average.
- **OvO (one-vs-one)**: pairwise, Hand & Till 2001.
- `sklearn.metrics.roc_auc_score(..., multi_class="ovr", average="macro")`.
### Applications
1. Medical diagnosis (sensitivity / specificity tradeoff).
2. Credit scoring (Gini = 2·AUC − 1).
3. Ad CTR / fraud detection (with PR-AUC as a complement).
4. LLM hallucination detector eval.
5. Ranking system offline eval.
## 💻 Patterns
### Basic ROC + AUC (sklearn)
```python
from sklearn.metrics import roc_curve, roc_auc_score, auc
import matplotlib.pyplot as plt
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]
fpr, tpr, thr = roc_curve(y_true, y_score)
roc_auc = roc_auc_score(y_true, y_score) # or auc(fpr, tpr)
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
plt.plot([0, 1], [0, 1], "--", color="gray")
plt.xlabel("FPR"); plt.ylabel("TPR"); plt.legend(); plt.show()
```
### Optimal threshold (Youden's J)
```python
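import numpy as np
from sklearn.metrics import roc_curve
# Reuses y_true / y_score from the basic example above.
fpr, tpr, thr = roc_curve(y_true, y_score)
J = tpr - fpr  # Youden's J = sensitivity + specificity - 1
best = thr[np.argmax(J)]  # threshold that maximizes sensitivity + specificity
print(f"optimal threshold: {best:.3f}")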
```
### Cost-sensitive threshold
```python
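# When cost(FN) ≠ cost(FP), pick the threshold that minimizes expected cost.
import numpy as np
from sklearn.metrics import roc_curve

def best_threshold(y_true, y_score, cost_fn=10, cost_fp=1, prior=None):
    if prior is None:
        prior = np.mean(y_true)  # positive-class prevalence
    fpr, tpr, thr = roc_curve(y_true, y_score)
    fnr = 1 - tpr
    # Expected cost per sample at each candidate threshold.
    cost = prior * cost_fn * fnr + (1 - prior) * cost_fp * fpr
    return thr[np.argmin(cost)]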
```
### Bootstrap CI for AUC
```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, n=1000, seed=0):
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    aucs = []
    for _ in range(n):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined without both classes
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.percentile(aucs, [2.5, 50, 97.5])  # 95% CI bounds and median
```
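The same resampling loop extends to comparing two models scored on the same test set: resample row indices once per iteration and take the AUC difference on that shared resample, so the comparison stays paired. This is a sketch under assumptions; `auc_pairwise` and `paired_bootstrap_auc_diff` are hypothetical helper names, the AUC is computed with plain NumPy to keep the block self-contained, and the synthetic scores are illustrative.

```python
import numpy as np

def auc_pairwise(y, s):
    """AUC as the fraction of correctly ordered (pos, neg) pairs; ties count 0.5."""
    pos, neg = s[y == 1], s[y == 0]
    gt = (pos[:, None] > neg[None, :]).mean()
    eq = (pos[:, None] == neg[None, :]).mean()
    return float(gt + 0.5 * eq)

def paired_bootstrap_auc_diff(y_true, score_a, score_b, n=1000, seed=0):
    """Percentile CI (2.5, 50, 97.5) for AUC(a) - AUC(b) on shared resamples."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y_true)
    a = np.asarray(score_a, dtype=float)
    b = np.asarray(score_b, dtype=float)
    diffs = []
    for _ in range(n):
        idx = rng.integers(0, len(y), len(y))   # same rows for both models
        if len(np.unique(y[idx])) < 2:
            continue                            # AUC undefined without both classes
        diffs.append(auc_pairwise(y[idx], a[idx]) - auc_pairwise(y[idx], b[idx]))
    return np.percentile(diffs, [2.5, 50, 97.5])

# Synthetic example: model A carries signal, model B is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
score_a = y + rng.normal(0, 0.5, 200)
score_b = rng.normal(0, 1.0, 200)
lo, mid, hi = paired_bootstrap_auc_diff(y, score_a, score_b)
print(f"AUC difference 95% CI: [{lo:.3f}, {hi:.3f}], median {mid:.3f}")
```

If the interval excludes 0, the resampling supports a real difference between the two models on this test set.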
### DeLong test (compare two AUCs)
```python
# Paired comparison of two classifiers' AUCs on the same data.
from sklearn.metrics import roc_auc_score
# scikit-learn does not ship a DeLong test; use a dedicated implementation or write one manually.
import numpy as np

def delong_var(y, p):
    y, p = np.asarray(y), np.asarray(p)
    pos, neg = p[y == 1], p[y == 0]
    m, n = len(pos), len(neg)
    # See Sun & Xu 2014 for the fast algorithm.
    ...
```
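For small test sets the test can be written out directly. The sketch below uses the plain O(m·n) placement-value formulation of DeLong et al. (1988), not the fast algorithm of Sun & Xu 2014 mentioned in the stub; `placement_values` and `delong_test` are hypothetical names, and the code assumes binary 0/1 labels with at least two samples per class. Treat it as an illustration to validate against a trusted implementation before relying on it.

```python
import numpy as np
from math import erfc, sqrt

def placement_values(pos, neg):
    """Per-positive and per-negative placement values (ties count 0.5)."""
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y_true, score_a, score_b):
    """Return (auc_a, auc_b, z, two-sided p) for two models on the same samples."""
    y = np.asarray(y_true).astype(bool)
    m, n = int(y.sum()), int((~y).sum())
    v10, v01 = [], []
    for s in (score_a, score_b):
        s = np.asarray(s, dtype=float)
        p, q = placement_values(s[y], s[~y])
        v10.append(p)
        v01.append(q)
    auc_a, auc_b = float(v10[0].mean()), float(v10[1].mean())
    # 2x2 covariance of the two AUC estimates (np.cov uses ddof=1 by default).
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var <= 0:
        # Degenerate case (e.g. identical scores): the statistic is undefined.
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / sqrt(var)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return auc_a, auc_b, float(z), float(p_value)

# Synthetic example: model A carries signal, model B is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
strong = y + rng.normal(0, 0.5, 200)
noise = rng.normal(0, 1.0, 200)
auc_a, auc_b, z, p = delong_test(y, strong, noise)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, z = {z:.2f}, p = {p:.3g}")
```

The pairwise matrices make memory grow with m·n, so this form suits test sets of a few thousand rows at most.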
### Multi-class OvR macro AUC
```python
from sklearn.metrics import roc_auc_score
# y_true: shape (N,); y_score: shape (N, C), rows are class probabilities that sum to 1
auc_macro = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
auc_weighted = roc_auc_score(y_true, y_score, multi_class="ovr", average="weighted")
```
### PR-AUC for imbalanced
```python
import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score, precision_recall_curve
ap = average_precision_score(y_true, y_score)
prec, rec, thr = precision_recall_curve(y_true, y_score)
plt.plot(rec, prec, label=f"AP = {ap:.3f}")
plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend(); plt.show()
```
### Calibration-aware (ROC alone misleading)
```python
from sklearn.calibration import CalibrationDisplay
CalibrationDisplay.from_predictions(y_true, y_score, n_bins=10)
# A high AUC does not imply calibrated probabilities; calibrate separately (isotonic regression / Platt scaling).
```
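The gap between ranking quality and calibration can be made concrete: warp a score with a strictly increasing function and compare AUC against the Brier score. This is a NumPy-only sketch; `auc_pairwise`, `brier`, the synthetic scores, and the `p ** 4` warp are all illustrative assumptions.

```python
import numpy as np

def auc_pairwise(y, s):
    """AUC as the fraction of correctly ordered (pos, neg) pairs; ties count 0.5."""
    pos, neg = s[y == 1], s[y == 0]
    gt = (pos[:, None] > neg[None, :]).mean()
    eq = (pos[:, None] == neg[None, :]).mean()
    return float(gt + 0.5 * eq)

def brier(y, p):
    """Mean squared error between predicted probability and the 0/1 label."""
    return float(np.mean((p - y) ** 2))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
# Scores centered near 0.8 for positives and 0.2 for negatives.
p_orig = np.clip(0.5 + 0.3 * (2 * y - 1) + rng.normal(0, 0.15, y.size), 0.01, 0.99)
# Strictly increasing warp on (0, 1): ranking preserved, probabilities distorted.
p_warp = p_orig ** 4

print(f"AUC   orig={auc_pairwise(y, p_orig):.4f}  warped={auc_pairwise(y, p_warp):.4f}")
print(f"Brier orig={brier(y, p_orig):.4f}  warped={brier(y, p_warp):.4f}")
```

AUC is identical for both scores while the Brier score degrades sharply, so a calibration check has to be run separately from any ROC analysis.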
### sklearn RocCurveDisplay (modern API)
```python
from sklearn.metrics import RocCurveDisplay
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
RocCurveDisplay.from_estimator(model_a, X_test, y_test, ax=ax, name="Model A")
RocCurveDisplay.from_estimator(model_b, X_test, y_test, ax=ax, name="Model B")
ax.plot([0, 1], [0, 1], "--", color="gray")
plt.show()
```
## Decision criteria
| Situation | Approach |
|---|---|
| Balanced binary (~50/50) | ROC-AUC |
| Imbalanced (<5% positive) | PR-AUC primary, ROC-AUC secondary |
| Cost-asymmetric | cost-weighted threshold, not raw AUC |
| Need probability calibration | Brier score / log-loss + calibration plot |
| Multi-class | OvR macro AUC (balanced classes) / weighted (imbalanced) |
| Compare 2 models | DeLong test, paired bootstrap |
| Production threshold | optimize on validation, monitor drift in prod |
**Default**: evaluate a binary classifier with the trio of ROC-AUC, PR-AUC, and a calibration plot. A single AUC over-summarizes. On imbalanced data, treat PR-AUC as the primary metric.
## 🔗 Graph
## 🤖 LLM usage
**When**: explain ROC / AUC intuition, generate sklearn eval boilerplate, interpret the clinical / business meaning of an AUC value.
**When not**: as the metric itself. The computation is deterministic and needs no LLM, and an LLM may hallucinate AUC numbers if asked to "estimate" them.
## ❌ Anti-patterns
- **ROC-AUC alone on imbalanced fraud / disease data**: a 0.99 AUC can still be useless if precision is 1%; report PR-AUC.
- **Defaulting the threshold to 0.5**: tune it on validation data according to the error costs.
- **Citing AUC as evidence of calibrated probabilities**: AUC is invariant to monotonic transforms, so it says nothing about calibration.
- **Single AUC, no CI**: a bootstrap 95% CI is essential for small test sets.
- **Cherry-picking the threshold on the test set**: this is leakage; pick on validation, evaluate on test.
- **Ignoring class prior shift**: AUC stays stable, but the operating point shifts in production.
## 🧪 Verification / duplicates
- Verified (Fawcett 2006 "An introduction to ROC analysis", Hand & Till 2001 multi-class, Saito & Rehmsmeier 2015 PR vs ROC, sklearn 1.5+ docs 2026).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — ROC/AUC patterns + PR-AUC + calibration + multi-class |