f8b21af4be
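10_Wiki/Topics large-scale cleanup: - Removed 227 error-capture / unfinished stub documents - Merged 43 cross-folder duplicate clusters (63 files → redirect) - Link-name normalization: ~2,400 fixes (broken links repaired, redirects linked directly, concepts mapped) - Created 6 new category MOCs - Removed 10,058 unresolved related-keyword links from Graph sections Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>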
196 lines
7.1 KiB
Markdown
---
id: wiki-2026-0508-roc-auc-curves
title: ROC-AUC Curves
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [ROC, AUC, ROC AUC, Receiver Operating Characteristic, AUROC]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [classification, metric, evaluation, machine-learning]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scikit-learn
---

# ROC-AUC Curves

## One-liner
> **"A threshold-free measure of a binary classifier's ranking quality."** ROC = TPR vs. FPR across all thresholds. AUC = area under the ROC curve = P(score(positive) > score(negative)), with ties counted as ½. 0.5 = random ranking, 1.0 = perfect. As of 2026, ROC-AUC is still the standard for balanced binary problems. PR-AUC is used selectively for imbalanced data, and multi-class problems use one-vs-rest macro/weighted AUC.

## Core

### Axes
- **TPR (Recall, Sensitivity)** = TP / (TP + FN).
- **FPR (1 − Specificity)** = FP / (FP + TN).
- ROC = the parametric curve (FPR(t), TPR(t)) traced as the threshold t sweeps over ℝ.
- AUC = ∫₀¹ TPR d(FPR) ∈ [0, 1].
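
The definitions above can be checked with a sweep written from scratch. The sketch treats each distinct score as a threshold, assuming the rule "predict positive if score ≥ t". It computes one (FPR, TPR) point per threshold and integrates TPR over FPR with the trapezoidal rule. `roc_points` and `trapezoid_auc` are hypothetical helper names, and scikit-learn is used only as a cross-check.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def roc_points(y_true, y_score):
    """One (FPR, TPR) point per distinct threshold, predicting positive if score >= t."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    P = np.sum(y_true == 1)  # TP + FN
    N = np.sum(y_true == 0)  # FP + TN
    # +inf first yields the (0, 0) corner; then distinct scores from high to low
    thresholds = np.concatenate(([np.inf], np.unique(y_score)[::-1]))
    fpr, tpr = [], []
    for t in thresholds:
        pred = y_score >= t
        tpr.append(np.sum(pred & (y_true == 1)) / P)  # TP / (TP + FN)
        fpr.append(np.sum(pred & (y_true == 0)) / N)  # FP / (FP + TN)
    return np.array(fpr), np.array(tpr), thresholds

def trapezoid_auc(fpr, tpr):
    """AUC = integral of TPR d(FPR), approximated with the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]
fpr, tpr, _ = roc_points(y_true, y_score)
manual_auc = trapezoid_auc(fpr, tpr)
print(manual_auc, roc_auc_score(y_true, y_score))  # 0.9375 0.9375
```

Tied scores produce a diagonal segment, which the trapezoidal rule credits as ½ per tied (positive, negative) pair.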
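
### Properties
- **Threshold-independent**: measures ranking quality across all thresholds rather than performance at a single operating point.
- **Probabilistic interpretation**: AUC = P(score(pos) > score(neg)), with ties counted as ½. Equivalently, it is the Mann-Whitney U statistic divided by n_pos · n_neg.
- **Class-balance invariant**: AUC is unchanged when negatives are oversampled, because TPR and FPR are each computed within a single class. This is both a strength (stable under resampling) and a weakness (it hides how many false positives an operating point actually produces).
- **Insensitive to score calibration**: any strictly increasing transform of the scores leaves AUC unchanged.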
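
The Mann-Whitney interpretation and the invariance to monotonic transforms can be verified numerically. The sketch below uses synthetic scores; the data-generating choices are arbitrary assumptions. It also assumes SciPy ≥ 1.7, where `mannwhitneyu(x, y).statistic` is the U statistic of the first sample.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
# synthetic scores: positives shifted upward, rounded to 0.1 so that ties occur
s = np.round(0.8 * y + rng.normal(size=200), 1)
pos, neg = s[y == 1], s[y == 0]

auc = roc_auc_score(y, s)

# Mann-Whitney: U of the positive sample divided by the number of (pos, neg) pairs
auc_mw = mannwhitneyu(pos, neg).statistic / (len(pos) * len(neg))

# Brute force: P(pos > neg) + 0.5 * P(pos == neg) over all pairs
auc_pairs = np.mean((pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :]))

# Strictly increasing transforms keep the ranking (and ties), hence the AUC
auc_exp = roc_auc_score(y, np.exp(3 * s))
auc_sigmoid = roc_auc_score(y, 1 / (1 + np.exp(-s)))

print(auc, auc_mw, auc_pairs, auc_exp, auc_sigmoid)  # all five agree up to float rounding
```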

### ROC-AUC vs PR-AUC
- **ROC-AUC**: balanced or moderately imbalanced data.
- **PR-AUC (Average Precision)**: highly imbalanced data (e.g. fraud at 0.1%, rare diseases). In this setting ROC-AUC can be misleadingly high. The large number of true negatives keeps FPR small even when false positives far outnumber true positives.
- Rule of thumb: positive rate < 5% → prefer PR-AUC.
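
Replicating every negative in a synthetic dataset shows both effects at once. ROC-AUC does not move, while average precision drops sharply. This is a sketch; the Gaussian score distributions and the 99× replication factor are arbitrary assumptions.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(1)
# synthetic data: 500 positives, 500 negatives, positives scored higher on average
y = np.r_[np.ones(500, dtype=int), np.zeros(500, dtype=int)]
s = np.r_[rng.normal(1.5, 1.0, 500), rng.normal(0.0, 1.0, 500)]

# heavy imbalance: keep the originals and add 99 copies of every negative (~1% positives)
neg = y == 0
y_imb = np.r_[y, np.zeros(99 * neg.sum(), dtype=int)]
s_imb = np.r_[s, np.tile(s[neg], 99)]

results = {}
for name, yy, ss in [("balanced", y, s), ("imbalanced", y_imb, s_imb)]:
    results[name] = (roc_auc_score(yy, ss), average_precision_score(yy, ss))
    print(f"{name:>10}: positive rate={yy.mean():.3f}  "
          f"ROC-AUC={results[name][0]:.3f}  AP={results[name][1]:.3f}")
```

Every (positive, negative) pair is simply counted 100 times after replication, so the pairwise probability behind AUC is unchanged. Precision, by contrast, is diluted by the extra negatives.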

### Multi-class
- **OvR (one-vs-rest)**: score each class as a binary problem against all the others, then take the macro average or the average weighted by class prevalence.
- **OvO (one-vs-one)**: average AUC over all class pairs (Hand & Till 2001).
- `sklearn.metrics.roc_auc_score(..., multi_class="ovr", average="macro")`.
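
OvR averaging can be reproduced by hand to show what `average="macro"` and `average="weighted"` mean. Binarize the labels, compute one binary AUC per class, then average either uniformly or by class prevalence. The sketch uses synthetic softmax probabilities; this is an assumption driven by the fact that scikit-learn requires multi-class scores whose rows sum to 1.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(2)
n, C = 300, 3
y = rng.integers(0, C, size=n)
# synthetic class probabilities: noisy logits favouring the true class, then softmax
logits = rng.normal(size=(n, C)) + 1.5 * np.eye(C)[y]
proba = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

Y = label_binarize(y, classes=np.arange(C))  # (n, C) one-vs-rest indicator matrix
per_class = np.array([roc_auc_score(Y[:, k], proba[:, k]) for k in range(C)])
prevalence = Y.mean(axis=0)                  # fraction of samples in each class

manual_macro = per_class.mean()
manual_weighted = np.sum(per_class * prevalence)

sk_macro = roc_auc_score(y, proba, multi_class="ovr", average="macro")
sk_weighted = roc_auc_score(y, proba, multi_class="ovr", average="weighted")
print(per_class, manual_macro, sk_macro, manual_weighted, sk_weighted)
```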

### Applications
1. Medical diagnosis (sensitivity / specificity trade-off).
2. Credit scoring (Gini = 2·AUC − 1).
3. Ad CTR / fraud detection (with PR-AUC as a complement).
4. Evaluating LLM hallucination detectors.
5. Offline evaluation of ranking systems.
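
The credit-scoring Gini is a linear rescaling of AUC. Because AUC counts ties as ½, 2·(P(pos > neg) + ½·P(pos = neg)) − 1 = P(pos > neg) − P(pos < neg), which is Somers' D. The sketch below checks this identity on synthetic scores that contain ties; the data are an arbitrary assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=400)
score = np.round(y + rng.normal(size=400), 1)  # synthetic scores, rounded to create ties

auc = roc_auc_score(y, score)
gini = 2 * auc - 1

pos, neg = score[y == 1], score[y == 0]
concordant = np.mean(pos[:, None] > neg[None, :])  # P(pos ranked above neg)
discordant = np.mean(pos[:, None] < neg[None, :])  # P(pos ranked below neg)
somers_d = concordant - discordant                 # tied pairs cancel out
print(f"AUC={auc:.4f}  Gini={gini:.4f}  Somers' D={somers_d:.4f}")
```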

## 💻 Patterns

### Basic ROC + AUC (sklearn)
```python
from sklearn.metrics import roc_curve, roc_auc_score, auc
import matplotlib.pyplot as plt

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]

fpr, tpr, thr = roc_curve(y_true, y_score)
roc_auc = roc_auc_score(y_true, y_score)  # or auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
plt.plot([0, 1], [0, 1], "--", color="gray")  # chance diagonal
plt.xlabel("FPR"); plt.ylabel("TPR"); plt.legend(); plt.show()
```

### Optimal threshold (Youden's J)
```python
import numpy as np
from sklearn.metrics import roc_curve

# y_true, y_score as in the basic example above
fpr, tpr, thr = roc_curve(y_true, y_score)
J = tpr - fpr  # Youden's J = sensitivity + specificity - 1
best = thr[np.argmax(J)]  # threshold that maximizes sensitivity + specificity
print(f"optimal threshold: {best:.3f}")
```

### Cost-sensitive threshold
```python
import numpy as np
from sklearn.metrics import roc_curve

# cost(FN) != cost(FP): pick the threshold that minimizes expected cost
def best_threshold(y_true, y_score, cost_fn=10, cost_fp=1, prior=None):
    if prior is None:
        # positive-class rate; pass the deployment prior instead if it differs
        prior = np.mean(y_true)
    fpr, tpr, thr = roc_curve(y_true, y_score)
    fnr = 1 - tpr
    cost = prior * cost_fn * fnr + (1 - prior) * cost_fp * fpr
    return thr[np.argmin(cost)]
```
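
The function above minimizes an empirical expected cost. Write π for `prior`, c_FN for `cost_fn` and c_FP for `cost_fp`. The first equation below is the objective. The second gives the slope of the ROC curve at the cost-optimal threshold t*, assuming a smooth, concave ROC curve.

$$
\mathbb{E}[\mathrm{cost}](t) = \pi \, c_{FN} \, \bigl(1 - \mathrm{TPR}(t)\bigr) + (1 - \pi) \, c_{FP} \, \mathrm{FPR}(t)
$$

$$
\left. \frac{d\,\mathrm{TPR}}{d\,\mathrm{FPR}} \right|_{t^*} = \frac{(1 - \pi) \, c_{FP}}{\pi \, c_{FN}}
$$

Raising `cost_fn` (or the positive prior) lowers the required slope. That moves the optimum further along the curve toward higher TPR, which means a lower threshold. With the defaults `cost_fn=10`, `cost_fp=1` and π = 0.5, the target slope is 0.1.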

### Bootstrap CI for AUC
```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, n=1000, seed=0):
    """Percentile bootstrap: returns [2.5%, median, 97.5%] of resampled AUCs."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true); y_score = np.asarray(y_score)
    aucs = []
    for _ in range(n):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample rows with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when a resample contains only one class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.percentile(aucs, [2.5, 50, 97.5])
```
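
To compare two models, the same resampling can be paired. Each iteration draws one set of row indices and scores both models on it, which preserves the correlation between their errors. This is a minimal sketch; `paired_bootstrap_auc_diff` is a hypothetical name and the two synthetic models are assumptions. A 95% interval for AUC(A) − AUC(B) that excludes 0 suggests a real difference.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def paired_bootstrap_auc_diff(y_true, score_a, score_b, n=1000, seed=0):
    """Percentile CI [2.5%, median, 97.5%] for AUC(A) - AUC(B) on shared resamples."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    score_a, score_b = np.asarray(score_a), np.asarray(score_b)
    diffs = []
    for _ in range(n):
        idx = rng.integers(0, len(y_true), len(y_true))  # same rows for both models
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC undefined without both classes
        diffs.append(roc_auc_score(y_true[idx], score_a[idx])
                     - roc_auc_score(y_true[idx], score_b[idx]))
    return np.percentile(diffs, [2.5, 50, 97.5])

rng = np.random.default_rng(4)
y = rng.integers(0, 2, size=300)
a = y + rng.normal(0, 0.7, 300)  # synthetic stronger model
b = y + rng.normal(0, 1.5, 300)  # synthetic weaker model
lo, med, hi = paired_bootstrap_auc_diff(y, a, b, n=500)
print(f"AUC(A) - AUC(B): {med:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```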
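
### DeLong test (compare two AUCs)
```python
# Paired comparison of two classifiers' AUCs on the same data.
# scikit-learn has no built-in DeLong test; this is a manual O(m*n) version.
import numpy as np

def delong_var(y, p):
    """Return (AUC, DeLong variance of the AUC) for one classifier's scores p."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    m, n = len(pos), len(neg)
    # psi[i, j] = 1 if pos_i > neg_j, 0.5 on ties, else 0
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    v10, v01 = psi.mean(axis=1), psi.mean(axis=0)  # structural components
    auc = psi.mean()
    var = v10.var(ddof=1) / m + v01.var(ddof=1) / n
    # For large samples, see Sun & Xu 2014 for a fast O(N log N) algorithm.
    return auc, var
```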
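
The variance alone does not compare two models. The paired test also needs the covariance between the two AUC estimates, which the DeLong method obtains from the same structural components. Below is a minimal sketch of the two-sided paired test. It assumes SciPy for the normal tail probability, and `delong_components` and `delong_test` are hypothetical names. Because it builds the full O(m·n) pairwise matrix, it only suits moderate sample sizes.

```python
import numpy as np
from scipy.stats import norm

def delong_components(y, p):
    """AUC plus per-positive (v10) and per-negative (v01) structural components."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y, p_a, p_b):
    """Two-sided paired DeLong test of H0: AUC(A) == AUC(B) on the same samples."""
    auc_a, v10_a, v01_a = delong_components(y, p_a)
    auc_b, v10_b, v01_b = delong_components(y, p_b)
    m, n = len(v10_a), len(v01_a)
    # 2x2 covariance matrices (ddof=1) across positives and across negatives
    s = np.cov(np.vstack([v10_a, v10_b])) / m + np.cov(np.vstack([v01_a, v01_b])) / n
    var_diff = s[0, 0] + s[1, 1] - 2 * s[0, 1]  # Var(AUC_A - AUC_B)
    z = (auc_a - auc_b) / np.sqrt(var_diff)
    p_value = 2 * norm.sf(abs(z))
    return auc_a, auc_b, z, p_value

rng = np.random.default_rng(7)
y = rng.integers(0, 2, size=500)
p_a = y + rng.normal(0, 0.5, 500)  # synthetic informative model
p_b = rng.normal(0, 1.0, 500)      # synthetic uninformative model
auc_a, auc_b, z, p_value = delong_test(y, p_a, p_b)
print(f"AUC A={auc_a:.3f}  AUC B={auc_b:.3f}  z={z:.2f}  p={p_value:.2e}")
```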

### Multi-class OvR macro AUC
```python
from sklearn.metrics import roc_auc_score

# y_true: shape (N,) class labels; y_score: shape (N, C) class probabilities (rows sum to 1)
auc_macro = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
auc_weighted = roc_auc_score(y_true, y_score, multi_class="ovr", average="weighted")  # weighted by class prevalence
```

### PR-AUC for imbalanced data
```python
from sklearn.metrics import average_precision_score, precision_recall_curve
import matplotlib.pyplot as plt

ap = average_precision_score(y_true, y_score)
prec, rec, thr = precision_recall_curve(y_true, y_score)
plt.plot(rec, prec, label=f"AP = {ap:.3f}")
plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend(); plt.show()
```
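
Average precision is a step-wise sum, AP = Σₙ (Rₙ − Rₙ₋₁)·Pₙ, over the points returned by `precision_recall_curve`. The scikit-learn documentation notes that this is not the same as the trapezoidal area under the PR curve, whose linear interpolation can be too optimistic. The sketch below compares the two on synthetic imbalanced data; the ~3% positive rate is an arbitrary assumption.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

rng = np.random.default_rng(5)
y = (rng.random(2000) < 0.03).astype(int)  # synthetic labels, ~3% positives
s = 1.0 * y + rng.normal(size=2000)        # synthetic scores

prec, rec, _ = precision_recall_curve(y, s)
# recall is returned in decreasing order, hence the minus sign
ap_manual = -np.sum(np.diff(rec) * prec[:-1])
ap_sklearn = average_precision_score(y, s)
pr_auc_trapz = auc(rec, prec)  # linear interpolation between PR points
print(f"AP manual={ap_manual:.4f}  sklearn={ap_sklearn:.4f}  trapezoid={pr_auc_trapz:.4f}")
```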

### Calibration-aware (ROC alone misleading)
```python
from sklearn.calibration import CalibrationDisplay

# y_score must be predicted probabilities in [0, 1], not arbitrary scores
CalibrationDisplay.from_predictions(y_true, y_score, n_bins=10)
# A high AUC does not imply calibrated probabilities; calibrate separately
# (isotonic or Platt/sigmoid scaling, e.g. sklearn's CalibratedClassifierCV).
```

### sklearn RocCurveDisplay (modern API)
```python
from sklearn.metrics import RocCurveDisplay
import matplotlib.pyplot as plt

# model_a, model_b: already-fitted classifiers; X_test, y_test: held-out data
fig, ax = plt.subplots()
RocCurveDisplay.from_estimator(model_a, X_test, y_test, ax=ax, name="Model A")
RocCurveDisplay.from_estimator(model_b, X_test, y_test, ax=ax, name="Model B")
ax.plot([0, 1], [0, 1], "--", color="gray")  # chance diagonal
plt.show()
```

## Decision criteria
| Situation | Approach |
|---|---|
| Balanced binary (~50/50) | ROC-AUC |
| Imbalanced (<5% positive) | PR-AUC primary, ROC-AUC secondary |
| Asymmetric error costs | Cost-weighted threshold, not raw AUC |
| Need calibrated probabilities | Brier score / log-loss + calibration plot |
| Multi-class | OvR macro AUC (balanced classes) / weighted (imbalanced) |
| Compare 2 models | DeLong test, paired bootstrap |
| Production threshold | Optimize on validation, monitor drift in production |

**Default**: evaluate a binary classifier with the trio ROC-AUC + PR-AUC + calibration plot, because a single AUC over-summarizes performance. For imbalanced data, make PR-AUC the primary metric.
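
The default trio can be packaged as a single report function. This sketch assumes `y_prob` holds predicted probabilities; `binary_eval_report` is a hypothetical name. It also adds Brier score and log-loss from the table as the calibration-sensitive numbers. The usage applies a strictly increasing distortion to probabilities that are calibrated by construction. The ranking metrics stay identical, while Brier score and log-loss get worse.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import average_precision_score, brier_score_loss, log_loss, roc_auc_score

def binary_eval_report(y_true, y_prob, n_bins=10):
    """ROC-AUC + PR-AUC (AP) + calibration summary for one binary classifier."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=n_bins)
    return {
        "roc_auc": roc_auc_score(y_true, y_prob),
        "pr_auc_ap": average_precision_score(y_true, y_prob),
        "brier": brier_score_loss(y_true, y_prob),
        "log_loss": log_loss(y_true, y_prob),
        # calibration-curve points: (mean predicted prob, observed positive fraction)
        "calibration": list(zip(mean_pred, frac_pos)),
    }

rng = np.random.default_rng(6)
p_true = rng.uniform(0.01, 0.99, 5000)
y = (rng.random(5000) < p_true).astype(int)  # labels drawn from p_true: calibrated by construction
# synthetic overconfident model: sigmoid(3 * logit(p)), strictly increasing in p
overconfident = p_true ** 3 / (p_true ** 3 + (1 - p_true) ** 3)

calibrated = binary_eval_report(y, p_true)
distorted = binary_eval_report(y, overconfident)
for k in ("roc_auc", "pr_auc_ap", "brier", "log_loss"):
    print(f"{k:>10}: calibrated={calibrated[k]:.4f}  distorted={distorted[k]:.4f}")
```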

## 🔗 Graph

## 🤖 LLM usage
**When**: explaining ROC / AUC intuition, generating sklearn evaluation boilerplate, interpreting the clinical or business meaning of an AUC value.
**When not**: as the metric itself. Computing AUC is deterministic and needs no LLM, and an LLM asked to "estimate" AUC tends to hallucinate numbers.

## ❌ Anti-patterns
- **AUC on imbalanced fraud / disease data**: a 0.99 AUC can still be useless if precision is 1%. Report PR-AUC.
- **Default threshold of 0.5**: tune the threshold on validation data according to error costs.
- **Citing AUC as evidence of calibrated probabilities**: AUC is invariant to monotonic transforms, so it says nothing about calibration.
- **Single AUC, no CI**: a bootstrap 95% CI is essential for small test sets.
- **Cherry-picking the threshold on test data**: this leaks test information. Pick on validation, evaluate on test.
- **Ignoring class-prior shift**: AUC stays stable, but the operating point (e.g. precision at a fixed threshold) shifts in production.
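
The prior-shift anti-pattern follows from Bayes' rule. At a fixed threshold, TPR and FPR (and therefore AUC) do not depend on the positive rate π. Precision does: precision = TPR·π / (TPR·π + FPR·(1 − π)). The sketch uses a hypothetical operating point (TPR = 0.80, FPR = 0.05) assumed to have been chosen when positives were 10%; `precision_at_prior` is a hypothetical helper name.

```python
def precision_at_prior(tpr, fpr, prior):
    """Precision of a fixed operating point (TPR, FPR) when the positive rate is `prior`."""
    tp = tpr * prior        # expected true-positive mass
    fp = fpr * (1 - prior)  # expected false-positive mass
    return tp / (tp + fp)

# hypothetical operating point chosen on a validation set with 10% positives
tpr, fpr = 0.80, 0.05
for prior in (0.10, 0.02, 0.005):
    print(f"prior={prior:.3f}  precision={precision_at_prior(tpr, fpr, prior):.3f}")
# prior=0.100  precision=0.640
# prior=0.020  precision=0.246
# prior=0.005  precision=0.074
```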

## 🧪 Verification / duplicates
- Verified (Fawcett 2006 "An introduction to ROC analysis", Hand & Till 2001 multi-class, Saito & Rehmsmeier 2015 PR vs ROC, sklearn 1.5+ docs 2026).
- Confidence: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: ROC/AUC patterns + PR-AUC + calibration + multi-class |