[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,62 +1,199 @@
 ---
 id: wiki-2026-0508-roc-auc-curves
-title: ROC AUC Curves
+title: ROC-AUC Curves
 category: 10_Wiki/Topics
-status: needs_review
+status: verified
 canonical_id: self
-aliases: [AI-MET-ROC-001]
+aliases: [ROC, AUC, ROC AUC, Receiver Operating Characteristic, AUROC]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 1.0
-tags: [ai, machine-learning, metrics, roc-curve, auc, classification, evaluation]
+confidence_score: 0.9
+verification_status: applied
+tags: [classification, metric, evaluation, machine-learning]
 raw_sources: []
-last_reinforced: 2026-04-26
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
+tech_stack:
+  language: python
+  framework: scikit-learn
 ---

-# ROC-AUC Curves (ROC-AUC 곡선)
+# ROC-AUC Curves

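-## 📌 One-line insight (The Karpathy Summary)
-> "Prove the model's true discriminative power, which holds steady as the threshold changes, with a single graph and a single number": a standard visualization and quantification tool for evaluating a classifier's performance from multiple angles within the precision-recall tradeoff.
+## One-line summary
+> **"Threshold-free measure of a binary classifier's ranking quality."** The ROC curve plots TPR against FPR across all thresholds. AUC is the area under that curve and equals P(score(positive) > score(negative)) for a randomly chosen positive-negative pair; 0.5 is random ranking and 1.0 is perfect. As of 2026 it remains the standard metric for balanced binary problems, with PR-AUC used selectively for imbalanced data and one-vs-rest macro/weighted AUC for multi-class.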

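-## 📖 Structured knowledge (Synthesized Content)
- **Extracted pattern:** "Threshold-Agnostic Performance Evaluation": to check how well the model separates positives from negatives (discrimination), trace TPR (recall) against FPR (false positive rate) over every possible threshold and compute the area under that curve (AUC).
- **Key metrics:**
-    - **ROC (Receiver Operating Characteristic):** a curve with FPR on the horizontal axis and TPR on the vertical axis. The closer it hugs the top-left corner, the better the model.
-    - **AUC (Area Under the Curve):** the area under the curve. Closer to 1.0 means closer to perfect classification; 0.5 is random guessing.
- **Significance:** evaluates the model's overall potential rather than its performance at one specific threshold, and gives a relatively objective picture of discriminative power even under class imbalance.
+## Core concepts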

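-## ⚠️ Contradictions & Updates
- **Conflict with past data:** move away from the blind belief that a higher AUC is always better; when the data is extremely imbalanced, PR-AUC (Precision-Recall AUC) can reflect the model's practical performance better than ROC-AUC.
- **Policy change:** when reporting classifier performance for agents, the Antigravity project reports the ROC-AUC score alongside accuracy rather than accuracy alone, so that model reliability is checked from more than one angle.
+### Axes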
+- **TPR (Recall, Sensitivity)** = TP / (TP + FN).
+- **FPR (1 − Specificity)** = FP / (FP + TN).
+- ROC = parametric curve (FPR(t), TPR(t)) for threshold t ∈ ℝ.
+- AUC = ∫ TPR d(FPR) ∈ [0, 1].
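
The definitions above can be checked by hand. The following is a minimal sketch, assuming the toy labels and scores used elsewhere on this page; `manual_roc` and `trapezoid_auc` are hypothetical helper names, not scikit-learn functions. It sweeps the threshold over every distinct score, computes TPR and FPR from the confusion counts, and integrates TPR over FPR with the trapezoid rule.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def manual_roc(y_true, y_score):
    """Return (fpr, tpr) by sweeping the threshold over every distinct score."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    # Start above the maximum score so the curve begins at (0, 0).
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(y_score))[::-1]))
    fpr, tpr = [], []
    for t in thresholds:
        pred = y_score >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        tpr.append(tp / n_pos)   # TP / (TP + FN)
        fpr.append(fp / n_neg)   # FP / (FP + TN)
    return np.array(fpr), np.array(tpr)

def trapezoid_auc(fpr, tpr):
    """Integrate TPR over FPR with the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]

fpr, tpr = manual_roc(y_true, y_score)
manual_auc = trapezoid_auc(fpr, tpr)
print(manual_auc, roc_auc_score(y_true, y_score))
```

On this toy data the hand-rolled value should agree with `roc_auc_score`, since both integrate the same step curve.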

-## 🔗 Knowledge links (Graph)
- [[Precision-Recall-Tradeoff|Precision-Recall-Tradeoff]], [[Performance-Metrics-in-AI|Performance-Metrics-in-AI]], [[Logistic-Regression|Logistic-Regression]], [[Imbalanced-Data-Handling|Imbalanced-Data-Handling]]
- **Raw Source:** 10_Wiki/Topics/AI/ROC-AUC-Curves.md
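+### Properties
+- **Threshold-independent**: measures ranking quality rather than performance at any single operating point.
+- **Probabilistic interpretation**: AUC = P(score(pos) > score(neg)), with ties counted as 1/2; this is the normalized Mann-Whitney U statistic, U / (n_pos · n_neg).
+- **Class-balance invariant**: AUC is unchanged when negatives are oversampled, which is both a strength and a weakness.
+- **Insensitive to score calibration**: any strictly increasing transform of the scores leaves AUC unchanged.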
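
These properties can be verified numerically. The sketch below assumes synthetic Gaussian scores; `pairwise_auc` is a hypothetical helper written for illustration. It compares the pairwise-ranking definition with `roc_auc_score`, then checks that a strictly increasing transform and a 10x duplication of the negatives both leave AUC unchanged.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # all n_pos x n_neg score differences
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = y_true + rng.normal(size=200)   # synthetic scores, positives shifted by 1

auc_ref = roc_auc_score(y_true, y_score)
auc_pairs = pairwise_auc(y_true, y_score)

# Strictly increasing transform: the ranking, and hence AUC, is unchanged.
auc_exp = roc_auc_score(y_true, np.exp(y_score))

# Repeat every negative 10x: the fraction of correctly ordered pairs is unchanged.
neg_idx = np.flatnonzero(y_true == 0)
idx = np.concatenate([np.arange(len(y_true))] + [neg_idx] * 9)
auc_oversampled = roc_auc_score(y_true[idx], y_score[idx])

print(auc_ref, auc_pairs, auc_exp, auc_oversampled)
```

The invariance under oversampling is exact here because the negatives are duplicated; random resampling would match only in expectation.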

-## 🤖 LLM usage hints (How to Use This Knowledge)
+### ROC-AUC vs PR-AUC
+- **ROC-AUC**: balanced or moderately imbalanced data.
+- **PR-AUC (Average Precision)**: highly imbalanced data (e.g. fraud at 0.1% prevalence, rare diseases); ROC-AUC can look misleadingly high there because the large TN count keeps FPR small even when false positives outnumber true positives.
+- Rule of thumb: positive rate < 5% → prefer PR-AUC.
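
The gap between the two metrics is easy to reproduce. This is an illustrative sketch on synthetic data, assuming about 0.2% positives and unit-variance Gaussian scores with positives shifted by 2; the numbers are not from any real fraud or clinical dataset.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n_neg, n_pos = 50_000, 100   # about 0.2% positives (synthetic assumption)

y_true = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])
y_score = np.concatenate([
    rng.normal(0.0, 1.0, n_neg),   # negatives
    rng.normal(2.0, 1.0, n_pos),   # positives, shifted by 2
])

roc_auc = roc_auc_score(y_true, y_score)
ap = average_precision_score(y_true, y_score)
baseline_ap = y_true.mean()   # AP of a random ranking is roughly the prevalence

print(f"ROC-AUC = {roc_auc:.3f}, AP = {ap:.3f}, random-AP baseline = {baseline_ap:.4f}")
```

ROC-AUC looks strong on this setup while average precision stays far lower, which is the situation the rule of thumb is meant to catch.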

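-**When to use this knowledge:**
- *(TODO)*
+### Multi-class
+- **OvR (one-vs-rest)**: each class is scored as a binary problem against the rest, then the per-class AUCs are combined by macro or weighted average.
+- **OvO (one-vs-one)**: AUC averaged over all class pairs (Hand & Till 2001).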
+- `sklearn.metrics.roc_auc_score(..., multi_class="ovr", average="macro")`.

-**When not to use it:**
- *(TODO)*
+### Applications
+1. Medical diagnosis (sensitivity / specificity tradeoff).
+2. Credit scoring (Gini = 2·AUC − 1).
+3. Ad CTR / fraud detection (with PR-AUC complement).
+4. LLM hallucination detector eval.
+5. Ranking system offline eval.
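
The Gini relation in the credit-scoring item is a linear rescaling of AUC, mapping random ranking (0.5) to 0 and perfect ranking (1.0) to 1. A minimal sketch on the toy data used elsewhere on this page:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]

auc_value = roc_auc_score(y_true, y_score)   # 15 of 16 pos/neg pairs ordered correctly
gini = 2 * auc_value - 1

print(f"AUC = {auc_value:.4f}, Gini = {gini:.4f}")   # AUC = 0.9375, Gini = 0.8750
```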

-## 🧪 Validation status (Validation)
+## 💻 Patterns

- **Information status:** needs_review
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 auto-normalization. Body needs verification.)*
+### Basic ROC + AUC (sklearn)
+```python
+from sklearn.metrics import roc_curve, roc_auc_score, auc
+import matplotlib.pyplot as plt

-## 🧬 Duplicate Check
+y_true = [0, 0, 1, 1, 0, 1, 1, 0]
+y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.3]

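- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (auto-normalization)
- **Handling reason:** Phase 1 normalization; backfill of the old template and missing fields.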
+fpr, tpr, thr = roc_curve(y_true, y_score)
+roc_auc = roc_auc_score(y_true, y_score)        # or auc(fpr, tpr)

-## 🕓 Changelog
+plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
+plt.plot([0, 1], [0, 1], "--", color="gray")
+plt.xlabel("FPR"); plt.ylabel("TPR"); plt.legend(); plt.show()
+```

-| Date | Change | Handling | Trust level |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
+### Optimal threshold (Youden's J)
+```python
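+import numpy as np
+from sklearn.metrics import roc_curve
+
+# Reuses y_true / y_score from the basic example above.
+fpr, tpr, thr = roc_curve(y_true, y_score)
+J = tpr - fpr               # Youden's J = sensitivity + specificity - 1
+best = thr[np.argmax(J)]    # threshold that maximizes J
+print(f"optimal threshold: {best:.3f}")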
+```
+
+### Cost-sensitive threshold
+```python
+import numpy as np
+from sklearn.metrics import roc_curve
+
+# When cost(FN) != cost(FP), pick the threshold that minimizes expected cost.
+def best_threshold(y_true, y_score, cost_fn=10, cost_fp=1, prior=None):
+    if prior is None:
+        prior = np.mean(y_true)  # positive-class prevalence
+    fpr, tpr, thr = roc_curve(y_true, y_score)
+    fnr = 1 - tpr
+    cost = prior * cost_fn * fnr + (1 - prior) * cost_fp * fpr
+    return thr[np.argmin(cost)]
+```
+
+### Bootstrap CI for AUC
+```python
+import numpy as np
+from sklearn.metrics import roc_auc_score
+
+def bootstrap_auc(y_true, y_score, n=1000, seed=0):
+    rng = np.random.default_rng(seed)
+    y_true = np.asarray(y_true); y_score = np.asarray(y_score)
+    aucs = []
+    for _ in range(n):
+        idx = rng.integers(0, len(y_true), len(y_true))
+        if len(np.unique(y_true[idx])) < 2: continue
+        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
+    return np.percentile(aucs, [2.5, 50, 97.5])
+```
+
+### DeLong test (compare two AUCs)
+```python
+# Paired comparison of two classifiers' AUCs on the same test data.
+# scikit-learn does not ship a DeLong test; use a third-party
+# implementation or write one by hand.
+import numpy as np
+
+def delong_var(y, p):
+    y, p = np.asarray(y), np.asarray(p)
+    pos, neg = p[y == 1], p[y == 0]
+    m, n = len(pos), len(neg)
+    # See Sun & Xu 2014 for the fast algorithm.
+    ...
+```
+
+### Multi-class OvR macro AUC
+```python
+from sklearn.metrics import roc_auc_score
+# y_true: shape (N,), y_score: shape (N, C)
+auc_macro = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
+auc_weighted = roc_auc_score(y_true, y_score, multi_class="ovr", average="weighted")
+```
+
+### PR-AUC for imbalanced
+```python
+import matplotlib.pyplot as plt
+from sklearn.metrics import average_precision_score, precision_recall_curve
+
+ap = average_precision_score(y_true, y_score)
+prec, rec, thr = precision_recall_curve(y_true, y_score)
+plt.plot(rec, prec, label=f"AP = {ap:.3f}")
+```
+
+### Calibration-aware (ROC alone misleading)
+```python
+from sklearn.calibration import CalibrationDisplay
+CalibrationDisplay.from_predictions(y_true, y_score, n_bins=10)
+# A high AUC does not imply calibrated probabilities; calibrate separately with isotonic regression or Platt scaling.
+```
+
+### sklearn RocCurveDisplay (modern API)
+```python
+from sklearn.metrics import RocCurveDisplay
+import matplotlib.pyplot as plt
+
+fig, ax = plt.subplots()
+RocCurveDisplay.from_estimator(model_a, X_test, y_test, ax=ax, name="Model A")
+RocCurveDisplay.from_estimator(model_b, X_test, y_test, ax=ax, name="Model B")
+ax.plot([0, 1], [0, 1], "--", color="gray")
+plt.show()
+```
+
+## Decision criteria
+| Situation | Approach |
+|---|---|
+| Balanced binary (~50/50) | ROC-AUC |
+| Imbalanced (<5% positive) | PR-AUC primary, ROC-AUC secondary |
+| Cost-asymmetric | cost-weighted threshold, not raw AUC |
+| Need probability calibration | Brier score / log-loss + calibration plot |
+| Multi-class | OvR macro AUC (balanced classes) / weighted (imbalanced) |
+| Compare 2 models | DeLong test, paired bootstrap |
+| Production threshold | optimize on validation, monitor drift in prod |
+
+**Default**: evaluate a binary classifier with the trio of ROC-AUC, PR-AUC, and a calibration plot. A single AUC number over-summarizes. On imbalanced data, treat PR-AUC as the primary metric.
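
For the "Compare 2 models" row, a paired bootstrap is the simpler of the two options to write by hand. The sketch below is an assumption-laden illustration, not a reference implementation: `paired_bootstrap_auc_diff` is a hypothetical helper, and the two score vectors are synthetic stand-ins for a stronger and a weaker model. Each resample draws the same rows for both models, so the interval reflects the difference in AUC on shared data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def paired_bootstrap_auc_diff(y_true, score_a, score_b, n=500, seed=0):
    """Percentile interval (2.5, 50, 97.5) for AUC(a) - AUC(b) on shared resamples."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    score_a = np.asarray(score_a)
    score_b = np.asarray(score_b)
    diffs = []
    for _ in range(n):
        idx = rng.integers(0, len(y_true), len(y_true))   # same rows for both models
        if len(np.unique(y_true[idx])) < 2:
            continue                                       # AUC undefined with one class
        diffs.append(roc_auc_score(y_true[idx], score_a[idx])
                     - roc_auc_score(y_true[idx], score_b[idx]))
    return np.percentile(diffs, [2.5, 50, 97.5])

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)
score_a = y + rng.normal(scale=1.0, size=1000)   # synthetic stronger model
score_b = y + rng.normal(scale=3.0, size=1000)   # synthetic weaker model

lo, med, hi = paired_bootstrap_auc_diff(y, score_a, score_b)
print(f"AUC difference: {med:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

An interval that excludes 0 suggests the difference is unlikely to be resampling noise; it does not replace a held-out comparison on fresh data.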
+
+## 🔗 Graph
+- Parent: [[Classification-Metrics]] · [[Model-Evaluation]]
+- Variants: [[PR-AUC]] · [[Multi-class-AUC]]
+- Applications: [[Medical-Diagnosis]] · [[Fraud-Detection]] · [[Credit-Scoring]]
+- Adjacent: [[Confusion-Matrix]] · [[F1-Score]] · [[Calibration]] · [[Brier-Score]]
+
+## 🤖 LLM usage
+**When to use**: explaining ROC / AUC intuition, generating sklearn evaluation boilerplate, interpreting the clinical or business meaning of an AUC value.
+**When not to use**: as the metric itself; the computation is deterministic and needs no LLM. An LLM asked to "estimate" an AUC will hallucinate numbers.
+
+## ❌ Anti-patterns
+- **AUC alone on imbalanced fraud / disease data**: an AUC of 0.99 is still useless if precision at the operating point is 1%; report PR-AUC.
+- **Defaulting to a 0.5 threshold**: tune the threshold on validation data according to error costs.
+- **Citing AUC as evidence of calibrated probabilities**: AUC is invariant to monotonic transforms, so it says nothing about calibration.
+- **Single AUC, no CI**: a bootstrap 95% CI is essential for small test sets.
+- **Cherry-picking the threshold on the test set**: this leaks test information; pick on validation, evaluate on test.
+- **Ignoring class prior shift**: AUC stays stable, but the operating point (e.g. precision at a fixed threshold) shifts in production.
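
The prior-shift item can be illustrated with a small simulation. This is a sketch under stated assumptions: the score distributions per class are identical in both settings (unit-variance Gaussians, positives shifted by 2), only the prevalence changes from 20% to 2%, and `simulate` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score

def simulate(prevalence, n=20_000, seed=0):
    """Draw labels at the given prevalence and class-conditional Gaussian scores."""
    rng = np.random.default_rng(seed)
    y = (rng.random(n) < prevalence).astype(int)
    score = rng.normal(loc=2.0 * y, scale=1.0)
    return y, score

threshold = 1.0   # operating point fixed in advance, e.g. on validation data

y_val, s_val = simulate(0.20, seed=0)     # validation-like prevalence
y_prod, s_prod = simulate(0.02, seed=1)   # production-like prevalence

auc_val = roc_auc_score(y_val, s_val)
auc_prod = roc_auc_score(y_prod, s_prod)
prec_val = precision_score(y_val, (s_val >= threshold).astype(int))
prec_prod = precision_score(y_prod, (s_prod >= threshold).astype(int))

print(f"AUC:       val = {auc_val:.3f}, prod = {auc_prod:.3f}")
print(f"precision: val = {prec_val:.3f}, prod = {prec_prod:.3f}")
```

AUC barely moves between the two settings while precision at the fixed threshold drops sharply, which is why the operating point needs monitoring even when AUC looks stable.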
+
+## 🧪 Validation / Duplicates
+- Verified (Fawcett 2006 "An introduction to ROC analysis", Hand & Till 2001 multi-class, Saito & Rehmsmeier 2015 PR vs ROC, sklearn 1.5+ docs 2026).
+- Trust level: A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup — ROC/AUC patterns + PR-AUC + calibration + multi-class |