---
id: wiki-2026-0508-precision-recall-tradeoff
title: Precision-Recall Tradeoff
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [PR Tradeoff, Threshold Tuning, F1 Optimization]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [classification, evaluation, metrics, threshold, imbalanced]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scikit-learn
---
# Precision-Recall Tradeoff
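## One-line summary
> **"Raising a classifier's threshold generally raises precision and lowers recall; the two metrics cannot both be maximized at once."** F1, F-beta, and PR-AUC are scores that combine the two axes. On imbalanced data (medical, fraud, anomaly), PR-AUC is more honest than ROC-AUC.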
## Core
### Definitions
- **Precision** = TP / (TP + FP): "the fraction of alarms that are real".
- **Recall** = TP / (TP + FN): "the fraction of real positives that were caught" (= sensitivity, TPR).
- **F1** = 2·P·R / (P+R): the harmonic mean of P and R.
- **F-beta** = (1+β²)·P·R / (β²P + R): β>1 weights recall, β<1 weights precision.
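The definitions above can be checked with a small worked example. This is a minimal sketch: the function name `precision_recall_fbeta`, the counts, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration, not part of any library.

```python
def precision_recall_fbeta(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F-beta) from confusion-matrix counts."""
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta**2 * precision + recall
    fbeta = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical counts: 80 true positives, 20 false positives, 40 false negatives.
p, r, f1 = precision_recall_fbeta(80, 20, 40)
_, _, f2 = precision_recall_fbeta(80, 20, 40, beta=2.0)
_, _, f05 = precision_recall_fbeta(80, 20, 40, beta=0.5)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} F2={f2:.3f} F0.5={f05:.3f}")
```

Because recall is lower than precision in this example, the recall-weighted F2 falls below F1, and the precision-weighted F0.5 rises above it.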
### Tradeoff mechanism
- Apply a **threshold τ** to the classifier's output score.
- τ ↑ → positives are declared more strictly → precision ↑ (typically), recall ↓.
- τ ↓ → more samples are declared positive → recall ↑, precision ↓ (typically).
- The Pareto curve of this tradeoff is the Precision-Recall curve.
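The mechanism can be seen by sweeping τ over a toy score list. This is a dependency-free sketch: the labels, scores, and the helper `precision_recall_at` are made up for illustration, and treating precision as 1.0 when nothing is predicted positive is an assumed convention. Recall can only fall or stay flat as τ rises; precision usually rises but is not guaranteed to be monotonic on real data.

```python
def precision_recall_at(y_true, scores, tau):
    """Precision and recall when predicting positive for score >= tau."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < tau and y == 1)
    # Assumption: with no positive predictions, precision is reported as 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical validation labels and scores (6 positives, 4 negatives).
y_true = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

taus = (0.2, 0.4, 0.6, 0.8)
sweep = {tau: precision_recall_at(y_true, scores, tau) for tau in taus}
for tau, (prec, rec) in sweep.items():
    print(f"tau={tau:.1f}  precision={prec:.3f}  recall={rec:.3f}")
```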
### vs ROC-AUC
- **ROC**: TPR vs FPR. Insensitive to class balance, which can mislead on imbalanced data.
- **PR**: P vs R. Focuses on the positive class and is more informative on imbalanced data.
- On a dataset that is 99% negative, ROC-AUC can be 0.95 while PR-AUC is only 0.3.
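The gap comes from the denominators: FPR divides by the large negative class, while precision divides by the predicted positives. The arithmetic below illustrates this at a single operating point. All counts are hypothetical and chosen only to make the effect visible.

```python
# Hypothetical validation set with 1% positives.
n_pos, n_neg = 100, 9_900

# Hypothetical operating point at some threshold.
tp, fp = 80, 495
fn, tn = n_pos - tp, n_neg - fp

tpr = tp / (tp + fn)        # recall, the y-axis of the ROC curve
fpr = fp / (fp + tn)        # the x-axis of the ROC curve
precision = tp / (tp + fp)  # the y-axis of the PR curve

print(f"TPR={tpr:.3f} FPR={fpr:.3f} precision={precision:.3f}")
```

A point with TPR 0.8 and FPR 0.05 looks strong on a ROC plot, yet only about 14% of the raised alarms are real, which is what the PR view exposes.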
### Applications
1. Medical diagnosis (recall first: a miss is dangerous).
2. Spam filter (precision first: false alarms are costly).
3. Fraud detection (cost-sensitive threshold).
4. Information retrieval (P@k, R@k).
5. Object detection (mAP is built on the area under per-class PR curves).
6. RAG retrieval evaluation.
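For the retrieval cases (items 4 and 6), P@k and R@k can be sketched as follows. The helper `precision_recall_at_k` and the document IDs are hypothetical. Dividing P@k by k, rather than by the number of results actually returned, is the usual convention and is assumed here.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Return (P@k, R@k) for a ranked result list and a set of relevant IDs."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Hypothetical retriever output (best first) and ground-truth relevant set.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}

p_at_3, r_at_3 = precision_recall_at_k(ranked, relevant, k=3)
p_at_5, r_at_5 = precision_recall_at_k(ranked, relevant, k=5)
print(f"P@3={p_at_3:.3f} R@3={r_at_3:.3f} P@5={p_at_5:.3f} R@5={r_at_5:.3f}")
```

Here k plays the role of the threshold: a larger k can only keep or raise recall, while precision tends to drop as lower-ranked results are included.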
## 💻 Patterns
### sklearn PR curve + best F1 threshold
```python
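import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# clf, X_val, y_val: a fitted binary classifier and a held-out validation set
probs = clf.predict_proba(X_val)[:, 1]
p, r, thr = precision_recall_curve(y_val, probs)
# p and r have one more entry than thr (the final P=1, R=0 point has no threshold),
# so drop that point to keep the F1 array aligned with thr
f1 = 2 * p[:-1] * r[:-1] / (p[:-1] + r[:-1] + 1e-12)
best = f1.argmax()
print(f"τ={thr[best]:.3f} P={p[best]:.3f} R={r[best]:.3f} F1={f1[best]:.3f}")
print(f"PR-AUC = {average_precision_score(y_val, probs):.3f}")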
```
### F-beta threshold (recall-weighted)
```python
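from sklearn.metrics import precision_recall_curve

def best_fbeta_threshold(y, probs, beta=2.0):
    """Return (threshold, score) maximizing F-beta; beta > 1 favors recall."""
    p, r, thr = precision_recall_curve(y, probs)
    fb = (1 + beta**2) * p * r / (beta**2 * p + r + 1e-12)
    i = fb.argmax()
    # p and r have one more entry than thr; fall back to 1.0 for that final point
    return (thr[i] if i < len(thr) else 1.0), fb[i]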
```
### Cost-based threshold
```python
import numpy as np

def cost_threshold(y, probs, cost_fp=1.0, cost_fn=10.0, n_thr=200):
    """Grid-search the threshold that minimizes total misclassification cost."""
    y, probs = np.asarray(y), np.asarray(probs)
    best_t, best_c = 0.0, np.inf
    for t in np.linspace(0, 1, n_thr):
        pred = (probs >= t).astype(int)
        fp = ((pred == 1) & (y == 0)).sum()
        fn = ((pred == 0) & (y == 1)).sum()
        c = cost_fp * fp + cost_fn * fn
        if c < best_c:
            best_c, best_t = c, t
    return best_t, best_c
```
### Plot PR curve
```python
import matplotlib.pyplot as plt
plt.plot(r, p, label=f'AP={average_precision_score(y_val, probs):.3f}')
plt.xlabel('Recall'); plt.ylabel('Precision'); plt.legend()
```
### PR vs ROC on imbalanced
```python
from sklearn.metrics import roc_auc_score, average_precision_score
# y_val is imbalanced (about 1% positive)
print('ROC-AUC :', roc_auc_score(y_val, probs))  # can look inflated
print('PR-AUC  :', average_precision_score(y_val, probs))  # more honest
```
### Calibration before threshold tuning
```python
from sklearn.calibration import CalibratedClassifierCV
cal = CalibratedClassifierCV(clf, method='isotonic', cv=5)
cal.fit(X_train, y_train)
probs_cal = cal.predict_proba(X_val)[:, 1]
# Calibrated probabilities make the threshold value easier to interpret
```
### Per-class threshold (multi-label)
```python
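import numpy as np

def tune_per_label(y_true, probs):  # both arrays have shape (N, L)
    L = probs.shape[1]
    thrs = np.zeros(L)
    for k in range(L):
        # reuses best_fbeta_threshold from the F-beta pattern; beta=1.0 gives F1
        thrs[k], _ = best_fbeta_threshold(y_true[:, k], probs[:, k], beta=1.0)
    return thrs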
```
## Decision criteria
| Situation | Threshold priority |
|---|---|
| Medical screening (a miss is dangerous) | High recall (low τ), F2 |
| Spam / ad blocking (wrongly blocking is costly) | High precision (high τ), F0.5 |
| Balanced cost | Maximize F1 |
| Explicit cost ratio available | Cost-based threshold |
| Already imbalanced + model comparison | Report PR-AUC (ROC as secondary) |
| Multi-label | Tune a threshold per label |
**Default**: maximize F1 on the validation set, with calibration. Report PR-AUC if the data is imbalanced.
## 🔗 Graph
- Variant: [[ROC_AUC]]
- Applications: [[Imbalanced_Data]] · [[Anomaly_Detection]] · [[Information Retrieval (IR)|Information_Retrieval]]
- Adjacent: [[Threshold_Tuning]]
## 🤖 LLM usage
**When to use**: choosing a classifier's deployment threshold, reporting evaluation on imbalanced data, cost-sensitive decisions.
**When not to use**: for ranking tasks a single threshold is meaningless; use top-k metrics or nDCG instead.
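For the ranking case, a minimal nDCG sketch follows. It uses the linear-gain form (relevance divided by log2 of rank + 1); another common variant uses 2^rel − 1 as the gain. The helper names and relevance grades are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain; ranks start at 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_at_k(relevances, k):
    """nDCG@k for graded relevances listed in the order the system ranked them."""
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical graded relevances (3 = best) in ranked order.
perfect = ndcg_at_k([3, 2, 1], k=3)
reversed_order = ndcg_at_k([1, 2, 3], k=3)
print(f"perfect={perfect:.3f} reversed={reversed_order:.3f}")
```

No threshold appears anywhere: the score depends only on the order of the results, which is why it suits ranking tasks better than a thresholded precision or recall.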
## ❌ Anti-patterns
- **Default 0.5 threshold without tuning**: often useless on imbalanced data. Tuning on a validation set is required.
- **Tune threshold on test set**: this leaks test information. Tune on validation only.
- **ROC-AUC only on imbalanced**: results look inflated; report PR-AUC alongside.
- **Ignore calibration before threshold**: a threshold chosen on uncalibrated probabilities does not transfer.
- **F1 maximize when costs are asymmetric**: when a cost ratio exists, use F-beta or an explicit cost.
## 🧪 Verification / duplicates
- Verified (sklearn metrics docs precision_recall_curve, Saito & Rehmsmeier 2015 'PR vs ROC for imbalanced').
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — PR tradeoff math + threshold tuning + PR vs ROC on imbalanced |