---
id: wiki-2026-0508-precision-recall-tradeoff
title: Precision-Recall Tradeoff
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [PR Tradeoff, Threshold Tuning, F1 Optimization]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [classification, evaluation, metrics, threshold, imbalanced]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scikit-learn
---

# Precision-Recall Tradeoff

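## One-liner

> **"Raising the classifier threshold never increases recall and usually increases precision, so the two metrics cannot both be maximized at once."** F1, F-beta, and PR-AUC each combine the two axes into a single score. On imbalanced data (medical, fraud, anomaly detection), PR-AUC gives a more honest picture than ROC-AUC.
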
## Core ideas

### Definitions

- **Precision** = TP / (TP + FP): "of the alarms raised, the fraction that are real."
- **Recall** = TP / (TP + FN): "of the real positives, the fraction caught" (= sensitivity, TPR).
- **F1** = 2·P·R / (P + R): the harmonic mean of P and R.
- **F-beta** = (1+β²)·P·R / (β²·P + R): β > 1 weights recall more, β < 1 weights precision more.

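A minimal sketch of the four definitions above, computed from raw confusion-matrix counts. The function name `prf_from_counts` and the convention of returning 0.0 when a ratio is undefined are assumptions for illustration.

```python
def prf_from_counts(tp: int, fp: int, fn: int, beta: float = 1.0) -> dict:
    """Precision, recall, F1 and F-beta from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    def f(b: float) -> float:
        # F-beta = (1 + b^2) * P * R / (b^2 * P + R)
        denom = b**2 * precision + recall
        return (1 + b**2) * precision * recall / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f(1.0), "fbeta": f(beta)}


# 80 true positives, 20 false alarms, 40 misses.
m = prf_from_counts(tp=80, fp=20, fn=40, beta=2.0)
print(m)
```

Here recall (2/3) is below precision (0.8), so F2 (20/29, about 0.690) comes out lower than F1 (8/11, about 0.727): β = 2 penalizes the misses more heavily.
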
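### Tradeoff mechanism

- A **threshold τ** is applied to the classifier's output score: predict positive when score ≥ τ.
- τ ↑ → positives are declared more strictly → recall ↓ (it can never increase), precision usually ↑ (not guaranteed to be monotonic).
- τ ↓ → more samples are declared positive → recall ↑, precision usually ↓.
- Sweeping τ over all values traces the precision-recall curve, which plays the role of a Pareto frontier: each point is one operating point, and moving along it trades one metric for the other.
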
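To see the mechanism concretely, the sketch below sweeps τ over a small set of made-up scores and labels and reports precision and recall at each threshold. The helper name `pr_at_thresholds` and the convention "no predicted positives → P = 1" are assumptions for illustration.

```python
import numpy as np


def pr_at_thresholds(y, scores, thresholds):
    """Precision and recall when predicting positive for scores >= tau."""
    y, scores = np.asarray(y), np.asarray(scores)
    rows = []
    for tau in thresholds:
        pred = scores >= tau
        tp = int(np.sum(pred & (y == 1)))
        fp = int(np.sum(pred & (y == 0)))
        fn = int(np.sum(~pred & (y == 1)))
        precision = tp / (tp + fp) if tp + fp else 1.0  # convention: no alarms -> P = 1
        recall = tp / (tp + fn) if tp + fn else 0.0
        rows.append((float(tau), precision, recall))
    return rows


# Toy data: 4 positives, 6 negatives (hypothetical scores).
y = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10]
for tau, p, r in pr_at_thresholds(y, scores, [0.1, 0.3, 0.5, 0.65, 0.75, 0.9]):
    print(f"tau={tau:.2f}  P={p:.2f}  R={r:.2f}")
```

Recall never increases along the sweep, but precision drops from 0.75 at τ = 0.65 to about 0.67 at τ = 0.75, because the only sample removed (score 0.70) is a true positive.
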
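### vs ROC-AUC

- **ROC**: TPR vs FPR. It is insensitive to the class ratio, which can mislead: FPR is divided by the (large) number of negatives, so many false positives still produce a small FPR.
- **PR**: P vs R. It focuses on the positive class and is more informative on imbalanced data.
- On a dataset that is 99% negative, ROC-AUC can be 0.95 while PR-AUC is only 0.3.
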
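A back-of-the-envelope check of why ROC can look good while precision is poor. The numbers are hypothetical: 10,000 samples with 1% positives, and one operating point with TPR = 0.80 and FPR = 0.05. The helper name `operating_point` is illustrative.

```python
def operating_point(n_total: int, pos_rate: float, tpr: float, fpr: float) -> dict:
    """Convert an ROC operating point (TPR, FPR) into counts and precision."""
    n_pos = round(n_total * pos_rate)
    n_neg = n_total - n_pos
    tp = tpr * n_pos  # true positives caught
    fp = fpr * n_neg  # negatives wrongly flagged
    return {"tp": tp, "fp": fp, "precision": tp / (tp + fp), "recall": tpr}


print(operating_point(10_000, 0.01, tpr=0.80, fpr=0.05))
```

An FPR of 5% sounds small, but 5% of 9,900 negatives is 495 false alarms against 80 true positives, so precision is only about 0.14. ROC places this point at (0.05, 0.80); PR places it at (0.80, 0.14).
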
### Applications

1. Medical diagnosis (recall first: a miss is dangerous).
2. Spam filtering (precision first: false alarms are costly).
3. Fraud detection (cost-sensitive threshold).
4. Information retrieval (P@k, R@k).
5. Object detection (mAP is the mean of per-class average precision, a PR-curve area).
6. RAG retrieval evaluation.

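For ranked retrieval (the IR and RAG items above), the threshold becomes a cutoff k on the ranked list. A minimal sketch of precision@k and recall@k; the function name and the toy document IDs are hypothetical, and P@k here divides by k even if fewer than k results are returned.

```python
def precision_recall_at_k(ranked_ids: list, relevant_ids: set, k: int) -> tuple:
    """Precision@k and recall@k for one query's ranked result list."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc in top_k if doc in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


ranked = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d1", "d3", "d4"}
for k in (1, 3, 5):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: P@k={p:.2f} R@k={r:.2f}")
```

Raising k plays the role of lowering τ: recall@k can only grow, while precision@k tends to fall.
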
## 💻 Patterns

### sklearn PR curve + best F1 threshold

```python
from sklearn.metrics import precision_recall_curve, average_precision_score

# Assumes a fitted binary classifier `clf` and a validation split (X_val, y_val).
probs = clf.predict_proba(X_val)[:, 1]
p, r, thr = precision_recall_curve(y_val, probs)

# p and r have one more entry than thr (the final point P=1, R=0 has no threshold),
# so drop it before taking the argmax.
f1 = 2 * p[:-1] * r[:-1] / (p[:-1] + r[:-1] + 1e-12)
best = f1.argmax()
print(f"τ={thr[best]:.3f} P={p[best]:.3f} R={r[best]:.3f} F1={f1[best]:.3f}")
print(f"PR-AUC (AP) = {average_precision_score(y_val, probs):.3f}")
```

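### F-beta threshold (recall-weighted)

```python
from sklearn.metrics import precision_recall_curve

def best_fbeta_threshold(y, probs, beta=2.0):
    """Return (threshold, F-beta) maximizing F-beta; beta > 1 favors recall."""
    p, r, thr = precision_recall_curve(y, probs)
    p, r = p[:-1], r[:-1]  # align with thr (last point P=1, R=0 has no threshold)
    fb = (1 + beta**2) * p * r / (beta**2 * p + r + 1e-12)
    i = fb.argmax()
    return thr[i], fb[i]
```
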
### Cost-based threshold

```python
import numpy as np

def cost_threshold(y, probs, cost_fp=1.0, cost_fn=10.0, n_thr=200):
    """Grid-search the threshold that minimizes total misclassification cost."""
    y, probs = np.asarray(y), np.asarray(probs)
    best_t, best_c = 0.0, np.inf
    for t in np.linspace(0, 1, n_thr):
        pred = (probs >= t).astype(int)
        fp = ((pred == 1) & (y == 0)).sum()  # false alarms
        fn = ((pred == 0) & (y == 1)).sum()  # misses
        c = cost_fp * fp + cost_fn * fn
        if c < best_c:
            best_c, best_t = c, t
    return best_t, best_c
```

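### Plot PR curve

```python
import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score

# Reuses r, p, y_val, probs from the first block.
# A post-step plot matches how average precision sums the curve.
plt.step(r, p, where='post', label=f'AP={average_precision_score(y_val, probs):.3f}')
plt.xlabel('Recall'); plt.ylabel('Precision'); plt.legend()
plt.show()
```
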
### PR vs ROC on imbalanced

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# y_val is imbalanced (e.g., 1% positive); probs come from the first block.
print('ROC-AUC:', roc_auc_score(y_val, probs))            # can look optimistic
print('PR-AUC :', average_precision_score(y_val, probs))  # reflects positive-class performance
```

### Calibration before threshold tuning

```python
from sklearn.calibration import CalibratedClassifierCV

# Fit on the training split; isotonic can overfit small data (method='sigmoid' is the alternative).
cal = CalibratedClassifierCV(clf, method='isotonic', cv=5)
cal.fit(X_train, y_train)
probs_cal = cal.predict_proba(X_val)[:, 1]
# Calibrated probabilities make τ interpretable as a probability.
```

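Calibration is what makes a closed-form threshold possible. If probabilities are calibrated, predicting positive costs (1 − p)·C_FP in expectation and predicting negative costs p·C_FN, so the expected-cost-minimizing rule is to predict positive when p ≥ C_FP / (C_FP + C_FN). A minimal sketch assuming well-calibrated inputs; the helper names `bayes_threshold` and `decide` are hypothetical, and the default costs mirror `cost_threshold` (C_FP = 1, C_FN = 10).

```python
import numpy as np


def bayes_threshold(cost_fp: float = 1.0, cost_fn: float = 10.0) -> float:
    """Expected-cost-optimal threshold, valid only for calibrated probabilities."""
    # Predict positive when p * cost_fn >= (1 - p) * cost_fp.
    return cost_fp / (cost_fp + cost_fn)


def decide(probs, cost_fp: float = 1.0, cost_fn: float = 10.0) -> np.ndarray:
    """Apply the Bayes threshold to an array of calibrated probabilities."""
    return (np.asarray(probs) >= bayes_threshold(cost_fp, cost_fn)).astype(int)


tau = bayes_threshold()  # 1 / 11, about 0.091
print(round(tau, 3), decide([0.05, 0.10, 0.50]))
```

With uncalibrated scores the formula has no such guarantee, which is why a grid search like `cost_threshold` on validation data is the safer choice when calibration is in doubt.
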
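### Per-class threshold (multi-label)

```python
import numpy as np

def tune_per_label(y_true, probs):
    """Tune an F1-optimal threshold per label; y_true and probs have shape (N, L)."""
    n_labels = probs.shape[1]
    thrs = np.zeros(n_labels)
    for k in range(n_labels):
        # Reuses best_fbeta_threshold from above with beta=1 (i.e., F1).
        thrs[k], _ = best_fbeta_threshold(y_true[:, k], probs[:, k], beta=1.0)
    return thrs
```
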
## Decision criteria

| Situation | Threshold priority |
|---|---|
| Medical screening (a miss is dangerous) | High recall (low τ), F2 |
| Spam / ad blocking (wrongful blocking is costly) | High precision (high τ), F0.5 |
| Balanced costs | Maximize F1 |
| Explicit cost ratio available | Cost-based threshold |
| Imbalanced data, comparing models | Report PR-AUC (ROC as a supplement) |
| Multi-label | Tune a threshold per label |

**Default**: maximize F1 on the validation set, after calibration. If the data is imbalanced, report PR-AUC.

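The table's pairing of F2 with a low τ and F0.5 with a high τ follows from the F-beta formula: a larger β puts more weight on recall, so the F-beta-optimal threshold can only move down. A numpy-only sketch on synthetic scores (the data generator and the name `best_threshold_by_fbeta` are assumptions for illustration) that picks the best threshold for each β:

```python
import numpy as np


def best_threshold_by_fbeta(y, scores, beta):
    """Try each observed score as tau; return (tau, precision, recall) maximizing F-beta."""
    y, scores = np.asarray(y), np.asarray(scores)
    order = np.argsort(-scores)          # highest score first
    y_sorted, s_sorted = y[order], scores[order]
    tp = np.cumsum(y_sorted)             # TP when tau = s_sorted[i] (no ties assumed)
    fp = np.cumsum(1 - y_sorted)         # FP when tau = s_sorted[i]
    precision = tp / (tp + fp)
    recall = tp / y.sum()
    fb = (1 + beta**2) * precision * recall / (beta**2 * precision + recall + 1e-12)
    i = int(fb.argmax())
    return s_sorted[i], precision[i], recall[i]


rng = np.random.default_rng(0)
y = (rng.random(2000) < 0.1).astype(int)     # about 10% positives
scores = rng.normal(loc=1.5 * y, scale=1.0)  # positives score higher on average
for beta in (0.5, 1.0, 2.0):
    tau, p, r = best_threshold_by_fbeta(y, scores, beta)
    print(f"beta={beta}: tau={tau:.2f} P={p:.2f} R={r:.2f}")
```

Expect τ to fall and recall to rise as β goes from 0.5 to 2; the exact values depend on the synthetic data.
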
## 🔗 Graph

- Variant: [[ROC_AUC]]
- Applications: [[Imbalanced_Data]] · [[Anomaly_Detection]] · [[Information Retrieval (IR)|Information_Retrieval]]
- Adjacent: [[Threshold_Tuning]]

## 🤖 LLM Usage

**When**: choosing a classifier's deployment threshold, writing evaluation reports on imbalanced data, making cost-sensitive decisions.

**When not**: in ranking tasks a single threshold is usually not meaningful; use top-k metrics or nDCG instead.

## ❌ Anti-patterns

- **Default 0.5 threshold without tuning**: on imbalanced data it is often far from useful. Tuning on a validation set is required.
- **Tune threshold on test set**: this leaks test information into the model choice. Use the validation set only.
- **ROC-AUC only on imbalanced**: the results look inflated; report PR-AUC alongside.
- **Ignore calibration before threshold**: a threshold chosen on uncalibrated probabilities does not transfer across models or retrained versions.
- **F1 maximize when costs are asymmetric**: if a cost ratio is known, use F-beta or an explicit cost.

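The first two anti-patterns come down to one protocol: choose τ on a validation split, then evaluate once on a disjoint test split with τ frozen. A numpy-only sketch under stated assumptions: synthetic scores stand in for a trained model's outputs, and the helper names `f1_at` and `tune_on_val_report_on_test` are hypothetical.

```python
import numpy as np


def f1_at(y, scores, tau):
    """F1 = 2TP / (2TP + FP + FN) when predicting positive for scores >= tau."""
    pred = scores >= tau
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_on_val_report_on_test(y, scores, val_frac=0.5, seed=0):
    """Tune tau on a validation split, then evaluate it once on the disjoint test split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(len(y) * val_frac)
    val, test = idx[:n_val], idx[n_val:]
    candidates = np.unique(scores[val])  # every observed validation score
    val_f1 = [f1_at(y[val], scores[val], t) for t in candidates]
    tau = candidates[int(np.argmax(val_f1))]  # chosen without touching test data
    return tau, max(val_f1), f1_at(y[test], scores[test], tau)


rng = np.random.default_rng(1)
y = (rng.random(4000) < 0.05).astype(int)
scores = rng.normal(loc=2.0 * y, scale=1.0)
tau, f1_val, f1_test = tune_on_val_report_on_test(y, scores)
print(f"tau={tau:.2f}  val F1={f1_val:.3f}  test F1={f1_test:.3f}")
```

The validation F1 is optimistically biased because τ was selected on it; the test F1 is the number to report.
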
## 🧪 Verification / Duplicates

- Verified against the scikit-learn metrics docs (`precision_recall_curve`) and Saito & Rehmsmeier 2015 ("PR vs ROC for imbalanced").
- Confidence: A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PR tradeoff math + threshold tuning + PR vs ROC on imbalanced data |