---
id: wiki-2026-0508-precision-recall-tradeoff
title: Precision-Recall Tradeoff
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [PR Tradeoff, Threshold Tuning, F1 Optimization]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [classification, evaluation, metrics, threshold, imbalanced]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scikit-learn
---

# Precision-Recall Tradeoff

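## In one line
> **"Raising a classifier's threshold generally raises precision and lowers recall; the two metrics cannot both be maximized at once."** F1, F-beta, and PR-AUC are scores that combine the two axes. On imbalanced data (medical, fraud, anomaly detection), PR-AUC is more honest than ROC-AUC.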

## Core

### Definitions
- **Precision** = TP / (TP + FP): "the fraction of alarms that are real".
- **Recall** = TP / (TP + FN): "the fraction of real positives that were caught" (= sensitivity, TPR).
- **F1** = 2·P·R / (P+R): the harmonic mean of precision and recall.
- **F-beta** = (1+β²)·P·R / (β²P + R): β>1 weights recall, β<1 weights precision.
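
As a quick check on the formulas above, here is a minimal sketch that computes them from raw confusion counts. The function name `precision_recall_fbeta` and the counts (80 TP, 20 FP, 40 FN) are made up for illustration and are not part of any library.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from raw confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta**2 * precision + recall
    fbeta = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta

# Hypothetical counts: 80 true positives, 20 false positives, 40 false negatives.
p, r, f1 = precision_recall_fbeta(80, 20, 40)
_, _, f2 = precision_recall_fbeta(80, 20, 40, beta=2.0)
_, _, f05 = precision_recall_fbeta(80, 20, 40, beta=0.5)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} F2={f2:.3f} F0.5={f05:.3f}")
```

Because precision (0.80) exceeds recall (about 0.67) here, the recall-weighted F2 comes out below F1 and the precision-weighted F0.5 comes out above it.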

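### Tradeoff mechanism
- A **threshold τ** is applied to the classifier's output score.
- τ ↑ → positives are declared more strictly → precision usually ↑ (not guaranteed to be monotonic), recall ↓.
- τ ↓ → more examples are called positive → recall ↑, precision usually ↓.
- Sweeping τ traces the tradeoff curve: the Precision-Recall curve.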
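
The mechanism above can be seen on a tiny hand-made example. This is only a sketch: the labels and scores are invented, and `pr_at_threshold` is a hypothetical helper, not a library function. It uses the convention that precision is 1.0 when nothing is flagged.

```python
def pr_at_threshold(y_true, scores, tau):
    """Precision and recall when predicting positive for score >= tau."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if s >= tau and y == 1)
    fp = sum(1 for y, s in pairs if s >= tau and y == 0)
    fn = sum(1 for y, s in pairs if s < tau and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # convention: nothing flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy data (made up for illustration): 4 positives, 6 negatives.
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10]

# Sweep a few thresholds from loose to strict.
sweep = [(tau, *pr_at_threshold(y_true, scores, tau)) for tau in (0.2, 0.5, 0.75, 0.9)]
for tau, p, r in sweep:
    print(f"tau={tau:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data precision climbs from about 0.44 to 1.00 while recall falls from 1.00 to 0.25 as τ increases. Recall can never increase when τ goes up; precision usually rises but can dip on real data.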

### vs ROC-AUC
- **ROC**: TPR vs FPR. Insensitive to class balance, which can be misleading on imbalanced data.
- **PR**: P vs R. Focuses on the positive class and is more informative on imbalanced data.
- On a dataset with 99% negatives, ROC-AUC = 0.95 can coexist with PR-AUC = 0.3.
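
A short arithmetic sketch of why this happens: a single ROC operating point (fixed TPR and FPR) implies very different precision depending on prevalence. The helper name `precision_from_rates` and the operating point (TPR = 0.90, FPR = 0.05) are assumptions chosen for illustration.

```python
def precision_from_rates(tpr, fpr, prevalence):
    """Precision implied by an ROC operating point at a given positive rate."""
    tp = tpr * prevalence          # expected true-positive mass
    fp = fpr * (1.0 - prevalence)  # expected false-positive mass
    return tp / (tp + fp)

# Same ROC operating point, two different class balances.
balanced = precision_from_rates(0.90, 0.05, prevalence=0.50)
imbalanced = precision_from_rates(0.90, 0.05, prevalence=0.01)
print(f"precision at 50% positives: {balanced:.3f}")
print(f"precision at  1% positives: {imbalanced:.3f}")
```

The ROC point is identical in both cases, yet precision drops from about 0.95 to about 0.15 at 1% prevalence, because even a 5% false-positive rate on the large negative class outnumbers the true positives.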

### Applications
1. Medical diagnosis (recall first: misses are dangerous).
2. Spam filter (precision first: false alarms are costly).
3. Fraud detection (cost-sensitive threshold).
4. Information retrieval (P@k, R@k).
5. Object detection (mAP, built on PR-AUC).
6. RAG retrieval evaluation.
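
For the retrieval-style applications (items 4 and 6), precision and recall are usually computed over the top k results instead of at a score threshold. A minimal sketch, with a hypothetical helper name and invented document IDs:

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """P@k and R@k for one query, given a ranked list and the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Hypothetical retrieval result: ranked doc IDs and the ground-truth relevant set.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}

p_at_3, r_at_3 = precision_recall_at_k(ranked, relevant, 3)
p_at_5, r_at_5 = precision_recall_at_k(ranked, relevant, 5)
print(f"P@3={p_at_3:.2f} R@3={r_at_3:.2f}  P@5={p_at_5:.2f} R@5={r_at_5:.2f}")
```

Here k plays the role of the threshold: going from k=3 to k=5 adds no new hits, so P@k drops while R@k stays the same.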

## 💻 Patterns

### sklearn PR curve + best F1 threshold
```python
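import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Assumes a fitted binary classifier `clf` and a held-out validation split (X_val, y_val).
probs = clf.predict_proba(X_val)[:, 1]
p, r, thr = precision_recall_curve(y_val, probs)
# p and r have one more element than thr (the final P=1, R=0 point has no threshold),
# so drop that point when computing F1 to keep indices aligned with thr.
f1 = 2 * p[:-1] * r[:-1] / (p[:-1] + r[:-1] + 1e-12)
best = f1.argmax()
print(f"τ={thr[best]:.3f}  P={p[best]:.3f}  R={r[best]:.3f}  F1={f1[best]:.3f}")
# average_precision_score is a step-wise summary of the PR curve.
print(f"PR-AUC = {average_precision_score(y_val, probs):.3f}")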
```

### F-beta threshold (recall-weighted)
```python
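from sklearn.metrics import precision_recall_curve

def best_fbeta_threshold(y, probs, beta=2.0):
    """Return (threshold, score) that maximizes F-beta; beta > 1 favors recall."""
    p, r, thr = precision_recall_curve(y, probs)
    fb = (1 + beta**2) * p * r / (beta**2 * p + r + 1e-12)
    i = fb.argmax()
    # p and r have one more entry than thr; fall back to 1.0 for that last point.
    return (thr[i] if i < len(thr) else 1.0), fb[i]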
```

### Cost-based threshold
```python
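import numpy as np

def cost_threshold(y, probs, cost_fp=1.0, cost_fn=10.0, n_thr=200):
    """Grid-search the threshold that minimizes total misclassification cost."""
    y, probs = np.asarray(y), np.asarray(probs)  # accept lists / pandas Series
    best_t, best_c = 0.0, np.inf
    for t in np.linspace(0, 1, n_thr):
        pred = probs >= t
        fp = (pred & (y == 0)).sum()   # predicted positive, actually negative
        fn = (~pred & (y == 1)).sum()  # predicted negative, actually positive
        c = cost_fp * fp + cost_fn * fn
        if c < best_c:
            best_c, best_t = c, t
    return best_t, best_c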
```
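
As a sanity check on the grid search, there is a closed-form reference point under two assumptions that may not hold in practice: the probabilities are well calibrated, and correct decisions cost nothing. Predicting positive is cheaper in expectation when p·cost_fn > (1 − p)·cost_fp, which gives τ* = cost_fp / (cost_fp + cost_fn). The helper name `bayes_cost_threshold` is hypothetical.

```python
def bayes_cost_threshold(cost_fp=1.0, cost_fn=10.0):
    """Cost-minimizing threshold for calibrated probabilities.

    Assumes correct predictions cost nothing and the score is a true
    probability; predict positive when score >= the returned value.
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)

tau_star = bayes_cost_threshold(cost_fp=1.0, cost_fn=10.0)
print(f"reference threshold: {tau_star:.4f}")
```

With a 1:10 cost ratio the reference threshold is 1/11, roughly 0.09. If the grid search on validation data lands far from this value, that may be a hint that the scores are poorly calibrated or the validation set is small.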

### Plot PR curve
```python
import matplotlib.pyplot as plt
plt.plot(r, p, label=f'AP={average_precision_score(y_val, probs):.3f}')
plt.xlabel('Recall'); plt.ylabel('Precision'); plt.legend()
```

### PR vs ROC on imbalanced
```python
from sklearn.metrics import roc_auc_score, average_precision_score
# y_val is imbalanced (about 1% positive)
print('ROC-AUC :', roc_auc_score(y_val, probs))      # can look inflated
print('PR-AUC  :', average_precision_score(y_val, probs))  # more honest
```

### Calibration before threshold tuning
```python
from sklearn.calibration import CalibratedClassifierCV
cal = CalibratedClassifierCV(clf, method='isotonic', cv=5)
cal.fit(X_train, y_train)
probs_cal = cal.predict_proba(X_val)[:, 1]
# With calibrated probabilities, the threshold has an intuitive meaning.
```

### Per-class threshold (multi-label)
```python
def tune_per_label(y_true, probs):  # both arrays have shape (N, L); uses best_fbeta_threshold
    L = probs.shape[1]
    thrs = np.zeros(L)
    for k in range(L):
        thrs[k], _ = best_fbeta_threshold(y_true[:, k], probs[:, k], beta=1.0)
    return thrs
```

## Decision criteria
| Situation | Threshold priority |
|---|---|
| Medical screening (misses are dangerous) | High recall (low τ), F2 |
| Spam / ad blocking (wrongly blocking is costly) | High precision (high τ), F0.5 |
| Balanced cost | Maximize F1 |
| Explicit cost ratio available | Cost-based threshold |
| Imbalanced data + model comparison | Report PR-AUC (ROC as secondary) |
| Multi-label | Tune a threshold per label |

**Default**: maximize F1 on the validation set, after calibration. If the data is imbalanced, report PR-AUC.

## 🔗 Graph
- Variant: [[ROC_AUC]]
- Applications: [[Imbalanced_Data]] · [[Anomaly Detection]] · [[Information Retrieval (IR)|Information_Retrieval]]
- Adjacent: [[Threshold_Tuning]]

## 🤖 LLM usage
**When**: choosing a deployment threshold for a classifier, reporting evaluation on imbalanced data, cost-sensitive decisions.
**When not**: in ranking tasks a single threshold is meaningless; use a top-k metric or nDCG.
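
For the ranking case, a minimal nDCG@k sketch is shown below. It assumes the linear-gain variant (gain = relevance grade) and assumes the input list holds the grades of every candidate for the query, so the ideal ordering can be derived by sorting it. The function names are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevances."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance grades in the order the system ranked the items (made-up values).
perfect = ndcg_at_k([3, 2, 1], k=3)   # already in ideal order
late_hit = ndcg_at_k([0, 0, 1], k=3)  # the only relevant item is ranked last
print(f"perfect={perfect:.2f}  late_hit={late_hit:.2f}")
```

Unlike precision and recall at a fixed threshold, nDCG rewards placing relevant items earlier: the single relevant item at rank 3 scores 0.5 here instead of 1.0.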

## ❌ Anti-patterns
- **Default 0.5 threshold without tuning**: rarely appropriate on imbalanced data. Tuning on the validation set is required.
- **Tune threshold on test set**: this leaks test information. Use the validation set only.
- **ROC-AUC only on imbalanced**: results can look inflated; report PR-AUC alongside.
- **Ignore calibration before threshold**: a threshold chosen on uncalibrated probabilities does not transfer.
- **F1 maximize when costs are asymmetric**: when there is a cost ratio, use F-beta or an explicit cost.

## 🧪 Verification / duplicates
- Verified (sklearn metrics docs for `precision_recall_curve`; Saito & Rehmsmeier 2015, 'PR vs ROC for imbalanced').
- Trust level A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — PR tradeoff math + threshold tuning + PR vs ROC on imbalanced |