---
id: wiki-2026-0508-precision-recall-tradeoff
title: Precision-Recall Tradeoff
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [PR Tradeoff, Threshold Tuning, F1 Optimization]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [classification, evaluation, metrics, threshold, imbalanced]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: scikit-learn
---

# Precision-Recall Tradeoff

## In One Line

> **"Raising the classifier threshold generally raises precision and lowers recall; the two metrics cannot both be maximized at once."** F1, F-beta, and PR-AUC each combine the two axes into a single score. On imbalanced data (medical, fraud, anomaly), PR-AUC is more honest than ROC-AUC.

## Core

### Definitions

- **Precision** = TP / (TP + FP): the fraction of alarms that are real.
- **Recall** = TP / (TP + FN): the fraction of real positives that were caught (= sensitivity, TPR).
- **F1** = 2·P·R / (P+R): the harmonic mean of precision and recall.
- **F-beta** = (1+β²)·P·R / (β²P + R): β>1 weights recall, β<1 weights precision.

### Tradeoff mechanism

- A **threshold τ** is applied to the classifier's output score.
- τ ↑ → positives are declared more strictly → recall ↓ (never increases), precision usually ↑ (not strictly monotonic).
- τ ↓ → more samples are declared positive → recall ↑, precision usually ↓.
- The resulting Pareto curve is the Precision-Recall curve.

### vs ROC-AUC

- **ROC**: TPR vs FPR. Insensitive to class balance, which can be misleading on imbalanced data.
- **PR**: P vs R. Focuses on the positive class and is more informative on imbalanced data.
- On a dataset with 99% negatives, ROC-AUC = 0.95 can coexist with PR-AUC = 0.3.

### Applications

1. Medical diagnosis (recall first: a miss is dangerous).
2. Spam filter (precision first: false alarms are costly).
3. Fraud detection (cost-sensitive threshold).
4. Information retrieval (P@k, R@k).
5. Object detection (mAP is based on PR-AUC).
6. RAG retrieval evaluation.

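The definitions and the threshold mechanism above can be checked by hand on a tiny example. The following is a minimal sketch, not a library API: the helper `pr_at_threshold`, the toy labels, and the toy scores are all made up for illustration, and returning precision 1.0 when nothing is predicted positive is an assumed convention.

```python
def pr_at_threshold(y_true, scores, tau, beta=1.0):
    """Return (precision, recall, f_beta) when predicting positive for score >= tau.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    tp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < tau and y == 1)
    # Assumed convention: with no positive predictions, precision is 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


# Toy data (made up): 4 positives and 6 negatives, sorted by score.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10]

# Sweep the threshold upward and print both metrics at each step.
for tau in (0.3, 0.5, 0.75, 0.9):
    p, r, f1 = pr_at_threshold(y_true, scores, tau)
    print(f"tau={tau:.2f}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```

On this toy data precision climbs and recall falls as τ moves from 0.3 to 0.9. Passing `beta=2.0` to the same helper shows how an F2 score shifts the preferred threshold toward the high-recall end.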
## 💻 Patterns

### sklearn PR curve + best F1 threshold

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# clf is a fitted binary classifier; X_val / y_val are a held-out validation split.
probs = clf.predict_proba(X_val)[:, 1]
p, r, thr = precision_recall_curve(y_val, probs)

# p and r have one more entry than thr (the final P=1, R=0 point has no threshold),
# so drop that point before searching; otherwise thr[best] could go out of range.
f1 = 2 * p[:-1] * r[:-1] / (p[:-1] + r[:-1] + 1e-12)
best = f1.argmax()
print(f"τ={thr[best]:.3f} P={p[best]:.3f} R={r[best]:.3f} F1={f1[best]:.3f}")
# Average precision is a step-wise summary of the PR curve, used here as PR-AUC.
print(f"PR-AUC = {average_precision_score(y_val, probs):.3f}")
```

### F-beta threshold (recall-weighted)

```python
from sklearn.metrics import precision_recall_curve

def best_fbeta_threshold(y, probs, beta=2.0):
    """Return (threshold, score) maximizing F-beta; beta > 1 favors recall."""
    p, r, thr = precision_recall_curve(y, probs)
    fb = (1 + beta**2) * p * r / (beta**2 * p + r + 1e-12)
    i = fb.argmax()
    # The last (P=1, R=0) point has no threshold; fall back to 1.0 there.
    return (thr[i] if i < len(thr) else 1.0), fb[i]
```

### Cost-based threshold

```python
import numpy as np

def cost_threshold(y, probs, cost_fp=1.0, cost_fn=10.0, n_thr=200):
    """Grid-search the threshold that minimizes total misclassification cost."""
    y, probs = np.asarray(y), np.asarray(probs)
    thrs = np.linspace(0, 1, n_thr)
    best_t, best_c = 0.0, np.inf
    for t in thrs:
        pred = (probs >= t).astype(int)
        fp = ((pred == 1) & (y == 0)).sum()  # false alarms
        fn = ((pred == 0) & (y == 1)).sum()  # misses
        c = cost_fp * fp + cost_fn * fn
        if c < best_c:
            best_c, best_t = c, t
    return best_t, best_c
```

### Plot PR curve

```python
import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score

# p, r come from precision_recall_curve(y_val, probs) above.
plt.plot(r, p, label=f'AP={average_precision_score(y_val, probs):.3f}')
plt.xlabel('Recall'); plt.ylabel('Precision'); plt.legend()
plt.show()
```

### PR vs ROC on imbalanced

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# y_val is imbalanced (1% positive)
print('ROC-AUC :', roc_auc_score(y_val, probs))            # can look inflated
print('PR-AUC  :', average_precision_score(y_val, probs))  # more honest
```

### Calibration before threshold tuning

```python
from sklearn.calibration import CalibratedClassifierCV

# Wraps clf and fits an isotonic calibrator with 5-fold cross-validation.
cal = CalibratedClassifierCV(clf, method='isotonic', cv=5)
cal.fit(X_train, y_train)
probs_cal = cal.predict_proba(X_val)[:, 1]
# Calibrated probabilities make the threshold easier to interpret.
```

### Per-class threshold (multi-label)

```python
import numpy as np

def tune_per_label(y_true, probs):  # both arrays have shape (N, L)
    """Tune one F1-optimal threshold per label with best_fbeta_threshold."""
    L = probs.shape[1]
    thrs = np.zeros(L)
    for k in range(L):
        thrs[k], _ = best_fbeta_threshold(y_true[:, k], probs[:, k], beta=1.0)
    return thrs
```

## Decision Criteria

| Situation | Threshold priority |
|---|---|
| Medical screening (a miss is dangerous) | High recall (low τ), F2 |
| Spam / ad blocking (wrongly blocking is costly) | High precision (high τ), F0.5 |
| Balanced cost | Maximize F1 |
| Explicit cost ratio available | Cost-based threshold |
| Imbalanced data, comparing models | Report PR-AUC (ROC as secondary) |
| Multi-label | Tune a threshold per label |

**Default**: maximize F1 on the validation set, plus calibration. If the data is imbalanced, report PR-AUC.

## 🔗 Graph

- Variant: [[ROC_AUC]]
- Applications: [[Imbalanced_Data]] · [[Anomaly Detection]] · [[Information Retrieval (IR)|Information_Retrieval]]
- Adjacent: [[Threshold_Tuning]]

## 🤖 LLM Usage

**When**: choosing a classifier deployment threshold, reporting evaluation on imbalanced data, cost-sensitive decisions.

**When not**: on ranking tasks a single threshold is meaningless; use top-k metrics or nDCG.

## ❌ Anti-patterns

- **Default 0.5 threshold without tuning**: often far from optimal on imbalanced data. Tune on validation data.
- **Tune threshold on test set**: this leaks test information. Tune on validation only.
- **ROC-AUC only on imbalanced**: the result can look inflated; report PR-AUC alongside it.
- **Ignore calibration before threshold**: a threshold chosen on uncalibrated probabilities does not transfer.
- **F1 maximize when costs are asymmetric**: if a cost ratio exists, use F-beta or an explicit cost function.

## 🧪 Verification / Duplicates

- Verified (sklearn metrics docs for precision_recall_curve; Saito & Rehmsmeier 2015, on PR vs ROC for imbalanced data).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PR tradeoff math + threshold tuning + PR vs ROC on imbalanced |