2nd/10_Wiki/Topics/AI_and_ML/Precision-Recall-Tradeoff.md

id: wiki-2026-0508-precision-recall-tradeoff
title: Precision-Recall Tradeoff
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: PR Tradeoff, Threshold Tuning, F1 Optimization
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: classification, evaluation, metrics, threshold, imbalanced
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language python, framework scikit-learn

Precision-Recall Tradeoff

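In one line

"Raising a classifier's threshold typically raises precision and lowers recall, so the two metrics cannot both be maximized at once." F1, F-beta, and PR-AUC are scores that combine the two axes. On imbalanced data (medical, fraud, anomaly detection), PR-AUC is a more honest summary than ROC-AUC.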

Key points

Definitions

  • Precision = TP / (TP + FP): the fraction of alarms that are real.
  • Recall = TP / (TP + FN): the fraction of real positives that are caught (= sensitivity, TPR).
  • F1 = 2·P·R / (P + R): the harmonic mean of P and R.
  • F-beta = (1 + β²)·P·R / (β²·P + R): β > 1 weights recall, β < 1 weights precision.
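
The definitions above can be checked by hand on a small confusion matrix. The sketch below is a dependency-free illustration; the function name `precision_recall_fbeta` and the counts (80 TP, 20 FP, 40 FN) are made up for the example and do not come from any library.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical counts: 80 true positives, 20 false positives, 40 false negatives.
p, r, f1 = precision_recall_fbeta(80, 20, 40)
_, _, f2 = precision_recall_fbeta(80, 20, 40, beta=2.0)
_, _, f05 = precision_recall_fbeta(80, 20, 40, beta=0.5)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} F2={f2:.3f} F0.5={f05:.3f}")
```

Because recall is lower than precision in this example, the recall-weighted F2 falls below F1 and the precision-weighted F0.5 rises above it.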

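Tradeoff mechanism

  • A threshold τ is applied to the classifier's output score.
  • τ ↑ → positives are declared more strictly → precision ↑ (usually), recall ↓.
  • τ ↓ → more positives are declared → recall ↑, precision ↓ (usually).
  • The resulting Pareto-style tradeoff curve is the Precision-Recall curve.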
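
To make the mechanism concrete, here is a minimal sketch that sweeps τ over a toy score list. The labels, scores, and the helper name `precision_recall_at` are hypothetical, and precision is set to 1.0 by convention when nothing is predicted positive.

```python
def precision_recall_at(y_true, scores, tau):
    """Precision and recall when every score >= tau is declared positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < tau and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # convention: no alarms, no false alarms
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (made up): higher scores are more often positive, but not perfectly so.
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]

sweep = [(tau, *precision_recall_at(y_true, scores, tau)) for tau in (0.3, 0.5, 0.7)]
for tau, p, r in sweep:
    print(f"tau={tau:.1f}  precision={p:.3f}  recall={r:.3f}")
```

On this toy data, moving τ from 0.3 to 0.7 lifts precision from 2/3 to 1.0 while recall drops from 1.0 to 0.5.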

vs ROC-AUC

  • ROC: TPR vs FPR. Insensitive to class balance, which can mislead on rare-positive problems.
  • PR: P vs R. Focuses on the positive class, so it is more informative on imbalanced data.
  • On a dataset that is 99% negative, ROC-AUC = 0.95 can coexist with PR-AUC = 0.3.
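
The gap between the two metrics can be reproduced without any library. The sketch below builds a synthetic 1%-positive dataset (990 negatives, 10 positives, scores invented for illustration). `roc_auc` is computed as the pairwise win rate, and `average_precision` is the mean precision at each positive's rank, which matches the usual step-wise AP definition when there are no tied scores. Both helper names are assumptions for this example, not sklearn functions.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / tp if tp else 0.0


# Synthetic data: negatives spread evenly over [0, 1), positives clustered near 0.95.
neg_scores = [i / 990 for i in range(990)]
pos_scores = [0.95 + k * 1e-5 for k in range(10)]
y_true = [0] * 990 + [1] * 10
scores = neg_scores + pos_scores

auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"ROC-AUC = {auc:.3f}   PR-AUC (AP) = {ap:.3f}")
```

Every positive outranks about 95% of the negatives, so ROC-AUC is about 0.95. But 49 negatives still rank above all ten positives, which holds average precision to about 0.10.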

Applications

  1. Medical diagnosis (recall first: a miss is dangerous).
  2. Spam filtering (precision first: false alarms are costly).
  3. Fraud detection (cost-sensitive threshold).
  4. Information retrieval (P@k, R@k).
  5. Object detection (mAP, built on per-class PR-AUC).
  6. RAG retrieval evaluation.
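
For the retrieval-style applications in the list (P@k, R@k, RAG evaluation), precision and recall are computed over the top k ranked results instead of over a score threshold. A minimal sketch, with hypothetical document ids and a made-up helper name `precision_recall_at_k`:

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """P@k and R@k for one query: hits within the top-k retrieved items."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc in top_k if doc in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Hypothetical retriever output (best first) and ground-truth relevant set.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}

p_at_3, r_at_3 = precision_recall_at_k(ranked, relevant, 3)
p_at_5, r_at_5 = precision_recall_at_k(ranked, relevant, 5)
print(f"P@3={p_at_3:.3f} R@3={r_at_3:.3f}  P@5={p_at_5:.3f} R@5={r_at_5:.3f}")
```

Here k plays the role of the threshold: going from k=3 to k=5 adds no new hits, so P@k falls while R@k stays flat.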

💻 Patterns

sklearn PR curve + best F1 threshold

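import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# clf, X_val, y_val: a fitted binary classifier and a held-out validation split
probs = clf.predict_proba(X_val)[:, 1]
p, r, thr = precision_recall_curve(y_val, probs)
# p and r have one more entry than thr (final point P=1, R=0), so drop it before argmax
f1 = 2 * p[:-1] * r[:-1] / (p[:-1] + r[:-1] + 1e-12)
best = f1.argmax()
print(f"τ={thr[best]:.3f}  P={p[best]:.3f}  R={r[best]:.3f}  F1={f1[best]:.3f}")
print(f"PR-AUC = {average_precision_score(y_val, probs):.3f}")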

F-beta threshold (recall-weighted)

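from sklearn.metrics import precision_recall_curve

def best_fbeta_threshold(y, probs, beta=2.0):
    """Return (threshold, score) maximizing F-beta; beta > 1 favors recall."""
    p, r, thr = precision_recall_curve(y, probs)
    fb = (1 + beta**2) * p * r / (beta**2 * p + r + 1e-12)
    i = fb.argmax()
    # p and r carry one extra trailing point with no threshold; fall back to 1.0 there
    return (thr[i] if i < len(thr) else 1.0), fb[i]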

Cost-based threshold

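import numpy as np

def cost_threshold(y, probs, cost_fp=1.0, cost_fn=10.0, n_thr=200):
    """Grid-search the threshold that minimizes total misclassification cost."""
    y, probs = np.asarray(y), np.asarray(probs)  # accept lists as well as arrays
    thrs = np.linspace(0, 1, n_thr)
    best_t, best_c = 0.0, np.inf
    for t in thrs:
        pred = (probs >= t).astype(int)
        fp = ((pred == 1) & (y == 0)).sum()
        fn = ((pred == 0) & (y == 1)).sum()
        c = cost_fp * fp + cost_fn * fn
        if c < best_c:
            best_c, best_t = c, t
    return best_t, best_c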

Plot PR curve

import matplotlib.pyplot as plt
plt.plot(r, p, label=f'AP={average_precision_score(y_val, probs):.3f}')
plt.xlabel('Recall'); plt.ylabel('Precision'); plt.legend()

PR vs ROC on imbalanced

from sklearn.metrics import roc_auc_score, average_precision_score
# y_val is imbalanced (1% positive)
print('ROC-AUC :', roc_auc_score(y_val, probs))      # can look inflated
print('PR-AUC  :', average_precision_score(y_val, probs))  # more honest

Calibration before threshold tuning

from sklearn.calibration import CalibratedClassifierCV
cal = CalibratedClassifierCV(clf, method='isotonic', cv=5)
cal.fit(X_train, y_train)
probs_cal = cal.predict_proba(X_val)[:, 1]
# calibrated probabilities make the threshold value directly interpretable
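
One reason calibration matters for thresholding: if the score p really is the probability of the positive class, the cost-minimizing threshold has a closed form. The derivation below is a standard decision-theory sketch, assuming fixed per-error costs c_FP and c_FN and zero cost for correct predictions. The symbols are introduced here for illustration.

```latex
\begin{align*}
\mathbb{E}[\text{cost} \mid \text{predict } 1] &= (1 - p)\, c_{FP} \\
\mathbb{E}[\text{cost} \mid \text{predict } 0] &= p\, c_{FN} \\
\text{predict } 1 \iff p\, c_{FN} \ge (1 - p)\, c_{FP}
  &\iff p \ge \tau^{*} = \frac{c_{FP}}{c_{FP} + c_{FN}}
\end{align*}
```

With c_FP = 1 and c_FN = 10, this gives τ* = 1/11 ≈ 0.091. On uncalibrated scores the formula does not apply, and an empirical search over thresholds is the safer route.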

Per-class threshold (multi-label)

def tune_per_label(y_true, probs):  # (N, L)
    L = probs.shape[1]
    thrs = np.zeros(L)
    for k in range(L):
        thrs[k], _ = best_fbeta_threshold(y_true[:, k], probs[:, k], beta=1.0)
    return thrs

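Decision criteria

| Situation | Threshold priority |
|---|---|
| Medical screening (misses are dangerous) | High recall (low τ), F2 |
| Spam / ad blocking (wrongly blocking is costly) | High precision (high τ), F0.5 |
| Balanced cost | Maximize F1 |
| Explicit cost ratio available | Cost-based threshold |
| Already imbalanced + model comparison | Report PR-AUC (ROC as a supplement) |
| Multi-label | Tune a threshold per label |

Default: maximize F1 on the validation set, with calibration. If the data is imbalanced, report PR-AUC.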

🔗 Graph

🤖 LLM usage

When to use: choosing a deployment threshold for a classifier, reporting evaluation on imbalanced data, cost-sensitive decisions. When not to use: in ranking tasks a single threshold is meaningless; use top-k metrics or nDCG instead.

Anti-patterns

  • Default 0.5 threshold without tuning: often far from optimal on imbalanced data. Tune on the validation set.
  • Tune threshold on test set: this leaks test information. Use the validation set only.
  • ROC-AUC only on imbalanced data: results look inflated; report PR-AUC alongside.
  • Ignore calibration before threshold: a threshold on uncalibrated probabilities does not transfer across models.
  • F1 maximize when costs are asymmetric: when a cost ratio exists, use F-beta or an explicit cost.
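
The first two anti-patterns share one fix: pick τ on the validation split, then freeze it and only report on the test split. Below is a dependency-free sketch of that workflow. The data and the helper names `f1_at` and `tune_threshold` are hypothetical.

```python
def f1_at(y_true, scores, tau):
    """F1 when every score >= tau is declared positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= tau and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < tau and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_threshold(y_val, s_val):
    """Pick the F1-maximizing threshold among the distinct validation scores."""
    return max(sorted(set(s_val)), key=lambda tau: f1_at(y_val, s_val, tau))


# Made-up validation and test splits (labels, scores).
y_val = [0, 0, 0, 1, 0, 1, 1, 1]
s_val = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.70, 0.90]
y_test = [0, 0, 1, 0, 1, 1]
s_test = [0.15, 0.25, 0.35, 0.40, 0.65, 0.85]

tau = tune_threshold(y_val, s_val)    # chosen on validation only
f1_val = f1_at(y_val, s_val, tau)
f1_test = f1_at(y_test, s_test, tau)  # test split is used once, for reporting
print(f"tau={tau:.2f}  F1(val)={f1_val:.3f}  F1(test)={f1_test:.3f}")
```

The validation F1 is optimistic because τ was selected on that split. The test F1 is the number to report.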

🧪 Verification / Duplicates

  • Verified (sklearn metrics docs for precision_recall_curve; Saito & Rehmsmeier 2015, 'PR vs ROC for imbalanced').
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PR tradeoff math + threshold tuning + PR vs ROC on imbalanced |