2nd/10_Wiki/Topics/AI_and_ML/Algorithmic Fairness.md
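koriweb d8a80f6272 chore(wiki): canonical normalization of dangling links (768 files / 1,200 links)
Replaced [[wikilinks]] that differed only in name (spelling variants) with the canonical title of the target document,
reconnecting 1,200 broken links. Only normalized title/filename matches were applied; alias matching was
excluded because of over-merge risk (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs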

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


id: wiki-2026-0508-algorithmic-fairness
title: Algorithmic Fairness
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: AI fairness, ML bias, fair ML, algorithmic bias, group fairness
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: fairness, bias, ml-ethics, disparate-impact, audit, ai-governance, compas, gender-shades
raw_sources:
last_reinforced: 2026-05-09
github_commit: pending

Algorithmic Fairness

📌 One-line insight

"Measure and mitigate differential impact across groups." Bias in an ML system comes from data, algorithm, and deployment. Fairness interventions fall into three stages: pre-processing, in-processing, and post-processing.

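📖 Core

Definitions of fairness

1. Group fairness

  • Demographic parity: every group has the same positive prediction rate.
  • Equal opportunity: every group has the same true positive rate (TPR).
  • Equalized odds: both TPR and false positive rate (FPR) are the same across groups.
  • Calibration: a given score means the same thing (the same observed outcome rate) in every group.

→ These criteria are mathematically incompatible in general (impossibility theorem): when base rates differ between groups and the predictor is imperfect, calibration and equalized odds cannot both hold.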
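
The group metrics above reduce to per-group averages of the predictions. A minimal sketch with NumPy, assuming binary 0/1 labels and predictions; the function name `group_rates` and the toy arrays are illustrative and not taken from any library:

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Return positive rate, TPR, and FPR for each group value."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        rates[g.item()] = {
            "positive_rate": yp.mean(),   # compared by demographic parity
            "tpr": yp[yt == 1].mean(),    # compared by equal opportunity
            "fpr": yp[yt == 0].mean(),    # with tpr, compared by equalized odds
        }
    return rates

# Toy data, made up for illustration
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
rates = group_rates(y_true, y_pred, groups)
for g, r in rates.items():
    print(g, r)
```

A group with no positive (or no negative) examples would yield NaN for its TPR (or FPR), so a real audit should check group sizes first.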

2. Individual fairness

  • Similar individuals receive similar treatment.
  • Defining "similar" is the hard part.
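
One common way to formalize "similar individuals, similar treatment" is a Lipschitz condition: the gap between two scores should not exceed a constant times the distance between the two individuals. The sketch below assumes Euclidean distance as the similarity metric, which is exactly the hard choice noted above; `lipschitz_violations` and `L` are hypothetical names.

```python
import numpy as np

def lipschitz_violations(X, scores, L=1.0):
    """Return index pairs (i, j) where |score_i - score_j| > L * distance(x_i, x_j)."""
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    violations = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dist = np.linalg.norm(X[i] - X[j])  # assumed similarity metric
            if abs(scores[i] - scores[j]) > L * dist:
                violations.append((i, j))
    return violations

# Toy data: individuals 0 and 1 are nearly identical but scored very differently
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
scores = [0.2, 0.9, 0.8]
violations = lipschitz_violations(X, scores)
print(violations)  # [(0, 1)]
```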

3. Counterfactual fairness

  • The prediction stays unchanged if the protected attribute is changed.
  • Requires a causal model.

Famous cases

COMPAS (recidivism)

  • ProPublica, 2016.
  • The false positive rate for Black defendants was roughly 2x that for white defendants.
  • Evidence of racial bias in the risk score.

Gender Shades (face recognition)

  • Joy Buolamwini and Timnit Gebru, 2018.
  • Error rate of up to about 35% for darker-skinned women (vs. under 1% for lighter-skinned men).

Amazon hiring AI (2018)

  • Resumes containing women-related keywords were penalized.
  • The model reproduced historical bias in past hiring data.

→ The tool was abandoned.

Apple Card (2019)

  • Reports of women receiving lower credit limits than men with the same financial profile.

Healthcare risk score (2019)

  • Black patients received lower risk scores than white patients with the same level of need.
  • Cause: the score used historical healthcare expenditure as a proxy for need (proxy bias).

Sources of bias

Data

  • Historical: past discrimination encoded in the data.
  • Representation: underrepresented groups.
  • Measurement: signal quality differs by group.

Algorithm

  • Objective functions that favor the majority group.
  • Feature selection.
  • Hyperparameter tuning.

Deployment

  • User feedback loops.
  • Differential adoption.
  • Contextual mismatch.

Mitigation strategies

Pre-processing (data)

  • Reweight samples.
  • Generate synthetic minority samples.
  • Remove the protected attribute (often insufficient, because proxies remain).
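
Sample reweighting can be made concrete with one common scheme: weight each (group, label) cell by its expected frequency under independence divided by its observed frequency. The sketch below assumes that scheme and uses made-up data; `reweighing_weights` is an illustrative name.

```python
import numpy as np

def reweighing_weights(groups, labels):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    groups, labels = np.asarray(groups), np.asarray(labels)
    weights = np.ones(len(labels), dtype=float)
    for g in np.unique(groups):
        for y in np.unique(labels):
            cell = (groups == g) & (labels == y)
            if cell.any():
                expected = (groups == g).mean() * (labels == y).mean()
                observed = cell.mean()
                weights[cell] = expected / observed
    return weights

# Toy data: group 0 is rarely labeled positive, group 1 usually is
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
labels = np.array([1, 0, 0, 0, 1, 1, 1, 0])
w = reweighing_weights(groups, labels)

# Weighted positive rate per group after reweighting
rate0 = np.average(labels[groups == 0], weights=w[groups == 0])
rate1 = np.average(labels[groups == 1], weights=w[groups == 1])
print(w, rate0, rate1)
```

With these weights the weighted positive rate is the same in both groups, so a learner that honors sample weights no longer sees label and group as correlated.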

In-processing (training)

  • Add a fairness constraint to the loss.
  • Adversarial debiasing.
  • Prejudice remover.
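
One way to read "add a fairness constraint to the loss" is as a penalty term. A minimal sketch, assuming a binary classifier that outputs probabilities, a binary group column coded 0/1, and the demographic parity gap of mean scores as the penalty; `penalized_loss` and `lam` are hypothetical names, and other penalties are equally possible.

```python
import numpy as np

def penalized_loss(y_true, p_pred, groups, lam=1.0):
    """Binary cross-entropy plus lam times the gap in mean score between groups."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), 1e-7, 1 - 1e-7)
    groups = np.asarray(groups)
    bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    gap = abs(p[groups == 0].mean() - p[groups == 1].mean())
    return bce + lam * gap

# Toy data: group 0 receives much higher scores than group 1
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.8, 0.2, 0.1]
groups = [0, 0, 1, 1]
plain = penalized_loss(y_true, p_pred, groups, lam=0.0)
penalized = penalized_loss(y_true, p_pred, groups, lam=1.0)
print(plain, penalized)
```

Raising `lam` trades accuracy for a smaller gap; the right value depends on the application.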

Post-processing (output)

  • Group-specific thresholds.
  • Score calibration.
  • Reject option classification.
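
Group-specific thresholds can be sketched as follows: for each group, pick the highest score cutoff that still reaches a target TPR, which approximates equal opportunity. This assumes every group has at least one positive example and a target in (0, 1]; the function names and data are illustrative.

```python
import numpy as np

def thresholds_for_target_tpr(scores, y_true, groups, target_tpr=0.8):
    """Per group, the highest threshold whose TPR is at least target_tpr."""
    scores, y_true, groups = map(np.asarray, (scores, y_true, groups))
    thresholds = {}
    for g in np.unique(groups):
        # Scores of this group's true positives, highest first
        pos_scores = np.sort(scores[(groups == g) & (y_true == 1)])[::-1]
        k = int(np.ceil(target_tpr * len(pos_scores)))  # positives that must pass
        thresholds[g.item()] = pos_scores[k - 1]
    return thresholds

def predict_with_thresholds(scores, groups, thresholds):
    """Apply each sample's group threshold to its score."""
    scores, groups = np.asarray(scores), np.asarray(groups)
    cutoffs = np.array([thresholds[g.item()] for g in groups])
    return (scores >= cutoffs).astype(int)

# Toy data, made up for illustration
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.5, 0.3, 0.1]
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
thresholds = thresholds_for_target_tpr(scores, y_true, groups, target_tpr=1.0)
preds = predict_with_thresholds(scores, groups, thresholds)
print(thresholds, preds)
```

Tied scores can push a group's TPR above the target, so the result is "at least" rather than "exactly" the requested rate.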

Audit / measurement

Disparate impact

  • 4/5 rule (US EEOC).
  • A group whose selection rate is below 80% of the highest group's rate indicates potential discrimination.
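
The 4/5 rule is plain arithmetic on selection rates. A minimal sketch with NumPy, assuming a 0/1 `selected` array and one group label per applicant; `selection_rate_ratios` and the data are illustrative.

```python
import numpy as np

def selection_rate_ratios(selected, groups):
    """Each group's selection rate divided by the highest group's rate."""
    selected, groups = np.asarray(selected), np.asarray(groups)
    rates = {g.item(): selected[groups == g].mean() for g in np.unique(groups)}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

# Toy data: group 0 selected 4 of 5, group 1 selected 2 of 5
selected = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ratios = selection_rate_ratios(selected, groups)
flagged = [g for g, ratio in ratios.items() if ratio < 0.8]
print(ratios, flagged)
```

Here group 1 is selected at half the rate of group 0, so it falls below the 0.8 line and is flagged for review. If no one is selected in any group the ratio is undefined, which the sketch does not handle.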

AIF360 (IBM)

  • 70+ fairness metrics.
  • 9+ mitigation algorithms.
  • Open source.

Aequitas (Univ. Chicago)

  • Audit toolkit.

Google What-If Tool

  • Interactive exploration.

Regulation

  • EU AI Act: bias checks for high-risk systems.
  • NYC Local Law 144: annual bias audit of hiring AI.
  • EEOC (US): employment discrimination.
  • GDPR Article 22: right to human review of solely automated decisions.

Organizational practice

Pre-deployment

  • Audit.
  • Disparate impact analysis.
  • Adversarial testing.
  • Disclosure via a model card.

Production

  • Monitoring.
  • User feedback.
  • Quarterly review.

Incident

  • User complaint.
  • Root cause analysis.
  • Remediation.

💻 Code

Disparate impact (AIF360)

import numpy as np
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

# Wrap your dataset. `df` is assumed to be an all-numeric pandas DataFrame
# with a binary 'hired' label and a binary 'gender' column.
dataset = BinaryLabelDataset(
    df=df,
    label_names=['hired'],
    protected_attribute_names=['gender'],
    favorable_label=1,
    unfavorable_label=0,
)

# Pre-training metric: measures bias in the labels themselves
metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
print(f"Disparate impact: {metric.disparate_impact()}")
# < 0.8 = potential bias (4/5 rule)

# After the model: copy the dataset and overwrite the labels with predictions.
# `y_pred` is assumed to be a 0/1 array from your model, in the same row order as `df`.
classified = dataset.copy(deepcopy=True)
classified.labels = np.asarray(y_pred, dtype=float).reshape(-1, 1)
clf_metric = ClassificationMetric(
    dataset, classified,
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
print(f"Equal opportunity diff: {clf_metric.equal_opportunity_difference()}")
print(f"Avg odds diff: {clf_metric.average_odds_difference()}")

Reweighting (pre-processing)

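from aif360.algorithms.preprocessing import Reweighing
from sklearn.linear_model import LogisticRegression

rw = Reweighing(
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
# Features and labels are unchanged; only instance_weights are adjusted
dataset_rw = rw.fit_transform(dataset)

# Train on reweighted data by passing the weights to any estimator that
# accepts sample_weight (LogisticRegression is just one example)
model = LogisticRegression(max_iter=1000)
model.fit(
    dataset_rw.features,
    dataset_rw.labels.ravel(),
    sample_weight=dataset_rw.instance_weights,
)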

Adversarial debiasing (in-processing)

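from aif360.algorithms.inprocessing import AdversarialDebiasing
import tensorflow.compat.v1 as tf

# AdversarialDebiasing is built on the TF1 graph API, so under TensorFlow 2
# use the compat.v1 module and disable eager execution.
tf.disable_eager_execution()

sess = tf.Session()
debiased = AdversarialDebiasing(
    privileged_groups=[{'gender': 1}],
    unprivileged_groups=[{'gender': 0}],
    scope_name='debiased',
    debias=True,
    sess=sess,
)
debiased.fit(dataset_train)
preds = debiased.predict(dataset_test)  # BinaryLabelDataset with predicted labels
sess.close()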

Threshold optimization (post-processing)

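from aif360.algorithms.postprocessing import EqOddsPostprocessing

# Note: EqOddsPostprocessing learns per-group probabilities for flipping
# predicted labels rather than tuning score thresholds.
eq_odds = EqOddsPostprocessing(
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
# dataset_val holds the true labels; predictions_val is a copy of it whose
# labels are the model's predictions
eq_odds.fit(dataset_val, predictions_val)
predictions_balanced = eq_odds.predict(predictions_test)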

Fairness in CI

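import numpy as np

class FairnessFailure(AssertionError):
    """Raised when the per-group accuracy gap exceeds the allowed disparity."""

def fairness_test(model, X_test, y_test, groups, max_disparity=0.05):
    """Fairness gate to run on every release."""
    # X_test, y_test, and groups are assumed to be aligned NumPy arrays
    groups = np.asarray(groups)
    accuracies = {}
    for group_value in np.unique(groups):
        mask = groups == group_value
        accuracies[group_value] = model.score(X_test[mask], y_test[mask])

    disparity = max(accuracies.values()) - min(accuracies.values())
    if disparity > max_disparity:
        raise FairnessFailure(f"Disparity: {disparity:.2%}")
    return accuracies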

Counterfactual test

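def counterfactual_test(model, instance, protected_attr='gender'):
    """Check whether flipping a binary protected attribute changes the prediction.

    `instance` is assumed to be a pandas Series of features. This is only an
    attribute-flip check: it does not propagate the change through a causal model.
    """
    pred_original = model.predict(instance.to_frame().T)[0]

    flipped = instance.copy()
    flipped[protected_attr] = 1 - flipped[protected_attr]
    pred_flipped = model.predict(flipped.to_frame().T)[0]

    if pred_original != pred_flipped:
        return f"Bias detected: {protected_attr} flip changes prediction"
    return None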

🤔 Decision criteria

Risk level | Mitigation
Low (spam filter) | Audit log + monitor
Medium (recommendation) | + Disparate impact check
High (hiring, lending) | + Pre/in/post-processing
Critical (criminal justice, medical) | + Strict regulation + human review

Default: 4/5 rule check + per-group accuracy + counterfactual test + disclosure.

🔗 Graph

🤖 LLM usage

When to use: deployment review of an ML system, audits, and design of high-risk systems.
When not to use: specific legal advice (ask a lawyer) or details of a specific implementation.

Anti-patterns

  • "Just remove the protected attribute": proxies still carry the bias.
  • Single fairness metric: ignores the trade-offs between metrics.
  • No audit: bias stays silent.
  • Trusting historical data: amplifies past discrimination.
  • Fixing disparate impact only: individuals can still be treated unfairly.

🧪 Verification / duplicates

🕓 Changelog

Date | Change
2026-05-08 | Phase 1
2026-05-09 | Manual cleanup: fairness types + famous cases + AIF360 code + decision criteria