---
id: wiki-2026-0508-algorithmic-fairness
title: Algorithmic Fairness
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI fairness, ML bias, fair ML, algorithmic bias, group fairness]
duplicate_of: none
source_trust_level: B
confidence_score: 0.85
verification_status: conceptual
tags: [fairness, bias, ml-ethics, disparate-impact, audit, ai-governance, compas, gender-shades]
raw_sources: []
last_reinforced: 2026-05-09
github_commit: pending
---

# Algorithmic Fairness

## 📌 One-line insight

> **"Measure and mitigate a system's differential impact on each group."** Bias in an ML system can enter through the data, the algorithm, and the deployment. Fairness interventions happen at three stages: **pre-processing / in-processing / post-processing**.

## 📖 Core

### Definitions of fairness

#### 1. Group fairness

- **Demographic parity**: every group has the same positive-prediction rate.
- **Equal opportunity**: every group has the same true positive rate (TPR).
- **Equalized odds**: both TPR and false positive rate (FPR) are equal across groups.
- **Calibration**: a given score means the same thing (the same observed positive rate) in every group.

→ These criteria are mathematically incompatible in general. When base rates differ between groups, an imperfect classifier cannot satisfy calibration and equalized odds at the same time (the impossibility theorem).

#### 2. Individual fairness

- Similar individuals should receive similar treatment.
- Defining "similar" (the similarity metric) is hard.

#### 3. Counterfactual fairness

- The prediction should stay unchanged if the protected attribute were changed.
- Requires a causal model.

### Famous cases

#### COMPAS (recidivism)

- ProPublica, 2016.
- Black defendants who did not reoffend were about twice as likely as white defendants to be labeled high risk (false positive rate roughly 2x).
- Evidence of racial bias in the risk score.

#### Gender Shades (facial analysis)

- Joy Buolamwini, Timnit Gebru, 2018.
- Commercial gender-classification systems had error rates of up to ~35% for darker-skinned women, vs. under 1% for lighter-skinned men.

#### Amazon hiring AI (2018)

- Penalized resumes containing the word "women's".
- Reproduced the historical bias in past hiring data.

→ The project was abandoned.

#### Apple Card (2019)

- Women reportedly received lower credit limits than men with the same financial profile.

#### Healthcare risk score (2019)

- Black patients received lower risk scores than white patients with the same level of need.
- Cause: the model used historical healthcare expenditure as a proxy for need (proxy bias).

### Sources of bias

#### Data

- **Historical**: past discrimination encoded in the data.
- **Representation**: underrepresented groups.
- **Measurement**: signal quality differs by group.

#### Algorithm

- An objective function that favors the majority (average loss is dominated by the largest group).
- Feature selection.
- Hyperparameter tuning.

#### Deployment

- User feedback loops.
- Differential adoption across groups.
- Contextual mismatch between training and deployment.

### Mitigation strategies

#### Pre-processing (data)

- Reweight samples.
- Generate synthetic minority samples.
- Remove the protected attribute (often insufficient because of proxies).

#### In-processing (training)

- Add a fairness constraint to the loss.
- Adversarial debiasing.
- Prejudice remover.

#### Post-processing (output)

- Group-specific thresholds.
- Score calibration.
- Reject option classification.

### Audit / measurement

#### Disparate impact

- 4/5 (80%) rule, from the US EEOC's Uniform Guidelines.
- A group whose selection rate is below 80% of the rate for the highest-rate group shows potential adverse impact (possible discrimination).

#### AIF360 (IBM)

- 70+ fairness metrics.
- 9 mitigation algorithms.
- Open source.

#### Aequitas (Univ. of Chicago)

- Bias audit toolkit.

#### Google What-If Tool

- Interactive model exploration.

### Regulation

- **EU AI Act**: bias examination for high-risk systems.
- **NYC Local Law 144**: annual bias audit for hiring AI (automated employment decision tools).
- **EEOC** (US): employment discrimination.
- **GDPR Article 22**: right to human intervention in solely automated decisions.

### Organizational practice

#### Pre-deployment

- Audit.
- Disparate impact analysis.
- Adversarial testing.
- Disclosure via a model card.

#### Production

- Monitoring.
- User feedback.
- Quarterly review.

#### Incident

- Intake of user complaints.
- Root cause analysis.
- Remediation.
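### Worked example: group fairness metrics

The group-fairness definitions and the 4/5 rule above reduce to a few per-group rates. What follows is a minimal NumPy sketch, not AIF360's implementation. It assumes binary labels and predictions and one array of group labels, and it assumes every group contains both positive and negative labels. The function names `group_rates` and `fairness_report` are hypothetical. Differences are computed as unprivileged minus privileged, and disparate impact as unprivileged / privileged, which mirrors the convention of the AIF360 metrics in the Code section.

```python
import numpy as np


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR and FPR (hypothetical helper).

    Assumes binary 0/1 labels and predictions, and that every group
    has at least one positive and one negative true label.
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        rates[g] = {
            "selection_rate": yp.mean(),  # P(pred=1 | group): demographic parity
            "tpr": yp[yt == 1].mean(),    # P(pred=1 | y=1, group): equal opportunity
            "fpr": yp[yt == 0].mean(),    # P(pred=1 | y=0, group): with TPR, equalized odds
        }
    return rates


def fairness_report(y_true, y_pred, groups, unprivileged, privileged):
    """Compare an unprivileged group against a privileged one (hypothetical helper)."""
    r = group_rates(y_true, y_pred, groups)
    u, p = r[unprivileged], r[privileged]
    di = u["selection_rate"] / p["selection_rate"]
    return {
        "demographic_parity_diff": u["selection_rate"] - p["selection_rate"],
        "equal_opportunity_diff": u["tpr"] - p["tpr"],
        # Largest gap in TPR or FPR; 0 means equalized odds holds exactly
        "equalized_odds_gap": max(abs(u["tpr"] - p["tpr"]), abs(u["fpr"] - p["fpr"])),
        "disparate_impact": di,
        "passes_four_fifths": bool(di >= 0.8),  # 4/5 rule
    }


# Toy data: group 0 = unprivileged, group 1 = privileged
report = fairness_report(
    y_true=[1, 1, 0, 0, 1, 1, 0, 0],
    y_pred=[1, 0, 0, 0, 1, 1, 1, 0],
    groups=[0, 0, 0, 0, 1, 1, 1, 1],
    unprivileged=0,
    privileged=1,
)
print(report)
```

In this toy data group 0 is selected 25% of the time and group 1 is selected 75% of the time. The disparate impact is therefore 1/3, which fails the 4/5 rule. Group 0's TPR is also 0.5 lower than group 1's, which violates equal opportunity.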
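### Worked example: reweighing weights

The "reweight samples" pre-processing idea can be made concrete with the Kamiran & Calders scheme, on which AIF360's `Reweighing` is based. Each (group, label) cell gets the weight w = P(A=a)·P(Y=y) / P(A=a, Y=y), which is its expected frequency under independence divided by its observed frequency. The NumPy sketch below is an illustration under that assumption, not AIF360's code. The helper names `reweighing_weights` and `weighted_positive_rate` are hypothetical.

```python
import numpy as np


def reweighing_weights(labels, groups):
    """Per-sample weights w = P(A=a) * P(Y=y) / P(A=a, Y=y) (hypothetical helper)."""
    labels, groups = np.asarray(labels), np.asarray(groups)
    n = len(labels)
    weights = np.empty(n, dtype=float)
    for a in np.unique(groups):
        for y in np.unique(labels):
            cell = (groups == a) & (labels == y)
            n_cell = cell.sum()
            if n_cell == 0:
                continue  # empty cell: no samples to weight
            expected = (groups == a).mean() * (labels == y).mean()
            observed = n_cell / n
            weights[cell] = expected / observed
    return weights


def weighted_positive_rate(labels, groups, weights, group):
    """Base rate of label 1 within one group, using the sample weights."""
    m = np.asarray(groups) == group
    return np.average(np.asarray(labels)[m], weights=weights[m])


labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
w = reweighing_weights(labels, groups)
print(w)
print(weighted_positive_rate(labels, groups, w, 1), weighted_positive_rate(labels, groups, w, 0))
```

The raw base rates are 75% for group 1 and 25% for group 0. Under-represented cells (group 1 with label 0, group 0 with label 1) are up-weighted to 2, and over-represented cells are down-weighted to 2/3. After weighting, both groups have a positive rate of 0.5. The weights can then be passed as `sample_weight` to an estimator's `fit`.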
## 💻 Code

### Disparate impact (AIF360)

```python
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

# Wrap your data: df is a pandas DataFrame with numeric columns,
# including the label 'hired' and the protected attribute 'gender'
dataset = BinaryLabelDataset(
    df=df,
    label_names=['hired'],
    protected_attribute_names=['gender'],
    favorable_label=1,
    unfavorable_label=0,
)

# Pre-training metric: computed on the labels in the data itself
metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
print(f"Disparate impact: {metric.disparate_impact()}")  # < 0.8 = potential bias (4/5 rule)

# After training: copy the dataset and replace its labels with predictions
# (model is assumed to be a fitted scikit-learn-style classifier)
classified = dataset.copy(deepcopy=True)
classified.labels = model.predict(dataset.features).reshape(-1, 1)

clf_metric = ClassificationMetric(
    dataset,
    classified,
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
print(f"Equal opportunity diff: {clf_metric.equal_opportunity_difference()}")
print(f"Avg odds diff: {clf_metric.average_odds_difference()}")
```

### Reweighing (pre-processing)

```python
from aif360.algorithms.preprocessing import Reweighing
from sklearn.linear_model import LogisticRegression

rw = Reweighing(
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
dataset_rw = rw.fit_transform(dataset)

# Reweighing leaves features and labels unchanged and sets instance_weights;
# train on the reweighted data by passing those weights to the estimator
model = LogisticRegression(max_iter=1000)
model.fit(
    dataset_rw.features,
    dataset_rw.labels.ravel(),
    sample_weight=dataset_rw.instance_weights,
)
```

### Adversarial debiasing (in-processing)

```python
from aif360.algorithms.inprocessing import AdversarialDebiasing
import tensorflow.compat.v1 as tf

# AdversarialDebiasing is built on the TensorFlow 1.x graph API
tf.disable_eager_execution()
sess = tf.Session()

# dataset_train / dataset_test: BinaryLabelDatasets,
# e.g. from dataset.split([0.7], shuffle=True)
debiased = AdversarialDebiasing(
    privileged_groups=[{'gender': 1}],
    unprivileged_groups=[{'gender': 0}],
    scope_name='debiased',
    debias=True,
    sess=sess,
)
debiased.fit(dataset_train)
preds = debiased.predict(dataset_test)  # dataset holding predicted labels
sess.close()
```

### Equalized-odds post-processing

```python
from aif360.algorithms.postprocessing import EqOddsPostprocessing

# Learns group-specific probabilities for changing predicted labels
# so that TPR and FPR match across groups.
# predictions_val / predictions_test: copies of the validation / test
# datasets whose labels hold the model's predictions
eq_odds = EqOddsPostprocessing(
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
    seed=42,
)
eq_odds.fit(dataset_val, predictions_val)  # (true labels, predicted labels)
predictions_balanced = eq_odds.predict(predictions_test)
```

### Fairness in CI

```python
import numpy as np


class FairnessFailure(Exception):
    """Raised when a release fails the fairness gate."""


def fairness_test(model, X_test, y_test, groups, max_disparity=0.05):
    """Fairness gate for a release: per-group accuracy must stay within max_disparity.

    X_test, y_test, groups: NumPy arrays; model: fitted scikit-learn-style estimator.
    """
    accuracies = {}
    for group_value in np.unique(groups):
        mask = groups == group_value
        accuracies[group_value] = model.score(X_test[mask], y_test[mask])

    disparity = max(accuracies.values()) - min(accuracies.values())
    if disparity > max_disparity:
        raise FairnessFailure(f"Accuracy disparity: {disparity:.2%}")
    return accuracies
```

### Counterfactual test

```python
def counterfactual_test(model, instance, protected_attr='gender'):
    """Flip a binary protected attribute and check whether the prediction changes.

    instance: a pandas Series holding one row of features.
    This naive flip leaves proxy features unchanged; full counterfactual
    fairness requires a causal model.
    """
    pred_original = model.predict(instance.to_frame().T)[0]

    flipped = instance.copy()
    flipped[protected_attr] = 1 - flipped[protected_attr]
    pred_flipped = model.predict(flipped.to_frame().T)[0]

    if pred_original != pred_flipped:
        return f"Bias detected: {protected_attr} flip changes prediction"
    return None
```

## 🤔 Decision criteria

| Risk level | Mitigation |
|---|---|
| Low (spam filter) | Audit log + monitoring |
| Medium (recommendation) | + Disparate impact check |
| High (hiring, lending) | + Pre/in/post-processing mitigation |
| Critical (criminal justice, medical) | + Strict regulatory compliance + human review |

Each row adds to the measures in the rows above it.

**Default**: 4/5 rule check + per-group accuracy + counterfactual test + disclosure.

## 🔗 Graph

- Parent: [[AI-Ethics]] · [[AI Accountability]]
- Variant: [[Group-Fairness]]

## 🤖 LLM usage

**When**: deployment reviews of ML systems, audits, design of high-risk systems.

**When not**: specific legal advice (consult a lawyer), specific implementation details.

## ❌ Anti-patterns

- **"Just remove the protected attribute"**: proxies still carry the bias.
- **Single fairness metric**: ignores the trade-offs between metrics.
- **No audit**: bias goes undetected.
- **Trusting historical data**: amplifies past discrimination.
- **Fixing only disparate impact**: individuals can still be treated unfairly.

## 🧪 Verification / duplicates

- Verified.
- Trust level B (academic + industry consensus).
- Related: [[AI Accountability]] · [[AI 거버넌스 정책(AI Usage Policy)|AI-Governance-Policy]].

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-09 | Manual cleanup: fairness types + famous cases + AIF360 code + decision criteria |