| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-algorithmic-fairness | Algorithmic Fairness | 10_Wiki/Topics | verified | self | | none | B | 0.85 | conceptual | | | 2026-05-09 | pending |
Algorithmic Fairness
📌 One-line insight
"Measure and mitigate the differential impact on each group." Bias in an ML system comes from three places: data, algorithm, and deployment. Fairness interventions fall into three stages: pre-processing, in-processing, and post-processing.
📖 Core
Definitions of fairness
1. Group fairness
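- Demographic parity: every group has the same positive prediction rate.
- Equal opportunity: every group has the same true positive rate (TPR).
- Equalized odds: every group has the same TPR and the same false positive rate (FPR).
- Calibration: a given score means the same thing (the same observed outcome rate) in every group.
→ These criteria are mathematically incompatible in general (impossibility theorem): when base rates differ between groups, calibration and equalized odds cannot both hold unless the predictor is perfect.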
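The group criteria above can be measured directly from labels and predictions. The following is a minimal sketch, assuming binary labels and predictions held in plain Python lists; `group_rates` and `gap` are illustrative names, not part of any library.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, and FPR for binary labels and predictions."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += yp
        if yt == 1:
            c["pos"] += 1
            c["tp"] += yp
        else:
            c["neg"] += 1
            c["fp"] += yp
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            # Demographic parity compares this value across groups.
            "selection_rate": c["pred_pos"] / c["n"],
            # Equal opportunity compares TPR; equalized odds compares TPR and FPR.
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
    return rates


def gap(rates, key):
    """Largest between-group difference for one metric (0 means parity)."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Toy data (made up for illustration): two groups of four individuals each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
print("demographic parity gap:", gap(rates, "selection_rate"))
print("equal opportunity gap: ", gap(rates, "tpr"))
print("FPR gap:               ", gap(rates, "fpr"))
```

A gap of 0 on `selection_rate` corresponds to demographic parity, on `tpr` to equal opportunity, and on both `tpr` and `fpr` to equalized odds. Calibration needs scores rather than hard predictions, so it is not covered by this sketch.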
2. Individual fairness
- Similar individuals receive similar treatment.
- Defining "similar" is the hard part: it requires a task-specific similarity metric.
3. Counterfactual fairness
- The prediction stays unchanged if the protected attribute is changed.
- Requires a causal model.
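The individual-fairness criterion above (similar individuals, similar treatment) is often made checkable as a Lipschitz condition: the difference in scores between two individuals should not exceed a constant times their distance. The sketch below assumes Euclidean distance over numeric features as the similarity metric, which is exactly the hard-to-justify choice noted above; `lipschitz_violations` is a hypothetical helper.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Placeholder similarity metric; a real audit needs a task-specific one."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def lipschitz_violations(features, scores, lipschitz=1.0, distance=euclidean):
    """Return index pairs (i, j) with |score_i - score_j| > lipschitz * d(x_i, x_j)."""
    violations = []
    for i, j in combinations(range(len(features)), 2):
        if abs(scores[i] - scores[j]) > lipschitz * distance(features[i], features[j]):
            violations.append((i, j))
    return violations


# Toy data (made up): individuals 0 and 1 are nearly identical but scored very differently.
features = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
scores = [0.2, 0.9, 0.8]

violations = lipschitz_violations(features, scores)
print(violations)
```

The pairwise loop is quadratic in the number of individuals, so on larger datasets a sample of pairs is the more practical choice.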
Famous cases
COMPAS (recidivism)
- ProPublica 2016.
- The false positive rate for Black defendants was roughly 2x that for white defendants.
- Racial bias in the risk score.
Gender Shades (face recognition)
- Joy Buolamwini, Timnit Gebru 2018.
- Error rate of up to about 35% for darker-skinned women (vs under 1% for lighter-skinned men).
Amazon hiring AI (2018)
- Résumés containing women-related keywords were penalized.
- The model reproduced historical bias.
→ Abandoned.
Apple Card (2019)
- Women reportedly received lower credit limits than men with the same financial profile.
Healthcare risk score (2019)
- Black patients received lower risk scores than white patients with the same level of need.
- Cause: historical healthcare expenditure was used as a proxy for need (proxy bias).
Sources of bias
Data
- Historical: past discrimination encoded in the data.
- Representation: underrepresented groups.
- Measurement: signal quality differs per group.
Algorithm
- Objective function that favors the majority group.
- Feature selection.
- Hyperparameter tuning.
Deployment
- User feedback loops.
- Differential adoption.
- Contextual mismatch.
Mitigation strategies
Pre-processing (data)
- Reweight samples.
- Generate synthetic samples for the minority group.
- Remove the protected attribute (often insufficient: proxies still carry it).
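Sample reweighting is usually done by giving each (group, label) cell the weight P(group) · P(label) / P(group, label), so that group and label look statistically independent in the weighted data. The sketch below is dependency-free and only illustrates that formula; the function names and toy data are assumptions.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels inside one group."""
    total = sum(w for w, g in zip(weights, groups) if g == group)
    positive = sum(w for w, g, y in zip(weights, groups, labels) if g == group and y == 1)
    return positive / total


# Toy data (made up): group "m" has 3 of 4 positives, group "f" has 1 of 4.
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
print(weighted_positive_rate(groups, labels, weights, "m"))
print(weighted_positive_rate(groups, labels, weights, "f"))
```

After reweighting, both groups have the same weighted positive rate, so a learner that honors sample weights no longer sees the label imbalance between groups.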
In-processing (training)
- Add a fairness constraint to the loss.
- Adversarial debiasing.
- Prejudice remover.
Post-processing (output)
- Group-specific thresholds.
- Score calibration.
- Reject option classification.
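Group-specific thresholds can be illustrated without any library. The sketch below picks, for each group, the highest score threshold whose true positive rate still reaches a target, which is one simple way to aim at equal opportunity; `group_thresholds` and the toy scores are assumptions for illustration.

```python
def tpr_at(scores, labels, threshold):
    """True positive rate when predicting positive for score >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return float("nan")
    return sum(s >= threshold for s in positives) / len(positives)


def group_thresholds(scores, labels, groups, target_tpr):
    """For each group, the highest observed score whose TPR reaches target_tpr."""
    thresholds = {}
    for g in sorted(set(groups)):
        g_scores = [s for s, gg in zip(scores, groups) if gg == g]
        g_labels = [y for y, gg in zip(labels, groups) if gg == g]
        # Fallback: accept everyone in the group if no threshold reaches the target
        # (this also covers groups without any positive label).
        chosen = min(g_scores)
        for t in sorted(set(g_scores), reverse=True):
            if tpr_at(g_scores, g_labels, t) >= target_tpr:
                chosen = t
                break
        thresholds[g] = chosen
    return thresholds


# Toy data (made up): group "b" receives systematically lower scores.
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.4, 0.2]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

thresholds = group_thresholds(scores, labels, groups, target_tpr=1.0)
print(thresholds)
```

With a single shared threshold of 0.8, group "a" would keep a TPR of 1.0 while group "b" would drop to 0.0; the per-group thresholds remove that gap at the cost of treating identical scores differently across groups.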
Audit / measurement
Disparate impact
- 4/5 rule (US EEOC).
- A group's selection rate below 80% of the rate of the most-selected group signals potential discrimination.
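The 4/5 rule reduces to one ratio of selection rates. A minimal sketch, assuming decisions are encoded as 1 (selected) and 0 (not selected); the helper names and the numbers are made up for illustration.

```python
def selection_rate(decisions):
    """Share of favorable decisions (1 = selected, 0 = not selected)."""
    return sum(decisions) / len(decisions)


def disparate_impact_ratio(unprivileged, privileged):
    """Selection rate of the unprivileged group divided by that of the privileged group."""
    return selection_rate(unprivileged) / selection_rate(privileged)


def passes_four_fifths(unprivileged, privileged, threshold=0.8):
    """True when the ratio meets the 4/5 threshold."""
    return disparate_impact_ratio(unprivileged, privileged) >= threshold


# Toy data: 3 of 10 unprivileged applicants selected vs 5 of 10 privileged applicants.
unprivileged = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
privileged = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

ratio = disparate_impact_ratio(unprivileged, privileged)
print(f"ratio = {ratio:.2f}, passes 4/5 rule: {passes_four_fifths(unprivileged, privileged)}")
```

Here the ratio is 0.30 / 0.50 = 0.60, which falls below 0.8 and would therefore be flagged for review.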
AIF360 (IBM)
- 70+ fairness metrics.
- 9 mitigation algorithms.
- Open source.
Aequitas (Univ. Chicago)
- Audit toolkit.
Google What-If Tool
- Interactive exploration.
Regulation
- EU AI Act: bias checks for high-risk systems.
- NYC Local Law 144: annual bias audit of hiring AI.
- EEOC (US): employment discrimination.
- GDPR Article 22: human review of solely automated decisions.
Organizational practice
Pre-deployment
- Audit.
- Disparate impact analysis.
- Adversarial testing.
- Disclosure via a model card.
Production
- Monitoring.
- User feedback.
- Quarterly review.
Incident
- User complaint.
- Root cause analysis.
- Remediation.
💻 Code
Disparate impact (AIF360)
```python
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

# Wrap your dataset; `df` is an all-numeric pandas DataFrame
# that contains the 'hired' and 'gender' columns.
dataset = BinaryLabelDataset(
    df=df,
    label_names=['hired'],
    protected_attribute_names=['gender'],
    favorable_label=1,
    unfavorable_label=0,
)

# Pre-training metric: bias already present in the labels
metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
print(f"Disparate impact: {metric.disparate_impact()}")
# < 0.8 = potential bias (4/5 rule)

# After training: compare true labels against model predictions
classified = BinaryLabelDataset(...)  # same rows, labels replaced by predictions
clf_metric = ClassificationMetric(
    dataset, classified,
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
print(f"Equal opportunity diff: {clf_metric.equal_opportunity_difference()}")
print(f"Avg odds diff: {clf_metric.average_odds_difference()}")
```
Reweighting (pre-processing)
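```python
from aif360.algorithms.preprocessing import Reweighing

rw = Reweighing(
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
# Features and labels are unchanged; only dataset_rw.instance_weights differ.
dataset_rw = rw.fit_transform(dataset)

# Train on the reweighted data. `train` is a placeholder for your own training
# routine, which should pass dataset_rw.instance_weights as sample weights.
model = train(dataset_rw)
```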
Adversarial debiasing (in-processing)
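```python
from aif360.algorithms.inprocessing import AdversarialDebiasing
# AdversarialDebiasing uses the TensorFlow 1.x graph API;
# on TensorFlow 2 it needs the compat.v1 module.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
sess = tf.Session()

debiased = AdversarialDebiasing(
    privileged_groups=[{'gender': 1}],
    unprivileged_groups=[{'gender': 0}],
    scope_name='debiased',
    debias=True,
    sess=sess,
)
# dataset_train / dataset_test are BinaryLabelDataset splits
debiased.fit(dataset_train)
preds = debiased.predict(dataset_test)
sess.close()
```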
Threshold optimization (post-processing)
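```python
from aif360.algorithms.postprocessing import EqOddsPostprocessing

eq_odds = EqOddsPostprocessing(
    unprivileged_groups=[{'gender': 0}],
    privileged_groups=[{'gender': 1}],
)
# All arguments are BinaryLabelDataset objects: dataset_val holds the true
# labels, predictions_val / predictions_test hold the model's predicted labels.
eq_odds.fit(dataset_val, predictions_val)
predictions_balanced = eq_odds.predict(predictions_test)
```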
Fairness in CI
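```python
import numpy as np


class FairnessFailure(AssertionError):
    """Raised when the per-group accuracy gap exceeds the allowed disparity."""


def fairness_test(model, X_test, y_test, groups, max_disparity=0.05):
    """Fairness gate for every release."""
    groups = np.asarray(groups)
    accuracies = {}
    for group_value in np.unique(groups):
        mask = groups == group_value  # boolean mask selecting one group
        accuracies[group_value] = model.score(X_test[mask], y_test[mask])
    disparity = max(accuracies.values()) - min(accuracies.values())
    if disparity > max_disparity:
        raise FairnessFailure(f"Disparity: {disparity:.2%}")
    return accuracies
```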
Counterfactual test
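```python
def counterfactual_test(model, instance, protected_attr='gender'):
    """Check whether flipping a binary (0/1) protected attribute changes the prediction.

    Assumes `instance` is dict-like (e.g. a dict or pandas Series) and that
    `model.predict` accepts a list of such instances. This is a naive flip:
    it does not propagate the change to correlated proxy features.
    """
    pred_original = model.predict([instance])[0]
    flipped = instance.copy()
    flipped[protected_attr] = 1 - flipped[protected_attr]
    pred_flipped = model.predict([flipped])[0]
    if pred_original != pred_flipped:
        return f"Bias detected: {protected_attr} flip changes prediction"
    return None  # prediction unchanged by the flip
```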
🤔 Decision criteria
| Risk level | Mitigation |
|---|---|
| Low (spam filter) | Audit log + monitor |
| Medium (recommendation) | + Disparate impact check |
| High (hiring, lending) | + Pre/in/post-processing |
| Critical (criminal justice, medical) | + Strict regulation + human review |
Default: 4/5 rule check + per-group accuracy + counterfactual test + disclosure.
🔗 Graph
- Parent: AI-Ethics · AI Accountability
- Variant: Group-Fairness
🤖 LLM usage
When: deployment review of an ML system, audits, design of high-risk systems.
When not: specific legal advice (ask a lawyer), specific implementation details.
❌ Anti-patterns
- "Just remove the protected attribute": bias persists through proxies.
- Single fairness metric: ignores the trade-offs between criteria.
- No audit: silent bias.
- Trusting historical data: amplifies past discrimination.
- Fixing only disparate impact: individuals can still be treated unfairly.
🧪 Verification / duplicates
- Verified.
- Trust level B (academic + industry consensus).
- Related: AI Accountability · AI 거버넌스 정책(AI Usage Policy).
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-09 | Manual cleanup: fairness types + famous cases + AIF360 code + decision criteria |