---
id: wiki-2026-0508-ethics-ai
title: Ethics & AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [AI ethics, responsible AI, AI safety, alignment, fairness, bias, EU AI Act]
duplicate_of: none
source_trust_level: A
confidence_score: 0.97
verification_status: applied
tags: [ethics, ai-ethics, alignment, safety, bias, fairness, responsibility, eu-ai-act]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Universal
  applicable_to: [AI Development, Policy, Governance]
---

# Ethics & AI

## One-line summary

> **"The normative considerations in designing, deploying, and governing AI."** Fairness, accountability, transparency, and safety. Modern reference points: EU AI Act, NIST AI RMF, Anthropic Constitutional AI. A triad of alignment, capability, and governance.

## Core concepts

### Pillars

- **Fairness**: mitigating bias.
- **Accountability**: who is responsible.
- **Transparency / Explainability**.
- **Privacy**.
- **Safety**: harm prevention.
- **Robustness**.
- **Human autonomy**.

### Alignment

- **RLHF**: learning from human preferences.
- **Constitutional AI** (Anthropic): principle-based.
- **DPO / KTO**: RLHF alternatives.
- **Scalable oversight**: debate, IDA.
- **Honest / harmless / helpful** (HHH).

### Frameworks

- **EU AI Act** (2024): risk tiers.
- **NIST AI RMF**: govern, map, measure, manage.
- **OECD AI Principles**.
- **ISO/IEC 42001**: AI management system (AIMS).
- **GDPR** (privacy).
- **Algorithmic Accountability Act** (US, proposed).

### Risk tiers (EU AI Act)

- **Unacceptable**: social scoring, mass biometric surveillance.
- **High-risk**: hiring, credit, education, AV.
- **Limited risk**: chatbots (disclosure obligation).
- **Minimal**: spam filters.

### Applied issues

1. **Hiring**: disparate impact.
2. **Credit**: redlining.
3. **Healthcare**: race-based prediction.
4. **Justice**: COMPAS bias.
5. **Generative**: deepfakes, copyright.
6. **Surveillance**: mass surveillance.
7. **Autonomous**: trolley problem.

### Modern developments (2024-2026)

- **Anthropic RSP** (Responsible Scaling Policy).
- **OpenAI Preparedness**.
- **Frontier model evaluations** (METR, Apollo).
- **AI safety institute** (UK, US).
- **AI Bill of Rights** (US OSTP).

## 💻 Patterns

### Fairness audit (demographic parity)

```python
import numpy as np

def demographic_parity_diff(predictions, protected_attr):
    """Gap between the highest and lowest positive-prediction rate across groups."""
    predictions = np.asarray(predictions)
    protected_attr = np.asarray(protected_attr)
    groups = np.unique(protected_attr)
    rates = [predictions[protected_attr == g].mean() for g in groups]
    return max(rates) - min(rates)

# A difference below 0.05 is used here as a heuristic threshold.
# This is not the four-fifths (80%) rule, which tests the ratio
# min(rates) / max(rates) >= 0.8 rather than the difference.
```

### Equalized odds

```python
import numpy as np

def equalized_odds(predictions, labels, protected):
    """Per-group (TPR, FPR); equalized odds holds when both match across groups."""
    predictions, labels, protected = map(np.asarray, (predictions, labels, protected))
    metrics = {}
    for g in np.unique(protected):
        mask = protected == g
        # max(1, ...) avoids division by zero for groups with no positives/negatives.
        tpr = ((predictions == 1) & (labels == 1) & mask).sum() / max(1, ((labels == 1) & mask).sum())
        fpr = ((predictions == 1) & (labels == 0) & mask).sum() / max(1, ((labels == 0) & mask).sum())
        metrics[g] = (tpr, fpr)
    return metrics
```

### Bias mitigation (reweighing)

```python
import numpy as np

def reweighing(X, y, protected):
    """Kamiran-Calders 2012: weight = P(group) * P(class) / P(group, class)."""
    y, protected = np.asarray(y), np.asarray(protected)
    weights = np.ones(len(y))
    for g in np.unique(protected):
        for c in [0, 1]:
            mask = (protected == g) & (y == c)
            p_expected = (protected == g).mean() * (y == c).mean()
            p_observed = mask.mean()
            # Empty (group, class) cells leave no rows to update.
            weights[mask] = p_expected / max(p_observed, 1e-9)
    return weights
```

### Constitutional AI (principle-based)

```python
import json

# `llm` and `format_principles` are assumed to be defined elsewhere.
def cai_critique(response, principles):
    prompt = f"""Critique this response against these principles.

Principles:
{format_principles(principles)}

Response:
{response}

Output JSON with:
- violated: list of principle IDs
- explanation
- revised_response"""
    return json.loads(llm.generate(prompt))
```

### Differential privacy (DP-SGD)

```python
from opacus import PrivacyEngine

# `model`, `optim`, and `loader` are assumed to be an existing
# torch.nn.Module, optimizer, and DataLoader.
privacy_engine = PrivacyEngine()
model, optim, loader = privacy_engine.make_private(
    module=model,
    optimizer=optim,
    data_loader=loader,
    noise_multiplier=1.1,  # scale of Gaussian noise added to clipped gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)
```

### Model card

```yaml
model_name: credit-scoring-v3
intended_use: Adult US credit applications, $1k-$50k unsecured
out_of_scope:
  - Outside US
  - Under 18
  - Loans > $50k
training_data:
  source: 2018-2024 internal
  size: 2.4M
  protected_attribute_audit: completed
fairness_metrics:
  demographic_parity_diff: 0.034
  equalized_odds_diff: 0.041
limitations:
  - Decreased performance on thin-file applicants
  - Quarterly retraining required
```

### Provenance (C2PA, watermark)

```python
from datetime import datetime, timezone

from c2pa import Signer

# Illustrative only: the Signer(...).sign(...) call shape below is simplified
# and may not match the actual c2pa library API; check its documentation.
def attach_provenance(image_path, signer_cert):
    Signer(signer_cert).sign(image_path, claims={
        'generator': 'Anthropic Claude Opus 4.7',
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'training_data_redacted': True,
    })
```

### Red-teaming

```python
# `generate_attacks` and `judge` are assumed helpers defined elsewhere.
def adversarial_eval(model, attack_categories):
    attacks = []
    for cat in attack_categories:  # e.g. jailbreak, bias, harmful content, hallucination
        prompts = generate_attacks(cat, n=100)
        for p in prompts:
            r = model.generate(p)
            score = judge(r, cat)
            attacks.append({'cat': cat, 'prompt': p, 'response': r, 'severity': score})
    return attacks
```

### Risk tier classifier (EU AI Act)

```python
def eu_risk_tier(use_case):
    """Coarse lookup only; real classification depends on context and exemptions."""
    # Emotion recognition in the workplace is a prohibited practice (Article 5),
    # so it belongs in the unacceptable tier rather than the limited tier.
    if use_case in {'social_scoring', 'real_time_remote_biometric',
                    'emotion_recognition_workplace'}:
        return 'unacceptable'
    if use_case in {'hiring', 'credit', 'education', 'critical_infra', 'law_enforcement'}:
        return 'high'
    if use_case in {'chatbot', 'deepfake'}:
        return 'limited'
    return 'minimal'
```

### Consent (GDPR)

```python
# `has_legitimate_interest` and `delete_data` are assumed helpers defined elsewhere.
def can_process(user, purpose):
    consent = user.consent.get(purpose)  # no record means no consent
    if consent is not None and consent.is_valid():
        return True
    if has_legitimate_interest(purpose):
        return True
    return False

def revoke_consent(user, purpose):
    user.consent[purpose].revoke()
    # Deletes the data held for this purpose once consent is withdrawn.
    delete_data(user, purpose)
```

### Disclosure (chatbot)

```typescript
function chatGreeting(): string {
  return "Hi! I'm an AI assistant. I can make mistakes, so please verify important info.";
}
```

### Incident reporting

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal

# `notify_safety_team` and `log_to_registry` are assumed helpers defined elsewhere.
@dataclass
class AIIncident:
    timestamp: datetime
    model: str
    severity: Literal['low', 'medium', 'high', 'critical']
    category: str  # e.g. hallucination, bias, jailbreak, harm
    description: str
    affected_users: int
    root_cause: str
    mitigation: str

    def report(self):
        # High and critical incidents are escalated; every incident is logged.
        if self.severity in ('high', 'critical'):
            notify_safety_team(self)
        log_to_registry(self)
```

## Decision criteria

| Situation | Approach |
|---|---|
| High-risk EU | Full conformity assessment |
| Hiring / credit | Fairness audit + monitoring |
| Generative | Watermark + content provenance |
| LLM | Constitutional + RLHF + red-team |
| Privacy-sensitive | DP / federated |
| Chatbot | Disclosure + safety filter |

**Default**: model card + fairness audit + red-team + incident reporting + EU AI Act risk-tier compliance.

## 🔗 Graph

- Parent: [[AI]]
- Variants: [[AI-Safety]] · [[AI_Safety_and_Alignment|AI-Alignment]] · [[Algorithmic-Fairness]] · [[Ethics & AI|Ethics of Autonomous Systems]]
- Applications: [[EU-AI-Act]] · [[NIST-AI-RMF]] · [[AI_Safety_and_Alignment|Constitutional-AI]]
- Adjacent: [[Differential-Privacy]] · [[RLHF]]

## 🤖 LLM usage

**When to use**: every AI deployment, product launch, and governance review.
**When not to use**: academic toy projects.

## ❌ Anti-patterns

- **Ethics-as-PR**: statements only, with no process behind them.
- **Single fairness metric**: ignores the trade-offs between metrics.
- **No red-team**: jailbreaks come as a surprise.
- **No incident process**: nothing is learned from failures.
- **Ignore EU AI Act high-risk**: fines and bans.

## 🧪 Verification / duplicates

- Verified (EU AI Act 2024, NIST AI RMF 1.0, Anthropic RSP, Constitutional AI paper).
- Trust level A.
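
The "single fairness metric" anti-pattern can be made concrete with a small sketch. It is a minimal illustration on made-up toy data, not part of the audited patterns above: the helper names `selection_rates`, `disparate_impact_ratio`, and `tpr_gap` are hypothetical, and the 0.8 cutoff is the four-fifths heuristic rather than a legal test. The point is that a model can pass a selection-rate check while still treating qualified members of one group worse.

```python
from collections import defaultdict

def selection_rates(predictions, protected):
    """Share of positive predictions (1s) within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, protected):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(predictions, protected):
    """Lowest group selection rate divided by the highest.

    The four-fifths heuristic flags values below 0.8.
    """
    rates = list(selection_rates(predictions, protected).values())
    highest = max(rates)
    return min(rates) / highest if highest > 0 else 1.0

def tpr_gap(predictions, labels, protected):
    """Largest between-group difference in true-positive rate."""
    # Restricted to truly positive cases, the selection rate is the TPR.
    pos = [(p, g) for p, y, g in zip(predictions, labels, protected) if y == 1]
    tprs = list(selection_rates([p for p, _ in pos], [g for _, g in pos]).values())
    return max(tprs) - min(tprs) if tprs else 0.0

# Toy data (invented for illustration): both groups are selected at the same rate,
# but group "b" has one qualified applicant rejected and one unqualified accepted.
protected   = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels      = [1, 1, 0, 0, 1, 1, 0, 0]
predictions = [1, 1, 0, 0, 1, 0, 1, 0]

ratio = disparate_impact_ratio(predictions, protected)
gap = tpr_gap(predictions, labels, protected)
print(ratio, gap)  # 1.0 0.5
```

Here the selection-rate ratio is perfect while the true-positive-rate gap is large, which is why the default above pairs several fairness metrics with ongoing monitoring.
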
## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: pillars + fairness / DP / model card / red-team / risk-tier code |