2nd/10_Wiki/Topics/AI_and_ML/Ethics & AI.md
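koriweb d8a80f6272 chore(wiki): normalize dangling links to canonical titles (768 files / 1,200 links)
Replaced [[wikilinks]] that differed only in name (spelling variants) with the canonical title of the target document,
reconnecting 1,200 broken links. Only normalized title/filename matches were applied; alias matching was
excluded because of over-merge risk (ambiguity guard). Originals are backed up in _link_reconcile_backup/.
Tool: Datacollect/scripts/link_reconcile_apply.mjs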

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:24:15 +09:00


id: wiki-2026-0508-ethics-ai
title: Ethics & AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: AI ethics, responsible AI, AI safety, alignment, fairness, bias, EU AI Act
duplicate_of: none
source_trust_level: A
confidence_score: 0.97
verification_status: applied
tags: ethics, ai-ethics, alignment, safety, bias, fairness, responsibility, eu-ai-act
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Universal
  applicable_to: AI Development, Policy, Governance

Ethics & AI

In one line

"The normative considerations in designing, deploying, and governing AI": fairness, accountability, transparency, and safety. Current reference points: the EU AI Act, the NIST AI RMF, and Anthropic's Constitutional AI. The field rests on a triad of alignment, capability, and governance.

Core concepts

Pillars

  • Fairness: mitigate bias.
  • Accountability: establish who is responsible.
  • Transparency / explainability.
  • Privacy.
  • Safety: prevent harm.
  • Robustness.
  • Human autonomy.

Alignment

  • RLHF: learn from human preferences.
  • Constitutional AI (Anthropic): principle-based.
  • DPO / KTO: alternatives to RLHF.
  • Scalable oversight: debate, IDA (iterated distillation and amplification).
  • Helpful, honest, harmless (HHH).

Frameworks

  • EU AI Act (2024): risk tiers.
  • NIST AI RMF: Govern, Map, Measure, Manage.
  • OECD AI Principles.
  • ISO/IEC 42001: AI management system (AIMS).
  • GDPR (privacy).
  • Algorithmic Accountability Act (US, proposed).

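Risk tiers (EU AI Act)

  • Unacceptable: social scoring, mass biometric surveillance (for example, untargeted scraping of facial images).
  • High-risk: hiring, credit, education, autonomous vehicles (AV).
  • Limited risk: chatbots must disclose that the user is talking to an AI.
  • Minimal: spam filters.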

Applied issues

  1. Hiring: disparate impact.
  2. Credit: redlining.
  3. Healthcare: race-based prediction.
  4. Justice: COMPAS bias.
  5. Generative: deepfakes, copyright.
  6. Surveillance: mass surveillance.
  7. Autonomous systems: trolley-problem dilemmas.

Recent developments (2024-2026)

  • Anthropic RSP (Responsible Scaling Policy).
  • OpenAI Preparedness Framework.
  • Frontier model evaluations (METR, Apollo Research).
  • AI safety institutes (UK, US).
  • Blueprint for an AI Bill of Rights (US OSTP).

💻 Patterns

Fairness audit (demographic parity)

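import numpy as np

def demographic_parity_diff(predictions, protected_attr):
    """Largest gap in positive-prediction rate between any two groups."""
    predictions = np.asarray(predictions)
    protected_attr = np.asarray(protected_attr)
    groups = np.unique(protected_attr)
    rates = [predictions[protected_attr == g].mean() for g in groups]
    return max(rates) - min(rates)

# A gap below 0.05 is a common rule-of-thumb threshold. It is not the 80% (four-fifths)
# rule, which compares the ratio of selection rates: min(rates) / max(rates) >= 0.8.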
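
The four-fifths heuristic is a ratio test, so it needs its own metric next to the difference above. A minimal sketch, assuming binary 0/1 predictions; the name `disparate_impact_ratio` and the toy data are illustrative and not part of the original note.

```python
import numpy as np

def disparate_impact_ratio(predictions, protected_attr):
    """Ratio of the lowest to the highest group selection rate."""
    predictions = np.asarray(predictions, dtype=float)
    protected_attr = np.asarray(protected_attr)
    rates = [predictions[protected_attr == g].mean() for g in np.unique(protected_attr)]
    highest = max(rates)
    # If no group receives any positive prediction, there is no disparity to measure.
    return 1.0 if highest == 0 else min(rates) / highest

# Toy data (made up): group "a" is selected 50% of the time, group "b" 25%.
preds = np.array([1, 1, 0, 0, 1, 0, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

ratio = disparate_impact_ratio(preds, group)
passes_four_fifths = ratio >= 0.8
```

Here the ratio is 0.5, so the toy data fails the four-fifths heuristic even though the check is only a screening signal, not a legal determination.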

Equalized odds

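import numpy as np

def equalized_odds(predictions, labels, protected):
    """Per-group (TPR, FPR); equalized odds holds when both match across groups."""
    predictions, labels, protected = map(np.asarray, (predictions, labels, protected))
    groups = np.unique(protected)
    metrics = {}
    for g in groups:
        mask = protected == g
        # max(1, ...) avoids division by zero when a group has no positives or negatives.
        tpr = ((predictions == 1) & (labels == 1) & mask).sum() / max(1, ((labels == 1) & mask).sum())
        fpr = ((predictions == 1) & (labels == 0) & mask).sum() / max(1, ((labels == 0) & mask).sum())
        metrics[g] = (tpr, fpr)
    return metrics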
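
Per-group rates are often collapsed into a single number for reporting. One common convention, assumed here, takes the larger of the TPR gap and the FPR gap across groups. The helper names and toy data below are illustrative.

```python
import numpy as np

def group_rates(predictions, labels, protected):
    """Return {group: (TPR, FPR)} for binary predictions and labels."""
    predictions, labels, protected = map(np.asarray, (predictions, labels, protected))
    rates = {}
    for g in np.unique(protected):
        mask = protected == g
        pos = (labels == 1) & mask
        neg = (labels == 0) & mask
        tpr = ((predictions == 1) & pos).sum() / max(1, pos.sum())
        fpr = ((predictions == 1) & neg).sum() / max(1, neg.sum())
        rates[g] = (tpr, fpr)
    return rates

def equalized_odds_diff(predictions, labels, protected):
    """Larger of the TPR gap and the FPR gap across groups."""
    rates = group_rates(predictions, labels, protected)
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return float(max(max(tprs) - min(tprs), max(fprs) - min(fprs)))

# Toy data (made up): the classifier is perfect on group "a" and random-like on group "b".
labels = np.array([1, 1, 0, 0, 1, 1, 0, 0])
preds = np.array([1, 1, 0, 0, 1, 0, 1, 0])
group = np.array(["a"] * 4 + ["b"] * 4)

gap = equalized_odds_diff(preds, labels, group)
```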

Bias mitigation (reweighing)

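import numpy as np

def reweighing(X, y, protected):
    """Sample weights per Kamiran and Calders (2012): expected / observed joint frequency."""
    y, protected = np.asarray(y), np.asarray(protected)
    weights = np.ones(len(y))  # X is unused; it is kept for interface symmetry
    for g in np.unique(protected):
        for c in [0, 1]:
            mask = (protected == g) & (y == c)
            # Frequency expected if group and label were independent.
            p_expected = (protected == g).mean() * (y == c).mean()
            p_observed = mask.mean()
            weights[mask] = p_expected / max(p_observed, 1e-9)
    return weights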
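
A quick way to sanity-check reweighing is to confirm that the weighted positive rate becomes the same in every group. The sketch below restates the weighting in self-contained form; `reweighing_weights`, `weighted_positive_rate`, and the toy data are illustrative names and values.

```python
import numpy as np

def reweighing_weights(y, protected):
    """Weight = expected joint frequency under independence / observed joint frequency."""
    y, protected = np.asarray(y), np.asarray(protected)
    weights = np.ones(len(y), dtype=float)
    for g in np.unique(protected):
        for c in np.unique(y):
            mask = (protected == g) & (y == c)
            if mask.any():
                expected = (protected == g).mean() * (y == c).mean()
                weights[mask] = expected / mask.mean()
    return weights

def weighted_positive_rate(y, weights, mask):
    """Weighted share of positive labels among the rows selected by mask."""
    return float(np.average(y[mask], weights=weights[mask]))

# Toy data (made up): raw positive rate is 75% in group "a" and 25% in group "b".
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array(["a"] * 4 + ["b"] * 4)

w = reweighing_weights(y, group)
rate_a = weighted_positive_rate(y, w, group == "a")
rate_b = weighted_positive_rate(y, w, group == "b")
```

After weighting, both groups show a 50% positive rate, matching the overall base rate. The weights can then be passed to any learner that accepts per-sample weights.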

Constitutional AI (principle-based)

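import json

# `llm` is assumed to be a client object exposing generate(prompt) -> str;
# it is not defined in this note.

def format_principles(principles):
    """Render principles, assumed here to be a {principle_id: text} mapping."""
    return "\n".join(f"- {pid}: {text}" for pid, text in principles.items())

def cai_critique(response, principles):
    prompt = f"""Critique this response against these principles.
Principles:
{format_principles(principles)}

Response: {response}

Output JSON with:
- violated: list of principle IDs
- explanation
- revised_response"""
    # json.loads raises a ValueError if the model does not return valid JSON.
    return json.loads(llm.generate(prompt))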
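
Model output is not guaranteed to be valid JSON, so a critique step usually needs a defensive parser and a bounded revise loop. The sketch below is an inference-time illustration only, not Anthropic's training procedure; `parse_critique`, `critique_and_revise`, and the stub `fake_critic` are hypothetical names.

```python
import json

REQUIRED_KEYS = {"violated", "explanation", "revised_response"}

def parse_critique(raw_text):
    """Parse a critique reply; return None when it is not usable."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    if not isinstance(data["violated"], list):
        return None
    return data

def critique_and_revise(response, critique_fn, max_rounds=3):
    """Apply revisions until no principle is flagged or the round budget runs out."""
    for _ in range(max_rounds):
        critique = parse_critique(critique_fn(response))
        if critique is None or not critique["violated"]:
            break
        response = critique["revised_response"]
    return response

def fake_critic(response):
    """Stub standing in for an LLM call: flags overclaiming of certainty."""
    if "guaranteed" in response:
        return json.dumps({
            "violated": ["honesty"],
            "explanation": "Overclaims certainty.",
            "revised_response": response.replace("guaranteed", "likely"),
        })
    return json.dumps({"violated": [], "explanation": "", "revised_response": response})

final = critique_and_revise("This investment is guaranteed to grow.", fake_critic)
```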

Differential privacy (DP-SGD)

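from opacus import PrivacyEngine

# `model`, `optim`, and `loader` are an existing torch.nn.Module, its optimizer,
# and its DataLoader, defined elsewhere.
privacy_engine = PrivacyEngine()
model, optim, loader = privacy_engine.make_private(
    module=model,
    optimizer=optim,
    data_loader=loader,
    noise_multiplier=1.1,  # Gaussian noise std, relative to max_grad_norm
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)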
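
What the two parameters control can be seen in a plain NumPy sketch of a single DP-SGD gradient step: clip each per-sample gradient to `max_grad_norm`, sum, add Gaussian noise scaled by `noise_multiplier * max_grad_norm`, then average. This is a simplified illustration under those assumptions, not Opacus internals, and it does no privacy accounting.

```python
import numpy as np

def dp_sgd_gradient(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """Clip per-sample gradients, add Gaussian noise to their sum, and average."""
    grads = np.asarray(per_sample_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale down any gradient whose L2 norm exceeds the clipping bound.
    clip_factors = np.minimum(1.0, max_grad_norm / np.maximum(norms, 1e-12))
    clipped = grads * clip_factors
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(grads)

rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0],    # norm 5.0, clipped down to norm 1.0
                  [0.3, 0.4]])   # norm 0.5, left unchanged

noiseless = dp_sgd_gradient(grads, max_grad_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = dp_sgd_gradient(grads, max_grad_norm=1.0, noise_multiplier=1.1, rng=rng)
```

With the noise switched off, the result is the mean of the clipped gradients, which makes the clipping effect easy to inspect in isolation.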

Model card

model_name: credit-scoring-v3
intended_use: Adult US credit applications, $1k-$50k unsecured
out_of_scope:
  - Outside US
  - Under 18
  - Loans > $50k
training_data:
  source: 2018-2024 internal
  size: 2.4M
  protected_attribute_audit: completed
  fairness_metrics:
    demographic_parity_diff: 0.034
    equalized_odds_diff: 0.041
limitations:
  - Decreased performance on thin-file applicants
  - Quarterly retraining required
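
A model card is only useful if it is complete, so a release pipeline can check for missing sections. A minimal sketch, assuming the card has already been loaded into a dict; the required-field list mirrors the top-level keys above and is otherwise an assumption.

```python
REQUIRED_FIELDS = ("model_name", "intended_use", "out_of_scope", "training_data", "limitations")

def missing_model_card_fields(card):
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not card.get(name)]

# Draft card with the limitations section still empty.
draft_card = {
    "model_name": "credit-scoring-v3",
    "intended_use": "Adult US credit applications, $1k-$50k unsecured",
    "out_of_scope": ["Outside US", "Under 18", "Loans > $50k"],
    "training_data": {"source": "2018-2024 internal", "size": "2.4M"},
    "limitations": [],
}

missing = missing_model_card_fields(draft_card)
```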

Provenance (C2PA, watermark)

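from datetime import datetime, timezone

# Illustrative pseudocode: `signer` stands in for a C2PA signing object. The
# sign(path, claims=...) interface is hypothetical, not the real c2pa-python API.
def attach_provenance(image_path, signer):
    signer.sign(image_path, claims={
        'generator': 'Anthropic Claude Opus 4.7',
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'training_data_redacted': True,
    })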

Red-teaming

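# `generate_attacks` and `judge` are placeholders for an attack-prompt generator
# and a severity scorer; neither is defined in this note.
def adversarial_eval(model, attack_categories):
    """Run generated attack prompts against a model and record the judged severity."""
    attacks = []
    for cat in attack_categories:  # e.g. jailbreak, bias, harmful, hallucination
        prompts = generate_attacks(cat, n=100)
        for p in prompts:
            r = model.generate(p)
            score = judge(r, cat)
            attacks.append({'cat': cat, 'prompt': p, 'response': r, 'severity': score})
    return attacks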
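
The raw attack log becomes actionable once it is summarized per category. The sketch below computes an attack success rate, assuming `severity` is a number in [0, 1] and that 0.5 is the cut-off for a successful attack; both are assumptions, since the note does not define the judge's scale.

```python
from collections import defaultdict

def summarize_attacks(attacks, severity_threshold=0.5):
    """Share of attacks per category whose severity meets the threshold."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for attack in attacks:
        totals[attack["cat"]] += 1
        if attack["severity"] >= severity_threshold:
            hits[attack["cat"]] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}

# Hand-written records in the same shape as the red-teaming log.
sample_log = [
    {"cat": "jailbreak", "prompt": "p1", "response": "r1", "severity": 0.9},
    {"cat": "jailbreak", "prompt": "p2", "response": "r2", "severity": 0.1},
    {"cat": "bias", "prompt": "p3", "response": "r3", "severity": 0.2},
]

summary = summarize_attacks(sample_log)
```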

Risk tier classifier (EU AI Act)

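def eu_risk_tier(use_case):
    """Simplified lookup; real classification depends on context and legal review."""
    # Prohibited practices include emotion recognition in the workplace.
    if use_case in {'social_scoring', 'real_time_remote_biometric', 'emotion_recognition_workplace'}:
        return 'unacceptable'
    if use_case in {'hiring', 'credit', 'education', 'critical_infra', 'law_enforcement'}:
        return 'high'
    # Transparency obligations: disclose AI interaction and label synthetic media.
    if use_case in {'chatbot', 'deepfake'}:
        return 'limited'
    return 'minimal'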
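
Consent check (privacy)

# `has_legitimate_interest` and `delete_data` are placeholders defined elsewhere.
def can_process(user, purpose):
    """Allow processing only with valid consent or another lawful basis."""
    consent = user.consent.get(purpose)
    if consent is not None and consent.is_valid():
        return True
    return has_legitimate_interest(purpose)

def revoke_consent(user, purpose):
    """Withdraw consent and delete the data collected for that purpose."""
    user.consent[purpose].revoke()
    delete_data(user, purpose)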
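
The consent object used above needs `is_valid()` and `revoke()`. One possible shape, assuming consent can expire and that timestamps are timezone-aware UTC; the `ConsentRecord` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    granted_at: datetime
    expires_at: Optional[datetime] = None
    revoked_at: Optional[datetime] = None

    def is_valid(self, now=None):
        """Consent is valid if it has not been revoked and has not expired."""
        now = now or datetime.now(timezone.utc)
        if self.revoked_at is not None and self.revoked_at <= now:
            return False
        return self.expires_at is None or now < self.expires_at

    def revoke(self, now=None):
        """Record the moment of withdrawal; validity checks fail from then on."""
        self.revoked_at = now or datetime.now(timezone.utc)

granted = datetime(2026, 1, 1, tzinfo=timezone.utc)
record = ConsentRecord(granted_at=granted, expires_at=granted + timedelta(days=365))

valid_before = record.is_valid(granted + timedelta(days=1))
record.revoke(granted + timedelta(days=2))
valid_after = record.is_valid(granted + timedelta(days=3))
```

Passing `now` explicitly keeps the checks deterministic in tests; production callers can omit it.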

Disclosure (chatbot)

function chatGreeting() {
  return "Hi! I'm an AI assistant. I can make mistakes — please verify important info.";
}

Incident reporting

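from dataclasses import dataclass
from datetime import datetime
from typing import Literal

# `notify_safety_team` and `log_to_registry` are placeholders for team-specific hooks.
@dataclass
class AIIncident:
    timestamp: datetime
    model: str
    severity: Literal['low', 'medium', 'high', 'critical']
    category: str  # e.g. hallucination, bias, jailbreak, harm
    description: str
    affected_users: int
    root_cause: str
    mitigation: str

    def report(self):
        """Escalate severe incidents, then record every incident in the registry."""
        if self.severity in ('high', 'critical'):
            notify_safety_team(self)
        log_to_registry(self)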

Decision criteria

| Situation | Approach |
| --- | --- |
| High-risk (EU) | Full conformity assessment |
| Hiring / credit | Fairness audit + monitoring |
| Generative | Watermark + content provenance |
| LLM | Constitutional + RLHF + red-team |
| Privacy-sensitive | DP / federated |
| Chatbot | Disclosure + safety filter |

Default: model card + fairness audit + red-team + incident reporting + EU AI Act risk-tier compliance.

🔗 Graph

🤖 LLM usage

When to use: every AI deployment, product launch, and governance review. When not to use: academic toy projects.

Anti-patterns

  • Ethics-as-PR: statements only, with no practice behind them.
  • Single fairness metric: ignores the trade-offs between metrics.
  • No red-team: jailbreaks come as a surprise.
  • No incident process: nothing is learned from failures.
  • Ignoring EU AI Act high-risk obligations: fines and bans.
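
The single-metric anti-pattern is easiest to see with numbers. In the toy example below (made-up data), the two groups have different base rates, so even a perfectly accurate classifier satisfies equalized odds while showing a large demographic parity gap. The `rate` helper is an illustrative name.

```python
import numpy as np

def rate(values, mask):
    """Mean of values over the rows selected by mask (0.0 if none are selected)."""
    return float(values[mask].mean()) if mask.any() else 0.0

# Base rate of positive labels: 75% in group "a", 25% in group "b".
labels = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array(["a"] * 4 + ["b"] * 4)
preds = labels.copy()  # a perfectly accurate classifier

in_a, in_b = group == "a", group == "b"
dp_gap = abs(rate(preds, in_a) - rate(preds, in_b))
tpr_gap = abs(rate(preds, in_a & (labels == 1)) - rate(preds, in_b & (labels == 1)))
fpr_gap = abs(rate(preds, in_a & (labels == 0)) - rate(preds, in_b & (labels == 0)))
```

The TPR and FPR gaps are both zero while the demographic parity gap is 0.5, so a report that quotes only one of these metrics would give a misleading picture in either direction.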

🧪 Verification / duplicates

  • Verified (EU AI Act 2024, NIST AI RMF 1.0, Anthropic RSP, Constitutional AI paper).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: pillars + fairness / DP / model card / red-team / risk-tier code |