| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-ethics-ai | Ethics & AI | 10_Wiki/Topics | verified | self | | none | A | 0.97 | applied | | | 2026-05-10 | pending | |
Ethics & AI
One-line summary
"The normative considerations in designing, deploying, and governing AI": fairness, accountability, transparency, and safety. Current reference points are the EU AI Act, the NIST AI RMF, and Anthropic's Constitutional AI. The field rests on a triad of alignment, capability, and governance.
Core
Pillars
- Fairness: mitigating bias.
- Accountability: establishing who is responsible.
- Transparency / Explainability.
- Privacy.
- Safety: harm prevention.
- Robustness.
- Human autonomy.
Alignment
- RLHF: learning from human preferences.
- Constitutional AI (Anthropic): principle-based feedback.
- DPO / KTO: alternatives to RLHF.
- Scalable oversight: debate, IDA (iterated distillation and amplification).
- Helpful, honest, harmless (HHH).
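The DPO entry above replaces the reward model and RL loop of RLHF with a single classification-style loss over preference pairs. A minimal NumPy sketch follows, assuming the summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model are already available. The function name, argument names, and `beta=0.1` are illustrative assumptions, not taken from this note.

```python
import numpy as np

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Mean DPO loss over a batch of (chosen, rejected) preference pairs."""
    # Log-ratio of policy to reference for each response.
    chosen_ratio = np.asarray(policy_logp_chosen) - np.asarray(ref_logp_chosen)
    rejected_ratio = np.asarray(policy_logp_rejected) - np.asarray(ref_logp_rejected)
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    return float(np.mean(np.logaddexp(0.0, -margin)))
```

The loss falls as the policy raises the chosen response's probability relative to the reference more than it raises the rejected one's; when the policy equals the reference, the loss is log 2.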
Frameworks
- EU AI Act (2024): risk tiers.
- NIST AI RMF: Govern, Map, Measure, Manage.
- OECD AI Principles.
- ISO/IEC 42001: AI management system (AIMS).
- GDPR (privacy).
- Algorithmic Accountability Act (US, proposed).
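Risk tiers (EU AI Act)
- Unacceptable: social scoring, mass biometric surveillance (for example untargeted scraping of facial images).
- High-risk: hiring, credit, education, autonomous vehicles (AV).
- Limited risk: chatbots, which must disclose that the user is talking to an AI.
- Minimal: spam filters.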
Applied issues
- Hiring: disparate impact.
- Credit: redlining.
- Healthcare: race-based prediction.
- Justice: bias in COMPAS risk scores.
- Generative: deepfakes, copyright.
- Surveillance: mass surveillance.
- Autonomous systems: trolley-problem dilemmas.
Recent developments (2024-2026)
- Anthropic RSP (Responsible Scaling Policy).
- OpenAI Preparedness Framework.
- Frontier model evaluations (METR, Apollo Research).
- AI safety institutes (UK, US).
- Blueprint for an AI Bill of Rights (US OSTP).
💻 Patterns
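Fairness audit (demographic parity)
import numpy as np

def demographic_parity_diff(predictions, protected_attr):
    """Gap between the highest and lowest positive-prediction rate across groups."""
    predictions = np.asarray(predictions)
    protected_attr = np.asarray(protected_attr)
    groups = np.unique(protected_attr)
    rates = [predictions[protected_attr == g].mean() for g in groups]
    return max(rates) - min(rates)

# A gap below about 0.05 is a common rule-of-thumb threshold. The 80% (four-fifths)
# rule is a different test on the ratio: min(rates) / max(rates) >= 0.8.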
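Because the four-fifths rule is a ratio test, not a difference test, it is worth computing separately. The sketch below is one way to do that; `disparate_impact_ratio` and the toy data are hypothetical names and values chosen for illustration.

```python
import numpy as np

def disparate_impact_ratio(predictions, protected_attr):
    """Lowest group selection rate divided by the highest one."""
    predictions = np.asarray(predictions)
    protected_attr = np.asarray(protected_attr)
    rates = [predictions[protected_attr == g].mean()
             for g in np.unique(protected_attr)]
    highest = max(rates)
    # If nobody is selected in any group, treat the groups as equal.
    return float(min(rates) / highest) if highest > 0 else 1.0

# Toy data: group "a" is selected 3 times out of 4, group "b" once out of 4.
preds = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

ratio = disparate_impact_ratio(preds, group)
passes_four_fifths = ratio >= 0.8
```

Here the selection rates are 0.75 and 0.25, so the ratio is one third and the four-fifths check fails.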
Equalized odds
import numpy as np

def equalized_odds(predictions, labels, protected):
    """Per-group (TPR, FPR). Equalized odds holds when both match across groups."""
    groups = np.unique(protected)
    metrics = {}
    for g in groups:
        mask = protected == g
        # max(1, ...) avoids division by zero when a group has no positives or negatives.
        tpr = ((predictions == 1) & (labels == 1) & mask).sum() / max(1, ((labels == 1) & mask).sum())
        fpr = ((predictions == 1) & (labels == 0) & mask).sum() / max(1, ((labels == 0) & mask).sum())
        metrics[g] = (tpr, fpr)
    return metrics
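A per-group table is hard to track over time, so audits often reduce it to one number. The sketch below uses one common convention, the larger of the TPR gap and the FPR gap across groups. It assumes binary 0/1 predictions and labels; the function name and toy data are illustrative.

```python
import numpy as np

def equalized_odds_diff(predictions, labels, protected):
    """Largest between-group gap in TPR or FPR (0 means equalized odds holds)."""
    predictions, labels, protected = map(np.asarray, (predictions, labels, protected))
    tprs, fprs = [], []
    for g in np.unique(protected):
        mask = protected == g
        pos = (labels == 1) & mask
        neg = (labels == 0) & mask
        # With 0/1 predictions, the mean over positives is TPR and over negatives is FPR.
        tprs.append(predictions[pos].mean() if pos.any() else 0.0)
        fprs.append(predictions[neg].mean() if neg.any() else 0.0)
    return float(max(max(tprs) - min(tprs), max(fprs) - min(fprs)))

preds = np.array([1, 1, 0, 0, 1, 0, 0, 0])
labels = np.array([1, 1, 0, 0, 1, 1, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

gap = equalized_odds_diff(preds, labels, group)
```

In this toy data, group "a" has a TPR of 1.0 and group "b" a TPR of 0.5, while both FPRs are 0, so the gap is 0.5.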
Bias mitigation (reweighing)
import numpy as np

def reweighing(X, y, protected):
    """Kamiran-Calders (2012) reweighing: weight = P(group) * P(label) / P(group, label)."""
    # X is unused; it is kept so the signature matches a typical preprocessing step.
    weights = np.ones(len(y))
    for g in np.unique(protected):
        for c in [0, 1]:
            mask = (protected == g) & (y == c)
            p_expected = (protected == g).mean() * (y == c).mean()
            p_observed = mask.mean()
            weights[mask] = p_expected / max(p_observed, 1e-9)
    return weights
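The point of reweighing is that, after weighting, the label is statistically independent of the protected attribute. The following self-contained check illustrates this on toy data; `reweighing_weights` restates the same formula, and the helper name and data are assumptions made for the example.

```python
import numpy as np

def reweighing_weights(y, protected):
    """weight(g, c) = P(group = g) * P(label = c) / P(group = g, label = c)."""
    y, protected = np.asarray(y), np.asarray(protected)
    weights = np.ones(len(y), dtype=float)
    for g in np.unique(protected):
        for c in np.unique(y):
            mask = (protected == g) & (y == c)
            if mask.any():
                weights[mask] = (protected == g).mean() * (y == c).mean() / mask.mean()
    return weights

# Group "a" has a 75% positive rate, group "b" a 25% positive rate.
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
w = reweighing_weights(y, group)

def weighted_positive_rate(g):
    """Positive rate within group g once the sample weights are applied."""
    mask = group == g
    return float((w[mask] * y[mask]).sum() / w[mask].sum())
```

With these weights both groups end up with a weighted positive rate of 0.5, which matches the overall base rate.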
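Constitutional AI (principle-based)
import json

def cai_critique(response, principles):
    """Ask the model to critique and revise a response against a list of principles.

    `llm` and `format_principles` are assumed to be defined elsewhere.
    """
    prompt = f"""Critique this response against these principles.
Principles:
{format_principles(principles)}
Response: {response}
Output JSON with:
- violated: list of principle IDs
- explanation
- revised_response"""
    # Raises json.JSONDecodeError if the model does not return valid JSON.
    return json.loads(llm.generate(prompt))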
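Differential privacy (DP-SGD)
from opacus import PrivacyEngine

# `model`, `optim`, and `loader` are an existing torch.nn.Module, optimizer, and DataLoader.
privacy_engine = PrivacyEngine()
model, optim, loader = privacy_engine.make_private(
    module=model,
    optimizer=optim,
    data_loader=loader,
    noise_multiplier=1.1,  # noise standard deviation relative to max_grad_norm
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)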
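To make the two parameters above concrete, here is a framework-free sketch of the gradient aggregation that DP-SGD performs at each step: clip every per-sample gradient, sum, add Gaussian noise scaled by the clipping bound, then average. This is an illustrative NumPy approximation under the assumption that per-sample gradients are given as rows of a 2-D array; it is not Opacus's implementation, and it does no privacy accounting.

```python
import numpy as np

def dp_sgd_gradient(per_sample_grads, max_grad_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    rng = np.random.default_rng() if rng is None else rng
    grads = np.asarray(per_sample_grads, dtype=float)   # shape: (batch, n_params)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale down only the rows whose L2 norm exceeds max_grad_norm.
    scale = np.minimum(1.0, max_grad_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Noise standard deviation is noise_multiplier * max_grad_norm per coordinate.
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(grads)
```

Clipping bounds how much any single example can move the update, and the noise is calibrated to that bound; a larger `noise_multiplier` gives stronger privacy at the cost of noisier training.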
Model card
model_name: credit-scoring-v3
intended_use: Adult US credit applications, $1k-$50k unsecured
out_of_scope:
  - Outside US
  - Under 18
  - Loans > $50k
training_data:
  source: 2018-2024 internal
  size: 2.4M
  protected_attribute_audit: completed
fairness_metrics:
  demographic_parity_diff: 0.034
  equalized_odds_diff: 0.041
limitations:
  - Decreased performance on thin-file applicants
  - Quarterly retraining required
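Provenance (C2PA, watermark)
from datetime import datetime, timezone

def attach_provenance(image_path, signer):
    """Illustrative pseudocode: `signer` is a hypothetical wrapper with a
    sign(path, claims=...) method, not the actual c2pa-python API."""
    claims = {
        'generator': 'Anthropic Claude Opus 4.7',
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'training_data_redacted': True,
    }
    return signer.sign(image_path, claims=claims)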
Red-teaming
def adversarial_eval(model, attack_categories):
    """Run generated attack prompts per category and record a judged severity.

    `generate_attacks` and `judge` are assumed to be defined elsewhere.
    """
    attacks = []
    for cat in attack_categories:  # e.g. jailbreak, bias, harmful, hallucination
        prompts = generate_attacks(cat, n=100)
        for p in prompts:
            r = model.generate(p)
            score = judge(r, cat)
            attacks.append({'cat': cat, 'prompt': p, 'response': r, 'severity': score})
    return attacks
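The raw list of attack records is easier to act on once it is aggregated per category. The sketch below assumes `severity` is a number in [0, 1] and counts an attack as successful at or above a threshold. Both the scale and the 0.5 threshold are assumptions, since the note does not specify what `judge` returns.

```python
from collections import defaultdict

def summarize_attacks(attacks, severity_threshold=0.5):
    """Per-category count, attack success rate, and worst observed severity."""
    buckets = defaultdict(list)
    for record in attacks:
        buckets[record["cat"]].append(record["severity"])
    return {
        cat: {
            "n": len(scores),
            "success_rate": sum(s >= severity_threshold for s in scores) / len(scores),
            "max_severity": max(scores),
        }
        for cat, scores in buckets.items()
    }

# Hand-written records in the same shape that adversarial_eval produces.
sample = [
    {"cat": "jailbreak", "prompt": "p1", "response": "r1", "severity": 0.9},
    {"cat": "jailbreak", "prompt": "p2", "response": "r2", "severity": 0.1},
    {"cat": "bias", "prompt": "p3", "response": "r3", "severity": 0.2},
]
summary = summarize_attacks(sample)
```

Tracking the per-category success rate across model versions turns red-teaming from a one-off exercise into a regression test.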
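Risk tier classifier (EU AI Act)
def eu_risk_tier(use_case):
    """Simplified lookup; real classification depends on Article 5, Annex III, and their exemptions."""
    # Emotion recognition in the workplace is a prohibited practice (Article 5), not limited risk.
    if use_case in {'social_scoring', 'real_time_remote_biometric', 'emotion_recognition_workplace'}:
        return 'unacceptable'
    if use_case in {'hiring', 'credit', 'education', 'critical_infra', 'law_enforcement'}:
        return 'high'
    # Limited risk carries transparency obligations (disclose AI interaction, label deepfakes).
    if use_case in {'chatbot', 'deepfake'}:
        return 'limited'
    return 'minimal'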
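Consent (GDPR)
def can_process(user, purpose):
    """True if a lawful basis exists: valid consent or a documented legitimate interest."""
    consent = user.consent.get(purpose)
    if consent is not None and consent.is_valid():
        return True
    # Legitimate interest is a separate lawful basis and needs its own balancing test.
    return has_legitimate_interest(purpose)

def revoke_consent(user, purpose):
    user.consent[purpose].revoke()
    # Erase the data only if no other lawful basis still covers this purpose.
    if not has_legitimate_interest(purpose):
        delete_data(user, purpose)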
Disclosure (chatbot)
function chatGreeting() {
return "Hi! I'm an AI assistant. I can make mistakes — please verify important info.";
}
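Incident reporting
from dataclasses import dataclass
from datetime import datetime
from typing import Literal

@dataclass
class AIIncident:
    timestamp: datetime
    model: str
    severity: Literal['low', 'medium', 'high', 'critical']
    category: str  # e.g. hallucination, bias, jailbreak, harm
    description: str
    affected_users: int
    root_cause: str
    mitigation: str

    def report(self):
        # notify_safety_team and log_to_registry are assumed to be defined elsewhere.
        if self.severity in ('high', 'critical'):
            notify_safety_team(self)
        # Every incident is logged, whatever its severity.
        log_to_registry(self)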
Decision criteria
| Situation | Approach |
|---|---|
| High-risk EU | Full conformity assessment |
| Hiring / credit | Fairness audit + monitoring |
| Generative | Watermark + content provenance |
| LLM | Constitutional + RLHF + red-team |
| Privacy-sensitive | DP / federated |
| Chatbot | Disclosure + safety filter |
Default: model card + fairness audit + red-teaming + incident reporting + EU AI Act risk-tier compliance.
🔗 Graph
- Parent: AI
- Variants: AI Safety · AI_Safety_and_Alignment · Algorithmic Fairness · Ethics & AI
- Applications: EU-AI-Act · NIST-AI-RMF · AI_Safety_and_Alignment
- Adjacent: Differential-Privacy · RLHF
🤖 LLM usage
When to use: any AI deployment, product launch, or governance work. When not to use: academic toy projects.
❌ Anti-patterns
- Ethics-as-PR: publishing a statement and nothing else.
- Single fairness metric: ignores the trade-offs between metrics.
- No red-teaming: jailbreaks come as a surprise.
- No incident process: nothing is learned from failures.
- Ignoring EU AI Act high-risk obligations: fines and bans.
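The single-metric anti-pattern is easy to demonstrate. In the constructed toy data below (all values hypothetical), both groups are selected at the same rate, so demographic parity looks perfect, yet the model is right for every member of one group and wrong for every member of the other.

```python
import numpy as np

labels = np.array([1, 1, 0, 0, 1, 1, 0, 0])
preds = np.array([1, 1, 0, 0, 0, 0, 1, 1])
group = np.array(["a"] * 4 + ["b"] * 4)

def selection_rate(g):
    """Share of group g that receives a positive prediction."""
    return float(preds[group == g].mean())

def true_positive_rate(g):
    """Share of group g's actual positives that are predicted positive."""
    return float(preds[(group == g) & (labels == 1)].mean())

parity_gap = abs(selection_rate("a") - selection_rate("b"))
tpr_gap = abs(true_positive_rate("a") - true_positive_rate("b"))
```

The parity gap is 0 while the TPR gap is 1, which is why an audit should report several metrics side by side.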
🧪 Verification / Duplicates
- Verified (EU AI Act 2024, NIST AI RMF 1.0, Anthropic RSP, Constitutional AI paper).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-04-20 | Auto-reinforced |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: pillars + fairness / DP / model card / red-team / risk-tier code |