| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-trustworthy-ai | Trustworthy AI | 10_Wiki/Topics | verified | self | | none | A | 0.92 | applied | | | 2026-05-10 | pending | |
Trustworthy AI
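In one line
"An AI system that is reliable, safe, fair, transparent, accountable, and privacy-preserving, all at once." This is the frame shared by the NIST AI RMF (2023) and the EU AI Act (entered into force in 2024, with most obligations applying from August 2026). Trustworthiness is not a single property but a balance across several dimensions, so the key is to state the trade-offs explicitly.
Key points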
The 7 trustworthiness characteristics (NIST AI RMF)
- Valid & Reliable: accurate on the intended task and stable in the deployment environment.
- Safe: avoids physical, psychological, and environmental harm.
- Secure & Resilient: withstands adversarial attacks, data poisoning, and prompt injection.
- Accountable & Transparent: states who is responsible and how decisions are made.
- Explainable & Interpretable: exposes reasoning at a level suited to each stakeholder.
- Privacy-Enhanced: data minimization, differential privacy (DP), federated learning.
- Fair, with Harmful Bias Managed: measures and mitigates disparate impact.
EU AI Act risk tiers (most obligations apply from August 2026)
- Unacceptable: social scoring, real-time remote biometric identification in public spaces (banned, with narrow exceptions).
- High-risk: medical, hiring, credit, education. Conformity assessment and CE marking are required.
- Limited risk: chatbots, deepfakes. Transparency obligation (disclose that it is AI).
- Minimal risk: spam filters, video game AI. Voluntary codes of conduct.
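The tier list above can be read as a lookup from use case to obligations. The following is a minimal sketch, assuming a hand-maintained mapping built only from the examples in the list; the names `RiskTier`, `USE_CASE_TIER`, `OBLIGATIONS`, and `obligations_for` are hypothetical, and a real classification needs legal review of the Act's annexes rather than a string lookup.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Hypothetical, simplified mapping taken from the examples in the list above.
USE_CASE_TIER = {
    "social scoring": RiskTier.UNACCEPTABLE,
    "real-time biometric id": RiskTier.UNACCEPTABLE,
    "medical": RiskTier.HIGH,
    "hiring": RiskTier.HIGH,
    "credit": RiskTier.HIGH,
    "education": RiskTier.HIGH,
    "chatbot": RiskTier.LIMITED,
    "deepfake": RiskTier.LIMITED,
    "spam filter": RiskTier.MINIMAL,
    "video game ai": RiskTier.MINIMAL,
}

# Obligations per tier, paraphrasing the same list.
OBLIGATIONS = {
    RiskTier.UNACCEPTABLE: ["do not deploy (prohibited practice)"],
    RiskTier.HIGH: ["conformity assessment", "CE marking"],
    RiskTier.LIMITED: ["disclose that the system or content is AI"],
    RiskTier.MINIMAL: ["voluntary code of conduct"],
}


def obligations_for(use_case: str) -> list[str]:
    """Return the obligations for a known use case, or fail loudly."""
    key = use_case.strip().lower()
    if key not in USE_CASE_TIER:
        # Unknown use cases must not silently default to "minimal".
        raise KeyError(f"unclassified use case: {use_case!r}; needs review")
    return OBLIGATIONS[USE_CASE_TIER[key]]


if __name__ == "__main__":
    print(obligations_for("Hiring"))
```

Raising on unknown use cases is a deliberate choice in this sketch: defaulting to the lowest tier would hide exactly the systems that most need a classification decision.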
Governance lifecycle (NIST AI RMF core functions)
- Map: context, stakeholders, risk identification.
- Measure: quantitative and qualitative metrics.
- Manage: mitigation, monitoring, incident response.
- Govern: policies, roles, accountability.
Applications
- High-risk deployment: a healthcare diagnosis AI needs dual conformity with the FDA and the EU AI Act.
- LLM production: prompt injection defense, PII redaction, and an output filter.
- Hiring algorithm: NYC Local Law 144 (bias audit) plus EEOC compliance.
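The three LLM production layers named above (prompt injection defense, PII redaction, output filter) can be chained as in the sketch below. Everything here is an assumption for illustration: the function names, the marker phrases, and the two regular expressions are hypothetical, and keyword matching is only a stand-in for a real injection classifier.

```python
import re
from typing import Callable

# Deliberately simple patterns; production systems need a dedicated PII detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical marker phrases, a naive stand-in for an injection classifier.
INJECTION_MARKERS = (
    "ignore previous instructions",
    "disregard the system prompt",
)


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def looks_like_injection(user_msg: str) -> bool:
    lowered = user_msg.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)


def guarded_reply(
    user_msg: str,
    generate: Callable[[str], str],
    output_ok: Callable[[str], bool],
) -> str:
    """Layer 1: input check. Layer 2: PII redaction. Layer 3: output filter."""
    if looks_like_injection(user_msg):
        return "Request blocked."
    reply = generate(redact_pii(user_msg))
    if not output_ok(reply):
        return "Response withheld."
    return redact_pii(reply)


if __name__ == "__main__":
    echo = lambda prompt: f"You said: {prompt}"
    print(guarded_reply("Mail jane.doe@example.com", echo, lambda r: True))
```

`generate` and `output_ok` are passed in as callables so that the model call and the safety classifier (for example a Llama Guard check) can be swapped without touching the pipeline.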
💻 Patterns
Bias measurement (group fairness)
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
)
from sklearn.metrics import accuracy_score

# y_test, y_pred, and df_test are assumed to come from an earlier evaluation step.
mf = MetricFrame(
    metrics={"accuracy": accuracy_score},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=df_test["gender"],
)
print(mf.by_group)  # accuracy per gender group

dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=df_test["gender"])
eod = equalized_odds_difference(y_test, y_pred, sensitive_features=df_test["gender"])
print(f"DP diff: {dpd:.3f}, EO diff: {eod:.3f}")
# Rule of thumb: |DP diff| > 0.1 suggests possible disparate impact.
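To make the demographic parity figure concrete, the sketch below recomputes it by hand as the gap between the highest and lowest per-group selection rate, which is how the between-groups difference is defined. The helper names and the toy data are made up for illustration and are not part of Fairlearn.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive predictions (label 1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def dp_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy data: 3 of 4 predictions are positive for "F", 1 of 4 for "M".
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
    print(selection_rates(y_pred, groups))  # {'F': 0.75, 'M': 0.25}
    print(dp_difference(y_pred, groups))    # 0.5
```

Note that the metric only looks at predictions, not at true labels, so a model can have a small demographic parity difference and still make very different kinds of errors per group; that is what the equalized odds difference is meant to catch.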
LLM output guardrail (Llama Guard 3)
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-Guard-3-8B"
guard = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tok = AutoTokenizer.from_pretrained(MODEL_ID)

def is_safe(user_msg: str, assistant_msg: str) -> tuple[bool, str]:
    chat = [
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": assistant_msg},
    ]
    # Tokenize through the chat template so special tokens are added only once.
    inputs = tok.apply_chat_template(chat, return_tensors="pt", return_dict=True)
    out = guard.generate(**inputs, max_new_tokens=20)
    # Decode only the generated tokens: "safe", or "unsafe" followed by category codes.
    prompt_len = inputs["input_ids"].shape[-1]
    verdict = tok.decode(out[0][prompt_len:], skip_special_tokens=True).strip()
    return verdict.startswith("safe"), verdict
Differential privacy (Opacus)
import torch
from opacus import PrivacyEngine

# MyModel and loader (a torch DataLoader) are placeholders defined elsewhere.
model = MyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
engine = PrivacyEngine()
model, optimizer, loader = engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    target_epsilon=3.0,
    target_delta=1e-5,
    epochs=10,
    max_grad_norm=1.0,
)
# Noise is calibrated so that 10 epochs spend about ε = 3 at δ = 1e-5.
# ε = 3 is a moderate budget; a smaller ε gives a stronger guarantee.
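What a DP-SGD optimizer does at each step can be sketched without any framework: clip every per-sample gradient to `max_grad_norm`, sum, add Gaussian noise scaled by the clipping bound, then average. The pure-Python version below is an illustrative sketch of that step only; the function names are hypothetical, `noise_multiplier` is supplied by hand here rather than derived from a target ε, and no privacy accounting is performed.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """One noisy, clipped, averaged gradient as used in DP-SGD."""
    clipped = [clip(g, max_grad_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # first sample has norm 5, second 0.5
    print(dp_sgd_gradient(grads, max_grad_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single example can move the update, and the noise hides what remains of that influence; a larger `noise_multiplier` means a smaller ε for the same number of steps, at the cost of accuracy.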
Model card (Hugging Face)
# README.md frontmatter
language: en
license: apache-2.0
intended_use:
  primary: "English sentiment classification (product reviews)"
  out_of_scope: ["clinical text", "non-English", "financial advice"]
training_data:
  source: "Amazon reviews 2018-2024 (50M samples)"
  known_biases: ["English-skewed", "tech product overrepresented"]
metrics:
  accuracy: 0.92
  demographic_parity_diff: 0.04
limitations:
  - "Weak sarcasm detection (F1 0.61)"
  - "Long reviews (>1000 tokens) are truncated"
ethical_considerations:
  - "Do not use for hiring or loan decisions"
Explainability (SHAP for tabular data)
import shap

# model is assumed to be a fitted tree-based model with a single output.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Explanation for a single prediction
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])
# Global feature importance
shap.summary_plot(shap_values, X_test)
Adversarial robustness test
from textattack.attack_recipes import TextFoolerJin2019
from textattack.attack_results import FailedAttackResult
from textattack.models.wrappers import HuggingFaceModelWrapper

wrapped = HuggingFaceModelWrapper(model, tokenizer)
attack = TextFoolerJin2019.build(wrapped)
# test_samples is assumed to be a list of (text, label) pairs.
results = [attack.attack(text, label) for text, label in test_samples]
# A sample counts as robust only if it was classified correctly and the attack failed.
robust_acc = sum(isinstance(r, FailedAttackResult) for r in results) / len(results)
print(f"Robust accuracy: {robust_acc:.2f}")
Decision criteria
| Situation | Approach |
|---|---|
| High-risk system for the EU market | AI Act conformity assessment is mandatory |
| Internal-only LLM | Model card plus a basic guardrail (Llama Guard) |
| Medical, hiring, credit | Every pillar in full plus a third-party audit |
| Generative content | Watermark or provenance metadata (C2PA) plus deepfake disclosure |
Default: every production AI system needs at least a model card, bias measurement, and an output filter.
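The decision table and the default can be turned into a release gate that reports which controls are still missing. This is a hypothetical sketch: the context keys and control names are invented labels for the table rows, not terms from any regulation or library.

```python
# Baseline from the default rule: model card, bias measurement, output filter.
BASELINE = {"model_card", "bias_measurement", "output_filter"}

# Hypothetical context keys, one per row of the decision table.
REQUIRED = {
    "eu_high_risk": BASELINE | {"conformity_assessment"},
    "internal_llm": {"model_card", "guardrail"},
    "medical_hiring_credit": BASELINE | {"third_party_audit"},
    "generative_content": BASELINE | {"provenance_or_watermark", "deepfake_disclosure"},
}


def missing_controls(context, implemented):
    """Return the controls still missing for a context, sorted by name.

    Unknown contexts fall back to the production baseline.
    """
    required = REQUIRED.get(context, BASELINE)
    return sorted(required - set(implemented))


if __name__ == "__main__":
    done = ["model_card", "output_filter"]
    print(missing_controls("generative_content", done))
```

A gate like this is meant to run in CI or a release checklist; an empty list means the minimum for that context is met, not that the system is trustworthy.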
🔗 Graph
- Parent: AI Ethics · AI Governance Policy (AI Usage Policy)
- Variants: Responsible AI · Ethical AI
- Applications: NIST AI RMF · EU AI Act · ISO 42001
- Adjacent: Model Card · Differential Privacy · Adversarial Robustness
🤖 LLM usage
When to use: multi-layer defense against prompt injection, PII leaks, and biased output in a production LLM. When not to use: prototypes and internal demos, where it is over-engineering.
❌ Anti-patterns
- Checkbox compliance: writing the model card and stopping there, with no ongoing monitoring.
- Single-axis focus: chasing fairness alone and sacrificing accuracy without acknowledging the trade-off.
- Privacy theater: calling data "anonymized" while it remains vulnerable to re-identification.
- Hallucinated explainability: an LLM-generated explanation that does not match the model's actual reasoning.
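The privacy theater anti-pattern can be tested with a k-anonymity check: after direct identifiers are dropped, count how many records share each combination of quasi-identifiers. The sketch below uses invented column names and toy records; k-anonymity is only one of several re-identification measures and is used here as an assumption-light example.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values.

    k == 1 means at least one record is unique and therefore easy to
    re-identify by linking with an outside dataset.
    """
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(counts.values())


if __name__ == "__main__":
    # Names already removed, so this table might be labeled "anonymized".
    records = [
        {"zip": "94107", "birth_year": 1985},
        {"zip": "94107", "birth_year": 1986},
        {"zip": "94110", "birth_year": 1985},
        {"zip": "94110", "birth_year": 1986},
    ]
    print(k_anonymity(records, ["zip", "birth_year"]))  # 1: every row is unique
    print(k_anonymity(records, ["zip"]))                # 2: after dropping birth_year
```

Coarsening or dropping quasi-identifiers raises k, which is the usual trade-off between data utility and re-identification risk.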
🧪 Verification / duplicates
- Verified (NIST AI RMF 1.0, 2023; EU AI Act, OJ L 2024/1689; ISO/IEC 42001:2023).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: added NIST RMF, EU AI Act 2026 enforcement, and the Llama Guard 3 pattern |