"An AI system should satisfy six axes at once: reliable, safe, fair, transparent, accountable, and privacy-preserving." This frame combines the NIST AI RMF (2023) and the EU AI Act (entered into force in 2024, with most provisions applicable from August 2026). It is not a single property but a multi-dimensional balance; stating the trade-offs explicitly is the key point.
Key points
The 7 trustworthiness characteristics (NIST AI RMF)
Valid & Reliable: accurate on the intended task and stable in the deployment environment.
Safe: avoids physical, psychological, and environmental harm.
Secure & Resilient: defends against adversarial attacks, data poisoning, and prompt injection.
Accountable & Transparent: states who is responsible and how decisions are made.
Explainable & Interpretable: discloses reasoning at a level suited to each stakeholder.
Privacy-Enhanced: data minimization, differential privacy (DP), federated learning.
Fair, with Harmful Bias Managed: measures disparate impact and mitigates it.
EU AI Act risk tiers (most provisions applicable from August 2026)
Unacceptable: social scoring, real-time biometric identification (banned, with narrow exceptions).
High-risk: medical, hiring, credit, education. Requires a conformity assessment and CE marking.
Limited risk: chatbots, deepfakes. Carries a transparency obligation (disclose that AI is involved).
Minimal risk: spam filters, video game AI. Voluntary codes of conduct only.
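The tier list above can be held as a small lookup table so that a use case resolves to its obligation. This is a minimal sketch: the names `RiskTier`, `TierRule`, `TIER_RULES`, and `classify` are hypothetical, the entries are copied from the list above, and an exact-match lookup is no substitute for a legal classification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


@dataclass(frozen=True)
class TierRule:
    examples: Tuple[str, ...]  # example use cases from the note
    obligation: str            # headline obligation for the tier


# Entries mirror the tier list in this note; they are illustrative, not exhaustive.
TIER_RULES: Dict[RiskTier, TierRule] = {
    RiskTier.UNACCEPTABLE: TierRule(("social scoring", "real-time biometric id"), "prohibited"),
    RiskTier.HIGH: TierRule(("medical", "hiring", "credit", "education"),
                            "conformity assessment + CE marking"),
    RiskTier.LIMITED: TierRule(("chatbot", "deepfake"), "transparency (disclose AI)"),
    RiskTier.MINIMAL: TierRule(("spam filter", "video game ai"), "voluntary code"),
}


def classify(use_case: str) -> Optional[RiskTier]:
    """Return the tier whose example list contains use_case, else None."""
    key = use_case.strip().lower()
    for tier, rule in TIER_RULES.items():
        if key in rule.examples:
            return tier
    return None  # unknown use cases need a real legal review
```

Returning `None` for unknown use cases, rather than defaulting to minimal risk, keeps unclassified systems visible instead of silently treating them as safe.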
Governance lifecycle (NIST AI RMF core functions)
Map: context, stakeholders, risk identification.
Measure: quantitative and qualitative metrics.
Manage: mitigation, monitoring, incident response.
Govern: policies, roles, accountability.
Applications
High-risk deployment: a healthcare diagnosis AI needs dual conformity with FDA and EU AI Act requirements.
LLM production: prompt injection defense, PII redaction, and output filtering.
Hiring algorithm: NYC Local Law 144 (bias audit) plus EEOC compliance.
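The three LLM production controls named above (prompt injection defense, PII redaction, output filter) can be chained around a model call. The sketch below is only an illustration under stated assumptions: the regexes, the marker phrases, the blocked terms, and the function names are all hypothetical, and keyword heuristics like these are easy to bypass in practice.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3,4}[-.\s]\d{4}\b")

# Hypothetical phrase list for a naive injection check.
INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")

# Hypothetical terms that must never appear in a response.
BLOCKED_OUTPUT_TERMS = ("BEGIN PRIVATE KEY",)


def redact_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def looks_like_injection(prompt: str) -> bool:
    """Case-insensitive substring match against the marker list."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)


def filter_output(text: str) -> str:
    """Block responses containing forbidden terms, then redact PII."""
    if any(term in text for term in BLOCKED_OUTPUT_TERMS):
        return "[BLOCKED]"
    return redact_pii(text)


def guarded_call(prompt: str, model) -> str:
    """Input check -> input redaction -> model call -> output filter."""
    if looks_like_injection(prompt):
        return "[REJECTED]"
    return filter_output(model(redact_pii(prompt)))
```

Here `model` is any callable from string to string, so the guard can be exercised with a stub such as `lambda p: p` before wiring in a real client.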
💻 Patterns
Bias measurement (group fairness)
```python
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
)
from sklearn.metrics import accuracy_score

# y_test, y_pred, and df_test are assumed to come from an earlier train/test split.
mf = MetricFrame(
    metrics={"accuracy": accuracy_score},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=df_test["gender"],
)
print(mf.by_group)  # accuracy per gender group

dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=df_test["gender"])
eod = equalized_odds_difference(y_test, y_pred, sensitive_features=df_test["gender"])
print(f"DP diff: {dpd:.3f}, EO diff: {eod:.3f}")
# Rule of thumb: a DP diff above 0.1 suggests possible disparate impact
```
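To see what the demographic parity number measures, the same quantity can be computed by hand: it is the gap between the largest and smallest group selection rates. The sketch below also derives impact ratios (each group's selection rate divided by the highest rate), the kind of figure a hiring bias audit reports. The function names and the toy data are hypothetical and the code is stdlib-only.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive predictions (label 1) within each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_diff(rates):
    """Largest minus smallest group selection rate (always >= 0)."""
    return max(rates.values()) - min(rates.values())


def impact_ratios(rates):
    """Each group's selection rate relative to the most-selected group."""
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any positive prediction")
    return {g: r / best for g, r in rates.items()}


# Toy data: group "a" is selected 3 times out of 4, group "b" once out of 4.
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(y_pred, groups)
print(rates, demographic_parity_diff(rates), impact_ratios(rates))
```

A common screening convention, taken here as an assumption rather than a legal test, is to review any group whose impact ratio falls below 0.8 (the four-fifths rule).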
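Differential privacy training (Opacus)
```python
import torch
from opacus import PrivacyEngine

# MyModel and loader (a torch DataLoader) are assumed to be defined elsewhere.
model = MyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

engine = PrivacyEngine()
model, optimizer, loader = engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    target_epsilon=3.0,
    target_delta=1e-5,
    epochs=10,
    max_grad_norm=1.0,  # per-sample gradient clipping bound
)
# Noise is calibrated so 10 epochs stay within (ε=3, δ=1e-5); a smaller ε is a stronger guarantee
```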
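The mechanism behind `max_grad_norm` and the added noise is the DP-SGD update: clip each per-sample gradient, sum, add Gaussian noise scaled to the clipping bound, then average. The following is a rough stdlib-only sketch of one such step. The names `clip` and `dp_sgd_step` are hypothetical, and it does no privacy accounting, so it does not by itself yield any particular ε.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_sample_grads, lr, max_grad_norm, noise_multiplier, rng):
    """One DP-SGD update on a flat parameter list.

    per_sample_grads holds one gradient (same length as params) per example.
    """
    clipped = [clip(g, max_grad_norm) for g in per_sample_grads]
    n = len(clipped)
    sigma = noise_multiplier * max_grad_norm  # noise std tied to the clip bound
    new_params = []
    for i, p in enumerate(params):
        noisy_sum = sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)
        new_params.append(p - lr * noisy_sum / n)
    return new_params


rng = random.Random(0)
params = dp_sgd_step(
    params=[0.0, 0.0],
    per_sample_grads=[[3.0, 4.0], [0.3, 0.4]],  # norms 5.0 and 0.5
    lr=1.0,
    max_grad_norm=1.0,
    noise_multiplier=1.1,  # hypothetical value
    rng=rng,
)
print(params)
```

Clipping bounds how much any single example can move the parameters, which is what lets a fixed noise scale mask each individual's contribution.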
Model card (Hugging Face)
```yaml
# README.md frontmatter
language: en
license: apache-2.0
intended_use:
  primary: "English sentiment classification (product reviews)"
  out_of_scope: ["clinical text", "non-English", "financial advice"]
training_data:
  source: "Amazon reviews 2018-2024 (50M samples)"
  known_biases: ["English-skewed", "tech product overrepresented"]
metrics:
  accuracy: 0.92
  demographic_parity_diff: 0.04
limitations:
  - "Weak sarcasm detection (F1 0.61)"
  - "Long reviews (>1000 tokens) are truncated"
ethical_considerations:
  - "Not for use in hiring or loan decisions"
```
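Explainability (SHAP for tabular)
```python
import shap

# model is assumed to be a fitted tree-based model and X_test a pandas DataFrame.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# For classifiers, shap_values may hold one set of values per class; select a class first.

# Individual explanation for the first test row
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])

# Global feature importance
shap.summary_plot(shap_values, X_test)
```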
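SHAP attributions are Shapley values, and a brute-force version shows what they mean: average each feature's marginal contribution over every order in which features can be switched from a baseline to their actual values. This sketch assumes that "absent" features are simply set to a fixed baseline (a simplification), costs n! model evaluations, and is only usable for a handful of features. The function name and toy model are hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = f(current)
        for i in order:
            current[i] = x[i]     # switch feature i on
            value = f(current)
            phi[i] += value - prev  # marginal contribution in this ordering
            prev = value
    return [p / factorial(n) for p in phi]


def toy_model(v):
    # Two additive terms plus an interaction between features 0 and 2.
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]


x = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The interaction term is split evenly between features 0 and 2, and the values sum to `toy_model(x) - toy_model(baseline)`, the same additivity that a force plot visualizes.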