feat: Update Wiki knowledge assets - add UX Scenarios, Frontend, Game Design, Topics [2026-05-08]

2026-05-08 19:52:07 +09:00
parent 9dd3d40662
commit 5ba5a55c78
3984 changed files with 334557 additions and 28839 deletions
@@ -1,28 +1,25 @@
 ---
-id: ALIGN-001
-category: Unified
-confidence_score: 1.0
-tags: [ai-safety, [[Alignment|Alignment]], rlhf, ai-ethics, [[Trustworthy-AI|Trustworthy-AI]]]
-last_reinforced: 2026-04-26
+id: wiki-2026-0508-ai-alignment
+title: AI Alignment
+category: 10_Wiki/Topics/AI_and_ML
+status: merged
+redirect_to: AI_Safety_and_Alignment
+canonical_id: AI_Safety_and_Alignment
+aliases: [P-Reinforce-REDIRECT-AI-ALIGNMENT-DASH]
+duplicate_of: none
+source_trust_level: A
+confidence_score: 0.92
+tags: [redirect]
+raw_sources: []
+last_reinforced: 2026-05-08
+github_commit: pending
+inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
 ---

-# AI Alignment
+# [[AI-Alignment]]

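-## 📌 One-Line Insight (The Karpathy Summary)
-> "Align AI's goals with humanity's values." A technical research field that ensures highly advanced AI systems act in ways beneficial to humans without departing from human intent, safety, and ethical standards.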
+> [!IMPORTANT]
+> This document has been merged into **[[AI_Safety_and_Alignment]]** under the high-density knowledge asset consolidation policy.

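-## 📖 Structured Knowledge (Synthesized Content)
- **Extracted pattern:** An alignment pattern that finely tunes the reward function and training data so that the objective function the model optimizes matches the outcome humans actually want.
- **Key challenges:**
-    - **Outer Alignment:** The problem of designing the reward function itself so that it accurately reflects human intent.
-    - **Inner Alignment:** The problem of keeping the model from acquiring mistaken internal goals during training that even its developers did not anticipate (e.g., avoiding being powered off).
-    - **Scalable Oversight:** How to monitor alignment when an AI performs complex tasks that humans struggle to evaluate directly.
- **Key techniques:** RLHF, RLAIF (alignment via AI feedback), Constitutional AI.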
-
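-## ⚠️ Contradictions and Updates (Contradictions & RL Update)
- **Conflict with past data:** The discussion has deepened from simple "don't say bad things" filtering to questions of controllability at the superintelligence stage and of human survival.
- **Policy change:** The Antigravity project treats "human-centered values" as the top priority when designing every agent's skills, and checks agent behavior through regular Alignment Audits.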
-
-## 🔗 Knowledge Links (Graph)
- [[Reinforcement-Learning-from-Human-Feedback-RLHF|Reinforcement-Learning-from-Human-Feedback-RLHF]], [[Trustworthy-AI|Trustworthy-AI]], AI-Safety, [[AGI|AGI]]
- **Raw Source:** 10_Wiki/Topics/AI/AI-Alignment.md
+---
+*Redirected to: [[AI_Safety_and_Alignment]]*