[G1-Sync] Manual knowledge update

2026-04-30 22:42:02 +09:00
parent 0bd4f19e38
commit c36c0644a1
4888 changed files with 18470 additions and 18602 deletions
@@ -1,8 +1,8 @@
 ---
-id: P-REINFORCE-AI-IMITATION
+id: [[P-Reinforce]]-AI-IMITATION
 category: "10_Wiki/💡 Topics/AI"
 confidence_score: 0.95
-tags: [AI, ReinforcementLearning, ImitationLearning, Robotics]
+tags: [AI, ReinforcementLearning, ImitationLearning, [[Robotics]]]
 last_reinforced: 2026-04-20
 ---

@@ -14,7 +14,7 @@ last_reinforced: 2026-04-20
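 ## 📖 Structured Knowledge (Synthesized Content)
 - **Why Imitation?**: In reinforcement learning, sparse rewards can make learning practically infeasible, because the agent rarely receives a signal to improve on. Following an expert's trajectories offers a much faster path.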
 - **Methods**:
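-    - **Behavioral Cloning (BC)**: Trains on demonstration data with plain supervised learning. (Vulnerable to situations outside the demonstration data.)
+    - **[[Behavior]]al Cloning (BC)**: Trains on demonstration data with plain supervised learning. (Vulnerable to situations outside the demonstration data.)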
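    - **Inverse Reinforcement Learning (IRL)**: Works backward from the expert's behavior to infer the reward function the expert is pursuing.
    - **GAIL (Generative Adversarial Imitation Learning)**: Uses a GAN-style structure to train a policy whose behavior cannot be distinguished from the demonstrator's.
 - **Domain**: Autonomous driving, robot arm control, personalized agents.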
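
The Behavioral Cloning entry above can be illustrated with a minimal sketch. It is not taken from the note itself: the one-dimensional state, the saturating `expert_policy`, and the helper names `fit_bc_policy` and `bc_policy` are all hypothetical, chosen only to show BC as supervised regression and why it can fail outside the demonstrated states.

```python
import numpy as np


def expert_policy(states):
    # Hypothetical expert: steer against the position, saturating at +/-1.
    return np.clip(-2.0 * states[:, 0], -1.0, 1.0)


def fit_bc_policy(states, actions):
    # Behavioral cloning as plain supervised learning:
    # least-squares regression from demonstrated states to expert actions.
    weights, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return weights


def bc_policy(states, weights):
    # The cloned policy is linear in the state.
    return states @ weights


rng = np.random.default_rng(0)

# Demonstrations only cover a narrow range where the expert never saturates.
demo_states = rng.uniform(-0.4, 0.4, size=(200, 1))
demo_actions = expert_policy(demo_states)
weights = fit_bc_policy(demo_states, demo_actions)

# Inside the demonstrated range the clone matches the expert closely.
in_range = np.array([[0.3]])
in_range_error = abs(bc_policy(in_range, weights)[0] - expert_policy(in_range)[0])

# Outside it, the clone extrapolates linearly while the expert saturates.
out_of_range = np.array([[2.0]])
out_of_range_error = abs(
    bc_policy(out_of_range, weights)[0] - expert_policy(out_of_range)[0]
)

print("in-range error:", in_range_error)
print("out-of-range error:", out_of_range_error)
```

In this toy setup the clone never sees a saturated action, so it has no way to learn the clipping. That is the "outside the demonstration data" weakness in its simplest form. IRL and GAIL approach the problem differently and are not covered by this sketch.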