---
id: RL-INV-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [ai, reinforcement-learning, inverse-rl, imitation-learning, apprenticeship-learning]
last_reinforced: 2026-04-26
---

# Inverse Reinforcement Learning (역강화학습)

## 📌 One-Line Insight (The Karpathy Summary)

> "Don't tell the model what is good; let it observe an expert's behavior and infer the meaning of 'reward' on its own."
>
> A technique for complex tasks where an explicit reward function is hard to define: the agent studies expert demonstrations and works backward to the reward structure that underlies them.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** "Learning from Observation". In standard reinforcement learning the reward signal is given. Here the expert's trajectories are the data, and the agent derives the objective function it should pursue from them.
- **Main algorithms:**
  - **Maximum Entropy IRL:** Searches for the reward function that best explains the expert's behavior while staying maximally uncertain (least biased) about everything the demonstrations do not determine.
  - **Apprenticeship Learning:** Uses the recovered reward function to train a policy that reproduces, and in some cases surpasses, the expert's performance.
- **Significance:** Complex value judgments that people struggle to put into words, such as a "driving style" or a "skilled way of working", can be transferred to an AI effectively.

## ⚠️ Contradictions & RL Update

- **Conflict with earlier data:** Plain imitation learning (Behavioral Cloning) degrades sharply in situations not covered by the demonstrations. IRL learns the underlying purpose of the behavior, so it tends to generalize considerably better.
- **Policy change:** The Antigravity project is designed so that, when an agent learns a user's work patterns, it applies IRL rather than simply copying commands, and works out for itself the quality standard the user actually intended for the work.

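
The Maximum Entropy IRL entry under the main algorithms can be made concrete with a minimal sketch. Everything specific here is an assumption for illustration and does not come from the original note: a hypothetical five-state chain world, deterministic dynamics, one-hot state features, a fixed horizon, hand-written expert trajectories, and all function names. The update rule follows the usual MaxEnt recipe, moving the reward toward states the expert visits more often than the current soft-optimal policy does.

```python
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 5, 2, 6   # toy sizes, chosen only for illustration

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

# NEXT[s, a] is the successor state of taking action a in state s.
NEXT = np.array([[step(s, a) for a in range(N_ACTIONS)] for s in range(N_STATES)])

def soft_policy(reward):
    """Backward pass: finite-horizon soft value iteration, returns pi[t, s, a]."""
    pi = np.zeros((HORIZON, N_STATES, N_ACTIONS))
    v = np.zeros(N_STATES)                      # value after the final step
    for t in reversed(range(HORIZON)):
        q = reward[:, None] + v[NEXT]           # Q_t(s, a) = r(s) + V_{t+1}(s')
        v = np.logaddexp(q[:, 0], q[:, 1])      # soft maximum over the two actions
        pi[t] = np.exp(q - v[:, None])          # each row sums to 1
    return pi

def expected_visits(pi, start):
    """Forward pass: expected number of visits to each state over the horizon."""
    d = start.copy()
    total = np.zeros(N_STATES)
    for t in range(HORIZON):
        total += d
        nxt = np.zeros(N_STATES)
        for a in range(N_ACTIONS):
            # add.at accumulates correctly when several states share a successor
            np.add.at(nxt, NEXT[:, a], d * pi[t, :, a])
        d = nxt
    return total

def maxent_irl(demos, lr=0.05, iters=2000):
    """Gradient ascent on the log-likelihood of the demonstrations."""
    expert = np.zeros(N_STATES)                 # average expert visit counts
    start = np.zeros(N_STATES)                  # empirical start distribution
    for traj in demos:
        start[traj[0]] += 1.0
        for s in traj[:HORIZON]:
            expert[s] += 1.0
    expert /= len(demos)
    start /= len(demos)
    theta = np.zeros(N_STATES)                  # one reward value per state
    for _ in range(iters):
        model = expected_visits(soft_policy(theta), start)
        theta += lr * (expert - model)          # gradient: expert minus model visits
    return theta

# Hypothetical expert: starts at state 0, walks right, and stays at state 4.
demos = [[0, 1, 2, 3, 4, 4]] * 3
theta = maxent_irl(demos)
```

Printing `np.round(theta, 2)` should show the largest recovered reward at state 4, where the hypothetical expert ends up. With a fixed horizon the rewards are only determined up to an additive constant, so the ordering of the values matters more than their absolute size.
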
## 🔗 Knowledge Links (Graph)

- [[Reinforcement-Learning|Reinforcement-Learning]], [[Imitation-Learning|Imitation-Learning]], Reward-Shaping, [[Generalization-in-AI|Generalization-in-AI]]
- **Raw Source:** 10_Wiki/Topics/AI/Inverse-Reinforcement-Learning.md