--- id: COMP-NEURO-001 category: "10_Wiki/πŸ’‘ Topics/AI" confidence_score: 1.0 tags: [neuroscience, reinforcement-learning, dopamine, brain-modeling] last_reinforced: 2026-04-26 --- # Computational Neuroscience of Reinforcement Learning (κ°•ν™”ν•™μŠ΅μ˜ 계산 μ‹ κ²½κ³Όν•™) ## πŸ“Œ ν•œ 쀄 톡찰 (The Karpathy Summary) > "μΈκ°„μ˜ ν•™μŠ΅ λ©”μ»€λ‹ˆμ¦˜μ„ μˆ˜ν•™μ  κ°•ν™”ν•™μŠ΅ μ–Έμ–΄λ‘œ ν•΄λ…ν•˜λΌ" β€” λ‡Œμ˜ 보상 μ‹œμŠ€ν…œκ³Ό λ„νŒŒλ―Ό λΆ„λΉ„ 기제λ₯Ό μ‹œκ°„μ°¨ ν•™μŠ΅(TD Learning) 및 κ°€μΉ˜ 기반 선택 λͺ¨λΈλ‘œ μ„€λͺ…ν•˜λ €λŠ” λ‡Œκ³Όν•™κ³Ό AI의 μœ΅ν•© ν•™λ¬Έ. ## πŸ“– κ΅¬μ‘°ν™”λœ 지식 (Synthesized Content) - **μΆ”μΆœλœ νŒ¨ν„΄:** μ‹€μ œ 생물학적 λ‰΄λŸ°μ˜ ν™œλ™κ³Ό κ°•ν™”ν•™μŠ΅ μ•Œκ³ λ¦¬μ¦˜(예: Q-Learning) κ°„μ˜ 상관관계λ₯Ό λͺ¨λΈλ§ν•˜μ—¬ ν•™μŠ΅μ˜ 생물학적 ν•˜λ“œμ›¨μ–΄ 원리λ₯Ό νŒŒμ•…ν•˜λŠ” νŒ¨ν„΄. - **μ„ΈλΆ€ λ‚΄μš©:** - **Reward Prediction Error (RPE):** λ„νŒŒλ―Ό λ‰΄λŸ°μ΄ 보상 μžμ²΄κ°€ μ•„λ‹Œ, 'κΈ°λŒ€μ™€ μ‹€μ œ λ³΄μƒμ˜ 차이'에 λ°˜μ‘ν•œλ‹€λŠ” 사싀을 TD μ—λŸ¬ λͺ¨λΈλ‘œ 증λͺ…. - **Basal Ganglia Modeling:** λ‡Œμ˜ 기저핡이 κ°€μΉ˜ ν•¨μˆ˜λ₯Ό μ €μž₯ν•˜κ³  행동 선택을 μˆ˜ν–‰ν•˜λŠ” μ•‘ν„°-크리틱(Actor-Critic) ꡬ쑰와 μœ μ‚¬ν•¨μ„ 뢄석. - **Exploration vs Exploitation:** 전전두엽과 μ€„λ¬΄λŠ¬μ²΄ κ°„μ˜ μƒν˜Έμž‘μš©μ„ 톡해 λ―Έμ§€μ˜ 보상을 탐색할지, κΈ°μ‘΄ 보상을 μ·¨ν• μ§€ κ²°μ •ν•˜λŠ” λ©”μ»€λ‹ˆμ¦˜ μˆ˜μΉ˜ν™”. ## ⚠️ λͺ¨μˆœ 및 μ—…λ°μ΄νŠΈ (Contradictions & RL Update) - **κ³Όκ±° λ°μ΄ν„°μ™€μ˜ 좩돌:** λ‹¨μˆœ 쑰건 λ°˜μ‚¬(Pavlovian) λͺ¨λΈμ—μ„œ ν˜„λŒ€μ˜ μ •κ΅ν•œ 예츑 λΆ€ν˜Έν™”(Predictive Coding) 및 계측적 RL λͺ¨λΈλ‘œ ν™•μž₯. - **μ •μ±… λ³€ν™”:** Antigravity μ—μ΄μ „νŠΈμ˜ 보상 ν•¨μˆ˜ 섀계 μ‹œ, μΈκ°„μ˜ 'λ§Œμ‘±λ„ μ§€μ—°' 기제λ₯Ό μ°Έκ³ ν•˜μ—¬ μž₯기적 λͺ©ν‘œ 달성 ν™•λ₯ μ„ λ†’μ΄λŠ” 둜직 적용. ## πŸ”— 지식 μ—°κ²° (Graph) - **Parent:** 10_Wiki/πŸ’‘ Topics/AI - **Related:** Dopamine-RPE, TD-Learning, Basal-Ganglia, Decision-Making - **Raw Source:** 10_Wiki/Topics/AI/Computational Neuroscience of Reinforcement Learning.md