--- id: wiki-2026-0508-computational-neuroscience-rl title: Computational Neuroscience RL category: 10_Wiki/Topics status: needs_review canonical_id: self aliases: [P-Reinforce-AUTO-CNRL-001] duplicate_of: none source_trust_level: A confidence_score: 0.94 tags: [auto-reinforced, computational-neuroscience, Reinforcement-Learning, Dopamine, brain-model, reward-prediction-error, neuroscience] raw_sources: [] last_reinforced: 2026-04-20 github_commit: pending inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08) tech_stack: language: unspecified framework: unspecified --- # [[Computational-Neuroscience-RL|Computational-Neuroscience-RL]] ## ๐Ÿ“Œ ํ•œ ์ค„ ํ†ต์ฐฐ (The Karpathy Summary) > "์ง€๋Šฅ์˜ ์ƒ๋ฌผํ•™์  ๋ฟŒ๋ฆฌ: ๋‡Œ์˜ ๋„ํŒŒ๋ฏผ ์‹œ์Šคํ…œ์ด ์–ด๋–ป๊ฒŒ '๋ณด์ƒ ์˜ˆ์ธก ์˜ค์ฐจ'๋ฅผ ๊ณ„์‚ฐํ•˜์—ฌ ์œ ๊ธฐ์ฒด์˜ ํ–‰๋™์„ ์ตœ์ ํ™”ํ•˜๋Š”์ง€ ์ˆ˜ํ•™์ ์œผ๋กœ ๋ถ„์„ํ•˜์—ฌ, ์ธ๊ฐ„์˜ ํ•™์Šต ๋ฉ”์ปค๋‹ˆ์ฆ˜๊ณผ AI ๊ฐ•ํ™”ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์‚ฌ์ด์˜ ์—ฐ๊ฒฐ ๊ณ ๋ฆฌ๋ฅผ ๋ฐํžˆ๋Š” ํ•™๋ฌธ." ## ๐Ÿ“– ๊ตฌ์กฐํ™”๋œ ์ง€์‹ (Synthesized Content) ๊ฐ•ํ™”ํ•™์Šต์˜ ๊ณ„์‚ฐ ์‹ ๊ฒฝ๊ณผํ•™(Computational-Neuroscience-RL)์€ ๋‡Œ์˜ ๋ณด์ƒ ์‹œ์Šคํ…œ๊ณผ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์‚ฌ์ด์˜ ์ƒํ˜ธ ์ž‘์šฉ์„ ์—ฐ๊ตฌํ•ฉ๋‹ˆ๋‹ค. 1. **๋„ํŒŒ๋ฏผ๊ณผ ๋ณด์ƒ ์˜ˆ์ธก ์˜ค์ฐจ(RPE)**: * **Schultz์˜ ๋ฐœ๊ฒฌ**: ๋„ํŒŒ๋ฏผ ๋‰ด๋Ÿฐ์€ ๋ณด์ƒ ๊ทธ ์ž์ฒด๋ณด๋‹ค '๊ธฐ๋Œ€ํ–ˆ๋˜ ๋ณด์ƒ๊ณผ ์‹ค์ œ ๋ณด์ƒ์˜ ์ฐจ์ด'์— ๋ฐ˜์‘ํ•จ. * **TD-Learning ์—ฐ๋™**: ์ด๋Š” ์ธ๊ณต์ง€๋Šฅ์˜ ์‹œ๊ฐ„์ฐจ ํ•™์Šต(Temporal Difference Learning) ๋ฐฉ์‹๊ณผ ์ˆ˜ํ•™์ ์œผ๋กœ ์ •ํ™•ํžˆ ์ผ์น˜ํ•จ. ([[Reinforcement Learning (RL)|Reinforcement Learning (RL)]]์™€ ์—ฐ๊ฒฐ) 2. **์™œ ์ค‘์š”ํ•œ๊ฐ€?**: * AI ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ๋‹จ์ˆœํžˆ ์ˆ˜ํ•™์  ๊ธฐ๊ต๊ฐ€ ์•„๋‹ˆ๋ผ ์ƒ๋ฌผํ•™์  ํƒ€๋‹น์„ฑ(Bio[[Logic|Logic]]al Plausibility)์„ ๊ฐ–์ท„์Œ์„ ์ฆ๋ช…ํ•˜๋ฉฐ, ์—ญ์œผ๋กœ ๋‡Œ ์งˆํ™˜(์ค‘๋…, ํŒŒํ‚จ์Šจ ๋“ฑ)์„ ์ดํ•ดํ•˜๋Š” ๊ฐ•๋ ฅํ•œ ๋ชจ๋ธ์„ ์ œ๊ณตํ•˜๊ธฐ ๋•Œ๋ฌธ์ž„. ([[Research|Re[[Search]]-Framework]]์™€ ์—ฐ๊ฒฐ) ## โš ๏ธ ๋ชจ์ˆœ ๋ฐ ์—…๋ฐ์ดํŠธ (Contradictions & Updates) - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ**: ๊ณผ๊ฑฐ์—๋Š” ๋‹จ์ˆœ ๋ณด์ƒ ์ •์ฑ…(Scalar reward)๋งŒ ์ค‘์š”ํ•˜๊ฒŒ ์—ฌ๊ฒผ์œผ๋‚˜, ํ˜„๋Œ€ ์ •์ฑ…์€ ๋‡Œ๊ฐ€ ๋ฏธ๋ž˜์˜ ๋‹ค์–‘ํ•œ ๊ฐ€๋Šฅ์„ฑ ์ •์ฑ…์„ ํ•œ๊บผ๋ฒˆ์— ์‹œ๋ฎฌ๋ ˆ์ด์…˜ํ•˜๋Š” '๋ถ„ํฌ์  ๊ฐ•ํ™”ํ•™์Šต(Distributional RL) ์ •์ฑ…'์„ ์‚ฌ์šฉํ•œ๋‹ค๋Š” ์‚ฌ์‹ค์„ ๋ฐœ๊ฒฌํ•จ(RL Update). - **์ •์ฑ… ๋ณ€ํ™”(RL Update)**: ์ด์ œ๋Š” ๋‹จ์ˆœ ๋ณด์ƒ์„ ๋„˜์–ด, ๋ชจ๋ธ ๊ธฐ๋ฐ˜(Model-based) ์‚ฌ๊ณ ์™€ ์ „์ „๋‘์—ฝ(PFC)์˜ ๋ฉ”ํƒ€ ํ•™์Šต ์ •์ฑ…(Meta-learning)์„ ํ†ตํ•ด AI ๊ฐ€ ์–ด๋–ป๊ฒŒ ์ธ๊ฐ„์ฒ˜๋Ÿผ ์ ์€ ๋ฐ์ดํ„ฐ๋กœ๋„ ๋น ๋ฅด๊ฒŒ ์ผ๋ฐ˜ํ™” ์ •์ฑ…์„ ์ˆ˜ํ–‰ํ•˜๋Š”์ง€ ์—ฐ๊ตฌํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ์ง„ํ™” ์ค‘์ž„. (Generalization์™€ ์—ฐ๊ฒฐ) ## ๐Ÿ”— ์ง€์‹ ์—ฐ๊ฒฐ (Graph) - [[Reinforcement Learning (RL)|Reinforcement Learning (RL)]], [[Research-Framework|Research-Framework]], Generalization, [[State-Space|State-Space]], [[Sensitivity-Analysis|Sensitivity-Analysis]] - **Key Concepts**: Basal ganglia, Dopamine, [[Reward Prediction Error|Reward Prediction Error]] (RPE). --- ## ๐Ÿค– LLM ํ™œ์šฉ ํžŒํŠธ (How to Use This Knowledge) **์–ธ์ œ ์ด ์ง€์‹์„ ์“ฐ๋Š”๊ฐ€:** - *(TODO)* **์–ธ์ œ ์“ฐ๋ฉด ์•ˆ ๋˜๋Š”๊ฐ€:** - *(TODO)* ## ๐Ÿงช ๊ฒ€์ฆ ์ƒํƒœ (Validation) - **์ •๋ณด ์ƒํƒœ:** needs_review - **์ถœ์ฒ˜ ์‹ ๋ขฐ๋„:** A - **๊ฒ€ํ†  ์ด์œ :** *(P-Reinforce Phase 1 ์ž๋™ ์ •๊ทœํ™”. ๋ณธ๋ฌธ ๊ฒ€์ฆ ํ•„์š”.)* ## ๐Ÿงฌ ์ค‘๋ณต ๊ฒ€์‚ฌ (Duplicate Check) - **๊ธฐ์กด ์œ ์‚ฌ ๋ฌธ์„œ:** *(TODO: ์ธ๋ฑ์„œ ํด๋Ÿฌ์Šคํ„ฐ ๋ฆฌํฌํŠธ ์ฐธ์กฐ)* - **์ฒ˜๋ฆฌ ๋ฐฉ์‹:** UPDATE (์ž๋™ ์ •๊ทœํ™”) - **์ฒ˜๋ฆฌ ์ด์œ :** Phase 1 ์ •๊ทœํ™” โ€” ์˜› ํ…œํ”Œ๋ฆฟ/๋ˆ„๋ฝ ํ•„๋“œ ๋ณด๊ฐ•. ## ๐Ÿ•“ ๋ณ€๊ฒฝ ์ด๋ ฅ (Changelog) | ๋‚ ์งœ | ๋ณ€๊ฒฝ ๋‚ด์šฉ | ์ฒ˜๋ฆฌ ๋ฐฉ์‹ | ์‹ ๋ขฐ๋„ | |------|-----------|-----------|--------| | 2026-05-08 | P-Reinforce Phase 1 ์ •๊ทœํ™” (frontmatter + ํ—ค๋” ํ‘œ์ค€ํ™”) | UPDATE | A | ## ๐Ÿ’ป ์ฝ”๋“œ ํŒจํ„ด (Code Patterns) **ํŒจํ„ด 1:** *(TODO: ์ด ํ”„๋กœ์ ํŠธ ์ปจ๋ฒค์…˜ ๋ฐ˜์˜ํ•œ ๊ตฌ์กฐ ์Šค์ผˆ๋ ˆํ†ค)* ```text # TODO ``` ## ๐Ÿค” ์˜์‚ฌ๊ฒฐ์ • ๊ธฐ์ค€ (Decision Criteria) **์„ ํƒ A๋ฅผ ์จ์•ผ ํ•  ๋•Œ:** - *(TODO)* **์„ ํƒ B๋ฅผ ์จ์•ผ ํ•  ๋•Œ:** - *(TODO)* **๊ธฐ๋ณธ๊ฐ’:** > *(TODO)* ## โŒ ์•ˆํ‹ฐํŒจํ„ด (Anti-Patterns) - **[์•ˆํ‹ฐํŒจํ„ด]:** *(TODO: ๋ฌด์—‡์„ ํ•˜๋ฉด ์•ˆ ๋˜๋Š”๊ฐ€ + ์ด์œ  + ๋Œ€์‹  ๋ฌด์—‡์„)*