--- id: [[P-Reinforce|P-Reinforce]]-AI-003 category: Dev confidence_score: 0.98 tags: [ai, rl, neuroscience, brain] last_reinforced: 2026-04-20 github_commit: "batch-reinforce-04" --- # RL_Neuroscience (Computational Reinforcement Learning) ## ๐Ÿ“Œ ํ•œ ์ค„ ํ†ต์ฐฐ (The Karpathy Summary) > ๋ณด์ƒ ํ•™์Šต์˜ ์ƒ๋ฌผํ•™์  ๊ธฐ์ œ์™€ ๊ธฐ๊ณ„ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์ˆ˜๋ ด์„ ํ†ตํ•ด ์ง€๋Šฅ์˜ ๋ณธ์งˆ์„ ๊ทœ๋ช…ํ•˜๋Š” ๊ณ„์‚ฐ ๋‡Œ๊ณผํ•™์˜ ์ •์ . ## ๐Ÿ“– ๊ตฌ์กฐํ™”๋œ ์ง€์‹ (Synthesized Content) - **์ถ”์ถœ๋œ ํŒจํ„ด:** ํ™˜๊ฒฝ๊ณผ์˜ ์ƒํ˜ธ์ž‘์šฉ์—์„œ ์–ป์€ ๋ณด์ƒ ์‹ ํ˜ธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ •์ฑ…(Policy)๊ณผ ๊ฐ€์น˜ ํ•จ์ˆ˜(Value Function)๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” ์ˆœํ™˜์  ์ตœ์ ํ™” ํŒจํ„ด. - **์„ธ๋ถ€ ๋‚ด์šฉ:** - TD-Learning(Temporal Difference)๊ณผ ๋„ํŒŒ๋ฏผ ์‹ ํ˜ธ์˜ ์ˆ˜ํ•™์  ์ผ์น˜์„ฑ ์ž…์ฆ. - ๋ชจ๋ธ ๊ธฐ๋ฐ˜(Model-based) vs ๋ชจ๋ธ ์ž์œ (Model-free) ํ•™์Šต์˜ ๋‡Œ๋‚ด ์ฒ˜๋ฆฌ ๊ฒฝ๋กœ ๋ถ„์„. - ํƒ์ƒ‰(Exploration)๊ณผ ์ฐฉ์ทจ(Exploitation)์˜ ๊ท ํ˜•์„ ๋งž์ถ”๋Š” ์ „๋‘์—ฝ์˜ ๊ธฐ๋Šฅ ๋ชจ์‚ฌ. ## โš ๏ธ ๋ชจ์ˆœ ๋ฐ ์—…๋ฐ์ดํŠธ (Contradictions & RL Update) - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ:** ๋‹จ์ˆœ ์กฐ๊ฑด ๋ฐ˜์‚ฌ ๋ชจ๋ธ์—์„œ ๋ฏธ๋ž˜ ๊ฐ€์น˜๋ฅผ ์˜ˆ์ธกํ•˜๋Š” '๊ณ„์‚ฐ์  ์—์ด์ „ํŠธ' ๋ชจ๋ธ๋กœ ํ™•์žฅ. - **์ •์ฑ… ๋ณ€ํ™”:** P-Reinforce ์—”์ง„์˜ ํ•ต์‹ฌ ๋กœ์ง(Self-[[Optimization|Optimization]])์„ ๋’ท๋ฐ›์นจํ•˜๋Š” ์ด๋ก ์  ๊ทผ๊ฑฐ๋กœ ์ตœ์ƒ๋‹จ ๋ฐฐ์น˜. ## ๐Ÿ”— ์ง€์‹ ์—ฐ๊ฒฐ (Graph) - **Parent:** 10_Wiki/๐Ÿ’ก Topics/AI - **Related:** [[Dopamine|Dopamine]], [[Operant_Conditioning|Operant_Conditioning]], [[Reinforcement-Learning|Reinforcement-Learning]] - **Raw Source:** 00_Raw/2026-04-20/[[Computational Neuroscience of Reinforcement Learning|Computational Neuroscience of Reinforcement Learning]].md