--- id: wiki-2026-0508-rl-neuroscience title: RL Neuroscience category: 10_Wiki/Topics status: needs_review canonical_id: self aliases: [P-Reinforce-AI-003] duplicate_of: none source_trust_level: A confidence_score: 0.98 tags: [ai, rl, neuroscience, brain] raw_sources: [] last_reinforced: 2026-04-20 github_commit: batch-reinforce-04 inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08) --- # RL_Neuroscience (Computational Reinforcement Learning) ## πŸ“Œ ν•œ 쀄 톡찰 (The Karpathy Summary) > 보상 ν•™μŠ΅μ˜ 생물학적 κΈ°μ œμ™€ 기계 ν•™μŠ΅ μ•Œκ³ λ¦¬μ¦˜μ˜ μˆ˜λ ΄μ„ 톡해 μ§€λŠ₯의 λ³Έμ§ˆμ„ 규λͺ…ν•˜λŠ” 계산 λ‡Œκ³Όν•™μ˜ 정점. ## πŸ“– κ΅¬μ‘°ν™”λœ 지식 (Synthesized Content) - **μΆ”μΆœλœ νŒ¨ν„΄:** ν™˜κ²½κ³Όμ˜ μƒν˜Έμž‘μš©μ—μ„œ 얻은 보상 μ‹ ν˜Έλ₯Ό μ‚¬μš©ν•˜μ—¬ μ •μ±…(Policy)κ³Ό κ°€μΉ˜ ν•¨μˆ˜(Value Function)λ₯Ό μ—…λ°μ΄νŠΈν•˜λŠ” μˆœν™˜μ  μ΅œμ ν™” νŒ¨ν„΄. - **μ„ΈλΆ€ λ‚΄μš©:** - TD-Learning(Temporal Difference)κ³Ό λ„νŒŒλ―Ό μ‹ ν˜Έμ˜ μˆ˜ν•™μ  μΌμΉ˜μ„± μž…μ¦. - λͺ¨λΈ 기반(Model-based) vs λͺ¨λΈ 자유(Model-free) ν•™μŠ΅μ˜ λ‡Œλ‚΄ 처리 경둜 뢄석. - 탐색(Exploration)κ³Ό μ°©μ·¨(Exploitation)의 κ· ν˜•μ„ λ§žμΆ”λŠ” μ „λ‘μ—½μ˜ κΈ°λŠ₯ λͺ¨μ‚¬. ## ⚠️ λͺ¨μˆœ 및 μ—…λ°μ΄νŠΈ (Contradictions & Updates) - **κ³Όκ±° λ°μ΄ν„°μ™€μ˜ 좩돌:** λ‹¨μˆœ 쑰건 λ°˜μ‚¬ λͺ¨λΈμ—μ„œ 미래 κ°€μΉ˜λ₯Ό μ˜ˆμΈ‘ν•˜λŠ” '계산적 μ—μ΄μ „νŠΈ' λͺ¨λΈλ‘œ ν™•μž₯. - **μ •μ±… λ³€ν™”:** P-Reinforce μ—”μ§„μ˜ 핡심 둜직(Self-[[Optimization|Optimization]])을 λ’·λ°›μΉ¨ν•˜λŠ” 이둠적 근거둜 μ΅œμƒλ‹¨ 배치. ## πŸ”— 지식 μ—°κ²° (Graph) - **Parent:** 10_Wiki/πŸ’‘ Topics/AI - **Related:** [[Dopamine|Dopamine]], [[Operant_Conditioning|Operant_Conditioning]], [[Reinforcement-Learning|Reinforcement-Learning]] - **Raw Source:** 00_Raw/2026-04-20/[[Computational Neuroscience of Reinforcement Learning|Computational Neuroscience of Reinforcement Learning]].md ## πŸ€– LLM ν™œμš© 힌트 (How to Use This Knowledge) **μ–Έμ œ 이 지식을 μ“°λŠ”κ°€:** - *(TODO)* **μ–Έμ œ μ“°λ©΄ μ•ˆ λ˜λŠ”κ°€:** - *(TODO)* ## πŸ§ͺ 검증 μƒνƒœ (Validation) - **정보 μƒνƒœ:** needs_review - **좜처 신뒰도:** A - **κ²€ν†  이유:** *(P-Reinforce Phase 1 μžλ™ μ •κ·œν™”. λ³Έλ¬Έ 검증 ν•„μš”.)* ## 🧬 쀑볡 검사 (Duplicate Check) - **κΈ°μ‘΄ μœ μ‚¬ λ¬Έμ„œ:** *(TODO: μΈλ±μ„œ ν΄λŸ¬μŠ€ν„° 리포트 μ°Έμ‘°)* - **처리 방식:** UPDATE (μžλ™ μ •κ·œν™”) - **처리 이유:** Phase 1 μ •κ·œν™” β€” μ˜› ν…œν”Œλ¦Ώ/λˆ„λ½ ν•„λ“œ 보강. ## πŸ•“ λ³€κ²½ 이λ ₯ (Changelog) | λ‚ μ§œ | λ³€κ²½ λ‚΄μš© | 처리 방식 | 신뒰도 | |------|-----------|-----------|--------| | 2026-05-08 | P-Reinforce Phase 1 μ •κ·œν™” (frontmatter + 헀더 ν‘œμ€€ν™”) | UPDATE | A |