--- id: wiki-2026-0508-dopamine-modeling title: Dopamine Modeling category: 10_Wiki/Topics status: needs_review canonical_id: self aliases: [P-Reinforce-AUTO-DOMO-001] duplicate_of: none source_trust_level: A confidence_score: 0.91 tags: [auto-reinforced, Dopamine, neurobiology, reward-prediction-error, motivation, addiction] raw_sources: [] last_reinforced: 2026-04-20 github_commit: pending inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08) tech_stack: language: unspecified framework: unspecified --- # [[Dopamine-Modeling|Dopamine-Modeling]] ## πŸ“Œ ν•œ 쀄 톡찰 (The Karpathy Summary) > "μ˜μš•κ³Ό ν•™μŠ΅μ˜ λ©”μ‹ μ €: λ‹¨μˆœνžˆ μ¦κ±°μ›€μ˜ μ „λ‹¬μžκ°€ μ•„λ‹ˆλΌ, μ˜ˆμƒμΉ˜ λͺ»ν•œ 보상을 λ°›μ•˜μ„ λ•Œ λΆ„λΉ„λ˜μ–΄ λ‡Œκ°€ 무엇을 더 ν•™μŠ΅ν•΄μ•Ό ν• μ§€ μ•Œλ €μ£ΌλŠ” 생물학적 'μ‹ μš© ν• λ‹Ή(Credit Assignment)' μ‹ ν˜Έ." ## πŸ“– κ΅¬μ‘°ν™”λœ 지식 (Synthesized Content) λ„νŒŒλ―Ό λͺ¨λΈλ§(Dopamine-Modeling)은 λ‡Œμ˜ μ‹ κ²½μ „λ‹¬λ¬Όμ§ˆμΈ λ„νŒŒλ―Όμ˜ μž‘μš©μ„ μˆ˜ν•™μ , μ „μ‚°μ μœΌλ‘œ λΆ„μ„ν•˜λŠ” μ—°κ΅¬μž…λ‹ˆλ‹€. 1. **핡심 이둠 - [[Reward Prediction Error|Reward Prediction Error]] (RPE)**: * 우리의 λ‡ŒλŠ” λŠμž„μ—†μ΄ 미래의 보상을 μ˜ˆμΈ‘ν•¨. * **μ˜ˆμΈ‘λ³΄λ‹€ 더 쒋은 κ²°κ³Ό**: λ„νŒŒλ―Ό λŒ€ν­ λΆ„λΉ„ (ν•™μŠ΅ 가속). * **μ˜ˆμΈ‘ν•œ 만큼 κ²°κ³Ό**: λ„νŒŒλ―Ό μœ μ§€. * **μ˜ˆμΈ‘λ³΄λ‹€ λ‚˜μœ κ²°κ³Ό**: λ„νŒŒλ―Ό κ°μ†Œ (행동 μ–΅μ œ). 2. **μ™œ μ€‘μš”ν•œκ°€?**: * 이 λ©”μ»€λ‹ˆμ¦˜μ€ ν˜„λŒ€ 인곡지λŠ₯의 **κ°•ν™”ν•™μŠ΅(Reinforcement Learning)** μ•Œκ³ λ¦¬μ¦˜μΈ 'Temporal Difference Learning'κ³Ό μˆ˜ν•™μ μœΌλ‘œ μ™„μ „νžˆ μΌμΉ˜ν•¨μ΄ λ°ν˜€μ§. ## ⚠️ λͺ¨μˆœ 및 μ—…λ°μ΄νŠΈ (Contradictions & Updates) - **κ³Όκ±° λ°μ΄ν„°μ™€μ˜ 좩돌**: κ³Όκ±°μ—λŠ” λ„νŒŒλ―Όμ„ λ‹¨μˆœ '쾌락 호λ₯΄λͺ¬ μ •μ±…'으둜 λ³΄μ•˜μœΌλ‚˜, ν˜„λŒ€ 정책은 μ§€λŠ₯ μ‹œμŠ€ν…œμ˜ '였차 μ‹ ν˜Έ μ •μ±…'이자 '정보 μŠ΅λ“ 동기 λΆ€μ—¬ μ •μ±…'으둜 μž¬μ •μ˜ν•¨(RL Update). - **μ •μ±… λ³€ν™”(RL Update)**: λ””μ§€ν„Έ 쀑독(SNS, 숏폼) μ •μ±… 뢄석 μ‹œ, λ„νŒŒλ―Ό λͺ¨λΈλ§μ„ 톡해 μ–΄λ–»κ²Œ μΈκ°„μ˜ 주의λ ₯을 μΈμœ„μ μœΌλ‘œ νƒˆμ·¨ν•˜λŠ”μ§€ λΆ„μ„ν•˜κ³  이λ₯Ό λ°©μ–΄ν•˜λŠ” 'λ””μ§€ν„Έ μ›°λΉ™ μ •μ±…' μˆ˜λ¦½μ— ν™œμš©λ¨. ## πŸ”— 지식 μ—°κ²° (Graph) - [[Reward Prediction Error|Reward Prediction Error]], [[Reinforcement Learning (RL)|Reinforcement Learning (RL)]], [[Psychology & Behavior|Psychology & Behavior]], [[Cybernetics|Cybernetics]], Neurobiology - **Modern Tech/Tools**: TD-learning algorithms, Brain-imaging studies (fMRI). --- ## πŸ€– LLM ν™œμš© 힌트 (How to Use This Knowledge) **μ–Έμ œ 이 지식을 μ“°λŠ”κ°€:** - *(TODO)* **μ–Έμ œ μ“°λ©΄ μ•ˆ λ˜λŠ”κ°€:** - *(TODO)* ## πŸ§ͺ 검증 μƒνƒœ (Validation) - **정보 μƒνƒœ:** needs_review - **좜처 신뒰도:** A - **κ²€ν†  이유:** *(P-Reinforce Phase 1 μžλ™ μ •κ·œν™”. λ³Έλ¬Έ 검증 ν•„μš”.)* ## 🧬 쀑볡 검사 (Duplicate Check) - **κΈ°μ‘΄ μœ μ‚¬ λ¬Έμ„œ:** *(TODO: μΈλ±μ„œ ν΄λŸ¬μŠ€ν„° 리포트 μ°Έμ‘°)* - **처리 방식:** UPDATE (μžλ™ μ •κ·œν™”) - **처리 이유:** Phase 1 μ •κ·œν™” β€” μ˜› ν…œν”Œλ¦Ώ/λˆ„λ½ ν•„λ“œ 보강. ## πŸ•“ λ³€κ²½ 이λ ₯ (Changelog) | λ‚ μ§œ | λ³€κ²½ λ‚΄μš© | 처리 방식 | 신뒰도 | |------|-----------|-----------|--------| | 2026-05-08 | P-Reinforce Phase 1 μ •κ·œν™” (frontmatter + 헀더 ν‘œμ€€ν™”) | UPDATE | A | ## πŸ’» μ½”λ“œ νŒ¨ν„΄ (Code Patterns) **νŒ¨ν„΄ 1:** *(TODO: 이 ν”„λ‘œμ νŠΈ μ»¨λ²€μ…˜ λ°˜μ˜ν•œ ꡬ쑰 μŠ€μΌˆλ ˆν†€)* ```text # TODO ``` ## πŸ€” μ˜μ‚¬κ²°μ • κΈ°μ€€ (Decision Criteria) **선택 Aλ₯Ό 써야 ν•  λ•Œ:** - *(TODO)* **선택 Bλ₯Ό 써야 ν•  λ•Œ:** - *(TODO)* **κΈ°λ³Έκ°’:** > *(TODO)* ## ❌ μ•ˆν‹°νŒ¨ν„΄ (Anti-Patterns) - **[μ•ˆν‹°νŒ¨ν„΄]:** *(TODO: 무엇을 ν•˜λ©΄ μ•ˆ λ˜λŠ”κ°€ + 이유 + λŒ€μ‹  무엇을)*