---
id: wiki-2026-0508-dopamine-signaling
title: Dopamine Signaling
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [P-REINFORCE-AUTO-C204E9, DA Signaling, RPE, Reward Prediction Error]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [neuroscience, reinforcement-learning, reward, addiction, RPE]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: numpy+gymnasium
---

# Dopamine Signaling

## One-line summary

> **"The neural currency of reward prediction error."** Dopamine (DA) signaling broadcasts the reward prediction error (RPE: actual minus expected reward) as phasic bursts in the mesolimbic and nigrostriatal pathways. Schultz's landmark 1997 finding is the bridge between modern reinforcement learning and models of addiction. As of 2026, the frontier is dissecting multi-dimensional DA signaling (D1/D2 receptors, axonal vs. somatic release, microcircuits).

## Key points

### Pathways

- **Mesolimbic** (VTA → NAc): reward, motivation, addiction.
- **Mesocortical** (VTA → PFC): cognition, working memory.
- **Nigrostriatal** (SNc → dorsal striatum): motor learning, habit.
- **Tuberoinfundibular** (hypothalamus → pituitary): prolactin inhibition.

### RPE encoding

- **Phasic burst**: an unexpected reward raises the DA firing rate.
- **Phasic dip**: omission of an expected reward drops firing below baseline.
- **CS shift**: as a result of conditioning, the burst moves from the time of the reward to the time of the cue.
- **Tonic level**: background motivation and vigor.

### Receptor types

- **D1-like** (D1, D5): Gs-coupled, cAMP↑, direct pathway, "Go".
- **D2-like** (D2, D3, D4): Gi-coupled, cAMP↓, indirect pathway, "NoGo".
- **Asymmetric learning**: in dual-pathway models, D1 learns from positive RPEs and D2 from negative RPEs.

### Applications

1. Biological basis for reinforcement learning algorithms (TD-learning).
2. Parkinson's disease: loss of dopaminergic neurons in the SNc.
3. Addiction: hijacking of the mesolimbic pathway.
4. Schizophrenia: mesocortical hypofunction plus mesolimbic hyperfunction.
5. ADHD: dysregulated DA reuptake.
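
The burst, dip, and tonic baseline listed under RPE encoding above can be sketched as a simple rate code. This is a minimal illustration, assuming a linear mapping from RPE to firing rate with a floor at zero; the function name `rpe_to_firing_rate` and the `baseline_hz` and `gain_hz` values are hypothetical and are not taken from the cited papers.

```python
import numpy as np

def rpe_to_firing_rate(rpe, baseline_hz=5.0, gain_hz=20.0):
    """Map RPE values to a hypothetical DA firing rate in Hz.

    Assumption: rate = baseline + gain * RPE, clipped at 0 because a
    firing rate cannot be negative. All numbers are illustrative.
    """
    rate = baseline_hz + gain_hz * np.asarray(rpe, dtype=float)
    return np.clip(rate, 0.0, None)

# Unexpected reward (+1), fully predicted reward (0), omitted reward (-1).
rates = rpe_to_firing_rate([1.0, 0.0, -1.0])
```

In this sketch the floor at 0 Hz leaves dips with less dynamic range than bursts, so large negative RPEs saturate while positive ones do not.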
## 💻 Patterns

### Pattern 1: TD(0) RPE simulation

```python
import numpy as np

def td_zero(states, rewards, alpha=0.1, gamma=0.95, episodes=1000):
    """Tabular TD(0) over a fixed state sequence; returns values and the RPE log."""
    V = np.zeros(len(states))
    rpe_log = []
    for _ in range(episodes):
        # The last state is terminal: it is never updated, so its value stays 0.
        for t in range(len(states) - 1):
            rpe = rewards[t] + gamma * V[t + 1] - V[t]  # TD error = RPE
            V[t] += alpha * rpe
            rpe_log.append((t, rpe))
    return V, rpe_log
```

### Pattern 2: CS-US shift (Pavlovian)

```python
import numpy as np

def conditioning(trials=200, cs_step=10, us_step=20, alpha=0.2, gamma=0.9):
    """Check whether the RPE shifts from the US to CS onset."""
    n_steps = 30
    V = np.zeros(n_steps)
    r = np.zeros(n_steps)
    r[us_step] = 1.0  # reward (US) delivered at us_step
    history = []
    for trial in range(trials):
        for t in range(n_steps - 1):
            rpe = r[t] + gamma * V[t + 1] - V[t]
            # Pre-CS states are not updated: the CS arrives unpredicted, so
            # their value stays 0 and the RPE at CS onset is not learned away.
            if t >= cs_step:
                V[t] += alpha * rpe
            # t == cs_step - 1 is the transition into the CS state (CS onset).
            if t in (cs_step - 1, us_step):
                history.append((trial, t, rpe))
    return history  # late trials: burst at CS onset, RPE at us_step ≈ 0
```

### Pattern 3: D1/D2 asymmetric update

```python
def d1d2_update(weights, rpe, lr_d1=0.1, lr_d2=0.1):
    """Positive RPE strengthens D1 ("go"); negative RPE strengthens D2 ("nogo")."""
    if rpe > 0:
        weights["go"] += lr_d1 * rpe
    else:
        # rpe <= 0: the NoGo weight grows with the magnitude of the negative RPE.
        weights["nogo"] += lr_d2 * (-rpe)
    return weights
```

### Pattern 4: Tonic-phasic interaction

```python
import numpy as np

def vigor_modulated_action(action_values, tonic_da, beta=2.0):
    """Niv 2007: tonic DA -> response rate; here tonic DA scales the softmax gain."""
    logits = beta * tonic_da * np.asarray(action_values, dtype=float)
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()
```

### Pattern 5: Distributional RPE (Dabney 2020 finding)

```python
import torch

class DistributionalDA(torch.nn.Module):
    """A population of DA neurons with varying degrees of optimism."""

    def __init__(self, n_neurons=20):
        super().__init__()
        # Per-neuron asymmetry; a buffer so it follows the module across devices.
        self.register_buffer("taus", torch.linspace(0.1, 0.9, n_neurons))
        self.values = torch.nn.Parameter(torch.zeros(n_neurons))

    def update(self, reward, lr=0.1):
        with torch.no_grad():
            delta = reward - self.values
            # Asymmetric update per neuron: optimistic neurons (high tau)
            # weight positive errors more, pessimistic ones negative errors.
            self.values += lr * torch.where(
                delta > 0, self.taus * delta, (1 - self.taus) * delta
            )
```

## Decision criteria

| Situation | Approach |
|---|---|
| Behavioral RL model | TD(λ) / Q-learning, scalar RPE |
| Asymmetric learning-bias model | D1/D2 dual-pathway |
| Vigor / response-rate model | tonic DA + beta scaling |
| Risk-sensitive / distributional | distributional RL (Dabney 2020) |
| Clinical Parkinson's | SNc loss + L-DOPA pharmacology |
| Addiction model | hijacked mesolimbic phasic signaling + tolerance |

**Default**: use TD(0) with a scalar RPE as the baseline. Add the D1/D2 split when asymmetric bias is central to the task.

## 🔗 Graph

- Parent: [[Reinforcement Learning]]
- Applications: [[Addiction]] · [[Reward Prediction Error]]
- Adjacent: [[TD-Learning]] · [[Actor-Critic]] · [[Distributional RL]]

## 🤖 LLM usage

**When to use**: literature review (RPE, distributional DA), generating modeling hypotheses, explaining the bridge between neurobiology and RL.

**When not to use**: clinical diagnosis or prescription, which belongs to a neurologist.

## ❌ Anti-patterns

- **Scalar oversimplification**: assuming a single RPE channel and ignoring the distributional, multi-receptor reality.
- **"DA = pleasure" misconception**: DA signals prediction error and motivation; hedonic experience is largely attributed to the opioid system.
- **Human ↔ rodent extrapolation**: ignoring species differences in microcircuitry and receptor expression.

## 🧪 Verification / duplicates

- Verified (Schultz 1997 Science, Niv 2007, Dabney 2020 Nature, modern review Berke 2018).
- Trust level A.

## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: substantive content + distributional DA finding |