2nd/10_Wiki/Topics/General Knowledge/Dopamine Signaling.md

---
id: wiki-2026-0508-dopamine-signaling
title: Dopamine Signaling
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [P-REINFORCE-AUTO-C204E9, DA Signaling, RPE, Reward Prediction Error]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [neuroscience, reinforcement-learning, reward, addiction, RPE]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: numpy+gymnasium
---

# Dopamine Signaling

## One-line summary
> **"The neural currency of reward prediction error."** Dopamine (DA) signaling broadcasts the reward prediction error (RPE: actual minus expected reward) as phasic bursts in the mesolimbic and nigrostriatal pathways. The landmark finding of Schultz (1997) is the bridge between modern reinforcement learning and models of addiction. As of 2026, the frontier is the dissection of multi-dimensional DA signaling (D1/D2 receptors, axonal vs. somatic release, microcircuits).

## Key points

### Pathways
- **Mesolimbic** (VTA → NAc): reward, motivation, addiction.
- **Mesocortical** (VTA → PFC): cognition, working memory.
- **Nigrostriatal** (SNc → dorsal striatum): motor learning, habit.
- **Tuberoinfundibular** (hypothalamus → pituitary): prolactin inhibition.

### RPE encoding
- **Phasic burst**: an unexpected reward raises the DA firing rate.
- **Phasic dip**: omission of an expected reward drops firing below baseline.
- **CS shift**: with conditioning, the burst moves from the time of the reward to the time of the predictive cue.
- **Tonic level**: background motivation and vigor.
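
Under the standard temporal-difference reading, the phasic signatures above correspond to the sign of the TD error. The symbols follow the code in the Patterns section ($\gamma$ is the discount factor, $V$ the learned value); mapping each case onto firing rate is the usual interpretation, not a claim about every DA neuron.

$$
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)
$$

$$
\delta_t > 0 \;\Rightarrow\; \text{phasic burst}, \qquad
\delta_t < 0 \;\Rightarrow\; \text{phasic dip}, \qquad
\delta_t = 0 \;\Rightarrow\; \text{baseline firing (fully predicted reward)}
$$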

### Receptor types
- **D1-like** (D1, D5): Gs-coupled, cAMP↑, direct pathway, "Go".
- **D2-like** (D2, D3, D4): Gi-coupled, cAMP↓, indirect pathway, "NoGo".
- **Asymmetric learning**: the D1 pathway learns from positive RPE, the D2 pathway from negative RPE.

### Applications
1. Biological basis of reinforcement learning algorithms (TD learning).
2. Parkinson's disease: loss of dopaminergic neurons in the SNc.
3. Addiction: hijacking of the mesolimbic pathway.
4. Schizophrenia: mesocortical hypofunction combined with mesolimbic hyperfunction.
5. ADHD: dysregulated DA reuptake.
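
The "hijacking" idea in item 3 can be made concrete with a toy sketch in the spirit of Redish (2004), where a drug adds a component to the RPE that learning cannot cancel. The function name `learn_value`, the parameter `drug_boost`, and the single-step terminal task are illustrative assumptions, not part of the source note.

```python
def learn_value(reward, drug_boost=0.0, alpha=0.1, trials=500):
    """Learn the value of a single rewarded state (hypothetical toy task)."""
    v = 0.0
    for _ in range(trials):
        delta = reward - v  # terminal step: no successor value
        if drug_boost > 0:
            # Assumption: the pharmacological DA surge cannot be predicted away,
            # so the RPE never falls below drug_boost.
            delta = max(delta + drug_boost, drug_boost)
        v += alpha * delta
    return v

v_natural = learn_value(1.0)               # converges to the reward size
v_drug = learn_value(1.0, drug_boost=0.5)  # keeps growing with every trial
```

For a natural reward the RPE shrinks to zero and the value settles; with the non-compensable term the value keeps increasing, which is one proposed account of escalating drug valuation.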

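## 💻 Patterns

### Pattern 1: TD(0) RPE simulation
```python
import numpy as np

def td_zero(states, rewards, alpha=0.1, gamma=0.95, episodes=1000):
    """Tabular TD(0) over a fixed state sequence; returns values and the RPE log."""
    V = np.zeros(len(states))  # the last state is terminal and keeps V = 0
    rpe_log = []
    for _ in range(episodes):
        for t in range(len(states) - 1):
            # RPE (TD error): actual minus expected return
            rpe = rewards[t] + gamma * V[t + 1] - V[t]
            V[t] += alpha * rpe
            rpe_log.append((t, rpe))
    return V, rpe_log
```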

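### Pattern 2: CS-US shift (Pavlovian)
```python
import numpy as np

def conditioning(trials=200, cs_step=10, us_step=20, alpha=0.2, gamma=0.9):
    """Check whether the RPE shifts from the US to the CS over trials."""
    n_steps = 30
    V = np.zeros(n_steps)
    r = np.zeros(n_steps); r[us_step] = 1.0
    history = []
    for trial in range(trials):
        # CS onset is unpredictable, so pre-CS states keep V = 0 and the
        # RPE at CS onset is the jump from baseline into the CS state.
        history.append((trial, cs_step, gamma * V[cs_step] - 0.0))
        for t in range(cs_step, n_steps - 1):
            rpe = r[t] + gamma * V[t + 1] - V[t]
            V[t] += alpha * rpe
            if t == us_step:
                history.append((trial, t, rpe))
    return history  # late trials: positive RPE at cs_step, us_step ≈ 0
```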

### Pattern 3: D1/D2 asymmetric update
```python
def d1d2_update(weights, rpe, lr_d1=0.1, lr_d2=0.1):
    """Positive RPE strengthens the D1 ("go") weight, negative RPE the D2 ("nogo") weight."""
    if rpe > 0:
        weights["go"] += lr_d1 * rpe
    else:
        weights["nogo"] += lr_d2 * (-rpe)  # -rpe >= 0, so the weight never decreases
    return weights
```

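### Pattern 4: Tonic-phasic interaction
```python
import numpy as np

def vigor_modulated_action(action_values, tonic_da, beta=2.0):
    """Softmax policy whose gain scales with tonic DA (after Niv 2007: tonic DA → response rate)."""
    logits = beta * tonic_da * np.asarray(action_values, dtype=float)
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()
```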

### Pattern 5: Distributional RPE (Dabney 2020 finding)
```python
import torch

class DistributionalDA(torch.nn.Module):
    """Population of DA neurons with varying degrees of optimism."""
    def __init__(self, n_neurons=20):
        super().__init__()
        # tau near 1 = optimistic neuron, tau near 0 = pessimistic neuron
        self.register_buffer("taus", torch.linspace(0.1, 0.9, n_neurons))
        self.values = torch.nn.Parameter(torch.zeros(n_neurons))

    def update(self, reward, lr=0.1):
        # asymmetric (expectile-style) update per neuron
        with torch.no_grad():
            delta = reward - self.values
            self.values += lr * torch.where(
                delta > 0, self.taus * delta, (1 - self.taus) * delta
            )
```
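
To see what the asymmetric update buys, the following NumPy sketch applies the same rule to a 50/50 reward of 0 or 1. The reward distribution, learning rate, population size, and seed are illustrative assumptions; the point is only that pessimistic and optimistic units settle at different values instead of all converging on the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
taus = np.linspace(0.1, 0.9, 9)   # per-neuron optimism (assumed range)
values = np.zeros_like(taus)
lr = 0.05

for _ in range(5000):
    reward = float(rng.integers(0, 2))  # Bernoulli(0.5) reward: 0.0 or 1.0
    delta = reward - values
    # Positive errors are scaled by tau, negative errors by (1 - tau)
    values += lr * np.where(delta > 0, taus * delta, (1 - taus) * delta)

print(np.round(values, 2))  # ordered from pessimistic to optimistic estimates
```

A scalar TD learner would report only the mean of 0.5 here; the spread across `values` is what carries information about the shape of the reward distribution.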

## Decision criteria
| Situation | Approach |
|---|---|
| Behavioral RL model | TD(λ) / Q-learning, scalar RPE |
| Modeling asymmetric learning bias | D1/D2 dual-pathway |
| Modeling vigor / response rate | tonic DA + beta scaling |
| Risk-sensitive / distributional | distributional RL (Dabney 2020) |
| Clinical Parkinson's | SNc loss + L-DOPA pharmacology |
| Addiction model | hijacking of mesolimbic phasic signaling + tolerance |

**Default**: scalar-RPE TD(0) as the baseline. Add the D1/D2 split when asymmetric learning bias is central to the task.
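
The table lists TD(λ) as an option for behavioral models. A minimal tabular sketch with accumulating eligibility traces is shown here, assuming a deterministic chain of states in which `rewards[t]` is received on leaving state `t`; the function name `td_lambda` and all parameter values are illustrative.

```python
import numpy as np

def td_lambda(rewards, alpha=0.1, gamma=0.95, lam=0.8, episodes=500):
    """Tabular TD(lambda) on a deterministic chain (hypothetical toy task)."""
    n = len(rewards) + 1          # one extra terminal state with V = 0
    V = np.zeros(n)
    for _ in range(episodes):
        e = np.zeros(n)           # eligibility traces, reset every episode
        for t in range(n - 1):
            delta = rewards[t] + gamma * V[t + 1] - V[t]  # scalar RPE
            e *= gamma * lam      # decay traces of earlier states
            e[t] += 1.0           # mark the current state as eligible
            V += alpha * delta * e  # one RPE updates all eligible states
    return V

V = td_lambda(np.array([0.0, 0.0, 0.0, 1.0]))
```

Compared with TD(0), a single RPE at reward time immediately credits earlier states, so value propagates back to predictive cues in fewer episodes.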

## 🔗 Graph
- Parent: [[Reinforcement Learning]]
- Applications: [[Addiction]] · [[Reward Prediction Error]]
- Adjacent: [[TD-Learning]] · [[Actor-Critic]] · [[Distributional RL]]

## 🤖 LLM usage
**When to use**: literature review (RPE, distributional DA), generating modeling hypotheses, explaining the bridge between neurobiology and RL.
**When not to use**: clinical diagnosis or prescription, which belongs to a neurologist.

## ❌ Anti-patterns
- **Scalar oversimplification**: assuming a single RPE channel ignores the distributional and multi-receptor reality.
- **"DA = pleasure" misconception**: DA signals prediction error and motivation; hedonic experience is attributed to the opioid system.
- **Human ↔ rodent extrapolation**: ignores species differences in microcircuits and receptor expression.

## 🧪 Verification / Duplicates
- Verified against Schultz 1997 (Science), Niv 2007, Dabney 2020 (Nature), and the review by Berke 2018.
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — substantive content + distributional DA finding |