---
id: wiki-2026-0508-dopamine-signaling
title: Dopamine Signaling
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [P-REINFORCE-AUTO-C204E9, DA Signaling, RPE, Reward Prediction Error]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [neuroscience, reinforcement-learning, reward, addiction, RPE]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: numpy+gymnasium
---

# Dopamine Signaling

## One-line summary
> **"The neural currency of reward prediction error."** Dopamine (DA) signaling broadcasts the reward prediction error (RPE), actual minus expected reward, as phasic bursts in the mesolimbic and nigrostriatal pathways. The landmark finding of Schultz 1997 bridges modern reinforcement learning and models of addiction. As of 2026, the frontier is the dissection of multi-dimensional DA signaling (D1/D2 receptors, axonal vs. somatic release, microcircuits).

## Core concepts

### Pathways
- **Mesolimbic** (VTA → NAc): reward, motivation, addiction.
- **Mesocortical** (VTA → PFC): cognition, working memory.
- **Nigrostriatal** (SNc → dorsal striatum): motor learning, habit.
- **Tuberoinfundibular** (hypothalamus → pituitary): prolactin inhibition.

### RPE encoding
- **Phasic burst**: an unexpected reward raises the DA firing rate.
- **Phasic dip**: omission of an expected reward drops firing below baseline.
- **CS shift**: as a result of conditioning, the burst moves from the time of the reward to the time of the cue.
- **Tonic level**: background motivation and vigor.

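The burst, dip, and no-response cases above can be read off a single error term. The sketch below is a minimal illustration, assuming a TD-style error of the form reward + gamma × next value − current value; the function name `rpe` and the example numbers are hypothetical, not taken from any recording data.

```python
def rpe(reward, value_now, value_next=0.0, gamma=1.0):
    """TD-style reward prediction error: actual minus expected."""
    return reward + gamma * value_next - value_now

# Unexpected reward: nothing was predicted, reward arrives -> positive error (burst)
burst = rpe(reward=1.0, value_now=0.0)
# Omitted reward: reward was predicted but withheld -> negative error (dip)
dip = rpe(reward=0.0, value_now=1.0)
# Fully predicted reward: no error, firing stays near baseline
baseline = rpe(reward=1.0, value_now=1.0)

print(burst, dip, baseline)
```

The sign of the error maps onto the direction of the phasic response; the magnitude is only as meaningful as the assumed value estimates.
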
### Receptor types
- **D1-like** (D1, D5): Gs-coupled, cAMP↑, direct pathway, "Go".
- **D2-like** (D2, D3, D4): Gi-coupled, cAMP↓, indirect pathway, "NoGo".
- **Asymmetric learning**: the D1 pathway learns from positive RPE, the D2 pathway from negative RPE.

### Applications
1. Reinforcement learning: DA signaling as the biological basis of TD learning.
2. Parkinson's disease: loss of dopaminergic neurons in the SNc.
3. Addiction: hijacking of the mesolimbic pathway.
4. Schizophrenia: mesocortical hypofunction plus mesolimbic hyperfunction.
5. ADHD: dysregulation of DA reuptake.

## 💻 Patterns

### Pattern 1: TD(0) RPE simulation
```python
import numpy as np

def td_zero(states, rewards, alpha=0.1, gamma=0.95, episodes=1000):
    """Tabular TD(0) on a fixed left-to-right chain; logs the RPE at every step."""
    V = np.zeros(len(states))  # the last state is terminal and keeps V = 0
    rpe_log = []
    for _ in range(episodes):
        for t in range(len(states) - 1):
            # RPE = actual (reward + discounted next value) - expected (current value)
            rpe = rewards[t] + gamma * V[t + 1] - V[t]
            V[t] += alpha * rpe
            rpe_log.append((t, rpe))
    return V, rpe_log
```

### Pattern 2: CS-US shift (Pavlovian)
```python
import numpy as np

def conditioning(trials=200, cs_step=10, us_step=20, alpha=0.2, gamma=0.9):
    """Check whether the RPE shifts from the US time to the CS time."""
    V = np.zeros(30)
    r = np.zeros(30)
    r[us_step] = 1.0
    history = []
    for trial in range(trials):
        for t in range(29):
            rpe = r[t] + gamma * V[t + 1] - V[t]
            # Steps up to cs_step are the pre-CS baseline: the CS arrives
            # unpredictably, so their value stays at 0. Without this clamp the
            # value would propagate back to t = 0 and the CS response would vanish.
            if t > cs_step:
                V[t] += alpha * rpe
            if t in (cs_step, us_step):
                history.append((trial, t, rpe))
    return history  # late trials: burst at cs_step, ≈ 0 at us_step
```

### Pattern 3: D1/D2 asymmetric update
```python
def d1d2_update(weights, rpe, lr_d1=0.1, lr_d2=0.1):
    """positive RPE → D1 facilitation, negative → D2 facilitation."""
    # weights is a dict with "go" and "nogo" entries; it is updated in place
    if rpe > 0:
        weights["go"] += lr_d1 * rpe
    else:
        weights["nogo"] += lr_d2 * (-rpe)
    return weights
```

### Pattern 4: Tonic-phasic interaction
```python
import numpy as np

def vigor_modulated_action(action_values, tonic_da, beta=2.0):
    """Niv 2007: tonic DA sets the softmax gain (a proxy for response vigor)."""
    logits = beta * tonic_da * np.asarray(action_values, dtype=float)
    p = np.exp(logits - logits.max())  # subtract the max to avoid overflow
    return p / p.sum()
```

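To see what the tonic term does in a gain-scaled softmax of this kind, the sketch below compares choice probabilities at a low and a high tonic level. It is a standard-library re-statement for illustration only; the helper name `softmax_choice` and the action values are hypothetical, and the model captures choice determinism rather than response latency itself.

```python
import math

def softmax_choice(action_values, tonic_da, beta=2.0):
    """Softmax over action values with inverse temperature beta * tonic_da."""
    scaled = [beta * tonic_da * v for v in action_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

values = [1.0, 0.5, 0.0]  # hypothetical action values
low = softmax_choice(values, tonic_da=0.2)   # low tonic DA: near-uniform choice
high = softmax_choice(values, tonic_da=2.0)  # high tonic DA: sharper preference

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

Under these assumptions, raising the tonic level concentrates probability on the highest-valued action while leaving the ranking of actions unchanged.
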
### Pattern 5: Distributional RPE (Dabney 2020 finding)
```python
import torch

class DistributionalDA(torch.nn.Module):
    """A population of DA neurons with varying degrees of optimism."""

    def __init__(self, n_neurons=20):
        super().__init__()
        # tau near 1 = optimistic unit, tau near 0 = pessimistic unit
        self.register_buffer("taus", torch.linspace(0.1, 0.9, n_neurons))
        self.values = torch.nn.Parameter(torch.zeros(n_neurons))

    def update(self, reward, lr=0.1):
        with torch.no_grad():
            delta = reward - self.values
            # asymmetric update per neuron: positive errors are scaled by tau,
            # negative errors by (1 - tau)
            self.values += lr * torch.where(
                delta > 0, self.taus * delta, (1 - self.taus) * delta
            )
```

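The effect of the asymmetric update is easiest to see on a reward stream with more than one outcome. The sketch below is a standard-library re-statement of the same rule, run on a hypothetical 50/50 reward of 0 or 1; the function name `learn_expectiles`, the seed, and the learning rate are assumptions made for illustration.

```python
import random

def learn_expectiles(rewards, taus, lr=0.05):
    """Run the asymmetric update over a reward stream, one value per tau."""
    values = [0.0] * len(taus)
    for r in rewards:
        for i, tau in enumerate(taus):
            delta = r - values[i]
            # optimistic units (high tau) weight positive errors more heavily
            scale = tau if delta > 0 else (1.0 - tau)
            values[i] += lr * scale * delta
    return values

rng = random.Random(0)
# Hypothetical bimodal reward: 0 or 1 with equal probability
rewards = [float(rng.random() < 0.5) for _ in range(5000)]
taus = [0.1, 0.5, 0.9]
values = learn_expectiles(rewards, taus)

print([round(v, 2) for v in values])
```

Pessimistic units settle low and optimistic units settle high, so the population as a whole carries information about the spread of the reward distribution rather than only its mean.
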
## Decision criteria
| Situation | Approach |
|---|---|
| Behavioral RL model | TD(λ) / Q-learning, scalar RPE |
| Asymmetric learning bias model | D1/D2 dual-pathway |
| Vigor / response rate model | tonic DA + beta scaling |
| Risk-sensitive / distributional | distributional RL (Dabney 2020) |
| Clinical Parkinson's | SNc loss + L-DOPA pharmacology |
| Addiction model | hijack of mesolimbic phasic signaling + tolerance |

**Default**: start from TD(0) with a scalar RPE as the baseline. Add the D1/D2 model when asymmetric bias is central to the task.

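The table lists TD(λ) for behavioral models without a matching pattern, so here is a minimal sketch of it. It assumes a tabular, fixed left-to-right chain with accumulating eligibility traces; the function name `td_lambda`, the parameter defaults, and the four-step reward chain are hypothetical choices for illustration.

```python
def td_lambda(rewards, alpha=0.1, gamma=0.95, lam=0.8, episodes=500):
    """Tabular TD(lambda) with accumulating traces on a left-to-right chain."""
    n = len(rewards)
    V = [0.0] * (n + 1)  # V[n] is the terminal state, fixed at 0
    for _ in range(episodes):
        trace = [0.0] * n  # eligibility traces reset at the start of each episode
        for t in range(n):
            rpe = rewards[t] + gamma * V[t + 1] - V[t]
            trace[t] += 1.0  # mark the current state as eligible
            for s in range(t + 1):
                V[s] += alpha * rpe * trace[s]  # credit all recently visited states
                trace[s] *= gamma * lam         # decay eligibility
    return V[:n]

rewards = [0.0, 0.0, 0.0, 1.0]  # hypothetical chain: reward on the last step
V = td_lambda(rewards)

print([round(v, 3) for v in V])
```

With `lam=0` this reduces to the TD(0) baseline; larger values of `lam` pass each RPE further back along the chain within a single episode.
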
## 🔗 Graph
- Parent: [[Reinforcement Learning]]
- Applications: [[Addiction]] · [[Reward Prediction Error]]
- Adjacent: [[TD-Learning]] · [[Actor-Critic]] · [[Distributional RL]]

## 🤖 LLM usage
**When to use**: literature review (RPE, distributional DA), generating modeling hypotheses, explaining the bridge between neurobiology and RL.
**When not to use**: clinical diagnosis or prescription, which belongs to a neurologist.

## ❌ Anti-patterns
- **Scalar oversimplification**: assuming a single RPE channel ignores the distributional, multi-receptor reality.
- **"DA = pleasure" misconception**: DA signals prediction error and motivation; hedonic experience is attributed to the opioid system.
- **Human ↔ rodent extrapolation**: ignoring species differences in microcircuitry and receptor expression.

## 🧪 Verification / duplicates
- Verified (Schultz 1997 Science, Niv 2007, Dabney 2020 Nature, modern review Berke 2018).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: substantive content + distributional DA finding |