---
id: wiki-2026-0508-dopamine-signaling
title: Dopamine Signaling
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [P-REINFORCE-AUTO-C204E9, DA Signaling, RPE, Reward Prediction Error]
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: [neuroscience, reinforcement-learning, reward, addiction, RPE]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: numpy+gymnasium
---
# Dopamine Signaling
## One-liner
> **"The neural currency of reward prediction error."** Dopamine (DA) signaling broadcasts the reward prediction error (RPE: actual minus expected reward) as phasic bursts in the mesolimbic and nigrostriatal pathways. The landmark finding of Schultz 1997 is the bridge between modern reinforcement learning and addiction models. As of 2026, the frontier is the dissection of multi-dimensional DA signaling (D1/D2 receptors, axonal vs. somatic release, microcircuits).
## Core
### Pathways
- **Mesolimbic** (VTA → NAc): reward, motivation, addiction.
- **Mesocortical** (VTA → PFC): cognition, working memory.
- **Nigrostriatal** (SNc → dorsal striatum): motor learning, habit.
- **Tuberoinfundibular** (hypothalamus → pituitary): prolactin inhibition.
### RPE encoding
- **Phasic burst**: an unexpected reward raises the DA firing rate.
- **Phasic dip**: omission of an expected reward drops firing below baseline.
- **CS-shift**: with conditioning, the burst shifts from the time of the reward to the time of the cue.
- **Tonic level**: background motivation, vigor.
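The burst/dip rules above can be sketched as a sign test on a TD-style prediction error. This is a minimal illustration, not a neural model: the function names, the tolerance, and the default discount are assumptions made for the example.

```python
def reward_prediction_error(reward, v_current, v_next=0.0, gamma=0.95):
    """TD-style RPE: reward plus discounted next value, minus the current prediction."""
    return reward + gamma * v_next - v_current


def phasic_response(rpe, tol=1e-6):
    """Map the sign of the RPE onto the qualitative phasic response (hypothetical helper)."""
    if rpe > tol:
        return "burst"
    if rpe < -tol:
        return "dip"
    return "baseline"


# Unexpected reward: nothing was predicted, reward arrives.
print(phasic_response(reward_prediction_error(reward=1.0, v_current=0.0)))  # burst
# Fully predicted reward: prediction matches the outcome.
print(phasic_response(reward_prediction_error(reward=1.0, v_current=1.0)))  # baseline
# Omitted reward: reward was predicted but did not arrive.
print(phasic_response(reward_prediction_error(reward=0.0, v_current=1.0)))  # dip
```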
### Receptor types
- **D1-like** (D1, D5): Gs-coupled, cAMP↑, direct pathway, "Go".
- **D2-like** (D2, D3, D4): Gi-coupled, cAMP↓, indirect pathway, "NoGo".
- **Asymmetric learning**: D1 learns from positive RPE, D2 from negative RPE.
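One way to read the Go/NoGo split is as two opposing weights per action that are combined at choice time. The sketch below is a simplified illustration under that assumption; `go_nogo_policy`, `beta_go`, and `beta_nogo` are hypothetical names, and the softmax readout is a modeling choice rather than something the receptor biology dictates.

```python
import numpy as np


def go_nogo_policy(go, nogo, beta_go=1.0, beta_nogo=1.0):
    """Softmax over net action values: Go weights promote, NoGo weights suppress."""
    go = np.asarray(go, dtype=float)
    nogo = np.asarray(nogo, dtype=float)
    net = beta_go * go - beta_nogo * nogo
    net = net - net.max()  # subtract the max for numerical stability
    p = np.exp(net)
    return p / p.sum()


# Action 0 has a strong Go weight, action 1 a strong NoGo weight.
p = go_nogo_policy(go=[1.0, 0.0], nogo=[0.0, 1.0])
print(p)
```

Scaling `beta_go` and `beta_nogo` separately is one hedged way to mimic a shift in the balance between the direct and indirect pathways.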
### Applications
1. Biological basis of reinforcement learning algorithms (TD-learning).
2. Parkinson's disease: loss of dopaminergic neurons in the SNc.
3. Addiction: hijacking of the mesolimbic pathway.
4. Schizophrenia: mesocortical hypofunction plus mesolimbic hyperfunction.
5. ADHD: dysregulation of DA reuptake.
## 💻 Patterns
### Pattern 1: TD(0) RPE simulation
```python
import numpy as np

def td_zero(states, rewards, alpha=0.1, gamma=0.95, episodes=1000):
    """Tabular TD(0) over a fixed state sequence; logs the RPE at every step."""
    V = np.zeros(len(states))
    rpe_log = []
    for _ in range(episodes):
        # The last state acts as terminal and is never updated.
        for t in range(len(states) - 1):
            rpe = rewards[t] + gamma * V[t + 1] - V[t]  # TD error = RPE
            V[t] += alpha * rpe
            rpe_log.append((t, rpe))
    return V, rpe_log
```
### Pattern 2: CS-US shift (Pavlovian)
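```python
import numpy as np

def conditioning(trials=200, cs_step=10, us_step=20, alpha=0.2, gamma=0.9):
    """Check whether the RPE shifts from the US to the CS over training."""
    n_steps = 30
    V = np.zeros(n_steps)
    r = np.zeros(n_steps)
    r[us_step] = 1.0  # the US (reward) is delivered at us_step
    history = []
    for trial in range(trials):
        for t in range(n_steps - 1):
            rpe = r[t] + gamma * V[t + 1] - V[t]
            # The CS arrives unpredictably, so pre-CS states keep V = 0;
            # if they learned too, the cue response would be predicted away.
            if t >= cs_step:
                V[t] += alpha * rpe
            # The transition into the CS state (t = cs_step - 1) carries the cue response.
            if t in (cs_step - 1, us_step):
                history.append((trial, t, rpe))
    return history  # late trials: burst at cs_step - 1, RPE at us_step ≈ 0
```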
### Pattern 3: D1/D2 asymmetric update
```python
def d1d2_update(weights, rpe, lr_d1=0.1, lr_d2=0.1):
    """positive RPE → D1 facilitation, negative → D2 facilitation."""
    if rpe > 0:
        weights["go"] += lr_d1 * rpe
    else:
        weights["nogo"] += lr_d2 * (-rpe)
    return weights
```
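### Pattern 4: Tonic-phasic interaction
```python
import numpy as np

def vigor_modulated_action(action_values, tonic_da, beta=2.0):
    """Niv 2007: tonic DA → response rate."""
    logits = beta * tonic_da * np.asarray(action_values, dtype=float)
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()
```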
### Pattern 5: Distributional RPE (Dabney 2020 finding)
```python
import torch

class DistributionalDA(torch.nn.Module):
    """Population of DA neurons with varying degrees of optimism."""

    def __init__(self, n_neurons=20):
        super().__init__()
        # Per-neuron asymmetry: a higher tau weights positive RPEs more heavily.
        self.register_buffer("taus", torch.linspace(0.1, 0.9, n_neurons))
        self.values = torch.nn.Parameter(torch.zeros(n_neurons))

    def update(self, reward, lr=0.1):
        # Asymmetric update per neuron, applied outside autograd.
        with torch.no_grad():
            delta = reward - self.values
            self.values += lr * torch.where(
                delta > 0, self.taus * delta, (1 - self.taus) * delta
            )
```
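To see what the asymmetric update does to a population, the same rule can be run in plain NumPy on a stream of rewards. This is a minimal sketch under stated assumptions: the helper name `learn_asymmetric_values`, the 50/50 reward of 0 or 1, the seed, and the learning rate are all illustrative choices, not values from any experiment.

```python
import numpy as np


def learn_asymmetric_values(rewards, taus, lr=0.05):
    """Apply the per-neuron asymmetric RPE update to every reward in sequence."""
    values = np.zeros_like(taus, dtype=float)
    for r in rewards:
        delta = r - values  # one RPE per neuron
        scale = np.where(delta > 0, taus, 1.0 - taus)  # optimistic vs. pessimistic scaling
        values += lr * scale * delta
    return values


rng = np.random.default_rng(0)
rewards = rng.integers(0, 2, size=5000).astype(float)  # reward is 0 or 1 with equal odds
taus = np.linspace(0.1, 0.9, 5)
values = learn_asymmetric_values(rewards, taus)
print(np.round(values, 2))
```

Pessimistic neurons (low tau) settle on low value estimates and optimistic ones (high tau) on high estimates, so the population as a whole spans the reward distribution instead of collapsing onto its mean.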
## Decision criteria
| Situation | Approach |
|---|---|
| Behavioral RL model | TD(λ) / Q-learning, scalar RPE |
| Asymmetric learning bias model | D1/D2 dual-pathway |
| Vigor / response rate model | tonic DA + beta scaling |
| Risk-sensitive / distributional | distributional RL (Dabney 2020) |
| Clinical Parkinson | SNc loss + L-DOPA pharmacology |
| Addiction model | mesolimbic phasic hijack + tolerance |
**Default**: TD(0) with a scalar RPE as the baseline. Add the D1/D2 pathways when asymmetric bias is central to the task.
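For the TD(λ) row of the table, a minimal tabular sketch with accumulating eligibility traces might look like the following. The function name `td_lambda`, the hyperparameters, and the deterministic three-step chain are assumptions for illustration only.

```python
import numpy as np


def td_lambda(rewards, alpha=0.1, gamma=0.95, lam=0.8, episodes=500):
    """Tabular TD(lambda) on a deterministic chain with accumulating traces."""
    n = len(rewards)
    V = np.zeros(n + 1)  # one extra terminal state whose value stays 0
    for _ in range(episodes):
        trace = np.zeros(n + 1)
        for t in range(n):
            rpe = rewards[t] + gamma * V[t + 1] - V[t]
            trace *= gamma * lam  # decay the credit of earlier states
            trace[t] += 1.0       # mark the current state as eligible
            V += alpha * rpe * trace  # one scalar RPE updates all eligible states
    return V[:n]


V = td_lambda([0.0, 0.0, 1.0])
print(np.round(V, 3))
```

On this chain the values should approach the discounted returns (gamma squared, gamma, and 1), and the single scalar RPE is shared across every state that still holds a trace.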
## 🔗 Graph
- Parent: [[Reinforcement Learning]]
- Applications: [[Addiction]] · [[Reward Prediction Error]]
- Adjacent: [[TD-Learning]] · [[Actor-Critic]] · [[Distributional RL]]
## 🤖 LLM usage
**When**: literature review (RPE, distributional DA), modeling hypothesis generation, explaining the bridge between neurobiology and RL.
**When not**: clinical diagnosis or prescription, which is the neurologist's domain.
## ❌ Anti-patterns
- **Scalar oversimplification**: assuming a single RPE channel ignores the distributional, multi-receptor reality.
- **"DA = pleasure" misconception**: DA signals prediction error and motivation; hedonic experience is attributed to the opioid system.
- **Human ↔ rodent extrapolation**: ignores species differences in microcircuits and receptor expression.
## 🧪 Verification / duplicates
- Verified (Schultz 1997 Science, Niv 2007, Dabney 2020 Nature, modern review Berke 2018).
- Trust level: A.
## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — substantive content + distributional DA finding |