id: wiki-2026-0508-dopamine-signaling
title: Dopamine Signaling
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: P-REINFORCE-AUTO-C204E9, DA Signaling, RPE, Reward Prediction Error
duplicate_of: none
source_trust_level: A
confidence_score: 0.92
verification_status: applied
tags: neuroscience, reinforcement-learning, reward, addiction, RPE
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: python, framework: numpy+gymnasium

Dopamine Signaling

One-line summary

"The neural currency of reward prediction error." Dopamine (DA) signaling in the mesolimbic and nigrostriatal pathways uses phasic bursts to broadcast the difference between actual and expected reward (the reward prediction error, RPE). The landmark finding of Schultz 1997 is the bridge between modern reinforcement learning and addiction models. As of 2026, the frontier is the dissection of multi-dimensional DA signaling (D1/D2 receptors, axonal vs. somatic release, microcircuits).

Key points

Pathways

  • Mesolimbic (VTA → NAc): reward, motivation, addiction.
  • Mesocortical (VTA → PFC): cognition, working memory.
  • Nigrostriatal (SNc → dorsal striatum): motor learning, habit.
  • Tuberoinfundibular (hypothalamus → pituitary): prolactin inhibition.

RPE encoding

  • Phasic burst: unexpected reward → DA firing rate ↑.
  • Phasic dip: omitted expected reward → firing drops below baseline.
  • CS shift: with conditioning, the burst shifts from the time of the reward to the time of the cue.
  • Tonic level: background motivation, vigor.
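The burst, dip, and baseline cases above correspond to the sign of a temporal-difference error. A minimal sketch, assuming a scalar value estimate; the helper name `rpe` and the example numbers are illustrative and not taken from the cited papers:

```python
def rpe(reward, v_next, v_current, gamma=0.95):
    """Temporal-difference reward prediction error (illustrative helper)."""
    return reward + gamma * v_next - v_current

# Unexpected reward: nothing was predicted, reward arrives -> positive RPE (burst).
burst = rpe(reward=1.0, v_next=0.0, v_current=0.0)

# Omitted expected reward: reward was predicted but withheld -> negative RPE (dip).
dip = rpe(reward=0.0, v_next=0.0, v_current=1.0)

# Fully predicted reward: outcome matches prediction -> zero RPE (baseline firing).
baseline = rpe(reward=1.0, v_next=0.0, v_current=1.0)

print(burst, dip, baseline)  # 1.0 -1.0 0.0
```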

Receptor types

  • D1-like (D1, D5): Gs-coupled, cAMP↑, direct pathway, "Go".
  • D2-like (D2, D3, D4): Gi-coupled, cAMP↓, indirect pathway, "NoGo".
  • Asymmetric learning: the D1 pathway learns from positive RPE, the D2 pathway from negative RPE.

Applications

  1. Biological basis of reinforcement learning algorithms (TD-learning).
  2. Parkinson's disease: loss of dopaminergic neurons in the SNc.
  3. Addiction: hijacking of the mesolimbic pathway.
  4. Schizophrenia: mesocortical hypofunction + mesolimbic hyperfunction.
  5. ADHD: dysregulation of DA reuptake.

💻 Patterns

Pattern 1: TD(0) RPE simulation

import numpy as np

def td_zero(states, rewards, alpha=0.1, gamma=0.95, episodes=1000):
    V = np.zeros(len(states))
    rpe_log = []
    for _ in range(episodes):
        for t in range(len(states) - 1):
            rpe = rewards[t] + gamma * V[t+1] - V[t]
            V[t] += alpha * rpe
            rpe_log.append((t, rpe))
    return V, rpe_log
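
To sanity-check `td_zero`, note that on a deterministic chain its update has a closed-form fixed point: the last state is never updated (it stays 0), and every earlier state satisfies V[t] = rewards[t] + gamma * V[t+1]. A minimal sketch of that target; the helper name `td_fixed_point` and the 5-state reward vector are assumptions made for illustration:

```python
import numpy as np

def td_fixed_point(rewards, gamma=0.95):
    """Values at which every TD(0) error on a deterministic chain is zero."""
    V = np.zeros(len(rewards))
    # Sweep backward; the final state keeps value 0, matching td_zero's loop bounds.
    for t in range(len(rewards) - 2, -1, -1):
        V[t] = rewards[t] + gamma * V[t + 1]
    return V

rewards = np.array([0.0, 0.0, 0.0, 1.0, 0.0])  # reward delivered on leaving state 3
target = td_fixed_point(rewards)
print(target)  # [0.857375 0.9025 0.95 1. 0.]
```

With enough episodes, the `V` returned by `td_zero(range(5), rewards)` should approach `target`, and the logged RPEs should shrink toward zero.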

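Pattern 2: CS-US shift (Pavlovian)

def conditioning(trials=200, cs_step=10, us_step=20, alpha=0.2, gamma=0.9):
    """Check whether the RPE shifts from the US time to the CS time.

    Assumption: the CS arrives unpredictably, so pre-CS states keep value 0
    and are never updated. Without this, the CS-time RPE also decays to 0.
    """
    V = np.zeros(30)
    history = []
    for trial in range(trials):
        r = np.zeros(30); r[us_step] = 1.0
        # RPE at CS onset: transition from a zero-valued pre-CS state into V[cs_step].
        history.append((trial, cs_step, gamma * V[cs_step]))
        for t in range(cs_step, 29):
            rpe = r[t] + gamma * V[t+1] - V[t]
            V[t] += alpha * rpe
            if t == us_step:
                history.append((trial, t, rpe))
    return history  # late trials: cs_step RPE > 0, us_step RPE ≈ 0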

Pattern 3: D1/D2 asymmetric update

def d1d2_update(weights, rpe, lr_d1=0.1, lr_d2=0.1):
    """positive RPE → D1 facilitation, negative → D2 facilitation."""
    if rpe > 0:
        weights["go"] += lr_d1 * rpe
    else:
        weights["nogo"] += lr_d2 * (-rpe)
    return weights

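Pattern 4: Tonic-phasic interaction

def vigor_modulated_action(action_values, tonic_da, beta=2.0):
    """Niv 2007: tonic DA → response rate."""
    # Tonic DA scales the softmax inverse temperature; asarray also accepts plain lists.
    logits = beta * tonic_da * np.asarray(action_values, dtype=float)
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()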
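
The Niv 2007 account cited above treats tonic DA as a report of the average reward rate, which acts as an opportunity cost of time. The following is a simplified sketch of that trade-off, not the paper's full model: it assumes the cost of responding with latency tau is `vigor_cost / tau + avg_reward_rate * tau`, which is minimized at `sqrt(vigor_cost / avg_reward_rate)`. The helper name and parameter values are hypothetical.

```python
import math

def optimal_latency(vigor_cost, avg_reward_rate):
    """Latency minimizing vigor_cost / tau + avg_reward_rate * tau (illustrative)."""
    if vigor_cost <= 0 or avg_reward_rate <= 0:
        raise ValueError("vigor_cost and avg_reward_rate must be positive")
    return math.sqrt(vigor_cost / avg_reward_rate)

# Low average reward rate (low tonic DA in this account) -> slow responding.
slow = optimal_latency(vigor_cost=1.0, avg_reward_rate=0.25)

# High average reward rate (high tonic DA) -> fast responding.
fast = optimal_latency(vigor_cost=1.0, avg_reward_rate=4.0)

print(slow, fast)  # 2.0 0.5
```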

Pattern 5: Distributional RPE (Dabney 2020 finding)

import torch

class DistributionalDA(torch.nn.Module):
    """Population of DA neurons with varying degrees of optimism."""
    def __init__(self, n_neurons=20):
        super().__init__()
        # Per-neuron asymmetry: a higher tau weights positive errors more (optimistic).
        # Registered as a buffer so it moves with the module across devices.
        self.register_buffer("taus", torch.linspace(0.1, 0.9, n_neurons))
        self.values = torch.nn.Parameter(torch.zeros(n_neurons))

    def update(self, reward, lr=0.1):
        # asymmetric update per neuron, applied in place without tracking gradients
        with torch.no_grad():
            delta = reward - self.values
            self.values += lr * torch.where(
                delta > 0, self.taus * delta, (1 - self.taus) * delta
            )
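
To see what the asymmetric rule does, here is a NumPy re-expression of the same update, run on a reward that is 0 or 1 with equal odds. This is only a sketch: the function name `simulate_population`, the reward distribution, the seed, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_population(taus, rewards, lr=0.01):
    """Apply the per-neuron asymmetric update to a shared reward stream."""
    values = np.zeros_like(taus)
    for r in rewards:
        delta = r - values
        # Positive errors are scaled by tau, negative errors by (1 - tau).
        scale = np.where(delta > 0, taus, 1.0 - taus)
        values += lr * scale * delta
    return values

rng = np.random.default_rng(0)
rewards = rng.integers(0, 2, size=20_000).astype(float)  # 0 or 1, equal odds
taus = np.linspace(0.1, 0.9, 5)
values = simulate_population(taus, rewards)
print(np.round(values, 2))
```

Pessimistic units (low tau) should settle below the mean reward of 0.5 and optimistic units (high tau) above it, so the population as a whole carries information about the spread of the reward distribution rather than only its mean.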

Decision criteria

| Situation | Approach |
|---|---|
| Behavioral RL model | TD(λ) / Q-learning, scalar RPE |
| Model of asymmetric learning bias | D1/D2 dual-pathway |
| Model of vigor / response rate | tonic DA + beta scaling |
| Risk-sensitive / distributional | distributional RL (Dabney 2020) |
| Clinical Parkinson's | SNc loss + L-DOPA pharmacology |
| Addiction model | hijacking of mesolimbic phasic signaling + tolerance |

Default: TD(0) with a scalar RPE as the baseline. Add the D1/D2 pathways when asymmetric bias is central to the task.
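
The table lists TD(λ) as an option for behavioral models. A minimal sketch of tabular TD(λ) with accumulating eligibility traces, assuming a deterministic chain of states like the one used by the TD(0) pattern; the function name `td_lambda` and all default parameters are illustrative choices:

```python
import numpy as np

def td_lambda(rewards, alpha=0.1, gamma=0.95, lam=0.8, episodes=500):
    """Tabular TD(lambda) on a deterministic chain (illustrative sketch)."""
    n = len(rewards)
    V = np.zeros(n)
    for _ in range(episodes):
        trace = np.zeros(n)  # eligibility traces, reset at the start of each episode
        for t in range(n - 1):
            rpe = rewards[t] + gamma * V[t + 1] - V[t]
            trace *= gamma * lam  # decay credit for earlier states
            trace[t] += 1.0       # mark the current state as eligible
            V += alpha * rpe * trace  # one RPE updates all recently visited states
    return V

rewards = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
V = td_lambda(rewards)
print(np.round(V, 3))
```

Setting `lam=0.0` reduces the update to TD(0); larger values let a single RPE reach further back along the chain.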

🔗 Graph

  • Parent: Reinforcement Learning
  • Applications: Addiction · Reward Prediction Error
  • Adjacent: TD-Learning · Actor-Critic · Distributional RL

🤖 LLM usage

When to use: literature review (RPE, distributional DA), generating modeling hypotheses, explaining the bridge between neurobiology and RL.
When not to use: clinical diagnosis or prescription, which belong to a neurologist.

Anti-patterns

  • Scalar oversimplification: assuming a single RPE channel ignores the distributional, multi-receptor reality.
  • "DA = pleasure" misconception: DA signals prediction error and motivation; hedonic experience is mediated by the opioid system.
  • Human ↔ rodent extrapolation: ignoring species differences in microcircuits and receptor expression.

🧪 Verification / Duplicates

  • Verified (Schultz 1997 Science, Niv 2007, Dabney 2020 Nature, modern review Berke 2018).
  • Trust level: A.

🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: substantive content + distributional DA finding |