---
id: P-REINFORCE-AI-BELLMAN
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [Bellman Equation, Reinforcement Learning, Dynamic Programming, MDP]
last_reinforced: 2026-04-20
---
# [[Bellman-Equation|Bellman-Equation]] (Bellman Equation)

## 📌 One-Line Insight (The Karpathy Summary)
> "Today's choice carries tomorrow's value." The Bellman equation defines the value of the current state as the sum of the reward received now and the discounted expected value of the next state. It is the mathematical cornerstone of reinforcement learning and dynamic programming.

## 📖 Structured Knowledge (Synthesized Content)
- **Recursive Structure**:
  - By splitting the long sum of future rewards into a relation between the current step and the very next step, it decomposes a huge sequential decision problem into computable units.
- **State-Value Function (V)**:
  - Quantifies how good it is, in the long run, to be in a particular state.
- **Action-Value Function (Q)**:
  - Quantifies how good it is to take a particular action in a particular state. This is the core of Q-Learning.

## ⚠️ Contradictions and Updates (RL Update)
- Solving the Bellman equation exactly, for example with dynamic programming, assumes that the environment's transition dynamics and rewards are fully known. When the environment is opaque, as in the real world, methods that approximate the value function, such as Deep Q-Network (DQN), are used instead.

## 🔗 Knowledge Links (Graph)
- Related: [[DQN|DQN]], [[Reinforcement-Learning|Reinforcement-Learning]]
- Foundation: Computational Theory & Math/Information Theory
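
## 💻 Worked Sketch (Value Iteration)
The recursive structure, V, and Q described in this note can be made concrete with a small example. For a known model, the Bellman optimality equation reads $V^*(s) = \max_a \big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \big]$. The code below is a minimal sketch, not a reference implementation: the 3-state, 2-action MDP, its transition table `P`, reward table `R`, the discount `GAMMA`, and the helper names `q_from_v`, `value_iteration`, and `q_learning_update` are all illustrative assumptions that do not come from the note itself.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP (all numbers are illustrative assumptions).
# P[a, s, s2] = probability of moving from state s to state s2 under action a.
# R[s, a]     = expected immediate reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1, 0.0],   # action 0: mostly stay put
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]],
    [[0.1, 0.9, 0.0],   # action 1: mostly move forward
     [0.0, 0.1, 0.9],
     [0.0, 0.0, 1.0]],
])
R = np.array([
    [0.0, -1.0],
    [0.0, 5.0],
    [0.0, 0.0],          # state 2 is absorbing with zero reward
])
GAMMA = 0.9              # discount factor (assumed)


def q_from_v(V, P, R, gamma):
    """One-step lookahead: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')."""
    # P @ V has shape (actions, states); transpose to (states, actions) to match R.
    return R + gamma * (P @ V).T


def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Repeat the Bellman optimality backup until V stops changing."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        V_new = q_from_v(V, P, R, gamma).max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, q_from_v(V, P, R, gamma)


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=GAMMA):
    """Sample-based backup for the case where P and R are unknown (tabular Q-learning)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


V_star, Q_star = value_iteration(P, R, GAMMA)
greedy_policy = Q_star.argmax(axis=1)   # best action per state under Q*
```

`value_iteration` needs the full model (`P` and `R`), which is the "perfectly known environment" case. `q_learning_update` uses only one observed transition `(s, a, r, s_next)`, which is the starting point that DQN extends by replacing the table `Q` with a neural network.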