---
id: P-REINFORCE-AI-BELLMAN
category: "10_Wiki/💡 Topics/AI"
confidence_score: 0.99
tags: [Bellman Equation, Reinforcement Learning, Math, Dynamic Programming]
last_reinforced: 2026-04-20
---

# [[Bellman-Equation|Bellman-Equation]]

## 📌 One-Line Insight (The Karpathy Summary)

> "Today's reward (step reward) + tomorrow's value (future value) = today's value." It is the elegance of recursion, tying together value that is scattered across time.

## 📖 Structured Knowledge (Synthesized Content)

- **Recursive Utility**:
  - The value of the current state is defined as the sum of the immediate reward and the discounted expected value of the next state. This lets a complex sequence of future decisions be broken down into small, present-step decisions.
- **Dynamic Programming**:
  - The Bellman equation is the foundation for solving a large problem by splitting it into smaller subproblems. The same recursive value backup is a core idea behind game-playing AI for Go (AlphaGo) and chess.
- **Discount Factor (Gamma)**:
  - The variable that sets how heavily future value is discounted (weighted) when converted to the present. The closer it is to 1, the further ahead the agent looks; the closer it is to 0, the more it focuses on immediate gain.

## ⚠️ Contradictions and Updates (RL Update)

- In real-world (model-free) settings, the transition dynamics are unknown, so the value of the next state cannot be computed exactly. This led to methods such as Q-Learning and Deep Q-Networks (DQN), which use the Bellman equation as an update target and estimate values from experience.

## 🔗 Knowledge Links (Graph)

- Related: Reinforcement Learning, Deep-Reinforcement-Learning
- Foundation: Computational Theory & Math/Information Theory
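
To make the Recursive Utility, Discount Factor, and Q-Learning notes above concrete, here is a minimal sketch in Python. It assumes the standard deterministic form of the Bellman optimality backup, $V(s) = \max_a \left[ r(s, a) + \gamma V(s') \right]$, on a hypothetical three-state chain. The MDP, the names `TRANSITIONS`, `value_iteration`, and `q_learning_update`, and the parameter values are all illustrative assumptions, not something taken from this note.

```python
GAMMA = 0.9  # discount factor (assumed value for illustration)

# Hypothetical 3-state chain: TRANSITIONS[state][action] = (next_state, reward).
# State 2 is terminal, so it has no actions.
TRANSITIONS = {
    0: {"stay": (0, 0.0), "right": (1, 0.0)},
    1: {"stay": (1, 0.0), "right": (2, 1.0)},
    2: {},
}


def value_iteration(transitions, gamma, tol=1e-9):
    """Dynamic programming: repeat the Bellman backup until values stop changing."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:  # terminal state keeps value 0
                continue
            # Bellman backup: immediate reward + discounted value of the next state.
            best = max(r + gamma * values[nxt] for nxt, r in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def q_learning_update(q, state, action, reward, next_state, alpha, gamma):
    """Model-free variant: nudge Q(s, a) toward a sampled Bellman target."""
    next_best = max(q[next_state].values(), default=0.0)  # 0 for terminal states
    target = reward + gamma * next_best
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


values = value_iteration(TRANSITIONS, GAMMA)
print(values)  # {0: 0.9, 1: 1.0, 2: 0.0}

# One sampled transition (state 1, "right", reward 1.0, lands in terminal state 2).
q_table = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
updated = q_learning_update(q_table, 1, "right", 1.0, 2, alpha=0.5, gamma=GAMMA)
print(updated)  # 0.5
```

`value_iteration` needs the full transition table, which is the dynamic programming setting. `q_learning_update` only sees one observed transition at a time, which is why it fits the model-free case. State 0 ends up worth 0.9 because the reward of 1.0 sits one step further away and is discounted once by gamma.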