---
id: Q-LEARN-001
category: Unified
confidence_score: 1.0
tags: [Reinforcement-Learning, ai, q-learning, Bellman-Equation, Optimization]
last_reinforced: 2026-04-26
---

# Q-Learning Foundations

## 📌 One-Line Insight (The Karpathy Summary)
> "Let the agent discover for itself which action is most valuable in which state." Q-learning is a reinforcement learning algorithm that finds the optimal policy by interacting with the environment and iteratively updating the expected cumulative reward (Q-value) of each state-action pair.

## 📖 Synthesized Content
- **Extracted pattern:** A value iteration (Value [[Iteration|Iteration]]) pattern that links the current reward to the expected future reward through the Bellman equation ([[Bellman Equation|Bellman Equation]]), so that the quality of the agent's decisions improves over time.
- **Details:**
  - **Q-Table:** A table that stores a value for every combination of state ([[State|State]]) and action (Action).
  - **Temporal Difference (TD):** Corrects the Q-value estimate using the difference between the currently predicted Q-value and the observed reward plus the discounted predicted value of the next state.
  - **[[Exploration vs Exploitation|Exploration vs Exploitation]]:** Decides whether to explore new paths through random actions ($\epsilon$-greedy and similar schemes) or to take the best action already known.
  - **Discount Factor ($\gamma$):** A constant that determines how much weight future rewards carry at the present time step.

## ⚠️ Contradictions & RL Update
- **Conflict with past data:** Once maintaining a Q-Table became infeasible for very large state spaces, the approach evolved into DQN (Deep Q-Network), which approximates Q-values with a neural network.
- **Policy change:** The AI for regular enemy units in the Skybound project uses lightweight Q-Learning-based logic to adjust its dodge probability to the player's attack patterns.
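
The Q-Table, TD correction, $\epsilon$-greedy choice, and discount factor $\gamma$ described above combine into the standard tabular update rule, where the learning rate $\alpha$ is a symbol introduced here and not named in the note:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

A minimal sketch of that loop follows. The five-cell corridor environment, the function names (`epsilon_greedy`, `td_update`, `step`, `train`), and every hyperparameter value are hypothetical choices made for illustration. It is not the Skybound implementation.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so an all-zero row does not always pick action 0.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def td_update(q_table, s, a, r, s_next, done, alpha, gamma):
    """Move Q(s, a) toward the TD target r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * max(q_table[s_next])
    td_error = target - q_table[s][a]
    q_table[s][a] += alpha * td_error
    return td_error


def step(state, action, n_states):
    """Hypothetical corridor: action 0 moves left, 1 moves right.

    Reaching the right-most cell ends the episode with reward 1.
    """
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def train(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-Table: one row per state, one column per action, initialised to zero.
    q_table = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(q_table[s], epsilon, rng)
            s_next, r, done = step(s, a, n_states)
            td_update(q_table, s, a, r, s_next, done, alpha, gamma)
            s = s_next
    return q_table


q = train()
# Greedy action per non-terminal state, read straight from the learned table.
greedy_policy = [row.index(max(row)) for row in q[:-1]]
print(greedy_policy)
```

In this toy setting the greedy policy should choose action 1 (move right) in every non-terminal state, because the only reward sits at the right end and $\gamma < 1$ makes shorter paths more valuable. Replacing the list-of-lists table with a neural network that maps a state to its action values is the step that leads toward DQN.
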
## 🔗 Knowledge Links (Graph)
- [[Reinforcement-Learning|Reinforcement-Learning]], [[Temporal-Difference-Learning|Temporal-Difference-Learning]], Deep-Q-Networks, [[Bellman-Equation|Bellman-Equation]]
- **Raw Source:** 10_Wiki/Topics/AI/Q-Learning Foundations.md