--- category: Unified tags: [auto-consolidated, technical-documentation] title: [[Markov Decision Processes|Markov Decision Processes]] last_updated: 2026-05-02 --- # [[Markov Decision Processes|Markov Decision Processes]] ## πŸ“Œ Brief Summary > 지식 μš”μ•½ 정보 μΆ”μΆœ 쀑... --- > "μ˜μ‚¬κ²°μ •μ˜ μˆ˜ν•™μ  지도: λΆˆν™•μ‹€ν•œ ν™˜κ²½ μ†μ—μ„œ λ‘œλ΄‡μ΄λ‚˜ μ—μ΄μ „νŠΈκ°€ μ–΄λ–€ '행동'을 ν•΄μ•Ό κ°€μž₯ 큰 '보상'을 얻을 수 μžˆλŠ”μ§€, μƒνƒœ-행동-보상-μ „μ΄μ˜ μ‚¬μŠ¬λ‘œ μ •μ˜ν•˜μ—¬ 인곡지λŠ₯이 슀슀둜 μ „λž΅μ„ 짜게 λ§Œλ“œλŠ” κ°•ν™” ν•™μŠ΅μ˜ 청사진." ## πŸ“– Core Content λ³Έλ¬Έ ꡬ쑰화 μž‘μ—… 쀑... --- 마λ₯΄μ½”ν”„ κ²°μ • κ³Όμ •(MDP)은 μ˜μ‚¬κ²°μ • 문제λ₯Ό ν™•λ₯ λ‘ μ  μ΅œμš°μ„ μœΌλ‘œ λͺ¨λΈλ§ν•˜λŠ” μˆ˜ν•™μ  ν”„λ ˆμž„μ›Œν¬μž…λ‹ˆλ‹€. 1. **5λŒ€ μš”μ†Œ (S, A, P, R, $\gamma$)**: * **[[State|State]] (S)**: ν˜„μž¬ 상황. * **Action (A)**: ν•  수 μžˆλŠ” 행동. * **Transition Probability (P)**: 행동 ν›„ λ‹€μŒ μƒνƒœλ‘œ 갈 ν™•λ₯ . * **Reward (R)**: ν–‰λ™μ˜ 결과둜 λ°›λŠ” 보상. * **Discount Factor ($\gamma$)**: 미래의 보상을 ν˜„μž¬ κ°€μΉ˜λ‘œ μ–Όλ§ˆλ‚˜ 쳐쀄 것인가. 2. **μ™œ μ€‘μš”ν•œκ°€?**: * 인곡지λŠ₯이 λ‹¨μˆœνžˆ 데이터λ₯Ό μ™Έμš°λŠ” 게 μ•„λ‹ˆλΌ, λ³΅μž‘ν•œ ν™˜κ²½κ³Ό μƒν˜Έμž‘μš©ν•˜λ©° '졜적의 μ •μ±…(Policy)'을 μ°Ύμ•„κ°€λŠ” λͺ¨λ“  κ°•ν™” ν•™μŠ΅ μ•Œκ³ λ¦¬μ¦˜μ˜ ν‘œμ€€ 이둠이기 λ•Œλ¬Έμž„. ([[Reinforcement Learning (RL)|Reinforcement Learning (RL)]]와 μ—°κ²°) ## βš–οΈ Trade-offs & Caveats - **κ³Όκ±° λ°μ΄ν„°μ™€μ˜ 좩돌:** μžλ™ν™” 엔진에 μ˜ν•΄ λ§€ν•‘λœ μ§€μ‹μœΌλ‘œ, μΆ”ν›„ μ •λ°€ 검증 ν•„μš”. - **μ •μ±… λ³€ν™”:** Graphics & Performance λΆ„μ•Όμ˜ μžλ™ μžμ‚°ν™” μˆ˜ν–‰. --- - **κ³Όκ±° λ°μ΄ν„°μ™€μ˜ 좩돌**: κ³Όκ±°μ—λŠ” ν™˜κ²½μ˜ λͺ¨λ“  정보λ₯Ό μ•„λŠ” μ •μ±…(Full Observability)을 μ „μ œν–ˆμœΌλ‚˜, ν˜„λŒ€ 정책은 ν™˜κ²½μ˜ μΌλΆ€λ§Œ λ³΄μ΄λŠ” 상황([[POMDP|POMDP]]) μ •μ±…μ—μ„œλ„ 졜적의 수λ₯Ό μ°Ύμ•„λ‚΄λŠ” 볡합 μΆ”λ‘  μ •μ±…μœΌλ‘œ 진화함(RL Update). - **μ •μ±… λ³€ν™”(RL Update)**: λ°”λ‘‘(μ•ŒνŒŒκ³ )μ΄λ‚˜ κ²Œμž„μ„ λ„˜μ–΄, μžμœ¨μ£Όν–‰μ΄λ‚˜ 도심 항곡 λͺ¨λΉŒλ¦¬ν‹°(UAM)의 경둜 μ •μ±… 수립 λ“± μ‹€μƒν™œμ˜ κ±°λŒ€ν•˜κ³  λ³΅μž‘ν•œ μ‹œμŠ€ν…œ μ΅œμ ν™” μ •μ±…μ˜ ν•΅μ‹¬μœΌλ‘œ μž‘λ™ μ€‘μž„. ## πŸ”— Knowledge Connections - Raw Source: 00_Raw/2026-04-20/Markov Decision Processes.md --- --- - [[Reinforcement Learning (RL)|Reinforcement Learning (RL)]], [[Markov-Chains|Markov-Chains]], [[Optimization|Optimization]], [[Decision Theory|Decision Theory]], [[Logic|Logic]] - **Modern Tech/Tools**: [[Bellman Equation|Bellman Equation]], Q-Learning, PPO, Deep Reinforcement Learning. ---