---
category: Unified
tags: [auto-consolidated, technical-documentation]
title: [[Deep Q-Networks (DQN)|Deep Q-Networks (DQN)]]
last_updated: 2026-05-02
---

# [[Deep Q-Networks (DQN)|Deep Q-Networks (DQN)]]

## 📌 Brief Summary

> "The first-ever combination of deep learning and reinforcement learning, and the one that conquered classic game consoles."

A landmark paper that introduced a deep neural network into classical Q-Learning, which estimates action values, and reached human-level or better game performance from pixel information alone.

---

> "Replace reinforcement learning's decision table with a large neural network and take on unbounded complexity."

A model that used deep learning to overcome the limits of tabular Q-Learning, mastered Atari games at human level, and opened the era of deep reinforcement learning (Deep RL).

## 📖 Core Content

- **Key [[Innovation|Innovation]]s**:
  - **Deep Neural Network as Q-Function**: Uses a CNN that takes a complex, high-dimensional state (e.g., screen pixels) as input and computes the value of each action.
  - **Experience Replay**: Stores experienced transitions in memory and samples them at random for training, which breaks the correlation between samples and stabilizes learning.
  - **Target Network**: Separates the network that computes value predictions from the one that computes target values, which keeps the targets from oscillating during training.
- **Legacy**: Opened the modern deep reinforcement learning (Deep RL) era by mastering Atari games.

---

- **Extracted pattern:** Approximate with a neural network the function that takes a state ([[State|State]]) and predicts the value (Q-value) of each action, and stabilize training with experience replay and a target network.
- **Core techniques:**
  - **Experience Replay:** Stores the agent's experience ($s, a, r, s'$) in memory and samples it at random for training, which breaks the correlation between samples and improves learning efficiency.
  - **Target Network:** Keeps a separate network for computing target values so that the targets do not oscillate during training.
  - **Deep Neural Network as Function Approximator:** Lets the agent process high-dimensional input (e.g., game-screen pixels) directly.
- **Significance:** Demonstrated that an agent can learn a strategy on its own from visual input alone, without a human teaching it the rules.

## ⚖️ Trade-offs & Caveats

- Because DQN is value-based, it is hard to apply to problems with a continuous action space, since picking the greedy action requires maximizing Q over every action. It also tends to overestimate values, which led to successors such as Double DQN and Dueling DQN.

---

- **Conflict with past data:** Tabular reinforcement learning was infeasible because the table size explodes as soon as the state becomes slightly complex; DQN brought such problems into the range of practical computation.
- **Policy change:** When learning complex enemy AI behavior patterns in the Skybound project, use the DQN architecture as the base model and apply improved techniques such as Double DQN or Dueling DQN.

## 🔗 Knowledge Connections

- Related: [[Reinforcement Learning (RL)|Reinforcement Learning (RL)]], [[Bellman-Equation|Bellman-Equation]]
- Contrast: Policy Gradient Methods

---

- Q-Learning-Foundations, [[Reinforcement-Learning|Reinforcement-Learning]], Deep-Learning-Foundations, [[Experience-Replay|Experience-Replay]]
- **Raw Source:** 10_Wiki/Topics/AI/Deep-Q-Networks-DQN.md
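
The experience replay and target network techniques listed under Core Content can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `LinearQ` is a hypothetical linear stand-in for the CNN, the `done` flag is added on top of the ($s, a, r, s'$) tuple, and `gamma`, `lr`, and all class and function names are chosen here for illustration only.

```python
import random
from collections import deque
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: Sequence[float]
    action: int
    reward: float
    next_state: Sequence[float]
    done: bool  # assumption: terminal flag added on top of (s, a, r, s')


class ReplayBuffer:
    """Fixed-size memory; the oldest transition is evicted once capacity is reached."""

    def __init__(self, capacity: int) -> None:
        self.memory = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement decorrelates consecutive steps.
        return random.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


class LinearQ:
    """Hypothetical stand-in for the CNN: one weight row per action."""

    def __init__(self, n_features: int, n_actions: int) -> None:
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def __call__(self, state: Sequence[float]) -> List[float]:
        # Returns one Q-value per action for the given state.
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

    def copy_from(self, other: "LinearQ") -> None:
        # Hard sync: copy the weights so later online updates do not leak in.
        self.w = [row[:] for row in other.w]


def td_targets(batch, target_q, gamma: float = 0.99) -> List[float]:
    """y = r + gamma * max_a' Q_target(s', a'); y = r when s' is terminal."""
    return [
        t.reward + (0.0 if t.done else gamma * max(target_q(t.next_state)))
        for t in batch
    ]


def dqn_update(online: LinearQ, target: LinearQ, batch, gamma=0.99, lr=0.01) -> None:
    """One semi-gradient step on the squared TD error for each sampled transition."""
    targets = td_targets(batch, target, gamma)  # targets come from the frozen copy
    for t, y in zip(batch, targets):
        td_error = y - online(t.state)[t.action]
        row = online.w[t.action]
        for i, s_i in enumerate(t.state):
            row[i] += lr * td_error * s_i
```

In a training loop built on this sketch, the agent would `push` each new transition, call `dqn_update(online, target, buffer.sample(batch_size))` once the buffer holds enough data, and call `target.copy_from(online)` every fixed number of updates. The sync interval is a hyperparameter that trades target stability against how quickly the targets track the online network.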