--- id: DQN-001 category: "10_Wiki/πŸ’‘ Topics/AI" confidence_score: 1.0 tags: [reinforcement-learning, ai, dqn, q-learning, deep-learning] last_reinforced: 2026-04-26 --- # [[Deep Q-Networks (DQN)]] ## πŸ“Œ ν•œ 쀄 톡찰 (The Karpathy Summary) > "κ°•ν™”ν•™μŠ΅μ˜ μ˜μ‚¬κ²°μ • ν…Œμ΄λΈ”μ„ κ±°λŒ€ μ‹ κ²½λ§μœΌλ‘œ λŒ€μ²΄ν•˜μ—¬ λ¬΄ν•œν•œ λ³΅μž‘μ„±μ— λ„μ „ν•˜λΌ" β€” 고전적 Q-Learning의 ν…Œμ΄λΈ” 방식 ν•œκ³„λ₯Ό λ”₯λŸ¬λ‹μœΌλ‘œ κ·Ήλ³΅ν•˜μ—¬, 아타리 κ²Œμž„μ„ 인간 μˆ˜μ€€μœΌλ‘œ μ •λ³΅ν•˜λ©° 심측 κ°•ν™”ν•™μŠ΅(Deep RL)의 μ‹œλŒ€λ₯Ό μ—° λͺ¨λΈ. ## πŸ“– κ΅¬μ‘°ν™”λœ 지식 (Synthesized Content) - **μΆ”μΆœλœ νŒ¨ν„΄:** μƒνƒœ(State)λ₯Ό μž…λ ₯λ°›μ•„ 각 행동(Action)의 κ°€μΉ˜(Q-value)λ₯Ό μ˜ˆμΈ‘ν•˜λŠ” ν•¨μˆ˜λ₯Ό μ‹ κ²½λ§μœΌλ‘œ κ·Όμ‚¬ν•˜κ³ , κ²½ν—˜ μž¬ν”Œλ ˆμ΄μ™€ νƒ€κ²Ÿ λ„€νŠΈμ›Œν¬λ₯Ό 톡해 ν•™μŠ΅μ„ μ•ˆμ •ν™”ν•˜λŠ” νŒ¨ν„΄. - **핡심 기술:** - **Experience Replay:** μ—μ΄μ „νŠΈμ˜ κ²½ν—˜($s, a, r, s'$)을 λ©”λͺ¨λ¦¬μ— μ €μž₯ν•˜κ³  λ¬΄μž‘μœ„λ‘œ μΆ”μΆœν•˜μ—¬ ν•™μŠ΅ν•¨μœΌλ‘œμ¨ 데이터 κ°„ 상관관계λ₯Ό 끊고 ν•™μŠ΅ 효율 μ¦λŒ€. - **Target Network:** κ°€μΉ˜ κ³„μ‚°μš© λ„€νŠΈμ›Œν¬λ₯Ό λ³„λ„λ‘œ λΆ„λ¦¬ν•˜μ—¬ ν•™μŠ΅ 쀑 λͺ©ν‘œκ°’이 μš”λ™μΉ˜λŠ” ν˜„μƒ λ°©μ§€. - **Deep Neural Network as Function Approximator:** κ³ μ°¨μ›μ˜ μž…λ ₯(예: κ²Œμž„ ν™”λ©΄ ν”½μ…€)을 직접 처리 κ°€λŠ₯ν•˜κ²Œ 함. - **의의:** μ‚¬λžŒμ΄ κ·œμΉ™μ„ κ°€λ₯΄μ³μ£Όμ§€ μ•Šμ•„λ„ μ‹œκ° μ •λ³΄λ§ŒμœΌλ‘œ 슀슀둜 μ „λž΅μ„ ν•™μŠ΅ν•  수 μžˆμŒμ„ 증λͺ…. ## ⚠️ λͺ¨μˆœ 및 μ—…λ°μ΄νŠΈ (Contradictions & RL Update) - **κ³Όκ±° λ°μ΄ν„°μ™€μ˜ 좩돌:** μƒνƒœκ°€ 쑰금만 λ³΅μž‘ν•΄μ Έλ„ ν…Œμ΄λΈ” 크기가 ν­λ°œν•˜μ—¬ λΆˆκ°€λŠ₯ν–ˆλ˜ κ°•ν™”ν•™μŠ΅μ„ ν˜„μ‹€μ μΈ μ—°μ‚° μ˜μ—­μœΌλ‘œ κ°€μ Έμ˜΄. - **μ •μ±… λ³€ν™”:** Skybound ν”„λ‘œμ νŠΈμ˜ λ³΅μž‘ν•œ 적 AI 행동 νŒ¨ν„΄ ν•™μŠ΅ μ‹œ DQN μ•„ν‚€ν…μ²˜λ₯Ό κΈ°λ³Έ λͺ¨λΈλ‘œ μ‚¬μš©ν•˜λ©°, Double DQNμ΄λ‚˜ Dueling DQN λ“± κ°œμ„ λœ 기법을 μ μš©ν•¨. ## πŸ”— 지식 μ—°κ²° (Graph) - Q-Learning-Foundations, [[Reinforcement-Learning]], Deep-Learning-Foundations, [[Experience-Replay]] - **Raw Source:** 10_Wiki/Topics/AI/Deep-Q-Networks-DQN.md