---
id: RL-REPLAY-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [reinforcement-learning, ai, experience-replay, dqn, stable-learning]
last_reinforced: 2026-04-26
---

# Experience Replay

## 📌 One-Line Insight (The Karpathy Summary)
> "Do not let past experience fade into oblivion; draw it back out at random to strengthen the intelligence you have now."
>
> A technique in which the experience tuples ($s, a, r, s'$) an agent collects while interacting with its environment are stored in a buffer and sampled at random during training, which breaks the correlation between training samples and improves data efficiency.

## 📖 Structured Knowledge (Synthesized Content)
- **Extracted pattern:** A training-stabilization pattern. Random sampling destroys the strong temporal correlation between data points that arrive in real time, which keeps the model from becoming biased toward particular situations or from diverging.
- **Main effects:**
  - **Reduced Correlation:** Removes the learning inefficiency caused by consecutive samples being nearly identical.
  - **Data Efficiency:** Each experience can be reused across many updates, so more value is extracted from every interaction.
  - **Stability:** Lowers the variance of the updates, helping the neural network converge more stably.
- **Advanced techniques:**
  - **Prioritized Experience Replay (PER):** Samples more often the important experiences that are likely to help learning the most (those with a large error).

## ⚠️ Contradictions and Updates (Contradictions & RL Update)
- **Conflict with prior assumptions:** Breaks with the preconception that processing data immediately is always best, and shows that how data is accumulated and rearranged determines the quality of neural network training.
- **Policy change:** The enemy aircraft AI in the Skybound project stores its history of engagements with players in a Replay Buffer and uses it to build a robust policy that responds generally to the tactics of many different players.
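
The store-then-sample mechanism summarized in this note can be illustrated with a minimal sketch. It assumes a fixed-capacity buffer that evicts the oldest transition when full; the names `Transition`, `ReplayBuffer`, and `prioritized_sample`, and the defaults `alpha=0.6` and `eps=1e-6`, are hypothetical choices for illustration and do not come from the note or from any specific library. The prioritized variant only shows proportional sampling by error magnitude; a full PER implementation would also correct the resulting sampling bias with importance-sampling weights, which is omitted here.

```python
import random
from collections import deque, namedtuple

# Hypothetical record type for one (s, a, r, s') experience.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-capacity buffer with uniform random sampling (illustrative sketch)."""

    def __init__(self, capacity, seed=None):
        # deque(maxlen=...) drops the oldest transition once capacity is reached.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.memory.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement over the whole buffer, so a
        # batch is not made up of consecutive, highly correlated steps.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def prioritized_sample(transitions, td_errors, batch_size,
                       alpha=0.6, eps=1e-6, rng=random):
    """Sample with probability proportional to (|td_error| + eps) ** alpha."""
    weights = [(abs(err) + eps) ** alpha for err in td_errors]
    # random.choices draws with replacement according to the given weights.
    return rng.choices(transitions, weights=weights, k=batch_size)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000, seed=0)
    for t in range(100):
        buffer.push(state=t, action=0, reward=1.0, next_state=t + 1)
    batch = buffer.sample(32)
    print(len(batch), "transitions sampled out of", len(buffer))
```

Because each stored transition can be drawn in many different batches, the same sketch also shows where the data-efficiency benefit comes from: one interaction with the environment feeds several updates.
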
## 🔗 Knowledge Links (Graph)
- [[Deep-Q-Networks-DQN]], [[Reinforcement-Learning]], Q-Learning-Foundations, Neural-Networks-Foundations
- **Raw Source:** 10_Wiki/Topics/AI/Experience-Replay.md