---
id: RL-PER-001
category: "10_Wiki/💡 Topics/AI"
confidence_score: 1.0
tags: [ai, reinforcement-learning, prioritized-experience-replay, per, dqn, learning-efficiency]
last_reinforced: 2026-04-26
---

# Prioritized Experience Replay (PER)

## 📌 One-Line Insight (The Karpathy Summary)

> "Don't remember the whole past evenhandedly. Revisit the unexpected, 'shocking' experiences (large TD error) more often to accelerate learning."

A technique that weights the samples in a reinforcement learning agent's experience store (replay buffer) by how much can be learned from them, so that the important ones are sampled preferentially.

## 📖 Structured Knowledge (Synthesized Content)

- **Extracted pattern:** "Learning from Surprise and Weighted Importance Sampling". The larger the gap between the value the current model predicted and the actual outcome (the TD error), the more there is still to learn from that sample, so it is replayed more often during training, which can substantially speed up convergence.
- **Core mechanism:**
  - **Priority ($p_i$):** Set in proportion to the magnitude of the TD error.
  - **Sampling Probability:** A probability distribution over stored transitions is built from the priorities.
  - **Importance Sampling Weights:** Correct mathematically for the bias that prioritized sampling introduces into the data distribution, keeping training stable.
- **Significance:** Lets an agent master complex tasks quickly with far less experience data than uniform (random) sampling, and plays a decisive role in sparse-reward environments.

## ⚠️ Contradictions & Updates (Contradictions & RL Update)

- **Conflict with earlier data:** The field has moved away from the uniform random replay of early DQN. Intelligent data selection, which assesses the "qualitative value" of each sample and reflects it in training, has become standard practice in modern reinforcement learning.
- **Policy change:** When building up an agent's ability to handle exceptional situations, the Antigravity project applies a PER strategy: among past failure cases, the points where the model's prediction error was largest are retrained first.

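The three parts of the core mechanism listed under Synthesized Content (priority, sampling probability, importance sampling weights) can be sketched in a few lines. This is a minimal illustration of the commonly used proportional variant, not the note's or any project's actual implementation. It assumes $P(i) = p_i^\alpha / \sum_k p_k^\alpha$ and $w_i = (N \cdot P(i))^{-\beta} / \max_j w_j$ with $p_i = |\delta_i| + \epsilon$. The class name `PrioritizedReplayBuffer`, its methods, and the hyperparameters `alpha`, `beta`, and `eps` are assumptions introduced here and do not appear in the note.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay with O(N) sampling (hypothetical sketch)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # assumed knob: 0 = uniform sampling, 1 = fully proportional
        self.beta = beta    # assumed knob: 0 = no bias correction, 1 = full correction
        self.eps = eps      # keeps zero-error transitions sampleable
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions take the current max priority so they are likely replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        # Sampling probability: P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, rng=random):
        probs = self.probabilities()
        n = len(self.data)
        indices = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance sampling weight: w_i = (N * P(i))^(-beta), scaled by the
        # largest possible weight in the buffer so that every w_i <= 1.
        max_w = (n * min(probs)) ** (-self.beta)
        weights = [((n * probs[i]) ** (-self.beta)) / max_w for i in indices]
        batch = [self.data[i] for i in indices]
        return indices, batch, weights

    def update_priorities(self, indices, td_errors):
        # Priority: p_i = |TD error| + eps, refreshed after each learning step.
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In a training loop, the returned `weights` would multiply each sample's loss before the gradient step, and `update_priorities` would be called with the freshly computed TD errors for the sampled `indices`. The list-based sampling here costs O(N) per draw; larger buffers typically use a sum-tree instead, and `beta` is often annealed toward 1 over the course of training.
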
## 🔗 Knowledge Links (Graph)

- [[Experience-Replay-Strategies]], [[Off-policy-vs-On-policy-Learning]], [[Deep-Q-Networks-DQN]], [[Reinforcement-Learning]]
- **Raw Source:** 10_Wiki/Topics/AI/Prioritized-Experience-Replay.md