---
id: P-Reinforce-AUTO-4BB54E
category: Unified
confidence_score: 0.98
tags: [AlphaGo, MCTS, Reinforcement Learning, Simulation, Robotics]
last_reinforced: 2026-04-20
github_commit: "[P-Reinforce] Substantial content added to AI Simulation Bundle."
---

# [[AlphaGo (Monte Carlo Tree Search RL)] [Autonomous Driving Simulation] [Robotic Manipulation|AlphaGo (Monte Carlo Tree Search + RL)], [Autonomous Driving Simulation], [Robotic Manipulation]]

## 📌 One-Line Insight (The Karpathy Summary)

> Solving a complex decision-making problem does not mean computing "every possible case"; it means using simulation to explore the paths most likely to win (succeed) and internalizing that experience in a neural network (RL).

## 📖 Structured Knowledge (Synthesized Content)

- **The essence of AlphaGo (MCTS + RL)**:
  - **Monte Carlo Tree Search (MCTS)**: Expands promising moves (nodes) through random simulations and uses the accumulated statistics to find the best move.
  - **Reinforcement Learning**: Refines the policy network and the value network through self-play, building an intuition that goes beyond human game records.
- **Autonomous Driving Simulation**:
  - Real-world accidents can be fatal. Millions of miles of virtual driving in a digital-twin environment are used to learn corner cases (edge cases), and the result is transferred to real-world control algorithms (Sim-to-Real).
- **Robotic Manipulation**:
  - An object's friction, gravity, and tactile response are modeled as physical laws inside a physics engine, and reinforcement learning trains a robot arm to perform precise motions.

## ⚠️ Contradictions and Updates (RL Update)

- The more detailed a simulation is, the better, but a gap between simulation and reality (the "Reality Gap") always remains. To address it, training must be paired with Domain Randomization, which applies random variation to the simulation environment to make the learned behavior robust.

## 🔗 Knowledge Links (Graph)

- Related: [[Digital Twins|Digital Twins]], Reinforcement Learning, [[Systemic_Simulation_Principles|Systemic_Simulation_Principles]]
- Foundation: [[Information Theory|Information Theory]]
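
The MCTS idea described in this note (grow the tree toward promising moves, then pick the move with the best statistics) can be sketched on a toy game. The following is a minimal sketch of plain UCT, not AlphaGo's actual search: it uses uniformly random rollouts and omits the policy and value networks. The game (a Nim variant where players take 1 to 3 stones and whoever takes the last stone wins), the `Node` class, the exploration constant `c=1.4`, and all function names are assumptions made for illustration.

```python
import math
import random


class Node:
    """One game state: `stones` remain and it is the next player's turn."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move                      # move that led into this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0                       # wins for the player who moved INTO this node


def ucb1(child, parent_visits, c=1.4):
    """Average win rate plus an exploration bonus for rarely visited children."""
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def rollout(stones, rng):
    """Play uniformly random moves; return True if the player to move first wins."""
    first_player_to_move = True
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return first_player_to_move
        first_player_to_move = not first_player_to_move


def mcts(root_stones, iterations, seed=0):
    """Return the most visited move from a position with root_stones >= 1."""
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: did the player who moved into `node` end up winning?
        if node.stones == 0:
            mover_won = True                  # they took the last stone
        else:
            mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: flip the perspective at every level.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

Calling `mcts(5, 3000)` searches a five-stone position; in this game the winning strategy is to leave the opponent a multiple of four stones. Returning the most visited child rather than the one with the highest win rate is a common choice because visit counts are less noisy for rarely explored moves.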
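
Domain Randomization, named above as the remedy for the Reality Gap, can be illustrated with a small sketch: each training episode draws its simulator parameters from a range instead of using fixed values. The parameter names (`friction`, `mass_kg`, `sensor_noise`), their ranges, and the helper functions are hypothetical and not taken from any real simulator or robot; a real setup would pass the sampled values to its physics engine.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    friction: float       # surface friction coefficient
    mass_kg: float        # mass of the manipulated object
    sensor_noise: float   # std-dev of additive observation noise


# Hypothetical ranges chosen only for illustration.
RANGES = {
    "friction": (0.4, 1.2),
    "mass_kg": (0.1, 2.0),
    "sensor_noise": (0.0, 0.05),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw one simulator configuration, uniformly within each range."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def randomized_episodes(n_episodes: int, seed: int = 0):
    """Yield a freshly randomized configuration for every training episode."""
    rng = random.Random(seed)
    for _ in range(n_episodes):
        yield sample_params(rng)


if __name__ == "__main__":
    for params in randomized_episodes(3):
        print(params)  # a real pipeline would configure the simulator here
```

Because the policy never sees the same physics twice, it cannot overfit to one particular simulator setting, which is the intuition behind using randomization to narrow the gap to reality.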