--- id: RL-MARL-001 category: "10_Wiki/๐Ÿ’ก Topics/AI" confidence_score: 1.0 tags: [ai, reinforcement-learning, multi-agent, marl, game-theory, coordination] last_reinforced: 2026-04-26 --- # Multi-Agent Reinforcement Learning (MARL, ๋‹ค์ค‘ ์—์ด์ „ํŠธ ๊ฐ•ํ™”ํ•™์Šต) ## ๐Ÿ“Œ ํ•œ ์ค„ ํ†ต์ฐฐ (The Karpathy Summary) > "๊ฐœ๋ณ„ ์—์ด์ „ํŠธ์˜ ์ด๊ธฐ์‹ฌ์„ ๋„˜์–ด ์ง‘๋‹จ์˜ ํ•˜๋ชจ๋‹ˆ๋ฅผ ๊ตฌ์ถ•ํ•˜๊ณ , ์ƒํ˜ธ์ž‘์šฉ์˜ ์—ญ๋™์„ฑ ์†์—์„œ ์ฐฝ๋ฐœ์  ์ง€๋Šฅ์„ ๋ฐœํ˜„ํ•˜๋ผ" โ€” ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋…๋ฆฝ์ ์ธ ํ•™์Šต ์ฃผ์ฒด(Agents)๊ฐ€ ๋™์ผํ•œ ํ™˜๊ฒฝ์—์„œ ๋™์‹œ์— ํ•™์Šตํ•˜๋ฉฐ ์„œ๋กœ ํ˜‘๋ ฅํ•˜๊ฑฐ๋‚˜ ๊ฒฝ์Ÿํ•˜์—ฌ ๋ชฉํ‘œ๋ฅผ ๋‹ฌ์„ฑํ•˜๋Š” ๊ฐ•ํ™”ํ•™์Šต ์ฒด๊ณ„. ## ๐Ÿ“– ๊ตฌ์กฐํ™”๋œ ์ง€์‹ (Synthesized Content) - **์ถ”์ถœ๋œ ํŒจํ„ด:** "Co-evolution and Joint Strategy" โ€” ํ•œ ์—์ด์ „ํŠธ์˜ ํ–‰๋™์ด ๋‹ค๋ฅธ ์—์ด์ „ํŠธ์˜ ๋ณด์ƒ๊ณผ ํ™˜๊ฒฝ์„ ๋ณ€ํ™”์‹œํ‚ค๋Š” '๋น„์ •์ ์ธ(Non-stationary)' ํ™˜๊ฒฝ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ์ƒ๋Œ€์˜ ํ–‰๋™์„ ์˜ˆ์ธกํ•˜๊ณ  ๊ณต๋™์˜ ๋ชฉํ‘œ๋‚˜ ๋‚ด์‹œ ๊ท ํ˜•(Nash Equilibrium)์„ ์ฐพ์•„๊ฐ€๋Š” ์ง„ํ™”์  ํ•™์Šต ํŒจํ„ด. - **ํ•ต์‹ฌ ์•„ํ‚คํ…์ฒ˜:** - **Independent Learning:** ๊ฐ ์—์ด์ „ํŠธ๊ฐ€ ํƒ€์ธ์„ ํ™˜๊ฒฝ์˜ ์ผ๋ถ€๋กœ ๋ณด๊ณ  ๋…๋ฆฝ์ ์œผ๋กœ ํ•™์Šต. - **Centralized Training, Decentralized Execution (CTDE):** ํ•™์Šต์€ ์ค‘์•™์—์„œ ๋ชจ๋“  ์ •๋ณด๋ฅผ ๋ชจ์•„ ์ •๊ตํ•˜๊ฒŒ ์ˆ˜ํ–‰ํ•˜๊ณ , ์‹คํ–‰์€ ๊ฐ์ž ๋…๋ฆฝ์ ์ธ ๋„คํŠธ์›Œํฌ๋กœ ์ˆ˜ํ–‰ํ•˜๋Š” ํ˜„๋Œ€์  ํ‘œ์ค€ ๋ฐฉ์‹. - **์˜์˜:** ๊ตฐ์ง‘ ๋กœ๋ด‡ ์ œ์–ด, ์ž์œจ์ฃผํ–‰ ์ฐจ๋Ÿ‰ ๊ฐ„ ํ†ต์‹ , ์ฃผ์‹ ์‹œ์žฅ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ๋“ฑ ์‹ค์„ธ๊ณ„์˜ ๋ณต์žกํ•œ ๋‹ค์ž๊ฐ„ ์ƒํ˜ธ์ž‘์šฉ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ํ•„์ˆ˜ ๊ธฐ์ˆ . ## โš ๏ธ ๋ชจ์ˆœ ๋ฐ ์—…๋ฐ์ดํŠธ (Contradictions & RL Update) - **๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์™€์˜ ์ถฉ๋Œ:** ์—์ด์ „ํŠธ ์ˆ˜๊ฐ€ ๋Š˜์–ด๋‚ ์ˆ˜๋ก ํƒ์ƒ‰ ๊ณต๊ฐ„์ด ๊ธฐํ•˜๊ธ‰์ˆ˜์ ์œผ๋กœ ํญ์ฆํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ์ตœ๊ทผ์—๋Š” ์—์ด์ „ํŠธ ๊ฐ„์˜ '์†Œํ†ต(Communication)' ํ”„๋กœํ† ์ฝœ์„ ์Šค์Šค๋กœ ํ•™์Šตํ•˜๊ฒŒ ํ•˜๊ฑฐ๋‚˜ ๊ทธ๋ž˜ํ”„ ์‹ ๊ฒฝ๋ง(GNN)์„ ๊ฒฐํ•ฉํ•˜์—ฌ ๊ด€๊ณ„๋ฅผ ๋ชจ๋ธ๋งํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ์ง„ํ™”. - **์ •์ฑ… ๋ณ€ํ™”:** Antigravity ํ”„๋กœ์ ํŠธ๋Š” ์—ฌ๋Ÿฌ ํ˜‘์—… AI(Planning Agent, Coding Agent, Review Agent ๋“ฑ)๊ฐ€ ์ƒํ˜ธ์ž‘์šฉํ•˜๋ฉฐ ํ•˜๋‚˜์˜ ์ž‘์—…์„ ์™„๋ฃŒํ•  ๋•Œ, ์ตœ์ ์˜ ํ˜‘์—… ํšจ์œจ์„ ๋„์ถœํ•˜๊ธฐ ์œ„ํ•ด MARL ๊ธฐ๋ฐ˜์˜ ์›Œํฌํ”Œ๋กœ์šฐ ์ตœ์ ํ™”๋ฅผ ์—ฐ๊ตฌํ•จ. ## ๐Ÿ”— ์ง€์‹ ์—ฐ๊ฒฐ (Graph) - [[Reinforcement-Learning|Reinforcement-Learning]], [[Game-Theory|Game-Theory]], [[Monte-Carlo-Tree-Search-MCTS|Monte-Carlo-Tree-Search-MCTS]], Graph-Neural-Networks-GNN - **Raw Source:** 10_Wiki/Topics/AI/Multi-Agent-Reinforcement-Learning.md