--- id: GAME-THEORY-AI-001 date: 2026-05-07T14:56:00.000Z type: knowledge_artifact standard: P-Reinforce v3.0 tags: [game-theory, ai, multi-agent-systems, reinforcement-learning, strategic-decision-making] --- # [[Game-Theory-in-AI]] ## πŸ“Œ ν•œ 쀄 톡찰 (The Karpathy Summary) > "μƒν˜Έμž‘μš©μ˜ μ§€λŠ₯ν™”: 닀쀑 μ—μ΄μ „νŠΈ ν™˜κ²½μ—μ„œ 각 주체의 졜적 μ „λž΅μ΄ μ„œλ‘œμ—κ²Œ λ―ΈμΉ˜λŠ” 영ν–₯을 μˆ˜ν•™μ μœΌλ‘œ λͺ¨λΈλ§ν•˜μ—¬, λ³΅μž‘ν•œ 경쟁 및 ν˜‘λ ₯ μƒν™©μ—μ„œ 졜적의 κ· ν˜•μ (Equilibrium)을 μ°ΎλŠ” 인곡지λŠ₯ 섀계 원칙." ## πŸ“– κ΅¬μ‘°ν™”λœ 지식 (Synthesized Content) κ²Œμž„ 이둠은 ν˜„λŒ€ 인곡지λŠ₯, 특히 닀쀑 μ—μ΄μ „νŠΈ κ°•ν™”ν•™μŠ΅(MARL)κ³Ό 자율 μ‹œμŠ€ν…œ μ„€κ³„μ—μ„œ μ˜μ‚¬κ²°μ •μ˜ 핡심 논리λ₯Ό μ œκ³΅ν•©λ‹ˆλ‹€. 1. **핡심 κ°œλ… 및 κ· ν˜•μ **: * **λ‚΄μ‹œ κ· ν˜• (Nash Equilibrium)**: λ‹€λ₯Έ μ—μ΄μ „νŠΈμ˜ μ „λž΅μ΄ κ³ μ •λ˜μ—ˆμ„ λ•Œ, μ–΄λ–€ μ—μ΄μ „νŠΈλ„ μžμ‹ μ˜ μ „λž΅μ„ λ³€κ²½ν•˜μ—¬ 더 높은 이득을 얻을 수 μ—†λŠ” μƒνƒœμž…λ‹ˆλ‹€. AIλŠ” 이 κ· ν˜•μ μ„ ν–₯ν•΄ ν•™μŠ΅ν•˜λ©° μ‹œμŠ€ν…œμ˜ μ•ˆμ •μ„±μ„ ν™•λ³΄ν•©λ‹ˆλ‹€. * **μ œλ‘œμ„¬ vs λΉ„μ œλ‘œμ„¬ κ²Œμž„**: μžμ›μ΄ ν•œμ •λœ 경쟁 상황(Zero-sum)κ³Ό ν˜‘λ ₯을 톡해 κ³΅λ™μ˜ 이읡을 μ°½μΆœν•  수 μžˆλŠ” 상황(Non-zero-sum)을 κ΅¬λΆ„ν•˜μ—¬ 보상 ν•¨μˆ˜(Reward Function)λ₯Ό μ„€κ³„ν•©λ‹ˆλ‹€. 2. **AI λΆ„μ•Όμ˜ μ‘μš©**: * **Multi-Agent Reinforcement Learning (MARL)**: μ—¬λŸ¬ AI μ—μ΄μ „νŠΈκ°€ λ™μ‹œμ— ν•™μŠ΅ν•˜λŠ” ν™˜κ²½μ—μ„œ μƒν˜Έ κ°„μ˜ 간섭을 κ²Œμž„ 이둠적으둜 ν•΄κ²°ν•©λ‹ˆλ‹€. * **λ©”μ»€λ‹ˆμ¦˜ λ””μžμΈ (Mechanism Design)**: μ—μ΄μ „νŠΈλ“€μ΄ μžμ‹ μ˜ 이읡을 μœ„ν•΄ ν–‰λ™ν•˜λ”λΌλ„ 전체 μ‹œμŠ€ν…œμ΄ μ›ν•˜λŠ” λ°©ν–₯(예: μžμ› νš¨μœ¨μ„±)으둜 μœ λ„λ˜λ„λ‘ κ·œμΉ™κ³Ό 보상 ꡬ쑰λ₯Ό μ„€κ³„ν•©λ‹ˆλ‹€. 3. **μ‹€μ „ 사둀**: * **AlphaGo & Poker AI**: λΆˆμ™„μ „ 정보 κ²Œμž„(Poker)μ΄λ‚˜ λ³΅μž‘ν•œ μƒνƒœ 곡간(Go)μ—μ„œ μƒλŒ€μ˜ μ „λž΅μ„ μ˜ˆμΈ‘ν•˜κ³  졜적의 λŒ€μ‘μˆ˜λ₯Ό κ³„μ‚°ν•˜λŠ” 데 ν•„μˆ˜μ μœΌλ‘œ μ‚¬μš©λ©λ‹ˆλ‹€. * **μžμœ¨μ£Όν–‰ 및 λ“œλ‘  μŠ€μ›œ**: μ—¬λŸ¬ 자율 μ£Όν–‰ μ°¨λŸ‰μ΄ κ΅μ°¨λ‘œμ—μ„œ 좩돌 없이 효율적으둜 ν†΅ν–‰ν•˜κΈ° μœ„ν•œ μ „λž΅μ  μƒν˜Έμž‘μš© λͺ¨λΈλ§μ— ν™œμš©λ©λ‹ˆλ‹€. ## βš–οΈ Trade-offs & Caveats * **계산 λ³΅μž‘λ„**: μ—μ΄μ „νŠΈ μˆ˜κ°€ μ¦κ°€ν• μˆ˜λ‘ λ‚΄μ‹œ κ· ν˜•μ„ μ°ΎλŠ” κ³„μ‚°λŸ‰μ΄ κΈ°ν•˜κΈ‰μˆ˜μ μœΌλ‘œ λŠ˜μ–΄λ‚˜λŠ” 'μ°¨μ›μ˜ μ €μ£Ό' λ¬Έμ œκ°€ λ°œμƒν•©λ‹ˆλ‹€. * **합리성 κ°€μ •μ˜ ν•œκ³„**: ν˜„μ‹€μ˜ μ—μ΄μ „νŠΈ(인간 포함)λŠ” 항상 μ™„λ²½ν•˜κ²Œ ν•©λ¦¬μ μœΌλ‘œ ν–‰λ™ν•˜μ§€ μ•ŠμœΌλ―€λ‘œ, μ œν•œλœ 합리성(Bounded Rationality)을 κ³ λ €ν•œ λͺ¨λΈλ§μ΄ ν•„μš”ν•©λ‹ˆλ‹€. ## πŸ”— 지식 μ—°κ²° (Graph) - **Related Topics**: [[Reinforcement Learning (RL)]], [[Multi-Agent-Systems-MAS]], [[Bounded Rationality]], [[Algorithmic-Game-Theory]] - **Applications**: [[Autonomous Vehicles]], [[Artificial-Intelligence-in-Games]] --- *Last updated: 2026-05-07*