---
id: wiki-2026-0508-prisoners-dilemma-models
title: Prisoner's Dilemma Models in Game Design
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [Prisoners Dilemma, PD Game Design, Cooperation Games]
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: [game-design, game-theory, multiplayer, cooperation, axelrod]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: python
  framework: numpy
---

# Prisoner's Dilemma Models in Game Design

## One-line summary

> **"The PD model is the mathematical core of the cooperation tension in multiplayer game design: it underlies every trust mechanic in which individually rational choices lead to a collectively suboptimal outcome."**

Robert Axelrod's *The Evolution of Cooperation* (1984) reported that tit-for-tat was the winning strategy in his iterated PD tournaments. In game design, explicit applications are widespread: social deduction in The Resistance and Werewolf, EVE Online corp wars, Among Us, and Nicky Case's interactive "The Evolution of Trust" (2017). As of 2026, multi-agent RL work and LLM agents (Llama 3, Claude 3.5) use the PD framework to study how agents learn to cooperate.

## Core concepts

### PD payoff matrix

- **Standard PD**: T (Temptation, 5) > R (Reward, 3) > P (Punishment, 1) > S (Sucker, 0).
- **Constraint**: 2R > T + S, so mutual cooperation pays better than taking turns defecting on each other.
- **One-shot vs iterated**: in a one-shot game the rational move is to defect (the Nash equilibrium); in an iterated game cooperation becomes possible.

### Winning strategies (Axelrod tournament and later variants)

- **Tit-for-Tat (TFT)**: cooperate on the first move, then mirror the opponent's previous move. Nice, retaliating, forgiving, and non-envious.
- **Tit-for-Two-Tats**: noise tolerant; retaliates only after two consecutive defections.
- **Generous TFT**: retaliates against a defection 90% of the time and forgives 10% of the time.
- **Pavlov (Win-Stay, Lose-Shift)**: if the last round was a "win" (payoff R or T), repeat the same action; otherwise switch.

### Game design applications

- **Trust mechanic**: a player entrusts currency to another player, and the player who returns it can end up receiving more. Example: EVE Online stockpiling.
- **Punishment mechanic**: a reputation system that answers betrayal with a publicly visible defection history.
- **Communication tool**: chat or signals let players make commitments; the design question is cheap talk versus costly signals.
- **Endgame revelation**: once players know which round is final, cooperation collapses (backward induction).

## 💻 Patterns

### IPD simulator

```python
import numpy as np
from typing import Callable

# Payoffs as (player A, player B) for each pair of moves: T=5, R=3, P=1, S=0.
PAYOFF = {
    ('C', 'C'): (3, 3),
    ('C', 'D'): (0, 5),
    ('D', 'C'): (5, 0),
    ('D', 'D'): (1, 1),
}

def play(strat_a: Callable, strat_b: Callable, rounds: int = 200, noise: float = 0.0):
    """Play an iterated PD; each move is flipped with probability `noise`."""
    history_a, history_b = [], []
    score_a, score_b = 0, 0
    for _ in range(rounds):
        # Each strategy sees (own history, opponent history).
        move_a = strat_a(history_a, history_b)
        move_b = strat_b(history_b, history_a)
        # Noise: an intended move is executed as its opposite.
        if np.random.random() < noise:
            move_a = 'D' if move_a == 'C' else 'C'
        if np.random.random() < noise:
            move_b = 'D' if move_b == 'C' else 'C'
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        # Histories record the moves actually played, including flipped ones.
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b
```

### TFT + variants

```python
# Reuses `np` and `PAYOFF` from the IPD simulator block.

def tit_for_tat(my_hist, opp_hist):
    # Cooperate first, then copy the opponent's previous move.
    return 'C' if not opp_hist else opp_hist[-1]

def tit_for_two_tats(my_hist, opp_hist):
    # Retaliate only after two consecutive defections.
    if len(opp_hist) < 2:
        return 'C'
    return 'D' if opp_hist[-1] == 'D' and opp_hist[-2] == 'D' else 'C'

def generous_tft(my_hist, opp_hist):
    if not opp_hist:
        return 'C'
    if opp_hist[-1] == 'D' and np.random.random() < 0.1:
        return 'C'  # forgive 10% of defections
    return opp_hist[-1]

def pavlov(my_hist, opp_hist):
    # Win-stay, lose-shift: repeat after payoff R or T, otherwise switch.
    if not my_hist:
        return 'C'
    last_payoff = PAYOFF[(my_hist[-1], opp_hist[-1])][0]
    return my_hist[-1] if last_payoff >= 3 else ('D' if my_hist[-1] == 'C' else 'C')
```

### Reputation system (multiplayer game)

```python
class Reputation:
    def __init__(self):
        self.scores = {}  # player_id -> reputation float in [-1, 1]

    def record_action(self, player: str, action: str, target: str):
        # Betrayal costs three times what a cooperative act earns.
        # `target` is accepted for logging but does not affect the score here.
        delta = +0.1 if action == 'cooperate' else -0.3
        self.scores[player] = self.scores.get(player, 0.0) + delta
        self.scores[player] = max(-1.0, min(1.0, self.scores[player]))

    def is_trustworthy(self, player: str) -> bool:
        return self.scores.get(player, 0.0) > 0.3
```

### Costly signal mechanic

```python
# The player demonstrates commitment by locking funds in escrow.
class CostlySignal:
    def __init__(self):
        self.escrows = {}

    def signal_commitment(self, player: str, amount: int):
        # The player locks `amount`; it is lost on betrayal.
        self.escrows[player] = amount

    def reward_or_punish(self, player: str, betrayed: bool) -> float:
        amt = self.escrows.pop(player, 0)
        if betrayed:
            return 0.0  # escrow is forfeited
        return amt * 1.5  # escrow returned with a 50% bonus
```

### Endgame anti-defection (finite-game prevention)

```python
import numpy as np

# Hide the final round to block backward induction.
class HiddenEndgame:
    def __init__(self, expected_rounds: int, jitter: int):
        # randint's upper bound is exclusive, so the offset lies in [-jitter, jitter].
        self.actual_rounds = expected_rounds + np.random.randint(-jitter, jitter + 1)

    def is_final(self, current_round: int) -> bool:
        return current_round >= self.actual_rounds

# `actual_rounds` is never revealed to players.
```

## Decision criteria

| Situation | Approach |
|---|---|
| Social deduction (Werewolf-style) | Information asymmetry + PD |
| Persistent MMO | Reputation + costly signal |
| Co-op survival (Don't Starve Together) | Mutual benefit dominates; PD tension is weak |
| Competitive 1v1 | Pure PD only at the meta level |
| Multi-agent RL | TFT-family baseline |

**Default**: iterated PD with reputation plus a hidden endgame. In a one-shot game, always-defect is the dominant strategy.

## 🔗 Graph

## 🤖 LLM usage

**When to use**: simulate two LLM instances as IPD opponents and analyze the strategies that emerge.

**When not to use**: deep human social dynamics. Emotion and context are not captured by an LLM-PD simulation.

## ❌ Anti-patterns

- **Pure cooperation reward without a defection option**: that is plain co-op, not a PD.
- **No reputation persistence**: anonymity after a betrayal leads to cooperation collapse.
- **Known finite endgame**: backward induction leads to always-defect.
- **No noise tolerance**: a single mistake sets off a lasting retaliation spiral (the TFT vs TFT trap).

## 🧪 Verification / duplicates

- Verified: Axelrod, *The Evolution of Cooperation* (1984); Nicky Case, "The Evolution of Trust" (2017); multi-agent RL papers (DeepMind, "Sequential Social Dilemmas", 2017).
- Trust level: A.

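
The "TFT vs TFT trap" named in the anti-patterns can be checked with a small deterministic experiment. The following is a minimal, self-contained sketch rather than part of the simulator above: `play_with_slip` and its `slip_round` parameter are hypothetical names introduced here, and the random noise is replaced by a single forced mistake from player A so that the result is reproducible.

```python
from typing import Callable, List, Optional, Tuple

# Standard payoffs as (player A, player B): T=5, R=3, P=1, S=0.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(my_hist: List[str], opp_hist: List[str]) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not opp_hist else opp_hist[-1]


def tit_for_two_tats(my_hist: List[str], opp_hist: List[str]) -> str:
    # Defect only if the opponent's last two moves were both defections.
    return "D" if opp_hist[-2:] == ["D", "D"] else "C"


def play_with_slip(
    strat_a: Strategy,
    strat_b: Strategy,
    rounds: int = 10,
    slip_round: Optional[int] = None,
) -> Tuple[int, int, str, str]:
    """Play an iterated PD; flip player A's move in round `slip_round` (0-based).

    Hypothetical helper for illustration: one forced mistake instead of random noise.
    Returns both scores and both move histories as strings.
    """
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for r in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        if r == slip_round:
            move_a = "D" if move_a == "C" else "C"  # the single mistake
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b, "".join(hist_a), "".join(hist_b)


if __name__ == "__main__":
    print(play_with_slip(tit_for_tat, tit_for_tat))
    print(play_with_slip(tit_for_tat, tit_for_tat, slip_round=2))
    print(play_with_slip(tit_for_two_tats, tit_for_two_tats, slip_round=2))
```

Over 10 rounds, two error-free TFT players score 30 each. With one slip in round 2 they fall into an alternating retaliation echo (`CCDCDCDCDC` against `CCCDCDCDCD`) and score 26 each, while two Tit-for-Two-Tats players absorb the same slip and return to mutual cooperation at once (32 and 27). This is one reason to prefer a forgiving variant whenever a game has input errors or ambiguous actions.
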
## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: PD payoff matrix, TFT variants, reputation / costly signal / hidden endgame patterns |