"매 PD model은 매 multiplayer game design의 매 cooperation tension의 매 mathematical core — 매 individual rational choice가 매 collective suboptimal로 leads하는 매 모든 trust mechanic의 base." Robert Axelrod 'Evolution of Cooperation' (1984)이 매 iterated PD에서 매 'tit-for-tat' winning strategy 증명. 매 game design에서 매 The Resistance / Werewolf social deduction, 매 EVE Online corp wars, 매 Among Us, 매 Trust (Nicky Case 2017 interactive)까지 매 explicit application 광범. 매 2026 시점, 매 Multi-Agent RL (Llama 3 / Claude 3.5)이 매 inter-agent cooperation 학습에 매 PD framework 활용.
Core concepts
PD payoff matrix
Standard PD: T (Temptation, 5) > R (Reward, 3) > P (Punishment, 1) > S (Sucker, 0).
Constraint: 2R > T + S, so mutual cooperation pays better than taking turns defecting.
One-shot: the rational move is to defect (Nash equilibrium). Iterated: cooperation becomes sustainable.
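The two payoff conditions above can be checked mechanically. The following is a minimal sketch; the function names `is_prisoners_dilemma` and `defect_dominates` are illustrative and not from any library.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Return True if the payoffs satisfy both standard PD conditions."""
    ordering = T > R > P > S            # temptation > reward > punishment > sucker
    no_alternation_gain = 2 * R > T + S  # mutual cooperation beats taking turns
    return ordering and no_alternation_gain


def defect_dominates(T, R, P, S):
    """Defecting pays more whatever the opponent plays (T > R and P > S)."""
    return T > R and P > S


print(is_prisoners_dilemma(5, 3, 1, 0))  # standard values from the text
print(defect_dominates(5, 3, 1, 0))
print(is_prisoners_dilemma(7, 3, 1, 0))  # 2R = 6 is not greater than T + S = 7
```

With the standard values both checks pass. Raising T to 7 keeps the ordering but breaks the second condition, because alternating defection would then out-earn mutual cooperation.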
Winning strategies (Axelrod tournament and follow-up work)
Tit-for-Tat (TFT): cooperate on the first move, then mirror the opponent's previous move. Nice, retaliating, forgiving, and non-envious.
Tit-for-Two-Tats: noise tolerant; retaliates only after two consecutive defections.
Generous TFT: retaliates against a defection 90% of the time and forgives the other 10%.
Pavlov (Win-Stay, Lose-Shift): repeats its last action if the previous round was a "win" (payoff R or T), otherwise switches.
Game design applications
Trust mechanic: a player entrusts currency to another player, and the player who returns it can get more back. Example: EVE Online stockpiling.
Punishment mechanic: a reputation system for betrayal, with a publicly visible defection history.
Communication tool: chat or signals let players make commitments; the design choice is cheap talk vs costly signal.
Endgame revelation: cooperation collapses once the final round is known (backward induction).
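The punishment mechanic above relies on a publicly visible defection history. The following is a minimal sketch of such a record; the class `ReputationLedger`, its method names, and the choice to give unknown players a rate of 1.0 are all assumptions made for illustration.

```python
class ReputationLedger:
    """Hypothetical public record of each player's cooperate/defect moves."""

    def __init__(self):
        self._history = {}  # player name -> list of 'C' / 'D'

    def record(self, player, action):
        if action not in ('C', 'D'):
            raise ValueError("action must be 'C' or 'D'")
        self._history.setdefault(player, []).append(action)

    def defections(self, player):
        """Number of recorded defections, visible to everyone."""
        return self._history.get(player, []).count('D')

    def cooperation_rate(self, player):
        """Share of cooperative moves; unknown players start at 1.0 (assumed)."""
        moves = self._history.get(player, [])
        if not moves:
            return 1.0
        return moves.count('C') / len(moves)


ledger = ReputationLedger()
ledger.record('alice', 'C')
ledger.record('alice', 'D')
print(ledger.defections('alice'), ledger.cooperation_rate('alice'))
```

Keeping the record persistent and tied to an identity is the point of the design: if players can shed their history, the ledger no longer deters betrayal.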
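    import numpy as np

    # Assumed payoff table as (my_payoff, opp_payoff), using T=5, R=3, P=1, S=0.
    PAYOFF = {
        ('C', 'C'): (3, 3),
        ('C', 'D'): (0, 5),
        ('D', 'C'): (5, 0),
        ('D', 'D'): (1, 1),
    }

    def tit_for_tat(my_hist, opp_hist):
        # Cooperate first, then copy the opponent's last move.
        return 'C' if not opp_hist else opp_hist[-1]

    def tit_for_two_tats(my_hist, opp_hist):
        if len(opp_hist) < 2:
            return 'C'
        # Retaliate only after two consecutive defections.
        return 'D' if opp_hist[-1] == 'D' and opp_hist[-2] == 'D' else 'C'

    def generous_tft(my_hist, opp_hist):
        if not opp_hist:
            return 'C'
        if opp_hist[-1] == 'D' and np.random.random() < 0.1:
            return 'C'  # forgive 10% of defections
        return opp_hist[-1]

    def pavlov(my_hist, opp_hist):
        if not my_hist:
            return 'C'
        last_payoff = PAYOFF[(my_hist[-1], opp_hist[-1])][0]
        # Win (R or T): stay. Lose (P or S): switch.
        return my_hist[-1] if last_payoff >= 3 else ('D' if my_hist[-1] == 'C' else 'C')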
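Strategies with the `(my_hist, opp_hist)` signature used above can be pitted against each other with a small match loop. This is a sketch only: `play_match`, `always_defect`, and the local `PAYOFFS` table (built from T=5, R=3, P=1, S=0) are illustrative names, and `tit_for_tat` is redefined so the block runs on its own.

```python
# Payoffs as (player_a, player_b), using T=5, R=3, P=1, S=0.
PAYOFFS = {
    ('C', 'C'): (3, 3),
    ('C', 'D'): (0, 5),
    ('D', 'C'): (5, 0),
    ('D', 'D'): (1, 1),
}


def tit_for_tat(my_hist, opp_hist):
    return 'C' if not opp_hist else opp_hist[-1]


def always_defect(my_hist, opp_hist):
    return 'D'


def play_match(strat_a, strat_b, rounds=10):
    """Play a fixed number of rounds and return both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)  # each side sees its own history first
        move_b = strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play_match(tit_for_tat, tit_for_tat))      # (30, 30)
print(play_match(tit_for_tat, always_defect))    # (9, 14)
print(play_match(always_defect, always_defect))  # (10, 10)
```

TFT never outscores its opponent within a single match; it loses 9 to 14 against a pure defector. Its strength shows in totals: two TFT players earn 30 each, while two defectors earn only 10 each.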
    # A player demonstrates commitment by locking funds in escrow.
    class CostlySignal:
        def __init__(self):
            self.escrows = {}

        def signal_commitment(self, player: str, amount: int):
            # The player locks `amount`; it is lost on betrayal.
            self.escrows[player] = amount

        def reward_or_punish(self, player: str, betrayed: bool):
            amt = self.escrows.pop(player, 0)
            if betrayed:
                return 0  # escrow forfeited
            return amt + (amt * 0.5)  # escrow returned with a 50% bonus
Endgame anti-defection (finite-game prevention)
    import numpy as np

    # Hide the final round to block backward induction.
    class HiddenEndgame:
        def __init__(self, expected_rounds: int, jitter: int):
            # Draw the real length uniformly from expected_rounds +/- jitter.
            self.actual_rounds = expected_rounds + np.random.randint(-jitter, jitter + 1)

        def is_final(self, current_round: int) -> bool:
            return current_round >= self.actual_rounds  # actual_rounds is never shown to players
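An alternative sometimes used in repeated-game models is a fixed per-round continuation probability, so that no round is ever known to be the last possible one. The sketch below assumes that setup. `RandomStopGame` and `min_continuation_probability` are hypothetical names, and the threshold formula is the standard grim-trigger condition (cooperating forever must pay at least as much as defecting once and being punished afterwards), not something stated in this note.

```python
import random


def min_continuation_probability(T=5, R=3, P=1):
    """Smallest continuation probability at which grim-trigger cooperation holds.

    Derived from R / (1 - d) >= T + d * P / (1 - d), i.e. d >= (T - R) / (T - P).
    """
    return (T - R) / (T - P)


class RandomStopGame:
    """After every round the game continues with probability continue_prob."""

    def __init__(self, continue_prob, rng=None):
        if not 0 <= continue_prob < 1:
            raise ValueError("continue_prob must be in [0, 1)")
        self.continue_prob = continue_prob
        self.rng = rng or random.Random()

    def is_final(self):
        """Roll once after a round; True means that round was the last."""
        return self.rng.random() >= self.continue_prob

    def expected_rounds(self):
        """Mean game length for a geometric stopping rule."""
        return 1 / (1 - self.continue_prob)


print(min_continuation_probability())         # 0.5 for T=5, R=3, P=1
print(RandomStopGame(0.9).expected_rounds())  # roughly 10 rounds
```

With the standard payoffs, the threshold is 0.5: any game expected to last longer than about two rounds can, under these assumptions, support cooperation.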
Decision criteria
| Situation | Approach |
|---|---|
| Social deduction (Werewolf style) | Information asymmetry + PD |
| Persistent MMO | Reputation + costly signal |
| Co-op survival (Don't Starve Together) | Mutual benefit dominates; PD is weak |
| Competitive 1v1 | Pure PD only at the meta level |
| Multi-agent RL | TFT-family baseline |
Default: iterated PD with reputation plus a hidden endgame. In a one-shot game, always-defect is the dominant strategy.
🤖 LLM usage
When to use: simulate two LLM instances as IPD opponents to analyze emergent strategies.
When not to use: deep human social dynamics; emotion and context are not captured by an LLM-PD simulation.
❌ Anti-patterns
Pure cooperation reward without a defection option: that is plain co-op, not a PD.
No reputation persistence: anonymity after a betrayal leads to cooperation collapse.
Known finite endgame: backward induction leads to always-defect.
No noise tolerance: a single mistake sets off a lasting retaliation spiral (the TFT vs TFT trap).
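The noise-tolerance anti-pattern can be reproduced in a few lines. The sketch below plays a strategy against a copy of itself and flips one of player A's moves to a defection; `play_with_one_error` is a hypothetical helper, and the two strategy functions are restated so the block is self-contained.

```python
def tit_for_tat(my_hist, opp_hist):
    return 'C' if not opp_hist else opp_hist[-1]


def tit_for_two_tats(my_hist, opp_hist):
    if len(opp_hist) < 2:
        return 'C'
    return 'D' if opp_hist[-1] == 'D' and opp_hist[-2] == 'D' else 'C'


def play_with_one_error(strategy, rounds=8, error_round=2):
    """Self-play where player A's move is forced to 'D' once (0-based round)."""
    hist_a, hist_b = [], []
    for r in range(rounds):
        move_a = strategy(hist_a, hist_b)
        move_b = strategy(hist_b, hist_a)
        if r == error_round:
            move_a = 'D'  # the single accidental defection
        hist_a.append(move_a)
        hist_b.append(move_b)
    return ''.join(hist_a), ''.join(hist_b)


print(play_with_one_error(tit_for_tat))       # ('CCDCDCDC', 'CCCDCDCD')
print(play_with_one_error(tit_for_two_tats))  # ('CCDCCCCC', 'CCCCCCCC')
```

Under TFT the one slip echoes back and forth for the rest of the match, while Tit-for-Two-Tats absorbs it and both players return to cooperation on the next round.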
🧪 Verification / duplicates
Verified: Axelrod, "The Evolution of Cooperation" (1984); Nicky Case, "The Evolution of Trust" (2017); multi-agent RL papers (DeepMind, "Sequential Social Dilemmas", 2017).