[G1-Sync] Manual knowledge update

2026-05-10 22:08:15 +09:00
parent 21ac3ed255
commit 504fd5fb42
3011 changed files with 380280 additions and 206977 deletions
@@ -1,78 +1,174 @@
 ---
 id: wiki-2026-0508-prisoners-dilemma-models
-title: Prisoners Dilemma Models
-category: 10_Wiki/Topics_GD
+title: Prisoner's Dilemma Models in Game Design
+category: 10_Wiki/Topics
 status: verified
 canonical_id: self
-aliases: []
+aliases: [Prisoners Dilemma, PD Game Design, Cooperation Games]
 duplicate_of: none
 source_trust_level: A
-confidence_score: 0.92
-tags: [uncategorized]
+confidence_score: 0.9
+verification_status: applied
+tags: [game-design, game-theory, multiplayer, cooperation, axelrod]
 raw_sources: []
-last_reinforced: 2026-05-08
+last_reinforced: 2026-05-10
 github_commit: pending
-inferred_by: Claude Opus 4.7 (auto-normalize 2026-05-08)
+tech_stack:
+  language: python
+  framework: numpy
 ---

---
-redirect_to: "[[게임_디자인_및_가상_경제_시스템]]"
-canonical_id: "wiki-2026-0507-105"
---
+# Prisoner's Dilemma Models in Game Design

-# Redirect
+## One-Line Summary
+> **"The PD model is the mathematical core of the cooperation tension in multiplayer game design: individually rational choices lead to a collectively suboptimal outcome, and that tension is the basis of every trust mechanic."** In "The Evolution of Cooperation" (1984), Robert Axelrod showed through computer tournaments of the iterated PD that tit-for-tat was the winning strategy. Explicit applications in game design are widespread: social deduction games such as The Resistance and Werewolf, corp wars in EVE Online, Among Us, and Nicky Case's interactive "The Evolution of Trust" (2017). As of 2026, multi-agent RL and LLM-agent research (e.g., with Llama 3 or Claude 3.5) uses the PD framework to study how agents learn to cooperate.

-This document has been merged into the canonical document .
-See the link above for all the latest knowledge and details.
-## 📌 One-Line Insight (The Karpathy Summary)
+## Core Concepts

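-> The prisoner's dilemma is the basic game-theoretic model of cooperation and betrayal, and it applies directly to designing cooperation systems for guilds, alliances, and PvP.
+### PD payoff matrix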
+- **Standard PD**: T (Temptation, 5) > R (Reward, 3) > P (Punishment, 1) > S (Sucker, 0).
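+- **Constraint**: 2R > T + S, so mutual cooperation pays more than taking turns exploiting each other.
+- **One-shot**: defecting is the dominant strategy, and mutual defection is the unique Nash equilibrium. **Iterated**: cooperation becomes sustainable when players do not know which round is the last.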
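
Both conditions and the one-shot dominance argument can be checked mechanically before a payoff table ships. A minimal sketch using the example values above (T=5, R=3, P=1, S=0); `is_prisoners_dilemma` and `defect_dominates` are hypothetical helper names:

```python
def is_prisoners_dilemma(T: float, R: float, P: float, S: float) -> bool:
    """True if the payoffs satisfy both PD conditions: T > R > P > S and 2R > T + S."""
    return T > R > P > S and 2 * R > T + S


def defect_dominates(T: float, R: float, P: float, S: float) -> bool:
    """True if D beats C against either opponent move in a one-shot game:
    T > R when the opponent cooperates, P > S when the opponent defects."""
    return T > R and P > S


T, R, P, S = 5, 3, 1, 0
print(is_prisoners_dilemma(T, R, P, S))  # True
print(defect_dominates(T, R, P, S))      # True
# Yet mutual defection (P each) is worse for both than mutual cooperation (R each).
print(P < R)                             # True
# Taking turns exploiting each other averages (T + S) / 2 per round, below R = 3.
print((T + S) / 2)                       # 2.5
# Raising T to 6 breaks the second condition (2R == T + S), so this is no longer a PD.
print(is_prisoners_dilemma(6, R, P, S))  # False
```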

-## 📖 Structured Knowledge (Synthesized Content)
+### Key strategies (Axelrod's tournaments and later variants)
+- **Tit-for-Tat (TFT)**: cooperate on the first move, then copy the opponent's previous move. Nice, retaliating, forgiving, and non-envious.
+- **Tit-for-Two-Tats**: noise tolerant; retaliates only after two consecutive defections.
+- **Generous TFT**: mirrors like TFT but forgives a defection with some probability (10% here, so it retaliates about 90% of the time).
+- **Pavlov (Win-Stay, Lose-Shift)**: if the last round was a "win" (R or T), repeat the same action; after a loss (P or S), switch.

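-**Extracted pattern:** In a one-shot game defection dominates; in a repeated game cooperation evolves. Repetition and reputation drive cooperation.
+### Game design applications
+- **Trust mechanic**: a player entrusts currency to another player; if the trustee returns it, both can come out ahead, but the trustee is tempted to keep it. Example: stockpiling in EVE Online.
+- **Punishment mechanic**: a reputation system that makes each player's defection history publicly visible.
+- **Communication tool**: chat or signals let players make commitments; the key distinction is cheap talk vs. costly signals.
+- **Endgame revelation**: once players know which round is final, cooperation unravels from the end (backward induction).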
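
The trust mechanic has the structure of a trust (investment) game. A minimal sketch, assuming a hypothetical `trust_round` helper in which the entrusted amount is multiplied while it sits with the trustee (the multiplier of 3 is an assumed value) and the trustee then returns some fraction of the pot:

```python
def trust_round(endowment: float, sent: float, returned_fraction: float,
                multiplier: float = 3.0) -> tuple:
    """One round of a trust game; returns (investor_payoff, trustee_payoff).

    The investor entrusts `sent` out of `endowment`. The amount grows by
    `multiplier` (assumed 3) in the trustee's hands, and the trustee
    returns `returned_fraction` of that pot.
    """
    if not 0 <= sent <= endowment:
        raise ValueError("sent must be between 0 and endowment")
    if not 0 <= returned_fraction <= 1:
        raise ValueError("returned_fraction must be in [0, 1]")
    pot = sent * multiplier
    returned = pot * returned_fraction
    investor = endowment - sent + returned
    trustee = pot - returned
    return investor, trustee


print(trust_round(10, 10, 0.5))  # (15.0, 15.0): both beat the no-trust baseline (10, 0)
print(trust_round(10, 10, 0.0))  # (0.0, 30.0): the trustee keeps everything
```

A purely selfish trustee in a one-shot game returns nothing, and an investor who anticipates that sends nothing, which is the same tension the reputation and escrow patterns in this document are meant to resolve.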

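-**Details:**
- Payoff matrix: T > R > P > S.
- One-shot vs. repeated vs. infinite.
- Evolutionary strategies: Tit-for-Tat, Pavlov.
- Game applications: guild cooperation, NPC reputation, PvP alliances.
- Axelrod tournaments: showed that cooperative strategies prevail.
+## 💻 Patterns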

-## 🤖 LLM Usage Hints (How to Use This Knowledge)
+### IPD simulator
+```python
+import numpy as np
+from typing import Callable

-**When to use this knowledge:**
- *(TODO)*
+PAYOFF = {
+    ('C', 'C'): (3, 3),
+    ('C', 'D'): (0, 5),
+    ('D', 'C'): (5, 0),
+    ('D', 'D'): (1, 1),
+}

-**When not to use it:**
- *(TODO)*
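+def play(strat_a: Callable, strat_b: Callable, rounds: int = 200, noise: float = 0.0):
+    """Play an iterated PD and return (score_a, score_b).
+
+    Each strategy is called as strategy(own_history, opponent_history) and returns 'C' or 'D'.
+    `noise` is the per-move probability that the intended move is flipped (an execution error).
+    """
+    history_a, history_b = [], []
+    score_a, score_b = 0, 0
+    for _ in range(rounds):
+        move_a = strat_a(history_a, history_b)
+        move_b = strat_b(history_b, history_a)
+        if np.random.random() < noise:
+            move_a = 'D' if move_a == 'C' else 'C'
+        if np.random.random() < noise:
+            move_b = 'D' if move_b == 'C' else 'C'
+        pa, pb = PAYOFF[(move_a, move_b)]
+        score_a += pa
+        score_b += pb
+        # Histories record the moves actually played, including flipped ones.
+        history_a.append(move_a)
+        history_b.append(move_b)
+    return score_a, score_b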
+```

-## 🧪 Validation Status
+### TFT + variants
+```python
+def tit_for_tat(my_hist, opp_hist):
+    return 'C' if not opp_hist else opp_hist[-1]

- **Information status:** draft
- **Source trust level:** A
- **Review reason:** *(P-Reinforce Phase 1 automatic normalization. Body needs verification.)*
+def tit_for_two_tats(my_hist, opp_hist):
+    # Retaliate only after two consecutive defections.
+    if len(opp_hist) < 2:
+        return 'C'
+    return 'D' if opp_hist[-1] == 'D' and opp_hist[-2] == 'D' else 'C'

-## 🧬 Duplicate Check
+def generous_tft(my_hist, opp_hist):
+    if not opp_hist:
+        return 'C'
+    # Forgive an opponent's defection 10% of the time; otherwise mirror it like TFT.
+    if opp_hist[-1] == 'D' and np.random.random() < 0.1:
+        return 'C'
+    return opp_hist[-1]

- **Existing similar documents:** *(TODO: see the indexer cluster report)*
- **Handling:** UPDATE (automatic normalization)
- **Reason:** Phase 1 normalization: filled in the old template and missing fields.
+def pavlov(my_hist, opp_hist):
+    if not my_hist:
+        return 'C'
+    # Win-stay, lose-shift: a payoff of R (3) or T (5) counts as a win.
+    last_payoff = PAYOFF[(my_hist[-1], opp_hist[-1])][0]
+    if last_payoff >= 3:
+        return my_hist[-1]
+    return 'D' if my_hist[-1] == 'C' else 'C'
+```
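
Combining the simulator and strategies gives a small Axelrod-style round-robin. This self-contained sketch uses noise-free play, adds two hypothetical baselines (`always_cooperate`, `always_defect`), and pairs every two distinct entries once; it is an illustration, not a reproduction of Axelrod's tournaments:

```python
from itertools import combinations

PAYOFF = {
    ('C', 'C'): (3, 3),
    ('C', 'D'): (0, 5),
    ('D', 'C'): (5, 0),
    ('D', 'D'): (1, 1),
}


def play(strat_a, strat_b, rounds=200):
    """Noise-free iterated PD; returns (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def always_cooperate(my_hist, opp_hist):
    return 'C'


def always_defect(my_hist, opp_hist):
    return 'D'


def tit_for_tat(my_hist, opp_hist):
    return 'C' if not opp_hist else opp_hist[-1]


def pavlov(my_hist, opp_hist):
    if not my_hist:
        return 'C'
    won = PAYOFF[(my_hist[-1], opp_hist[-1])][0] >= 3  # R or T counts as a win
    return my_hist[-1] if won else ('D' if my_hist[-1] == 'C' else 'C')


def round_robin(strategies, rounds=200):
    """Every pair of distinct strategies plays one match; returns total score per name."""
    totals = {name: 0 for name in strategies}
    for (name_a, a), (name_b, b) in combinations(strategies.items(), 2):
        sa, sb = play(a, b, rounds)
        totals[name_a] += sa
        totals[name_b] += sb
    return totals


entrants = {
    'ALLC': always_cooperate,
    'ALLD': always_defect,
    'TFT': tit_for_tat,
    'Pavlov': pavlov,
}
print(play(tit_for_tat, always_defect))  # (199, 204)
print(round_robin(entrants))  # {'ALLC': 1200, 'ALLD': 1804, 'TFT': 1399, 'Pavlov': 1300}
```

In this four-entry field ALLD comes out ahead because it exploits ALLC every round and Pavlov every other round. TFT never outscores its opponent in a single match (199 vs. 204 against ALLD); its tournament success came from collecting mutual-cooperation scores against many reciprocating entries, so results like this one depend heavily on the mix of entrants.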

-## ⚠️ Contradictions & Updates
+### Reputation system (multiplayer game)
+```python
+class Reputation:
+    def __init__(self):
+        self.scores = {}  # player_id → reputation float

- **Conflicts with past data:** none
- **Policy changes:** none
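+    def record_action(self, player: str, action: str, target: str):
+        # Any action other than 'cooperate' counts as a defection. A defection costs
+        # three times what a cooperation earns, so trust builds slowly and drops fast.
+        # `target` is accepted but not used here.
+        delta = +0.1 if action == 'cooperate' else -0.3
+        # Clamp reputation to [-1, 1].
+        self.scores[player] = max(-1, min(1, self.scores.get(player, 0) + delta))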

-## 🔗 Knowledge Links (Graph)
+    def is_trustworthy(self, player: str) -> bool:
+        return self.scores.get(player, 0) > 0.3
+```

- **Parent:** [[10_Wiki/Topics]]
- **Related:** *(TODO: at least 2)*
- **Opposite / Trade-off:** *(TODO)*
- **Raw Source:** manual entry
+### Costly signal mechanic
+```python
+# A player demonstrates commitment by locking a stake in escrow.
+class CostlySignal:
+    def __init__(self):
+        self.escrows = {}

-## 🕓 Changelog
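+    def signal_commitment(self, player: str, amount: int):
+        # The player locks `amount` in escrow; it is forfeited on betrayal.
+        # Accumulate so a second signal does not silently erase the first stake.
+        self.escrows[player] = self.escrows.get(player, 0) + amount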

-| Date | Change | Handling | Confidence |
-|------|-----------|-----------|--------|
-| 2026-05-08 | P-Reinforce Phase 1 normalization (frontmatter + header standardization) | UPDATE | A |
+    def reward_or_punish(self, player: str, betrayed: bool):
+        amt = self.escrows.pop(player, 0)
+        if betrayed:
+            return 0  # escrow forfeited
+        return amt * 1.5  # stake returned with a 50% bonus
+```
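
Whether the escrow actually deters betrayal depends on its size. In `reward_or_punish` above, keeping the commitment returns 1.5 times the stake and betraying forfeits it, so a payoff-maximizing player betrays only when the extra gain from betrayal exceeds 1.5 times the stake. A minimal sketch of that break-even check, assuming the betrayal gain is a single known number and ignoring reputation effects; `min_deterrent_escrow` and `betrayal_pays` are hypothetical helpers:

```python
def min_deterrent_escrow(betrayal_gain: float, bonus_rate: float = 0.5) -> float:
    """Escrow at which a selfish player is indifferent between keeping and betraying.

    Keeping the commitment returns stake * (1 + bonus_rate); betraying returns
    nothing from escrow plus `betrayal_gain` from the defection itself.
    """
    if bonus_rate < 0:
        raise ValueError("bonus_rate must be non-negative")
    return betrayal_gain / (1 + bonus_rate)


def betrayal_pays(stake: float, betrayal_gain: float, bonus_rate: float = 0.5) -> bool:
    """True if defecting strictly beats honoring the commitment."""
    return betrayal_gain > stake * (1 + bonus_rate)


print(min_deterrent_escrow(300))  # 200.0
print(betrayal_pays(150, 300))    # True: 300 > 225
print(betrayal_pays(200, 300))    # False: 300 is not more than 300
```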
+
+### Endgame anti-defection (finite-game prevention)
+```python
+# Hide the final round from players so they cannot reason backward from it.
+class HiddenEndgame:
+    def __init__(self, expected_rounds: int, jitter: int):
+        # randint's upper bound is exclusive, so the offset is drawn from [-jitter, jitter].
+        self.actual_rounds = expected_rounds + np.random.randint(-jitter, jitter+1)
+
+    def is_final(self, current_round: int) -> bool:
+        return current_round >= self.actual_rounds
+    # Never reveal actual_rounds to players.
+```
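
A fixed jitter still leaves a known maximum length (`expected_rounds + jitter`), so in principle backward induction can start from that cap. A common alternative in game theory is to end the match with a fixed probability after each round, so that no round is ever known to be the last; Axelrod calls the continuation weight w the "shadow of the future". The sketch below assumes that model; `ProbabilisticEndgame` and `tft_is_stable` are hypothetical names. The threshold in `tft_is_stable` is Axelrod's condition for TIT FOR TAT to be collectively stable: w >= max((T - R)/(T - P), (T - R)/(R - S)), which is 2/3 for the payoffs used in this document.

```python
import random
from typing import Optional


class ProbabilisticEndgame:
    """After every round the match continues with probability w (hidden from players)."""

    def __init__(self, w: float, rng: Optional[random.Random] = None):
        if not 0 < w < 1:
            raise ValueError("w must be strictly between 0 and 1")
        self.w = w
        self.rng = rng if rng is not None else random.Random()

    def continues(self) -> bool:
        # Call once after each completed round.
        return self.rng.random() < self.w

    def expected_rounds(self) -> float:
        # Match length is geometric: 1 + w + w**2 + ... = 1 / (1 - w).
        return 1 / (1 - self.w)


def tft_is_stable(w: float, T: float = 5, R: float = 3, P: float = 1, S: float = 0) -> bool:
    """Axelrod's condition for TIT FOR TAT to be collectively stable."""
    return w >= max((T - R) / (T - P), (T - R) / (R - S))


end = ProbabilisticEndgame(w=0.75, rng=random.Random(0))
print(end.expected_rounds())  # 4.0

rounds_played = 1
while end.continues():
    rounds_played += 1
print("rounds played:", rounds_played)

print(tft_is_stable(0.5))  # False: below the 2/3 threshold
print(tft_is_stable(0.9))  # True
```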
+
+## Decision Criteria
+| Situation | Approach |
+|---|---|
+| Social deduction (Werewolf-style) | Information asymmetry + PD |
+| Persistent MMO | Reputation + costly signal |
+| Co-op survival (Don't Starve Together) | Mutual benefit dominates; PD tension is weak |
+| Competitive 1v1 | Pure PD only at meta level |
+| Multi-agent RL | TFT-family baseline |
+
+**Default**: iterated PD with reputation plus a hidden endgame. In a one-shot game, always defecting is dominant.
+
+## 🔗 Graph
+- Parent: [[Game-Theory]] · [[Multiplayer-Design]]
+- Variants: [[Iterated-PD]] · [[Public-Goods-Game]] · [[Stag-Hunt]]
+- Applications: [[The-Resistance-Design]] · [[EVE-Online-Trust]]
+- Adjacent: [[Axelrod-Tournaments]] · [[Multi-Agent-RL]]
+
+## 🤖 LLM Usage
+**When**: simulate two LLM instances as IPD opponents and analyze the strategies that emerge.
+**When not**: deep human social dynamics; emotion and context are not captured by LLM-PD simulations.
+
+## ❌ Anti-patterns
+- **Pure cooperation reward without a defection option**: not a PD, just co-op.
+- **No reputation persistence**: after a betrayal, anonymity lets the defector escape consequences, and cooperation collapses.
+- **Known finite endgame**: backward induction leads to defection in every round.
+- **No noise tolerance**: a single mistake sets off a long retaliation cycle (the TFT vs. TFT trap).
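
The TFT vs. TFT trap is easy to reproduce deterministically. This sketch injects one flipped move at a chosen round (`error_round`, a hypothetical parameter) instead of random noise so the outcome is repeatable, and compares TFT against itself with Tit-for-Two-Tats against itself:

```python
def tit_for_tat(my_hist, opp_hist):
    return 'C' if not opp_hist else opp_hist[-1]


def tit_for_two_tats(my_hist, opp_hist):
    if len(opp_hist) < 2:
        return 'C'
    return 'D' if opp_hist[-1] == opp_hist[-2] == 'D' else 'C'


def play_with_one_error(strategy, rounds=8, error_round=2):
    """Both players use `strategy`; player A's move in `error_round` is flipped once."""
    hist_a, hist_b = [], []
    for r in range(rounds):
        move_a = strategy(hist_a, hist_b)
        move_b = strategy(hist_b, hist_a)
        if r == error_round:
            move_a = 'D' if move_a == 'C' else 'C'  # the single mistake
        hist_a.append(move_a)
        hist_b.append(move_b)
    return ''.join(hist_a), ''.join(hist_b)


print(play_with_one_error(tit_for_tat))       # ('CCDCDCDC', 'CCCDCDCD')
print(play_with_one_error(tit_for_two_tats))  # ('CCDCCCCC', 'CCCCCCCC')
```

Under TFT the single error becomes an alternating C/D echo that lasts until another error either restores cooperation or locks both players into mutual defection. Tit-for-Two-Tats absorbs the error, at the cost of being exploitable by an opponent that defects every other round.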
+
+## 🧪 Verification / Duplicates
+- Verified against Axelrod, "The Evolution of Cooperation" (1984); Nicky Case, "The Evolution of Trust" (2017); and multi-agent RL papers (DeepMind, "Sequential Social Dilemmas", 2017).
+- Trust level: A.
+
+## 🕓 Changelog
+| Date | Change |
+|---|---|
+| 2026-05-08 | Phase 1 |
+| 2026-05-10 | Manual cleanup — PD payoff matrix, TFT variants, reputation / costly signal / hidden endgame patterns |