---
id: wiki-2026-0508-theory-of-mind-tom-in-ai
title: Theory of Mind (ToM) in AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [ToM, Theory of Mind, Mental State Reasoning]
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: [llm, theory-of-mind, cognition, evaluation]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: LLM eval harness
---

# Theory of Mind (ToM) in AI

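## One-line summary
> **"Modeling other agents' mental states."** Theory of Mind (ToM) is the ability to attribute beliefs, desires, and intentions to others. The concept comes from developmental psychology (the Sally-Anne test, which children typically pass around age 4). Whether LLMs have ToM has been actively debated from 2023 to 2026: GPT-4 and Claude pass false-belief tasks, but it remains unclear whether this reflects robust reasoning or surface pattern matching.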

## Key points

### Classic tasks
- **Sally-Anne (false belief)**: Sally puts a ball in the basket and leaves; Anne moves it to the box. "Where will Sally look?" → the basket (her belief, which is now false).
- **Smarties / unexpected contents**: a box labeled "Smarties" actually contains pencils. "What will someone who has not looked inside think is in it?" → Smarties.
- **Higher-order**: "John thinks that Mary thinks that ..." (recursive beliefs about beliefs).
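
The false-belief logic behind these tasks can be made concrete with a small state tracker. The following is a minimal sketch, not taken from any benchmark: the class `BeliefWorld` and its methods are hypothetical names, and it assumes an agent updates its belief only when it is in the room during a move.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BeliefWorld:
    """Tracks true object locations plus each agent's possibly stale belief."""

    present: set[str] = field(default_factory=set)
    location: dict[str, str] = field(default_factory=dict)
    beliefs: dict[str, dict[str, str]] = field(default_factory=dict)

    def enter(self, agent: str) -> None:
        self.present.add(agent)
        self.beliefs.setdefault(agent, {})  # re-entering keeps old beliefs

    def leave(self, agent: str) -> None:
        self.present.discard(agent)

    def move(self, obj: str, place: str) -> None:
        # The world always changes; only agents in the room observe the change.
        self.location[obj] = place
        for agent in self.present:
            self.beliefs[agent][obj] = place

    def will_look(self, agent: str, obj: str) -> str:
        # An agent searches where it believes the object is, not where it is.
        return self.beliefs[agent][obj]


world = BeliefWorld()
world.enter("Sally")
world.enter("Anne")
world.move("ball", "basket")  # both agents see this
world.leave("Sally")
world.move("ball", "box")     # only Anne sees this
world.enter("Sally")

print(world.will_look("Sally", "ball"))  # basket
print(world.location["ball"])            # box
```

Keeping `location` (ground truth) separate from `beliefs` (per-agent view) is the point of the sketch: a false-belief question is answered from the second, never the first.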

### Modern evals (2023-2026)
- **BigToM** (Gandhi 2024): systematic coverage of the belief / desire / percept axes.
- **FANToM** (Kim 2023): missing information in multi-party conversations.
- **ToMi**: procedurally generated false-belief stories.
- **EToM / SimpleToM (2024)**: GPT-4 reportedly scores above 90%, and Claude 4.x / o3 around 99%, which is close to ceiling.
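
To illustrate what "procedurally generated" means in practice, here is a minimal sketch of a template-based false-belief generator. It is not ToMi's actual generator; the template, the word lists, and the function `make_item` are all assumptions made for illustration.

```python
from __future__ import annotations

import random

TEMPLATE = (
    "{a} puts the {obj} in the {c1}. {a} leaves the room. "
    "{b} moves the {obj} from the {c1} to the {c2}. {a} returns. "
    "Where will {a} look for the {obj}?"
)

# Hypothetical vocabularies; a real benchmark would use much larger ones.
NAMES = ["Sally", "Anne", "Bob", "Alice", "Mina", "Omar"]
OBJECTS = ["ball", "key", "book"]
CONTAINERS = ["basket", "box", "drawer", "cupboard"]


def make_item(rng: random.Random) -> dict[str, str]:
    """Build one story together with its ground-truth answer."""
    a, b = rng.sample(NAMES, 2)            # two distinct agents
    c1, c2 = rng.sample(CONTAINERS, 2)     # two distinct containers
    obj = rng.choice(OBJECTS)
    return {
        "story": TEMPLATE.format(a=a, b=b, obj=obj, c1=c1, c2=c2),
        "answer": c1,      # where the absent agent still believes the object is
        "distractor": c2,  # where the object really is
    }


rng = random.Random(0)  # fixed seed so the generated set is reproducible
items = [make_item(rng) for _ in range(3)]
for item in items:
    print(item["answer"], "|", item["story"])
```

Because names, objects, and containers are resampled per item, a model cannot score well by memorizing the canonical "basket" answer.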

### Debate
- **Emergence of "true ToM"**: Kosinski (2023) reported GPT-3.5 at roughly 70% on false-belief tasks. Critics (Ullman 2023) showed that small perturbations to the tasks make the models fail.
- **Pattern matching vs. reasoning**: accuracy drops under trivial rewording (e.g., swapping basket and box), which suggests robust ToM is limited.
- **Agentic implication**: ToM is essential when an LLM agent infers user intent or collaborates with other agents.

### Applications
1. Multi-agent collaboration (CAMEL, AutoGen teams).
2. Tutoring (modeling student misconceptions).
3. Persuasion / negotiation simulation.

## 💻 Patterns

### Simple Sally-Anne eval
```python
import anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

scenario = """Sally puts her ball in the basket. Sally leaves the room.
Anne moves the ball from the basket to the box. Sally returns.
Where will Sally look for her ball?"""

resp = client.messages.create(
    model="claude-opus-4-7",  # model ID as given in this note; use one your account can access
    max_tokens=200,
    messages=[{"role": "user", "content": scenario}],
)
print(resp.content[0].text)
# Expected: "the basket" (Sally's belief, not the ball's true location)
```

### Perturbation eval
```python
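# Robustness check: rename containers, add irrelevant info, rename agents.
# Reuses `scenario` from the Sally-Anne example above.
variants = [
    scenario.replace("basket", "drawer").replace("box", "cupboard"),
    scenario + " The room temperature is 22°C.",
    scenario.replace("Sally", "Bob").replace("Anne", "Alice"),
]
# Robust if every variant gets the belief-consistent answer:
# "drawer" for the first variant, "basket" for the other two.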
```
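
A perturbation suite needs an expected answer per variant, since swapping the containers also changes the correct label. The sketch below shows one way to score that, under stated assumptions: `robust_accuracy` and `surface_matcher` are hypothetical names, and `surface_matcher` is a stand-in for a model that has memorized the canonical answer, not real model output.

```python
from typing import Callable

BASE = (
    "Sally puts her ball in the basket. Sally leaves the room. "
    "Anne moves the ball from the basket to the box. Sally returns. "
    "Where will Sally look for her ball?"
)

# (story, expected answer). The container swap changes the expected answer.
CASES = [
    (BASE, "basket"),
    (BASE.replace("basket", "drawer").replace("box", "cupboard"), "drawer"),
    (BASE + " The room temperature is 22°C.", "basket"),
    (BASE.replace("Sally", "Bob").replace("Anne", "Alice"), "basket"),
]


def robust_accuracy(ask: Callable[[str], str]) -> float:
    """Fraction of variants where the reply names the expected container."""
    hits = sum(expected in ask(story).lower() for story, expected in CASES)
    return hits / len(CASES)


def surface_matcher(story: str) -> str:
    # Hypothetical stand-in: always returns the memorized canonical answer.
    return "Sally will look in the basket."


print(robust_accuracy(surface_matcher))  # 0.75
```

The stand-in passes the original story and the two variants that keep "basket", but fails the container swap. In a real run, `ask` would wrap the API call from the Sally-Anne example.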

### BigToM-style structured prompt
```python
BIG_TOM = """
Story: {story}
Belief: What does {agent} believe about {object}?
Desire: What does {agent} want?
Action: Given the above, what will {agent} do?
"""
```
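
Filling the template is plain `str.format`. The story below is a made-up example for illustration, not an item from BigToM.

```python
BIG_TOM = """
Story: {story}
Belief: What does {agent} believe about {object}?
Desire: What does {agent} want?
Action: Given the above, what will {agent} do?
"""

# Hypothetical story: the agent's belief goes stale while she is away.
prompt = BIG_TOM.format(
    story=(
        "Noor fills a cup with oat milk and steps out. "
        "A coworker swaps it for almond milk."
    ),
    agent="Noor",
    object="the contents of the cup",
)
print(prompt)
```

Asking for belief, desire, and action as separate fields makes it possible to see which step fails when the final action prediction is wrong.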

### Higher-order ToM (2nd order)
```python
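prompt = """
John hides cookies in the cupboard while Mary watches.
Mary leaves the room. John moves the cookies to the drawer.
Through the window, Mary sees John move them. John does not notice her.
Mary returns.
Q: Where does John think Mary will look for the cookies?
"""
# 2nd order: John's belief about Mary's belief.
# Expected: "the cupboard" (John wrongly assumes Mary missed the move).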
```

### Multi-agent collab (with ToM prompt)
```python
SYSTEM = """You are Agent A negotiating with Agent B.
Track: (1) what B has stated, (2) what B likely believes you know,
(3) what B's hidden goal might be.
Output JSON: {"my_action": ..., "B_belief_model": ..., "B_goal_estimate": ...}
"""
```
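
Structured output is only useful if the belief model is actually present in each reply. A minimal validation sketch follows; `parse_agent_turn` is a hypothetical helper, and the sample reply is written by hand for illustration rather than taken from a model.

```python
import json

REQUIRED_KEYS = {"my_action", "B_belief_model", "B_goal_estimate"}


def parse_agent_turn(raw: str) -> dict:
    """Parse one agent reply and fail loudly if the belief model is missing."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


# Hand-written sample reply (not real model output).
reply = (
    '{"my_action": "offer 60/40 split", '
    '"B_belief_model": "thinks I have no other buyer", '
    '"B_goal_estimate": "close quickly"}'
)
turn = parse_agent_turn(reply)
print(turn["B_belief_model"])
```

Rejecting replies that omit the belief fields keeps the agent from silently dropping the ToM step over a long negotiation.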

### Eval scoring
```python
def score_tom_response(answer: str, ground_truth: str) -> float:
    # Simple: substring match. Better: an LLM judge that also checks the reasoning trace.
    return 1.0 if ground_truth.lower() in answer.lower() else 0.0
```
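
Plain substring matching gives credit to a reply that names both containers. One stricter variant, sketched here with the hypothetical name `score_strict`, requires the expected container as a whole word and rejects replies that also mention the distractor.

```python
import re


def score_strict(answer: str, expected: str, distractor: str) -> float:
    """Return 1.0 only if the expected container is named and the distractor is not."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    named_expected = expected.lower() in words
    named_distractor = distractor.lower() in words
    return 1.0 if named_expected and not named_distractor else 0.0


print(score_strict("She will look in the basket.", "basket", "box"))             # 1.0
print(score_strict("The ball is in the box, not the basket.", "basket", "box"))  # 0.0
```

This rule is deliberately conservative: a correct reply that explains where the object really is will also score 0.0, which is one reason an LLM judge is often the better choice for free-form answers.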

## Decision criteria
| Situation | Approach |
|---|---|
| Multi-agent system | Explicit ToM prompt (track other agents' beliefs) |
| Tutoring / coaching | ToM prompt (model student state) |
| Robust evaluation | Perturbation suite, not single test |
| Agent communication | Structured belief representation |
| Research claim | Always include perturbations (avoid Kosinski-style overclaim) |

**Default**: rely on explicit ToM scaffolding (prompt + structured state); it is more robust than trusting implicit emergent capability.

## 🔗 Graph
- Parents: [[LLM-Cognition]] · [[Cognitive-Science-AI]]
- Variants: [[False-Belief-Task]] · [[Higher-Order-ToM]]
- Applications: [[Multi-Agent-Systems]] · [[Tutoring-AI]] · [[Agentic-AI]]
- Adjacent: [[Common-Sense-Reasoning]] · [[Pragmatics]]

## 🤖 LLM usage
**When to use**: multi-agent system design, user-intent modeling, persuasion / tutoring apps, eval research.
**When not to use**: single-turn factual QA, where ToM scaffolding is unnecessary overhead.

## ❌ Anti-patterns
- **Single test = capability claim**: claiming "GPT has ToM" without perturbation tests is unreliable.
- **Implicit reliance**: if the prompt does not explicitly say to track beliefs, the LLM may skip it.
- **Confusing knowledge with belief**: the LLM knows the ground truth from the story, so each agent's partial information must be modeled explicitly.
- **Ignoring frame robustness**: if correctness changes when names or objects are swapped, the model is likely surface matching.

## 🧪 Verification / duplicates
- Verified against Kosinski 2023, the Ullman 2023 critique, Gandhi 2024 (BigToM), Kim 2023 (FANToM), and recent SOTA results (2025-2026).
- Trust level: A.

## 🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — modern ToM eval + multi-agent applications |