"매 modeling other agents' mental states". ToM 매 belief/desire/intention 의 attribute 하는 능력. 매 developmental psych origin (Sally-Anne test, age 4). LLM 의 ToM 매 2023-2026 hot debate — 매 GPT-4 / Claude 가 false-belief task 의 pass 했지만 매 robust reasoning 인지 매 surface pattern 매 unclear.
Key points
Classic tasks
Sally-Anne (false belief): Sally puts a ball in the basket and leaves; Anne moves it to the box. "Where will Sally look?" → the basket (her belief, not the true location).
Smarties / unexpected contents: a box labeled "Smarties" actually contains pencils; the question is what someone who has not looked inside will think it contains.
Higher-order: "John thinks that Mary thinks that ..." (recursive beliefs).
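The false-belief logic behind these tasks can be sketched as a tiny world model that keeps the object's true location separate from what each agent last observed. This is a minimal illustration, not part of any benchmark; the class `FalseBeliefWorld` and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FalseBeliefWorld:
    """Hypothetical helper: true object location vs. each agent's belief."""
    location: Optional[str] = None
    present: Set[str] = field(default_factory=set)
    beliefs: Dict[str, str] = field(default_factory=dict)

    def enter(self, agent: str) -> None:
        # Entering the room does not reveal a hidden object's location,
        # so any earlier belief is left unchanged.
        self.present.add(agent)

    def leave(self, agent: str) -> None:
        self.present.discard(agent)

    def put(self, actor: str, location: str) -> None:
        # Only agents in the room (plus the actor) observe the move.
        self.location = location
        for agent in self.present | {actor}:
            self.beliefs[agent] = location

    def will_look(self, agent: str) -> Optional[str]:
        # An agent searches where they believe the object is.
        return self.beliefs.get(agent)


world = FalseBeliefWorld()
world.enter("Sally")
world.enter("Anne")
world.put("Sally", "basket")
world.leave("Sally")
world.put("Anne", "box")   # Sally is absent and does not see this
world.enter("Sally")

print(world.will_look("Sally"))  # basket
print(world.location)            # box
```

Keeping `beliefs` apart from `location` is the whole point: a solver that reads only `location` answers "box" and fails the task.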
Modern evals (2024-2026)
BigToM (Gandhi 2024): systematic coverage of the belief / desire / percept axes.
FANToM (Kim 2023): information asymmetry in multi-party conversations (who missed which part of the exchange).
ToMi: procedurally generated false-belief stories.
EToM / SimpleToM (2024): GPT-4 scores 90%+, while Claude 4.x / o3 reach about 99%, which is close to ceiling.
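Procedural generation of false-belief items can be sketched as follows. This is an assumption-laden illustration in the spirit of ToMi, not the actual ToMi generator; the name pools and the function `make_false_belief_item` are made up for the example.

```python
import random

# Hypothetical pools; a real suite would use far larger ones.
NAMES = ["Sally", "Anne", "Bob", "Alice", "Maya", "Omar"]
OBJECTS = ["ball", "key", "apple"]
CONTAINERS = ["basket", "box", "drawer", "cupboard"]


def make_false_belief_item(rng: random.Random) -> dict:
    """Build one Sally-Anne style story with its gold and distractor answers."""
    owner, mover = rng.sample(NAMES, 2)      # two distinct agents
    obj = rng.choice(OBJECTS)
    start, end = rng.sample(CONTAINERS, 2)   # two distinct containers
    story = (
        f"{owner} puts the {obj} in the {start}. {owner} leaves the room. "
        f"{mover} moves the {obj} from the {start} to the {end}. "
        f"{owner} returns. Where will {owner} look for the {obj}?"
    )
    # Gold is the owner's (false) belief; the distractor is the true location.
    return {"story": story, "answer": start, "distractor": end}


rng = random.Random(0)  # fixed seed so the item set is reproducible
items = [make_false_belief_item(rng) for _ in range(5)]
for item in items:
    print(item["answer"], "|", item["story"])
```

Storing the distractor next to the gold answer makes it possible to detect replies that simply report the object's true location.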
Debate
"True ToM" emergence: Kosinski 2023 reported GPT-3.5 at ~70% on false-belief tasks. Critics (Ullman 2023) showed that small perturbations make the models fail.
Pattern matching vs. reasoning: accuracy drops under trivial rewording (e.g., swapping basket and box), which suggests robust ToM is limited.
Agentic implication: ToM is essential when an LLM agent infers user intent or collaborates with other agents.
Applications
Multi-agent collab (CAMEL, AutoGen team).
Tutoring (student misconception 의 model).
Persuasion / negotiation simulation.
💻 Patterns
Simple Sally-Anne eval
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

scenario = """Sally puts her ball in the basket. Sally leaves the room.
Anne moves the ball from the basket to the box. Sally returns.
Where will Sally look for her ball?"""

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=200,
    messages=[{"role": "user", "content": scenario}],
)
print(resp.content[0].text)
# Expected: "the basket" (Sally's belief, not the ball's true location)
Perturbation eval
# Robustness check: rename containers, add irrelevant info, rename agents
variants = [
    scenario.replace("basket", "drawer").replace("box", "cupboard"),
    scenario + " The room temperature is 22°C.",
    scenario.replace("Sally", "Bob").replace("Anne", "Alice"),
]
# Robust if every variant gets the equivalent answer
# (for the first variant the correct answer becomes "drawer").
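A perturbation check only works if each variant carries its own gold answer, since renaming the containers also renames the correct reply. The sketch below assumes a pluggable `ask` callable that wraps whatever model is being tested; `surface_matcher` is a hypothetical stand-in for a model that memorized the canonical wording, and `BASE` restates the scenario with a gender-neutral "the ball" so the agent rename stays grammatical.

```python
from typing import Callable

BASE = (
    "Sally puts the ball in the basket. Sally leaves the room. "
    "Anne moves the ball from the basket to the box. Sally returns. "
    "Where will Sally look for the ball?"
)

# (prompt, gold answer) pairs: the gold answer follows the renaming.
VARIANTS = [
    (BASE, "basket"),
    (BASE.replace("basket", "drawer").replace("box", "cupboard"), "drawer"),
    (BASE + " The room temperature is 22°C.", "basket"),
    (BASE.replace("Sally", "Bob").replace("Anne", "Alice"), "basket"),
]


def robustness(ask: Callable[[str], str]) -> float:
    """Fraction of variants whose reply contains that variant's gold answer."""
    hits = sum(gold in ask(prompt).lower() for prompt, gold in VARIANTS)
    return hits / len(VARIANTS)


def surface_matcher(prompt: str) -> str:
    # Hypothetical model that always repeats the canonical answer.
    return "Sally will look in the basket."


print(robustness(surface_matcher))  # 0.75
```

The stand-in scores 3 of 4 because it never adapts to the renamed containers, which is the failure mode a single canonical test would hide.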
BigToM-style structured prompt
BIG_TOM = """
Story: {story}
Belief: What does {agent} believe about {object}?
Desire: What does {agent} want?
Action: Given the above, what will {agent} do?
"""
Higher-order ToM (2nd order)
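prompt = """
Mary saw John hide cookies in the cupboard.
Mary leaves. John moves cookies to the drawer.
Mary returns. John doesn't know Mary saw the original location.
Q: Where does John think Mary will look?
"""
# 2nd order: John's belief about Mary's belief.
# Note: as written, John believes Mary never saw the cupboard, so the gold
# answer depends on what John assumes Mary knows; fix it explicitly before scoring.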
Multi-agent collab (with ToM prompt)
SYSTEM="""You are Agent A negotiating with Agent B.
Track: (1) what B has stated, (2) what B likely believes you know,
(3) what B's hidden goal might be.
Output JSON: {"my_action": ..., "B_belief_model": ..., "B_goal_estimate": ...}
"""
Eval scoring
def score_tom_response(answer: str, ground_truth: str) -> float:
    # Simple: case-insensitive substring match.
    # Better: an LLM judge that also inspects the reasoning trace.
    return 1.0 if ground_truth.lower() in answer.lower() else 0.0
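Substring matching has two easy failure cases: it accepts "basketball" for "basket", and it gives full credit to a reply that names both locations. A slightly stricter sketch is shown below, assuming single-word gold and distractor answers; the function name and the 0.5 "needs review" score are illustrative choices, not an established metric.

```python
import re


def score_with_distractor(answer: str, gold: str, distractor: str) -> float:
    """Word-level scoring that flags replies naming both locations.

    Returns 1.0 if only the gold location is named, 0.0 if the gold
    location is absent, and 0.5 if both appear (ambiguous, needs review).
    """
    words = set(re.findall(r"[a-z]+", answer.lower()))
    if gold.lower() not in words:
        return 0.0
    return 0.5 if distractor.lower() in words else 1.0


print(score_with_distractor("She will look in the basket.", "basket", "box"))  # 1.0
print(score_with_distractor("It is in the box.", "basket", "box"))             # 0.0
print(score_with_distractor("Either the basket or the box.", "basket", "box")) # 0.5
print(score_with_distractor("She plays basketball.", "basket", "box"))         # 0.0
```

The 0.5 bucket also catches correct explanations such as "the basket, not the box", so those cases are best passed on to a judge rather than counted as errors.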
Decision criteria
| Situation | Approach |
|---|---|
| Multi-agent system | Explicit ToM prompt (track other agents' beliefs) |
| Tutoring / coaching | ToM prompt (model student state) |
| Robust evaluation | Perturbation suite, not single test |
| Agent communication | Structured belief representation |
| Research claim | Always include perturbations (avoid Kosinski-style overclaim) |
Default: prefer explicit ToM scaffolding (prompt + structured state); it is more robust than trusting an implicit emergent capability.
When to use: multi-agent system design, user-intent modeling, persuasion / tutoring apps, eval research.
When not to use: single-turn factual QA, where ToM is unnecessary overhead.
❌ Anti-patterns
Single test = capability claim: claiming "GPT has ToM" without perturbations is unreliable.
Implicit reliance: unless the prompt explicitly says to "track beliefs", the LLM may skip it.
Confusing knowledge with belief: the LLM knows the ground truth, so each agent's partial information must be modeled explicitly.
Ignoring frame robustness: if swapping names or objects changes the answer, the model is surface matching.
🧪 Verification / duplicates
Verified (Kosinski 2023, Ullman 2023 critique, Gandhi 2024 BigToM, Kim FANToM 2023, recent SOTA 2025-2026).
Confidence: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: modern ToM eval + multi-agent applications |