2nd/10_Wiki/Topics/AI_and_ML/Theory-of-Mind (ToM) in AI.md
Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
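Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture / unfinished stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Normalized link names: fixed broken links, pointed links directly at redirect targets, mapped concepts (~2,400 cases)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections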

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00

id: wiki-2026-0508-theory-of-mind-tom-in-ai
title: Theory of Mind (ToM) in AI
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: ToM, Theory of Mind, Mental State Reasoning
duplicate_of: none
source_trust_level: A
confidence_score: 0.85
verification_status: applied
tags: llm, theory-of-mind, cognition, evaluation
raw_sources:
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language Python, framework LLM eval harness

Theory of Mind (ToM) in AI

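One-line summary

"Modeling other agents' mental states." ToM is the ability to attribute beliefs, desires, and intentions to others. The concept comes from developmental psychology (the Sally-Anne test, passed by children at around age 4). Whether LLMs have ToM was hotly debated from 2023 to 2026: GPT-4 and Claude pass false-belief tasks, but it remains unclear whether that reflects robust reasoning or surface pattern matching.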

Key points

Classic tasks

  • Sally-Anne (false belief): Sally puts a ball in the basket and leaves; Anne moves it to the box. "Where will Sally look?" → the basket (her belief).
  • Smarties / unexpected contents: a box is labeled "Smarties" but contains pencils.
  • Higher-order: "John thinks that Mary thinks that ..." (recursive).
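
The false-belief logic behind these tasks can be sketched as a tiny simulation: an agent's belief about an object's location updates only on events that agent witnesses. This is an illustrative sketch, not part of any benchmark; `Event`, `believed_location`, and `true_location` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An object is placed in or moved to `location`, seen by `witnesses`."""
    obj: str
    location: str
    witnesses: frozenset


def believed_location(events, agent, obj):
    """Where `agent` believes `obj` is: the last move they witnessed."""
    belief = None
    for event in events:
        if event.obj == obj and agent in event.witnesses:
            belief = event.location
    return belief


def true_location(events, obj):
    """The actual location of `obj` after all events."""
    locations = [e.location for e in events if e.obj == obj]
    return locations[-1] if locations else None


# Sally-Anne: Sally sees the ball go into the basket, then leaves the room.
sally_anne = [
    Event("ball", "basket", frozenset({"Sally", "Anne"})),
    Event("ball", "box", frozenset({"Anne"})),  # Sally is absent
]

print(believed_location(sally_anne, "Sally", "ball"))  # basket
print(believed_location(sally_anne, "Anne", "ball"))   # box
print(true_location(sally_anne, "ball"))               # box
```

The gap between `believed_location` and `true_location` is what a false-belief question probes.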

Modern evals (2023-2026)

  • BigToM (Gandhi 2024): systematic coverage of the belief, desire, and percept axes.
  • FANToM (Kim 2023): information that some participants missed in multi-party conversations.
  • ToMi: procedurally generated false-belief stories.
  • EToM / SimpleToM (2024): GPT-4 scores 90%+, while Claude 4.x and o3 reach about 99%, close to the ceiling.

Debate

  • Emergence of "true ToM": Kosinski (2023) reported GPT-3.5 at ~70%. Critics (Ullman 2023) showed that small perturbations make the models fail.
  • Pattern matching vs. reasoning: accuracy drops under trivial rewording (for example, swapping basket and box), which suggests robust ToM is limited.
  • Agentic implication: ToM is essential when an LLM agent infers user intent or collaborates with other agents.

Applications

  1. Multi-agent collab (CAMEL, AutoGen team).
  2. Tutoring (modeling student misconceptions).
  3. Persuasion / negotiation simulation.

💻 Patterns

Simple Sally-Anne eval

import anthropic
client = anthropic.Anthropic()

scenario = """Sally puts her ball in the basket. Sally leaves the room.
Anne moves the ball from the basket to the box. Sally returns.
Where will Sally look for her ball?"""

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=200,
    messages=[{"role": "user", "content": scenario}],
)
print(resp.content[0].text)
# Expected: "the basket" (Sally's belief, not the ball's real location)

Perturbation eval

# Robustness check: rename containers, add irrelevant info, rename characters
variants = [
    scenario.replace("basket", "drawer").replace("box", "cupboard"),
    scenario + " The room temperature is 22°C.",
    scenario.replace("Sally", "Bob").replace("Anne", "Alice"),
]
# Robust if every variant gets the original container:
# "drawer" for the first variant, "basket" for the other two.
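
The perturbation idea can be turned into a score by pairing each variant with its own expected answer. The sketch below assumes the model is wrapped in a plain `ask(prompt) -> str` callable; `robustness` and `stub_model` are hypothetical names, and the stub stands in for a model that has memorized "basket".

```python
from typing import Callable

BASE = (
    "Sally puts her ball in the basket. Sally leaves the room. "
    "Anne moves the ball from the basket to the box. Sally returns. "
    "Where will Sally look for her ball?"
)

# Each case pairs a prompt with the container the agent should look in.
CASES = [
    (BASE, "basket"),
    (BASE.replace("basket", "drawer").replace("box", "cupboard"), "drawer"),
    (BASE + " The room temperature is 22°C.", "basket"),
    (BASE.replace("Sally", "Bob").replace("Anne", "Alice"), "basket"),
]


def robustness(ask: Callable[[str], str]) -> float:
    """Fraction of variants whose answer names the expected container."""
    hits = sum(expected in ask(prompt).lower() for prompt, expected in CASES)
    return hits / len(CASES)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in that always answers 'basket'."""
    return "Sally will look in the basket."


print(robustness(stub_model))  # 0.75: fails the renamed-container variant
```

A surface matcher can pass the canonical story and still fall short of 1.0 here, which is why a single test is not enough.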

BigToM-style structured prompt

BIG_TOM = """
Story: {story}
Belief: What does {agent} believe about {object}?
Desire: What does {agent} want?
Action: Given the above, what will {agent} do?
"""

Higher-order ToM (2nd order)

prompt = """
Mary saw John hide cookies in the cupboard.
Mary leaves. John moves cookies to the drawer.
Mary returns. John doesn't know Mary saw the original location.
Q: Where does John think Mary will look?
"""
# 매 2nd order: John's belief about Mary's belief.
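
Higher-order questions follow a recursive pattern, so they can be generated for any depth. A small sketch, with `nested_question` as a hypothetical helper; the order of the question equals the number of agents in the chain.

```python
def nested_question(agents, obj):
    """Build an n-th order belief question from an ordered list of agents.

    ["John", "Mary"] asks about John's belief about Mary's belief (2nd order).
    """
    if not agents:
        raise ValueError("need at least one agent")
    first, *rest = agents
    chain = "".join(f"{name} thinks " for name in rest)
    return f"Where does {first} think {chain}{obj} is?"


print(nested_question(["Mary"], "the cookie jar"))
# Where does Mary think the cookie jar is?
print(nested_question(["John", "Mary"], "the cookie jar"))
# Where does John think Mary thinks the cookie jar is?
```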

Multi-agent collab (with ToM prompt)

SYSTEM = """You are Agent A negotiating with Agent B.
Track: (1) what B has stated, (2) what B likely believes you know,
(3) what B's hidden goal might be.
Output JSON: {"my_action": ..., "B_belief_model": ..., "B_goal_estimate": ...}
"""

Eval scoring

def score_tom_response(answer: str, ground_truth: str) -> float:
    # Simple: substring match. Better: an LLM judge with a reasoning trace.
    return 1.0 if ground_truth.lower() in answer.lower() else 0.0
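
Substring matching gives credit to answers that mention both containers, or that contain the target inside a longer word. One stricter variant, sketched here with the hypothetical name `score_strict`, matches whole words and requires that the distractor location is absent.

```python
import re


def score_strict(answer: str, expected: str, distractor: str) -> float:
    """1.0 only if the answer names the expected location and not the distractor."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    names_expected = expected.lower() in words
    names_distractor = distractor.lower() in words
    return 1.0 if names_expected and not names_distractor else 0.0


print(score_strict("She will look in the basket.", "basket", "box"))             # 1.0
print(score_strict("Not the basket; the ball is in the box.", "basket", "box"))  # 0.0
print(score_strict("In the toolbox.", "box", "basket"))                          # 0.0
```

The trade-off is that a correct answer which also explains the move ("the basket, because she did not see it go into the box") scores 0.0, so this works best when the prompt asks for a one-word answer.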

Decision criteria

| Situation | Approach |
| --- | --- |
| Multi-agent system | Explicit ToM prompt (track other agents' beliefs) |
| Tutoring / coaching | ToM prompt (model student state) |
| Robust evaluation | Perturbation suite, not single test |
| Agent communication | Structured belief representation |
| Research claim | Always include perturbations (avoid Kosinski-style overclaims) |

Default: prefer explicit ToM scaffolding (prompt + structured state); it is more robust than trusting implicit emergent capability.
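
One way to read "structured state" is a per-agent record that is kept apart from ground truth and rendered into the prompt. The sketch below is an assumption about what such a record could look like; `BeliefState` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefState:
    """What one agent is assumed to know, kept separate from ground truth."""
    agent: str
    facts: dict = field(default_factory=dict)

    def observe(self, key: str, value: str) -> None:
        """Record something the agent actually witnessed."""
        self.facts[key] = value

    def render(self) -> str:
        """Format the beliefs as text that can be placed in a prompt."""
        lines = [f"- {k}: {v}" for k, v in sorted(self.facts.items())]
        return f"{self.agent} believes:\n" + "\n".join(lines)


ground_truth = {"ball": "box"}
sally = BeliefState("Sally")
sally.observe("ball", "basket")  # Sally saw only the first placement

print(sally.render())
```

Keeping `ground_truth` in a separate structure makes it harder to leak what the system knows into what the agent is supposed to believe.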

🔗 Graph

🤖 LLM usage

When to use: multi-agent system design, user-intent modeling, persuasion / tutoring apps, eval research. When not to use: single-turn factual QA, where ToM is unnecessary overhead.

Anti-patterns

  • Single test = capability claim: claiming "GPT has ToM" without perturbations is unreliable.
  • Implicit reliance: unless the prompt explicitly says "track beliefs", the LLM tends to skip it.
  • Confusing knowledge with belief: the LLM knows the ground truth, so model each agent's partial information explicitly.
  • Ignoring frame robustness: if correctness breaks when names or objects are swapped, the model is matching surface patterns.

🧪 Verification / duplicates

  • Verified (Kosinski 2023, Ullman 2023 critique, Gandhi 2024 BigToM, Kim 2023 FANToM, recent SOTA results 2025-2026).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: modern ToM evals + multi-agent applications |