Antigravity Agent f8b21af4be Wiki cleanup: error-doc removal, dedup merge, link normalization
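Large-scale cleanup of 10_Wiki/Topics:
- Removed 227 error-capture and incomplete stub documents
- Merged 43 cross-folder duplicate clusters (63 files → redirect)
- Link-name normalization: fixed broken links, pointed links directly at redirect targets, mapped concepts (~2,400 changes)
- Created 6 new category MOCs
- Removed 10,058 unresolved related-keyword links from Graph sections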

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 23:52:15 +09:00


id: wiki-2026-0508-introspection-자기성찰
title: Introspection (자기성찰)
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: Self-Reflection, 자기성찰, LLM Introspection
duplicate_of: none
source_trust_level: A
confidence_score: 0.9
verification_status: applied
tags: llm, prompting, self-reflection, philosophy-of-mind, agent
raw_sources: (empty)
last_reinforced: 2026-05-10
github_commit: pending
tech_stack: language: python, framework: langchain

Introspection (자기성찰)

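In One Line

"The model re-reads its own output, then evaluates and revises it." In philosophy, introspection means first-person access to one's own conscious states. In LLMs, it is a core prompting pattern that runs a self-critique → revise loop to improve accuracy and consistency.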

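Key Points

Two Meanings

  • Philosophy (Locke, Kant): first-person access by which the mind observes itself. Related concepts: qualia, self-model.
  • LLMs: the model takes its own response back as input and critiques or improves it, as in the Reflexion, Self-Refine, and CoVe family of methods.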

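How It Works

  1. Generate: produce a draft response.
  2. Critique: the same model evaluates the draft (errors, omissions, assumptions).
  3. Revise: regenerate the response, incorporating the critique.
  4. (Optional) Repeat until convergence.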
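The four steps above can be sketched as a plain-Python skeleton. This is a minimal sketch and is not the method of any specific paper. The callables generate, critique, and revise are hypothetical stand-ins for model prompts. "Convergence" is approximated here as the revision no longer changing the draft, and max_rounds acts as a hard cap.

```python
from typing import Callable

# Hypothetical helper for illustration; not taken from a library or paper.
def introspect(
    task: str,
    generate: Callable[[str], str],            # step 1: task -> draft
    critique: Callable[[str, str], str],       # step 2: (task, draft) -> feedback
    revise: Callable[[str, str, str], str],    # step 3: (task, draft, feedback) -> new draft
    max_rounds: int = 3,                       # hard cap on step 4 (repetition)
) -> str:
    """Generate -> Critique -> Revise, repeated until the draft stops changing."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        revised = revise(task, draft, feedback)
        if revised == draft:  # fixed point: treat an unchanged draft as converged
            break
        draft = revised
    return draft
```

An unchanged draft is only one possible stopping rule. The Self-Refine pattern below instead stops when the critique reports no issues.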

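Applications

  1. Improving reasoning accuracy (math, code).
  2. Hallucination detection (CoVe: Chain-of-Verification).
  3. Agent settings: the agent self-evaluates tool-call results and retries when they fall short.
  4. An alternative to human-labeled RLHF reward modeling (Constitutional AI).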

💻 Patterns

Self-Refine (single-model loop)

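def self_refine(prompt, model, max_iters=3):
    # Assumes model.invoke(str) returns a str (e.g. a LangChain LLM,
    # or a chat model piped into StrOutputParser).
    answer = model.invoke(f"Answer the question:\n{prompt}")  # initial draft
    for _ in range(max_iters):
        # Critique: include the question so the critic can judge relevance,
        # and ask for an explicit sentinel when nothing needs fixing.
        feedback = model.invoke(
            f"Question: {prompt}\nAnswer: {answer}\n"
            "Critique this answer (errors, gaps). Reply 'no issues' if none."
        )
        if "no issues" in feedback.lower():  # brittle string check on the sentinel
            break
        # Revise: regenerate with the critique in context.
        answer = model.invoke(
            f"Original: {prompt}\nDraft: {answer}\nFeedback: {feedback}\nRevised:"
        )
    return answer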

Reflexion (verbal RL)

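class ReflexionAgent:
    # llm: any callable str -> str.
    # evaluate: task-specific success check (e.g. running unit tests), supplied by the caller.
    def __init__(self, llm, evaluate):
        self.llm = llm
        self.evaluate = evaluate
        self.memory = []  # past reflections, prepended to later attempts

    def act(self, task, max_trials=3):
        result = None
        for _ in range(max_trials):
            ctx = "\n".join(self.memory)
            result = self.llm(f"Lessons from past attempts:\n{ctx}\nTask: {task}")
            if self.evaluate(result):
                break
            reflection = self.llm(
                f"Task: {task}\nFailed attempt:\n{result}\nWhy did this fail? What should change?")
            self.memory.append(reflection)
        return result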

Chain of Verification (CoVe)

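def cove(question, llm):
    # llm: any callable str -> str. Steps: draft -> plan -> verify -> final.
    draft = llm(f"Q: {question}\nA:")
    plan = llm(f"List verification questions, one per line, for:\n{draft}")
    qs = [q.strip() for q in plan.splitlines() if q.strip()]  # drop blank lines
    # Answer each check on its own (draft not in context) so its errors are not copied.
    checks = "\n".join(f"{q} -> {llm(q)}" for q in qs)
    return llm(f"Question: {question}\nDraft: {draft}\nVerifications:\n{checks}\n"
               "Final answer, corrected where the verifications disagree:")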

Constitutional AI (self-critique with principles)

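PRINCIPLES = ["harmless", "honest", "helpful"]  # simplified; CAI uses natural-language principles

def constitutional_revise(response, llm):
    # llm: any callable str -> str. One critique -> revise pass per principle.
    for p in PRINCIPLES:
        critique = llm(f"Critique this response by the principle '{p}':\n{response}")
        # The revise prompt carries the response itself, not just the critique.
        response = llm(
            f"Response:\n{response}\nCritique:\n{critique}\n"
            "Rewrite the response to address the critique:"
        )
    return response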

Confidence-gated introspection

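import re

def maybe_introspect(question, answer, model, threshold=0.7):
    # model.invoke must return a str, as in self_refine above.
    reply = model.invoke(f"Q: {question}\nA: {answer}\nConfidence (0-1) that A is correct:")
    m = re.search(r"\d*\.?\d+", reply)
    score = float(m.group()) if m else 0.0  # unparsable reply -> treat as low confidence
    # Refine from the original question, not from the answer text.
    return self_refine(question, model) if score < threshold else answer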

LangChain self-critique chain

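# Legacy API (LLMChain is deprecated in newer LangChain); assumes `llm` already exists.
from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import PromptTemplate

draft_prompt = PromptTemplate.from_template("Answer the question:\n{q}")
critic_prompt = PromptTemplate.from_template(
    "Question: {q}\nDraft: {draft}\nList errors and gaps in the draft:")
final_prompt = PromptTemplate.from_template(
    "Question: {q}\nDraft: {draft}\nCritique: {critique}\nRevised answer:")

draft = LLMChain(llm=llm, prompt=draft_prompt, output_key="draft")
critic = LLMChain(llm=llm, prompt=critic_prompt, output_key="critique")
final = LLMChain(llm=llm, prompt=final_prompt, output_key="final")
chain = SequentialChain(chains=[draft, critic, final],
                        input_variables=["q"],
                        output_variables=["final"])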

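Decision Criteria

| Situation | Approach |
| --- | --- |
| Simple factual Q&A | No introspection needed (it only adds latency) |
| Math / code | Self-Refine + verifier |
| Long-form fact checking | CoVe |
| Learning from agent failures | Reflexion (accumulating memory) |
| Safety alignment | Constitutional AI |

Default: one-shot Self-Refine (a single critique → revise pass). Additional iterations hit diminishing returns quickly relative to their cost.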
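A rough call count makes the cost side of this default concrete. Assume each round is one critique call plus one revise call, as in the Self-Refine pattern. A run capped at k rounds then makes at most the number of sequential model calls shown below. If the critic approves in round j, the loop stops right after that critique.

```latex
\begin{aligned}
N_{\max}(k) &= 1 + 2k && \text{(1 draft + $k$ critique/revise pairs)}\\
N_{\text{stop}}(j) &= 1 + 2(j-1) + 1 = 2j && \text{(critic approves in round $j \le k$)}\\
N_{\max}(1) &= 3, \qquad N_{\max}(3) = 7
\end{aligned}
```

Under the further assumption that each call has a similar latency, the one-shot default costs about three times a single pass. Each extra round adds two more sequential calls.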

🔗 Graph

🤖 LLM Usage

When to use: multi-step reasoning, long-form output that needs fact checking, agent failure analysis, alignment. When not to use: short-answer classification, latency-sensitive paths, cost-bound batch inference.

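Anti-patterns

  • Critiquing only with the same model: invites confirmation bias, because the critic shares the generator's blind spots. Where possible, use a separate, stronger judge.
  • Unbounded loops: iterating without a convergence condition or iteration cap inflates cost and lets the answer drift.
  • Ignored critique: the revise-step prompt omits the critique, so the revision cannot act on it.
  • Equating introspection with self-awareness: LLM introspection is a functional pattern. Keep it distinct from self-awareness in the philosophical sense.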
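The first three anti-patterns can each be guarded against in code. This is a minimal sketch built on these assumptions:
- generator and judge are hypothetical str → str callables, ideally different models with the judge being the stronger one.
- The judge is asked to reply with a stop token when it finds nothing to fix.
- A hard call budget bounds the loop.

The function name guarded_refine and its parameters are illustrative only.

```python
from typing import Callable, Tuple

# Hypothetical helper for illustration; not a library API.
def guarded_refine(
    question: str,
    generator: Callable[[str], str],   # model that drafts and revises
    judge: Callable[[str], str],       # separate (ideally stronger) critic model
    max_calls: int = 7,                # hard budget across both models
    stop_token: str = "NO ISSUES",
) -> Tuple[str, int]:
    """Return (answer, calls_used)."""
    answer = generator(f"Question: {question}\nAnswer:")
    calls = 1
    # A critique/revise round costs 2 calls; never start one that would exceed the budget.
    while calls + 2 <= max_calls:
        critique = judge(
            f"Question: {question}\nAnswer: {answer}\n"
            f"List concrete errors or gaps. Reply '{stop_token}' if there are none."
        )
        calls += 1
        if stop_token.upper() in critique.upper():
            break  # converged: the judge found nothing to fix
        # Embed the critique verbatim so the revision step cannot ignore it.
        answer = generator(
            f"Question: {question}\nDraft: {answer}\nCritique: {critique}\n"
            "Rewrite the draft so every point in the critique is addressed:"
        )
        calls += 1
    return answer, calls
```

The substring check on the stop token is deliberately simple. A judge that writes "no issues except X" would end the loop early, so a structured reply format is safer in practice.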

🧪 Verification / Duplicates

  • Madaan et al. 2023 (Self-Refine), Shinn et al. 2023 (Reflexion), Dhuliawala et al. 2023 (CoVe).
  • Bai et al. 2022 (Constitutional AI, Anthropic).
  • Trust level: A.

🕓 Changelog

| Date | Change |
| --- | --- |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: merged the philosophical and LLM senses; organized the Self-Refine/Reflexion/CoVe/Constitutional patterns |