95cd8bb891
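- Code grounding: automatically injects the actual repo implementation location
(file:line) + commit into the 'Applied cases' section of technical-topic documents (e.g., document chunking strategy → connectai/src/retrieval/chunker.ts).
An idempotent marker (CODE-GROUNDING) lets a re-run update the block in place.
- MOC: generates a _MOC.md learning map in each of the 39 cluster folders (entry points + insight annotations).
Tools: Datacollect/scripts/{code_grounding,moc_generator}.mjs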
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| id | title | category | status | canonical_id | aliases | duplicate_of | source_trust_level | confidence_score | verification_status | tags | raw_sources | last_reinforced | github_commit | tech_stack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wiki-2026-0508-self-verification | Self-Verification | 10_Wiki/Topics | verified | self | | none | A | 0.9 | applied | | | 2026-05-10 | pending | |
Self-Verification
In one line
"An LLM re-checks its own answer: generate → verify → revise." The family includes CoVe (Chain-of-Verification, Dhuliawala 2023), self-consistency, self-refine, and Reflexion. As of 2026, reasoning models (Claude Opus 4.7 thinking, o3) have internalized self-verification, yet an explicit verify pass still adds accuracy on critical tasks.
Core
Forms
- Self-consistency (Wang 2022): sample N answers → majority vote.
- Chain of Verification (CoVe): baseline → plan verification Qs → answer Qs independently → final.
- Self-refine (Madaan 2023): generate → critique → revise loop.
- Reflexion (Shinn 2023): episodic memory of past mistakes.
- Constitutional / RLHF self-judge: the model evaluates its own output.
Where verification is effective
- Multi-hop reasoning (factual chains).
- Math / logic (intermediate step check).
- Code (compile, test, lint).
- Long-form factuality (claim-by-claim).
- Hallucination reduction.
Where verification is unreliable
- Systematic model bias: the verifier reproduces the same wrong answer.
- Highly creative / open-ended (no ground truth).
- Verifier model = generator → shared blind spots.
Applications
- Verifying critical-path steps in an agent loop.
- RAG answer claim verification (cite-check).
- Code review pre-PR.
- Math homework solver.
- Medical / legal high-stakes Q&A.
💻 Patterns
Self-consistency
```python
from collections import Counter

# Sample several answers at a non-zero temperature, then take the majority vote.
samples = [llm(prompt, temperature=0.8) for _ in range(7)]
answer = Counter(extract_answer(s) for s in samples).most_common(1)[0][0]
```
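The snippet above leaves `llm` and `extract_answer` undefined. A minimal self-contained sketch of the voting step follows, assuming a hypothetical `sample_fn` that returns one already-extracted answer per call. It also returns the agreement ratio, which can serve as a rough confidence signal; that use is an assumption of this sketch, not something the note prescribes.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def self_consistency(sample_fn: Callable[[], Hashable], n: int = 7) -> Tuple[Hashable, float]:
    """Draw n answers and return (majority answer, share of samples that agree).

    sample_fn is a hypothetical stand-in for "call the LLM at temperature > 0
    and extract the final answer"; it is not part of any real SDK.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    votes = Counter(sample_fn() for _ in range(n))
    # On a tie, most_common returns the answer that was seen first.
    answer, count = votes.most_common(1)[0]
    return answer, count / n


if __name__ == "__main__":
    # Deterministic stub in place of a real model: five samples, three say "42".
    canned = iter(["42", "41", "42", "42", "40"])
    print(self_consistency(lambda: next(canned), n=5))
```

A low agreement ratio suggests the samples disagree and the majority answer deserves a second look.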
CoVe (4 steps)
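```python
baseline = llm(f"Answer: {q}")
verify_qs = llm(f"List 5 verification Qs for: {baseline}")
# Answer each question on its own so the baseline cannot bias the check.
questions = [vq.strip() for vq in verify_qs.splitlines() if vq.strip()]
verify_as = [llm(f"Answer concisely: {vq}") for vq in questions]
checks = "\n".join(f"Q: {vq}\nA: {va}" for vq, va in zip(questions, verify_as))
final = llm(f"Given verification:\n{checks}\nRevise: {baseline}")
```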
Self-refine loop
```python
draft = llm(f"Solve: {task}")
for _ in range(3):  # cap the number of critique/revise rounds
    critique = llm(f"Critique:\n{draft}\nList concrete issues; 'NONE' if perfect.")
    if "NONE" in critique[:20]:
        break
    draft = llm(f"Revise based on critique:\n{critique}\n\nDraft:\n{draft}")
```
Verifier-as-different-model
```python
draft = anthropic_call("claude-opus-4-7", task)
verdict = openai_call("gpt-5", f"Find errors in:\n{draft}")
final = anthropic_call("claude-opus-4-7", f"Address:\n{verdict}\n\nDraft:\n{draft}")
```
Code self-test loop
```python
code = llm(f"Write Python for: {spec}")
for _ in range(3):  # cap the number of fix attempts
    res = run_tests(code, spec.tests)
    if res.passed:
        break
    code = llm(f"Tests failed:\n{res.report}\nFix:\n{code}")
```
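The loop above assumes a `run_tests` helper. One possible shape for it is sketched below, under the assumption that the tests are plain `assert` statements held in a string. It writes candidate and tests to a temporary file, runs them in a fresh interpreter with a timeout, and returns a pass flag plus the captured report. `RunResult`, the `tests` string argument, and `timeout_s` are illustrative names, not the repo's actual implementation.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunResult:
    passed: bool
    report: str  # captured stdout + stderr, fed back into the fix prompt


def run_tests(code: str, tests: str, timeout_s: float = 10.0) -> RunResult:
    """Run candidate code followed by assert-style tests in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate_test.py"
        script.write_text(code + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return RunResult(False, f"timed out after {timeout_s}s")
    # A non-zero exit code means an assertion or other exception was raised.
    return RunResult(proc.returncode == 0, proc.stdout + proc.stderr)
```

A subprocess is not a sandbox: model-written code still runs with the caller's permissions, so a container or similar isolation would be needed before using this on untrusted output.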
Extended thinking (Claude 2026)
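```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
msg = client.messages.create(
    model="claude-opus-4-7",
    # max_tokens must exceed budget_tokens, since thinking counts toward it
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": hard_problem}],
)
# Internal verification already happens within the thinking block
```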
RAG claim-by-claim verify
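```python
claims = extract_claims(answer)
for c in claims:
    evidence = retrieve(c)
    ok = llm(f"Is '{c}' supported by:\n{evidence}\nyes/no")
    # Check the leading token; a substring test would also match "not" or "know".
    if ok.strip().lower().startswith("no"):
        flag(c)
```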
Decision criteria
| Situation | Approach |
|---|---|
| Cheap, parallelizable | self-consistency |
| Factual long-form | CoVe |
| Iterative improvement | self-refine |
| Code / has tests | execution-grounded |
| Reasoning model available | thinking budget + light verify |
Default: thinking + light claim-verify (RAG case), or self-consistency (3-5 samples).
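The table can be read as a small routing function. A sketch follows, assuming the caller already knows a few boolean properties of the task. The `Task` fields and the precedence order (tests first, then reasoning model, and so on) are assumptions made for illustration, since the table itself does not rank its rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    has_tests: bool = False          # code with an executable test suite
    reasoning_model: bool = False    # a thinking-capable model is available
    long_form_factual: bool = False  # many independent factual claims
    iterative: bool = False          # quality improves with revision rounds


def pick_strategy(task: Task) -> str:
    """Map task properties to a verification approach from the decision table."""
    if task.has_tests:
        return "execution-grounded"
    if task.reasoning_model:
        return "thinking budget + light verify"
    if task.long_form_factual:
        return "CoVe"
    if task.iterative:
        return "self-refine"
    # Cheap and parallelizable fallback.
    return "self-consistency"
```

Putting `has_tests` first reflects the idea that an executable check is a stronger signal than any model-generated verdict; a different ordering may suit other workloads.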
🔗 Graph
- Variants: Chain-of-Thought · Self-Consistency · Reflexion
- Applications: RAG · Code-Generation
🤖 LLM usage
When to use: high-stakes accuracy where hallucinations are costly, and the extra verification budget matters more than latency.
When not to use: latency-critical paths (chat UI), or tasks with no verifiable ground truth (open-ended creative work).
❌ Anti-patterns
- Self-verify infinite loop: a max-iteration cap is mandatory.
- Same model verifying itself on a biased answer: blind spots are shared → use cross-model verification.
- Verifying trivial output: wasted cost; gate the verify step.
- Trusting the verify verdict blindly: the verifier can hallucinate too.
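Two of these guards, the iteration cap and the gate, can be combined in one wrapper. A minimal sketch follows, assuming hypothetical callables `generate`, `critique`, and `revise` that wrap LLM calls, and a caller-supplied `needs_verify` gate. None of these names come from a real library.

```python
from typing import Callable, Optional, Tuple


def bounded_refine(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str], Optional[str]],  # returns None when no issues are found
    revise: Callable[[str, str], str],
    needs_verify: Callable[[str], bool] = lambda draft: True,
    max_iters: int = 3,
) -> Tuple[str, int]:
    """Return (final draft, number of revision rounds actually run)."""
    draft = generate(task)
    if not needs_verify(draft):  # gating: skip verification for trivial output
        return draft, 0
    rounds = 0
    for _ in range(max_iters):   # hard cap: the loop cannot run forever
        issues = critique(draft)
        if issues is None:
            break
        draft = revise(draft, issues)
        rounds += 1
    return draft, rounds
```

Returning the round count lets the caller notice when the cap was hit, which is a hint that the critic and the reviser are not converging.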
🧪 Verification / Duplicates
- Verified (Wang 2022 Self-Consistency, Dhuliawala 2023 CoVe, Madaan 2023 Self-Refine).
- Trust level: A.
🕓 Changelog
| Date | Change |
|---|---|
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup — verification family + thinking 2026 |
🛠️ Applied cases (Applied in summary)
🔎 Codebase evidence (auto-extracted, E:\Wiki repo)
Actual implementation/usage locations:
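connectai/src/features/selfReflector/selfReflectorPrompt.ts:67: `## [Code Self-Verification — 코드 작성 시 추가 검증]` (heading text in the source file; the Korean part reads "additional verification when writing code")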
Auto-generated: code_grounding.mjs · refreshed on re-run