---
id: wiki-2026-0508-hallucination-in-llms
title: Hallucination in LLMs
category: 10_Wiki/Topics
status: verified
canonical_id: self
aliases: [LLM hallucination, fabrication, confabulation, TruthfulQA, source attribution]
duplicate_of: none
source_trust_level: A
confidence_score: 0.97
verification_status: applied
tags: [llm, hallucination, truthfulness, rag, calibration, fact-check]
raw_sources: []
last_reinforced: 2026-05-10
github_commit: pending
tech_stack:
  language: Python
  framework: LangChain / Anthropic / RAG
---

# Hallucination in LLMs

## One-line summary

> **"An LLM generates false but plausible-sounding output."** A critical issue for modern LLMs. Causes: imperfect training data, distribution shift, and a next-token objective that yields confident text regardless of truth. Mitigations: RAG, source attribution, calibration, LLM-as-judge.

## Core concepts

### Types

- **Intrinsic**: contradicts the input.
- **Extrinsic**: cannot be verified from the input.
- **Factual**: states a world fact incorrectly.
- **Reasoning**: a fault in the reasoning chain.

### Causes

- Imperfect training data.
- Distribution shift (OOD).
- The next-token objective: the confidence of the generated text is unrelated to its factual accuracy.
- Prompt ambiguity.
- Long-tail rare facts.

### Mitigations

1. **RAG**: ground the answer in a source.
2. **Source attribution**: cite.
3. **Self-consistency**: do multiple samples agree?
4. **Calibration**: confidence ≈ accuracy.
5. **LLM-as-judge**: evaluate the output with a second model.
6. **Fact-checking**: verify against external sources.
7. **Constrained decoding**: enforce a schema.
8. **Fine-tune** for honesty.

### Applications

1. **Production chatbot**: critical.
2. **Medical / legal AI**.
3. **Search / Q&A**.
4. **Code generation**: hallucinated APIs.
5. **Summarization**: fabricated content.

## 💻 Patterns

### Detect (entailment-based)

```python
def hallucination_check(claim, source, llm):
    # `llm` is any client exposing a `generate(prompt)` method
    prompt = f"""Does the source ENTAIL the claim, CONTRADICT it, or NEITHER?

Source: {source}
Claim: {claim}

Output: ENTAIL | CONTRADICT | NEITHER"""
    return llm.generate(prompt)
```

### RAG (grounded gen)

```python
def rag_answer(question, retriever, llm):
    docs = retriever.retrieve(question, k=5)
    # Number each passage so the model can cite it as [N]
    context = '\n'.join(f'[{i}] {d.text}' for i, d in enumerate(docs))
    prompt = f"""Answer based ONLY on the context. Cite [N] for each claim.
If the context does not contain the answer, say "I don't know."

Context:
{context}

Question: {question}"""
    return llm.generate(prompt), docs
```

### Self-consistency

```python
from collections import Counter

def self_consistency(question, llm, n=10):
    # Sample n answers at non-zero temperature and return the majority answer.
    # Exact-match voting assumes short answers; extract the final answer
    # from free-form text before counting.
    answers = [llm.generate(question, temperature=0.7).strip() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

### Calibration check

```python
import numpy as np

def calibration_check(llm, test_questions, threshold=0.05):
    """Expected calibration error (ECE): is confidence ≈ accuracy?

    Compare the returned ECE against `threshold` to decide whether the
    model is acceptably calibrated.
    """
    binned_correct = {i: [] for i in range(10)}
    for q in test_questions:
        # `generate_with_confidence` is assumed to return (answer, confidence in [0, 1])
        response, conf = llm.generate_with_confidence(q['question'])
        bin_i = min(int(conf * 10), 9)  # conf == 1.0 belongs in the top bin
        binned_correct[bin_i].append(response == q['answer'])
    ece = 0.0
    for bin_i, results in binned_correct.items():
        if not results:
            continue
        bin_acc = np.mean(results)
        bin_conf = (bin_i + 0.5) / 10  # bin midpoint approximates mean confidence
        ece += abs(bin_acc - bin_conf) * len(results) / len(test_questions)
    return ece
```

### LLM-as-judge

```python
import json

def judge_truthfulness(claim, judge_llm):
    prompt = f"""Evaluate the claim for truthfulness.

Claim: "{claim}"

Output JSON:
- truthful: bool
- confidence: 0-1
- evidence: ...
- if false: corrected version"""
    # json.loads raises if the judge wraps the JSON in extra prose
    return json.loads(judge_llm.generate(prompt))
```

### TruthfulQA-style eval

```python
def truthful_eval(model, questions):
    correct = 0
    for q in questions:
        pred = model.generate(q['question'])
        # Multiple-choice or judge-based matching; substring match is a crude proxy
        if any(g.lower() in pred.lower() for g in q['gold_answers']):
            correct += 1
    return correct / len(questions)
```

### Token entropy (uncertainty signal)

```python
import torch

def token_entropies(model, tokenizer, prompt, response):
    # `tokenizer` is passed in explicitly (Hugging Face-style model and tokenizer assumed)
    inputs = tokenizer(prompt + response, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    probs = logits.softmax(-1)
    # Entry i is the entropy of the predicted distribution for token i + 1
    entropies = -(probs * probs.clamp(min=1e-10).log()).sum(-1)
    return entropies

# High entropy = uncertain → potential hallucination
```

### Constrained decoding (schema)

```python
# outlines 0.x-style API; newer releases may differ
from outlines import models, generate

# MyResponseSchema: a Pydantic model (or JSON schema) defined elsewhere
m = models.transformers('gpt2')
gen = generate.json(m, MyResponseSchema)
result = gen('Question: ...')  # output must conform to the schema
```

### Fact-check pipeline

```python
def fact_check_pipeline(response, llm, fact_checker):
    # `extract_claims` is an assumed helper that splits a response into atomic claims
    claims = extract_claims(response, llm)
    results = []
    for claim in claims:
        evidence = fact_checker.search(claim)
        # Reuse the entailment check from "Detect (entailment-based)"
        verdict = hallucination_check(claim, evidence, llm)
        results.append({'claim': claim, 'verdict': verdict})
    return results
```

### Refusal of unknown

```python
HONEST_SYSTEM = """You are a helpful assistant. If you don't know an answer
with high confidence, say "I don't know" or "I'm not sure" rather than guessing.
Always cite sources when making factual claims."""
```

### Chain-of-Verification (CoVe)

```python
def chain_of_verification(question, llm):
    # 1. Initial answer
    initial = llm.generate(f'Answer: {question}')
    # 2. Plan verification questions (one per line, blank lines dropped)
    plan = llm.generate(f'List verification questions for: {initial}')
    verify_qs = [q.strip() for q in plan.split('\n') if q.strip()]
    # 3. Answer each independently, without showing the initial answer
    verifications = [llm.generate(f'Answer: {q}') for q in verify_qs]
    # 4. Refine using the question/answer pairs
    qa_pairs = '\n'.join(f'Q: {q}\nA: {a}' for q, a in zip(verify_qs, verifications))
    return llm.generate(f"""Original answer: {initial}
Verification:
{qa_pairs}
Refined answer:""")
```

### LLM honesty fine-tuning (DPO-style)

```python
# Dataset of (prompt, honest_response, hallucinated_response) triples.
# DPO trains the model to prefer the honest response.
def hallucination_dpo_data(samples):
    return [{'prompt': s.prompt, 'chosen': s.honest, 'rejected': s.hallucinated}
            for s in samples]
```

### Tool-augmented (search)

```python
from types import SimpleNamespace

def augmented_answer(question, llm):
    # The LLM decides whether an external search is needed
    # (`llm.classify` and `web_search` are assumed helpers)
    needs_search = llm.classify(question, ['needs_external_data', 'common_knowledge'])
    if needs_search == 'needs_external_data':
        results = web_search(question)  # assumed to return objects with a .text field
        # rag_answer expects a retriever, so wrap the fetched results in one
        retriever = SimpleNamespace(retrieve=lambda q, k=5: results[:k])
        answer, _docs = rag_answer(question, retriever, llm)
        return answer
    return llm.generate(question)
```

### Hallucination metric (FActScore)

```python
def fact_score(generated_text, llm):
    """Split the text into atomic facts, then check each one."""
    # `extract_atomic_facts` and `check_fact` are assumed helpers
    facts = extract_atomic_facts(generated_text, llm)
    if not facts:
        return 0.0  # nothing to score
    supported = sum(1 for f in facts if check_fact(f) == 'supported')
    return supported / len(facts)
```

## Decision criteria

| Situation | Approach |
|---|---|
| Factual Q&A | RAG + citation |
| Open-ended | Self-consistency + judge |
| High-stakes | + fact-check + CoVe |
| Code | Execute + verify |
| Structured | Constrained decoding |
| Production | + monitor + abstain |

**Default**: RAG + source attribution + self-consistency for high-stakes + abstention threshold + LLM-judge eval.

## 🔗 Graph

- Parent: [[Foundation-Models]]
- Applications: [[RAG]] · [[AI_Safety_and_Alignment|Constitutional-AI]] · [[TruthfulQA]]
- Adjacent: [[Epistemology]] · [[Excessive Agency]]

## 🤖 LLM usage

**When to apply**: every production LLM system, and anything fact-critical.
**When not to apply**: explicitly creative tasks, where invented content is the goal.

## ❌ Anti-patterns

- **No grounding**: ungrounded answers that are confidently false.
- **High temp + factual**: adds noise.
- **No abstention**: the model always answers.
- **No citation**: unverifiable.
- **Single sample factual**: a lottery.

## 🧪 Verification / Duplicates

- Verified (TruthfulQA 2022, FActScore 2023, CoVe 2023, RAG literature).
- Trust level A.
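
The default in the decision criteria pairs self-consistency with an abstention threshold, but the patterns do not show the two combined. A minimal sketch, assuming `llm` exposes the same hypothetical `generate(prompt, temperature=...)` interface used in the patterns, that answers are short enough for exact-match voting, and that `min_agreement=0.6` is an illustrative value rather than a tuned one. `StubLLM` is a stand-in added only so the example runs without a real model.

```python
from collections import Counter


def answer_or_abstain(question, llm, n=10, min_agreement=0.6):
    """Return (majority answer or None, agreement ratio) over n samples."""
    # Sample n answers; normalise case and whitespace so trivial variants match.
    samples = [llm.generate(question, temperature=0.7).strip().lower()
               for _ in range(n)]
    answer, votes = Counter(samples).most_common(1)[0]
    agreement = votes / n
    # Abstain (None) when the top answer falls below the agreement threshold.
    return (answer if agreement >= min_agreement else None), agreement


class StubLLM:
    """Deterministic stand-in for a real client (hypothetical, demo only)."""

    def __init__(self, outputs):
        self._outputs = list(outputs)
        self._i = 0

    def generate(self, prompt, temperature=0.0):
        # Cycle through the canned outputs, ignoring the prompt.
        out = self._outputs[self._i % len(self._outputs)]
        self._i += 1
        return out


confident = StubLLM(["Paris"] * 8 + ["Lyon"] * 2)
split = StubLLM(["1912", "1913", "1915", "1912", "1920"])
print(answer_or_abstain("Capital of France?", confident))
print(answer_or_abstain("Year of the event?", split))
```

With these stub responses the first call returns `('paris', 0.8)` and the second abstains with `(None, 0.4)`. Returning the agreement ratio alongside the answer lets a production monitor log how often, and how narrowly, the system abstains.
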
## 🕓 Changelog

| Date | Change |
|---|---|
| 2026-04-26 | Auto |
| 2026-05-08 | Phase 1 |
| 2026-05-10 | Manual cleanup: types + RAG / SC / CoVe / FActScore / fact-check code |