Definition
Hallucination is the phenomenon where an LLM confidently generates information that is factually incorrect, fabricated, or not supported by its training data or the provided context. The model produces plausible-sounding but false content — often without any indication of uncertainty.
Why the Term "Hallucination"?
The term is borrowed from psychology: the model "perceives" (generates) something that isn't there, just as a person who is hallucinating sees things that don't exist. The model is not lying intentionally; it has no awareness that its output is wrong.
Related terms: Grounding, RAG, Alignment, Temperature, Inference, Calibration, RLHF, Context Window

Types of Hallucination

Factual Hallucination
- Generates incorrect facts presented as true
- Examples:
  - "Albert Einstein won the Nobel Prize in 1921 for the theory of relativity" (he won it for the photoelectric effect)
  - Fabricating a book title, author, or publication date
  - Inventing a court case, law, or medical study

Contextual Hallucination
- Contradicts information explicitly provided in the prompt
- The document says X, but the model summarizes it as Y
- Common in RAG systems when the model ignores retrieved context

Citation / Reference Hallucination
- Fabricates academic papers, URLs, book ISBNs, quotes
- Citations look real (correct format, plausible authors) but don't exist
- Particularly dangerous in legal, medical, and academic contexts

Code Hallucination
- References non-existent library functions or APIs
- Example: `df.aggregate_by_column(...)`, a function that doesn't exist
- Generates code that looks syntactically correct but doesn't run

Identity Hallucination
- Confuses different people, organizations, or events with similar names
- Example: mixing up two different "John Smith" researchers

Root Causes

1. Training Objective Mismatch
- The model is trained to predict the next token (plausibility), not to be accurate
- Confident, fluent text is rewarded even if incorrect
- No explicit "I don't know" signal in standard language modeling

2. Knowledge Gaps and Cutoffs
- Training data has a cutoff date, so the model doesn't know about more recent events
- Rare topics may have insufficient training signal
- Model "fills in" gaps with plausible-sounding content

3. Over-Smoothing via RLHF
- RLHF rewards helpful-sounding responses
- Confidently wrong answers often sound more helpful than "I don't know"
- Can inadvertently reward hallucination

4. Long-Range Context Failure
- For very long contexts, models can lose track of details mentioned earlier
- Generates output inconsistent with early context

5. Compression Artifacts
- Model weights are a lossy compression of training data
- Specific details (exact quotes, statistics, dates) are poorly retained

Hallucination Rate Benchmarks
Approximate TruthfulQA scores (higher means more truthful answers); exact figures depend on the benchmark variant and evaluation setup.
| Model | TruthfulQA Score | Notes |
|-------|-----------------|-------|
| GPT-4 | ~70–80% | Among best |
| Claude 3 Opus | ~75–85% | Strong |
| GPT-3.5 | ~50–60% | Significant hallucination |
| Smaller models | ~30–50% | Higher hallucination rates |

Detection Methods

Self-Consistency Check
- Ask the same question multiple times with temperature > 0
- If the answers disagree, the model is uncertain and more likely to be hallucinating

Entailment Verification
- Have a second LLM verify whether the response is entailed by the source document
- Used in RAG pipelines: does the generated answer match the retrieved context?

Confidence Elicitation
- Ask the model to rate its confidence: "How certain are you about this, 1–10?"
- Well-calibrated models show correlation between confidence and accuracy

External Verification
- Check generated facts against a knowledge base or search engine
- "Grounding" tools (Bing, Google) verify claims post-generation

Mitigation Strategies

RAG (Retrieval-Augmented Generation)
- Don't rely on parametric memory for facts
- Retrieve ground-truth documents and constrain answers to retrieved content
- "Answer only based on the provided documents"
- Most effective mitigation for factual hallucination

Grounding Instructions
- "Only use information from the document below"
- "If the answer is not in the context, say 'I don't know'"
- Explicit instructions to cite sources

Reducing Temperature
- Lower temperature makes the model stick to its highest-probability tokens, which are more likely to be correct
- Not a complete fix, but reduces creative/fabricated responses

Fine-Tuning on Factual Data
- Fine-tune with examples of "I don't know" responses for uncertain queries
- Train model to express calibrated uncertainty

Output Verification Pipeline
- Post-process outputs with a fact-checking layer
- External search/knowledge base verification
- Hallucination detection classifiers

Structured Prompting
- Force model to cite sources: "List the source sentence for each claim"
- Forces the model to ground each claim in the provided context
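
The grounding and structured-prompting instructions quoted in this section can be combined into a single prompt, and the cited source sentences can then be checked mechanically. The following is a minimal sketch under stated assumptions, not a reference implementation: `build_grounded_prompt`, `Claim`, and `unsupported_claims` are hypothetical names, no LLM is actually called, and the claims are assumed to have already been parsed out of the model's reply. The check only confirms that a cited sentence appears in the document; it does not confirm that the sentence supports the claim.

```python
from __future__ import annotations

from dataclasses import dataclass


def build_grounded_prompt(document: str, question: str) -> str:
    """Wrap a question in the grounding instructions quoted in this section."""
    return (
        "Only use information from the document below.\n"
        "If the answer is not in the context, say 'I don't know'.\n"
        "List the source sentence for each claim.\n\n"
        f"Document:\n{document}\n\n"
        f"Question: {question}\n"
    )


@dataclass
class Claim:
    text: str             # the claim the model made
    source_sentence: str  # the sentence the model cited as support


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(s.lower().split())


def unsupported_claims(document: str, claims: list[Claim]) -> list[Claim]:
    """Return claims whose cited sentence is empty or does not appear in the document."""
    doc = normalize(document)
    return [
        c for c in claims
        if not c.source_sentence.strip() or normalize(c.source_sentence) not in doc
    ]


# Hypothetical document and hand-written claims standing in for parsed model output.
document = "The warranty lasts two years. Returns are accepted within 30 days."
claims = [
    Claim("The warranty is two years.", "The warranty lasts two years."),
    Claim("Shipping is free.", "Shipping is free on all orders."),
]
flagged = unsupported_claims(document, claims)
print([c.text for c in flagged])  # ['Shipping is free.']
```

Exact substring matching is deliberately strict: a fabricated or paraphrased "source sentence" is flagged, at the cost of also flagging honest citations that were lightly reworded.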
High-Risk Domains for Hallucination
| Domain | Risk | Consequence |
|--------|------|-------------|
| Medical | High | Wrong dosage, wrong diagnosis |
| Legal | High | Fabricated case law, wrong statute |
| Financial | High | Wrong figures, fake regulations |
| Academic | High | Fake citations, wrong attributions |
| Code | Medium | Runtime errors, security vulnerabilities |
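
In domains like those in the table above, the self-consistency check from Detection Methods can serve as a cheap gate before an answer is shown to a user. The sketch below assumes a hypothetical `ask` callable that wraps a model call sampled at temperature > 0; here it is replaced by a canned list of replies so the example runs without any API. Answers are compared by exact string match after normalization, which is a simplification: free-form answers would need a semantic comparison instead.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Tuple


def self_consistency(ask: Callable[[str], str], question: str, n: int = 5) -> Tuple[str, float]:
    """Ask the same question n times; return (most common answer, agreement ratio)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Normalize case and whitespace so "Paris" and "paris " count as the same answer.
    answers = [" ".join(ask(question).lower().split()) for _ in range(n)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, top_count / n


# Canned replies standing in for repeated samples from a model (assumption).
fake_replies = cycle(["Paris", "paris", "Lyon", "Paris ", "Marseille"])
answer, agreement = self_consistency(lambda q: next(fake_replies), "Capital of France?", n=5)
print(answer, agreement)  # paris 0.6

# Hypothetical threshold: treat low agreement as a sign of possible hallucination.
needs_review = agreement < 0.7
```

The 0.7 threshold is illustrative only; a suitable value depends on the task and on how much disagreement is normal for correct answers.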