Hallucination — FDE@ProdAI Blog

Definition

Hallucination is the phenomenon where an LLM confidently generates information that is factually incorrect, fabricated, or not supported by its training data or the provided context. The model produces plausible-sounding but false content — often without any indication of uncertainty.

Why the Term "Hallucination"?

The term is borrowed from psychology: the model "perceives" (generates) something that isn't there, much as a person who is hallucinating sees things that don't exist. The model is not lying intentionally; it has no built-in mechanism for recognizing that its output is wrong.

Types of Hallucination

Factual Hallucination

Generates incorrect facts presented as true
Examples:

- "Albert Einstein won the Nobel Prize in 1921 for the theory of relativity" (he won it for the photoelectric effect)

- Fabricating a book title, author, or publication date

- Inventing a court case, law, or medical study

Contextual Hallucination

Contradicts information explicitly provided in the prompt
The document says X, but the model summarizes it as Y
Common in RAG systems when the model ignores retrieved context

Citation / Reference Hallucination

Fabricates academic papers, URLs, book ISBNs, quotes
Citations look real (correct format, plausible authors) but don't exist
Particularly dangerous in legal, medical, and academic contexts

Code Hallucination

References non-existent library functions or APIs
Example: df.aggregate_by_column(...) — a function that doesn't exist
Generates code that is syntactically valid but fails when run (for example, with an AttributeError on the missing function)
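
One inexpensive guard is to check that a name the model emitted actually resolves before the code is run. The following is a minimal sketch using only the Python standard library; `api_exists` is a hypothetical helper name, and the `json` lookups are just an illustration.

```python
import importlib


def api_exists(module_name: str, attr_path: str) -> bool:
    """Return True if `attr_path` (e.g. "JSONDecoder.decode") resolves inside the module."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


print(api_exists("json", "dumps"))        # a real function in the json module
print(api_exists("json", "dump_pretty"))  # plausible-sounding, but not in the json module
```

A check like this only confirms that the name exists. It says nothing about whether the arguments or the behavior match what the generated code assumes.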

Identity Hallucination

Confuses different people, organizations, or events with similar names
Example: mixing up two different "John Smith" researchers

Root Causes

1. Training Objective Mismatch

The model is trained to predict the next token (plausibility), not to be accurate
Confident, fluent text is rewarded even if incorrect
No explicit "I don't know" signal in standard language modeling

2. Knowledge Gaps and Cutoffs

Training data has a cutoff date — model doesn't know about recent events
Rare topics may have insufficient training signal
Model "fills in" gaps with plausible-sounding content

3. Over-Smoothing via RLHF

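RLHF (reinforcement learning from human feedback) rewards responses that human raters judge to be helpful
Confidently wrong answers often sound more helpful than "I don't know"
As a result, the reward signal can inadvertently reinforce hallucination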

4. Long-Range Context Failure

For very long contexts, models lose track of details mentioned earlier
Generates output inconsistent with early context

5. Compression Artifacts

Model weights are a lossy compression of training data
Specific details (exact quotes, statistics, dates) are poorly retained

Hallucination Rate Benchmarks

| Model | TruthfulQA Score (higher is better) | Notes |
|-------|-------------------------------------|-------|
| GPT-4 | ~70–80% | Among best |
| Claude 3 Opus | ~75–85% | Strong |
| GPT-3.5 | ~50–60% | Significant hallucination |
| Smaller models | 30–50% | Higher hallucination rates |

Detection Methods

Self-Consistency Check

Ask the same question multiple times with temperature > 0
If the answers disagree, the model is uncertain and at least some of the answers are likely hallucinated; agreement alone does not prove the answer is correct
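
The check above can be sketched as follows. This is a minimal illustration, not a production implementation: `ask` is a hypothetical callable standing in for a model call sampled at temperature > 0, and the canned answers are made up.

```python
from collections import Counter
from typing import Callable


def self_consistency(ask: Callable[[str], str], question: str, n: int = 5) -> tuple[str, float]:
    """Sample `n` answers and return the most common one with its agreement ratio."""
    answers = [ask(question).strip().lower() for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n


# Stand-in for a real model call (hypothetical): replays five canned answers.
_canned = iter(["1921", "1921", "1922", "1921", "1905"])
answer, agreement = self_consistency(
    lambda q: next(_canned), "In what year was Einstein's Nobel Prize?"
)
print(answer, agreement)  # 1921 0.6
```

Exact string matching is a deliberate simplification here. Free-text answers usually need a semantic comparison before they can be counted as agreeing.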

Entailment Verification

Have a second LLM verify whether the response is entailed by the source document
Used in RAG pipelines: does the generated answer match the retrieved context?
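
A rough sketch of such a verifier is shown below. The prompt wording, the `SUPPORTED`/`UNSUPPORTED` protocol, and the `judge` callable (a stand-in for a call to a second model) are all assumptions made for illustration.

```python
from typing import Callable

JUDGE_TEMPLATE = (
    "Source document:\n{source}\n\n"
    "Claim:\n{claim}\n\n"
    "Is the claim fully supported by the source document? "
    "Reply with exactly one word: SUPPORTED or UNSUPPORTED."
)


def is_entailed(judge: Callable[[str], str], source: str, claim: str) -> bool:
    """Ask a second model whether `claim` follows from `source`.

    Any reply other than SUPPORTED is treated as a failed check.
    """
    verdict = judge(JUDGE_TEMPLATE.format(source=source, claim=claim))
    return verdict.strip().upper() == "SUPPORTED"


# Stub judge for illustration only; a real pipeline would call a second model here.
print(is_entailed(lambda prompt: "UNSUPPORTED", "The report covers 2023.", "The report covers 2024."))
```

Treating every unexpected reply as a failure is a conservative design choice: an ambiguous verdict flags the answer for review instead of letting it through.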

Confidence Elicitation

Ask the model to rate its confidence: "How certain are you about this, 1–10?"
Well-calibrated models show correlation between confidence and accuracy
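
Whether stated confidence is worth trusting can be checked on a labeled sample by comparing each confidence level with the accuracy actually observed at that level. The sketch below assumes the records have already been collected; the data shown is made up for illustration.

```python
from collections import defaultdict


def accuracy_by_confidence(records: list[tuple[int, bool]]) -> dict[int, float]:
    """Map each stated confidence (1-10) to the fraction of answers at that level that were correct."""
    buckets: dict[int, list[bool]] = defaultdict(list)
    for confidence, correct in records:
        buckets[confidence].append(correct)
    return {level: sum(hits) / len(hits) for level, hits in sorted(buckets.items())}


# Made-up illustration data: (stated confidence, was the answer correct?)
records = [(9, True), (9, True), (9, False), (3, False), (3, True), (3, False), (3, False)]
print(accuracy_by_confidence(records))
```

For a well-calibrated model, accuracy should rise with stated confidence. A flat or inverted curve suggests the self-reported score carries little signal.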

External Verification

Check generated facts against a knowledge base or search engine
"Grounding" tools (Bing, Google) verify claims post-generation
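
As a minimal sketch of the knowledge-base variant, the example below checks structured claims against an in-memory dictionary. The `KNOWLEDGE_BASE` contents and the `verify_claim` helper are hypothetical, and the example assumes claims have already been extracted into (subject, attribute, value) form, which is usually the harder step.

```python
# Tiny in-memory knowledge base (illustrative only).
KNOWLEDGE_BASE = {
    ("Albert Einstein", "nobel_prize_year"): "1921",
    ("Albert Einstein", "nobel_prize_reason"): "photoelectric effect",
}


def verify_claim(subject: str, attribute: str, value: str) -> str:
    """Return 'verified', 'contradicted', or 'unknown' for one extracted claim."""
    known = KNOWLEDGE_BASE.get((subject, attribute))
    if known is None:
        return "unknown"
    return "verified" if known.lower() == value.strip().lower() else "contradicted"


print(verify_claim("Albert Einstein", "nobel_prize_reason", "theory of relativity"))
print(verify_claim("Albert Einstein", "nobel_prize_year", "1921"))
print(verify_claim("Albert Einstein", "birthplace", "Ulm"))
```

Keeping "unknown" separate from "contradicted" matters: a missing entry means the claim could not be checked, not that it is false.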

Mitigation Strategies

RAG (Retrieval-Augmented Generation)

Don't rely on parametric memory for facts
Retrieve ground-truth documents and constrain answers to retrieved content
"Answer only based on the provided documents"
Often the most effective single mitigation for factual hallucination, although the model can still ignore or misread the retrieved content
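
The retrieval step can be sketched with a toy ranker. Word overlap is used here purely as a stand-in for the embedding search a real RAG system would typically use; the function names and documents are made up for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a toy stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the `k` documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


docs = [
    "Einstein received the 1921 Nobel Prize in Physics for the photoelectric effect.",
    "The theory of general relativity was published in 1915.",
    "Pandas is a Python library for data analysis.",
]
top = retrieve("Why did Einstein win the Nobel Prize?", docs, k=1)
print(top[0])
```

The retrieved text is then placed in the prompt so the model answers from it instead of from parametric memory.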

Grounding Instructions

"Only use information from the document below"
"If the answer is not in the context, say 'I don't know'"
Explicit instructions to cite sources
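
These instructions can be combined into a reusable prompt template. The sketch below is one possible wording, not a canonical one; `build_grounded_prompt` is a hypothetical helper name.

```python
def build_grounded_prompt(context: str, question: str) -> str:
    """Wrap a question with instructions that restrict the answer to `context`."""
    return (
        "Only use information from the document below.\n"
        "If the answer is not in the document, say \"I don't know\".\n"
        "Cite the sentence that supports each claim.\n\n"
        f"Document:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_grounded_prompt(
    "Einstein received the 1921 Nobel Prize in Physics for the photoelectric effect.",
    "What did Einstein win the Nobel Prize for?",
))
```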

Reducing Temperature

Lower temperature concentrates sampling on the highest-probability tokens, which reduces random fabrication but does not make those tokens correct
Not a complete fix: a confidently wrong answer stays wrong even at temperature 0
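
The effect of temperature on the sampling distribution can be seen in a few lines. This sketch applies the standard temperature-scaled softmax to made-up logits for three candidate tokens.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature drops, probability mass moves toward the top-scoring token. The ranking of tokens never changes, so a wrong token that already ranks first simply becomes more likely.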

Fine-Tuning on Factual Data

Fine-tune with examples of "I don't know" responses for uncertain queries
Train model to express calibrated uncertainty

Output Verification Pipeline

Post-process outputs with a fact-checking layer
External search/knowledge base verification
Hallucination detection classifiers

Structured Prompting

Require a citation for every claim: "List the source sentence for each claim"
Each claim can then be checked against the provided context, and claims without a matching sentence can be flagged
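
Once the model returns a source sentence for each claim, the citations themselves can be checked mechanically. The sketch below assumes the model's output has already been parsed into a list of `{"claim", "source"}` dictionaries (a hypothetical format) and flags any claim whose quoted sentence does not appear in the context.

```python
def unsupported_claims(claims: list[dict[str, str]], context: str) -> list[str]:
    """Return claims whose quoted source sentence does not appear verbatim in the context."""
    normalized = " ".join(context.split()).lower()
    missing = []
    for item in claims:
        quote = " ".join(item["source"].split()).lower()
        if not quote or quote not in normalized:
            missing.append(item["claim"])
    return missing


context = (
    "Einstein received the 1921 Nobel Prize in Physics. "
    "The prize recognized his work on the photoelectric effect."
)
claims = [
    {"claim": "Einstein won the 1921 Nobel Prize.",
     "source": "Einstein received the 1921 Nobel Prize in Physics."},
    {"claim": "The prize was for relativity.",
     "source": "The prize recognized his theory of relativity."},
]
print(unsupported_claims(claims, context))  # ['The prize was for relativity.']
```

A verbatim match only shows that the quoted sentence exists. Whether the sentence actually supports the claim still needs an entailment check or human review.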

High-Risk Domains for Hallucination

| Domain | Risk | Consequence |
|--------|------|-------------|
| Medical | High | Wrong dosage, wrong diagnosis |
| Legal | High | Fabricated case law, wrong statute |
| Financial | High | Wrong figures, fake regulations |
| Academic | High | Fake citations, wrong attributions |
| Code | Medium | Runtime errors, security vulnerabilities |

Related Concepts

Grounding, RAG, Alignment, Temperature, Inference, Calibration, RLHF, Context Window