Hallucination

Definition

Hallucination is the phenomenon where an LLM confidently generates information that is factually incorrect, fabricated, or not supported by its training data or the provided context. The model produces plausible-sounding but false content — often without any indication of uncertainty.

Why the Term "Hallucination"?

The term is borrowed from psychology: the model "perceives" (generates) something that isn't there, much as a person who is hallucinating sees things that don't exist. The model is not lying intentionally; it has no reliable internal signal that its output is wrong.

Types of Hallucination

Factual Hallucination

  • Generates incorrect facts presented as true
  • Examples:
    - "Albert Einstein won the Nobel Prize in 1921 for the theory of relativity" (he won it for the photoelectric effect)
    - Fabricating a book title, author, or publication date
    - Inventing a court case, law, or medical study

Contextual Hallucination

  • Contradicts information explicitly provided in the prompt
  • The document says X, but the model summarizes it as Y
  • Common in RAG systems when the model ignores retrieved context

Citation / Reference Hallucination

  • Fabricates academic papers, URLs, book ISBNs, quotes
  • Citations look real (correct format, plausible authors) but don't exist
  • Particularly dangerous in legal, medical, and academic contexts

Code Hallucination

  • References non-existent library functions or APIs
  • Example: df.aggregate_by_column(...) — a function that doesn't exist
  • Generates code that looks syntactically correct but doesn't run
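
One inexpensive guard against this failure mode is to confirm that a referenced name actually exists before running generated code. The sketch below is a minimal illustration using only the standard library. The helper name `attribute_exists` is hypothetical, and the `json` names stand in for whatever API the model referenced.

```python
import importlib


def attribute_exists(module_name: str, dotted_attr: str) -> bool:
    """Return True if `dotted_attr` resolves on the imported module.

    Hypothetical helper for illustration: it checks existence only,
    not whether the signature or behavior matches the generated code.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in dotted_attr.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


# `json.loads` is a real function; `json.parse` is not (that is the JavaScript name).
print(attribute_exists("json", "loads"))               # True
print(attribute_exists("json", "parse"))               # False
print(attribute_exists("json", "JSONDecoder.decode"))  # True
```

A check like this only catches invented names. Code that calls real functions with the wrong arguments still needs tests or a type checker.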

Identity Hallucination

  • Confuses different people, organizations, or events with similar names
  • Example: mixing up two different "John Smith" researchers

Root Causes

1. Training Objective Mismatch

  • The model is trained to predict the next token (plausibility), not to be accurate
  • Confident, fluent text is rewarded even if incorrect
  • No explicit "I don't know" signal in standard language modeling

2. Knowledge Gaps and Cutoffs

  • Training data has a cutoff date — model doesn't know about recent events
  • Rare topics may have insufficient training signal
  • Model "fills in" gaps with plausible-sounding content

3. Over-Smoothing via RLHF

  • RLHF rewards helpful-sounding responses
  • Confidently wrong answers often sound more helpful than "I don't know"
  • Can inadvertently reward hallucination

4. Long-Range Context Failure

  • For very long contexts, models lose track of details mentioned earlier
  • Generates output inconsistent with early context

5. Compression Artifacts

  • Model weights are a lossy compression of training data
  • Specific details (exact quotes, statistics, dates) are poorly retained

Hallucination Rate Benchmarks

| Model | TruthfulQA Score (higher is better) | Notes |
|-------|-------------------------------------|-------|
| GPT-4 | ~70–80% | Among best |
| Claude 3 Opus | ~75–85% | Strong |
| GPT-3.5 | ~50–60% | Significant hallucination |
| Smaller models | 30–50% | Higher hallucination rates |

Detection Methods

Self-Consistency Check

  • Ask the same question multiple times with temperature > 0
  • If answers disagree, the model is uncertain → likely hallucinating
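
The self-consistency idea can be sketched as a small sampling loop. This is an illustration under stated assumptions, not a reference implementation: `generate` is a hypothetical stand-in for an LLM call made with temperature > 0, `fake_model` is a stub, and the 0.7 threshold is arbitrary.

```python
import random
from collections import Counter
from typing import Callable


def self_consistency(generate: Callable[[str], str], question: str, n: int = 5):
    """Sample `n` answers and return (most common answer, agreement ratio).

    `generate` is a hypothetical wrapper around an LLM call with
    temperature > 0. Answers are compared after simple normalization.
    """
    answers = [generate(question).strip().lower() for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n


# Stub "model" for demonstration only: it answers inconsistently.
def fake_model(question: str) -> str:
    return random.choice(["1921", "1921", "1922", "1905"])


answer, agreement = self_consistency(fake_model, "When did Einstein win the Nobel Prize?", n=10)
print(answer, agreement)
if agreement < 0.7:  # arbitrary threshold (assumption); tune per task
    print("Low agreement: treat the answer as unverified")
```

Exact string matching only works for short, canonical answers such as dates or names. Free-form answers would need a semantic comparison step instead.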

Entailment Verification

  • Have a second LLM verify whether the response is entailed by the source document
  • Used in RAG pipelines: does the generated answer match the retrieved context?
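
A minimal sketch of an entailment check follows. It assumes a hypothetical `judge` callable that wraps the second LLM; the prompt wording and the one-word verdict format are illustrative choices, not a standard.

```python
from typing import Callable

JUDGE_TEMPLATE = (
    "Source document:\n{source}\n\n"
    "Claim:\n{claim}\n\n"
    "Is the claim fully supported by the source document? "
    "Answer with exactly one word: SUPPORTED or UNSUPPORTED."
)


def is_entailed(judge: Callable[[str], str], source: str, claim: str) -> bool:
    """Ask a second model whether `claim` is supported by `source`.

    `judge` is a hypothetical callable wrapping the verifier LLM.
    Anything other than a clear SUPPORTED verdict counts as a failure.
    """
    verdict = judge(JUDGE_TEMPLATE.format(source=source, claim=claim))
    return verdict.strip().upper() == "SUPPORTED"


# Stub judge for demonstration only; a real one would call an LLM.
def stub_judge(prompt: str) -> str:
    return "UNSUPPORTED"


print(is_entailed(
    stub_judge,
    "The report was published in 2020.",
    "The report was published in 2019.",
))  # False
```

Treating any unclear verdict as unsupported is a deliberately conservative choice: a false alarm costs a re-check, while a missed hallucination reaches the user.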

Confidence Elicitation

  • Ask the model to rate its confidence: "How certain are you about this, 1–10?"
  • Well-calibrated models show correlation between confidence and accuracy
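
To see whether stated confidence tracks accuracy, one option is to group answers by the confidence the model reported and compare accuracy across groups. The sketch below assumes you already have labeled results; the records are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_confidence(records):
    """Group (confidence, was_correct) pairs by stated confidence (1-10)
    and return {confidence: fraction of answers that were correct}.
    """
    buckets = defaultdict(list)
    for confidence, was_correct in records:
        buckets[confidence].append(was_correct)
    return {c: sum(v) / len(v) for c, v in sorted(buckets.items())}


# Made-up illustrative records: (self-reported confidence, answer was correct)
records = [
    (9, True), (9, True), (9, False),
    (5, True), (5, False),
    (2, False), (2, False),
]
print(accuracy_by_confidence(records))
```

In a well-calibrated model, accuracy should rise with the stated confidence. A flat or inverted pattern suggests the self-reported score is not a useful hallucination signal.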

External Verification

  • Check generated facts against a knowledge base or search engine
  • Search-based "grounding" tools (e.g., Bing, Google) can verify claims after generation

Mitigation Strategies

RAG (Retrieval-Augmented Generation)

  • Don't rely on parametric memory for facts
  • Retrieve ground-truth documents and constrain answers to retrieved content
  • "Answer only based on the provided documents"
  • Among the most effective mitigations for factual hallucination

Grounding Instructions

  • "Only use information from the document below"
  • "If the answer is not in the context, say 'I don't know'"
  • Explicit instructions to cite sources
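
The RAG constraint and the grounding instructions above can be combined into a single prompt template. This is a minimal sketch: the function name `build_grounded_prompt`, the exact wording, and the `[n]` citation convention are illustrative assumptions, and retrieval itself is out of scope here.

```python
def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Assemble a prompt that restricts the model to the supplied documents.

    The wording and the [n] citation convention are illustrative assumptions.
    """
    numbered = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer only based on the provided documents.\n"
        "If the answer is not in the context, say 'I don't know'.\n"
        "Cite the document number, e.g. [1], after each claim.\n\n"
        f"Documents:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_grounded_prompt(
    "Why did Einstein receive the Nobel Prize?",
    ["Einstein received the 1921 Nobel Prize in Physics for the photoelectric effect."],
))
```

Numbering the documents gives the model something concrete to cite, which also makes the answer easier to check afterwards.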

Reducing Temperature

  • Lower temperature → the model favors its highest-probability tokens, which are often (but not always) the correct ones
  • Not a complete fix, but reduces creative/fabricated responses
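
The effect of temperature can be illustrated with a temperature-scaled softmax over a few made-up logits. This is a simplified sketch of the sampling distribution, not any particular model's decoding code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, dividing by `temperature` first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, probability mass concentrates on the top-scoring token, so sampling becomes more deterministic. If the top token is itself wrong, a lower temperature simply repeats the same error more consistently.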

Fine-Tuning on Factual Data

  • Fine-tune with examples of "I don't know" responses for uncertain queries
  • Train model to express calibrated uncertainty

Output Verification Pipeline

  • Post-process outputs with a fact-checking layer
  • External search/knowledge base verification
  • Hallucination detection classifiers

Structured Prompting

  • Force model to cite sources: "List the source sentence for each claim"
  • Forces the model to ground each claim in the provided context
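
Once the model lists a source sentence for each claim, those quotes can be checked mechanically against the context. The sketch below assumes the model's output has already been parsed into a claim-to-quote mapping. The function name is hypothetical and the exact-match rule is deliberately strict.

```python
def unsupported_claims(claims: dict[str, str], context: str) -> list[str]:
    """Return the claims whose cited source sentence is not found in `context`.

    `claims` maps each claim to the source sentence the model cited for it.
    Matching is exact after case and whitespace normalization (an assumption);
    an empty quote counts as unsupported.
    """
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    haystack = normalize(context)
    missing = []
    for claim, quote in claims.items():
        needle = normalize(quote)
        if not needle or needle not in haystack:
            missing.append(claim)
    return missing


context = "The device ships in March. It supports USB-C charging."
claims = {
    "Launch is in March": "The device ships in March.",
    "It has wireless charging": "It supports wireless charging.",
}
print(unsupported_claims(claims, context))  # ['It has wireless charging']
```

This confirms only that the quoted sentence exists in the context, not that it actually supports the claim. An entailment check is still needed for that.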

High-Risk Domains for Hallucination

| Domain | Risk | Consequence |
|--------|------|-------------|
| Medical | High | Wrong dosage, wrong diagnosis |
| Legal | High | Fabricated case law, wrong statute |
| Financial | High | Wrong figures, fake regulations |
| Academic | High | Fake citations, wrong attributions |
| Code | Medium | Runtime errors, security vulnerabilities |

Related Concepts

  • Grounding, RAG, Alignment, Temperature, Inference, Calibration, RLHF, Context Window

Go Deeper With Live Instruction

This topic is covered in depth in our LLM engineering program (Session 6).