Larger LLMs hallucinate more often despite having the correct concept available because instruction tuning causes probability mass to disperse across alternative surface forms instead of concentrating on one.
Nature , volume=
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
method 1polarities
background 1representative citing papers
Instruction token embeddings encode visual information that can be leveraged to detect object hallucinations in MLLMs via a new combined score outperforming prior detectors.
DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.
citing papers explorer
-
Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer
Larger LLMs hallucinate more often despite having the correct concept available because instruction tuning causes probability mass to disperse across alternative surface forms instead of concentrating on one.
-
Instruction Lens Score: Your Instruction Contributes a Powerful Object Hallucination Detector for Multimodal Large Language Models
Instruction token embeddings encode visual information that can be leveraged to detect object hallucinations in MLLMs via a new combined score outperforming prior detectors.
-
Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation
DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.