HalluWorld is a controlled benchmark using explicit reference world models to automatically label and disentangle hallucinations in LLMs across synthetic environments with varying complexity and observability.
super hub Canonical reference
Survey of Hallucination in Natural Language Generation
Canonical reference. 88% of citing Pith papers cite this work as background.
claims ledger
- background [315, 361]. Furthermore, Liu et al. [185], Zong et al. [395] and Liu et al. [184] show that LVLMs can be easily fooled and experience a severe performance drop due to their over-reliance on the strong language prior, as well as its inferior ability to defend against inappropriate user inputs [112, 134]. Jiang et al. [138], Wang et al. [315] and Jing et al. [141] took a step forward to holistically evaluate multi-modal hallucination. What's more, when presented with multiple images, LVLMs sometim
roles
background 17
representative citing papers
A new benchmark with cognitive traps shows frontier deep research agents achieve only 13-16% acceptance on expert consulting tasks under combined verifier and rubric criteria.
LibEvoBench benchmark shows LLMs are version-oblivious on evolving APIs, with documentation helping but version specification not.
MedHal-Loc benchmark shows KG-triple hallucination detectors localize errors no better than chance on controlled medical statements due to entity extraction limits, while NLI and consistency methods succeed above chance, and real hallucinations are mostly diffuse conclusion changes.
Empirical study of 2,214 MCP servers finds 9.93% of 19,200 description-code pairs inconsistent via a new static-analysis-plus-LLM-prompting framework, with security implications.
Locate-then-edit succeeds at the same early-to-mid MLP locations in masked diffusion models as in autoregressive models, but requires optimization over intermediate partial-mask states to handle multi-token targets.
Randomized experiment finds AI draft assistance raises feedback provision by teaching assistants 10.8 percentage points without harming quality.
Reflexive agents confabulate incorrect task interpretations in memory, detected via Reflection Repetition Rate metric, with a programmatic mitigation raising correct object mentions from 0% to 86% in frozen ALFWorld cases.
QAOD projects away question-aligned directions from answer representations to isolate domain-agnostic factuality signals, enabling efficient hallucination detection with top in-domain AUROC and up to 21% better OOD transfer.
Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.
LLMs routinely produce unsupported causal stories for personal sensing anomalies, and richer evidence or constrained prompts do not reliably eliminate this epistemic overreach.
Indirect elicitation via triplet comparisons recovers meaningful association structures from LLMs and supports conservative causal candidate links across prompted subpopulations.
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
A graphlet-anchored framework generates 119,856 factually grounded biomedical QA pairs that improve accuracy on PubMedQA and MedQA benchmarks.
CyberCertBench shows frontier LLMs reach human-expert performance on general IT and networking security but drop on vendor-specific and formal standards questions such as IEC 62443, with a new framework for producing interpretable explanations.
Frontier LLMs generate BibTeX entries at 83.6% field accuracy but only 50.9% fully correct; two-stage clibib revision raises accuracy to 91.5% and fully correct entries to 78.3% with 0.8% regression.
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
Introduces a protocol scoring AI investment advisors on validity under constraints, stability, and agreement with a deterministic baseline, showing agreement often masks invalid actions.
Hallucination in world models is a data coverage issue predictable by three signals and preventable through targeted training sampling and online data collection.
Knowledge editing methods redistribute and suppress rather than overwrite facts in LLMs, creating narrow vulnerable regions in representation space that adversarial prompts can exploit.
Vaani Benchmark V1.0 is a multimodal Hindi ASR dataset from 104 districts featuring spontaneous speech recordings in real-world conditions and three independent transcriptions per segment for robust multi-reference evaluation.
CAPRA is a multi-agent LLM system with evidence anchoring and consistency checking that analyzes software architecture deliverables and meets 88.8% of an eight-criterion evaluation on 10 student reports.
Formulates pre-hoc fine-tuning prediction as stochastic estimation, proves lower bound on optimization variance decay rate, and introduces a three-regime predictability phase diagram.
IVIE generates complete playable interactive fiction worlds via a four-stage incremental pipeline that combines LLM creativity with symbolic validation for coherence.
citing papers explorer
- Honest Lying: Understanding Memory Confabulation in Reflexive Agents
Reflexive agents confabulate incorrect task interpretations in memory, detected via Reflection Repetition Rate metric, with a programmatic mitigation raising correct object mentions from 0% to 86% in frozen ALFWorld cases.
- When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
QAOD projects away question-aligned directions from answer representations to isolate domain-agnostic factuality signals, enabling efficient hallucination detection with top in-domain AUROC and up to 21% better OOD transfer.
- Eliciting associations between clinical variables from LLMs via comparison questions across populations
Indirect elicitation via triplet comparisons recovers meaningful association structures from LLMs and supports conservative causal candidate links across prompted subpopulations.
- Hallucination in World Models is Predictable and Preventable
Hallucination in world models is a data coverage issue predictable by three signals and preventable through targeted training sampling and online data collection.
- Exposing the Illusion of Erasure in Knowledge Editing for LLMs
Knowledge editing methods redistribute and suppress rather than overwrite facts in LLMs, creating narrow vulnerable regions in representation space that adversarial prompts can exploit.
- A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction
Formulates pre-hoc fine-tuning prediction as stochastic estimation, proves lower bound on optimization variance decay rate, and introduces a three-regime predictability phase diagram.
- Grounded Decoding: Retrieval-Anchored Probability Fusion for Faithful RAG
Grounded Decoding fuses full-RAG and retrieval-only next-token distributions via normalized geometric mean from a KL-barycenter to improve factual consistency and citation quality in RAG.
- The Attribution Contract: Feature Attribution for Generative Language Models
The paper proposes the Attribution Contract as a framework to resolve conceptual ambiguities in applying feature attribution to autoregressive and diffusion language models by explicitly specifying what is being explained.
- TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins
TUNEAHEAD predicts fine-tuning performance from meta-features and short probes, reporting RMSE 1.47 and 95.1% of predictions within 3 points on 370 held-out runs of Qwen2.5-7B.
- Short paper: Models in the dark -- Rectification and erasure under GDPR in ML supply chains
Survey identifying technical and supply-chain barriers to GDPR data subject rights in ML, with new framing of 'models in the dark' for downstream opacity.
- From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity
EPGS detects high-confidence factual errors in LLMs by using embedding perturbations to measure gradient sensitivity as a proxy for sharp versus flat minima.
- At the Edge of Understanding: Sparse Autoencoders Trace The Limits of Transformer Generalization
Sparse autoencoders show OOD prompts increase fallacious concept activation in transformers, offering a mechanistic measure of shift and a path to robust fine-tuning.
- Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR)
HUMBR reduces LLM hallucinations in enterprise workflows by using a hybrid semantic-lexical utility within minimum Bayes risk decoding to identify consensus outputs, with derived error bounds and reported outperformance over self-consistency on benchmarks and production data.
- From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales