Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation

Ekaterina Fadeeva , Aleksandr Rubashevskii , Dzianis Piatrashyn , Roman Vashurin , Shehzaad Dhuliawala , Artem Shelmanov , Timothy Baldwin , Preslav Nakov

show 2 more authors

Mrinmaya Sachan Maxim Panov

Authors on Pith no claims yet

classification 💻 cs.CL

keywords franqfactualityhallucinationsretrievalretrievedansweringapproachescontext

0 comments

read the original abstract

Large Language Models (LLMs) enhanced with retrieval, an approach known as Retrieval-Augmented Generation (RAG), have achieved strong performance in open-domain question answering. However, RAG remains prone to hallucinations: factually incorrect outputs may arise from inaccuracies in the model's internal knowledge and the retrieved context. Existing approaches to mitigating hallucinations often conflate factuality with faithfulness to the retrieved evidence, incorrectly labeling factually correct statements as hallucinations if they are not explicitly supported by the retrieval. In this paper, we introduce FRANQ, a new method for hallucination detection in RAG outputs. FRANQ applies distinct uncertainty quantification (UQ) techniques to estimate factuality, conditioning on whether a statement is faithful to the retrieved context. To evaluate FRANQ and competing UQ methods, we construct a new long-form question answering dataset annotated for both factuality and faithfulness, combining automated labeling with manual validation of challenging cases. Extensive experiments across multiple datasets, tasks, and LLMs show that FRANQ achieves more accurate detection of factual errors in RAG-generated responses compared to existing approaches.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Self-Aware Vector Embeddings for Retrieval-Augmented Generation: A Neuroscience-Inspired Framework for Temporal, Confidence-Weighted, and Relational Knowledge
cs.IR 2026-04 unverdicted novelty 6.0

SmartVector augments embeddings with time, confidence, and relation signals plus a consolidation process, raising top-1 accuracy on versioned queries from 31% to 62% on a synthetic benchmark while cutting stale answer...