FaithLens: Detecting and Explaining Faithfulness Hallucination
Recognition: 1 theorem link · Lean Theorem
Pith reviewed 2026-05-16 20:41 UTC · model grok-4.3
The pith
FaithLens, an 8B-parameter model, detects faithfulness hallucinations in LLM outputs more accurately than GPT-5.2 or o3 and supplies explanations for its verdicts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FaithLens is an 8B-parameter model trained in two stages: supervised fine-tuning on filtered synthetic data produced by stronger LLMs, then rule-based reinforcement learning that rewards both correct binary faithfulness predictions and high-quality explanations. The result is better detection and explanation performance than GPT-5.2 and o3 across twelve tasks.
What carries the argument
A two-stage training pipeline: supervised fine-tuning on filtered synthetic data, followed by rule-based reinforcement learning that jointly optimizes prediction correctness and explanation quality.
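The paper describes the RL reward only at this high level. As a hedged illustration, here is a minimal Python sketch of a rule-based reward that combines prediction correctness with explanation quality; the label set, the stand-in explanation scorer, the weights, and the gating of the explanation term on correctness are all assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a rule-based RL reward of the kind the paper describes.
# Label set, scorer, weights, and correctness gating are assumptions.
from dataclasses import dataclass

@dataclass
class Rollout:
    predicted_label: str   # e.g. "Attributable" / "Not Attributable" / "Contradictory"
    gold_label: str
    explanation: str

def explanation_quality(explanation: str) -> float:
    """Stand-in scorer in [0, 1]; the paper's actual notion of explanation
    quality (e.g. judged readability/helpfulness) is not specified here."""
    cites_evidence = len(explanation.strip()) > 40
    return 1.0 if cites_evidence else 0.0

def rule_based_reward(r: Rollout, w_pred: float = 0.7, w_expl: float = 0.3) -> float:
    """Combine binary prediction correctness with explanation quality.
    Gating on correctness prevents reward for fluent but wrong
    justifications (an assumed design choice)."""
    correct = 1.0 if r.predicted_label == r.gold_label else 0.0
    return w_pred * correct + w_expl * correct * explanation_quality(r.explanation)

# Example: a correct prediction with a substantive explanation earns 1.0.
print(rule_based_reward(Rollout("Attributable", "Attributable",
                                "The document's second paragraph directly states the claim.")))
```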
If this is right
- Deployed systems can flag unfaithful outputs in real time at modest compute cost and surface explanations for user review.
- Joint prediction-plus-explanation training yields transparent detections that support debugging of retrieval-augmented generation and summarization pipelines.
- The same 8B model generalizes across twelve tasks without requiring separate fine-tuning per application.
- Smaller open models can reach or exceed closed large-model performance on specialized detection when trained with filtered synthetic data and targeted reinforcement learning.
Where Pith is reading between the lines
- Embedding FaithLens inside generation loops could allow explanations to drive automatic correction of detected hallucinations.
- The synthetic-data-plus-RL recipe may transfer to other hallucination categories such as factual or logical errors.
- Low-cost local deployment becomes feasible for on-device checking in latency-sensitive or privacy-sensitive settings.
- Feedback from the generated explanations could be looped back to improve the original generator model.
Load-bearing premise
Synthetic data generated by advanced LLMs and passed through the described filtering strategy produces labels and explanations that match the distribution of real faithfulness hallucinations.
What would settle it
If FaithLens underperforms GPT-5.2 on a fresh collection of human-annotated faithfulness labels drawn from actual deployed LLM applications, the superiority claim would not hold.
original abstract
Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-5.2 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
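The abstract names three filtering objectives (label correctness, explanation quality, data diversity) without giving the mechanism. A minimal sketch of one plausible sequential filter follows; the helper callables, the sample keys ("document", "claim"), and both thresholds are illustrative assumptions, not the paper's pipeline.

```python
# Minimal sketch of sequential filtering over LLM-synthesized samples:
# label correctness -> explanation quality -> data diversity.
# All helpers, keys, and thresholds are assumptions for illustration.
from typing import Callable

def cosine(a: list[float], b: list[float]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def filter_synthetic_data(
    samples: list[dict],
    verify_label: Callable[[dict], bool],        # e.g. re-check with a second strong LLM (assumed)
    score_explanation: Callable[[dict], float],  # quality score in [0, 1] (assumed)
    embed: Callable[[str], list[float]],         # text embedding for dedup (assumed)
    quality_threshold: float = 0.8,
    similarity_threshold: float = 0.9,
) -> list[dict]:
    # 1. Label correctness: keep samples whose synthetic label survives re-verification.
    samples = [s for s in samples if verify_label(s)]
    # 2. Explanation quality: keep samples scoring above the assumed threshold.
    samples = [s for s in samples if score_explanation(s) >= quality_threshold]
    # 3. Data diversity: greedily drop near-duplicates by embedding similarity.
    kept: list[dict] = []
    kept_vecs: list[list[float]] = []
    for s in samples:
        v = embed(s["document"] + " " + s["claim"])
        if all(cosine(v, u) < similarity_threshold for u in kept_vecs):
            kept.append(s)
            kept_vecs.append(v)
    return kept
```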
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FaithLens, an 8B-parameter model for detecting faithfulness hallucinations in LLM outputs while jointly generating explanations. It synthesizes training data via advanced LLMs, applies a filtering strategy to ensure label correctness, explanation quality, and diversity, then performs supervised fine-tuning followed by rule-based reinforcement learning that rewards both prediction accuracy and explanation quality. The central claim is that this model outperforms GPT-5.2 and o3 on 12 diverse tasks and delivers high-quality explanations, providing an efficient and trustworthy alternative for applications such as RAG and summarization.
Significance. If the performance and generalization claims hold after proper validation, FaithLens would offer a practically useful advance by delivering competitive hallucination detection and explanation quality in a compact open model, improving efficiency and trustworthiness over larger proprietary systems. The approach of combining synthetic data curation with RL optimization for dual objectives is a reasonable direction, though its impact depends on whether the synthetic pipeline produces signals that transfer to real distributions.
major comments (2)
- [Abstract] The claim that the 8B FaithLens 'outperforms advanced models such as GPT-5.2 and o3' on 12 tasks is presented without metrics, baselines, error bars, dataset sizes, or statistical details, so the central empirical result cannot be evaluated from the provided text.
- [Data Synthesis and Filtering] The pipeline generates labels and explanations with advanced LLMs and then filters them, yet no human validation, inter-annotator agreement, or ablation on non-synthetic real-world data (e.g., RAG contexts) is reported to confirm that the filtered signals match the distribution of actual faithfulness hallucinations.
minor comments (1)
- [Abstract] The abstract would be strengthened by including at least one concrete performance number (e.g., average accuracy or F1) alongside the model comparison.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback on our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate to improve clarity and rigor.
point-by-point responses
-
Referee: [Abstract] The claim that the 8B FaithLens 'outperforms advanced models such as GPT-5.2 and o3' on 12 tasks is presented without metrics, baselines, error bars, dataset sizes, or statistical details, so the central empirical result cannot be evaluated from the provided text.
Authors: We agree that the abstract would benefit from quantitative support for the central claim. The full paper includes detailed results tables with metrics, baselines, error bars, and dataset sizes across the 12 tasks (see Section 4 and Table 2). In the revised version, we will update the abstract to include key summary statistics, such as average accuracy gains and dataset scale, while preserving brevity. revision: yes
-
Referee: [Data Synthesis and Filtering] The pipeline generates labels and explanations with advanced LLMs and then filters them, yet no human validation, inter-annotator agreement, or ablation on non-synthetic real-world data (e.g., RAG contexts) is reported to confirm that the filtered signals match the distribution of actual faithfulness hallucinations.
Authors: We acknowledge this point on validation. Our filtering pipeline uses automated rule-based checks for label accuracy, explanation quality, and diversity to align with faithfulness distributions, and the RL stage further optimizes for real-task performance. However, we did not include human validation or real-data ablations in the initial submission. We will add a human evaluation study (with inter-annotator agreement) on sampled data and an ablation on real RAG contexts in the revised manuscript to directly address transferability. revision: yes
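For concreteness, the inter-annotator agreement the rebuttal proposes could be summarized with Cohen's kappa. Below is a minimal sketch using the standard formula; the annotation protocol is the authors' proposal, and the toy labels are illustrative, not reported results.

```python
# Cohen's kappa for a proposed two-annotator human validation.
# Standard formula; the toy labels below are illustrative only.
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal rates.
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum((ca[l] / n) * (cb[l] / n) for l in set(labels_a) | set(labels_b))
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Toy example: kappa = 0.5, conventionally read as moderate agreement.
print(cohens_kappa(["faithful", "hallucinated", "faithful", "faithful"],
                   ["faithful", "hallucinated", "hallucinated", "faithful"]))
```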
Circularity Check
No significant circularity in FaithLens derivation chain
full rationale
The paper describes synthesizing training data via advanced LLMs, applying a filtering strategy for label correctness and quality, then fine-tuning an 8B model as cold start followed by rule-based RL optimization using rewards for correctness and explanation quality. The central performance claims on 12 tasks are presented as empirical outcomes of this pipeline and do not reduce any numeric result or prediction to the synthetic inputs by construction. No self-definitional steps, fitted-input predictions, load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the derivation. The approach is self-contained against the stated benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Advanced LLMs can generate synthetic training examples whose labels and explanations are sufficiently accurate after filtering to serve as ground truth for faithfulness hallucination detection.
Forward citations
Cited by 1 Pith paper
- From Context to Skills: Can Language Models Learn from Context Skillfully?
Ctx2Skill lets language models autonomously evolve context-specific skills via multi-agent self-play, improving performance on context learning tasks without human supervision.
discussion (0)