Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments
Pith reviewed 2026-05-07 03:29 UTC · model grok-4.3
The pith
LaaB improves LLM hallucination detection by using logical consistency between responses and self-judgments to bridge neural features with symbolic reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an inherent logical bridge links an LLM response's label to its meta-judgment label: the two are either identical or opposite, depending on the self-judgment's semantics. By mapping symbolic judgments back into feature space via the meta-judgment process and enforcing the label constraint during mutual learning, the framework integrates neural-level patterns with symbolic reasoning to produce stronger hallucination detection than either signal alone.
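One way to write that constraint down, in notation assumed here rather than taken from the paper (labels in {0,1}, with 1 meaning hallucinated):

```latex
% Hypothetical formalization; the paper's own notation is not reproduced here.
% y_r: response label, y_m: meta-judgment label.
y_m =
\begin{cases}
  y_r,     & \text{if the self-judgment's phrasing makes agreement mean the same label,}\\[2pt]
  1 - y_r, & \text{if its phrasing inverts the label (e.g., ``is this answer wrong?'').}
\end{cases}
```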
What carries the argument
The meta-judgment process that maps symbolic self-judgment labels into neural feature space, together with the logical consistency constraint that requires response and meta-judgment labels to be identical or opposite based on the self-judgment semantics.
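As a concrete reading of that mechanism, the sketch below shows one plausible dual-view objective: supervised cross-entropy on each view plus a KL mimicry term that respects the same/opposite bridge, in the style of deep mutual learning. Every name here (resp_logits, judge_logits, same_polarity) is an assumption for illustration; as the referee report notes, LaaB's actual loss functions are unspecified.

```python
# Minimal sketch, not the paper's implementation: a mutual-learning loss in
# which the response view and the meta-judgment view supervise each other
# under the same/opposite label constraint.
import torch
import torch.nn.functional as F

def mutual_learning_loss(resp_logits, judge_logits, labels, same_polarity):
    """resp_logits, judge_logits: (batch, 2) logits from the two views.
    labels: (batch,) gold 0/1 hallucination labels for the response.
    same_polarity: (batch,) bool; True where the self-judgment's semantics
    make the meta-judgment label equal (not opposite) to the response label.
    """
    # Meta-judgment targets follow the logical bridge: same or flipped label.
    judge_labels = torch.where(same_polarity, labels, 1 - labels)

    # Supervised term on each view.
    loss = F.cross_entropy(resp_logits, labels)
    loss = loss + F.cross_entropy(judge_logits, judge_labels)

    # Mutual-learning term: the response view mimics the judgment view's
    # (detached) beliefs, with the class distribution flipped wherever the
    # bridge says "opposite".
    judge_probs = judge_logits.softmax(-1)
    aligned = torch.where(same_polarity.unsqueeze(-1),
                          judge_probs, judge_probs.flip(-1))
    loss = loss + F.kl_div(resp_logits.log_softmax(-1),
                           aligned.detach(), reduction="batchmean")
    return loss
```

A symmetric KL term (the judgment view mimicking the response view) would complete the usual mutual-learning setup; it is omitted here for brevity.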
If this is right
- Detection improves when neural uncertainty and symbolic self-judgments are aligned through mutual learning rather than used in isolation.
- The same-or-opposite label constraint produces consistent gains across four LLMs and four public datasets.
- The approach integrates implicit neural features with explicit symbolic judgments without requiring dataset-specific tuning.
- Experiments against eight baselines confirm that bridging the two views outperforms single-facet methods.
Where Pith is reading between the lines
- The logical consistency mechanism could extend to other LLM tasks that produce both an output and a meta-reasoning step, such as chain-of-thought verification.
- Enforcing label relations between generations and self-assessments may offer a path to training more internally consistent models by design.
- The framework suggests a general route for combining black-box prompting signals with white-box feature analysis in LLM evaluation pipelines.
Load-bearing premise
The logical consistency relation between response labels and meta-judgment labels supplies a reliable, non-circular signal that improves detection without adding bias or needing post-hoc fitting to test data.
What would settle it
Applying LaaB to a new LLM or held-out dataset and observing no gain or a loss relative to the eight baselines would show that the logical bridge does not reliably enhance detection.
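Assuming both detectors emit scalar hallucination scores, that settling test could be scored with AUROC, a common metric for this task; all names below are illustrative:

```python
# Sketch of the falsification test: does LaaB still beat a baseline on a
# held-out dataset? Purely illustrative; the paper's metric is not stated here.
from sklearn.metrics import roc_auc_score

def bridge_still_helps(labels, laab_scores, baseline_scores):
    """labels: 0/1 hallucination labels on the held-out set."""
    return (roc_auc_score(labels, laab_scores)
            > roc_auc_score(labels, baseline_scores))
```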
Original abstract
Large Language Models (LLMs) are prone to factual hallucinations, risking their reliability in real-world applications. Existing hallucination detectors mainly extract micro-level intrinsic patterns for uncertainty quantification or elicit macro-level self-judgments through verbalized prompts. However, these methods address only a single facet of the hallucination, focusing either on implicit neural uncertainty or explicit symbolic reasoning, thereby treating these inherently coupled behaviors in isolation and failing to exploit their interdependence for a holistic view. In this paper, we propose LaaB (Logical Consistency-as-a-Bridge), a framework that bridges neural features and symbolic judgments for hallucination detection. LaaB introduces a "meta-judgment" process to map symbolic labels back into the feature space. By leveraging the inherent logical bridge where response and meta-judgment labels are either the same or opposite based on the self-judgment's semantics, LaaB aligns and integrates dual-view signals via mutual learning and enhances the hallucination detection. Extensive experiments on 4 public datasets, across 4 LLMs, against 8 baselines demonstrate the superiority of LaaB.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LaaB (Logical Consistency-as-a-Bridge), a framework for detecting factual hallucinations in LLMs. It introduces a meta-judgment process that maps symbolic self-judgment labels back into the neural feature space of the original response. By enforcing an inherent logical consistency constraint—where response and meta-judgment labels must be the same or opposite according to the semantics of the self-judgment—LaaB aligns the dual signals through mutual learning and reports improved detection performance over eight baselines on four public datasets and four LLMs.
Significance. If the logical bridge supplies a genuinely non-circular supervisory signal, the approach would meaningfully advance the field by integrating implicit neural uncertainty with explicit symbolic reasoning rather than treating them in isolation. The multi-dataset, multi-LLM, multi-baseline experimental design is a clear strength and provides a solid empirical foundation for the claims. However, the absence of targeted ablations on the core assumption leaves the source of the reported gains ambiguous.
major comments (2)
- [Abstract and §3 (Method)] The central claim rests on the 'inherent logical bridge' providing a non-circular signal for mutual learning. Yet the same LLM generates both the response and the self-judgment, and the same/opposite rule is derived directly from the self-judgment's semantics. No ablation is described that severs this semantic dependence (e.g., by randomizing or replacing the semantic mapping while retaining the meta-judgment structure) to isolate whether gains arise from the logical constraint or simply from training on an additional derived view. This is load-bearing for the contribution.
- [§4 (Experiments)] The abstract states superiority on four datasets and four LLMs against eight baselines, but the manuscript provides no details on the implementation of the meta-judgment process, the exact loss functions used for mutual learning, the train/validation/test splits, or any independent validation that the logical constraint itself is reliable and non-circular. These omissions prevent assessment of reproducibility and of whether the constraint introduces new biases correlated with the hallucination patterns being measured.
minor comments (1)
- [Abstract] The method description is compressed; a single additional sentence clarifying how the meta-judgment maps labels back into feature space would improve accessibility for readers unfamiliar with the dual-view setup.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements for greater clarity, reproducibility, and validation of the core claims.
Point-by-point responses
- Referee: [Abstract and §3 (Method)] The central claim rests on the 'inherent logical bridge' providing a non-circular signal for mutual learning. Yet the same LLM generates both the response and the self-judgment, and the same/opposite rule is derived directly from the self-judgment's semantics. No ablation is described that severs this semantic dependence (e.g., by randomizing or replacing the semantic mapping while retaining the meta-judgment structure) to isolate whether gains arise from the logical constraint or simply from training on an additional derived view. This is load-bearing for the contribution.
Authors: We agree that an explicit ablation isolating the semantic logical constraint is necessary to substantiate the non-circular nature of the supervisory signal. The current framework defines the same/opposite mapping directly from the self-judgment semantics, which is intentional to bridge neural and symbolic views. To address this concern, we will add a targeted ablation in the revised §3 and §4: we will randomize the label mapping (assigning same/opposite independently of semantics while preserving the meta-judgment structure and training procedure) and report the resulting performance drop relative to LaaB. This will help demonstrate that gains derive from the logical consistency rather than merely from an additional derived view. We will also expand the method description to clarify this assumption and its implications. revision: yes
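A minimal sketch of that severing ablation, assuming a per-example polarity tensor as in the loss sketch earlier (hypothetical names throughout):

```python
# Hypothetical ablation: keep the meta-judgment view and training procedure,
# but draw the same/opposite mapping at random, independent of semantics.
import torch

def randomized_polarity(batch_size: int, seed: int = 0) -> torch.Tensor:
    gen = torch.Generator().manual_seed(seed)
    # Uniform coin flip per example, ignoring what the judgment actually says.
    return torch.randint(0, 2, (batch_size,), generator=gen).bool()
```

If detection performance under this random mapping matched full LaaB, the gains would be attributable to the extra derived view rather than to the logical constraint.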
- Referee: [§4 (Experiments)] The abstract states superiority on four datasets and four LLMs against eight baselines, but the manuscript provides no details on the implementation of the meta-judgment process, the exact loss functions used for mutual learning, the train/validation/test splits, or any independent validation that the logical constraint itself is reliable and non-circular. These omissions prevent assessment of reproducibility and of whether the constraint introduces new biases correlated with the hallucination patterns being measured.
Authors: We acknowledge that the original manuscript omitted key implementation details, which limits reproducibility and independent assessment of the logical constraint. In the revised §4, we will add: (1) full description of the meta-judgment process, including prompt templates and how symbolic labels are mapped back to feature space; (2) the precise loss functions for mutual learning, with equations; (3) explicit train/validation/test splits for all datasets and LLMs; and (4) new validation analyses, such as empirical consistency rates between responses and meta-judgments plus discussion of potential biases. These additions will enable full reproduction and allow readers to evaluate whether the constraint introduces correlated biases. revision: yes
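The promised consistency-rate analysis could take a form as simple as the following sketch (argument names are assumptions, not the paper's code):

```python
# Empirical rate at which observed response/meta-judgment label pairs satisfy
# the same/opposite bridge; illustrative only.
def consistency_rate(resp_labels, judge_labels, same_polarity):
    """Equal-length sequences: labels are 0/1 ints; same_polarity holds
    booleans (True means the two labels should match)."""
    hits = sum(
        (r == j) == p
        for r, j, p in zip(resp_labels, judge_labels, same_polarity)
    )
    return hits / len(resp_labels)
```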
Circularity Check
The logical consistency bridge is self-definitional: it is fixed by the self-judgment's semantics rather than established independently.
specific steps (1)
- self-definitional [Abstract]: "By leveraging the inherent logical bridge where response and meta-judgment labels are either the same or opposite based on the self-judgment's semantics, LaaB aligns and integrates dual-view signals via mutual learning and enhances the hallucination detection." The same/opposite relation is dictated directly by the semantic interpretation of the self-judgment output itself rather than being an independent constraint. Because the self-judgment is generated by the same LLM whose responses are under scrutiny, the 'bridge' becomes a tautological mapping: the label relationship is true by how the judgment prompt is defined, not by external logic or data. Mutual learning therefore aligns quantities already linked by construction.
full rationale
The paper's core contribution rests on imposing a 'logical bridge' that forces response and meta-judgment labels to be identical or opposite according to the semantic content of the self-judgment. This relation is not learned from data, derived from first principles, or validated externally; it is directly encoded by how the self-judgment prompt is worded and interpreted. Mutual learning then operates on signals whose alignment is predefined by construction, matching the self-definitional pattern. The abstract explicitly states the bridge is 'inherent' and 'based on the self-judgment's semantics,' confirming the reduction. No other circular steps (e.g., self-citation chains or fitted predictions) appear in the provided text. The method may still yield empirical gains by incorporating an extra derived view, but the claimed 'non-circular supervisory signal' reduces to a definitional mapping.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: response labels and meta-judgment labels are either identical or opposite, depending on the semantics of the self-judgment.
invented entities (1)
- meta-judgment process: no independent evidence