Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments
Pith reviewed 2026-05-07 03:29 UTC · model grok-4.3
The pith
LaaB improves LLM hallucination detection by using logical consistency between responses and self-judgments to bridge neural features with symbolic reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an inherent logical bridge links an LLM response's label to its meta-judgment label: the two are either identical or opposite, depending on the self-judgment's semantics. By mapping symbolic judgments back into feature space via the meta-judgment process and enforcing the label constraint during mutual learning, the framework integrates neural-level patterns with symbolic reasoning to produce stronger hallucination detection than either signal alone.
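One way to write that constraint down, in notation assumed here rather than taken from the paper (labels in {0,1}, with 1 meaning hallucinated):

```latex
% Hypothetical formalization; the paper's own notation is not reproduced here.
% y_r: response label, y_m: meta-judgment label.
y_m =
\begin{cases}
  y_r,     & \text{if the self-judgment's phrasing makes agreement mean the same label,}\\[2pt]
  1 - y_r, & \text{if its phrasing inverts the label (e.g., ``is this answer wrong?'').}
\end{cases}
```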
What carries the argument
The meta-judgment process that maps symbolic self-judgment labels into neural feature space, together with the logical consistency constraint that requires response and meta-judgment labels to be identical or opposite based on the self-judgment semantics.
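As a concrete reading of that mechanism, the sketch below shows one plausible dual-view objective: supervised cross-entropy on each view plus a KL mimicry term that respects the same/opposite bridge, in the style of deep mutual learning. Every name here (resp_logits, judge_logits, same_polarity) is an assumption for illustration; as the referee report notes, LaaB's actual loss functions are unspecified.

```python
# Minimal sketch, not the paper's implementation: a mutual-learning loss in
# which the response view and the meta-judgment view supervise each other
# under the same/opposite label constraint.
import torch
import torch.nn.functional as F

def mutual_learning_loss(resp_logits, judge_logits, labels, same_polarity):
    """resp_logits, judge_logits: (batch, 2) logits from the two views.
    labels: (batch,) gold 0/1 hallucination labels for the response.
    same_polarity: (batch,) bool; True where the self-judgment's semantics
    make the meta-judgment label equal (not opposite) to the response label.
    """
    # Meta-judgment targets follow the logical bridge: same or flipped label.
    judge_labels = torch.where(same_polarity, labels, 1 - labels)

    # Supervised term on each view.
    loss = F.cross_entropy(resp_logits, labels)
    loss = loss + F.cross_entropy(judge_logits, judge_labels)

    # Mutual-learning term: the response view mimics the judgment view's
    # (detached) beliefs, with the class distribution flipped wherever the
    # bridge says "opposite".
    judge_probs = judge_logits.softmax(-1)
    aligned = torch.where(same_polarity.unsqueeze(-1),
                          judge_probs, judge_probs.flip(-1))
    loss = loss + F.kl_div(resp_logits.log_softmax(-1),
                           aligned.detach(), reduction="batchmean")
    return loss
```

A symmetric KL term (the judgment view mimicking the response view) would complete the usual mutual-learning setup; it is omitted here for brevity.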
If this is right
- Detection improves when neural uncertainty and symbolic self-judgments are aligned through mutual learning rather than used in isolation.
- The same-or-opposite label constraint produces consistent gains across four LLMs and four public datasets.
- The approach integrates implicit neural features with explicit symbolic judgments without requiring dataset-specific tuning.
- Experiments against eight baselines confirm that bridging the two views outperforms single-facet methods.
Where Pith is reading between the lines
- The logical consistency mechanism could extend to other LLM tasks that produce both an output and a meta-reasoning step, such as chain-of-thought verification.
- Enforcing label relations between generations and self-assessments may offer a path to training more internally consistent models by design.
- The framework suggests a general route for combining black-box prompting signals with white-box feature analysis in LLM evaluation pipelines.
Load-bearing premise
The logical consistency relation between response labels and meta-judgment labels supplies a reliable, non-circular signal that improves detection without adding bias or needing post-hoc fitting to test data.
What would settle it
Applying LaaB to a new LLM or held-out dataset and observing no gain or a loss relative to the eight baselines would show that the logical bridge does not reliably enhance detection.
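Assuming both detectors emit scalar hallucination scores, that settling test could be scored with AUROC, a common metric for this task; all names below are illustrative:

```python
# Sketch of the falsification test: does LaaB still beat a baseline on a
# held-out dataset? Purely illustrative; the paper's metric is not stated here.
from sklearn.metrics import roc_auc_score

def bridge_still_helps(labels, laab_scores, baseline_scores):
    """labels: 0/1 hallucination labels on the held-out set."""
    return (roc_auc_score(labels, laab_scores)
            > roc_auc_score(labels, baseline_scores))
```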
Original abstract
Large Language Models (LLMs) are prone to factual hallucinations, risking their reliability in real-world applications. Existing hallucination detectors mainly extract micro-level intrinsic patterns for uncertainty quantification or elicit macro-level self-judgments through verbalized prompts. However, these methods address only a single facet of the hallucination, focusing either on implicit neural uncertainty or explicit symbolic reasoning, thereby treating these inherently coupled behaviors in isolation and failing to exploit their interdependence for a holistic view. In this paper, we propose LaaB (Logical Consistency-as-a-Bridge), a framework that bridges neural features and symbolic judgments for hallucination detection. LaaB introduces a "meta-judgment" process to map symbolic labels back into the feature space. By leveraging the inherent logical bridge where response and meta-judgment labels are either the same or opposite based on the self-judgment's semantics, LaaB aligns and integrates dual-view signals via mutual learning and enhances the hallucination detection. Extensive experiments on 4 public datasets, across 4 LLMs, against 8 baselines demonstrate the superiority of LaaB.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LaaB (Logical Consistency-as-a-Bridge), a framework for detecting factual hallucinations in LLMs. It introduces a meta-judgment process that maps symbolic self-judgment labels back into the neural feature space of the original response. By enforcing an inherent logical consistency constraint—where response and meta-judgment labels must be the same or opposite according to the semantics of the self-judgment—LaaB aligns the dual signals through mutual learning and reports improved detection performance over eight baselines on four public datasets and four LLMs.
Significance. If the logical bridge supplies a genuinely non-circular supervisory signal, the approach would meaningfully advance the field by integrating implicit neural uncertainty with explicit symbolic reasoning rather than treating them in isolation. The multi-dataset, multi-LLM, multi-baseline experimental design is a clear strength and provides a solid empirical foundation for the claims. However, the absence of targeted ablations on the core assumption leaves the source of the reported gains ambiguous.
major comments (2)
- [Abstract and §3 (Method)] The central claim rests on the 'inherent logical bridge' providing a non-circular signal for mutual learning. Yet the same LLM generates both the response and the self-judgment, and the same/opposite rule is derived directly from the self-judgment's semantics. No ablation is described that severs this semantic dependence (e.g., by randomizing or replacing the semantic mapping while retaining the meta-judgment structure) to isolate whether gains arise from the logical constraint or simply from training on an additional derived view. This is load-bearing for the contribution.
- [§4 (Experiments)] The abstract states superiority on four datasets and four LLMs against eight baselines, but the manuscript provides no details on the implementation of the meta-judgment process, the exact loss functions used for mutual learning, the train/validation/test splits, or any independent validation that the logical constraint itself is reliable and non-circular. These omissions prevent assessment of reproducibility and of whether the constraint introduces new biases correlated with the hallucination patterns being measured.
minor comments (1)
- [Abstract] The method description is compressed; a single additional sentence clarifying how the meta-judgment maps labels back into feature space would improve accessibility for readers unfamiliar with the dual-view setup.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements for greater clarity, reproducibility, and validation of the core claims.
Point-by-point responses
- Referee: [Abstract and §3 (Method)] The central claim rests on the 'inherent logical bridge' providing a non-circular signal for mutual learning. Yet the same LLM generates both the response and the self-judgment, and the same/opposite rule is derived directly from the self-judgment's semantics. No ablation is described that severs this semantic dependence (e.g., by randomizing or replacing the semantic mapping while retaining the meta-judgment structure) to isolate whether gains arise from the logical constraint or simply from training on an additional derived view. This is load-bearing for the contribution.
Authors: We agree that an explicit ablation isolating the semantic logical constraint is necessary to substantiate the non-circular nature of the supervisory signal. The current framework defines the same/opposite mapping directly from the self-judgment semantics, which is intentional to bridge neural and symbolic views. To address this concern, we will add a targeted ablation in the revised §3 and §4: we will randomize the label mapping (assigning same/opposite independently of semantics while preserving the meta-judgment structure and training procedure) and report the resulting performance drop relative to LaaB. This will help demonstrate that gains derive from the logical consistency rather than merely from an additional derived view. We will also expand the method description to clarify this assumption and its implications. revision: yes
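A minimal sketch of that severing ablation, assuming a per-example polarity tensor as in the loss sketch earlier (hypothetical names throughout):

```python
# Hypothetical ablation: keep the meta-judgment view and training procedure,
# but draw the same/opposite mapping at random, independent of semantics.
import torch

def randomized_polarity(batch_size: int, seed: int = 0) -> torch.Tensor:
    gen = torch.Generator().manual_seed(seed)
    # Uniform coin flip per example, ignoring what the judgment actually says.
    return torch.randint(0, 2, (batch_size,), generator=gen).bool()
```

If detection performance under this random mapping matched full LaaB, the gains would be attributable to the extra derived view rather than to the logical constraint.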
- Referee: [§4 (Experiments)] The abstract states superiority on four datasets and four LLMs against eight baselines, but the manuscript provides no details on the implementation of the meta-judgment process, the exact loss functions used for mutual learning, the train/validation/test splits, or any independent validation that the logical constraint itself is reliable and non-circular. These omissions prevent assessment of reproducibility and of whether the constraint introduces new biases correlated with the hallucination patterns being measured.
Authors: We acknowledge that the original manuscript omitted key implementation details, which limits reproducibility and independent assessment of the logical constraint. In the revised §4, we will add: (1) full description of the meta-judgment process, including prompt templates and how symbolic labels are mapped back to feature space; (2) the precise loss functions for mutual learning, with equations; (3) explicit train/validation/test splits for all datasets and LLMs; and (4) new validation analyses, such as empirical consistency rates between responses and meta-judgments plus discussion of potential biases. These additions will enable full reproduction and allow readers to evaluate whether the constraint introduces correlated biases. revision: yes
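The promised consistency-rate analysis could take a form as simple as the following sketch (argument names are assumptions, not the paper's code):

```python
# Empirical rate at which observed response/meta-judgment label pairs satisfy
# the same/opposite bridge; illustrative only.
def consistency_rate(resp_labels, judge_labels, same_polarity):
    """Equal-length sequences: labels are 0/1 ints; same_polarity holds
    booleans (True means the two labels should match)."""
    hits = sum(
        (r == j) == p
        for r, j, p in zip(resp_labels, judge_labels, same_polarity)
    )
    return hits / len(resp_labels)
```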
Circularity Check
The logical consistency bridge is self-definitional: it is fixed by the self-judgment's semantics rather than established independently.
specific steps (1)
- self-definitional [Abstract]: "By leveraging the inherent logical bridge where response and meta-judgment labels are either the same or opposite based on the self-judgment's semantics, LaaB aligns and integrates dual-view signals via mutual learning and enhances the hallucination detection." The same/opposite relation is dictated directly by the semantic interpretation of the self-judgment output itself rather than being an independent constraint. Because the self-judgment is generated by the same LLM whose responses are under scrutiny, the 'bridge' becomes a tautological mapping: the label relationship is true by how the judgment prompt is defined, not by external logic or data. Mutual learning therefore aligns quantities already linked by construction.
full rationale
The paper's core contribution rests on imposing a 'logical bridge' that forces response and meta-judgment labels to be identical or opposite according to the semantic content of the self-judgment. This relation is not learned from data, derived from first principles, or validated externally; it is directly encoded by how the self-judgment prompt is worded and interpreted. Mutual learning then operates on signals whose alignment is predefined by construction, matching the self-definitional pattern. The abstract explicitly states the bridge is 'inherent' and 'based on the self-judgment's semantics,' confirming the reduction. No other circular steps (e.g., self-citation chains or fitted predictions) appear in the provided text. The method may still yield empirical gains by incorporating an extra derived view, but the claimed 'non-circular supervisory signal' reduces to a definitional mapping.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: response labels and meta-judgment labels are either identical or opposite, depending on the semantics of the self-judgment.
invented entities (1)
- meta-judgment process: no independent evidence