Recognition: unknown
Learning Uncertainty from Sequential Internal Dispersion in Large Language Models
Pith reviewed 2026-05-10 08:34 UTC · model grok-4.3
The pith
Sequential Internal Variance Representation estimates uncertainty in large language models by measuring dispersion of hidden states across layers and tokens.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SIVR is a supervised hallucination detection framework that derives token-wise, layer-wise variance features from internal hidden states and aggregates the full per-token sequence of these variances. The method assumes uncertainty manifests in the degree of dispersion of representations across layers rather than in specific evolution patterns, which removes the need for strict assumptions and makes the approach model and task agnostic. Learning from the complete temporal sequence of variance signals allows the model to capture patterns associated with factual errors without the information loss that occurs when focusing only on the last or average token.
What carries the argument
Sequential Internal Variance Representation (SIVR), which computes per-token variance across layers and feeds the resulting sequence of features into a supervised classifier to identify factual errors.
If this is right
- SIVR can be deployed on new models without retraining large datasets because it generalizes from the variance signal alone.
- The method supports real-time hallucination checks during generation since variance features are extracted from internal states already computed by the model.
- Because it avoids assumptions about state evolution, SIVR transfers across different LLM families and tasks without architecture-specific tuning.
- Full-sequence aggregation of per-token variances reduces false negatives on errors that appear only in middle tokens.
Where Pith is reading between the lines
- The variance signal could be used for uncertainty calibration in addition to binary hallucination detection.
- Similar dispersion measures might extend to multimodal models where cross-modal consistency can be checked via variance across modality-specific layers.
- If dispersion proves reliable, future work could test whether intervening on high-variance layers during generation reduces hallucinations.
Load-bearing premise
Uncertainty in model outputs shows up as measurable dispersion or variance in hidden states across layers rather than in any particular way those states are supposed to change.
What would settle it
Apply SIVR to a new LLM architecture and task with only a small training set and check whether its hallucination detection accuracy drops below that of existing baselines that rely on last-token or mean representations.
Figures
read the original abstract
Uncertainty estimation is a promising approach to detect hallucinations in large language models (LLMs). Recent approaches commonly depend on model internal states to estimate uncertainty. However, they suffer from strict assumptions on how hidden states should evolve across layers, and from information loss by solely focusing on last or mean tokens. To address these issues, we present Sequential Internal Variance Representation (SIVR), a supervised hallucination detection framework that leverages token-wise, layer-wise features derived from hidden states. SIVR adopts a more basic assumption that uncertainty manifests in the degree of dispersion or variance of internal representations across layers, rather than relying on specific assumptions, which makes the method model and task agnostic. It additionally aggregates the full sequence of per-token variance features, learning temporal patterns indicative of factual errors and thereby preventing information loss. Experimental results demonstrate SIVR consistently outperforms strong baselines. Most importantly, SIVR enjoys stronger generalisation and avoids relying on large training sets, highlighting the potential for practical deployment. Our code repository is available online at https://github.com/ponhvoan/internal-variance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Sequential Internal Variance Representation (SIVR), a supervised framework for hallucination detection in LLMs. It extracts token-wise and layer-wise variance features from hidden states under the assumption that uncertainty manifests as dispersion of internal representations across layers (rather than specific evolution patterns), aggregates the full per-token sequence to learn temporal patterns, and claims that SIVR outperforms strong baselines while offering stronger generalization and requiring smaller training sets. The abstract notes that code is available at a GitHub repository.
Significance. If the experimental assertions are substantiated with quantitative evidence, SIVR could provide a more model- and task-agnostic alternative to existing internal-state methods for uncertainty estimation, addressing information loss from last/mean token focus and overly restrictive assumptions on hidden-state dynamics. The public code repository is a clear strength that supports reproducibility and follow-up work in LLM reliability research.
major comments (2)
- [Abstract] Abstract: the central claims that 'SIVR consistently outperforms strong baselines' and 'enjoys stronger generalisation' are stated without any numerical results, error bars, dataset details, ablation studies, or statistical tests. This absence prevents verification of the primary empirical contribution.
- [Methods] Methods section: the assumption that uncertainty is captured by 'the degree of dispersion or variance of internal representations across layers' is adopted as more basic and model-agnostic, yet no formal definition of the per-token variance features, no derivation of the aggregation step, and no direct comparison to prior assumptions on hidden-state evolution are supplied to establish why this formulation avoids the cited limitations.
minor comments (1)
- [Abstract] Abstract: the title uses 'Sequential Internal Dispersion' while the body uses 'Sequential Internal Variance Representation'; aligning the terminology would reduce potential confusion.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address each major comment below and have revised the manuscript to improve clarity and substantiation of our claims. The revisions focus on enhancing the abstract with key empirical highlights and formalizing the methods section without altering the core contributions.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claims that 'SIVR consistently outperforms strong baselines' and 'enjoys stronger generalisation' are stated without any numerical results, error bars, dataset details, ablation studies, or statistical tests. This absence prevents verification of the primary empirical contribution.
Authors: We agree that the abstract, being a high-level summary, omits specific numbers to maintain brevity. The full manuscript (Section 4 and Tables 1-3) provides the requested details: quantitative comparisons across three datasets and two LLMs, with error bars in figures, ablation studies on feature aggregation, and statistical significance tests (paired t-tests, p<0.05). In the revised version, we have added one sentence to the abstract summarizing the main results (e.g., 'SIVR achieves 8-12% higher AUROC than baselines while generalizing to unseen models with only 20% of the training data'). This preserves the abstract's conciseness while enabling verification. revision: partial
-
Referee: [Methods] Methods section: the assumption that uncertainty is captured by 'the degree of dispersion or variance of internal representations across layers' is adopted as more basic and model-agnostic, yet no formal definition of the per-token variance features, no derivation of the aggregation step, and no direct comparison to prior assumptions on hidden-state evolution are supplied to establish why this formulation avoids the cited limitations.
Authors: The referee correctly identifies that the original methods section could benefit from greater formality. We have revised it to include: (1) a precise mathematical definition of the per-token variance feature as v_t = Var_{l=1 to L}(h_l^t), where h_l^t denotes the hidden state at layer l for token t; (2) a step-by-step derivation of the sequence aggregation, showing how feeding the full {v_t} sequence into a lightweight temporal encoder learns uncertainty patterns without presupposing monotonic or specific layer-wise trajectories; and (3) a new comparison paragraph contrasting our dispersion assumption against prior works (e.g., those assuming hidden-state convergence or last-token focus), explaining why it reduces model-specific assumptions and information loss. These additions directly address the cited limitations while remaining faithful to the original framework. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper introduces SIVR as an empirically motivated supervised framework that extracts token-wise and layer-wise variance features from LLM hidden states under the modeling assumption that uncertainty appears as dispersion across layers. No equations, derivations, or parameter-fitting steps are described that reduce any claimed prediction or detection output to a quantity defined by the method itself. The approach aggregates full-sequence variance features to learn temporal patterns and is validated through experiments against baselines, remaining self-contained without reliance on self-citations, uniqueness theorems, or ansatzes that collapse into the inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Uncertainty manifests in the degree of dispersion or variance of internal representations across layers
Reference graph
Works this paper leans on
-
[1]
Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, and 1 others. 2023. https://arxiv.org/abs/2303.08774 Gpt-4 technical report . arXiv preprint arXiv:2303.08774
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[2]
Amos Azaria and Tom Mitchell. 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.68 The internal state of an LLM knows when it ' s lying . In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 967--976, Singapore. Association for Computational Linguistics
- [3]
- [4]
- [5]
-
[6]
Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, and James R. Glass. 2024. https://doi.org/10.18653/v1/2024.emnlp-main.84 Lookback lens: Detecting and mitigating contextual hallucinations in large language models using only attention maps . In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pag...
-
[7]
Jesse Davis and Mark Goadrich. 2006. https://ftp.cs.wisc.edu/machine-learning/shavlik-group/davis.icml06.pdf The relationship between precision-recall and roc curves . In Proceedings of the 23rd international conference on Machine learning, pages 233--240
2006
-
[8]
Jinhao Duan, Hao Cheng, Shiqi Wang, Alex Zavalny, Chenan Wang, Renjing Xu, Bhavya Kailkhura, and Kaidi Xu. 2024. https://aclanthology.org/2024.acl-long.276 Shifting attention to relevance: Towards the predictive uncertainty quantification of free-form large language models . In Proceedings of the 62nd Annual Meeting of the Association for Computational Li...
2024
-
[9]
Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, and Maxim Panov. 2024. https://aclanthology.org/2024.findings-acl.558 Fact-checking the output of large language models via token-level uncertainty quantification . In F...
2024
-
[10]
Ekaterina Fadeeva, Roman Vashurin, Akim Tsvigun, Artem Vazhentsev, Sergey Petrakov, Kirill Fedyanin, Daniil Vasilev, Elizaveta Goncharova, Alexander Panchenko, Maxim Panov, Timothy Baldwin, and Artem Shelmanov. 2023. https://doi.org/10.18653/v1/2023.emnlp-demo.41 LM -polygraph: Uncertainty estimation for language models . In Proceedings of the 2023 Confer...
-
[11]
Yarin Gal and Zoubin Ghahramani. 2016. https://proceedings.mlr.press/v48/gal16.html Dropout as a bayesian approximation: Representing model uncertainty in deep learning . In international conference on machine learning, pages 1050--1059. PMLR
2016
-
[12]
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, and 1 others. 2024. https://arxiv.org/abs/2407.21783 The llama 3 herd of models . arXiv preprint arXiv:2407.21783
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[13]
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, and 1 others. 2025. https://arxiv.org/abs/2501.12948 Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning . arXiv preprint arXiv:2501.12948
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[14]
Jinwen He, Yujia Gong, Zijin Lin, Cheng ' an Wei, Yue Zhao, and Kai Chen. 2024. https://doi.org/10.18653/v1/2024.findings-acl.608 LLM factoscope: Uncovering LLM s' factual discernment through measuring inner states . In Findings of the Association for Computational Linguistics: ACL 2024, pages 10218--10230, Bangkok, Thailand. Association for Computational...
-
[15]
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2020. https://arxiv.org/abs/2009.03300 Measuring massive multitask language understanding . arXiv preprint arXiv:2009.03300
work page internal anchor Pith review arXiv 2020
-
[16]
Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. 2021. https://arxiv.org/abs/2103.03874 Measuring mathematical problem solving with the math dataset . arXiv preprint arXiv:2103.03874
work page internal anchor Pith review arXiv 2021
-
[17]
Dan Hendrycks and Kevin Gimpel. 2016. https://arxiv.org/abs/1610.02136 A baseline for detecting misclassified and out-of-distribution examples in neural networks . arXiv preprint arXiv:1610.02136
work page internal anchor Pith review arXiv 2016
- [18]
-
[19]
Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, and Pascale Fung. 2024. https://doi.org/10.18653/v1/2024.blackboxnlp-1.6 LLM internal states reveal hallucination risk faced with a query . In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 88--104, Miami, Florida, US. ...
-
[20]
Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, and Yongfeng Zhang. 2025. https://aclanthology.org/2025.coling-main.37/ Exploring concept depth: How large language models acquire knowledge and concept at different layers? In Proceedings of the 31st Intern...
2025
-
[21]
Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. 2017. https://doi.org/10.18653/v1/P17-1147 T rivia QA : A large scale distantly supervised challenge dataset for reading comprehension . In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1601--1611, Vancouver, Canada. Assoc...
-
[22]
Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, and 1 others. 2022. https://arxiv.org/abs/2207.05221 Language models (mostly) know what they know . arXiv preprint arXiv:2207.05221
work page internal anchor Pith review arXiv 2022
- [23]
-
[24]
Lorenz Kuhn, Yarin Gal, and Sebastian Farquhar. 2023. https://arxiv.org/abs/2302.09664 Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation . arXiv preprint arXiv:2302.09664
work page internal anchor Pith review arXiv 2023
-
[25]
Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. 2017. https://proceedings.neurips.cc/paper_files/paper/2017/file/9ef2ed4b7fd2c810847ffa5fa85bce38-Paper.pdf Simple and scalable predictive uncertainty estimation using deep ensembles . Advances in neural information processing systems, 30
2017
-
[26]
Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. 2018. https://proceedings.neurips.cc/paper_files/paper/2018/file/abdeb6f575ac5c6676b747bca8d09cc2-Paper.pdf A simple unified framework for detecting out-of-distribution samples and adversarial attacks . Advances in neural information processing systems, 31
2018
-
[27]
Kenneth Li, Oam Patel, Fernanda Vi \'e gas, Hanspeter Pfister, and Martin Wattenberg. 2023. https://arxiv.org/abs/2306.03341 Inference-time intervention: Eliciting truthful answers from a language model . Advances in Neural Information Processing Systems, 36:41451--41530
work page internal anchor Pith review arXiv 2023
-
[28]
Chin-Yew Lin. 2004. https://aclanthology.org/W04-1013/ ROUGE : A package for automatic evaluation of summaries . In Text Summarization Branches Out, pages 74--81, Barcelona, Spain. Association for Computational Linguistics
2004
- [29]
- [30]
-
[31]
Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. 2020. https://proceedings.neurips.cc/paper/2020/file/f5496252609c43eb8a3d147ab9b9c006-Paper.pdf Energy-based out-of-distribution detection . Advances in neural information processing systems, 33:21464--21475
2020
-
[32]
Potsawee Manakul, Adian Liusie, and Mark Gales. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.557 S elf C heck GPT : Zero-resource black-box hallucination detection for generative large language models . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9004--9017, Singapore. Association for Computational...
-
[33]
Kanti V Mardia and Peter E Jupp. 2009. https://onlinelibrary.wiley.com/doi/book/10.1002/9780470316979 Directional statistics . John Wiley & Sons
-
[34]
Samuel Marks and Max Tegmark. 2023. https://arxiv.org/abs/2310.06824 The geometry of truth: Emergent linear structure in large language model representations of true/false datasets . arXiv preprint arXiv:2310.06824
work page internal anchor Pith review arXiv 2023
-
[35]
Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. https://proceedings.neurips.cc/paper_files/paper/2022/file/6f1d43d5a82a37e89b0665b33bf3a182-Paper-Conference.pdf Locating and editing factual associations in gpt . Advances in neural information processing systems, 35:17359--17372
2022
-
[36]
Mistral AI . 2024. Ministral-8B-Instruct-2410 . https://huggingface.co/mistralai/Mistral-8B-Instruct-2410
2024
-
[37]
Ankit Pal, Logesh Kumar Umapathi, and Malaikannan Sankarasubbu. 2022. https://proceedings.mlr.press/v174/pal22a.html Medmcqa: A large-scale multi-subject multi-choice dataset for medical domain question answering . In Conference on health, inference, and learning, pages 248--260. PMLR
2022
- [38]
-
[39]
Artem Shelmanov, Ekaterina Fadeeva, Akim Tsvigun, Ivan Tsvigun, Zhuohan Xie, Igor Kiselev, Nico Daheim, Caiqi Zhang, Artem Vazhentsev, Mrinmaya Sachan, Preslav Nakov, and Timothy Baldwin. 2025. https://doi.org/10.18653/v1/2025.emnlp-main.1809 A head to predict and a head to question: Pre-trained uncertainty quantification heads for hallucination detection...
-
[40]
Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, and 1 others. 2022. https://arxiv.org/abs/2210.03057 Language models are multilingual chain-of-thought reasoners . arXiv preprint arXiv:2210.03057
-
[41]
Andy Shih, Dorsa Sadigh, and Stefano Ermon. 2023. https://proceedings.mlr.press/v202/shih23a/shih23a.pdf Long horizon temperature scaling . In International conference on machine learning, pages 31422--31434. PMLR
2023
- [42]
- [43]
-
[44]
Ponhvoan Srey, Xiaobao Wu, and Anh Tuan Luu. 2025. https://doi.org/10.18653/v1/2025.emnlp-main.1124 Unsupervised hallucination detection by inspecting reasoning processes . In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 22117--22129, Suzhou, China. Association for Computational Linguistics
-
[45]
Weihang Su, Changyue Wang, Qingyao Ai, Yiran Hu, Zhijing Wu, Yujia Zhou, and Yiqun Liu. 2024. https://doi.org/10.18653/v1/2024.findings-acl.854 Unsupervised real-time hallucination detection based on the internal states of large language models . In Findings of the Association for Computational Linguistics: ACL 2024, pages 14379--14391, Bangkok, Thailand....
-
[46]
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. https://proceedings.mlr.press/v70/sundararajan17a/sundararajan17a.pdf Axiomatic attribution for deep networks . In International conference on machine learning, pages 3319--3328. PMLR
2017
-
[47]
Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. 2019. https://doi.org/10.18653/v1/N19-1421 C ommonsense QA : A question answering challenge targeting commonsense knowledge . In Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long ...
-
[48]
Hexiang Tan, Fei Sun, Sha Liu, Du Su, Qi Cao, Xin Chen, Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, and Xueqi Cheng. 2025. https://doi.org/10.18653/v1/2025.emnlp-main.238 Too consistent to detect: A study of self-consistent errors in LLM s . In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4755--...
-
[49]
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2018. https://doi.org/10.18653/v1/N18-1074 FEVER : a large-scale dataset for fact extraction and VER ification . In Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Pap...
work page internal anchor Pith review doi:10.18653/v1/n18-1074 2018
-
[50]
Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Daniil Vasilev, Akim Tsvigun, Sergey Petrakov, Rui Xing, Abdelrahman Sadallah, Kirill Grishchenkov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, and Artem Shelmanov. 2025. https://doi.org/10.1162/tacl_a_00737 Benchmarking uncertainty quantification methods for larg...
-
[51]
Artem Vazhentsev, Ekaterina Fadeeva, Rui Xing, Gleb Kuzmin, Ivan Lazichny, Alexander Panchenko, Preslav Nakov, Timothy Baldwin, Maxim Panov, and Artem Shelmanov. 2025 a . https://doi.org/10.18653/v1/2025.emnlp-main.1807 Unconditional truthfulness: Learning unconditional uncertainty of large language models . In Proceedings of the 2025 Conference on Empiri...
-
[52]
Artem Vazhentsev, Lyudmila Rvanova, Gleb Kuzmin, Ekaterina Fadeeva, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Mrinmaya Sachan, Preslav Nakov, and 1 others. 2025 b . https://arxiv.org/abs/2505.20045 Uncertainty-aware attention heads: Efficient unsupervised uncertainty quantification for llms . arXiv preprint arXiv:2505.20045
-
[53]
Artem Vazhentsev, Lyudmila Rvanova, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Timothy Baldwin, and Artem Shelmanov. 2025 c . https://doi.org/10.18653/v1/2025.naacl-long.113 Token-level density-based uncertainty quantification methods for eliciting truthfulness of large language models . In Proceedings of the 2025 Conference of the Nations of the Am...
- [54]
-
[55]
Data mining and knowledge discovery , 33(4):917–963
Johannes Welbl, Nelson F. Liu, and Matt Gardner. 2017. https://doi.org/10.18653/v1/W17-4413 Crowdsourcing multiple choice science questions . In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 94--106, Copenhagen, Denmark. Association for Computational Linguistics
-
[56]
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, and 1 others. 2025. https://arxiv.org/abs/2505.09388 Qwen3 technical report . arXiv preprint arXiv:2505.09388
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[57]
KiYoon Yoo, Jangho Kim, Jiho Jang, and Nojun Kwak. 2022. https://doi.org/10.18653/v1/2022.findings-acl.289 Detection of adversarial examples in text classification: Benchmark and baseline via robust density estimation . In Findings of the Association for Computational Linguistics: ACL 2022, pages 3656--3672, Dublin, Ireland. Association for Computational ...
-
[58]
Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, and Luoyi Fu. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.58 Enhancing uncertainty-based hallucination detection with stronger focus . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 915--932, Singapor...
-
[59]
Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, and Shuming Shi. 2025. https://doi.org/10.1162/coli.a.16 Siren's song in the ai ocean: A survey on hallucination in large language models . Computational Linguistics, 51(4):1373--1418
-
[60]
Zhanghao Zhouyin and Ding Liu. 2025. https://doi.org/10.1016/j.neucom.2025.131520 Understanding neural networks with logarithm determinant entropy estimator . Neurocomputing, 657:131520
-
[61]
online" 'onlinestring :=
ENTRY address archivePrefix author booktitle chapter edition editor eid eprint eprinttype howpublished institution journal key month note number organization pages publisher school series title type volume year doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRING...
-
[62]
write newline
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.