Uncertainty Propagation in LLM-Based Systems
Pith reviewed 2026-05-08 06:06 UTC · model grok-4.3
The pith
Uncertainty in LLM-based systems propagates and compounds across model internals, workflows, components, state, and human processes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deployed LLM applications are compound systems in which uncertainty is transformed and reused across model internals, workflow stages, component boundaries, persistent state, and human or organisational processes. Without principled treatment of how uncertainty is carried and reused across these boundaries, early errors can propagate and compound in ways that are difficult to detect and govern. The paper develops a systems-level account: it introduces a conceptual framing for characterising propagated uncertainty signals, presents a structured taxonomy spanning intra-model (P1), system-level (P2), and socio-technical (P3) propagation mechanisms, synthesises cross-cutting engineering insights, and identifies five open research challenges.
What carries the argument
A structured taxonomy of uncertainty propagation mechanisms divided into intra-model (P1), system-level (P2), and socio-technical (P3) categories, supported by a conceptual framing for characterising propagated uncertainty signals.
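The paper is conceptual and defines these categories in prose, not code. Purely as a reading aid, here is a minimal Python sketch of how the P1/P2/P3 labels and a propagated uncertainty signal might be represented; every name in it (PropagationLevel, UncertaintySignal, the lineage field) is an invention of this review, not the paper's.

```python
# Hypothetical sketch: the paper defines P1/P2/P3 conceptually, not as code.
from dataclasses import dataclass, field
from enum import Enum


class PropagationLevel(Enum):
    P1_INTRA_MODEL = "intra-model"          # e.g. token-level entropy inside one model
    P2_SYSTEM_LEVEL = "system-level"        # e.g. retrieval scores crossing component boundaries
    P3_SOCIO_TECHNICAL = "socio-technical"  # e.g. humans reusing uncertain outputs


@dataclass
class UncertaintySignal:
    """One propagated uncertainty signal, in the spirit of the paper's framing."""
    value: float                  # a scalar confidence or uncertainty estimate
    level: PropagationLevel       # taxonomy category that produced the signal
    source: str                   # component or workflow stage that emitted it
    lineage: list[str] = field(default_factory=list)  # boundaries crossed so far
```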
If this is right
- Early errors originating inside models can be carried forward through workflow stages and stored in persistent state, producing compounded downstream effects (a toy compounding calculation follows this list).
- System-level mechanisms allow uncertainty to cross component boundaries inside compound applications.
- Socio-technical mechanisms incorporate how humans and organisations receive and reuse uncertain outputs.
- Cross-cutting engineering insights can inform practices for tracking and mitigating propagation.
- Five named open research challenges must be addressed to achieve reliable governance of uncertainty in LLM systems.
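The compounding claim in the first bullet has a familiar back-of-envelope form. Under the simplest toy model, which the paper does not commit to, of independent per-stage errors with probability ε across n stages:

```latex
% Toy independence model; not the paper's analysis.
P(\text{at least one error after } n \text{ stages})
  = 1 - (1 - \varepsilon)^{n} \approx n\varepsilon
  \quad \text{for } \varepsilon \ll 1.
% Example: \varepsilon = 0.02, n = 10 gives 1 - 0.98^{10} \approx 0.18.
```

Persistent state and feedback loops break the independence assumption, so real pipelines can do better or worse than this estimate; the toy form only illustrates how quickly small per-stage errors accumulate.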
Where Pith is reading between the lines
- The same propagation lens could be applied to other multi-component AI systems such as agent frameworks or retrieval-augmented setups.
- System designers might add explicit uncertainty provenance logs that follow signals across workflow and state boundaries (see the sketch after this list).
- Governance policies for AI could shift from focusing solely on model accuracy to monitoring flows through socio-technical layers.
- Evaluation suites could include synthetic propagation scenarios to test whether a given architecture allows early errors to remain hidden.
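The provenance-log speculation in the second bullet is concrete enough to sketch. Below is a hedged, self-contained Python illustration of an append-only log that records each boundary an uncertainty value crosses; it is not anything the paper proposes, and every name in it is hypothetical.

```python
# Hypothetical uncertainty provenance log; illustrative only, not from the paper.
import json
import time


class UncertaintyProvenanceLog:
    """Append-only record of uncertainty signals crossing system boundaries."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def record(self, value: float, level: str, source: str, boundary: str) -> None:
        # One entry per boundary crossing, so a later audit can replay
        # how an uncertainty value moved through the compound system.
        self._entries.append({
            "ts": time.time(),
            "value": value,        # e.g. a token entropy or retrieval score
            "level": level,        # "P1", "P2", or "P3" per the taxonomy
            "source": source,      # component that emitted the signal
            "boundary": boundary,  # boundary just crossed
        })

    def dump(self) -> str:
        return json.dumps(self._entries, indent=2)


# A retriever handing a low-confidence passage to the generator might log:
log = UncertaintyProvenanceLog()
log.record(value=0.31, level="P2", source="retriever", boundary="retriever->generator")
print(log.dump())
```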
Load-bearing premise
Uncertainty in deployed LLM applications is routinely transformed and reused across model internals, workflow stages, component boundaries, persistent state, and human or organisational processes, in ways that require a principled systems-level treatment beyond single-model analysis.
What would settle it
A set of measurements or case studies of deployed LLM applications showing that uncertainty signals do not measurably transform or propagate across the described boundaries in ways that affect detection or governance.
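Reading that operationally, one could simulate the null hypothesis. The toy harness below, invented entirely for this review, measures how often a stage-one error survives into stage-two output under a tunable amplification factor; if measured systems behaved like amplification = 1 (downstream failures independent of upstream errors), the load-bearing premise would lose force.

```python
# Invented synthetic propagation check; not the paper's methodology.
import random


def propagated_error_rate(p_err_stage1: float, amplification: float,
                          trials: int = 100_000) -> float:
    """Fraction of runs where a stage-1 error co-occurs with a stage-2 error.

    amplification > 1 models compounding: stage 2 fails more often when it
    consumes an erroneous stage-1 artifact. amplification == 1 models a
    boundary that fully absorbs upstream uncertainty.
    """
    random.seed(0)
    base_p2 = 0.05  # stage-2 intrinsic error rate
    propagated = 0
    for _ in range(trials):
        stage1_bad = random.random() < p_err_stage1
        p2 = min(1.0, base_p2 * amplification) if stage1_bad else base_p2
        if stage1_bad and random.random() < p2:
            propagated += 1
    return propagated / trials


# Compare the compounding and absorbing regimes:
print(propagated_error_rate(0.1, amplification=4.0))  # ~0.02  (0.1 * 0.20)
print(propagated_error_rate(0.1, amplification=1.0))  # ~0.005 (0.1 * 0.05)
```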
Original abstract
Uncertainty in large language model (LLM)-based systems is often studied at the level of a single model output, yet deployed LLM applications are compound systems in which uncertainty is transformed and reused across model internals, workflow stages, component boundaries, persistent state, and human or organisational processes. Without principled treatment of how uncertainty is carried and reused across these boundaries, early errors can propagate and compound in ways that are difficult to detect and govern. This paper develops a systems-level account of uncertainty propagation. It introduces a conceptual framing for characterising propagated uncertainty signals, presents a structured taxonomy spanning intra-model (P1), system-level (P2), and socio-technical (P3) propagation mechanisms, synthesises cross-cutting engineering insights, and identifies five open research challenges.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that uncertainty in LLM-based systems is transformed and reused across model internals, workflow stages, component boundaries, persistent state, and socio-technical processes, necessitating a systems-level account beyond single-model analysis. It introduces a conceptual framing for 'propagated uncertainty signals' and a taxonomy of propagation mechanisms divided into intra-model (P1), system-level (P2), and socio-technical (P3) categories, followed by cross-cutting engineering insights and five open research challenges.
Significance. If the taxonomy holds, the work provides a useful organizing lens for an emerging area, synthesizing observations about compound LLM systems and directing attention to propagation across boundaries. Strengths include the clear distinction among the three levels and the explicit listing of open challenges to guide follow-on research. As a purely conceptual contribution without empirical validation, formal derivations, or data, its significance will depend on community adoption and subsequent testing of the proposed structure.
minor comments (2)
- The abstract refers to 'five open research challenges' without enumerating them; including a brief list would improve the summary's standalone value.
- The terms 'propagated uncertainty signals' and the P1/P2/P3 labels are central to the taxonomy; ensure they receive explicit, early definitions with concrete examples in the introduction or framing section to aid reader comprehension.
Simulated Author's Rebuttal
We thank the referee for their positive review and recommendation of minor revision. We appreciate the recognition that the work offers a useful organizing lens through its three-level taxonomy and explicit open challenges.
Circularity Check
No significant circularity in conceptual taxonomy and framing
full rationale
The paper develops a systems-level conceptual framing and structured taxonomy for uncertainty propagation across intra-model (P1), system-level (P2), and socio-technical (P3) mechanisms, followed by engineering insights and open challenges. No equations, derivations, fitted parameters, or mathematical reductions appear in the manuscript. The contribution is a synthesis motivated by the observation that uncertainty crosses boundaries in compound LLM systems, with definitions, distinctions, and examples supplied directly rather than derived from prior self-citations or internal fits. This is a standard non-circular outcome for a taxonomy-style proposal whose central claim reduces only to coherent presentation of the framing.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Uncertainty in LLM-based systems is transformed and reused across model internals, workflow stages, component boundaries, persistent state, and human or organisational processes.
invented entities (2)
- Propagated uncertainty signals (no independent evidence)
- P1 intra-model, P2 system-level, P3 socio-technical propagation mechanisms (no independent evidence)