arxiv: 2604.17821 · v2 · submitted 2026-04-20 · 💻 cs.AI

Recognition: unknown

WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent

Lingfeng Zhang , Yongan Sun , Jinpeng Hu , Hui Ma , Yang Ying , Kuien Liu , Zenglin Shi , Meng Wang


Pith reviewed 2026-05-10 04:47 UTC · model grok-4.3

classification 💻 cs.AI

keywords autonomous web agents · uncertainty quantification · adaptive planning · Monte Carlo tree search · large language models · web navigation · aleatoric uncertainty · epistemic uncertainty


The pith

WebUncertainty improves web agent performance by handling uncertainty in both planning and reasoning steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces WebUncertainty to let language-model agents complete complex instructions on live websites more reliably. Current agents often fail on long tasks because they use fixed plans that cannot adjust to new page states and because reasoning steps contain errors that accumulate. The framework adds one mechanism that switches planning styles according to how uncertain the overall task appears and another that runs Monte Carlo tree search while scoring each possible action for two kinds of uncertainty. Experiments show higher success rates than prior agents on the standard WebArena and WebVoyager test collections. Readers should care because more robust web agents could automate routine online work once they cope with the unpredictability of real pages.

Core claim

The paper claims that a dual-level uncertainty framework, consisting of a Task Uncertainty-Driven Adaptive Planning Mechanism together with an Action Uncertainty-Driven MCTS Reasoning Mechanism that uses the ConActU strategy to quantify aleatoric and epistemic uncertainty, produces more robust decisions and higher task-completion rates than existing rigid-planning baselines on WebArena and WebVoyager.

What carries the argument

Dual-level uncertainty mechanisms: Task Uncertainty-Driven Adaptive Planning that chooses planning modes and Action Uncertainty-Driven MCTS Reasoning that incorporates the Confidence-induced Action Uncertainty (ConActU) strategy to quantify aleatoric and epistemic uncertainty for search guidance.
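The planning half of this machinery can be pictured, under assumptions, as a threshold rule on a task-level uncertainty score. The sketch below is hypothetical: the paper's actual planning modes, its task-uncertainty estimator, and any cutoff are not described on this page, so `PlanningMode`, `task_uncertainty_from_samples`, `select_planning_mode`, and the 0.5 threshold are illustrative. It assumes task uncertainty can be approximated by disagreement among several sampled first steps.

```python
from collections import Counter
from enum import Enum


class PlanningMode(Enum):
    # Hypothetical mode names; the paper's actual planning modes are not given on this page.
    GLOBAL_PLAN = "commit to a full plan up front"
    STEPWISE = "re-plan after every new page observation"


def task_uncertainty_from_samples(sampled_first_steps: list[str]) -> float:
    """Hypothetical estimator: disagreement among k sampled first steps for the task.

    Returns 0.0 when every sample proposes the same step and grows toward 1.0
    as the samples spread over more distinct steps.
    """
    if not sampled_first_steps:
        raise ValueError("need at least one sampled step")
    top_count = Counter(sampled_first_steps).most_common(1)[0][1]
    return 1.0 - top_count / len(sampled_first_steps)


def select_planning_mode(task_uncertainty: float, threshold: float = 0.5) -> PlanningMode:
    """Assumed rule: low uncertainty -> global plan; high uncertainty -> step-wise re-planning."""
    if not 0.0 <= task_uncertainty <= 1.0:
        raise ValueError("task_uncertainty must lie in [0, 1]")
    if task_uncertainty < threshold:
        return PlanningMode.GLOBAL_PLAN
    return PlanningMode.STEPWISE


# Three of four samples agree, so uncertainty is 0.25 and the agent commits to a global plan.
mode = select_planning_mode(task_uncertainty_from_samples(["search", "search", "search", "click"]))
```

A rule of this kind spares a confident agent the cost of re-planning at every step while letting it fall back to step-wise re-planning on unfamiliar sites.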

If this is right

Planning mode selection becomes responsive to the degree of task novelty rather than fixed in advance.
MCTS search favors actions whose uncertainty scores indicate lower risk of hallucinated or unstable outcomes.
Long-horizon web tasks accumulate fewer errors because uncertainty signals interrupt poor reasoning paths early.
Benchmark scores rise because the agent avoids both overly conservative and overly reckless action sequences.
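One way an MCTS selection rule could favor lower-risk actions is to adjust the standard UCT score with the two uncertainty estimates. This is a sketch under stated assumptions, not the paper's formula: aleatoric uncertainty is treated as irreducible risk and subtracted, while epistemic uncertainty is treated as reducible and added as an exploration bonus that decays with visits. `ChildStats`, `uncertainty_uct`, `select_action`, and the weights `alpha` and `beta` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ChildStats:
    visits: int        # N(s, a): how often this action was tried from the parent
    mean_value: float  # Q(s, a): average backed-up return
    aleatoric: float   # AU estimate for the action, assumed in [0, 1]
    epistemic: float   # EU estimate for the action, assumed in [0, 1]


def uncertainty_uct(child: ChildStats, parent_visits: int,
                    c: float = 1.4, alpha: float = 0.5, beta: float = 0.5) -> float:
    """UCT score with a hypothetical uncertainty adjustment (weights are assumptions)."""
    if child.visits == 0:
        return math.inf  # expand unvisited actions first, as in plain UCT
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    risk_penalty = alpha * child.aleatoric                            # irreducible: always penalised
    curiosity_bonus = beta * child.epistemic / math.sqrt(child.visits)  # reducible: fades with visits
    return child.mean_value - risk_penalty + curiosity_bonus + explore


def select_action(children: dict[str, ChildStats], parent_visits: int) -> str:
    """Pick the child action with the highest adjusted UCT score."""
    return max(children, key=lambda a: uncertainty_uct(children[a], parent_visits))
```

With equal visit counts and value estimates, the action with lower aleatoric uncertainty is selected, and between two equally risky actions the one with higher epistemic uncertainty gets explored first. Splitting the adjustment this way follows the usual reading of the two terms: more search can reduce epistemic uncertainty but not aleatoric uncertainty.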

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same separation of uncertainty types could be tested in agents that control desktop software or mobile apps.
ConActU-style scoring might be combined with other calibration methods already used inside large language models.
Performance on very long tasks or on pages with heavy JavaScript dynamics would provide a direct stress test of the approach.
If the uncertainty estimates prove stable across sites, the method could serve as a plug-in module for other web-agent architectures.

Load-bearing premise

The ConActU strategy can reliably measure and separate aleatoric from epistemic uncertainty in live web environments so that the MCTS component makes better choices than it would without those measurements.

What would settle it

Disabling or randomizing the uncertainty estimates inside ConActU and then re-running the full system on WebArena to check whether success rates fall back to or below the level of the non-uncertainty baselines.
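The randomization control can be made concrete with a permutation: shuffle the (AU, EU) pairs across the candidate actions at each node, so the search still sees scores with the same distribution, but they no longer belong to the actions they were computed for. `randomize_uncertainty` is a hypothetical helper, and representing a node's scores as a dict from action label to an (AU, EU) tuple is an assumption.

```python
import random


def randomize_uncertainty(scores: dict[str, tuple[float, float]],
                          rng: random.Random) -> dict[str, tuple[float, float]]:
    """Ablation control: reassign (AU, EU) pairs to candidate actions at random.

    The multiset of scores is unchanged, so magnitudes stay realistic, but the
    link between each action and its own uncertainty estimate is broken.
    """
    actions = list(scores)
    values = [scores[a] for a in actions]
    rng.shuffle(values)  # in-place permutation of the score pairs
    return dict(zip(actions, values))


# Seeded RNG so a control run can be repeated exactly.
node_scores = {"type_query": (0.1, 0.2), "click_link": (0.5, 0.3), "go_back": (0.9, 0.7)}
control_scores = randomize_uncertainty(node_scores, random.Random(0))
```

Permuting rather than drawing fresh noise keeps the scale of the scores fixed, so a drop in success rate would point to the loss of the action-score correspondence rather than to a change in magnitude.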

Figures

Figures reproduced from arXiv: 2604.17821 by Hui Ma, Jinpeng Hu, Kuien Liu, Lingfeng Zhang, Meng Wang, Yang Ying, Yongan Sun, Zenglin Shi.

**Figure 2.** Overview of WebUncertainty. The framework decouples the process into the Task Uncertainty-Driven Planning Mechanism and the Action Uncertainty-Driven MCTS Reasoning Mechanism.

**Figure 3.** Ablation study using the Qwen-Max backbone.

Abstract

Recent advancements in large language models (LLMs) have empowered autonomous web agents to execute natural language instructions directly on real-world webpages. However, existing agents often struggle with complex tasks involving dynamic interactions and long-horizon execution due to rigid planning strategies and hallucination-prone reasoning. To address these limitations, we propose WebUncertainty, a novel autonomous agent framework designed to tackle dual-level uncertainty in planning and reasoning. Specifically, we design a Task Uncertainty-Driven Adaptive Planning Mechanism that adaptively selects planning modes to navigate unknown environments. Furthermore, we introduce an Action Uncertainty-Driven Monte Carlo tree search (MCTS) Reasoning Mechanism. This mechanism incorporates the Confidence-induced Action Uncertainty (ConActU) strategy to quantify both aleatoric uncertainty (AU) and epistemic uncertainty (EU), thereby optimizing the search process and guiding robust decision-making. Experimental results on the WebArena and WebVoyager benchmarks demonstrate that WebUncertainty achieves superior performance compared to state-of-the-art baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

WebUncertainty adds dual uncertainty mechanisms to web agents and shows benchmark gains, but the ConActU uncertainty separation may lack rigorous support.


The main point is that WebUncertainty uses task uncertainty to adapt its planning strategy, and action uncertainty inside an MCTS search with ConActU to handle both aleatoric and epistemic uncertainty, to improve web-agent performance. It outperforms baselines on WebArena and WebVoyager. The novelty lies in applying this dual-level approach specifically to autonomous web navigation with LLMs: the adaptive planning mechanism switches between modes in uncertain environments, and the uncertainty-guided MCTS aims to make reasoning more robust against hallucinations and dynamic page changes.

The paper's strength is its focus on practical challenges such as long-horizon tasks, tested on benchmarks that involve actual web interactions. This gives a clear picture of engineering improvements for web automation tools.

The soft spots center on the ConActU strategy. The claim that it reliably quantifies and separates the two uncertainty types in web settings is load-bearing for the contribution. If it relies primarily on LLM confidence, without additional modeling or empirical calibration checks in partially observable states, the gains might not stem from the dual-level aspect; standard MCTS or the planning module alone could explain the results instead. This stress-test concern holds unless the full paper has strong ablations or validation experiments.

This paper is for the community working on LLM agents for web tasks. Its proposal and evaluation have enough substance to go through peer review rather than be desk-rejected. I recommend sending it to referees for a closer look at the methods.

Referee Report

2 major / 2 minor

Summary. The paper proposes WebUncertainty, a framework for autonomous web agents that addresses limitations in rigid planning and hallucination-prone reasoning by introducing dual-level uncertainty handling. It features a Task Uncertainty-Driven Adaptive Planning Mechanism that selects planning modes adaptively and an Action Uncertainty-Driven MCTS Reasoning Mechanism that uses the ConActU strategy to quantify and separate aleatoric uncertainty (AU) and epistemic uncertainty (EU) for optimized search and robust decisions. Experiments on WebArena and WebVoyager benchmarks report superior performance over state-of-the-art baselines.

Significance. If the performance gains are shown to stem specifically from the uncertainty mechanisms rather than increased search budget or other factors, this work could advance reliable long-horizon web agent execution by providing a principled way to handle uncertainty in dynamic, partially observable environments. The combination of adaptive planning with MCTS guided by decomposed uncertainty estimates represents a targeted extension of existing LLM-agent techniques.

major comments (2)

§3.2 (Action Uncertainty-Driven MCTS Reasoning Mechanism): The ConActU strategy is presented as quantifying both AU and EU to optimize MCTS, but the description relies on LLM confidence scores without explicit equations, calibration against ground-truth uncertainty labels, or ablation showing that the AU/EU decomposition improves decision quality over standard MCTS or single-score baselines. This is load-bearing for the central claim that dual-level uncertainty drives the reported gains.
§4 (Experiments): The superior performance on WebArena and WebVoyager is attributed to the full framework, yet no ablation isolates the contribution of ConActU versus the adaptive planning module alone or versus MCTS with uniform exploration. Without these controls, it is impossible to confirm that the uncertainty quantification is responsible for the improvements rather than confounding factors such as expanded search budget.
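One standard calibration check that does not require ground-truth uncertainty labels is the expected calibration error (ECE) of per-action confidences measured against whether each action turned out to be correct (for example, whether the step advanced the task). This is a generic metric offered to illustrate what such a check might look like, not an analysis the paper reports; the function name and the 10-bin default are choices made here.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 falls into the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / n * abs(accuracy - mean_conf)
    return ece
```

An agent that reports 0.9 confidence on four actions of which only one succeeds gets an ECE of 0.65, while a confidence of 0.8 on five actions with four successes gives an ECE close to zero.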

minor comments (2)

[Abstract and §3] The abstract and methods should include a brief formal definition or pseudocode for how ConActU computes the AU/EU split from LLM outputs.
Figure 2 (MCTS tree illustration) would benefit from explicit labeling of how AU and EU values influence node selection and backpropagation.
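The formal definition requested above could take roughly the following shape if one adopts the formulation given later in the simulated rebuttal (AU from the normalized entropy of action-confidence distributions, EU from the sample variance across sampled LLM responses). Because the page shows no equations, several details are assumptions: each sampled response assigns a confidence to every candidate action, AU averages the per-sample normalized entropies, and EU is reported per action. `normalized_entropy` and `conactu_scores` are hypothetical names, not the paper's code.

```python
import math
import statistics


def normalized_entropy(confidences: list[float]) -> float:
    """Entropy of a confidence vector (rescaled to sum to 1) divided by log K, so it lies in [0, 1]."""
    total = sum(confidences)
    if len(confidences) < 2 or total <= 0:
        raise ValueError("need >= 2 candidate actions with positive total confidence")
    probs = [c / total for c in confidences]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def conactu_scores(samples: list[list[float]]) -> tuple[float, list[float]]:
    """Hypothetical ConActU-style scores.

    samples[s][k] is the confidence that sampled LLM response s assigns to candidate action k.
    Returns (AU, EU): AU is the mean normalized entropy over samples, and EU[k] is the
    sample variance of action k's confidence across the sampled responses.
    """
    if len(samples) < 2:
        raise ValueError("sample variance needs at least two sampled responses")
    k = len(samples[0])
    if any(len(row) != k for row in samples):
        raise ValueError("every sample must score the same candidate actions")
    au = statistics.fmean([normalized_entropy(row) for row in samples])
    eu = [statistics.variance([row[j] for row in samples]) for j in range(k)]
    return au, eu
```

For two sampled responses `[[0.9, 0.1], [0.7, 0.3]]`, AU comes out at about 0.68 and each action's EU at 0.02. Averaging per-sample entropies, rather than taking the entropy of the averaged vector, mirrors the usual decomposition in which expected entropy is read as aleatoric and disagreement between samples as epistemic.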

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address each major comment point by point below, providing clarifications on the uncertainty mechanisms and committing to revisions that strengthen the empirical support for our claims without overstating current results.


Referee: §3.2 (Action Uncertainty-Driven MCTS Reasoning Mechanism): The ConActU strategy is presented as quantifying both AU and EU to optimize MCTS, but the description relies on LLM confidence scores without explicit equations, calibration against ground-truth uncertainty labels, or ablation showing that the AU/EU decomposition improves decision quality over standard MCTS or single-score baselines. This is load-bearing for the central claim that dual-level uncertainty drives the reported gains.

Authors: We agree that Section 3.2 would benefit from greater formality. The current text describes ConActU as deriving AU from entropy over action confidence distributions and EU from variance across sampled LLM responses, but explicit equations were omitted for brevity. In the revised manuscript we will insert the precise formulations (AU as normalized entropy of the confidence vector and EU as sample variance) along with a short discussion of why direct calibration against ground-truth uncertainty labels is infeasible in partially observable web environments. We will also add a targeted ablation that replaces the AU/EU decomposition with a single aggregated confidence score inside the same MCTS budget, allowing readers to isolate the value of the decomposition. revision: yes
Referee: §4 (Experiments): The superior performance on WebArena and WebVoyager is attributed to the full framework, yet no ablation isolates the contribution of ConActU versus the adaptive planning module alone or versus MCTS with uniform exploration. Without these controls, it is impossible to confirm that the uncertainty quantification is responsible for the improvements rather than confounding factors such as expanded search budget.

Authors: We acknowledge the concern. The reported results compare the complete WebUncertainty agent against external baselines, but do not yet disentangle the two proposed modules or control for search effort. In the revision we will add three new controlled experiments: (i) adaptive planning alone (no MCTS), (ii) MCTS with uniform exploration (no ConActU), and (iii) full ConActU-MCTS, all executed under identical LLM-call and simulation budgets. These controls will be reported alongside the original tables so that the incremental contribution of each uncertainty component can be assessed directly. revision: yes
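The equal-budget requirement in these controls is easy to violate by accident, for instance when the uncertainty module issues extra sampling calls. A minimal guard is a per-arm counter that refuses work once its cap is reached. `CallBudget`, `AblationArm`, `BudgetExceeded`, the flag names, and the placeholder caps are hypothetical; only the three arms themselves come from the rebuttal.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an ablation arm tries to spend beyond its shared cap."""


@dataclass
class CallBudget:
    max_llm_calls: int
    max_simulations: int
    llm_calls: int = 0
    simulations: int = 0

    def charge_llm_call(self) -> None:
        # Every LLM call, including ConActU's extra sampled responses, is charged here.
        if self.llm_calls >= self.max_llm_calls:
            raise BudgetExceeded("LLM-call budget exhausted")
        self.llm_calls += 1

    def charge_simulation(self) -> None:
        if self.simulations >= self.max_simulations:
            raise BudgetExceeded("simulation budget exhausted")
        self.simulations += 1


@dataclass(frozen=True)
class AblationArm:
    name: str
    use_mcts: bool
    use_conactu: bool


# The three controls listed in the rebuttal; flag names are illustrative.
ARMS = [
    AblationArm("adaptive planning only", use_mcts=False, use_conactu=False),
    AblationArm("MCTS, uniform exploration", use_mcts=True, use_conactu=False),
    AblationArm("full ConActU-MCTS", use_mcts=True, use_conactu=True),
]


def fresh_budget() -> CallBudget:
    # Each arm gets an identical, freshly reset budget; the numbers are placeholders.
    return CallBudget(max_llm_calls=200, max_simulations=50)
```

Charging ConActU's sampled responses against the same `max_llm_calls` as the other arms is what makes the comparison budget-matched; an arm that hits the cap raises `BudgetExceeded`, and a harness could then commit to its best action found so far.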

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on benchmark results rather than self-referential derivation.


The paper introduces Task Uncertainty-Driven Adaptive Planning and Action Uncertainty-Driven MCTS with ConActU as novel mechanisms, then reports superior performance on WebArena and WebVoyager. No equations, parameter-fitting steps, or derivation chains are described that reduce a claimed result to its own inputs by construction. Self-citations, if present, are not load-bearing for the central performance claim, which is framed as an empirical outcome. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only view yields no visible free parameters, axioms, or invented entities beyond the named ConActU strategy; the central claim therefore rests on unstated implementation choices and benchmark-specific tuning.

invented entities (1)

ConActU strategy · no independent evidence
purpose: Quantify aleatoric and epistemic uncertainty to optimize MCTS reasoning
Introduced in the abstract as the core of the action uncertainty mechanism; no independent evidence provided.

pith-pipeline@v0.9.0 · 5481 in / 1210 out tokens · 27105 ms · 2026-05-10T04:47:47.235756+00:00 · methodology

