pith. machine review for the scientific record. sign in

arxiv: 2604.17821 · v2 · submitted 2026-04-20 · 💻 cs.AI

Recognition: unknown

WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent

Authors on Pith no claims yet

Pith reviewed 2026-05-10 04:47 UTC · model grok-4.3

classification 💻 cs.AI
keywords autonomous web agentsuncertainty quantificationadaptive planningMonte Carlo tree searchlarge language modelsweb navigationaleatoric uncertaintyepistemic uncertainty
0
0 comments X

The pith

WebUncertainty improves web agent performance by handling uncertainty in both planning and reasoning steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces WebUncertainty to let language-model agents complete complex instructions on live websites more reliably. Current agents often fail on long tasks because they use fixed plans that cannot adjust to new page states and because reasoning steps contain errors that accumulate. The framework adds one mechanism that switches planning styles according to how uncertain the overall task appears and another that runs Monte Carlo tree search while scoring each possible action for two kinds of uncertainty. Experiments show higher success rates than prior agents on the standard WebArena and WebVoyager test collections. Readers should care because more robust web agents could automate routine online work once they cope with the unpredictability of real pages.

Core claim

The paper claims that a dual-level uncertainty framework, consisting of a Task Uncertainty-Driven Adaptive Planning Mechanism together with an Action Uncertainty-Driven MCTS Reasoning Mechanism that uses the ConActU strategy to quantify aleatoric and epistemic uncertainty, produces more robust decisions and higher task-completion rates than existing rigid-planning baselines on WebArena and WebVoyager.

What carries the argument

Dual-level uncertainty mechanisms: Task Uncertainty-Driven Adaptive Planning that chooses planning modes and Action Uncertainty-Driven MCTS Reasoning that incorporates the Confidence-induced Action Uncertainty (ConActU) strategy to quantify aleatoric and epistemic uncertainty for search guidance.

If this is right

  • Planning mode selection becomes responsive to the degree of task novelty rather than fixed in advance.
  • MCTS search favors actions whose uncertainty scores indicate lower risk of hallucinated or unstable outcomes.
  • Long-horizon web tasks accumulate fewer errors because uncertainty signals interrupt poor reasoning paths early.
  • Benchmark scores rise because the agent avoids both overly conservative and overly reckless action sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation of uncertainty types could be tested in agents that control desktop software or mobile apps.
  • ConActU-style scoring might be combined with other calibration methods already used inside large language models.
  • Performance on very long tasks or on pages with heavy JavaScript dynamics would provide a direct stress test of the approach.
  • If the uncertainty estimates prove stable across sites, the method could serve as a plug-in module for other web-agent architectures.

Load-bearing premise

The ConActU strategy can reliably measure and separate aleatoric from epistemic uncertainty in live web environments so that the MCTS component makes better choices than it would without those measurements.

What would settle it

Disabling or randomizing the uncertainty estimates inside ConActU and then re-running the full system on WebArena to check whether success rates fall back to or below the level of the non-uncertainty baselines.

Figures

Figures reproduced from arXiv: 2604.17821 by Hui Ma, Jinpeng Hu, Kuien Liu, Lingfeng Zhang, Meng Wang, Yang Ying, Yongan Sun, Zenglin Shi.

Figure 1
Figure 1. Figure 1: The dual-level uncertainty challenges in complex web tasks. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of WebUncertainty. The framework decouples the process into Task Uncertainty-Driven [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Ablation study using the Qwen-Max back￾bone. Planning Mechanism and the Action Uncertainty￾Driven MCTS Reasoning Mechanism. The com￾parative results are visualized in [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
read the original abstract

Recent advancements in large language models (LLMs) have empowered autonomous web agents to execute natural language instructions directly on real-world webpages. However, existing agents often struggle with complex tasks involving dynamic interactions and long-horizon execution due to rigid planning strategies and hallucination-prone reasoning. To address these limitations, we propose WebUncertainty, a novel autonomous agent framework designed to tackle dual-level uncertainty in planning and reasoning. Specifically, we design a Task Uncertainty-Driven Adaptive Planning Mechanism that adaptively selects planning modes to navigate unknown environments. Furthermore, we introduce an Action Uncertainty-Driven Monte Carlo tree search (MCTS) Reasoning Mechanism. This mechanism incorporates the Confidence-induced Action Uncertainty (ConActU) strategy to quantify both aleatoric uncertainty (AU) and epistemic uncertainty (EU), thereby optimizing the search process and guiding robust decision-making. Experimental results on the WebArena and WebVoyager benchmarks demonstrate that WebUncertainty achieves superior performance compared to state-of-the-art baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes WebUncertainty, a framework for autonomous web agents that addresses limitations in rigid planning and hallucination-prone reasoning by introducing dual-level uncertainty handling. It features a Task Uncertainty-Driven Adaptive Planning Mechanism that selects planning modes adaptively and an Action Uncertainty-Driven MCTS Reasoning Mechanism that uses the ConActU strategy to quantify and separate aleatoric uncertainty (AU) and epistemic uncertainty (EU) for optimized search and robust decisions. Experiments on WebArena and WebVoyager benchmarks report superior performance over state-of-the-art baselines.

Significance. If the performance gains are shown to stem specifically from the uncertainty mechanisms rather than increased search budget or other factors, this work could advance reliable long-horizon web agent execution by providing a principled way to handle uncertainty in dynamic, partially observable environments. The combination of adaptive planning with MCTS guided by decomposed uncertainty estimates represents a targeted extension of existing LLM-agent techniques.

major comments (2)
  1. [§3.2] §3.2 (Action Uncertainty-Driven MCTS Reasoning Mechanism): The ConActU strategy is presented as quantifying both AU and EU to optimize MCTS, but the description relies on LLM confidence scores without explicit equations, calibration against ground-truth uncertainty labels, or ablation showing that the AU/EU decomposition improves decision quality over standard MCTS or single-score baselines. This is load-bearing for the central claim that dual-level uncertainty drives the reported gains.
  2. [§4] §4 (Experiments): The superior performance on WebArena and WebVoyager is attributed to the full framework, yet no ablation isolates the contribution of ConActU versus the adaptive planning module alone or versus MCTS with uniform exploration. Without these controls, it is impossible to confirm that the uncertainty quantification is responsible for the improvements rather than confounding factors such as expanded search budget.
minor comments (2)
  1. [Abstract and §3] The abstract and methods should include a brief formal definition or pseudocode for how ConActU computes the AU/EU split from LLM outputs.
  2. [Figure 2] Figure 2 (MCTS tree illustration) would benefit from explicit labeling of how AU and EU values influence node selection and backpropagation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address each major comment point by point below, providing clarifications on the uncertainty mechanisms and committing to revisions that strengthen the empirical support for our claims without overstating current results.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Action Uncertainty-Driven MCTS Reasoning Mechanism): The ConActU strategy is presented as quantifying both AU and EU to optimize MCTS, but the description relies on LLM confidence scores without explicit equations, calibration against ground-truth uncertainty labels, or ablation showing that the AU/EU decomposition improves decision quality over standard MCTS or single-score baselines. This is load-bearing for the central claim that dual-level uncertainty drives the reported gains.

    Authors: We agree that Section 3.2 would benefit from greater formality. The current text describes ConActU as deriving AU from entropy over action confidence distributions and EU from variance across sampled LLM responses, but explicit equations were omitted for brevity. In the revised manuscript we will insert the precise formulations (AU as normalized entropy of the confidence vector and EU as sample variance) along with a short discussion of why direct calibration against ground-truth uncertainty labels is infeasible in partially observable web environments. We will also add a targeted ablation that replaces the AU/EU decomposition with a single aggregated confidence score inside the same MCTS budget, allowing readers to isolate the value of the decomposition. revision: yes

  2. Referee: [§4] §4 (Experiments): The superior performance on WebArena and WebVoyager is attributed to the full framework, yet no ablation isolates the contribution of ConActU versus the adaptive planning module alone or versus MCTS with uniform exploration. Without these controls, it is impossible to confirm that the uncertainty quantification is responsible for the improvements rather than confounding factors such as expanded search budget.

    Authors: We acknowledge the concern. The reported results compare the complete WebUncertainty agent against external baselines, but do not yet disentangle the two proposed modules or control for search effort. In the revision we will add three new controlled experiments: (i) adaptive planning alone (no MCTS), (ii) MCTS with uniform exploration (no ConActU), and (iii) full ConActU-MCTS, all executed under identical LLM-call and simulation budgets. These controls will be reported alongside the original tables so that the incremental contribution of each uncertainty component can be assessed directly. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on benchmark results rather than self-referential derivation.

full rationale

The paper introduces Task Uncertainty-Driven Adaptive Planning and Action Uncertainty-Driven MCTS with ConActU as novel mechanisms, then reports superior performance on WebArena and WebVoyager. No equations, parameter-fitting steps, or derivation chains are described that reduce a claimed result to its own inputs by construction. Self-citations, if present, are not load-bearing for the central performance claim, which is framed as an empirical outcome. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Abstract-only view yields no visible free parameters, axioms, or invented entities beyond the named ConActU strategy; the central claim therefore rests on unstated implementation choices and benchmark-specific tuning.

invented entities (1)
  • ConActU strategy no independent evidence
    purpose: Quantify aleatoric and epistemic uncertainty to optimize MCTS reasoning
    Introduced in the abstract as the core of the action uncertainty mechanism; no independent evidence provided.

pith-pipeline@v0.9.0 · 5481 in / 1210 out tokens · 27105 ms · 2026-05-10T04:47:47.235756+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

45 extracted references · 33 canonical work pages · 1 internal anchor

  1. [1]

    Proceedings of the 41st

    Zheng, Boyuan and Gou, Boyu and Kil, Jihyung and Sun, Huan and Su, Yu , editor =. Proceedings of the 41st. 2024 , pages =

  2. [2]

    Proceedings of the 63rd

    Wu, Jialong and Yin, Wenbiao and Jiang, Yong and Wang, Zhenglin and Xi, Zekun and Fang, Runnan and Zhang, Linhai and He, Yulan and Zhou, Deyu and Xie, Pengjun and Huang, Fei , editor =. Proceedings of the 63rd. 2025 , pages =. doi:10.18653/v1/2025.acl-long.508 , abstract =

  3. [3]

    Findings of the

    Pan, Jianfeng and Deng, Senyou and Huang, Shaomang , editor =. Findings of the. 2025 , pages =. doi:10.18653/v1/2025.findings-emnlp.700 , abstract =

  4. [4]

    Proceedings of the 37th

    Deng, Xiang and Gu, Yu and Zheng, Boyuan and Chen, Shijie and Stevens, Samuel and Wang, Boshi and Sun, Huan and Su, Yu , month = dec, year =. Proceedings of the 37th

  5. [5]

    International

    Yu, Xiao and Peng, Baolin and Vajipey, Vineeth and Cheng, Hao and Galley, Michel and Gao, Jianfeng and Yu, Zhou , editor =. International. 2025 , pages =

  6. [6]

    Proceedings of the 63rd

    Huang, Tenghao and Basu, Kinjal and Abdelaziz, Ibrahim and Kapanipathi, Pavan and May, Jonathan and Chen, Muhao , editor =. Proceedings of the 63rd. 2025 , pages =. doi:10.18653/v1/2025.acl-long.1464 , abstract =

  7. [7]

    Explorer: scaling exploration-driven web trajectory synthesis for multimodal web agents , isbn =

    Pahuja, Vardaan and Lu, Yadong and Rosset, Corby and Gou, Boyu and Mitra, Arindam and Whitehead, Spencer and Su, Yu and Awadallah, Ahmed Hassan , editor =. Explorer: scaling exploration-driven web trajectory synthesis for multimodal web agents , isbn =. Findings of the. 2025 , pages =. doi:10.18653/v1/2025.findings-acl.326 , abstract =

  8. [8]

    and Zhou, Shuyan and Neubig, Graham , editor =

    Song, Yueqi and Xu, Frank F. and Zhou, Shuyan and Neubig, Graham , editor =. Beyond browsing:. Findings of the. 2025 , pages =. doi:10.18653/v1/2025.findings-acl.577 , abstract =

  9. [9]

    Proceedings of the 42nd

    Erdogan, Lutfi Eren and Lee, Nicholas and Kim, Sehoon and Moon, Suhong and Furuta, Hiroki and Anumanchipalli, Gopala and Keutzer, Kurt and Gholami, Amir , month = jul, year =. Proceedings of the 42nd

  10. [10]

    Yang, Ke and Liu, Yao and Chaudhary, Sapana and Fakoor, Rasool and Chaudhari, Pratik and Karypis, George and Rangwala, Huzefa , year =. The

  11. [11]

    Proceedings of the

    Niu, Runliang and Li, Jindong and Wang, Shiqi and Fu, Yali and Hu, Xiyu and Leng, Xueyuan and Kong, He and Chang, Yi and Wang, Qi , editor =. Proceedings of the. 2024 , pages =. doi:10.24963/ijcai.2024/711 , language =

  12. [12]

    International

    Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Ou, Tianyue and Bisk, Yonatan and Fried, Daniel and Alon, Uri and Neubig, Graham , editor =. International. 2024 , pages =

  13. [13]

    A real-world WebAgent with planning, long context understanding, and program synthesis.arXiv preprint arXiv:2307.12856, 2023

    Gur, Izzeddin and Furuta, Hiroki and Huang, Austin and Safdari, Mustafa and Matsuo, Yutaka and Eck, Douglas and Faust, Aleksandra , editor =. A real-world. International. 2024 , pages =. doi:10.48550/arXiv.2307.12856 , language =

  14. [14]

    Multimodal web navigation with instruction-finetuned foundation models , volume =

    Furuta, Hiroki and Lee, Kuang-Huei and Nachum, Ofir and Matsuo, Yutaka and Faust, Aleksandra and Gu, Shixiang and Gur, Izzeddin , editor =. Multimodal web navigation with instruction-finetuned foundation models , volume =. International. 2024 , pages =

  15. [15]

    Large language models as commonsense knowledge for large-scale task planning , volume =

    Zhao, Zirui and Lee, Wee Sun and Hsu, David , editor =. Large language models as commonsense knowledge for large-scale task planning , volume =. Advances in. 2023 , pages =

  16. [16]

    Reinforcement learning on web interfaces using workflow-guided exploration , url =

    Liu, Evan Zheran and Guu, Kelvin and Pasupat, Panupong and Shi, Tianlin and Liang, Percy , month = feb, year =. Reinforcement learning on web interfaces using workflow-guided exploration , url =. International

  17. [17]

    Sodhi, Paloma and Branavan, S. R. K. and Artzi, Yoav and McDonald, Ryan , month = aug, year =. First

  18. [18]

    Large language model powered agents in the web , isbn =

    Deng, Yang and Zhang, An and Lin, Yankai and Chen, Xu and Wen, Ji-Rong and Chua, Tat-Seng , month = may, year =. Large language model powered agents in the web , isbn =. Companion. doi:10.1145/3589335.3641240 , abstract =

  19. [19]

    Abuelsaad, Tamer and Akkil, Deepak and Dey, Prasenjit and Jagmohan, Ashish and Vempaty, Aditya and Kokku, Ravi , month = jul, year =. Agent-. doi:10.48550/arXiv.2407.13032 , abstract =

  20. [20]

    Journal of Computer Science and Technology , author =

    General framework of. Journal of Computer Science and Technology , author =. doi:10.1007/s11390-025-5951-5 , abstract =

  21. [21]

    A zero-shot language agent for computer control with structured reflection , url =

    Li, Tao and Li, Gang and Deng, Zhiwei and Wang, Bryan and Li, Yang , editor =. A zero-shot language agent for computer control with structured reflection , url =. Findings of the. 2023 , pages =. doi:10.18653/v1/2023.findings-emnlp.753 , abstract =

  22. [22]

    URL https://doi.org/10.18653/v1/2024.acl-long.371

    He, Hongliang and Yao, Wenlin and Ma, Kaixin and Yu, Wenhao and Dai, Yong and Zhang, Hongming and Lan, Zhenzhong and Yu, Dong , year =. Proceedings of the 62nd. doi:10.18653/v1/2024.acl-long.371 , language =

  23. [23]

    Language agent tree search unifies reasoning, acting, and planning in language models , volume =

    Zhou, Andy and Yan, Kai and Shlapentokh-Rothman, Michal and Wang, Haohan and Wang, Yu-Xiong , month = jul, year =. Language agent tree search unifies reasoning, acting, and planning in language models , volume =. Proceedings of the 41st

  24. [24]

    Uncertainty propagation on

    Zhao, Qiwei and Li, Dong and Liu, Yanchi and Cheng, Wei and Sun, Yiyou and Oishi, Mika and Osaki, Takao and Matsuda, Katsushi and Yao, Huaxiu and Zhao, Chen and Chen, Haifeng and Zhao, Xujiang , year =. Uncertainty propagation on. Proceedings of the 63rd. doi:10.18653/v1/2025.acl-long.302 , language =

  25. [25]

    doi:10.48550/ARXIV.2603.20366 , abstract =

    Zhang, Xuanwang and Han, Yuteng and Qi, Jinnan and Xie, Mulong and Wu, Zhen and Dai, Xinyu , year =. doi:10.48550/ARXIV.2603.20366 , abstract =

  26. [26]

    Science China Information Sciences , author =

    The rise and potential of large language model based agents: a survey , volume =. Science China Information Sciences , author =. 2025 , pages =. doi:10.1007/s11432-024-4222-0 , abstract =

  27. [27]

    Ask-before-plan: proactive language agents for real-world planning , shorttitle =

    Zhang, Xuan and Deng, Yang and Ren, Zifeng and Ng, See-Kiong and Chua, Tat-Seng , year =. Ask-before-plan: proactive language agents for real-world planning , shorttitle =. Findings of the. doi:10.18653/v1/2024.findings-emnlp.636 , language =

  28. [28]

    MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

    Gupta, Tanmay and Wolters, Piper and Ma, Zixian and Sushko, Peter and Pang, Rock Yuren and Llanes, Diego and Yang, Yue and Anderson, Taira and Zheng, Boyuan and Ren, Zhongzheng and Trivedi, Harsh and Blanton, Taylor and Ouellette, Caleb and Han, Winson and Farhadi, Ali and Krishna, Ranjay , month = apr, year =. doi:10.48550/arXiv.2604.08516 , abstract =

  29. [29]

    Proceedings of the

    Zhang, Yao and Ma, Zijian and Ma, Yunpu and Han, Zhen and Wu, Yu and Tresp, Volker , month = feb, year =. Proceedings of the. doi:10.1609/aaai.v39i22.34505 , abstract =

  30. [30]

    Rethinking external slow-thinking: from snowball errors to probability of correct reasoning , issn =

    Gan, Zeyu and Liao, Yun and Liu, Yong , month = oct, year =. Rethinking external slow-thinking: from snowball errors to probability of correct reasoning , issn =. Proceedings of the 42nd

  31. [31]

    Tree Search for Language Model Agents,

    Koh, Jing Yu and McAleer, Stephen and Fried, Daniel and Salakhutdinov, Ruslan , year =. Tree search for language model agents , copyright =. doi:10.48550/ARXIV.2407.01476 , abstract =

  32. [32]

    arXiv preprint arXiv:2601.12538 , year=

    Wei, Tianxin and Li, Ting-Wei and Liu, Zhining and Ning, Xuying and Yang, Ze and Zou, Jiaru and Zeng, Zhichen and Qiu, Ruizhong and Lin, Xiao and Fu, Dongqi and Li, Zihao and Ai, Mengting and Zhou, Duo and Bao, Wenxuan and Li, Yunzhe and Li, Gaotang and Qian, Cheng and Wang, Yu and Tang, Xiangru and Xiao, Yin and Fang, Liri and Liu, Hui and Tang, Xianfeng...

  33. [33]

    doi:10.48550/ARXIV.2603.12710 , abstract =

    Shahnovsky, Orit and Dror, Rotem , year =. doi:10.48550/ARXIV.2603.12710 , abstract =

  34. [34]

    Agentic web: weaving the next web with

    Yang, Yingxuan and Ma, Mulei and Huang, Yuxuan and Chai, Huacan and Gong, Chenyu and Geng, Haoran and Zhou, Yuanjian and Wen, Ying and Fang, Meng and Chen, Muhao and Gu, Shangding and Jin, Ming and Spanos, Costas and Yang, Yang and Abbeel, Pieter and Song, Dawn and Zhang, Weinan and Wang, Jun , year =. Agentic web: weaving the next web with. doi:10.48550/...

  35. [35]

    Proceedings of the 63rd

    Hu, Xueyu and Xiong, Tao and Yi, Biao and Wei, Zishu and Xiao, Ruixuan and Chen, Yurun and Ye, Jiasheng and Tao, Meiling and Zhou, Xiangxin and Zhao, Ziyu and Li, Yuhuai and Xu, Shengze and Wang, Shenzhi and Xu, Xinchen and Qiao, Shuofei and Wang, Zhaokai and Kuang, Kun and Zeng, Tieyong and Wang, Liang and Li, Jiwei and Jiang, Yuchen Eleanor and Zhou, Wa...

  36. [36]

    and Li, Qing , month = aug, year =

    Ning, Liangbo and Liang, Ziran and Jiang, Zhuohang and Qu, Haohao and Ding, Yujuan and Fan, Wenqi and Wei, Xiao-yong and Lin, Shanru and Liu, Hui and Yu, Philip S. and Li, Qing , month = aug, year =. A survey of. Proceedings of the 31st. doi:10.1145/3711896.3736555 , language =

  37. [37]

    Weihua Cheng, Junming Liu, Yifei Sun, Botian Shi, Yirong Chen, and Ding Wang

    Nguyen, Dang and Chen, Jian and Wang, Yu and Wu, Gang and Park, Namyong and Hu, Zhengmian and Lyu, Hanjia and Wu, Junda and Aponte, Ryan and Xia, Yu and Li, Xintong and Shi, Jing and Chen, Hongjie and Lai, Viet Dac and Xie, Zhouhang and Kim, Sungchul and Zhang, Ruiyi and Yu, Tong and Tanjim, Mehrab and Ahmed, Nesreen K. and Mathur, Puneet and Yoon, Seungh...

  38. [38]

    2025 , pages =

    ACM Transactions on Information Systems , author =. 2025 , pages =. doi:10.1145/3729421 , abstract =

  39. [39]

    doi: 10.1145/3789261

    A survey on the optimization of large language model-based agents , volume =. ACM Computing Surveys , author =. 2026 , pages =. doi:10.1145/3789261 , abstract =

  40. [40]

    From system 1 to system 2: A survey of reasoning large language models.IEEE Trans

    From system 1 to system 2: a survey of reasoning large language models , volume =. IEEE Transactions on Pattern Analysis and Machine Intelligence , author =. 2026 , pages =. doi:10.1109/TPAMI.2025.3637037 , language =

  41. [41]

    Proceedings of the 36th

    Pu, Kevin and Yang, Jim and Yuan, Angel and Ma, Minyi and Dong, Rui and Wang, Xinyu and Chen, Yan and Grossman, Tovi , month = oct, year =. Proceedings of the 36th. doi:10.1145/3586183.3606822 , language =

  42. [42]

    Browsing like human: a multimodal web agent with experiential fast-and-slow thinking , shorttitle =

    Luo, Haohao and Kuang, Jiayi and Liu, Wei and Shen, Ying and Luan, Jian and Deng, Yang , year =. Browsing like human: a multimodal web agent with experiential fast-and-slow thinking , shorttitle =. Proceedings of the 63rd. doi:10.18653/v1/2025.acl-long.697 , language =

  43. [43]

    Proceedings of the 63rd

    He, Hongliang and Yao, Wenlin and Ma, Kaixin and Yu, Wenhao and Zhang, Hongming and Fang, Tianqing and Lan, Zhenzhong and Yu, Dong , year =. Proceedings of the 63rd. doi:10.18653/v1/2025.acl-long.1336 , language =

  44. [44]

    In: Che, W., Nabende, J., Shutova, E., Pilehvar, M.T

    Xia, Zhiqiu and Xu, Jinxuan and Zhang, Yuqian and Liu, Hang , year =. A survey of uncertainty estimation methods on large language models , url =. Findings of the. doi:10.18653/v1/2025.findings-acl.1101 , language =

  45. [45]

    Estimating llm uncertainty with evidence.arXiv preprint arXiv:2502.00290, 2025

    Ma, Huan and Chen, Jingdong and Zhou, Joey Tianyi and Wang, Guangyu and Zhang, Changqing , month = may, year =. Estimating. doi:10.48550/arXiv.2502.00290 , abstract =