BrowseConf: Confidence-guided test-time scaling for web agents
2 Pith papers cite this work.
years: 2026 (2)
verdicts: unverdicted (2)
citing papers
- Uncertainty Decomposition for Clarification Seeking in LLM Agents
  A prompt-based uncertainty decomposition separates action confidence from request uncertainty to enable clarification seeking in LLM agents, yielding F1 gains of 73% and 36% over baselines on two new underspecified benchmarks across five models.
- Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents
  Mobile-Aptus uses supervised fine-tuning followed by semantic similarity retrieval and direct preference optimization to calibrate confidence scores in mobile agents, yielding over 17% average task success improvement on four benchmarks.