Recognition: unknown
Structured Uncertainty guided Clarification for LLM Agents
read the original abstract
LLM agents with tool-calling capabilities often fail when user instructions are ambiguous or incomplete, leading to incorrect invocations and task failures. Existing approaches operate in unstructured language spaces, generating clarifying questions through prompting strategies that lack principled criteria for determining which questions to ask and when to stop. We introduce a principled formulation of structured uncertainty that operates directly over tool parameters and their domains, cleanly separating specification uncertainty (what the user wants) from model uncertainty (what the LLM predicts). Our formulation uses Expected Value of Perfect Information (EVPI) to quantify the disambiguation value of each potential question, balanced against aspect-based cost modeling that prevents redundant questioning. We demonstrate the versatility of this formulation through two applications. First, SAGE-Agent uses structured uncertainty for inference-time question selection, achieving 7-39% higher coverage on ambiguous tasks while reducing clarification questions by 1.5-2.7x compared to strong prompting and uncertainty-based baselines. Second, we show that structured uncertainty provides effective training signals: uncertainty-guided reward modeling boosts When2Call accuracy from 36.5% to 65.2% (3B model) and 36.7% to 62.9% (7B model) through uncertainty-weighted GRPO training, demonstrating more sample-efficient reinforcement learning for tool-calling agents. To enable evaluation, we present ClarifyBench, the first multi-turn dynamic tool-calling disambiguation benchmark. Our results establish structured uncertainty as a principled framework that improves both inference-time interaction efficiency and training-time sample efficiency in tool-augmented agents.
This paper has not been read by Pith yet.
Forward citations
Cited by 5 Pith papers
-
Uncertainty Propagation in LLM-Based Systems
This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insight...
-
Don't Start What You Can't Finish: A Counterfactual Audit of Support-State Triage in LLM Agents
LLM agents overcommit on non-complete tasks at 41.7% unless given explicit support-state categories, which raise typed deferral accuracy to 91.7%.
-
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?
HiL-Bench shows frontier AI agents fail to ask for help on incomplete tasks, recovering only a fraction of full-information performance, but RL training on Ask-F1 reward improves judgment and transfers across domains.
-
Agentic Coding Needs Proactivity, Not Just Autonomy
Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.
-
Position: agentic AI orchestration should be Bayes-consistent
Agentic AI orchestration should apply Bayesian principles for belief maintenance, updating from interactions, and utility-based action selection.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.