ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling
Large language models (LLMs) excel at function calling, but inference scaling has been explored mainly for unstructured generation. We propose an inference-scaling framework for structured outputs that combines fine-grained beam search with ToolPRM, a process reward model that scores each intra-call decision (function-name selection and argument filling). We build the first fine-grained intra-call supervision dataset via function masking, rollout collection, and step-level annotation. ToolPRM outperforms outcome and coarse-grained reward models in predictive accuracy and yields consistent test-time gains on multiple function-calling benchmarks. We further show that structured generation follows an "explore more but retain less" pattern, since early JSON errors are unrecoverable.
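
The search procedure described in the abstract is straightforward to sketch. Below is a minimal, hypothetical Python illustration, not the paper's implementation: a stubbed process reward model scores each intra-call decision (function name, then each argument), and a wide expansion width paired with a small retained beam reflects the "explore more but retain less" finding, since a malformed early step cannot be repaired downstream. The helper names (`prm_score`, `propose_candidates`) and the toy schema are assumptions for illustration only.

```python
# Hypothetical sketch of fine-grained beam search guided by a step-level
# reward model. ToolPRM is stubbed out; a real PRM would be a trained model.
from dataclasses import dataclass, field


@dataclass
class PartialCall:
    """A partially filled function call: name plus argument assignments."""
    name: str | None = None
    args: dict[str, str] = field(default_factory=dict)
    score: float = 0.0  # cumulative process-reward score


def prm_score(query: str, call: PartialCall) -> float:
    """Stub for the PRM: score the most recent intra-call decision.

    Placeholder heuristic: reward lexical overlap between the query and
    the latest filled field (function name or newest argument value).
    """
    latest = call.name if not call.args else list(call.args.values())[-1]
    return sum(1.0 for tok in (latest or "").lower().split("_")
               if tok in query.lower())


def propose_candidates(step: str, schema: dict) -> list[str]:
    """Enumerate candidate values for the current decision (toy version)."""
    return schema[step]


def fine_grained_beam_search(query, schema, steps, expand_k=8, beam_k=2):
    """Beam search over intra-call decisions, scored step by step.

    expand_k >> beam_k implements "explore more but retain less": sample
    widely at each structured step but prune hard, because an early
    structural error (e.g. invalid JSON) is unrecoverable later.
    """
    beam = [PartialCall()]
    for step in steps:
        expanded = []
        for call in beam:
            for value in propose_candidates(step, schema)[:expand_k]:
                child = PartialCall(name=call.name, args=dict(call.args),
                                    score=call.score)
                if step == "name":
                    child.name = value
                else:
                    child.args[step] = value
                child.score += prm_score(query, child)  # fine-grained reward
                expanded.append(child)
        # Retain only the top beam_k partial calls at every decision.
        beam = sorted(expanded, key=lambda c: c.score, reverse=True)[:beam_k]
    return beam[0]


if __name__ == "__main__":
    toy_schema = {
        "name": ["get_weather", "get_stock_price", "send_email"],
        "location": ["Paris", "Tokyo"],
        "unit": ["celsius", "fahrenheit"],
    }
    best = fine_grained_beam_search(
        "what is the weather in Paris in celsius",
        toy_schema,
        steps=["name", "location", "unit"],
    )
    print(best.name, best.args)  # get_weather {'location': 'Paris', 'unit': 'celsius'}
```

The asymmetry between `expand_k` and `beam_k` is the point of the sketch: widening exploration at each structured step is cheap, while retaining many partially invalid candidates wastes budget on continuations that can never become valid calls.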
Forward citations
Cited by 1 Pith paper
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
DataPRM is a new process reward model for data analysis agents that detects silent errors via environment interaction and ternary rewards, yielding 7-11% gains on benchmarks and further improvements when used for reinforcement learning.