ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling
Large language models (LLMs) excel at function calling, but inference scaling has been explored mainly for unstructured generation. We propose an inference-scaling framework for structured outputs that combines fine-grained beam search with ToolPRM, a process reward model that scores each intra-call decision (function-name selection and argument filling). We build the first fine-grained intra-call supervision dataset via function masking, rollout collection, and step-level annotation. ToolPRM outperforms outcome and coarse-grained reward models in predictive accuracy and yields consistent test-time gains on multiple function-calling benchmarks. We further show that structured generation follows an "explore more but retain less" pattern, since early JSON errors are unrecoverable.
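
The search procedure described in the abstract is straightforward to sketch. Below is a minimal, hypothetical Python illustration, not the paper's implementation: a stubbed process reward model scores each intra-call decision (function name, then each argument), and a wide expansion width paired with a small retained beam reflects the "explore more but retain less" finding, since a malformed early step cannot be repaired downstream. The helper names (`prm_score`, `propose_candidates`) and the toy schema are assumptions for illustration only.

```python
# Hypothetical sketch of fine-grained beam search guided by a step-level
# reward model. ToolPRM is stubbed out; a real PRM would be a trained model.
from dataclasses import dataclass, field


@dataclass
class PartialCall:
    """A partially filled function call: name plus argument assignments."""
    name: str | None = None
    args: dict[str, str] = field(default_factory=dict)
    score: float = 0.0  # cumulative process-reward score


def prm_score(query: str, call: PartialCall) -> float:
    """Stub for the PRM: score the most recent intra-call decision.

    Placeholder heuristic: reward lexical overlap between the query and
    the latest filled field (function name or newest argument value).
    """
    latest = call.name if not call.args else list(call.args.values())[-1]
    return sum(1.0 for tok in (latest or "").lower().split("_")
               if tok in query.lower())


def propose_candidates(step: str, schema: dict) -> list[str]:
    """Enumerate candidate values for the current decision (toy version)."""
    return schema[step]


def fine_grained_beam_search(query, schema, steps, expand_k=8, beam_k=2):
    """Beam search over intra-call decisions, scored step by step.

    expand_k >> beam_k implements "explore more but retain less": sample
    widely at each structured step but prune hard, because an early
    structural error (e.g. invalid JSON) is unrecoverable later.
    """
    beam = [PartialCall()]
    for step in steps:
        expanded = []
        for call in beam:
            for value in propose_candidates(step, schema)[:expand_k]:
                child = PartialCall(name=call.name, args=dict(call.args),
                                    score=call.score)
                if step == "name":
                    child.name = value
                else:
                    child.args[step] = value
                child.score += prm_score(query, child)  # fine-grained reward
                expanded.append(child)
        # Retain only the top beam_k partial calls at every decision.
        beam = sorted(expanded, key=lambda c: c.score, reverse=True)[:beam_k]
    return beam[0]


if __name__ == "__main__":
    toy_schema = {
        "name": ["get_weather", "get_stock_price", "send_email"],
        "location": ["Paris", "Tokyo"],
        "unit": ["celsius", "fahrenheit"],
    }
    best = fine_grained_beam_search(
        "what is the weather in Paris in celsius",
        toy_schema,
        steps=["name", "location", "unit"],
    )
    print(best.name, best.args)  # get_weather {'location': 'Paris', 'unit': 'celsius'}
```

The asymmetry between `expand_k` and `beam_k` is the point of the sketch: widening exploration at each structured step is cheap, while retaining many partially invalid candidates wastes budget on continuations that can never become valid calls.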
Forward citations
Cited by 1 Pith paper
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
DataPRM is a new process reward model for data analysis agents that detects silent errors via environment interaction and ternary rewards, yielding 7-11% gains on benchmarks and further improvements when used for reinforcement learning.