Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models. However, existing methods typically allocate compute uniformly across all queries, overlooking variation in query difficulty. To address this inefficiency, we formulate test-time compute allocation as a novel bandit learning problem and propose adaptive algorithms that estimate query difficulty on the fly and allocate compute accordingly. Compared to uniform allocation, our algorithms allocate more compute to challenging queries while maintaining accuracy on easier ones. Among challenging queries, our algorithms further learn to prioritize solvable instances, effectively reducing excessive computation on unsolvable queries. We theoretically prove that our algorithms achieve better compute efficiency than uniform allocation and empirically validate their effectiveness on math and code benchmarks. Specifically, our algorithms improve performance by up to 11.10% absolute (15.04% relative) on MATH-500, 10.82% (14.44% relative) on AIME25, and 11.23% (15.29% relative) on LiveCodeBench.
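To make the allocation idea concrete, here is a minimal sketch of how a bandit-style allocator of this kind could work. It is not the paper's algorithm: the `sample_fn` interface, the agreement-based difficulty proxy, and the UCB-style scoring are all assumptions made for illustration.

```python
import math
from collections import Counter

def allocate_samples(queries, sample_fn, total_budget, min_samples=2, c=1.0):
    """Bandit-style allocation of a sampling budget across queries.

    Each query is treated as an arm. After a small warm start, each
    remaining sample goes to the query with the highest score, which
    combines an estimated marginal value of another sample with a
    UCB-style exploration bonus.
    """
    answers = {q: [] for q in queries}

    # Warm start: a few samples per query for a rough difficulty estimate.
    for q in queries:
        for _ in range(min_samples):
            answers[q].append(sample_fn(q))
    spent = min_samples * len(queries)

    while spent < total_budget:
        t = spent + 1
        best_q, best_score = None, -float("inf")
        for q in queries:
            n = len(answers[q])
            top = Counter(answers[q]).most_common(1)[0][1]
            agreement = top / n  # high agreement ~ easy, low ~ hard
            # 4 * p * (1 - p) peaks at p = 0.5: favor queries that are
            # uncertain (hard) but not hopeless (some candidate leads).
            value = 4.0 * agreement * (1.0 - agreement)
            score = value + c * math.sqrt(math.log(t) / n)
            if score > best_score:
                best_q, best_score = q, score
        answers[best_q].append(sample_fn(best_q))
        spent += 1

    # Final answer per query by majority vote over its samples.
    return {q: Counter(a).most_common(1)[0][0] for q, a in answers.items()}
```

The value term 4p(1-p) deprioritizes both easy queries (agreement near 1) and queries whose samples never agree (agreement near 1/n), loosely mirroring the abstract's claim that the algorithms favor challenging yet solvable instances; the paper's actual difficulty estimator and guarantees may differ.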
Forward citations
Cited by 1 Pith paper
- Active Testing of Large Language Models via Approximate Neyman Allocation
Active testing via surrogate semantic entropy stratification and approximate Neyman allocation reduces MSE by up to 28% versus uniform sampling and saves about 23% of the labeling budget on language and multimodal benchmarks.