pith. machine review for the scientific record.

arxiv: 2507.14958 · v3 · submitted 2025-07-20 · 💻 cs.CL

Recognition: unknown

MUR: Momentum Uncertainty guided Reasoning

Fangzhi Xu, Haiteng Zhao, Hang Yan, Haoran Luo, Jian Zhang, Jun Liu, Luu Anh Tuan, Qika Lin, Rongman Xu, Xiaobao Wu, Yifei Li

keywords reasoning, momentum, current, models, scaling, support, test-time, uncertainty

Current models have achieved impressive performance on reasoning-intensive tasks, yet optimizing their reasoning efficiency remains an open challenge. While Test-Time Scaling (TTS) improves reasoning quality, it often leads to overthinking, wasting tokens on redundant computations. This work investigates how to efficiently and adaptively guide current models' test-time scaling without additional training. Inspired by the concept of momentum in physics, we propose Momentum Uncertainty-guided Reasoning (MUR), which dynamically allocates thinking budgets to critical reasoning steps by tracking and aggregating stepwise uncertainty over time. To support flexible inference-time control, we introduce gamma-control, a simple mechanism that tunes the reasoning budget via a single hyperparameter. We provide an in-depth theoretical proof supporting the superiority of MUR in terms of stability and bias. MUR is comprehensively evaluated against various TTS methods across four challenging benchmarks (MATH-500, AIME24, AIME25, and GPQA-diamond) using different sizes of recent Qwen3 models (1.7B, 4B, and 8B). Results demonstrate that MUR reduces computation by over 45% on average while improving accuracy by 0.33 to 3.46%.
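The idea of tracking stepwise uncertainty with momentum can be illustrated with a minimal sketch. The abstract does not state the update rule or the trigger condition, so everything below is an assumption made for illustration: momentum is modeled as an exponential moving average with a hypothetical decay `alpha`, and a step is flagged for extra thinking when its uncertainty exceeds the running momentum divided by a hypothetical `gamma` in (0, 1]. The names `MomentumTracker` and `flag_steps` are invented here and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MomentumTracker:
    """Illustrative tracker; the update and trigger rules are assumptions."""
    alpha: float = 0.9   # assumed decay of the moving average
    gamma: float = 0.9   # assumed budget knob: smaller gamma flags fewer steps
    momentum: Optional[float] = None

    def step(self, uncertainty: float) -> bool:
        """Return True if this step should receive extra compute."""
        if self.momentum is None:
            # First step: nothing to compare against, just initialize.
            self.momentum = uncertainty
            return False
        # Compare against the momentum accumulated over *previous* steps.
        trigger = uncertainty > self.momentum / self.gamma
        # Fold the current step into the running momentum.
        self.momentum = self.alpha * self.momentum + (1 - self.alpha) * uncertainty
        return trigger


def flag_steps(uncertainties: List[float], alpha: float = 0.9,
               gamma: float = 0.9) -> List[bool]:
    """Flag which steps in a sequence of uncertainty scores would be scaled."""
    tracker = MomentumTracker(alpha=alpha, gamma=gamma)
    return [tracker.step(u) for u in uncertainties]


if __name__ == "__main__":
    # Hypothetical per-step uncertainty scores (e.g. mean token negative log-likelihood).
    print(flag_steps([1.0, 1.0, 2.0, 1.0], alpha=0.5, gamma=0.9))
```

Under these assumptions, lowering `gamma` raises the threshold a step must cross, so fewer steps are flagged and less compute is spent; this is one plausible reading of a single-hyperparameter budget control, not a statement of the authors' exact mechanism.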

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

    cs.CR 2026-05 unverdicted novelty 6.0

    Embedding-based defenses fail against attacks that align malicious message embeddings with benign ones in LLM multi-agent systems, but token-level confidence scores improve robustness by enabling better pruning of sus...

  2. Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images

    cs.CV 2026-04 unverdicted novelty 5.0

    TTSP resolves the Grounding Paradox by treating perception as a scalable test-time process that generates, filters, and iteratively refines multiple visual exploration traces, outperforming baselines on high-resolutio...