pith. machine review for the scientific record.

arXiv: 2503.16419 · v4 · submitted 2025-03-20 · 💻 cs.CL

Recognition: 3 theorem links

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Andrew Wen, Guanchu Wang, Hanjie Chen, Hongyi Liu, Jiamu Zhang, Jiayi Yuan, Na Zou, Shaochen Zhong, Tianyi Zhang, Xia Hu, Yang Sui, Yu-Neng Chuang

Pith reviewed 2026-05-14 01:24 UTC · model grok-4.3

classification 💻 cs.CL
keywords: efficient reasoning, large language models, chain-of-thought, overthinking, survey, model optimization, inference efficiency, small language models

The pith

A survey organizes methods that make reasoning in large language models more efficient by reducing overthinking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents itself as the first structured survey of efficient reasoning in large language models. It addresses the overthinking phenomenon, in which longer chain-of-thought sequences boost performance but add unnecessary computational overhead. The work sorts methods into model-based, output-based, and prompt-based efficient reasoning, and also covers efficient training data and small language models. A sympathetic reader would care because these approaches promise strong reasoning at lower cost and latency, which could broaden the use of advanced LLMs in everyday applications.

Core claim

The paper offers what it calls the first structured survey of progress toward efficient reasoning in LLMs. It sorts existing work into three directions: model-based efficient reasoning, which compresses full-length reasoning models into more concise ones or trains efficient models directly; reasoning output-based efficient reasoning, which dynamically reduces the number and length of reasoning steps during inference; and input prompt-based efficient reasoning, which improves efficiency using properties of the input prompt such as difficulty or explicit length control. It also introduces efficient data for training and explores reasoning in small language models.

What carries the argument

The three-way categorization (model-based, output-based, prompt-based) that structures the survey of techniques to reduce verbose and redundant chain-of-thought outputs in LLMs.

If this is right

  • Researchers gain a map for identifying patterns and unexplored areas in efficient reasoning.
  • Model-based methods can produce LLMs that generate concise reasoning by design through optimization or new training.
  • Output-based techniques allow inference-time trimming of reasoning length to trade off accuracy against compute.
  • Prompt-based controls can steer models toward shorter paths based on task difficulty or length signals.
  • Efficient data and small-model explorations extend concise reasoning beyond the largest models.
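
To make the output-based bullet concrete, here is a minimal Python sketch of one simple inference-time stopping rule, answer-consistency early exit. It assumes a hypothetical interface in which a tentative answer can be extracted after every reasoning step. The function name `early_exit_answer` and the `patience` and `max_steps` parameters are illustrative choices, not taken from the survey or any paper it cites.

```python
from typing import Iterable, Optional, Tuple


def early_exit_answer(
    step_answers: Iterable[Optional[str]],
    patience: int = 3,
    max_steps: int = 64,
) -> Tuple[Optional[str], int]:
    """Stop reading reasoning steps once the tentative answer is stable.

    step_answers yields the answer extracted after each reasoning step,
    or None when no answer can be extracted yet. Returns the last
    tentative answer and the number of steps consumed.
    """
    last: Optional[str] = None
    streak = 0  # consecutive steps that produced the same non-None answer
    steps = 0
    for answer in step_answers:
        steps += 1
        if answer is not None and answer == last:
            streak += 1
        else:
            # A new (or missing) answer restarts the count.
            streak = 1 if answer is not None else 0
        last = answer
        # Exit when the answer has repeated `patience` times in a row,
        # or when the hard step budget is exhausted.
        if streak >= patience or steps >= max_steps:
            break
    return last, steps
```

For example, the stream `["12", "15", "15", "15", "15", "15"]` stops after the fourth step with `patience=3`, so the remaining steps are never consumed. A larger `patience` spends more compute in exchange for more confidence that the answer has settled.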

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Hybrid systems that combine techniques from multiple categories could achieve greater efficiency than any single category alone.
  • Standardized metrics for measuring reasoning redundancy and cost would make cross-paper comparisons more reliable.
  • These efficiency methods may prove useful for deploying reasoning models on edge devices or in low-latency settings.
  • Future surveys could track how quickly new papers fit or expand the proposed categories.

Load-bearing premise

The three-way categorization, together with the sections on data and small models, covers the relevant literature comprehensively, without major omissions.

What would settle it

A substantial set of papers on efficient LLM reasoning that cannot be placed into the model-based, output-based, or prompt-based categories or the additional data and small-model sections.

Original abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by harnessing supervised fine-tuning (SFT) and reinforcement learning (RL) techniques to enhance the Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, known as the "overthinking phenomenon". In this paper, we provide the first structured survey to systematically investigate and explore the current progress toward achieving efficient reasoning in LLMs. Overall, relying on the inherent mechanism of LLMs, we categorize existing works into several key directions: (1) model-based efficient reasoning, which considers optimizing full-length reasoning models into more concise reasoning models or directly training efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning, which seeks to enhance reasoning efficiency based on input prompt properties such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking. Project website: https://github.com/Eclipsess/Awesome-Efficient-Reasoning-LLMs
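
The prompt-based direction in item (3) can be illustrated with a small Python sketch that scales an explicit token budget with estimated difficulty and writes that budget into the prompt. Everything here is an assumption for illustration: the difficulty score is taken as given from some external estimator, the linear mapping and the 64/1024 bounds are arbitrary, and the names `token_budget` and `budgeted_prompt` and the wording of the budget instruction are hypothetical rather than a prompt from the surveyed papers.

```python
def token_budget(difficulty: float, min_budget: int = 64, max_budget: int = 1024) -> int:
    """Map an estimated difficulty in [0, 1] to a reasoning-token budget."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    # Linear interpolation: easy questions get the small budget,
    # the hardest questions get the large one.
    return round(min_budget + difficulty * (max_budget - min_budget))


def budgeted_prompt(question: str, difficulty: float) -> str:
    """Append an explicit length constraint to the question."""
    budget = token_budget(difficulty)
    return (
        f"{question}\n"
        f"Let's think step by step and use at most {budget} tokens."
    )
```

For a difficulty of 0.5 the sketch requests at most 544 tokens, halfway between the two bounds. Whether a model actually respects such a stated budget is an empirical question that this code does not address.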

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript is a survey on efficient reasoning in LLMs that addresses the overthinking phenomenon in extended chain-of-thought outputs from models such as OpenAI o1 and DeepSeek-R1. It organizes the literature into a three-way taxonomy of model-based methods (optimizing or training concise reasoning models), output-based methods (dynamically shortening reasoning steps at inference), and prompt-based methods (controlling efficiency via input properties), while also covering efficient training data, reasoning capabilities of small models, and evaluation/benchmarking practices.

Significance. If the taxonomy proves comprehensive, the survey supplies a timely organizing framework for an active research area focused on reducing computational cost while preserving reasoning performance. This can help consolidate disparate lines of work on CoT efficiency and guide development of practical LRMs.

major comments (1)
  1. [Abstract] The central claim that the work is 'the first structured survey to systematically investigate and explore the current progress' is load-bearing for the contribution. The manuscript does not include an explicit comparison to prior surveys on LLM reasoning or efficiency, leaving the novelty and completeness assertions unsubstantiated.
minor comments (1)
  1. [Taxonomy] The boundaries between the model-based, output-based, and prompt-based categories should be clarified, with explicit discussion of hybrid approaches that may straddle more than one category.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of our survey's timeliness and for recommending minor revision. We address the single major comment below and will incorporate the suggested changes to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the work is 'the first structured survey to systematically investigate and explore the current progress' is load-bearing for the contribution. The manuscript does not include an explicit comparison to prior surveys on LLM reasoning or efficiency, leaving the novelty and completeness assertions unsubstantiated.

    Authors: We agree that an explicit comparison to prior surveys is needed to fully substantiate the novelty claim. While our survey is the first to systematically organize techniques specifically targeting the overthinking phenomenon in LRMs via the proposed three-way taxonomy (model-based, output-based, and prompt-based), we acknowledge that the manuscript would benefit from a direct comparison. In the revised version, we will add a dedicated subsection (or comparison table) in the introduction that contrasts our work with existing surveys on LLM reasoning (e.g., those covering Chain-of-Thought and general reasoning) and on efficiency methods. This will highlight our unique focus on computational overhead reduction while preserving performance, as well as our coverage of efficient training data, small-model reasoning, and benchmarking. We will also revise the abstract to reference this addition and, if appropriate, qualify the 'first' phrasing to emphasize scope. This change directly addresses the concern. revision: yes

Circularity Check

0 steps flagged

No circularity: survey taxonomy is an external literature organization

This is a literature review with no derivations, equations, predictions, or fitted quantities of any kind. The three-way categorization (model-based, output-based, prompt-based), together with the data and small-model sections, is presented as an organizational framework grounded in the inherent mechanisms of LLMs, with external works cited for each direction. No step reduces by construction to a self-definition, a fitted input renamed as a prediction, or a self-citation chain that forces the central claim. The assertion of systematic coverage is an empirical claim about the field rather than an internal reduction, so the paper is self-contained against external benchmarks, and its circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey the paper introduces no new free parameters, axioms, or invented entities; it reviews prior literature on LLM reasoning efficiency.

pith-pipeline@v0.9.0 · 5593 in / 915 out tokens · 41540 ms · 2026-05-14T01:24:38.308812+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Cost.FunctionalEquation: Jcost_nonneg, Jcost_pos_of_ne_one (echoes)

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, known as the 'overthinking phenomenon'. Efficient reasoning... offers practical benefits such as reduced computational costs

  • Foundation.LawOfExistence: defect_zero_iff_one, nothing_cannot_exist (echoes)

    categorize existing works into... (1) model-based efficient reasoning... (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning

  • Foundation.DiscretenessForcing: J_log_quadratic_approx, J_log_pos_off_zero (echoes)

    RL with Length Reward Design... length reward assigns higher scores to short, correct answers while penalizing lengthy or incorrect ones
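
The quoted passage on RL with length reward design describes a reward shape rather than a formula. Below is a minimal Python sketch of one way to realize it, assuming responses are scored in groups sampled for the same prompt. The ±0.5 scaling, the min-max normalization over the group, and the name `length_rewards` are illustrative assumptions, not the survey's definition.

```python
from typing import List, Sequence


def length_rewards(lengths: Sequence[int], correct: Sequence[bool]) -> List[float]:
    """Score a group of sampled responses to the same prompt.

    Shorter responses in the group get a larger length term; correct
    responses keep it (positive or negative), while incorrect responses
    can only be penalized by it, never rewarded.
    """
    if len(lengths) != len(correct):
        raise ValueError("lengths and correct must have the same size")
    if not lengths:
        return []
    lo, hi = min(lengths), max(lengths)
    span = hi - lo
    rewards = []
    for n, ok in zip(lengths, correct):
        # lam is +0.5 for the shortest response and -0.5 for the longest;
        # if all lengths are equal, length carries no signal.
        lam = 0.5 - (n - lo) / span if span else 0.0
        rewards.append(lam if ok else min(0.0, lam))
    return rewards
```

With lengths `[100, 200, 300]` and correctness `[True, True, False]` the sketch returns `[0.5, 0.0, -0.5]`: the shortest correct answer scores highest, and the long incorrect one is penalized. In an RL setup this term would presumably be added to a separate correctness reward.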

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

    cs.CL 2026-05 unverdicted novelty 7.0

    RL improves LLM reasoning by sparse policy selection at high-entropy tokens rather than new capability learning, and a minimal RL-free method matches its gains at three orders of magnitude lower cost.

  2. Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost

    cs.AI 2026-05 conditional novelty 7.0

    Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.

  3. LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning

    cs.AI 2026-04 unverdicted novelty 7.0

    LLM+ASP framework enables task-agnostic nonmonotonic reasoning by having LLMs generate and self-correct ASP programs using solver feedback, outperforming SMT alternatives on diverse benchmarks.

  4. LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction

    cs.IR 2026-04 unverdicted novelty 7.0

    LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.

  5. Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

    cs.AI 2026-05 unverdicted novelty 6.0

    RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.

  6. Weighted Rules under the Stable Model Semantics

    cs.AI 2026-05 unverdicted novelty 6.0

    Weighted rules extend stable model semantics to support probabilistic reasoning, model ranking, and statistical inference in answer set programs.

  7. Hint Tuning: Less Data Makes Better Reasoners

    cs.CL 2026-05 unverdicted novelty 6.0

    Hint Tuning uses an instruct model as a difficulty probe to create 1K multi-level hint examples that train reasoning models to calibrate chain-of-thought length, cutting tokens by 31.5% on average across 4B-32B models...

  8. Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training

    cs.AI 2026-05 unverdicted novelty 6.0

    ICR creates a virtual shorter distribution from shortest correct on-policy responses to regularize RL post-training toward concise yet accurate reasoning, improving the accuracy-length Pareto frontier on math and know...

  9. Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

    cs.CL 2026-05 unverdicted novelty 6.0

    RL for LLM reasoning acts as sparse policy selection at high-entropy tokens already present in the base model, enabling ReasonMaxxer—an efficient contrastive method that recovers most RL gains at three orders of magni...

  10. A Multimodal Dataset for Visually Grounded Ambiguity in Machine Translation

    cs.CL 2026-05 unverdicted novelty 6.0

    VIDA provides 2,500 visually-dependent ambiguous MT instances and LLM-judge metrics; chain-of-thought SFT improves disambiguation accuracy over standard SFT, especially out-of-distribution.

  11. When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

    cs.CL 2026-05 unverdicted novelty 6.0

    LLM accuracy on controlled procedural arithmetic drops from 61% at 5 steps to 20% at 95 steps, with failures including skipped steps, premature answers, and hallucinated operations.

  12. QuantClaw: Precision Where It Matters for OpenClaw

    cs.AI 2026-04 unverdicted novelty 6.0

    QuantClaw dynamically routes precision in agent workflows to cut cost by up to 21.4% and latency by 15.7% while keeping or improving task performance.

  13. HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

    cs.AI 2026-04 unverdicted novelty 6.0

    HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.

  14. Pause or Fabricate? Training Language Models for Grounded Reasoning

    cs.CL 2026-04 conditional novelty 6.0

    GRIL uses stage-specific RL rewards to train LLMs to detect missing premises, pause proactively, and resume grounded reasoning after clarification, yielding up to 45% better premise detection and 30% higher task succe...

  15. CRISP: Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning

    cs.CL 2026-04 unverdicted novelty 6.0

    CRISP compresses chain-of-thought by 50-60% using intrinsic attention saliency from the termination token to prune redundancy while preserving accuracy on math tasks.

  16. Think Less, Know More: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning

    cs.CL 2026-04 unverdicted novelty 6.0

    STACK reduces average reasoning response length by 59.9% and raises accuracy by 4.8 points over prior methods on three math benchmarks via state-aware compression, knowledge guidance, and early stopping.

  17. ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning

    cs.AI 2026-04 unverdicted novelty 6.0

    ETR is a trajectory-aware reward that promotes progressive entropy reduction during CoT reasoning, integrated into GRPO to deliver higher accuracy and 67% shorter traces on tested models and benchmarks.

  18. Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression

    cs.CL 2026-04 unverdicted novelty 6.0

    CoT compression frequently introduces trustworthiness regressions with method-specific degradation profiles; a proposed normalized efficiency score and alignment-aware DPO variant reduce length by 19.3% with smaller t...

  19. How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

    cs.AI 2026-05 unverdicted novelty 5.0

    Non-reasoning LLMs fail the equivalence class problem while reasoning LLMs perform better but remain incomplete, with difficulty peaking at phase transition for the former and maximum diameter for the latter.

  20. DIAURec: Dual-Intent Space Representation Optimization for Recommendation

    cs.IR 2026-04 unverdicted novelty 5.0

    DIAURec unifies intent and language modeling to reconstruct and optimize representations in prototype and distribution spaces, outperforming baselines on three datasets.

  21. Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA

    cs.IR 2026-04 unverdicted novelty 3.0

    A pipeline of dataset construction from prior work, AugFC parameter augmentation, and two-step LLM training improves function calling for financial APIs and is running in production.

Reference graph

Works this paper leans on

251 extracted references · 251 canonical work pages · cited by 20 Pith papers · 25 internal anchors

  1. [1]

    First finish search: Efficient test-time scaling in large language models, 2025

    Aradhye Agarwal, Ayan Sengupta, and Tanmoy Chakraborty. First finish search: Efficient test-time scaling in large language models, 2025. 4

  2. [2]

    L1: Controlling how long a reasoning model thinks with reinforcement learning. arXiv preprint arXiv:2503.04697, 2025

    Pranjal Aggarwal and Sean Welleck. L1: Controlling how long a reasoning model thinks with reinforcement learning. arXiv preprint arXiv:2503.04697, 2025. 4, 6, 7

  3. [3]

    Don’t think longer, think wisely: Optimizing thinking dynamics for large reasoning models, 2025

    Sohyun An, Ruochen Wang, Tianyi Zhou, and Cho-Jui Hsieh. Don’t think longer, think wisely: Optimizing thinking dynamics for large reasoning models, 2025. 4

  4. [4]

    Claude 3.7 sonnet, 2023

    Anthropic. Claude 3.7 sonnet, 2023. Accessed: March 10, 2025. 4, 15

  5. [5]

    Training language models to reason efficiently

    Daman Arora and Andrea Zanette. Training language models to reason efficiently. arXiv preprint arXiv:2502.04463, 2025. 4, 6

  6. [6]

    Sketch-of-thought: Efficient llm reasoning with adaptive cognitive-inspired sketching

    Simon A Aytes, Jinheon Baek, and Sung Ju Hwang. Sketch-of-thought: Efficient llm reasoning with adaptive cognitive-inspired sketching. arXiv preprint arXiv:2503.05179, 2025. 4, 15

  7. [7]

    Activation steering for chain-of-thought compression, 2025

    Seyedarmin Azizi, Erfan Baghaei Potraghloo, and Massoud Pedram. Activation steering for chain-of-thought compression, 2025. 4

  8. [8]

    Scaling test-time compute with open models

    Edward Beeching, Lewis Tunstall, and Sasha Rush. Scaling test-time compute with open models. 11

  9. [9]

    Graph of thoughts: Solving elaborate problems with large language models

    Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, et al. Graph of thoughts: Solving elaborate problems with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17682–17690, 2024. 3

  10. [10]

    SPECS: Faster test-time scaling through speculative drafts, 2025

    Mert Cemri, Nived Rajaraman, Rishabh Tiwari, Xiaoxuan Liu, Kurt Keutzer, Ion Stoica, Kannan Ramchandran, Ahmad Beirami, and Ziteng Sun. SPECS: Faster test-time scaling through speculative drafts, 2025. 4

  11. [11]

    Evaluating Large Language Models Trained on Code

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021. 1

  12. [12]

    Aware first, think less: Dynamic boundary self-awareness drives extreme reasoning efficiency in large language models, 2025

    Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, and Wanxiang Che. Aware first, think less: Dynamic boundary self-awareness drives extreme reasoning efficiency in large language models, 2025. 4

  13. [13]

    Unlocking the capabilities of thought: A reasoning boundary framework to quantify and optimize chain-of-thought

    Qiguang Chen, Libo Qin, Jiaqi Wang, Jingxuan Zhou, and Wanxiang Che. Unlocking the capabilities of thought: A reasoning boundary framework to quantify and optimize chain-of-thought. Advances in Neural Information Processing Systems, 37:54872–54904, 2024. 4, 15

  14. [14]

    Seal: Steerable reasoning calibration of large language models for free

    Runjin Chen, Zhenyu Zhang, Junyuan Hong, Souvik Kundu, and Zhangyang Wang. Seal: Steerable reasoning calibration of large language models for free. arXiv preprint arXiv:2504.07986,

  15. [15]

    Distilling reasoning ability from large language models with adaptive thinking

    Xiaoshu Chen, Sihang Zhou, Ke Liang, and Xinwang Liu. Distilling reasoning ability from large language models with adaptive thinking. arXiv preprint arXiv:2404.09170, 2024. 4, 17

  16. [16]

    Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

    Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, et al. Do not think that much for 2+ 3=? on the overthinking of o1-like llms. arXiv preprint arXiv:2412.21187, 2024. 2, 5

  17. [17]

    Inner thinking transformer: Leveraging dynamic depth scaling to foster adaptive internal thinking, 2025

    Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, and Haifeng Wang. Inner thinking transformer: Leveraging dynamic depth scaling to foster adaptive internal thinking. arXiv preprint arXiv:2502.13842,

  18. [18]

    R-stitch: Dynamic trajectory stitching for efficient reasoning, 2025

    Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, and Bohan Zhuang. R-stitch: Dynamic trajectory stitching for efficient reasoning, 2025. 4

  19. [19]

    Verithinker: Learning to verify makes reasoning model efficient

    Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, and Xinchao Wang. Verithinker: Learning to verify makes reasoning model efficient. arXiv preprint arXiv:2505.17941, 2025. 4

  20. [20]

    Compressed chain of thought: Efficient reasoning through dense representations. arXiv preprint arXiv:2412.13171, 2024

    Jeffrey Cheng and Benjamin Van Durme. Compressed chain of thought: Efficient reasoning through dense representations. arXiv preprint arXiv:2412.13171, 2024. 4, 10

  21. [21]

    Incentivizing dual process thinking for efficient large language model reasoning

    Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, and Zhiqiang Zhang. Incentivizing dual process thinking for efficient large language model reasoning. arXiv preprint arXiv:2505.16315, 2025. 4

  22. [22]

    Optimizing length compression in large reasoning models, 2025

    Zhengxiang Cheng, Dongping Chen, Mingyang Fu, and Tianyi Zhou. Optimizing length compression in large reasoning models, 2025. 4

  23. [23]

    Mixed distillation helps smaller language models reason better

    Li Chenglin, Qianglong Chen, Liangyue Li, Caiyu Wang, Feng Tao, Yicheng Li, Zulong Chen, and Yin Zhang. Mixed distillation helps smaller language models reason better. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1673–1690, 2024. 4, 17

  24. [24]

    Confident or seek stronger: Exploring uncertainty-based on-device llm routing from benchmarking to generalization, 2025

    Yu-Neng Chuang, Leisheng Yu, Guanchu Wang, Lizhe Zhang, Zirui Liu, Xuanting Cai, Yang Sui, Vladimir Braverman, and Xia Hu. Confident or seek stronger: Exploring uncertainty-based on-device llm routing from benchmarking to generalization, 2025. 4, 15

  25. [25]

    Learning to route llms with confidence tokens, 2025

    Yu-Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, and Xia Hu. Learning to route llms with confidence tokens, 2025. 4, 15

  26. [26]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021. 1

  27. [27]

    Codeforces - competitive programming platform, 2025

    Codeforces. Codeforces - competitive programming platform, 2025. Accessed: 2025-03-18. 1

  28. [28]

    Efficient selectivity and backup operators in monte-carlo tree search

    Rémi Coulom. Efficient selectivity and backup operators in monte-carlo tree search. In International conference on computers and games, pages 72–83. Springer, 2006. 5

  29. [29]

    The danger of overthinking: Examining the reasoning-action dilemma in agentic tasks, 2025

    Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, and Joseph E. Gonzalez. The danger of overthinking: Examining the reasoning-action dilemma in agentic tasks, 2025. 4, 18

  30. [30]

    A survey on multimodal large language models for autonomous driving

    Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, et al. A survey on multimodal large language models for autonomous driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 958–979, 2024. 18

  31. [31]

    Stepwise perplexity-guided refinement for efficient chain-of-thought reasoning in large language models

    Yingqian Cui, Pengfei He, Jingying Zeng, Hui Liu, Xianfeng Tang, Zhenwei Dai, Yan Han, Chen Luo, Jing Huang, Zhen Li, et al. Stepwise perplexity-guided refinement for efficient chain-of-thought reasoning in large language models. arXiv preprint arXiv:2502.13260, 2025. 4

  32. [32]

    Stable reinforcement learning for efficient reasoning

    Muzhi Dai, Shixuan Liu, and Qingyi Si. Stable reinforcement learning for efficient reasoning,

  33. [33]

    S-grpo: Early exit via reinforcement learning in reasoning models. arXiv preprint arXiv:2505.07686, 2025

    Muzhi Dai, Chenxu Yang, and Qingyi Si. S-grpo: Early exit via reinforcement learning in reasoning models. arXiv preprint arXiv:2505.07686, 2025. 4

  34. [34]

    From explicit cot to implicit cot: Learning to internalize cot step by step

    Yuntian Deng, Yejin Choi, and Stuart Shieber. From explicit cot to implicit cot: Learning to internalize cot step by step. arXiv preprint arXiv:2405.14838, 2024. 9

  35. [35]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019. 1

  36. [36]

    Do thinking tokens help or trap? towards more efficient large reasoning model, 2025

    Bowen Ding, Yuhan Chen, Futing Wang, Lingfeng Ming, and Tao Lin. Do thinking tokens help or trap? towards more efficient large reasoning model, 2025. 4

  37. [37]

    Best-route: Adaptive llm routing with test-time optimal compute, 2025

    Dujian Ding, Ankur Mallick, Shaokun Zhang, Chi Wang, Daniel Madrigal, Mirian Del Carmen Hipolito Garcia, Menglin Xia, Laks V. S. Lakshmanan, Qingyun Wu, and Victor Rühle. Best-route: Adaptive llm routing with test-time optimal compute, 2025. 4

  38. [38]

    Dynamic parallel tree search for efficient llm reasoning

    Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, et al. Dynamic parallel tree search for efficient llm reasoning. arXiv preprint arXiv:2502.16235, 2025. 4, 11, 12

  39. [39]

    A survey of embodied ai: From simulators to research tasks. IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022

    Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, and Cheston Tan. A survey of embodied ai: From simulators to research tasks. IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022. 18

  40. [40]

    Conciserl: Conciseness-guided reinforcement learning for efficient reasoning models

    Razvan-Gabriel Dumitru, Darius Peteleaza, Vikas Yadav, and Liangming Pan. Conciserl: Conciseness-guided reinforcement learning for efficient reasoning models. arXiv preprint arXiv:2505.17250, 2025. 4

  41. [41]

    Overclocking llm reasoning: Monitoring and controlling thinking path lengths in llms, 2025

    Roy Eisenstadt, Itamar Zimerman, and Lior Wolf. Overclocking llm reasoning: Monitoring and controlling thinking path lengths in llms, 2025. 4

  42. [42]

    Debate only when necessary: Adaptive multiagent collaboration for efficient llm reasoning, 2025

    Sugyeong Eo, Hyeonseok Moon, Evelyn Hayoon Zi, Chanjun Park, and Heuiseok Lim. Debate only when necessary: Adaptive multiagent collaboration for efficient llm reasoning, 2025. 20

  43. [43]

    Chenrui Fan, Ming Li, Lichao Sun, and Tianyi Zhou. Missing premise exacerbates overthinking: Are reasoning models losing critical thinking skill? arXiv preprint arXiv:2504.06514, 2025.

  44. [44]

    Siqi Fan, Peng Han, Shuo Shang, Yequan Wang, and Aixin Sun. CoThink: Token-efficient reasoning via instruct models guiding reasoning models, 2025.

  45. [45]

    Gongfan Fang, Xinyin Ma, and Xinchao Wang. Thinkless: LLM learns when to think, 2025.

  46. [46]

    Junfeng Fang, Yukai Wang, Ruipeng Wang, Zijun Yao, Kun Wang, An Zhang, Xiang Wang, and Tat-Seng Chua. SafeMLRM: Demystifying safety in multi-modal large reasoning models. arXiv preprint arXiv:2504.08813, 2025.

  47. [47]

    Mehdi Fatemi, Banafsheh Rafiee, Mingjie Tang, and Kartik Talamadupula. Concise reasoning via reinforcement learning. arXiv preprint arXiv:2504.05185, 2025.

  48. [48]

    Tao Feng, Yicheng Li, Li Chenglin, Hao Chen, Fei Yu, and Yin Zhang. Teaching small language models reasoning through counterfactual distillation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5831–5842, 2024.

  49. [49]

    Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. GPTQ: Accurate post-training quantization for generative pre-trained transformers. In The Eleventh International Conference on Learning Representations. OpenReview, 2023.

  50. [50]

    Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Aurick Qiao, and Hao Zhang. Efficiently serving LLM reasoning programs with Certaindex. arXiv preprint arXiv:2412.20993, 2024.

  51. [51]

    Yichao Fu, Junda Chen, Yonghao Zhuang, Zheyu Fu, Ion Stoica, and Hao Zhang. Reasoning without self-doubt: More efficient chain-of-thought through certainty probing. In ICLR 2025 Workshop on Foundation Models in the Wild, 2025.

  52. [52]

    Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, and Yi Wu. How far are we from optimal reasoning efficiency?, 2025.

  53. [53]

    Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, and Tom Goldstein. Scaling up test-time compute with latent reasoning: A recurrent depth approach. arXiv preprint arXiv:2502.05171, 2025.

  54. [54]

    Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, and Di Niu. Guided by gut: Efficient test-time scaling with reinforced intrinsic confidence, 2025.

  55. [55]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

  56. [56]

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.

  57. [57]

    Hasan Abed Al Kader Hammoud, Kumail Alhamoud, Abed Hammoud, Elie Bou-Zeid, Marzyeh Ghassemi, and Bernard Ghanem. Train long, think short: Curriculum learning for efficient reasoning, 2025.

  58. [58]

    Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, and Zhenting Wang. Token-budget-aware LLM reasoning. arXiv preprint arXiv:2412.18547, 2024.

  59. [59]

    Jitai Hao, Yuke Zhu, Tian Wang, Jun Yu, Xin Xin, Bo Zheng, Zhaochun Ren, and Sheng Guo. OmniKV: Dynamic context selection for efficient long-context LLMs. In The Thirteenth International Conference on Learning Representations, 2025.

  60. [60]

    Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, and Yuandong Tian. Training large language models to reason in a continuous latent space. arXiv preprint arXiv:2412.06769, 2024.

  61. [61]

    Michael Hassid, Gabriel Synnaeve, Yossi Adi, and Roy Schwartz. Don’t overthink it. Preferring shorter thinking chains for improved LLM reasoning, 2025.

  62. [62]

    Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, and Erik Cambria. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. arXiv preprint arXiv:2310.05694, 2023.

  63. [63]

    Xingyang He, Xiao Ling, and Jie Liu. SmartThinker: Learning to compress and preserve reasoning by step-level length control, 2025.

  64. [64]

    Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021.

  65. [65]

    Jialiang Hong, Taihang Zhen, Kai Chen, Jiaheng Liu, Wenpeng Zhu, Jing Huo, Yang Gao, Depeng Wang, Haitao Wan, Xi Yang, Boyan Wang, and Fanyu Meng. Reconsidering overthinking: Penalizing internal and external redundancy in CoT reasoning, 2025.

  66. [66]

    Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, and Shiyu Chang. ThinkPrune: Pruning long chain-of-thought of LLMs via reinforcement learning. arXiv preprint arXiv:2504.01296, 2025.

  67. [67]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.

  68. [68]

    Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, and Ping Luo. HiAgent: Hierarchical working memory management for solving long-horizon agent tasks with large language model. arXiv preprint arXiv:2408.09559, 2024.

  69. [69]

    Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, and Ping Luo. Tree-Planner: Efficient close-loop task planning with large language models. arXiv preprint arXiv:2310.08582, 2023.

  70. [70]

    Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, and Jiaxin Huang. Efficient test-time scaling via self-calibration. arXiv preprint arXiv:2503.00031, 2025.

  71. [71]

    Jiameng Huang, Baijiong Lin, Guhao Feng, Jierun Chen, Di He, and Lu Hou. Efficient reasoning for large reasoning language models via certainty-guided reflection suppression,

  72. [72]

    Joonwon Jang, Jaehee Kim, Wonbin Kweon, Seonghyeon Lee, and Hwanjo Yu. Verbosity-aware rationale reduction: Effective reduction of redundant rationale via principled criteria,

  73. [73]

    Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, and Zheng Hu. FlashThink: An early exit method for efficient reasoning. arXiv preprint arXiv:2505.13949, 2025.

  74. [74]

    Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, and Furu Wei. Think only when you need with large hybrid-reasoning models, 2025.

  75. [75]

    Yuxuan Jiang, Dawei Li, and Frank Ferraro. DRP: Distilled reasoning pruning with skill-aware step decomposition for efficient large reasoning models. arXiv preprint arXiv:2505.13975, 2025.

  76. [76]

    Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, and Mengnan Du. The impact of reasoning step length on large language models. arXiv preprint arXiv:2401.04925, 2024.

  77. [77]

    Zhensheng Jin, Xinze Li, Yifan Ji, Chunyi Peng, Zhenghao Liu, Qi Shi, Yukun Yan, Shuo Wang, Furong Peng, and Ge Yu. ReCUT: Balancing reasoning length and accuracy in LLMs via stepwise trails and preference optimization, 2025.

  78. [78]

    Yu Kang, Xianghui Sun, Liangyu Chen, and Wei Zou. C3oT: Generating shorter chain-of-thought without compromising effectiveness. arXiv preprint arXiv:2412.11664, 2024.

  79. [79]

    Henrik Klagges, Robert Dahlke, Fabian Klemm, Benjamin Merkel, Daniel Klingmann, David A Reiss, and Dan Zecha. Assembly of experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors. arXiv preprint arXiv:2506.14794, 2025.

  80. [80]

    Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In European Conference on Machine Learning, pages 282–293. Springer, 2006.
