Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Pith reviewed 2026-05-14 01:24 UTC · model grok-4.3
The pith
A survey organizes methods that make reasoning in large language models more efficient by reducing overthinking.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims to provide the first structured survey that systematically investigates progress toward efficient reasoning in LLMs. It sorts existing work into three directions: model-based efficient reasoning, which optimizes full-length reasoning models into more concise ones or trains efficient reasoning models directly; reasoning output-based efficient reasoning, which dynamically reduces reasoning steps and length during inference; and input prompts-based efficient reasoning, which improves efficiency using properties of the input prompt such as difficulty or length control. It also covers efficient data for training and the reasoning capabilities of small language models.
What carries the argument
The three-way categorization (model-based, output-based, prompt-based) that structures the survey of techniques to reduce verbose and redundant chain-of-thought outputs in LLMs.
If this is right
- Researchers gain a map for identifying patterns and unexplored areas in efficient reasoning.
- Model-based methods can produce LLMs that generate concise reasoning by design through optimization or new training.
- Output-based techniques allow inference-time trimming of reasoning length to trade off accuracy against compute.
- Prompt-based controls can steer models toward shorter paths based on task difficulty or length signals.
- Efficient data and small-model explorations extend concise reasoning beyond the largest models.
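A minimal sketch of the output-based idea (trimming reasoning length at inference time), assuming a hypothetical `probe_answer` callable that reports the answer a model would commit to if it stopped after the current prefix of steps. The function name `early_exit_reasoning`, the toy probe, and the `patience` threshold are illustrative assumptions, not a method taken from the survey: generation simply halts once the probed answer has stayed the same for `patience` consecutive steps.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def early_exit_reasoning(
    steps: Iterable[str],
    probe_answer: Callable[[List[str]], Optional[str]],
    patience: int = 3,
) -> Tuple[List[str], Optional[str]]:
    """Keep reasoning steps until the probed answer stops changing.

    `probe_answer` is a hypothetical hook: given the steps so far, it
    returns the answer the model would commit to now, or None.
    """
    kept: List[str] = []
    last: Optional[str] = None
    streak = 0  # consecutive steps that produced the same non-None answer
    for step in steps:
        kept.append(step)
        answer = probe_answer(kept)
        if answer is not None and answer == last:
            streak += 1
        else:
            streak = 1 if answer is not None else 0
        last = answer
        if streak >= patience:
            break  # answer is stable; the remaining steps are skipped
    return kept, last


# Toy probe (hypothetical): treat the last integer token seen as the answer.
def toy_probe(prefix: List[str]) -> Optional[str]:
    tokens = [t for s in prefix for t in s.replace(":", " ").split()]
    digits = [t for t in tokens if t.isdigit()]
    return digits[-1] if digits else None


steps = ["x = 2 + 3", "so x = 5", "check: 5", "double-check: 5", "again: 5"]
kept, answer = early_exit_reasoning(steps, toy_probe, patience=3)
print(len(kept), answer)  # → 4 5
```

Answer stability is only one cheap stopping signal; in this sketch the redundant fifth "again: 5" step is never consumed, but a practical system would likely need a probe that costs less than the reasoning it saves.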
Where Pith is reading between the lines
- Hybrid systems that combine techniques from multiple categories could achieve greater efficiency than any single category alone.
- Standardized metrics for measuring reasoning redundancy and cost would make cross-paper comparisons more reliable.
- These efficiency methods may prove useful for deploying reasoning models on edge devices or in low-latency settings.
- Future surveys could track how quickly new papers fit or expand the proposed categories.
Load-bearing premise
The three-way categorization plus sections on data and small models comprehensively covers the relevant literature without major omissions.
What would settle it
A substantial set of papers on efficient LLM reasoning that cannot be placed into the model-based, output-based, or prompt-based categories or the additional data and small-model sections.
Original abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by harnessing supervised fine-tuning (SFT) and reinforcement learning (RL) techniques to enhance the Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, known as the "overthinking phenomenon". In this paper, we provide the first structured survey to systematically investigate and explore the current progress toward achieving efficient reasoning in LLMs. Overall, relying on the inherent mechanism of LLMs, we categorize existing works into several key directions: (1) model-based efficient reasoning, which considers optimizing full-length reasoning models into more concise reasoning models or directly training efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning, which seeks to enhance reasoning efficiency based on input prompt properties such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking. Project website: https://github.com/Eclipsess/Awesome-Efficient-Reasoning-LLMs
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey on efficient reasoning in LLMs that addresses the overthinking phenomenon in extended chain-of-thought outputs from models such as OpenAI o1 and DeepSeek-R1. It organizes the literature into a three-way taxonomy of model-based methods (optimizing or training concise reasoning models), output-based methods (dynamically shortening reasoning steps at inference), and prompt-based methods (controlling efficiency via input properties), while also covering efficient training data, reasoning capabilities of small models, and evaluation/benchmarking practices.
Significance. If the taxonomy proves comprehensive, the survey supplies a timely organizing framework for an active research area focused on reducing computational cost while preserving reasoning performance. This can help consolidate disparate lines of work on CoT efficiency and guide development of practical LRMs.
major comments (1)
- [Abstract] Abstract: the central claim that the work is 'the first structured survey to systematically investigate and explore the current progress' is load-bearing for the contribution. The manuscript does not include an explicit comparison to prior surveys on LLM reasoning or efficiency, leaving the novelty and completeness assertions unsubstantiated.
minor comments (1)
- [Taxonomy] Taxonomy presentation: the boundaries between model-based, output-based, and prompt-based categories should be clarified with explicit discussion of hybrid approaches that may straddle multiple categories.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our survey's timeliness and for recommending minor revision. We address the single major comment below and will incorporate the suggested changes to strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that the work is 'the first structured survey to systematically investigate and explore the current progress' is load-bearing for the contribution. The manuscript does not include an explicit comparison to prior surveys on LLM reasoning or efficiency, leaving the novelty and completeness assertions unsubstantiated.
Authors: We agree that an explicit comparison to prior surveys is needed to fully substantiate the novelty claim. While our survey is the first to systematically organize techniques specifically targeting the overthinking phenomenon in LRMs via the proposed three-way taxonomy (model-based, output-based, and prompt-based), we acknowledge that the manuscript would benefit from a direct comparison. In the revised version, we will add a dedicated subsection (or comparison table) in the introduction that contrasts our work with existing surveys on LLM reasoning (e.g., those covering Chain-of-Thought and general reasoning) and on efficiency methods. This will highlight our unique focus on computational overhead reduction while preserving performance, as well as our coverage of efficient training data, small-model reasoning, and benchmarking. We will also revise the abstract to reference this addition and, if appropriate, qualify the 'first' phrasing to emphasize scope. This change directly addresses the concern. (Revision: yes.)
Circularity Check
No circularity: survey taxonomy is an external literature organization
This is a literature review paper with no derivations, equations, predictions, or fitted quantities of any kind. The three-way categorization (model-based, output-based, prompt-based) plus the data and small-model sections is presented as an organizational framework grounded in the inherent mechanisms of LLMs, with external works cited for each direction. No step reduces by construction to a self-definition, a fitted input relabeled as a prediction, or a self-citation chain that forces the central claim. The assertion of systematic coverage is an empirical claim about the field rather than an internal reduction, so the paper is self-contained against external benchmarks (circularity score: 0).
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- Cost.FunctionalEquation: Jcost_nonneg, Jcost_pos_of_ne_one (echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, known as the 'overthinking phenomenon'. Efficient reasoning... offers practical benefits such as reduced computational costs"
- Foundation.LawOfExistence: defect_zero_iff_one, nothing_cannot_exist (echoes)
  Passage: "categorize existing works into... (1) model-based efficient reasoning... (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning"
- Foundation.DiscretenessForcing: J_log_quadratic_approx, J_log_pos_off_zero (echoes)
  Passage: "RL with Length Reward Design... length reward assigns higher scores to short, correct answers while penalizing lengthy or incorrect ones"
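The length-reward passage quoted above can be illustrated with a short sketch. Everything concrete here is an assumption for illustration, not the formula of any surveyed paper: rewards are computed over a group of sampled responses to the same prompt, lengths are min-max normalized within that group into a bonus `lam` between -0.5 and +0.5, correct answers add `lam` to a 1.0 correctness reward, and incorrect answers can only lose reward (`min(0, lam)`). The function name `length_rewards` is hypothetical.

```python
from typing import List


def length_rewards(lengths: List[int], correct: List[bool]) -> List[float]:
    """Hypothetical group-relative reward mixing correctness and length.

    lam = 0.5 for the shortest response in the group and -0.5 for the
    longest. Correct answers get 1 + lam, so short correct answers score
    highest; incorrect answers get min(0, lam), so they never earn a
    length bonus and long wrong answers are penalized.
    """
    if len(lengths) != len(correct):
        raise ValueError("lengths and correct must have the same size")
    if not lengths:
        return []
    lo, hi = min(lengths), max(lengths)
    span = hi - lo
    rewards: List[float] = []
    for n, ok in zip(lengths, correct):
        # If every response has the same length there is no length signal.
        lam = 0.5 - (n - lo) / span if span > 0 else 0.0
        if ok:
            rewards.append(1.0 + lam)
        else:
            rewards.append(min(0.0, lam))
    return rewards


print(length_rewards([100, 300, 500], [True, True, False]))  # → [1.5, 1.0, -0.5]
```

Normalizing within the group keeps the bonus on a fixed scale regardless of absolute token counts; the trade-off is that in a group where every response is long, the shortest one is still rewarded as "short".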
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 21 Pith papers
- Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
  RL improves LLM reasoning by sparse policy selection at high-entropy tokens rather than new capability learning, and a minimal RL-free method matches its gains at three orders of magnitude lower cost.
- Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
  Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
- LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning
  LLM+ASP framework enables task-agnostic nonmonotonic reasoning by having LLMs generate and self-correct ASP programs using solver feedback, outperforming SMT alternatives on diverse benchmarks.
- LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
  LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.
- Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
  RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.
- Weighted Rules under the Stable Model Semantics
  Weighted rules extend stable model semantics to support probabilistic reasoning, model ranking, and statistical inference in answer set programs.
- Hint Tuning: Less Data Makes Better Reasoners
  Hint Tuning uses an instruct model as a difficulty probe to create 1K multi-level hint examples that train reasoning models to calibrate chain-of-thought length, cutting tokens by 31.5% on average across 4B-32B models...
- Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training
  ICR creates a virtual shorter distribution from shortest correct on-policy responses to regularize RL post-training toward concise yet accurate reasoning, improving the accuracy-length Pareto frontier on math and know...
- Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
  RL for LLM reasoning acts as sparse policy selection at high-entropy tokens already present in the base model, enabling ReasonMaxxer—an efficient contrastive method that recovers most RL gains at three orders of magni...
- A Multimodal Dataset for Visually Grounded Ambiguity in Machine Translation
  VIDA provides 2,500 visually-dependent ambiguous MT instances and LLM-judge metrics; chain-of-thought SFT improves disambiguation accuracy over standard SFT, especially out-of-distribution.
- When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models
  LLM accuracy on controlled procedural arithmetic drops from 61% at 5 steps to 20% at 95 steps, with failures including skipped steps, premature answers, and hallucinated operations.
- QuantClaw: Precision Where It Matters for OpenClaw
  QuantClaw dynamically routes precision in agent workflows to cut cost by up to 21.4% and latency by 15.7% while keeping or improving task performance.
- HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
  HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
- Pause or Fabricate? Training Language Models for Grounded Reasoning
  GRIL uses stage-specific RL rewards to train LLMs to detect missing premises, pause proactively, and resume grounded reasoning after clarification, yielding up to 45% better premise detection and 30% higher task succe...
- CRISP: Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning
  CRISP compresses chain-of-thought by 50-60% using intrinsic attention saliency from the termination token to prune redundancy while preserving accuracy on math tasks.
- Think Less, Know More: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning
  STACK reduces average reasoning response length by 59.9% and raises accuracy by 4.8 points over prior methods on three math benchmarks via state-aware compression, knowledge guidance, and early stopping.
- ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning
  ETR is a trajectory-aware reward that promotes progressive entropy reduction during CoT reasoning, integrated into GRPO to deliver higher accuracy and 67% shorter traces on tested models and benchmarks.
- Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression
  CoT compression frequently introduces trustworthiness regressions with method-specific degradation profiles; a proposed normalized efficiency score and alignment-aware DPO variant reduce length by 19.3% with smaller t...
- How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem
  Non-reasoning LLMs fail the equivalence class problem while reasoning LLMs perform better but remain incomplete, with difficulty peaking at phase transition for the former and maximum diameter for the latter.
- DIAURec: Dual-Intent Space Representation Optimization for Recommendation
  DIAURec unifies intent and language modeling to reconstruct and optimize representations in prototype and distribution spaces, outperforming baselines on three datasets.
- Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA
  A pipeline of dataset construction from prior work, AugFC parameter augmentation, and two-step LLM training improves function calling for financial APIs and is running in production.
Reference graph
Works this paper leans on
-
[1]
First finish search: Efficient test-time scaling in large language models, 2025
Aradhye Agarwal, Ayan Sengupta, and Tanmoy Chakraborty. First finish search: Efficient test-time scaling in large language models, 2025. 4
work page 2025
-
[2]
Pranjal Aggarwal and Sean Welleck. L1: Controlling how long a reasoning model thinks with reinforcement learning. arXiv preprint arXiv:2503.04697, 2025. 4, 6, 7
-
[3]
Don’t think longer, think wisely: Optimizing thinking dynamics for large reasoning models, 2025
Sohyun An, Ruochen Wang, Tianyi Zhou, and Cho-Jui Hsieh. Don’t think longer, think wisely: Optimizing thinking dynamics for large reasoning models, 2025. 4
work page 2025
-
[4]
Anthropic. Claude 3.7 sonnet, 2023. Accessed: March 10, 2025. 4, 15
work page 2023
-
[5]
Training language models to reason efficiently
Daman Arora and Andrea Zanette. Training language models to reason efficiently. arXiv preprint arXiv:2502.04463, 2025. 4, 6
-
[6]
Sketch-of-thought: Efficient llm reasoning with adaptive cognitive-inspired sketching
Simon A Aytes, Jinheon Baek, and Sung Ju Hwang. Sketch-of-thought: Efficient llm reasoning with adaptive cognitive-inspired sketching. arXiv preprint arXiv:2503.05179, 2025. 4, 15
-
[7]
Activation steering for chain-of-thought compression, 2025
Seyedarmin Azizi, Erfan Baghaei Potraghloo, and Massoud Pedram. Activation steering for chain-of-thought compression, 2025. 4 20
work page 2025
-
[8]
Scaling test-time compute with open models
Edward Beeching, Lewis Tunstall, and Sasha Rush. Scaling test-time compute with open models. 11
-
[9]
Graph of thoughts: Solving elaborate problems with large language models
Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, et al. Graph of thoughts: Solving elaborate problems with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17682–17690, 2024. 3
work page 2024
-
[10]
SPECS: Faster test-time scaling through speculative drafts, 2025
Mert Cemri, Nived Rajaraman, Rishabh Tiwari, Xiaoxuan Liu, Kurt Keutzer, Ion Stoica, Kannan Ramchandran, Ahmad Beirami, and Ziteng Sun. SPECS: Faster test-time scaling through speculative drafts, 2025. 4
work page 2025
-
[11]
Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021. 1
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[12]
Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, and Wanxiang Che. Aware first, think less: Dynamic boundary self-awareness drives extreme reasoning efficiency in large language models, 2025. 4
work page 2025
-
[13]
Qiguang Chen, Libo Qin, Jiaqi Wang, Jingxuan Zhou, and Wanxiang Che. Unlocking the capabilities of thought: A reasoning boundary framework to quantify and optimize chain-of- thought. Advances in Neural Information Processing Systems , 37:54872–54904, 2024. 4, 15
work page 2024
-
[14]
Seal: Steer- able reasoning calibration of large language models for free
Runjin Chen, Zhenyu Zhang, Junyuan Hong, Souvik Kundu, and Zhangyang Wang. Seal: Steer- able reasoning calibration of large language models for free. arXiv preprint arXiv:2504.07986,
-
[15]
Distilling reasoning ability from large language models with adaptive thinking
Xiaoshu Chen, Sihang Zhou, Ke Liang, and Xinwang Liu. Distilling reasoning ability from large language models with adaptive thinking. arXiv preprint arXiv:2404.09170, 2024. 4, 17
-
[16]
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, et al. Do not think that much for 2+ 3=? on the overthinking of o1-like llms. arXiv preprint arXiv:2412.21187, 2024. 2, 5
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[17]
arXiv preprint arXiv:2502.13842 (2025)
Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuo- huan Wang, Yu Sun, Hua Wu, and Haifeng Wang. Inner thinking transformer: Leveraging dynamic depth scaling to foster adaptive internal thinking. arXiv preprint arXiv:2502.13842,
-
[18]
R-stitch: Dynamic trajectory stitching for efficient reasoning, 2025
Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, and Bohan Zhuang. R-stitch: Dynamic trajectory stitching for efficient reasoning, 2025. 4
work page 2025
-
[19]
Verithinker: Learning to verify makes reasoning model efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, and Xinchao Wang. Verithinker: Learning to verify makes reasoning model efficient. arXiv preprint arXiv:2505.17941, 2025. 4
-
[20]
Jeffrey Cheng and Benjamin Van Durme. Compressed chain of thought: Efficient reasoning through dense representations. arXiv preprint arXiv:2412.13171, 2024. 4, 10
-
[21]
Incentivizing dual process thinking for efficient large language model reasoning
Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, and Zhiqiang Zhang. Incentivizing dual process thinking for efficient large language model reasoning. arXiv preprint arXiv:2505.16315, 2025. 4
-
[22]
Optimizing length compression in large reasoning models, 2025
Zhengxiang Cheng, Dongping Chen, Mingyang Fu, and Tianyi Zhou. Optimizing length compression in large reasoning models, 2025. 4
work page 2025
-
[23]
Mixed distillation helps smaller language models reason better
Li Chenglin, Qianglong Chen, Liangyue Li, Caiyu Wang, Feng Tao, Yicheng Li, Zulong Chen, and Yin Zhang. Mixed distillation helps smaller language models reason better. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1673–1690, 2024. 4, 17
work page 2024
-
[24]
Yu-Neng Chuang, Leisheng Yu, Guanchu Wang, Lizhe Zhang, Zirui Liu, Xuanting Cai, Yang Sui, Vladimir Braverman, and Xia Hu. Confident or seek stronger: Exploring uncertainty-based on-device llm routing from benchmarking to generalization, 2025. 4, 15
work page 2025
-
[25]
Learning to route llms with confidence tokens, 2025
Yu-Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, and Xia Hu. Learning to route llms with confidence tokens, 2025. 4, 15
work page 2025
-
[26]
Training Verifiers to Solve Math Word Problems
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021. 1 21
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[27]
Codeforces - competitive programming platform, 2025
Codeforces. Codeforces - competitive programming platform, 2025. Accessed: 2025-03-18. 1
work page 2025
-
[28]
Efficient selectivity and backup operators in monte-carlo tree search
Rémi Coulom. Efficient selectivity and backup operators in monte-carlo tree search. In International conference on computers and games, pages 72–83. Springer, 2006. 5
work page 2006
-
[29]
Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya De- sai, Ion Stoica, Ana Klimovic, Graham Neubig, and Joseph E. Gonzalez. The danger of overthinking: Examining the reasoning-action dilemma in agentic tasks, 2025. 4, 18
work page 2025
-
[30]
A survey on multimodal large language models for autonomous driving
Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, et al. A survey on multimodal large language models for autonomous driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 958–979, 2024. 18
work page 2024
-
[31]
Yingqian Cui, Pengfei He, Jingying Zeng, Hui Liu, Xianfeng Tang, Zhenwei Dai, Yan Han, Chen Luo, Jing Huang, Zhen Li, et al. Stepwise perplexity-guided refinement for efficient chain-of-thought reasoning in large language models. arXiv preprint arXiv:2502.13260, 2025. 4
-
[32]
Stable reinforcement learning for efficient reasoning,
Muzhi Dai, Shixuan Liu, and Qingyi Si. Stable reinforcement learning for efficient reasoning,
-
[33]
Muzhi Dai, Chenxu Yang, and Qingyi Si. S-grpo: Early exit via reinforcement learning in reasoning models. arXiv preprint arXiv:2505.07686, 2025. 4
-
[34]
From explicit cot to implicit cot: Learning to internalize cot step by step
Yuntian Deng, Yejin Choi, and Stuart Shieber. From explicit cot to implicit cot: Learning to internalize cot step by step. arXiv preprint arXiv:2405.14838, 2024. 9
-
[35]
Bert: Pre-training of deep bidirectional transformers for language understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019. 1
work page 2019
-
[36]
Do thinking tokens help or trap? towards more efficient large reasoning model, 2025
Bowen Ding, Yuhan Chen, Futing Wang, Lingfeng Ming, and Tao Lin. Do thinking tokens help or trap? towards more efficient large reasoning model, 2025. 4
work page 2025
-
[37]
Dujian Ding, Ankur Mallick, Shaokun Zhang, Chi Wang, Daniel Madrigal, Mirian Del Carmen Hipolito Garcia, Menglin Xia, Laks V . S. Lakshmanan, Qingyun Wu, and Victor Rühle. Best-route: Adaptive llm routing with test-time optimal compute, 2025. 4
work page 2025
-
[38]
Dynamic parallel tree search for efficient llm reasoning
Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, et al. Dynamic parallel tree search for efficient llm reasoning. arXiv preprint arXiv:2502.16235, 2025. 4, 11, 12
-
[39]
Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, and Cheston Tan. A survey of embodied ai: From simulators to research tasks.IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022. 18
work page 2022
-
[40]
Conciserl: Conciseness-guided reinforcement learning for efficient reasoning models
Razvan-Gabriel Dumitru, Darius Peteleaza, Vikas Yadav, and Liangming Pan. Conciserl: Conciseness-guided reinforcement learning for efficient reasoning models. arXiv preprint arXiv:2505.17250, 2025. 4
-
[41]
Overclocking llm reasoning: Monitoring and controlling thinking path lengths in llms, 2025
Roy Eisenstadt, Itamar Zimerman, and Lior Wolf. Overclocking llm reasoning: Monitoring and controlling thinking path lengths in llms, 2025. 4
work page 2025
-
[42]
Debate only when necessary: Adaptive multiagent collaboration for efficient llm reasoning, 2025
Sugyeong Eo, Hyeonseok Moon, Evelyn Hayoon Zi, Chanjun Park, and Heuiseok Lim. Debate only when necessary: Adaptive multiagent collaboration for efficient llm reasoning, 2025. 20
work page 2025
-
[43]
arXiv preprint arXiv:2504.06514 , year=
Chenrui Fan, Ming Li, Lichao Sun, and Tianyi Zhou. Missing premise exacerbates overthink- ing: Are reasoning models losing critical thinking skill? arXiv preprint arXiv:2504.06514,
-
[44]
Cothink: Token-efficient reasoning via instruct models guiding reasoning models, 2025
Siqi Fan, Peng Han, Shuo Shang, Yequan Wang, and Aixin Sun. Cothink: Token-efficient reasoning via instruct models guiding reasoning models, 2025. 4
work page 2025
-
[45]
Thinkless: Llm learns when to think, 2025
Gongfan Fang, Xinyin Ma, and Xinchao Wang. Thinkless: Llm learns when to think, 2025. 4
work page 2025
-
[46]
Safemlrm: Demystifying safety in multi-modal large reasoning models
Junfeng Fang, Yukai Wang, Ruipeng Wang, Zijun Yao, Kun Wang, An Zhang, Xiang Wang, and Tat-Seng Chua. Safemlrm: Demystifying safety in multi-modal large reasoning models. arXiv preprint arXiv:2504.08813, 2025. 20 22
-
[47]
Concise reasoning via reinforcement learning
Mehdi Fatemi, Banafsheh Rafiee, Mingjie Tang, and Kartik Talamadupula. Concise reasoning via reinforcement learning. arXiv preprint arXiv:2504.05185, 2025. 4
-
[48]
Teaching small language models reasoning through counterfactual distillation
Tao Feng, Yicheng Li, Li Chenglin, Hao Chen, Fei Yu, and Yin Zhang. Teaching small language models reasoning through counterfactual distillation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5831–5842, 2024. 4, 17
work page 2024
-
[49]
Gptq: Accurate post-training quantization for generative pre-trained transformers
Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. Gptq: Accurate post-training quantization for generative pre-trained transformers. In The Eleventh International Conference on Learning Representations. OpenReview, 2023. 2
work page 2023
-
[50]
Efficiently serving llm reasoning programs with certaindex
Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Aurick Qiao, and Hao Zhang. Efficiently serving llm reasoning programs with certaindex. arXiv preprint arXiv:2412.20993, 2024. 4, 11, 12
-
[51]
Reasoning without self-doubt: More efficient chain-of-thought through certainty probing
Yichao Fu, Junda Chen, Yonghao Zhuang, Zheyu Fu, Ion Stoica, and Hao Zhang. Reasoning without self-doubt: More efficient chain-of-thought through certainty probing. In ICLR 2025 Workshop on Foundation Models in the Wild, 2025. 4, 13
work page 2025
-
[52]
How far are we from optimal reasoning efficiency?, 2025
Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, and Yi Wu. How far are we from optimal reasoning efficiency?, 2025. 4
work page 2025
-
[53]
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, and Tom Goldstein. Scaling up test-time compute with latent reasoning: A recurrent depth approach. arXiv preprint arXiv:2502.05171,
work page internal anchor Pith review Pith/arXiv arXiv
-
[54]
Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, and Di Niu. Guided by gut: Efficient test-time scaling with reinforced intrinsic confidence, 2025. 4
work page 2025
-
[55]
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024. 1
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[56]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025. 1, 2, 4, 5, 6, 7, 11
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[57]
Train long, think short: Curriculum learning for efficient reasoning, 2025
Hasan Abed Al Kader Hammoud, Kumail Alhamoud, Abed Hammoud, Elie Bou-Zeid, Marzyeh Ghassemi, and Bernard Ghanem. Train long, think short: Curriculum learning for efficient reasoning, 2025. 4
work page 2025
-
[58]
Token-budget-aware llm reasoning
Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, and Zhenting Wang. Token-budget-aware llm reasoning. arXiv preprint arXiv:2412.18547, 2024. 2, 4, 8, 9, 14, 15
-
[59]
Jitai Hao, Yuke Zhu, Tian Wang, Jun Yu, Xin Xin, Bo Zheng, Zhaochun Ren, and Sheng Guo. OmniKV: Dynamic context selection for efficient long-context LLMs. In The Thirteenth International Conference on Learning Representations, 2025.
-
[60]
Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, and Yuandong Tian. Training large language models to reason in a continuous latent space. arXiv preprint arXiv:2412.06769, 2024.
-
[61]
Michael Hassid, Gabriel Synnaeve, Yossi Adi, and Roy Schwartz. Don’t overthink it. Preferring shorter thinking chains for improved LLM reasoning, 2025.
-
[62]
Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, and Erik Cambria. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. arXiv preprint arXiv:2310.05694, 2023.
-
[63]
Xingyang He, Xiao Ling, and Jie Liu. SmartThinker: Learning to compress and preserve reasoning by step-level length control, 2025.
-
[64]
Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021.
-
[65]
Jialiang Hong, Taihang Zhen, Kai Chen, Jiaheng Liu, Wenpeng Zhu, Jing Huo, Yang Gao, Depeng Wang, Haitao Wan, Xi Yang, Boyan Wang, and Fanyu Meng. Reconsidering overthinking: Penalizing internal and external redundancy in CoT reasoning, 2025.
-
[66]
Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, and Shiyu Chang. ThinkPrune: Pruning long chain-of-thought of LLMs via reinforcement learning. arXiv preprint arXiv:2504.01296, 2025.
-
[67]
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022.
-
[68]
Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, and Ping Luo. HiAgent: Hierarchical working memory management for solving long-horizon agent tasks with large language model. arXiv preprint arXiv:2408.09559, 2024.
-
[69]
Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, and Ping Luo. Tree-Planner: Efficient close-loop task planning with large language models. arXiv preprint arXiv:2310.08582, 2023.
-
[70]
Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, and Jiaxin Huang. Efficient test-time scaling via self-calibration. arXiv preprint arXiv:2503.00031, 2025.
-
[71]
Jiameng Huang, Baijiong Lin, Guhao Feng, Jierun Chen, Di He, and Lu Hou. Efficient reasoning for large reasoning language models via certainty-guided reflection suppression,
-
[72]
Joonwon Jang, Jaehee Kim, Wonbin Kweon, Seonghyeon Lee, and Hwanjo Yu. Verbosity-aware rationale reduction: Effective reduction of redundant rationale via principled criteria,
-
[73]
Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, and Zheng Hu. FlashThink: An early exit method for efficient reasoning. arXiv preprint arXiv:2505.13949, 2025.
-
[74]
Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, and Furu Wei. Think only when you need with large hybrid-reasoning models, 2025.
-
[75]
Yuxuan Jiang, Dawei Li, and Frank Ferraro. DRP: Distilled reasoning pruning with skill-aware step decomposition for efficient large reasoning models. arXiv preprint arXiv:2505.13975, 2025.
-
[76]
Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, and Mengnan Du. The impact of reasoning step length on large language models. arXiv preprint arXiv:2401.04925, 2024.
-
[77]
Zhensheng Jin, Xinze Li, Yifan Ji, Chunyi Peng, Zhenghao Liu, Qi Shi, Yukun Yan, Shuo Wang, Furong Peng, and Ge Yu. ReCUT: Balancing reasoning length and accuracy in LLMs via stepwise trails and preference optimization, 2025.
-
[78]
Yu Kang, Xianghui Sun, Liangyu Chen, and Wei Zou. C3oT: Generating shorter chain-of-thought without compromising effectiveness. arXiv preprint arXiv:2412.11664, 2024.
-
[79]
Henrik Klagges, Robert Dahlke, Fabian Klemm, Benjamin Merkel, Daniel Klingmann, David A Reiss, and Dan Zecha. Assembly of experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors. arXiv preprint arXiv:2506.14794, 2025.
-
[80]
Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In European Conference on Machine Learning, pages 282–293. Springer, 2006.