arxiv: 2503.16419 · v4 · submitted 2025-03-20 · 💻 cs.CL

Recognition: 3 theorem links · Lean Theorem

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Andrew Wen, Guanchu Wang, Hanjie Chen, Hongyi Liu, Jiamu Zhang, Jiayi Yuan, Na Zou, Shaochen Zhong, Tianyi Zhang, Xia Hu, Yang Sui, Yu-Neng Chuang


Pith reviewed 2026-05-14 01:24 UTC · model grok-4.3

classification 💻 cs.CL

keywords efficient reasoning · large language models · chain-of-thought · overthinking · survey · model optimization · inference efficiency · small language models


The pith

A survey organizes methods to achieve efficient reasoning in large language models by reducing overthinking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper is the first structured survey on achieving efficient reasoning in large language models. It addresses the overthinking phenomenon where longer chain-of-thought sequences boost performance but add unnecessary computational overhead. The work categorizes methods into model-based, output-based, and prompt-based efficient reasoning, and also covers efficient training data and small language models. A sympathetic reader would care because these approaches promise to deliver strong reasoning capabilities at lower cost and latency. This could broaden the use of advanced LLMs in everyday applications.

Core claim

The paper provides the first structured survey to systematically investigate progress toward efficient reasoning in LLMs by categorizing existing works into model-based efficient reasoning, which optimizes full-length reasoning models into more concise ones or trains efficient models directly; reasoning output-based efficient reasoning, which dynamically reduces reasoning steps and length during inference; and input prompts-based efficient reasoning, which enhances efficiency based on input prompt properties such as difficulty or length control, while also introducing efficient data for training and exploring reasoning in small language models.
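The prompt-based direction can be illustrated with a minimal sketch. It assumes a difficulty score in [0, 1] is already available from some external estimator; the survey summary does not prescribe one. The sketch maps that score linearly to a reasoning-token budget and writes the budget into the prompt. The function names, the 64-to-1024 budget range, and the instruction wording are hypothetical choices for illustration and are not taken from the paper.

```python
def token_budget(difficulty: float, min_budget: int = 64, max_budget: int = 1024) -> int:
    """Map a difficulty score in [0, 1] to a reasoning-token budget.

    Hypothetical linear schedule: easy questions get min_budget tokens,
    the hardest get max_budget.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    return round(min_budget + difficulty * (max_budget - min_budget))


def budgeted_prompt(question: str, difficulty: float) -> str:
    """Append a length-control instruction whose budget grows with difficulty."""
    budget = token_budget(difficulty)
    return (
        f"{question}\n"
        f"Think step by step, but use at most {budget} tokens of reasoning "
        f"before giving the final answer."
    )


if __name__ == "__main__":
    print(budgeted_prompt("What is 2 + 3?", difficulty=0.0))
```

A linear schedule is the simplest choice. A real system might instead bucket difficulty into a few discrete levels or learn the budget from data.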

What carries the argument

The three-way categorization (model-based, output-based, prompt-based) that structures the survey of techniques to reduce verbose and redundant chain-of-thought outputs in LLMs.

If this is right

Researchers gain a map for identifying patterns and unexplored areas in efficient reasoning.
Model-based methods can produce LLMs that generate concise reasoning by design through optimization or new training.
Output-based techniques allow inference-time trimming of reasoning length to trade off accuracy against compute.
Prompt-based controls can steer models toward shorter paths based on task difficulty or length signals.
Efficient data and small-model explorations extend concise reasoning beyond the largest models.
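The output-based idea can be made concrete with a hedged sketch of inference-time trimming. Reasoning steps are kept until either a token budget runs out or a confidence score for the intermediate answer crosses a threshold. An end-of-thinking marker is then appended so that the model switches to answering. The per-step confidences, the whitespace-based token count, and the `</think>` marker are assumptions made for illustration. They do not describe any specific method covered by the survey.

```python
from typing import List, Sequence, Tuple

END_THINK = "</think>"  # hypothetical marker that tells the model to stop reasoning


def trim_reasoning(
    steps: Sequence[Tuple[str, float]],
    budget: int,
    confidence_threshold: float = 0.9,
) -> List[str]:
    """Keep reasoning steps until the budget is spent or the model looks confident.

    Each step is (text, confidence), where confidence is a hypothetical probe
    score for the current intermediate answer. Tokens are approximated by
    whitespace-separated words.
    """
    kept: List[str] = []
    used = 0
    for text, confidence in steps:
        cost = len(text.split())
        if used + cost > budget:
            break  # budget exhausted: drop this step and everything after it
        kept.append(text)
        used += cost
        if confidence >= confidence_threshold:
            break  # confident enough: exit early
    kept.append(END_THINK)
    return kept


if __name__ == "__main__":
    steps = [("Add 2 and 3.", 0.5), ("That gives 5.", 0.95), ("Double-check: 2 + 3 = 5.", 0.99)]
    print(trim_reasoning(steps, budget=100))
```

In this example, the redundant double-check step is skipped because the second step already clears the confidence threshold. That redundant verification is the kind of overthinking that output-based methods aim to cut.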

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hybrid systems that combine techniques from multiple categories could achieve greater efficiency than any single category alone.
Standardized metrics for measuring reasoning redundancy and cost would make cross-paper comparisons more reliable.
These efficiency methods may prove useful for deploying reasoning models on edge devices or in low-latency settings.
Future surveys could track how quickly new papers fit or expand the proposed categories.

Load-bearing premise

The three-way categorization plus sections on data and small models comprehensively covers the relevant literature without major omissions.

What would settle it

A substantial set of papers on efficient LLM reasoning that cannot be placed into the model-based, output-based, or prompt-based categories or the additional data and small-model sections.

Original abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved performance in System-2 reasoning domains like mathematics and programming by harnessing supervised fine-tuning (SFT) and reinforcement learning (RL) techniques to enhance the Chain-of-Thought (CoT) reasoning. However, while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, known as the "overthinking phenomenon". In this paper, we provide the first structured survey to systematically investigate and explore the current progress toward achieving efficient reasoning in LLMs. Overall, relying on the inherent mechanism of LLMs, we categorize existing works into several key directions: (1) model-based efficient reasoning, which considers optimizing full-length reasoning models into more concise reasoning models or directly training efficient reasoning models; (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning, which seeks to enhance reasoning efficiency based on input prompt properties such as difficulty or length control. Additionally, we introduce the use of efficient data for training reasoning models, explore the reasoning capabilities of small language models, and discuss evaluation methods and benchmarking. Project website: https://github.com/Eclipsess/Awesome-Efficient-Reasoning-LLMs

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

This is a useful first survey that maps work on cutting overthinking in LLM reasoning via a clear taxonomy, but it adds no new methods and its value rests on coverage completeness.


Hey, the main thing to know is that this paper delivers the first structured survey on efficient reasoning for LLMs. It pulls together efforts to reduce the long, redundant chain-of-thought outputs from models like o1 and DeepSeek-R1. It groups the literature into model-based approaches (optimizing or training for shorter reasoning), output-based methods (trimming steps at inference time), and prompt-based techniques (using input properties like difficulty to control length). It also adds sections on training data, small models, and evaluation benchmarks. The taxonomy is straightforward, and the framing around the overthinking phenomenon is clear and grounded in real model examples. This kind of synthesis can help researchers get oriented quickly in a scattered area and spot gaps for future work.

What the paper does well is organize an emerging practical topic without overclaiming technical novelty. The structure makes sense for covering different intervention points in the reasoning pipeline. Including small-model reasoning and data considerations also broadens the scope beyond big-model tweaks.

As for soft spots, the central taxonomy is defined by the authors' chosen axes rather than by an exhaustive scan. Hybrid methods, or papers centered on new efficiency metrics, could therefore slip between categories or get short coverage. As a pure survey, it offers no new experiments or derivations to verify, so its quality hinges on how accurately and completely the cited works are summarized. The claim of systematic coverage is plausible but would be stronger with an explicit methods section on paper selection.

This paper is for people working on LLM deployment, reasoning efficiency, or related applications who want a reference map rather than a new algorithm. A reader looking for trends and entry points into the literature will get real value from it. It deserves serious peer review because the topic matters for practical systems and a solid survey can steer research productively, even if reviewers push for fuller coverage checks.

Referee Report

1 major / 1 minor

Summary. The manuscript is a survey on efficient reasoning in LLMs that addresses the overthinking phenomenon in extended chain-of-thought outputs from models such as OpenAI o1 and DeepSeek-R1. It organizes the literature into a three-way taxonomy of model-based methods (optimizing or training concise reasoning models), output-based methods (dynamically shortening reasoning steps at inference), and prompt-based methods (controlling efficiency via input properties), while also covering efficient training data, reasoning capabilities of small models, and evaluation/benchmarking practices.

Significance. If the taxonomy proves comprehensive, the survey supplies a timely organizing framework for an active research area focused on reducing computational cost while preserving reasoning performance. This can help consolidate disparate lines of work on CoT efficiency and guide development of practical LRMs.

major comments (1)

[Abstract] Abstract: the central claim that the work is 'the first structured survey to systematically investigate and explore the current progress' is load-bearing for the contribution. The manuscript does not include an explicit comparison to prior surveys on LLM reasoning or efficiency, leaving the novelty and completeness assertions unsubstantiated.

minor comments (1)

[Taxonomy] Taxonomy presentation: the boundaries between model-based, output-based, and prompt-based categories should be clarified with explicit discussion of hybrid approaches that may straddle multiple categories.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of our survey's timeliness and for recommending minor revision. We address the single major comment below and will incorporate the suggested changes to strengthen the manuscript.

Point-by-point responses

Referee: [Abstract] Abstract: the central claim that the work is 'the first structured survey to systematically investigate and explore the current progress' is load-bearing for the contribution. The manuscript does not include an explicit comparison to prior surveys on LLM reasoning or efficiency, leaving the novelty and completeness assertions unsubstantiated.

Authors: We agree that an explicit comparison to prior surveys is needed to fully substantiate the novelty claim. While our survey is the first to systematically organize techniques specifically targeting the overthinking phenomenon in LRMs via the proposed three-way taxonomy (model-based, output-based, and prompt-based), we acknowledge that the manuscript would benefit from a direct comparison. In the revised version, we will add a dedicated subsection (or comparison table) in the introduction that contrasts our work with existing surveys on LLM reasoning (e.g., those covering Chain-of-Thought and general reasoning) and on efficiency methods. This will highlight our unique focus on computational overhead reduction while preserving performance, as well as our coverage of efficient training data, small-model reasoning, and benchmarking. We will also revise the abstract to reference this addition and, if appropriate, qualify the 'first' phrasing to emphasize scope. This change directly addresses the concern. revision: yes

Circularity Check

0 steps flagged

No circularity: survey taxonomy is an external literature organization

full rationale

This is a literature review paper with no derivations, equations, predictions, or fitted quantities of any kind. The three-way categorization (model-based, output-based, prompt-based) plus data and small-model sections is presented as an organizational framework relying on the inherent mechanisms of LLMs and citing external works for each direction. No step reduces by construction to a self-definition, a fitted input renamed as prediction, or a self-citation chain that forces the central claim. The assertion of systematic coverage is an empirical claim about the field rather than an internal reduction, so the paper is self-contained against external benchmarks with score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey the paper introduces no new free parameters, axioms, or invented entities; it reviews prior literature on LLM reasoning efficiency.

pith-pipeline@v0.9.0 · 5593 in / 915 out tokens · 41540 ms · 2026-05-14T01:24:38.308812+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost.FunctionalEquation Jcost_nonneg, Jcost_pos_of_ne_one echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

while longer CoT reasoning sequences improve performance, they also introduce significant computational overhead due to verbose and redundant outputs, known as the 'overthinking phenomenon'. Efficient reasoning... offers practical benefits such as reduced computational costs
Foundation.LawOfExistence defect_zero_iff_one, nothing_cannot_exist echoes


categorize existing works into... (1) model-based efficient reasoning... (2) reasoning output-based efficient reasoning, which aims to dynamically reduce reasoning steps and length during inference; (3) input prompts-based efficient reasoning
Foundation.DiscretenessForcing J_log_quadratic_approx, J_log_pos_off_zero echoes


RL with Length Reward Design... length reward assigns higher scores to short, correct answers while penalizing lengthy or incorrect ones
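The length-reward passage quoted above can be sketched as a reward function over a group of sampled responses to the same prompt. The shaping below is one hypothetical choice. Lengths are min-max normalized within the group into a term that ranges from +0.5 for the shortest response to -0.5 for the longest. Correct answers receive 1.0 plus that term, and incorrect answers receive only its non-positive part. The constants and the normalization are assumptions, not the reward of any particular paper cited by the survey.

```python
from typing import List, Sequence


def length_rewards(lengths: Sequence[int], correct: Sequence[bool]) -> List[float]:
    """Score a group of sampled responses to the same prompt.

    Correct answers earn 1.0 plus a length term in [-0.5, 0.5] (shorter is better).
    Incorrect answers earn only the non-positive part of that term, so a long
    wrong answer is penalized and a short wrong answer is never rewarded.
    """
    if len(lengths) != len(correct):
        raise ValueError("lengths and correct must have the same size")
    shortest, longest = min(lengths), max(lengths)
    span = longest - shortest
    rewards: List[float] = []
    for length, ok in zip(lengths, correct):
        # Normalized length term: +0.5 for the shortest response, -0.5 for the longest.
        lam = (0.5 - (length - shortest) / span) if span > 0 else 0.0
        rewards.append(1.0 + lam if ok else min(0.0, lam))
    return rewards
```

With lengths `[100, 300, 200]` and correctness `[True, True, False]`, the rewards are `[1.5, 0.5, 0.0]`. The short correct answer scores highest, the long correct answer is still rewarded but less, and the wrong answer earns nothing.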

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
cs.CL 2026-05 unverdicted novelty 7.0

RL improves LLM reasoning by sparse policy selection at high-entropy tokens rather than new capability learning, and a minimal RL-free method matches its gains at three orders of magnitude lower cost.
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
cs.AI 2026-05 conditional novelty 7.0

Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning
cs.AI 2026-04 unverdicted novelty 7.0

LLM+ASP framework enables task-agnostic nonmonotonic reasoning by having LLMs generate and self-correct ASP programs using solver feedback, outperforming SMT alternatives on diverse benchmarks.
LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
cs.IR 2026-04 unverdicted novelty 7.0

LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
cs.AI 2026-05 unverdicted novelty 6.0

RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.
Weighted Rules under the Stable Model Semantics
cs.AI 2026-05 unverdicted novelty 6.0

Weighted rules extend stable model semantics to support probabilistic reasoning, model ranking, and statistical inference in answer set programs.
Hint Tuning: Less Data Makes Better Reasoners
cs.CL 2026-05 unverdicted novelty 6.0

Hint Tuning uses an instruct model as a difficulty probe to create 1K multi-level hint examples that train reasoning models to calibrate chain-of-thought length, cutting tokens by 31.5% on average across 4B-32B models...
Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training
cs.AI 2026-05 unverdicted novelty 6.0

ICR creates a virtual shorter distribution from shortest correct on-policy responses to regularize RL post-training toward concise yet accurate reasoning, improving the accuracy-length Pareto frontier on math and know...
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
cs.CL 2026-05 unverdicted novelty 6.0

RL for LLM reasoning acts as sparse policy selection at high-entropy tokens already present in the base model, enabling ReasonMaxxer—an efficient contrastive method that recovers most RL gains at three orders of magni...
A Multimodal Dataset for Visually Grounded Ambiguity in Machine Translation
cs.CL 2026-05 unverdicted novelty 6.0

VIDA provides 2,500 visually-dependent ambiguous MT instances and LLM-judge metrics; chain-of-thought SFT improves disambiguation accuracy over standard SFT, especially out-of-distribution.
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models
cs.CL 2026-05 unverdicted novelty 6.0

LLM accuracy on controlled procedural arithmetic drops from 61% at 5 steps to 20% at 95 steps, with failures including skipped steps, premature answers, and hallucinated operations.
QuantClaw: Precision Where It Matters for OpenClaw
cs.AI 2026-04 unverdicted novelty 6.0

QuantClaw dynamically routes precision in agent workflows to cut cost by up to 21.4% and latency by 15.7% while keeping or improving task performance.
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
cs.AI 2026-04 unverdicted novelty 6.0

HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
Pause or Fabricate? Training Language Models for Grounded Reasoning
cs.CL 2026-04 conditional novelty 6.0

GRIL uses stage-specific RL rewards to train LLMs to detect missing premises, pause proactively, and resume grounded reasoning after clarification, yielding up to 45% better premise detection and 30% higher task succe...
CRISP: Compressing Redundancy in Chain-of-Thought via Intrinsic Saliency Pruning
cs.CL 2026-04 unverdicted novelty 6.0

CRISP compresses chain-of-thought by 50-60% using intrinsic attention saliency from the termination token to prune redundancy while preserving accuracy on math tasks.
Think Less, Know More: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning
cs.CL 2026-04 unverdicted novelty 6.0

STACK reduces average reasoning response length by 59.9% and raises accuracy by 4.8 points over prior methods on three math benchmarks via state-aware compression, knowledge guidance, and early stopping.
ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning
cs.AI 2026-04 unverdicted novelty 6.0

ETR is a trajectory-aware reward that promotes progressive entropy reduction during CoT reasoning, integrated into GRPO to deliver higher accuracy and 67% shorter traces on tested models and benchmarks.
Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression
cs.CL 2026-04 unverdicted novelty 6.0

CoT compression frequently introduces trustworthiness regressions with method-specific degradation profiles; a proposed normalized efficiency score and alignment-aware DPO variant reduce length by 19.3% with smaller t...
How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem
cs.AI 2026-05 unverdicted novelty 5.0

Non-reasoning LLMs fail the equivalence class problem while reasoning LLMs perform better but remain incomplete, with difficulty peaking at phase transition for the former and maximum diameter for the latter.
DIAURec: Dual-Intent Space Representation Optimization for Recommendation
cs.IR 2026-04 unverdicted novelty 5.0

DIAURec unifies intent and language modeling to reconstruct and optimize representations in prototype and distribution spaces, outperforming baselines on three datasets.
Data-Driven Function Calling Improvements in Large Language Model for Online Financial QA
cs.IR 2026-04 unverdicted novelty 3.0

A pipeline of dataset construction from prior work, AugFC parameter augmentation, and two-step LLM training improves function calling for financial APIs and is running in production.

Reference graph

Works this paper leans on

251 extracted references · 251 canonical work pages · cited by 20 Pith papers · 25 internal anchors

[1] Aradhye Agarwal, Ayan Sengupta, and Tanmoy Chakraborty. First finish search: Efficient test-time scaling in large language models, 2025.
[2] Pranjal Aggarwal and Sean Welleck. L1: Controlling how long a reasoning model thinks with reinforcement learning. arXiv preprint arXiv:2503.04697, 2025.
[3] Sohyun An, Ruochen Wang, Tianyi Zhou, and Cho-Jui Hsieh. Don’t think longer, think wisely: Optimizing thinking dynamics for large reasoning models, 2025.
[4] Anthropic. Claude 3.7 Sonnet, 2023. Accessed: March 10, 2025.
[5] Daman Arora and Andrea Zanette. Training language models to reason efficiently. arXiv preprint arXiv:2502.04463, 2025.
[6] Simon A Aytes, Jinheon Baek, and Sung Ju Hwang. Sketch-of-thought: Efficient LLM reasoning with adaptive cognitive-inspired sketching. arXiv preprint arXiv:2503.05179, 2025.
[7] Seyedarmin Azizi, Erfan Baghaei Potraghloo, and Massoud Pedram. Activation steering for chain-of-thought compression, 2025.
[8] Edward Beeching, Lewis Tunstall, and Sasha Rush. Scaling test-time compute with open models.
[9] Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, et al. Graph of thoughts: Solving elaborate problems with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17682–17690, 2024.
[10] Mert Cemri, Nived Rajaraman, Rishabh Tiwari, Xiaoxuan Liu, Kurt Keutzer, Ion Stoica, Kannan Ramchandran, Ahmad Beirami, and Ziteng Sun. SPECS: Faster test-time scaling through speculative drafts, 2025.
[11] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
[12] Qiguang Chen, Dengyun Peng, Jinhao Liu, HuiKang Su, Jiannan Guan, Libo Qin, and Wanxiang Che. Aware first, think less: Dynamic boundary self-awareness drives extreme reasoning efficiency in large language models, 2025.
[13] Qiguang Chen, Libo Qin, Jiaqi Wang, Jingxuan Zhou, and Wanxiang Che. Unlocking the capabilities of thought: A reasoning boundary framework to quantify and optimize chain-of-thought. Advances in Neural Information Processing Systems, 37:54872–54904, 2024.
[14] Runjin Chen, Zhenyu Zhang, Junyuan Hong, Souvik Kundu, and Zhangyang Wang. Seal: Steerable reasoning calibration of large language models for free. arXiv preprint arXiv:2504.07986.
[15] Xiaoshu Chen, Sihang Zhou, Ke Liang, and Xinwang Liu. Distilling reasoning ability from large language models with adaptive thinking. arXiv preprint arXiv:2404.09170, 2024.
[16] Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, et al. Do NOT think that much for 2+3=? On the overthinking of o1-like LLMs. arXiv preprint arXiv:2412.21187, 2024.
[17] Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, and Haifeng Wang. Inner thinking transformer: Leveraging dynamic depth scaling to foster adaptive internal thinking. arXiv preprint arXiv:2502.13842.
[18] Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, and Bohan Zhuang. R-stitch: Dynamic trajectory stitching for efficient reasoning, 2025.
[19] Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, and Xinchao Wang. VeriThinker: Learning to verify makes reasoning model efficient. arXiv preprint arXiv:2505.17941, 2025.
[20] Jeffrey Cheng and Benjamin Van Durme. Compressed chain of thought: Efficient reasoning through dense representations. arXiv preprint arXiv:2412.13171, 2024.
[21] Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, and Zhiqiang Zhang. Incentivizing dual process thinking for efficient large language model reasoning. arXiv preprint arXiv:2505.16315, 2025.
[22] Zhengxiang Cheng, Dongping Chen, Mingyang Fu, and Tianyi Zhou. Optimizing length compression in large reasoning models, 2025.
[23] Li Chenglin, Qianglong Chen, Liangyue Li, Caiyu Wang, Feng Tao, Yicheng Li, Zulong Chen, and Yin Zhang. Mixed distillation helps smaller language models reason better. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1673–1690, 2024.
[24] Yu-Neng Chuang, Leisheng Yu, Guanchu Wang, Lizhe Zhang, Zirui Liu, Xuanting Cai, Yang Sui, Vladimir Braverman, and Xia Hu. Confident or seek stronger: Exploring uncertainty-based on-device LLM routing from benchmarking to generalization, 2025.
[25] Yu-Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, and Xia Hu. Learning to route LLMs with confidence tokens, 2025.
[26] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.
[27] Codeforces. Codeforces - competitive programming platform, 2025. Accessed: 2025-03-18.
[28] Rémi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games, pages 72–83. Springer, 2006.
[29] Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, and Joseph E. Gonzalez. The danger of overthinking: Examining the reasoning-action dilemma in agentic tasks, 2025.
[30] Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, et al. A survey on multimodal large language models for autonomous driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 958–979, 2024.
[31] Yingqian Cui, Pengfei He, Jingying Zeng, Hui Liu, Xianfeng Tang, Zhenwei Dai, Yan Han, Chen Luo, Jing Huang, Zhen Li, et al. Stepwise perplexity-guided refinement for efficient chain-of-thought reasoning in large language models. arXiv preprint arXiv:2502.13260, 2025.
[32] Muzhi Dai, Shixuan Liu, and Qingyi Si. Stable reinforcement learning for efficient reasoning.
[33] Muzhi Dai, Chenxu Yang, and Qingyi Si. S-GRPO: Early exit via reinforcement learning in reasoning models. arXiv preprint arXiv:2505.07686, 2025.
[34] Yuntian Deng, Yejin Choi, and Stuart Shieber. From explicit CoT to implicit CoT: Learning to internalize CoT step by step. arXiv preprint arXiv:2405.14838, 2024.
[35] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
[36] Bowen Ding, Yuhan Chen, Futing Wang, Lingfeng Ming, and Tao Lin. Do thinking tokens help or trap? Towards more efficient large reasoning model, 2025.
[37] Dujian Ding, Ankur Mallick, Shaokun Zhang, Chi Wang, Daniel Madrigal, Mirian Del Carmen Hipolito Garcia, Menglin Xia, Laks V. S. Lakshmanan, Qingyun Wu, and Victor Rühle. Best-route: Adaptive LLM routing with test-time optimal compute, 2025.
[38] Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, et al. Dynamic parallel tree search for efficient LLM reasoning. arXiv preprint arXiv:2502.16235, 2025.
[39] Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, and Cheston Tan. A survey of embodied AI: From simulators to research tasks. IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022.
[40] Razvan-Gabriel Dumitru, Darius Peteleaza, Vikas Yadav, and Liangming Pan. ConciseRL: Conciseness-guided reinforcement learning for efficient reasoning models. arXiv preprint arXiv:2505.17250, 2025.
[41] Roy Eisenstadt, Itamar Zimerman, and Lior Wolf. Overclocking LLM reasoning: Monitoring and controlling thinking path lengths in LLMs, 2025.
[42] Sugyeong Eo, Hyeonseok Moon, Evelyn Hayoon Zi, Chanjun Park, and Heuiseok Lim. Debate only when necessary: Adaptive multiagent collaboration for efficient LLM reasoning, 2025.
[43] Chenrui Fan, Ming Li, Lichao Sun, and Tianyi Zhou. Missing premise exacerbates overthinking: Are reasoning models losing critical thinking skill? arXiv preprint arXiv:2504.06514.
[44] Siqi Fan, Peng Han, Shuo Shang, Yequan Wang, and Aixin Sun. CoThink: Token-efficient reasoning via instruct models guiding reasoning models, 2025.
[45] Gongfan Fang, Xinyin Ma, and Xinchao Wang. Thinkless: LLM learns when to think, 2025.
[46] Junfeng Fang, Yukai Wang, Ruipeng Wang, Zijun Yao, Kun Wang, An Zhang, Xiang Wang, and Tat-Seng Chua. SafeMLRM: Demystifying safety in multi-modal large reasoning models. arXiv preprint arXiv:2504.08813, 2025.
[47] Mehdi Fatemi, Banafsheh Rafiee, Mingjie Tang, and Kartik Talamadupula. Concise reasoning via reinforcement learning. arXiv preprint arXiv:2504.05185, 2025.
[48] Tao Feng, Yicheng Li, Li Chenglin, Hao Chen, Fei Yu, and Yin Zhang. Teaching small language models reasoning through counterfactual distillation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5831–5842, 2024.
[49] Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. GPTQ: Accurate post-training quantization for generative pre-trained transformers. In The Eleventh International Conference on Learning Representations. OpenReview, 2023.
[50] Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Aurick Qiao, and Hao Zhang. Efficiently serving LLM reasoning programs with Certaindex. arXiv preprint arXiv:2412.20993, 2024.
[51] Yichao Fu, Junda Chen, Yonghao Zhuang, Zheyu Fu, Ion Stoica, and Hao Zhang. Reasoning without self-doubt: More efficient chain-of-thought through certainty probing. In ICLR 2025 Workshop on Foundation Models in the Wild, 2025.
[52] Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, and Yi Wu. How far are we from optimal reasoning efficiency?, 2025.
[53] Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, and Tom Goldstein. Scaling up test-time compute with latent reasoning: A recurrent depth approach. arXiv preprint arXiv:2502.05171.
[54] Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, and Di Niu. Guided by gut: Efficient test-time scaling with reinforced intrinsic confidence, 2025.
[55] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
[56] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
[57] Hasan Abed Al Kader Hammoud, Kumail Alhamoud, Abed Hammoud, Elie Bou-Zeid, Marzyeh Ghassemi, and Bernard Ghanem. Train long, think short: Curriculum learning for efficient reasoning, 2025.
[58] Tingxu Han, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen, and Zhenting Wang. Token-budget-aware LLM reasoning. arXiv preprint arXiv:2412.18547, 2024.
[59] Jitai Hao, Yuke Zhu, Tian Wang, Jun Yu, Xin Xin, Bo Zheng, Zhaochun Ren, and Sheng Guo. OmniKV: Dynamic context selection for efficient long-context LLMs. In The Thirteenth International Conference on Learning Representations, 2025.
[60] Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, and Yuandong Tian. Training large language models to reason in a continuous latent space. arXiv preprint arXiv:2412.06769, 2024.
[61] Michael Hassid, Gabriel Synnaeve, Yossi Adi, and Roy Schwartz. Don't overthink it. Preferring shorter thinking chains for improved LLM reasoning, 2025.
[62] Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, and Erik Cambria. A survey of large language models for healthcare: From data, technology, and applications to accountability and ethics. arXiv preprint arXiv:2310.05694, 2023.
[63] Xingyang He, Xiao Ling, and Jie Liu. SmartThinker: Learning to compress and preserve reasoning by step-level length control, 2025.
[64] Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021.
[65] Jialiang Hong, Taihang Zhen, Kai Chen, Jiaheng Liu, Wenpeng Zhu, Jing Huo, Yang Gao, Depeng Wang, Haitao Wan, Xi Yang, Boyan Wang, and Fanyu Meng. Reconsidering overthinking: Penalizing internal and external redundancy in CoT reasoning, 2025.
[66] Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, and Shiyu Chang. ThinkPrune: Pruning long chain-of-thought of LLMs via reinforcement learning. arXiv preprint arXiv:2504.01296, 2025.
[67] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
[68] Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, and Ping Luo. HiAgent: Hierarchical working memory management for solving long-horizon agent tasks with large language model. arXiv preprint arXiv:2408.09559, 2024.
[69] Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, and Ping Luo. Tree-Planner: Efficient close-loop task planning with large language models. arXiv preprint arXiv:2310.08582, 2023.
[70] Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, and Jiaxin Huang. Efficient test-time scaling via self-calibration. arXiv preprint arXiv:2503.00031, 2025.
[71] Jiameng Huang, Baijiong Lin, Guhao Feng, Jierun Chen, Di He, and Lu Hou. Efficient reasoning for large reasoning language models via certainty-guided reflection suppression.
[72] Joonwon Jang, Jaehee Kim, Wonbin Kweon, Seonghyeon Lee, and Hwanjo Yu. Verbosity-aware rationale reduction: Effective reduction of redundant rationale via principled criteria.
[73] Guochao Jiang, Guofeng Quan, Zepeng Ding, Ziqin Luo, Dixuan Wang, and Zheng Hu. FlashThink: An early exit method for efficient reasoning. arXiv preprint arXiv:2505.13949.
[74] Lingjie Jiang, Xun Wu, Shaohan Huang, Qingxiu Dong, Zewen Chi, Li Dong, Xingxing Zhang, Tengchao Lv, Lei Cui, and Furu Wei. Think only when you need with large hybrid-reasoning models, 2025.
[75] Yuxuan Jiang, Dawei Li, and Frank Ferraro. DRP: Distilled reasoning pruning with skill-aware step decomposition for efficient large reasoning models. arXiv preprint arXiv:2505.13975.
[76] Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, and Mengnan Du. The impact of reasoning step length on large language models. arXiv preprint arXiv:2401.04925, 2024.
[77] Zhensheng Jin, Xinze Li, Yifan Ji, Chunyi Peng, Zhenghao Liu, Qi Shi, Yukun Yan, Shuo Wang, Furong Peng, and Ge Yu. Recut: Balancing reasoning length and accuracy in LLMs via stepwise trails and preference optimization, 2025.
[78] Yu Kang, Xianghui Sun, Liangyu Chen, and Wei Zou. C3oT: Generating shorter chain-of-thought without compromising effectiveness. arXiv preprint arXiv:2412.11664, 2024.
[79] Henrik Klagges, Robert Dahlke, Fabian Klemm, Benjamin Merkel, Daniel Klingmann, David A Reiss, and Dan Zecha. Assembly of experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors. arXiv preprint arXiv:2506.14794, 2025.
[80] Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In European Conference on Machine Learning, pages 282–293. Springer, 2006.
