hub

O1-pruner: Length-harmonizing fine-tuning for o1-like reasoning pruning

Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao · 2025 · arXiv 2501.12570

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

read on arXiv browse 12 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

LEAD uses online adaptive mechanisms including Potential-Scaled Instability and symmetric efficiency rewards based on correct rollouts to achieve higher accuracy-efficiency scores with substantially shorter reasoning outputs than base models on math benchmarks.

Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost

cs.AI · 2026-05-07 · conditional · novelty 7.0

Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.

TrigReason: Trigger-Based Collaboration between Small and Large Reasoning Models

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

TrigReason matches large reasoning model accuracy on math and science benchmarks by delegating most steps to small models and intervening selectively on three triggers, cutting latency by 43.9% and cost by 73.3%.

STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

STOP uses structured on-policy analysis to prune long reasoning traces to their earliest correct node, cutting token usage 19-42% with little accuracy loss on math benchmarks.

Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

BET reduces reasoning tokens by about 55% on average while improving performance across benchmarks by learning to short-solve easy queries, fold early on unsolvable ones, and preserve budget for hard solvable queries.

Hint Tuning: Less Data Makes Better Reasoners

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

Hint Tuning uses an instruct model as a difficulty probe to create 1K multi-level hint examples that train reasoning models to calibrate chain-of-thought length, cutting tokens by 31.5% on average across 4B-32B models without accuracy loss.

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.

ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

ETR is a trajectory-aware reward that promotes progressive entropy reduction during CoT reasoning, integrated into GRPO to deliver higher accuracy and 67% shorter traces on tested models and benchmarks.

Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression

cs.CL · 2026-04-05 · unverdicted · novelty 6.0

CoT compression frequently introduces trustworthiness regressions with method-specific degradation profiles; a proposed normalized efficiency score and alignment-aware DPO variant reduce length by 19.3% with smaller trustworthiness loss.

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

cs.AI · 2026-04-03 · unverdicted · novelty 6.0

Errors in large reasoning models form a forest structure that grows with more steps, making the first solution best; RED refines the first and prunes the rest for higher performance with less compute.

Memory in the Age of AI Agents

cs.CL · 2025-12-15 · unverdicted · novelty 6.0

The paper maps agent memory research via three forms (token-level, parametric, latent), three functions (factual, experiential, working), and dynamics of formation/evolution/retrieval, plus benchmarks and future directions.

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

cs.CL · 2025-03-20 · accept · novelty 5.0

A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.

citing papers explorer

Showing 12 of 12 citing papers.

LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models cs.LG · 2026-05-10 · unverdicted · none · ref 15
LEAD uses online adaptive mechanisms including Potential-Scaled Instability and symmetric efficiency rewards based on correct rollouts to achieve higher accuracy-efficiency scores with substantially shorter reasoning outputs than base models on math benchmarks.
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost cs.AI · 2026-05-07 · conditional · none · ref 70
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
TrigReason: Trigger-Based Collaboration between Small and Large Reasoning Models cs.AI · 2026-04-16 · unverdicted · none · ref 10
TrigReason matches large reasoning model accuracy on math and science benchmarks by delegating most steps to small models and intervening selectively on three triggers, cutting latency by 43.9% and cost by 73.3%.
STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes cs.CL · 2026-05-13 · unverdicted · none · ref 64
STOP uses structured on-policy analysis to prune long reasoning traces to their earliest correct node, cutting token usage 19-42% with little accuracy loss on math benchmarks.
Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning cs.AI · 2026-05-12 · unverdicted · none · ref 28
BET reduces reasoning tokens by about 55% on average while improving performance across benchmarks by learning to short-solve easy queries, fold early on unsolvable ones, and preserve budget for hard solvable queries.
Hint Tuning: Less Data Makes Better Reasoners cs.CL · 2026-05-09 · unverdicted · none · ref 15
Hint Tuning uses an instruct model as a difficulty probe to create 1K multi-level hint examples that train reasoning models to calibrate chain-of-thought length, cutting tokens by 31.5% on average across 4B-32B models without accuracy loss.
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models cs.LG · 2026-04-20 · unverdicted · none · ref 50
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning cs.AI · 2026-04-07 · unverdicted · none · ref 3
ETR is a trajectory-aware reward that promotes progressive entropy reduction during CoT reasoning, integrated into GRPO to deliver higher accuracy and 67% shorter traces on tested models and benchmarks.
Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression cs.CL · 2026-04-05 · unverdicted · none · ref 10
CoT compression frequently introduces trustworthiness regressions with method-specific degradation profiles; a proposed normalized efficiency score and alignment-aware DPO variant reduce length by 19.3% with smaller trustworthiness loss.
FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models cs.AI · 2026-04-03 · unverdicted · none · ref 8
Errors in large reasoning models form a forest structure that grows with more steps, making the first solution best; RED refines the first and prunes the rest for higher performance with less compute.
Memory in the Age of AI Agents cs.CL · 2025-12-15 · unverdicted · none · ref 152
The paper maps agent memory research via three forms (token-level, parametric, latent), three functions (factual, experiential, working), and dynamics of formation/evolution/retrieval, plus benchmarks and future directions.
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models cs.CL · 2025-03-20 · accept · none · ref 127
A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.

O1-pruner: Length-harmonizing fine-tuning for o1-like reasoning pruning

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer