Think only when you need with large hybrid-reasoning models.arXiv preprint arXiv:2505.14631

Think only when you need with large hybrid-reasoning models , author= · 2025 · arXiv 2505.14631

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

TIME: Temporally Intelligent Meta-reasoning Engine for Context-Triggered Explicit Reasoning

cs.LG · 2026-01-08 · unverdicted · novelty 7.0

TIME trains LLMs to trigger compact, context-triggered reasoning via time tags and tick events, improving TIMEBench scores while cutting explicit reasoning tokens by an order of magnitude.

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation

cs.AI · 2026-04-09 · unverdicted · novelty 6.0

HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.

Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation

cs.CL · 2025-07-21 · unverdicted · novelty 6.0

EviOmni unifies evidence reasoning and extraction in a single RL trajectory with token masking and verifiable rewards for answer, length, and format to produce compact high-quality evidence for RAG.

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

Early entropy dynamics during LLM decoding mark when explicit reasoning becomes beneficial, enabling the training-free EDRM router that selects strategies per instance and yields 41-55% token savings with accuracy gains across 15 benchmarks.

MUR: Momentum Uncertainty guided Reasoning for Large Language Models

cs.CL · 2025-07-20

citing papers explorer

Showing 7 of 7 citing papers.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning cs.LG · 2026-04-08 · unverdicted · none · ref 52
This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.
TIME: Temporally Intelligent Meta-reasoning Engine for Context-Triggered Explicit Reasoning cs.LG · 2026-01-08 · unverdicted · none · ref 9
TIME trains LLMs to trigger compact, context-triggered reasoning via time tags and tick events, improving TIMEBench scores while cutting explicit reasoning tokens by an order of magnitude.
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning cs.AI · 2026-05-21 · unverdicted · none · ref 40
SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.
HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation cs.AI · 2026-04-09 · unverdicted · none · ref 19
HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.
Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation cs.CL · 2025-07-21 · unverdicted · none · ref 1
EviOmni unifies evidence reasoning and extraction in a single RL trajectory with token masking and verifiable rewards for answer, length, and format to produce compact high-quality evidence for RAG.
When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions cs.LG · 2026-05-20 · unverdicted · none · ref 33
Early entropy dynamics during LLM decoding mark when explicit reasoning becomes beneficial, enabling the training-free EDRM router that selects strategies per instance and yields 41-55% token savings with accuracy gains across 15 benchmarks.
MUR: Momentum Uncertainty guided Reasoning for Large Language Models cs.CL · 2025-07-20 · unreviewed · ref 6

Think only when you need with large hybrid-reasoning models.arXiv preprint arXiv:2505.14631

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer