Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
European conference on machine learning , pages=
6 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
NonZero introduces an interaction score and bandit-formalized proposal rule for local agent deviations in multi-agent MCTS, delivering a sublinear local-regret guarantee and improved sample efficiency on game benchmarks without full joint-action enumeration.
o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
Structured Recurrent Mixers enable algebraic switching between parallel training and recurrent inference representations, yielding higher throughput, concurrency, and training efficiency than comparable linear-complexity models on language tasks.
Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
Math-Shepherd is an automatically trained process reward model that scores solution steps to verify and reinforce LLMs, lifting Mistral-7B from 77.9% to 89.1% on GSM8K and 28.6% to 43.5% on MATH.
citing papers explorer
-
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
-
Structured Recurrent Mixers for Massively Parallelized Sequence Generation
Structured Recurrent Mixers enable algebraic switching between parallel training and recurrent inference representations, yielding higher throughput, concurrency, and training efficiency than comparable linear-complexity models on language tasks.