Meta-Reasoner applies contextual multi-armed bandits to adaptively select LLM reasoning strategies at inference time, reporting 9-12% accuracy gains and 28-35% time reductions on math and science benchmarks.
One Pith paper cites this work.
Fields: cs.AI (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)

Representative citing paper:
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
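To illustrate what "contextual multi-armed bandits selecting reasoning strategies" can mean in practice, here is a minimal sketch using LinUCB, one standard contextual-bandit algorithm. The summary does not say which bandit variant Meta-Reasoner uses, so LinUCB is an assumption. The strategy names ("continue", "backtrack", "restart"), the two-feature context vector, and the reward signal are all hypothetical placeholders, not taken from the paper. Each strategy keeps its own ridge-regression estimate, and the selector picks the strategy with the highest upper confidence bound for the current context.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def _matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a square matrix by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class LinUCBArm:
    """Per-strategy ridge-regression state (disjoint LinUCB).

    Stores A^{-1} directly and updates it with the Sherman-Morrison formula,
    so no explicit matrix inversion is needed.
    """

    def __init__(self, dim: int, reg: float = 1.0) -> None:
        # A starts as reg * I, so A^{-1} starts as (1 / reg) * I.
        self.a_inv: Matrix = [
            [(1.0 / reg) if i == j else 0.0 for j in range(dim)] for i in range(dim)
        ]
        self.b: Vector = [0.0] * dim

    def ucb(self, x: Sequence[float], alpha: float) -> float:
        # Estimated reward theta^T x plus exploration bonus alpha * sqrt(x^T A^{-1} x).
        theta = _matvec(self.a_inv, self.b)
        a_inv_x = _matvec(self.a_inv, x)
        return _dot(theta, x) + alpha * math.sqrt(max(_dot(x, a_inv_x), 0.0))

    def update(self, x: Sequence[float], reward: float) -> None:
        # Sherman-Morrison (A^{-1} is symmetric):
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        a_inv_x = _matvec(self.a_inv, x)
        denom = 1.0 + _dot(x, a_inv_x)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]


class StrategySelector:
    """Chooses a (hypothetical) reasoning strategy for a given context vector."""

    def __init__(self, strategies: Sequence[str], dim: int, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.arms: Dict[str, LinUCBArm] = {s: LinUCBArm(dim) for s in strategies}

    def select(self, context: Sequence[float]) -> str:
        # Ties go to the earliest strategy, since max() keeps the first maximum.
        return max(self.arms, key=lambda s: self.arms[s].ucb(context, self.alpha))

    def update(self, strategy: str, context: Sequence[float], reward: float) -> None:
        self.arms[strategy].update(context, reward)


if __name__ == "__main__":
    # Hypothetical context: [bias term, progress score of the current reasoning chain].
    selector = StrategySelector(["continue", "backtrack", "restart"], dim=2)
    context = [1.0, 0.3]
    choice = selector.select(context)
    # Hypothetical reward: 1.0 if the step taken under `choice` moved toward a correct answer.
    selector.update(choice, context, reward=1.0)
    print(choice, selector.select(context))
```

In a reasoning loop, the context would summarize the state of the current chain of thought and the reward would score the progress made after applying the chosen strategy; how both are computed is an assumption here. Maintaining A^{-1} with Sherman-Morrison keeps each update at O(d²) for a d-dimensional context, which is cheap compared with an LLM call.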