TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks

2026 · cs.AI · arXiv 2601.10245

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Multi-step reasoning tasks like mathematical problem solving are vulnerable to cascading failures, where a single incorrect step leads to complete solution breakdown. Current LLM routing methods assign entire queries to one model, treating all reasoning steps as equal. We propose TRIM (Targeted routing in multi-step reasoning tasks), which routes only critical steps – those likely to derail the solution – to larger models while letting smaller models handle routine continuations. Our key insight is that targeted step-level interventions can fundamentally transform inference efficiency by confining expensive calls to precisely those steps where stronger models prevent cascading errors. TRIM operates at the step-level: it uses process reward models to identify erroneous steps and makes routing decisions based on step-level uncertainty and budget constraints. We develop several routing strategies within TRIM, ranging from a simple threshold-based policy to more expressive policies that reason about long-horizon accuracy-cost trade-offs and uncertainty in step-level correctness estimates. On MATH-500, even the simplest thresholding strategy surpasses prior routing methods with 5x higher cost efficiency, while more advanced policies match the strong, expensive model's performance using 80% fewer expensive model tokens. On harder benchmarks such as AIME, TRIM achieves up to 6x higher cost efficiency. All methods generalize effectively across math reasoning tasks, demonstrating that step-level difficulty represents fundamental characteristics of reasoning.
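The threshold-based policy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `route_with_threshold`, the callables `small_model`, `large_model`, and `score_step`, the `"ANSWER"` stop marker, and the default threshold of 0.5 are all hypothetical names and values chosen for illustration. It assumes a process-reward-style scorer that returns a value in [0, 1] for each candidate step, and that a low-scoring step is regenerated by the larger model.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# a generator maps (question, steps so far) to the next step;
# a scorer maps (question, steps so far, candidate step) to a score in [0, 1].
StepGenerator = Callable[[str, List[str]], str]
StepScorer = Callable[[str, List[str], str], float]


def route_with_threshold(
    question: str,
    small_model: StepGenerator,
    large_model: StepGenerator,
    score_step: StepScorer,
    threshold: float = 0.5,  # illustrative value, not taken from the paper
    max_steps: int = 16,
    is_final: Callable[[str], bool] = lambda step: step.startswith("ANSWER"),
) -> Tuple[List[str], List[int]]:
    """Draft every step with the small model; escalate only low-scoring steps."""
    steps: List[str] = []
    escalated: List[int] = []  # indices of steps regenerated by the large model
    for i in range(max_steps):
        candidate = small_model(question, steps)
        if score_step(question, steps, candidate) < threshold:
            # The scorer flags this step as likely wrong: ask the large model instead.
            candidate = large_model(question, steps)
            escalated.append(i)
        steps.append(candidate)
        if is_final(candidate):
            break
    return steps, escalated


# Toy stand-ins so the sketch runs without any real model.
def toy_small(question: str, steps: List[str]) -> str:
    return ["2 + 3 = 5", "5 * 4 = 21", "ANSWER 20"][len(steps)]


def toy_large(question: str, steps: List[str]) -> str:
    return ["2 + 3 = 5", "5 * 4 = 20", "ANSWER 20"][len(steps)]


def toy_scorer(question: str, steps: List[str], candidate: str) -> float:
    return 0.1 if candidate == "5 * 4 = 21" else 0.9


steps, escalated = route_with_threshold("(2 + 3) * 4", toy_small, toy_large, toy_scorer)
print(steps)
print(escalated)
```

In this toy run only the second step falls below the threshold, so the large model is called once and the small model handles the other steps. The budget-aware and uncertainty-aware policies described in the abstract would replace the fixed comparison against `threshold` with a richer decision rule.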

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing

cs.LG · 2026-05-14 · accept · novelty 7.0 · 2 refs

TwinRouterBench supplies 970 execution-verified router prefixes across five datasets plus a live harness for 100 held-out SWE-bench cases, scoring routers on tier accuracy, trajectory success, and realized token cost without LLM judges.

A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

LLM reliability techniques are unified as communication channel operators, with a new cost-aware router achieving superior quality-cost tradeoffs on hard tasks.

citing papers explorer

Showing 2 of 2 citing papers.

TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing cs.LG · 2026-05-14 · accept · none · ref 10 · 2 links · internal anchor
A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability cs.LG · 2026-05-09 · unverdicted · none · ref 50 · internal anchor
