Across 21 routing methods and 5 benchmarks, LLM routers converge to similar accuracy, below the oracle, because they learn global performance trends rather than fine-grained query signals.
arXiv preprint arXiv:2602.03478
3 Pith papers cite this work.
fields: cs.LG (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
SeqRoute applies offline RL with CQL and Hindsight Budget Relabeling to sequential LLM routing under global budgets, claiming 6.0-73.5% cost reduction, maintained or improved quality, and under 1% bankruptcy rate.
DARS replaces single-shot response labels with distribution-aware supervision derived from input and output uncertainty to produce more reliable LLM routing policies.
citing papers explorer
- The Routing Plateau: Understanding and Breaking the Accuracy Limits of LLM Routers
  Across 21 routing methods and 5 benchmarks, LLM routers converge to similar accuracy, below the oracle, because they learn global performance trends rather than fine-grained query signals.
- SeqRoute: Global Budget-Aware Sequential LLM Routing via Offline Reinforcement Learning
  SeqRoute applies offline RL with CQL and Hindsight Budget Relabeling to sequential LLM routing under global budgets, claiming 6.0-73.5% cost reduction, maintained or improved quality, and under 1% bankruptcy rate.
- From Sampled Outcomes to Capability Distributions: Rethinking Supervision for LLM Routing
  DARS replaces single-shot response labels with distribution-aware supervision derived from input and output uncertainty to produce more reliable LLM routing policies.