PROTEUS: SLA-aware routing via Lagrangian RL for multi-LLM serving systems

Amit Singh Bhatti, Vishal Vaddina, Dagnachew Birru · 2026 · arXiv 2601.19402

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

SeqRoute: Global Budget-Aware Sequential LLM Routing via Offline Reinforcement Learning

cs.LG · 2026-05-25 · unverdicted · novelty 6.0

SeqRoute applies offline RL with CQL and Hindsight Budget Relabeling to sequential LLM routing under global budgets, claiming 6.0-73.5% cost reduction, maintained or improved quality, and under 1% bankruptcy rate.

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

cs.LG · 2026-03-22 · unverdicted · novelty 5.0

The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.

citing papers explorer

Showing 1 of 1 citing paper after filters.

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

cs.LG · 2026-03-22 · unverdicted · none · ref 104

The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.
