Boute: Cost-efficient llm serving with heterogeneous llms and gpus via multi-objective bayesian optimization

Youhe Jiang, Fangcheng Fu, Eiko Yoneki · 2026 · arXiv 2602.10729

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

TurboServe: Serving Streaming Video Generation Efficiently and Economically

cs.DC · 2026-06-17 · unverdicted · novelty 7.0

TurboServe introduces the first serving system for streaming video generation workloads, using migration-aware placement and load-driven autoscaling to cut worst-case latency by 37.5% and GPU cost by 37.2%.

HexAGenT: Efficient Agentic LLM Serving via Workflow- and Heterogeneity-Aware Scheduling

cs.DC · 2026-05-15 · unverdicted · novelty 7.0

HexAGenT reduces the SLO scale required for timely agentic LLM workflow completion by an average of 20.1% at 95% attainment and 33.0% at 99% attainment on heterogeneous A100/H100/H200 clusters.

Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs

cs.DC · 2026-05-05 · unverdicted · novelty 7.0

Coral cuts multi-LLM serving costs by up to 2.79x and raises goodput by up to 2.39x on heterogeneous GPUs through adaptive joint optimization and a lossless two-stage decomposition that solves quickly.

Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.

HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware

cs.DC · 2026-05-08 · unverdicted · novelty 6.0

HexiSeq optimizes sequence and head partitioning across mixed GPUs to improve long-context LLM training throughput by up to 1.72x in simulations.

citing papers explorer

Showing 5 of 5 citing papers.

TurboServe: Serving Streaming Video Generation Efficiently and Economically cs.DC · 2026-06-17 · unverdicted · none · ref 23
TurboServe introduces the first serving system for streaming video generation workloads, using migration-aware placement and load-driven autoscaling to cut worst-case latency by 37.5% and GPU cost by 37.2%.
HexAGenT: Efficient Agentic LLM Serving via Workflow- and Heterogeneity-Aware Scheduling cs.DC · 2026-05-15 · unverdicted · none · ref 18
HexAGenT reduces the SLO scale required for timely agentic LLM workflow completion by an average of 20.1% at 95% attainment and 33.0% at 99% attainment on heterogeneous A100/H100/H200 clusters.
Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs cs.DC · 2026-05-05 · unverdicted · none · ref 20
Coral cuts multi-LLM serving costs by up to 2.79x and raises goodput by up to 2.39x on heterogeneous GPUs through adaptive joint optimization and a lossless two-stage decomposition that solves quickly.
Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics cs.DC · 2026-04-08 · unverdicted · none · ref 48
Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.
HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware cs.DC · 2026-05-08 · unverdicted · none · ref 26
HexiSeq optimizes sequence and head partitioning across mixed GPUs to improve long-context LLM training throughput by up to 1.72x in simulations.

Boute: Cost-efficient llm serving with heterogeneous llms and gpus via multi-objective bayesian optimization

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer