FairBatching: Fairness-aware Batch Formation for LLM Inference
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
years
2026 2
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
representative citing papers
Measurement study finds LLM serving systems sacrifice 60-93% throughput to meet human-centric TTFT/TPOT SLOs unnecessary for programmatic long-horizon tasks.
citing papers explorer
- Regulating Branch Parallelism in LLM Serving
  TAPER regulates LLM branch parallelism by admitting extra branches opportunistically when predicted externality fits slack, delivering 1.48-1.77x higher goodput than eager or fixed-cap baselines on Qwen3-32B while keeping over 95% SLO attainment.
- Human-Less LLM Serving: Quantifying the Human Tax on Throughput
  Measurement study finds LLM serving systems sacrifice 60-93% throughput to meet human-centric TTFT/TPOT SLOs unnecessary for programmatic long-horizon tasks.