pith. sign in

Adaserve: Accelerating multi-slo llm serving with slo-customized speculative decoding.arXiv preprint arXiv:2501.12162

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

background 2 method 1

citation-polarity summary

fields

cs.DC 5 cs.IR 1

years

2026 5 2025 1

polarities

background 3

clear filters

representative citing papers

Regulating Branch Parallelism in LLM Serving

cs.DC · 2026-05-07 · unverdicted · novelty 7.0

TAPER regulates LLM branch parallelism by admitting extra branches opportunistically when predicted externality fits slack, delivering 1.48-1.77x higher goodput than eager or fixed-cap baselines on Qwen3-32B while keeping over 95% SLO attainment.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • Regulating Branch Parallelism in LLM Serving cs.DC · 2026-05-07 · unverdicted · none · ref 42

    TAPER regulates LLM branch parallelism by admitting extra branches opportunistically when predicted externality fits slack, delivering 1.48-1.77x higher goodput than eager or fixed-cap baselines on Qwen3-32B while keeping over 95% SLO attainment.

  • FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving cs.DC · 2026-04-22 · unverdicted · none · ref 27

    FASER delivers up to 53% higher throughput and 1.92x lower latency in dynamic LLM serving by adjusting speculative lengths per request, early pruning of rejects, and overlapping draft/verification phases via frontiers.