Efficient interactive LLM serving with proxy model-based sequence length prediction

8 Pith papers cite this work. Polarity classification is still in progress.

citation-role summary

background (1)

fields

cs.DC (6), cs.LG (2)

years

2026 (6), 2025 (2)

polarities

background (1)

representative citing papers

STAR: Decode-Phase Rescheduling for LLM Inference

cs.DC · 2025-10-15 · unverdicted · novelty 5.0

STAR cuts P99 TPOT (time per output token) by 75.1% and raises goodput 2.63x through a lightweight hidden-state length predictor and dynamic decode-phase rescheduling that accounts for both current and predicted loads.