RTP-LLM is a new LLM inference engine achieving 4.7x-6.3x model loading speedup and 1.12x-2.52x throughput gains over vLLM and SGLang via disaggregated phases, multi-tier KV cache, and modular optimizations in production at Alibaba.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.OS (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- RTP-LLM: High-Performance Alibaba LLM Inference Engine