Two Pith papers cite this work. Polarity classification is still being indexed.
Citing papers by year: 2026 (2). Verdicts: 2 unverdicted.
Citing papers explorer
- The Serialized Bridge: Understanding and Recovering LLM Serving Performance under Blackwell GPU Confidential Computing
  The confidential VM-GPU bridge on Blackwell GPUs serializes host-device transfers and raises setup costs. This causes a 13-27% loss in LLM serving throughput and doubles KV-cache restore latency.
- RTP-LLM: High-Performance Alibaba LLM Inference Engine
  RTP-LLM is a new LLM inference engine in production at Alibaba. Compared with vLLM and SGLang, it loads models 4.7x-6.3x faster and delivers 1.12x-2.52x higher throughput. It achieves these gains through disaggregated phases, a multi-tier KV cache, and modular optimizations.