2 Pith papers cite this work.
Fields: cs.DC (2). Years: 2026 (2). Verdicts: unverdicted (2).
Citing papers:
- ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving
  Compared with load-balancing baselines, ELDR reduces median time per output token (TPOT) by 5.9% to 13.9% in prefill-decode (PD) disaggregated mixture-of-experts (MoE) serving. It routes decode requests using expert signatures derived from the prefill phase, combined with K-means-based locality partitioning.
- SwarmX: Agentic Scheduling for Low-Latency Agentic Systems
  SwarmX deploys scheduling-specific neural predictors and a scheduler-agent framework to reduce tail latency by up to 61.5% and double throughput in agentic AI systems on large GPU-CPU clusters.