Switchless topologies such as 3D full-mesh are 20.6-56.2% more cost-effective than scale-up networks for MoE LLM serving, with current link bandwidths over-provisioned by up to 27%.
Eagle: Speculative sampling requires rethinking feature uncertainty
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
STAR cuts P99 TPOT by 75.1% and raises goodput 2.63x via a lightweight hidden-state length predictor and dynamic decode rescheduling that combines current and predicted loads.
citing papers explorer
-
Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving
Switchless topologies such as 3D full-mesh are 20.6-56.2% more cost-effective than scale-up networks for MoE LLM serving, with current link bandwidths over-provisioned by up to 27%.
-
STAR: Decode-Phase Rescheduling for LLM Inference
STAR cuts P99 TPOT by 75.1% and raises goodput 2.63x via a lightweight hidden-state length predictor and dynamic decode rescheduling that combines current and predicted loads.