Accelerating Distributed MoE Training and Inference with Lina
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 3
citing papers explorer
- MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
  MoE-Prefill achieves 1.35-1.59x higher throughput for prefill-only MoE serving by using asynchronous expert parallelism to overlap weight AllGather with computation and prefix-aware routing with true-FLOPs tracking.
- SprayCheck: Finding Gray Failures in Adaptive Routing Networks
  SprayCheck detects single-link gray failures with 1.5% drop rates in one iteration and 0.5% in five iterations for Llama-3 70B training in a 64-spine topology by passively observing adaptive routing traffic patterns.
- Eliminating Hidden Serialization in Multi-Node Megakernel Communication
  Perseus removes serialization bottlenecks in multi-node megakernel MoE communication via batched per-destination fences and hardware fence flags, delivering up to 10.3x speedup on proxy transports and matching or exceeding GPU-direct RDMA.