Canonical reference

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi

Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He · 2022 · arXiv 2201.05596

Canonical reference. 100% of citing Pith papers cite this work as background.

11 Pith papers citing it

Background 100% of classified citations

read on arXiv browse 11 citing papers

citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

PCCL: Process Group-Aware Scalable and Generic Collective Algorithm Synthesizer

cs.DC · 2026-06-05 · unverdicted · novelty 7.0

PCCL synthesizes near-optimal topology-aware collective algorithms for arbitrary patterns while being process group-aware and scalable to subsets of devices.

Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference

cs.DC · 2026-05-11 · unverdicted · novelty 7.0

EEP makes wide expert-parallel MoE serving survive single-rank failures with an 11s recovery pause, 8s reintegration pause, and throughput restored to 95% of pre-fault level within 52s while staying within 4.4% of a fixed-membership baseline in steady state.

InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

InfiniLoRA decouples LoRA execution from base-model inference and reports 3.05x higher request throughput plus 54% more adapters meeting strict latency SLOs.

Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

AsyMoE adds hyperbolic geometry for cross-modal hierarchies and evidence-priority experts to address vision-language asymmetry in LVLMs, reporting 1.5% average gains and 25.45% fewer active parameters.

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production

cs.DC · 2026-05-09 · unverdicted · novelty 6.0

MegaScale-Omni delivers 1.27x-7.57x higher throughput for dynamic multimodal LLM training by decoupling encoder and LLM parallelism, using unified colocation, and applying adaptive workload balancing.

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

A shared global expert pool in MoE improves validation loss over per-layer experts and allows sublinear expert-parameter growth with depth.

Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism

cs.DC · 2026-05-06 · unverdicted · novelty 6.0

Piper introduces resource modeling and pipelined hybrid parallelism for MoE training, delivering 2-3.5X higher MFU than prior frameworks and 1.2-9X better all-to-all bandwidth.

Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns

cs.LG · 2026-04-25 · unverdicted · novelty 6.0

Profiling shows persistent expert load imbalance and domain-specific activation patterns in large MoE models; workload-aware grouping and placement reduce all-to-all communication volume by up to 20x.

Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource

cs.CL · 2025-06-13 · conditional · novelty 6.0

MoE models with activation rates in an optimal region outperform dense LLMs of identical total parameter count, training compute, and data budget, with the optimal region consistent across scales.

TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

TIDE schedules I/O-aware expert offloading for MoE diffusion LLMs by solving for an optimal refresh interval that exploits temporal stability of activations, yielding up to 1.5x throughput gain losslessly.

Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE

cs.CV · 2026-05-04 · unverdicted · novelty 4.0

Mamoda2.5 is a 25B-parameter DiT-MoE unified AR-Diffusion model that reaches top video generation and editing benchmarks with 4-step inference up to 95.9x faster than baselines.

citing papers explorer

Showing 6 of 6 citing papers after filters.

Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference cs.DC · 2026-05-11 · unverdicted · none · ref 23
EEP makes wide expert-parallel MoE serving survive single-rank failures with an 11s recovery pause, 8s reintegration pause, and throughput restored to 95% of pre-fault level within 52s while staying within 4.4% of a fixed-membership baseline in steady state.
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models cs.DC · 2026-04-08 · unverdicted · none · ref 30
InfiniLoRA decouples LoRA execution from base-model inference and reports 3.05x higher request throughput plus 54% more adapters meeting strict latency SLOs.
MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production cs.DC · 2026-05-09 · unverdicted · none · ref 44
MegaScale-Omni delivers 1.27x-7.57x higher throughput for dynamic multimodal LLM training by decoupling encoder and LLM parallelism, using unified colocation, and applying adaptive workload balancing.
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts cs.LG · 2026-05-07 · unverdicted · none · ref 39
A shared global expert pool in MoE improves validation loss over per-layer experts and allows sublinear expert-parameter growth with depth.
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism cs.DC · 2026-05-06 · unverdicted · none · ref 16
Piper introduces resource modeling and pipelined hybrid parallelism for MoE training, delivering 2-3.5X higher MFU than prior frameworks and 1.2-9X better all-to-all bandwidth.
Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE cs.CV · 2026-05-04 · unverdicted · none · ref 43
Mamoda2.5 is a 25B-parameter DiT-MoE unified AR-Diffusion model that reaches top video generation and editing benchmarks with 4-step inference up to 95.9x faster than baselines.

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer