A hybrid tensor-expert-data parallelism approach to optimize mixture-of-experts training
1 Pith paper cites this work. Polarity classification is still indexing.

Fields: cs.LG
Years: 2026
Verdicts: UNVERDICTED

Citing papers explorer (1 representative citing paper)
- Federation of Experts: Communication Efficient Distributed Inference for Large Language Models
FoE restructures MoE blocks into per-KV-head clusters with sum-based synchronization, removing all-to-all communication in single-node settings and limiting it to intra-node in multi-node settings for up to 5.2x faster inference with comparable quality.
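A minimal sketch of the sum-based synchronization idea described in the summary above, not the authors' implementation: each device holds one expert cluster (aligned with a KV-head group), computes a partial MoE output for every token locally, and the partial outputs are combined by an elementwise sum rather than by an all-to-all token dispatch. The shapes, the softmax gating, and the cluster assignment below are illustrative assumptions; in a real deployment the final sum would be an all-reduce within the node.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_clusters, experts_per_cluster, n_tokens = 64, 4, 2, 8

# One weight matrix per expert, grouped into per-KV-head clusters (assumed layout).
clusters = [
    [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
     for _ in range(experts_per_cluster)]
    for _ in range(n_clusters)
]

def cluster_forward(x, cluster, gate_logits):
    """Partial MoE output produced by a single cluster for all tokens."""
    gates = np.exp(gate_logits)
    gates = gates / gates.sum(axis=-1, keepdims=True)  # softmax over local experts
    out = np.zeros_like(x)
    for e, w in enumerate(cluster):
        out += gates[:, e:e + 1] * (x @ w)
    return out

x = rng.standard_normal((n_tokens, d_model))

# Each "device" computes its cluster's partial output on its local copy of x ...
partials = [
    cluster_forward(x, clusters[c],
                    rng.standard_normal((n_tokens, experts_per_cluster)))
    for c in range(n_clusters)
]

# ... and the partials are synchronized with a sum (an all-reduce in practice),
# so no token ever has to be routed to another device via all-to-all.
y = np.sum(partials, axis=0)
print(y.shape)  # (n_tokens, d_model)
```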