Shortcut-connected expert parallelism for accelerating mixture-of-experts

Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang · 2024 · arXiv 2404.05019

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

MONA integrates Nesterov acceleration into Muon's orthogonalization framework, reporting better convergence than Muon and AdamW on MoE models up to 68B parameters trained on 1T tokens and SOTA fine-tuning results.

Chameleon: Adaptive Fault Tolerance for Distributed Training via Real-time Policy Selection

cs.DC · 2025-08-29 · unverdicted · novelty 6.0

Chameleon provides adaptive fault tolerance for distributed training by real-time selection of optimal recovery policies via a unified performance model, demonstrated with low overhead on a 32-card cluster.

A Survey on Large Language Models for Code Generation

cs.CL · 2024-06-01 · unverdicted · novelty 3.0

A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

cs.LG · 2026-04-19

citing papers explorer

Showing 2 of 2 citing papers after filters.

A Survey on Large Language Models for Code Generation cs.CL · 2024-06-01 · unverdicted · none · ref 35
A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.
MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference cs.LG · 2026-04-19 · unreviewed · ref 10

Shortcut-connected expert parallelism for accelerating mixture-of-experts

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer