Moe++: Accelerating mixture-of-experts methods with zero-computation experts.arXiv preprint arXiv:2410.07348

Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan · 2024 · arXiv 2410.07348

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

PromptDx: Differentiable Prompt Tuning for Multimodal In-Context Alzheimer's Diagnosis

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

PromptDx adds a differentiable adapter to align multimodal data with a pre-trained TabPFN-style ICL engine, achieving strong Alzheimer's diagnosis performance with only 1% context samples.

Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning

cs.LG · 2026-02-13 · unverdicted · novelty 7.0

Split-MoPE integrates split learning with predefined-expert routing to maximize usable data in vertical federated learning under sample misalignment, delivering state-of-the-art accuracy in one communication round plus built-in robustness and per-sample contribution scores.

Post-Trained MoE Can Skip Half Experts via Self-Distillation

cs.LG · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

ZEDA turns post-trained static MoE models into dynamic ones via zero-output expert injection and two-stage self-distillation, cutting over 50% expert FLOPs on Qwen3-30B-A3B and GLM-4.7-Flash with small accuracy drops across 11 benchmarks.

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

cs.AI · 2026-05-14 · conditional · novelty 6.0

BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% with over 98% performance retention.

citing papers explorer

Showing 4 of 4 citing papers.

PromptDx: Differentiable Prompt Tuning for Multimodal In-Context Alzheimer's Diagnosis cs.CV · 2026-05-09 · unverdicted · none · ref 15
PromptDx adds a differentiable adapter to align multimodal data with a pre-trained TabPFN-style ICL engine, achieving strong Alzheimer's diagnosis performance with only 1% context samples.
Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning cs.LG · 2026-02-13 · unverdicted · none · ref 18
Split-MoPE integrates split learning with predefined-expert routing to maximize usable data in vertical federated learning under sample misalignment, delivering state-of-the-art accuracy in one communication round plus built-in robustness and per-sample contribution scores.
Post-Trained MoE Can Skip Half Experts via Self-Distillation cs.LG · 2026-05-18 · unverdicted · none · ref 3 · 2 links
ZEDA turns post-trained static MoE models into dynamic ones via zero-output expert injection and two-stage self-distillation, cutting over 50% expert FLOPs on Qwen3-30B-A3B and GLM-4.7-Flash with small accuracy drops across 11 benchmarks.
BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE cs.AI · 2026-05-14 · conditional · none · ref 12
BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% with over 98% performance retention.

Moe++: Accelerating mixture-of-experts methods with zero-computation experts.arXiv preprint arXiv:2410.07348

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer