MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts. arXiv preprint arXiv:2410.07348

Jin, P · 2024 · arXiv 2410.07348

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

PromptDx: Differentiable Prompt Tuning for Multimodal In-Context Alzheimer's Diagnosis

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

PromptDx adds a differentiable adapter to align multimodal data with a pre-trained TabPFN-style ICL engine, achieving strong Alzheimer's diagnosis performance with only 1% context samples.

Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning

cs.LG · 2026-02-13 · unverdicted · novelty 7.0

Split-MoPE integrates split learning with predefined-expert routing to maximize usable data in vertical federated learning under sample misalignment, delivering state-of-the-art accuracy in one communication round plus built-in robustness and per-sample contribution scores.

Post-Trained MoE Can Skip Half Experts via Self-Distillation

cs.LG · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

ZEDA turns post-trained static MoE models into dynamic ones via zero-output expert injection and two-stage self-distillation, cutting over 50% expert FLOPs on Qwen3-30B-A3B and GLM-4.7-Flash with small accuracy drops across 11 benchmarks.

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

cs.AI · 2026-05-14 · conditional · novelty 6.0

BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% with over 98% performance retention.

MEPA: Multi-Scale Representation Alignment for Visual Autoregressive Modeling with Mixture of Experts

cs.CV · 2026-07-01 · unverdicted · novelty 5.0

MEPA adds token-routed MoE and residual self-supervised feature alignment to VAR models, reporting better FID on ImageNet 256x256 with half the training epochs and fewer parameters than dense baselines.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning

cs.LG · 2026-02-13 · unverdicted · none · ref 18

Post-Trained MoE Can Skip Half Experts via Self-Distillation

cs.LG · 2026-05-18 · unverdicted · none · ref 3 · 2 links
