Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
5 Pith papers cite this work.

Citation-role summary: method (1), background (1).
Citation-polarity summary: still indexing.
Citing papers explorer
- Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning
  DMEP prunes experts module by module in LoRA-MoE and drops load balancing after pruning, cutting trainable parameters by 35-43% and raising throughput by roughly 10% while matching or exceeding uniform baselines on reasoning tasks. (A minimal pruning sketch follows this list.)
- Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
  Expert upcycling duplicates the experts in an existing MoE checkpoint and continues pre-training, matching fixed-size baseline performance with 32% less compute. (See the upcycling sketch after this list.)
- ALAS: Adaptive Long-Horizon Action Synthesis via Async-pathway Stream Disentanglement
  ALAS disentangles environment and self-state streams via bio-inspired modules, delivering 23% higher subtask success and 29% better execution efficiency on long-horizon HSI tasks.
- Efficient Handwriting-Based Alzheimer's Disease Diagnosis Using a Low-Rank Mixture of Experts Deep Learning Framework
  A low-rank mixture-of-experts model trained on handwriting data delivers strong Alzheimer's diagnosis performance while activating substantially fewer parameters at inference.
- Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
  A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
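The module-wise pruning idea behind the DMEP summary can be pictured with a small sketch. The layer below is a hypothetical LoRA-MoE linear module, not the authors' code: the class name, the router-usage importance statistic, and the `prune` method are all illustrative assumptions. Each expert is a rank-r LoRA pair gated by a router; router traffic is accumulated per expert, and each module prunes its own least-used experts, so expert counts can differ across modules. After pruning, a load-balancing auxiliary loss (which spreads traffic across all experts, including removed ones) would simply be dropped, matching the summary above. The same low-rank-expert construction also underlies the efficiency claim in the handwriting-diagnosis paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAMoELinear(nn.Module):
    """Hypothetical LoRA-MoE linear layer with per-module expert pruning."""

    def __init__(self, in_dim, out_dim, num_experts=8, rank=4):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)  # stands in for the frozen pretrained weight
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.router = nn.Linear(in_dim, num_experts)
        # Each expert e is a rank-`rank` LoRA pair: x @ A[e] @ B[e].
        self.A = nn.Parameter(0.01 * torch.randn(num_experts, in_dim, rank))
        self.B = nn.Parameter(torch.zeros(num_experts, rank, out_dim))
        # Accumulated router probability mass per expert (importance proxy).
        self.register_buffer("usage", torch.zeros(num_experts))
        self.active = list(range(num_experts))

    def forward(self, x):
        gates = F.softmax(self.router(x), dim=-1)  # (..., num_experts)
        self.usage += gates.detach().reshape(-1, gates.size(-1)).sum(dim=0)
        out = self.base(x)
        for e in self.active:  # pruned experts never run
            delta = (x @ self.A[e]) @ self.B[e]  # low-rank expert output
            out = out + gates[..., e:e + 1] * delta
        return out

    def prune(self, keep):
        # Keep this module's `keep` most-used experts. Because each module
        # prunes independently, expert counts become non-uniform across modules.
        # A real implementation would also renormalize gates over active experts.
        self.active = self.usage.topk(keep).indices.tolist()

layer = LoRAMoELinear(64, 64, num_experts=8, rank=4)
_ = layer(torch.randn(2, 10, 64))  # run some traffic to collect usage stats
layer.prune(keep=4)                # this module now routes over 4 experts
```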
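Expert upcycling can likewise be sketched at the checkpoint level. The function below assumes a hypothetical state-dict layout in which expert parameters are keyed `experts.{i}.<param>` and the router weight has one row per expert; both the key pattern and the symmetry-breaking jitter are illustrative assumptions, not the paper's actual procedure. It widens the model by duplicating each expert and tiling the router rows to match.

```python
import torch

def upcycle_experts(state_dict, num_experts, factor=2, jitter=1e-3):
    """Widen an MoE checkpoint from num_experts to factor * num_experts.

    Assumes expert tensors are keyed 'experts.{i}.<param>' and the router
    weight 'router.weight' has shape (num_experts, hidden). Hypothetical layout.
    """
    new_sd = {}
    for key, tensor in state_dict.items():
        if key.startswith("experts."):
            _, idx, rest = key.split(".", 2)
            for c in range(factor):
                t = tensor.clone()
                if c > 0:
                    t += jitter * torch.randn_like(t)  # break symmetry between copies
                new_sd[f"experts.{int(idx) + c * num_experts}.{rest}"] = t
        elif key == "router.weight":
            new_sd[key] = tensor.repeat(factor, 1)  # one routing row per expert copy
        elif key == "router.bias":
            new_sd[key] = tensor.repeat(factor)
        else:
            new_sd[key] = tensor.clone()
    return new_sd
```

Continued pre-training then resumes from the widened state dict; the summary's 32% compute saving comes from reusing the smaller checkpoint's training rather than training the wider model from scratch.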