8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 4
citation-polarity summary: background 4
citing papers explorer
- Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models
  Sieve dynamically schedules MoE experts across GPU and PIM hardware to handle bimodal token distributions, achieving 1.3x to 1.6x gains in throughput and interactivity over prior static PIM systems on three large models.
- Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs
  Misrouter enables input-only attacks on MoE LLMs by optimizing queries on open-source surrogates to route toward weakly aligned experts and transferring them to public APIs.
- Hierarchical Mixture-of-Experts with Two-Stage Optimization
  Hi-MoE uses two-level hierarchical routing objectives to enforce group-level balance while promoting within-group specialization, yielding better perplexity and expert utilization than prior MoE baselines in NLP and vision tasks.
- Mixture of Sequence: Theme-Aware Mixture-of-Experts for Long-Sequence Recommendation
  MoS applies theme-aware routing to extract multi-scale theme-specific subsequences from noisy long user sequences, achieving state-of-the-art recommendation performance with fewer FLOPs than comparable MoE models.
- FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving
  FaaSMoE treats MoE experts as on-demand FaaS functions with configurable granularity, using under one-third the resources of a full-model baseline under multi-tenant workloads.
- STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction
  STM3 is a new multiscale Mamba mixture-of-experts model with graph causal networks and contrastive routing that reports state-of-the-art results on 10 long-term spatio-temporal forecasting benchmarks.
- PiKV: KV Cache Management System for Mixture of Experts
  PiKV proposes expert-sharded KV storage, PiKV routing, adaptive scheduling, and compression modules to reduce overhead in multi-GPU MoE inference.
- OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment
  OneRec unifies retrieval and ranking in a generative recommender using session-wise decoding and iterative DPO-based preference alignment, achieving real-world gains on Kuaishou.