Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.AR (3)
years: 2026 (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models
  Sieve dynamically schedules MoE experts across GPU and PIM hardware to handle bimodal token distributions, achieving 1.3x to 1.6x gains in throughput and interactivity over static prior PIM systems on three large models.
- TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
  TokenStack's heterogeneous HBM-PIM design with base-die control and topology-aware KV placement delivers 1.62x higher geometric-mean token throughput and 1.70x SLO-compliant serving capacity than AttAcc while cutting per-token energy by 30-47%.
- AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices
  AHASD is a new asynchronous heterogeneous architecture for mobile NPU-PIM systems that enables efficient adaptive speculative decoding for LLMs by decoupling drafting and verification with specialized controls and hardware units, delivering up to 4.2x throughput and 5.6x energy efficiency gains.