MMEE encodes dataflow decisions in matrix form for fast exhaustive search, delivering 40-69% lower latency and energy use than prior methods while running 64-343x faster.
Dynamic sparse attention for scalable transformer acceleration
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.AR 2years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
D-Legion proposes a scalable architecture of Legions containing adaptive-precision systolic array cores that accelerates quantized LLM matrix multiplications, delivering up to 8.2x lower latency and 3.8x higher memory savings versus prior designs.
citing papers explorer
-
Fast Cross-Operator Optimization of Attention Dataflow
MMEE encodes dataflow decisions in matrix form for fast exhaustive search, delivering 40-69% lower latency and energy use than prior methods while running 64-343x faster.
-
D-Legion: A Scalable Many-Core Architecture for Accelerating Matrix Multiplication in Quantized LLMs
D-Legion proposes a scalable architecture of Legions containing adaptive-precision systolic array cores that accelerates quantized LLM matrix multiplications, delivering up to 8.2x lower latency and 3.8x higher memory savings versus prior designs.