AsyncSparse presents BCSR and WCSR kernels that use TMA and warp specialization to accelerate SpMM, outperforming prior libraries by 1.47x to 6.24x on SuiteSparse and achieving a 2.66x end-to-end speedup on Qwen2.5-7B at 90% block sparsity.
Qwen2.5 technical report
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Representative citing papers
R2SAEA fine-tunes an LLM with RL to reason about solution relations for surrogate-assisted evolutionary optimization, reporting improved relation prediction and SOTA performance on single- and multi-objective benchmarks.