Roofline: an insightful visual performance model for multicore architectures
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
citing papers explorer
- NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing
  NasZip delivers up to 8.4x speedup over CPU baselines and 1.69x over prior NDP accelerators for ANNS by combining near-data processing with statistics-based PCA early exiting, dynamic-float encoding, and data-aware neighbor mapping.
- SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference
  SparKV reduces time-to-first-token by 1.3x-5.1x and energy use by 1.5x-3.3x for on-device LLM inference by adaptively choosing between cloud KV streaming and local computation while overlapping execution and adjusting for runtime conditions.
- Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication
  Sparsity-aware roofline models are required for accurate SpMM performance prediction because matrix structure alters arithmetic intensity and a single unified model fails across patterns like block, banded, scale-free, and random.