Efficient sampling approaches to Shapley value approximation. Proceedings of the ACM on Management of Data, 1(1):1–24
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding
  LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.
- First-Order Efficiency for Probabilistic Value Estimation via A Statistical Viewpoint
  EASE minimizes first-order mean squared error for probabilistic value estimation by jointly optimizing the sampling law and surrogate function, outperforming prior methods.