Wildcat: Near-linear attention in theory and practice

Tobias Schröder, Lester Mackey · 2026 · cs.LG · arXiv 2602.10056

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length $n$. WildCat avoids these quadratic costs by only attending over a small weighted coreset. Crucially, we select the coreset using a fast but spectrally-accurate subsampling algorithm -- randomly pivoted Cholesky -- and weight the elements optimally to minimise reconstruction error. Remarkably, given bounded inputs, WildCat approximates exact attention with super-polynomial $O(n^{-\sqrt{\log(\log(n))}})$ error decay while running in near-linear $O(n^{1+o(1)})$ time. In contrast, prior practical approximations either lack error guarantees or require quadratic runtime to guarantee such high fidelity. We couple this advance with a GPU-optimized PyTorch implementation and a suite of benchmark experiments demonstrating the benefits of WildCat for image generation, image classification, and language model KV cache compression.

representative citing papers

Nearly Optimal Attention Coresets

cs.DS · 2026-05-07 · unverdicted · novelty 8.0

ε-coresets for attention exist of size O(√d e^{ρ+o(ρ)}/ε) for unit-norm keys/values and queries of norm ≤ρ, nearly matching the Ω(√d e^ρ/ε) lower bound.

Express Language Modeling

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

Express converts non-causal attention approximations to causal versions, achieving log^{3/2}(n)/s error with O(s) memory and O(s^2 log^2(n)) overhead when combined with Thinformer.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Express Language Modeling cs.LG · 2026-06-09 · unverdicted · none · ref 26 · internal anchor
Express converts non-causal attention approximations to causal versions, achieving log^{3/2}(n)/s error with O(s) memory and O(s^2 log^2(n)) overhead when combined with Thinformer.

Wildcat: Near-linear attention in theory and practice

fields

years

verdicts

representative citing papers

citing papers explorer