Unifying Convolution and Attention via Convolutional Nearest Neighbors

2025 · cs.CV · arXiv 2511.14137

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Convolutional Neural Networks and Vision Transformers are the two dominant architectural families in computer vision, defined by spatially local convolution and global self-attention respectively. Despite their apparent differences, we show that both operations are special cases of a single $k$-nearest neighbor aggregation framework: convolution selects neighbors by spatial proximity while attention selects by feature similarity, placing them at two ends of a shared operational spectrum. We introduce Convolutional Nearest Neighbors (ConvNN), a unified framework that exactly recovers standard and depthwise convolution, self-attention, and sparse attention variants including KVT-attention as special cases, and exposes the design space of neighbor-selection strategies between them through configurable similarity functions, positional encodings, and aggregation kernels. We validate ConvNN on ImageNet-1K classification across two complementary architectures: a hybrid branching layer in ResNet-50 that combines local and global feature learning, improving top-1 accuracy by 3.0% over the ResNet-50 baseline, and ConvNN-attention in ViT-Base that achieves 81.64% top-1 accuracy, surpassing standard multi-head self-attention by 0.7%. Together, these results demonstrate that ConvNN provides a principled foundation for designing operations that bridge convolutional and attention-based computation.
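The selection-then-aggregation view described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`topk_indices`, `spatial_knn_conv`, `feature_knn_attention`, `dense_attention`) are hypothetical, the signal is 1D for brevity, and scaled dot-product similarity with softmax weighting is an assumption made here for the attention case. The sketch selects k neighbors per position from a score matrix, then aggregates them. Scoring by negative spatial distance reproduces a sliding-window kernel at interior positions, and scoring by feature similarity with k equal to the sequence length reproduces dense softmax attention.

```python
import numpy as np

def topk_indices(scores, k):
    # Indices of the k highest-scoring neighbors in each row, shape (N, k).
    return np.argsort(-scores, axis=1, kind="stable")[:, :k]

def spatial_knn_conv(x, w):
    # x: (N,) signal, w: (k,) kernel with odd k.
    # Neighbors are chosen by spatial proximity (closer position = higher score).
    n, k = x.shape[0], w.shape[0]
    pos = np.arange(n)
    scores = -np.abs(pos[:, None] - pos[None, :]).astype(float)
    # Sort the selected indices so kernel weights line up left to right.
    idx = np.sort(topk_indices(scores, k), axis=1)
    out = (x[idx] * w).sum(axis=1)
    r = k // 2
    # Only interior positions have a symmetric window; borders are dropped.
    return out[r:n - r]

def feature_knn_attention(q, kmat, v, k):
    # q, kmat: (N, d); v: (N, C). Neighbors are chosen by feature similarity.
    scores = q @ kmat.T / np.sqrt(q.shape[1])
    idx = topk_indices(scores, k)                      # (N, k)
    sel = np.take_along_axis(scores, idx, axis=1)      # scores of selected neighbors
    sel = sel - sel.max(axis=1, keepdims=True)         # numerically stable softmax
    wts = np.exp(sel)
    wts /= wts.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkc->nc", wts, v[idx])        # weighted sum of selected values

def dense_attention(q, kmat, v):
    # Reference: standard softmax attention over all positions.
    s = q @ kmat.T / np.sqrt(q.shape[1])
    s = s - s.max(axis=1, keepdims=True)
    a = np.exp(s)
    a /= a.sum(axis=1, keepdims=True)
    return a @ v

rng = np.random.default_rng(0)
x = rng.normal(size=16)
w = rng.normal(size=3)
q, kmat, v = (rng.normal(size=(16, 8)) for _ in range(3))

conv_matches = np.allclose(spatial_knn_conv(x, w), np.correlate(x, w, mode="valid"))
attn_matches = np.allclose(feature_knn_attention(q, kmat, v, k=16), dense_attention(q, kmat, v))
```

Setting k below the sequence length in `feature_knn_attention` keeps only the top-k scores per query, which gives a sparse top-k variant of the same aggregation. The paper's framework additionally covers depthwise convolution, positional encodings, and other aggregation kernels, none of which this sketch attempts.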

representative citing papers

Scaling Laws for Grid-Based Approximate Nearest Neighbor Search in High Dimensions

cs.LG · 2026-07-01 · unverdicted · novelty 6.0

Multiprobe grid ANN maintains roughly constant d-scaling on GloVe while graph/tree/partitioning methods degrade, with near-linear N scaling and lower indexing cost.

citing papers explorer

Showing 1 of 1 citing paper.

Scaling Laws for Grid-Based Approximate Nearest Neighbor Search in High Dimensions

cs.LG · 2026-07-01 · unverdicted · none · ref 23 · internal anchor

Multiprobe grid ANN maintains roughly constant d-scaling on GloVe while graph/tree/partitioning methods degrade, with near-linear N scaling and lower indexing cost.
