LocalViT: Bringing Locality to Vision Transformers

Yawei Li, Kai Zhang, Jiezhang Cao, Radu Timofte, Luc Van Gool · 2021 · arXiv 2104.05707

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition

cs.CV · 2026-04-25 · unverdicted · novelty 7.0

KAConvNet introduces a Kolmogorov-Arnold Convolutional Layer to build networks competitive with ViTs and CNNs while offering stronger theoretical interpretability.

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

cs.CV · 2021-10-05 · unverdicted · novelty 6.0

MobileViT is a lightweight vision transformer that reports 78.4% top-1 accuracy on ImageNet-1k with ~6M parameters, outperforming MobileNetV3 by 3.2% and DeiT by 6.2% at a similar parameter count, and also reports gains on MS-COCO object detection.

Cross Paradigm Representation and Alignment Transformer for Image Deraining

cs.CV · 2025-04-23 · conditional · novelty 5.0

CPRAformer fuses spatial-channel and global-local attention paradigms via SPC-SA, SPR-SA, and AAFM to achieve state-of-the-art image deraining on eight benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

KAConvNet: Kolmogorov-Arnold Convolutional Networks for Vision Recognition · cs.CV · 2026-04-25 · unverdicted · none · ref 47
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer · cs.CV · 2021-10-05 · unverdicted · none · ref 11
Cross Paradigm Representation and Alignment Transformer for Image Deraining · cs.CV · 2025-04-23 · conditional · none · ref 34