Deep residual learning for image recognition
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
  Vim is a bidirectional Mamba vision backbone that outperforms DeiT in accuracy on standard tasks while being substantially faster and more memory-efficient for high-resolution images.
- Connectivity-Optimized Representation Learning via Persistent Homology
  A persistent homology loss enforces controllable connectivity in autoencoder latent spaces, improving one-class classification via kernel density estimation on the learned representations.