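Vim (Vision Mamba) is a vision backbone built on bidirectional Mamba state space model blocks. It outperforms DeiT in accuracy on ImageNet classification, COCO object detection, and ADE20K semantic segmentation, while being substantially faster and more memory-efficient on high-resolution images (2.8× faster than DeiT and using 86.8% less GPU memory for batch feature extraction at 1248×1248).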
Title resolution pending
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2024 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model