RVM uses recurrent computation inside a masked autoencoder to learn video representations that match or exceed prior video and image models on classification, tracking, and dense spatial tasks with up to 30x better parameter efficiency.
S-jepa: Towards seamless cross-dataset transfer through dy- namic spatial attention.arXiv preprint arXiv:2403.11772
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Recurrent Video Masked Autoencoders
RVM uses recurrent computation inside a masked autoencoder to learn video representations that match or exceed prior video and image models on classification, tracking, and dense spatial tasks with up to 30x better parameter efficiency.