pith — machine review for the scientific record

arxiv: 2510.13768 · v2 · submitted 2025-10-15 · 💻 cs.CV · cs.AI · q-bio.NC

Recognition: unknown

Scaling Vision Transformers for Functional MRI with Flat Maps

Authors on Pith: no claims yet
classification: 💻 cs.CV · cs.AI · q-bio.NC
keywords: fmri, models, flat, brainmarks, cortexmae, first, functional, maps
Abstract

We study the problem of training self-supervised foundation models for functional MRI. Our main contributions are: (1) we introduce a new model family (CortexMAE) trained using the masked autoencoder framework on 2.1K hours of open fMRI data, and (2) we release the first open evaluation suite (Brainmarks) for fMRI foundation models. Our core innovation is simple: we adapt the Vision Transformer to fMRI by first converting each 3D fMRI volume to a 2D map using a cortical flat map projection. We directly compare flat maps to both parcellation and volume-based representations. While each has its advantages, flat maps generally perform best. We perform the first systematic scaling analysis for fMRI and observe strict power law scaling, albeit with limits. Finally, we use Brainmarks to do controlled benchmark comparisons. On subject-level trait prediction, we report a challenging null result: no single model achieves clear state-of-the-art performance. Moreover, all models struggle to outperform a simple functional connectivity baseline. On cognitive state decoding, we observe more robust performance, and in this setting our CortexMAE family outperforms prior models by a large margin. Code, models, and datasets are available at https://github.com/MedARC-AI/CortexMAE and https://github.com/MedARC-AI/Brainmarks.
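The core adaptation described above — projecting each 3D fMRI volume to a 2D cortical flat map, splitting it into non-overlapping patches, and masking most patches before the ViT encoder in the MAE framework — can be sketched roughly as below. The grid size, patch size, and masking ratio here are illustrative assumptions, not the paper's actual CortexMAE configuration, and the flat map itself stands in for a real cortical projection.

```python
import numpy as np

def patchify(flat_map, patch=16):
    """Split a 2D flat map (H, W) into non-overlapping (patch x patch) tokens."""
    H, W = flat_map.shape
    h, w = H // patch, W // patch
    x = flat_map[:h * patch, :w * patch].reshape(h, patch, w, patch)
    return x.transpose(0, 2, 1, 3).reshape(h * w, patch * patch)

def random_mask(tokens, ratio=0.75, rng=None):
    """MAE-style masking: the encoder sees only a random (1 - ratio) subset."""
    rng = rng or np.random.default_rng(0)
    n = tokens.shape[0]
    keep = max(1, int(round(n * (1 - ratio))))
    idx = rng.permutation(n)[:keep]
    return tokens[idx], idx

# Hypothetical flat map: one fMRI time point projected to a 128 x 256 grid.
flat_map = np.random.default_rng(0).standard_normal((128, 256)).astype(np.float32)
tokens = patchify(flat_map)         # 128 patch tokens, 256 values each
visible, idx = random_mask(tokens)  # 32 visible tokens at a 75% mask ratio
```

A decoder would then reconstruct the masked patches from the visible ones; only the projection and masking steps are sketched here.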

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

    cs.LG · 2026-04 · unverdicted · novelty 6.0

    A meta-optimized in-context learning approach enables training-free cross-subject semantic visual decoding from fMRI by inferring individual neural encoding patterns via hierarchical inference on a few examples.