pith. machine review for the scientific record.

arxiv: 2605.01240 · v2 · submitted 2026-05-02 · 💻 cs.LG · cs.AI

Recognition: 3 theorem links


Rhamba: Region-Aware Hybrid Attention-Mamba Framework for Self-Supervised Learning in Resting-State fMRI

Carolina Torres-Rojas, Manob Jyoti Saikia, Pankaj Pandey, Pratheek Eranki, Ranganatha Sitaram, Ruthwik Reddy Doodipala

Authors on Pith · no claims yet

Pith reviewed 2026-05-11 00:43 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords self-supervised learning · resting-state fMRI · hybrid Mamba-Attention · region-aware masking · brain disorder classification · ABIDE pretraining · Integrated Gradients

The pith

Region-aware hybrid Attention-Mamba pretraining improves fMRI classification of schizophrenia and ADHD.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Rhamba, a self-supervised framework that pairs anatomically guided masking strategies with hybrid Attention-Mamba models for resting-state fMRI. It pretrains models on the ABIDE dataset using region-aligned patches and three masking approaches, then fine-tunes them on separate datasets to classify schizophrenia and ADHD. The Mamba-Attention hybrid reaches the highest average AUROC and exceeds prior methods, though gains depend on how masking and architecture interact rather than any single choice. A sympathetic reader would care because this offers a concrete route to learn useful representations from large unlabeled neuroimaging collections without requiring extensive labeled data.

Core claim

Rhamba integrates region-aligned patch embeddings with three masking strategies of increasing spatial specificity and compares four architectural variants during pretraining on ABIDE. After fine-tuning on COBRE and ADHD-200, the Mamba-Attention hybrid encoder-decoder records the highest average AUROC across both tasks and outperforms state-of-the-art baselines. Integrated Gradients analysis identifies contributing brain regions, and results indicate that downstream performance arises from the specific pairing of masking strategy and architecture.

What carries the argument

Region-aligned patch embeddings processed by hybrid Attention-Mamba encoder-decoder blocks under Any, Majority, or Pure masking strategies.
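The paper defines the three masking strategies only by their "increasing spatial specificity." A minimal sketch of one plausible reading, in which a patch's eligibility for masking depends on the fraction of its voxels inside the target region (the thresholds and the `maskable_patches` helper are our assumptions, not the paper's code):

```python
def maskable_patches(patch_region_frac, strategy):
    """Indices of patches eligible for masking, given each patch's
    fraction of voxels inside the target region. The thresholds encode
    one plausible reading of "increasing spatial specificity":
      Any      -> the patch overlaps the region at all,
      Majority -> more than half of the patch lies in the region,
      Pure     -> the patch lies entirely within the region.
    """
    thresholds = {"Any": lambda f: f > 0.0,
                  "Majority": lambda f: f > 0.5,
                  "Pure": lambda f: f >= 1.0}
    keep = thresholds[strategy]
    return [i for i, f in enumerate(patch_region_frac) if keep(f)]

# Four region-aligned patches with decreasing overlap with the region
fracs = [1.0, 0.6, 0.2, 0.0]
```

Under this reading the strategies nest (Pure ⊆ Majority ⊆ Any), which is consistent with the reported reconstruction-loss ordering: the less specific the mask, the more patches it can hit.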

If this is right

  • Masking strategy produces a consistent ordering of reconstruction loss but only modest and dataset-dependent effects on classification accuracy.
  • The Mamba-Attention configuration achieves the highest average AUROC across the two evaluation datasets.
  • Peak performance requires specific combinations of masking strategy and architecture instead of one universally best option.
  • Integrated Gradients reveals the brain regions that drive predictions for each model variant.
  • Rhamba exceeds state-of-the-art methods in the comparative evaluations performed.
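The Integrated Gradients attribution named above has a compact definition that can be checked independently of the paper. A self-contained sketch (the midpoint-rule approximation and the generic `grad_f` callable are our framing, not the paper's implementation):

```python
def integrated_gradients(grad_f, x, baseline, steps=64):
    """Midpoint-rule approximation of Integrated Gradients along the
    straight path from baseline to x:
      IG_i = (x_i - b_i) * (1/steps) * sum_k df/dx_i at b + a_k (x - b)
    """
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        a = (k + 0.5) / steps
        point = [bi + a * (xi - bi) for xi, bi in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            total[i] += g[i]
    return [(xi - bi) * ti / steps for xi, bi, ti in zip(x, baseline, total)]

# Linear model: IG is exact here, so the completeness axiom
# (attributions sum to f(x) - f(baseline)) holds to float precision.
w = [0.5, -1.0, 2.0]
f = lambda z: sum(wi * zi for wi, zi in zip(w, z))
grad_f = lambda z: w
x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
ig = integrated_gradients(grad_f, x, baseline)
```

The completeness axiom is what makes region-level attributions interpretable as shares of the model's output; for the paper's deep models the gradients would come from autodiff rather than a closed form.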

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hybrid models of this form may scale more efficiently than pure attention models when handling long fMRI sequences.
  • The region-aware emphasis could transfer to other neuroimaging modalities or to predicting functional connectivity patterns.
  • Tuning masking specificity per target disorder may become standard practice when applying similar frameworks.

Load-bearing premise

Performance differences in the downstream tasks result from the masking strategies and hybrid architecture choices rather than dataset properties or unstated implementation details.

What would settle it

A replication on the same pretraining and fine-tuning datasets in which a pure attention model or non-region-aware masking matches or exceeds the reported AUROC of the MA hybrid would falsify the advantage of the Rhamba design.

Figures

Figures reproduced from arXiv: 2605.01240 by Carolina Torres-Rojas, Manob Jyoti Saikia, Pankaj Pandey, Pratheek Eranki, Ranganatha Sitaram, Ruthwik Reddy Doodipala.

Figure 1. Overview of the proposed framework. (a) Pre-training pipeline, including ROI-based …
Figure 2. Masking strategy comparison and region-wise architecture performance across datasets.
Figure 3. Reconstruction loss across masking strategies, regions, and architectures.
Figure 4. Interpretation maps generated using the Integrated Gradients (IG) method shown in sagittal …
Original abstract

Self-supervised pretraining is promising for large-scale neuroimaging, yet the impact of region-aware masking and hybrid sequence modeling remains underexplored. In this work, we introduce Rhamba, a region-aware pretraining framework that integrates anatomically guided masking with hybrid Attention-Mamba architectures for resting state functional magnetic resonance imaging (fMRI) analysis. Models were pretrained on the ABIDE dataset using region-aligned patch embeddings and three masking strategies (Any, Majority, and Pure) with increasing spatial specificity. We evaluated four architectural variants: a Mamba only model, an Alternate architecture with interleaved Mamba and Attention blocks, and two hybrid encoder-decoder configurations (Attention-Mamba (AM) and Mamba-Attention (MA)). The pretrained models were fine-tuned on downstream classification tasks using the COBRE and ADHD-200 datasets for schizophrenia and attention-deficit/hyperactivity disorder discrimination. We employed Integrated Gradients, an explainable AI method, to identify the brain regions contributing to model predictions. Masking strategy strongly influenced reconstruction behavior, with reconstruction loss following a consistent ordering (Any > Majority > Pure). However, this trend did not directly translate into downstream performance, where differences were modest and dataset-dependent. The hybrid architecture with the MA configuration achieved the highest average AUROC across both datasets, and Rhamba outperformed state-of-the-art methods in comparative evaluation. Region-wise analysis showed that peak performance depends on the interaction between masking strategy and architecture rather than a single dominant configuration. Overall, Rhamba offers a flexible framework for balancing interpretability, scalability, and performance in large-scale fMRI representation learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Rhamba, a region-aware self-supervised pretraining framework for resting-state fMRI that combines anatomically guided masking strategies (Any, Majority, Pure) with hybrid Attention-Mamba sequence models. Models are pretrained on ABIDE using region-aligned patch embeddings, then fine-tuned for binary classification on COBRE (schizophrenia) and ADHD-200 datasets. Four architectures are compared (Mamba-only, Alternate, AM, MA), with the MA hybrid reported to achieve the highest average AUROC; the framework is claimed to outperform prior SOTA methods, and Integrated Gradients is used to highlight contributing brain regions. The abstract notes that masking trends in reconstruction loss do not directly translate to downstream performance, which is described as modest and dataset-dependent.

Significance. If the reported AUROC gains and outperformance hold under rigorous statistical controls, the work would offer a practical, scalable alternative to pure transformer or Mamba baselines for fMRI representation learning, with built-in region-level interpretability. The hybrid design and masking ablation could inform efficient long-sequence modeling in neuroimaging, where data efficiency and anatomical priors matter.

major comments (2)
  1. [Results / Comparative evaluation] Results section (AUROC tables and comparative evaluation): The central claim that the MA hybrid yields the highest average AUROC across COBRE and ADHD-200 and that Rhamba outperforms SOTA rests on modest, dataset-dependent differences without reported standard deviations, multiple random seeds, or statistical significance tests (e.g., McNemar or paired t-tests). This directly undermines the superiority assertion, as the abstract itself qualifies the downstream differences as modest.
  2. [Experiments / Downstream evaluation] Experimental protocol (fine-tuning and evaluation subsections): No details are provided on hyperparameter search ranges, data-split stratification, or whether the same random seeds were used across the four architectures and three masking strategies. Without these controls, observed orderings could arise from implementation variance rather than the region-aware masking plus hybrid design.
minor comments (2)
  1. [Methods / Architecture] Clarify the exact definition of the MA versus AM encoder-decoder configurations (e.g., which blocks are in the encoder versus decoder) and include a diagram or pseudocode for the hybrid stacking.
  2. [Pretraining results] The reconstruction-loss ordering (Any > Majority > Pure) is stated but not quantified with numerical values or linked to a specific figure or table; add these values for reproducibility.
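Pending the diagram or pseudocode the referee requests, one plausible reading of the four architectural variants can be written down as an encoder/decoder layer layout (the depths, the interleaving order, and the AM/MA assignment below are illustrative assumptions, not the paper's published configuration):

```python
def hybrid_layout(variant, enc_depth=4, dec_depth=4):
    """Block layout for the four pretraining variants, as we read them."""
    if variant == "Mamba":          # Mamba-only: SSM blocks throughout
        enc, dec = ["mamba"] * enc_depth, ["mamba"] * dec_depth
    elif variant == "Alternate":    # interleaved Mamba and Attention blocks
        enc = ["mamba" if i % 2 == 0 else "attn" for i in range(enc_depth)]
        dec = ["mamba" if i % 2 == 0 else "attn" for i in range(dec_depth)]
    elif variant == "AM":           # Attention encoder, Mamba decoder
        enc, dec = ["attn"] * enc_depth, ["mamba"] * dec_depth
    elif variant == "MA":           # Mamba encoder, Attention decoder
        enc, dec = ["mamba"] * enc_depth, ["attn"] * dec_depth
    else:
        raise ValueError(f"unknown variant: {variant}")
    return {"encoder": enc, "decoder": dec}
```

Whether AM/MA name the encoder side or the block order within each side is exactly the ambiguity the minor comment flags; the sketch fixes one interpretation so the comparison is at least well posed.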

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We appreciate the referee's careful reading and address the major comments point by point below. We will revise the manuscript to incorporate additional statistical rigor and experimental details where feasible.

Point-by-point responses
  1. Referee: [Results / Comparative evaluation] Results section (AUROC tables and comparative evaluation): The central claim that the MA hybrid yields the highest average AUROC across COBRE and ADHD-200 and that Rhamba outperforms SOTA rests on modest, dataset-dependent differences without reported standard deviations, multiple random seeds, or statistical significance tests (e.g., McNemar or paired t-tests). This directly undermines the superiority assertion, as the abstract itself qualifies the downstream differences as modest.

    Authors: We agree that the lack of standard deviations, multiple random seeds, and formal statistical tests weakens the comparative claims. The abstract correctly qualifies the differences as modest and dataset-dependent, and we do not claim large effect sizes. In the revised manuscript, we will report AUROC values with standard deviations computed over multiple random seeds and include paired statistical tests (e.g., paired t-tests or McNemar's test) to assess significance of the observed orderings. This will provide a more rigorous basis for the reported trends without overstating the results. revision: yes

  2. Referee: [Experiments / Downstream evaluation] Experimental protocol (fine-tuning and evaluation subsections): No details are provided on hyperparameter search ranges, data-split stratification, or whether the same random seeds were used across the four architectures and three masking strategies. Without these controls, observed orderings could arise from implementation variance rather than the region-aware masking plus hybrid design.

    Authors: We acknowledge that insufficient detail on the experimental controls limits reproducibility and the ability to rule out implementation variance. In the revised version, we will expand the experimental protocol and fine-tuning subsections to specify the hyperparameter search ranges (including learning rate, batch size, and optimizer settings), the data-split stratification approach (e.g., by site or diagnostic label to preserve class balance), and confirmation that identical random seeds were used across all architecture-masking combinations for fair comparison. revision: yes
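The seed-matched comparison the authors promise reduces to a paired test on per-seed AUROC scores. A minimal sketch (the scores below are synthetic placeholders, not the paper's numbers):

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t-statistic for per-seed scores of two model variants
    evaluated with identical seeds and data splits."""
    d = [x - y for x, y in zip(a, b)]
    t = mean(d) / (stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1  # statistic and degrees of freedom

# Synthetic per-seed AUROCs (placeholders, not the paper's results)
ma = [0.82, 0.80, 0.83, 0.81, 0.82]   # Mamba-Attention hybrid
am = [0.80, 0.79, 0.80, 0.79, 0.80]   # Attention-Mamba hybrid
t, df = paired_t(ma, am)
```

Pairing by seed removes the between-seed variance that would otherwise swamp the modest differences the abstract describes; with real scores one would report the t-statistic against the t-distribution with `df` degrees of freedom (e.g. via `scipy.stats.ttest_rel`).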

Circularity Check

0 steps flagged

No circularity: standard empirical self-supervised pipeline

full rationale

The paper describes an empirical self-supervised pretraining framework (region-aware masking on ABIDE followed by fine-tuning on COBRE/ADHD-200) using standard hybrid Attention-Mamba architectures and Integrated Gradients for post-hoc explanation. No equations, first-principles derivations, or predictions are presented that reduce by construction to fitted parameters, self-definitions, or self-citation chains. All performance claims rest on experimental comparisons against external benchmarks rather than on any load-bearing mathematical step that imports its own inputs, and the methodology does not invoke uniqueness theorems or ansatzes from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the framework implicitly assumes that region-aligned patch embeddings and the listed masking strategies capture meaningful anatomical structure in fMRI.

pith-pipeline@v0.9.0 · 5622 in / 1081 out tokens · 62524 ms · 2026-05-11T00:43:14.360895+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

87 extracted references · 20 canonical work pages · 3 internal anchors

  1. [1]

    Kay, and David W

    Seiji Ogawa, Tso-Ming Lee, Alan R. Kay, and David W. Tank. Brain magnetic resonance imaging with contrast dependent on blood oxygenation.Proceedings of the National Academy of Sciences, 87(24):9868–9872, 1990

  2. [2]

    Functional mapping of the human visual cortex by magnetic resonance imaging.Science, 254(5032):716–719, 1991

    Jack W Belliveau, David N Kennedy, Robert C McKinstry, Bradley R Buchbinder, Robert M Weisskoff, Mark S Cohen, JM Vevea, Thomas J Brady, and Bruce R Rosen. Functional mapping of the human visual cortex by magnetic resonance imaging.Science, 254(5032):716–719, 1991

  3. [3]

    Functional connectivity in the motor cortex of resting human brain using echo-planar mri.Magnetic resonance in medicine, 34(4):537–541, 1995

    Bharat Biswal, F Zerrin Yetkin, Victor M Haughton, and James S Hyde. Functional connectivity in the motor cortex of resting human brain using echo-planar mri.Magnetic resonance in medicine, 34(4):537–541, 1995

  4. [4]

    Consistent resting-state networks across healthy subjects.Proceedings of the national academy of sciences, 103(37):13848–13853, 2006

    Jessica S Damoiseaux, Serge ARB Rombouts, Frederik Barkhof, Philip Scheltens, Cornelis J Stam, Stephen M Smith, and Christian F Beckmann. Consistent resting-state networks across healthy subjects.Proceedings of the national academy of sciences, 103(37):13848–13853, 2006

  5. [5]

    Resting state fmri: a personal history.Neuroimage, 62(2):938–944, 2012

    Bharat B Biswal. Resting state fmri: a personal history.Neuroimage, 62(2):938–944, 2012

  6. [6]

    The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism.Molecular psychiatry, 19(6):659–667, 2014

    Adriana Di Martino, Chao-Gan Yan, Qingyang Li, Erin Denio, Francisco X Castellanos, Kaat Alaerts, Jeffrey S Anderson, Michal Assaf, Susan Y Bookheimer, Mirella Dapretto, et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism.Molecular psychiatry, 19(6):659–667, 2014

  7. [7]

    The adhd-200 consortium: a model to advance the translational potential of neuroimaging in clinical neuroscience.Frontiers in systems neuroscience, 6:62, 2012

    ADHD-200 consortium. The adhd-200 consortium: a model to advance the translational potential of neuroimaging in clinical neuroscience.Frontiers in systems neuroscience, 6:62, 2012

  8. [8]

    Common neural patterns of substance use disorder: a seed-based resting-state functional connectivity meta-analysis.Translational Psychiatry, 15(1):190, 2025

    Xiaonan Zhang, Haoyu Zhang, Yingbo Shao, Yang Li, Feifei Zhang, and Hui Zhang. Common neural patterns of substance use disorder: a seed-based resting-state functional connectivity meta-analysis.Translational Psychiatry, 15(1):190, 2025. 18

  9. [9]

    Kevin Hilbert, Joscha Böhnlein, Charlotte Meinke, Alice V Chavanne, Till Langhammer, Lara Stumpe, Nils Winter, Ramona Leenings, Dirk Adolph, V olker Arolt, et al. Lack of evidence for predictive utility from resting state fmri data for individual exposure-based cognitive behavioral therapy outcomes: A machine learning study in two large multi-site samples...

  10. [10]

    The history and future of resting-state functional magnetic resonance imaging.Nature, 641(8065):1121–1131, 2025

    Bharat B Biswal and Lucina Q Uddin. The history and future of resting-state functional magnetic resonance imaging.Nature, 641(8065):1121–1131, 2025

  11. [11]

    Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls.Neuroimage, 145:137–165, 2017

    Mohammad R Arbabshirani, Sergey Plis, Jing Sui, and Vince D Calhoun. Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls.Neuroimage, 145:137–165, 2017

  12. [12]

    Cross-validation failure: Small sample sizes lead to large error bars.Neuroim- age, 180:68–77, 2018

    Gaël Varoquaux. Cross-validation failure: Small sample sizes lead to large error bars.Neuroim- age, 180:68–77, 2018

  13. [13]

    Resting state fmri functional connectivity-based classification using a convolutional neural network architecture.Frontiers in neuroinformatics, 11:61, 2017

    Regina J Meszlényi, Krisztian Buza, and Zoltán Vidnyánszky. Resting state fmri functional connectivity-based classification using a convolutional neural network architecture.Frontiers in neuroinformatics, 11:61, 2017

  14. [14]

    3d-cnn based discrimination of schizophrenia using resting-state fmri.Artificial intelligence in medicine, 98:10–17, 2019

    Muhammad Naveed Iqbal Qureshi, Jooyoung Oh, and Boreom Lee. 3d-cnn based discrimination of schizophrenia using resting-state fmri.Artificial intelligence in medicine, 98:10–17, 2019

  15. [15]

    The use of fmri regional analysis to automatically detect adhd through a 3d cnn-based approach.Journal of Imaging Informatics in Medicine, 38 (1):203–216, 2025

    Perihan Gül¸ sah Gülhan and Güzin Özmen. The use of fmri regional analysis to automatically detect adhd through a 3d cnn-based approach.Journal of Imaging Informatics in Medicine, 38 (1):203–216, 2025

  16. [16]

    Identifying autism from resting-state fmri using long short-term memory networks

    Nicha C Dvornek, Pamela Ventola, Kevin A Pelphrey, and James S Duncan. Identifying autism from resting-state fmri using long short-term memory networks. Ininternational workshop on machine learning in medical imaging, pages 362–370. Springer, 2017

  17. [17]

    Characterization of early stage parkinson’s disease from resting-state fmri data using a long short-term memory network.Frontiers in Neuroimaging, 1:952084, 2022

    Xueqi Guo, Sule Tinaz, and Nicha C Dvornek. Characterization of early stage parkinson’s disease from resting-state fmri data using a long short-term memory network.Frontiers in Neuroimaging, 1:952084, 2022

  18. [18]

    A novel graph neural network framework for resting-state functional mri spatiotemporal dynamics analysis.Physica A: Statistical Mechanics and its Applications, 669: 130582, 2025

    Tao Wang, Zenghui Ding, Zheng Chang, Xianjun Yang, Yanyan Chen, Meng Li, Shu Xu, and Yu Wang. A novel graph neural network framework for resting-state functional mri spatiotemporal dynamics analysis.Physica A: Statistical Mechanics and its Applications, 669: 130582, 2025

  19. [19]

    Classification of brain disorders in rs-fmri via local-to-global graph neural networks.IEEE transactions on medical imaging, 42(2):444–455, 2022

    Hao Zhang, Ran Song, Liping Wang, Lin Zhang, Dawei Wang, Cong Wang, and Wei Zhang. Classification of brain disorders in rs-fmri via local-to-global graph neural networks.IEEE transactions on medical imaging, 42(2):444–455, 2022

  20. [20]

    Representation learning of resting state fmri with variational autoencoder.NeuroImage, 241: 118423, 2021

    Jung-Hoon Kim, Yizhen Zhang, Kuan Han, Zheyu Wen, Minkyu Choi, and Zhongming Liu. Representation learning of resting state fmri with variational autoencoder.NeuroImage, 241: 118423, 2021

  21. [21]

    Classification of mdd using a transformer classifier with large-scale multisite resting-state fmri data.Human brain mapping, 45(1):e26542, 2024

    Peishan Dai, Ying Zhou, Yun Shi, Da Lu, Zailiang Chen, Beiji Zou, Kun Liu, Shenghui Liao, and REST meta MDD Consortium. Classification of mdd using a transformer classifier with large-scale multisite resting-state fmri data.Human brain mapping, 45(1):e26542, 2024

  22. [22]

    Predicting task-related brain activity from resting-state brain dynamics with fmri transformer

    Junbeom Kwon, Jungwoo Seo, Heehwan Wang, Taesup Moon, Shinjae Yoo, and Jiook Cha. Predicting task-related brain activity from resting-state brain dynamics with fmri transformer. Imaging Neuroscience, 3:imag_a_00440, 2025

  23. [23]

    Current challenges in translational and clinical fmri and future directions

    Karsten Specht. Current challenges in translational and clinical fmri and future directions. Frontiers in psychiatry, 10:924, 2020

  24. [24]

    On the generalizability of resting-state fmri machine learning classifiers.Frontiers in human neuroscience, 8:502, 2014

    Wolfgang Huf, Klaudius Kalcher, Roland N Boubela, Georg Rath, Andreas Vecsei, Peter Filzmoser, and Ewald Moser. On the generalizability of resting-state fmri machine learning classifiers.Frontiers in human neuroscience, 8:502, 2014. 19

  25. [25]

    Reproducible brain-wide association studies require thousands of individuals

    Scott Marek, Brenden Tervo-Clemmens, Finnegan J Calabro, David F Montez, Benjamin P Kay, Alexander S Hatoum, Meghan Rose Donohue, William Foran, Ryland L Miller, Timothy J Hendrickson, et al. Reproducible brain-wide association studies require thousands of individuals. Nature, 603(7902):654–660, 2022

  26. [26]

    SwiFT: Swin 4d fMRI transformer

    Peter Yongho Kim, Junbeom Kwon, Sunghwan Joo, Sangyoon Bae, Donggyu Lee, Yoonho Jung, Shinjae Yoo, Jiook Cha, and Taesup Moon. SwiFT: Swin 4d fMRI transformer. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https: //openreview.net/forum?id=dKeWh6EzBB

  27. [27]

    Brain-jepa: Brain dynamics foundation model with gradient positioning and spatiotemporal masking.arXiv preprint arXiv:2409.19407, 2024

    Zijian Dong, Ruilin Li, Yilei Wu, Thuan Tinh Nguyen, Joanna Su Xian Chong, Fang Ji, Nathanael Ren Jie Tong, Christopher Li Hsian Chen, and Juan Helen Zhou. Brain-jepa: Brain dynamics foundation model with gradient positioning and spatiotemporal masking.arXiv preprint arXiv:2409.19407, 2024. URLhttps://arxiv.org/abs/2409.19407

  28. [28]

    Unsupervised contrastive graph learning for resting-state functional mri analysis and brain disorder detection.Human brain mapping, 44(17):5672–5692, 2023

    Xiaochuan Wang, Ying Chu, Qianqian Wang, Liang Cao, Lishan Qiao, Limei Zhang, and Mingxia Liu. Unsupervised contrastive graph learning for resting-state functional mri analysis and brain disorder detection.Human brain mapping, 44(17):5672–5692, 2023

  29. [29]

    Self-supervised graph contrastive learning with diffusion augmentation for functional mri analysis and brain disorder detection.Medical image analysis, 101:103403, 2025

    Xiaochuan Wang, Yuqi Fang, Qianqian Wang, Pew-Thian Yap, Hongtu Zhu, and Mingxia Liu. Self-supervised graph contrastive learning with diffusion augmentation for functional mri analysis and brain disorder detection.Medical image analysis, 101:103403, 2025

  30. [30]

    3d masked autoencoder with spatiotemporal transformer for modeling of 4d fmri data.Medical Image Analysis, page 103861, 2025

    Jie Gao, Bao Ge, Ning Qiang, and Shijie Zhao. 3d masked autoencoder with spatiotemporal transformer for modeling of 4d fmri data.Medical Image Analysis, page 103861, 2025

  31. [31]

    Deep feature extraction for resting-state functional mri by self-supervised learning and application to schizophrenia diagnosis.Frontiers in neuroscience, 15:696853, 2021

    Yuki Hashimoto, Yousuke Ogata, Manabu Honda, and Yuichi Yamashita. Deep feature extraction for resting-state functional mri by self-supervised learning and application to schizophrenia diagnosis.Frontiers in neuroscience, 15:696853, 2021

  32. [32]

    Computing personalized brain functional networks from fmri using self-supervised deep learning.Medical Image Analysis, 85:102756, 2023

    Hongming Li, Dhivya Srinivasan, Chuanjun Zhuo, Zaixu Cui, Raquel E Gur, Ruben C Gur, Desmond J Oathes, Christos Davatzikos, Theodore D Satterthwaite, and Yong Fan. Computing personalized brain functional networks from fmri using self-supervised deep learning.Medical Image Analysis, 85:102756, 2023

  33. [33]

    Whole milc: generalizing learned dynamics across tasks, datasets, and populations

    Usman Mahmood, Md Mahfuzur Rahman, Alex Fedorov, Noah Lewis, Zening Fu, Vince D Calhoun, and Sergey M Plis. Whole milc: generalizing learned dynamics across tasks, datasets, and populations. InInternational Conference on Medical Image Computing and Computer- Assisted Intervention, pages 407–417. Springer, 2020

  34. [34]

    Detecting cognitive fatigue in subjects with traumatic brain injury from fmri scans using self-supervised learning

    Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Glenn Wylie, and Fillia Make- don. Detecting cognitive fatigue in subjects with traumatic brain injury from fmri scans using self-supervised learning. InProceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, pages 83–90, 2023

  35. [35]

    Graph self-supervised learning with application to brain networks analysis.IEEE Journal of Biomedical and Health Informatics, 27(8):4154–4165, 2023

    Guangqi Wen, Peng Cao, Lingwen Liu, Jinzhu Yang, Xizhe Zhang, Fei Wang, and Osmar R Zaiane. Graph self-supervised learning with application to brain networks analysis.IEEE Journal of Biomedical and Health Informatics, 27(8):4154–4165, 2023

  36. [36]

    Graph convolutional network with self-supervised learning for brain disease classification.IEEE/ACM Transactions on Computational Biology and Bioinformatics, 21(6):1830–1841, 2024

    Guangyu Wang, Ying Chu, Qianqian Wang, Limei Zhang, Lishan Qiao, and Mingxia Liu. Graph convolutional network with self-supervised learning for brain disease classification.IEEE/ACM Transactions on Computational Biology and Bioinformatics, 21(6):1830–1841, 2024

  37. [37]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021

  38. [38]

    Self-supervised transformer- based foundation model for functional magnetic resonance imaging

    Matteo Ferrante, Stefano Iervese, Laura Astolfi, and Nicola Toschi. Self-supervised transformer- based foundation model for functional magnetic resonance imaging. In2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 1–6. IEEE, 2025. 20

  39. [39]

    Causal fmri-mamba: Causal state space model for neural decoding and brain task states recognition

    Weihao Deng, Fei Han, Qinghua Ling, Qing Liu, and Henry Han. Causal fmri-mamba: Causal state space model for neural decoding and brain task states recognition. InICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025

  40. [40]

    State-space model for brain network analysis on rs-fmri

    Brain Network Mamba and A Bi-Directional. State-space model for brain network analysis on rs-fmri. InMachine Learning in Medical Imaging: 16th International Workshop, MLMI 2025, Held in Conjunction with MICCAI 2025, Daejeon, South Korea, September 23, 2025, Proceedings, page 224. Springer Nature, 2026

  41. [41]

    Towards a general-purpose foundation model for functional MRI analysis

    Cheng Wang, Yu Jiang, Zhihao Peng, Chenxin Li, Chang-bae Bang, Lin Zhao, Wanyi Fu, Jinglei Lv, Jorge Sepulcre, Carl Yang, Lifang He, Tianming Liu, Xue-Jun Kong, Quanzheng Li, Daniel S. Barron, Anqi Qiu, Randy Hirschtick, Byung-Hoon Kim, Hongbin Han, Xiang Li, and Yixuan Yuan. Towards a general-purpose foundation model for functional mri analysis.Nature Bi...

  42. [42]

    Jamba: A Hybrid Transformer-Mamba Language Model

    Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, and Yoav Shoham. Jamba: A hybrid transformer-mamba langua...

  43. [43]

    Transmamba: Flexibly switching between transformer and mamba

    Yixing Li, Ruobing Xie, Zhen Yang, Xingwu Sun, Shuaipeng Li, Weidong Han, Zhanhui Kang, Yu Cheng, Chengzhong Xu, Di Wang, and Jie Jiang. Transmamba: A sequence-level hybrid transformer-mamba language model.arXiv preprint arXiv:2503.24067, 2026. URL https://arxiv.org/abs/2503.24067

  44. [44]

    Can mamba learn how to learn? a comparative study on in-context learning tasks,

    Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, and Dimitris Papailiopoulos. Can mamba learn how to learn? a comparative study on in-context learning tasks.arXiv preprint arXiv:2402.04248, 2024. URL https: //arxiv.org/abs/2402.04248

  45. [45]

    Waleffe, W

    Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, and Bryan Catanzaro. An empirical study of mamba-based language models.arXiv preprint arXiv:2406.07887, 2024. URL https://arxiv.o...

  46. [46]

    Research on autism diagnosis method based on transformer and mamba

    Le Zhao and Yanli Zhang. Research on autism diagnosis method based on transformer and mamba. In2025 6th International Conference on Machine Learning and Computer Application (ICMLCA), pages 1190–1193. IEEE, 2025

  47. [47]

    Brainmt: A hybrid mamba- transformer architecture for modeling long-range dependencies in functional mri data

    Arunkumar Kannan, Martin A Lindquist, and Brian Caffo. Brainmt: A hybrid mamba- transformer architecture for modeling long-range dependencies in functional mri data. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 150–160. Springer, 2025

  48. [48] Yifan Yang, Yutong Mao, Xufu Liu, and Xiao Liu. BrainMAE: A region-aware self-supervised learning framework for brain signals. arXiv preprint arXiv:2406.17086, 2024

  49. [49] Ruthwik Reddy Doodipala, Pankaj Pandey, Carolina Torres Rojas, Manob Jyoti Saikia, and Ranganatha Sitaram. Region-aware reconstruction strategy for pre-training fMRI foundation model. arXiv preprint arXiv:2511.00443, 2025

  50. [50] Lyle J Palmer. UK Biobank: bank on it. The Lancet, 369(9578):1980–1982, 2007. ISSN 0140-6736. doi: 10.1016/S0140-6736(07)60924-6. URL https://www.sciencedirect.com/science/article/pii/S0140673607609246

  51. [51] B.J. Casey, Tariq Cannonier, May I. Conley, Alexandra O. Cohen, Deanna M. Barch, Mary M. Heitzeg, Mary E. Soules, Theresa Teslovich, Danielle V. Dellarco, Hugh Garavan, Catherine A. Orr, Tor D. Wager, Marie T. Banich, Nicole K. Speer, Matthew T. Sutherland, Michael C. Riedel, Anthony S. Dick, James M. Bjork, Kathleen M. Thomas, Bader Chaarani, Margie ...

  52. [52] David C. Van Essen, Stephen M. Smith, Deanna M. Barch, Timothy E.J. Behrens, Essa Yacoub, and Kamil Ugurbil. The WU-Minn Human Connectome Project: An overview. NeuroImage, 80:62–79, 2013. ISSN 1053-8119. doi: 10.1016/j.neuroimage.2013.05.041. URL https://www.sciencedirect.com/science/article/pii/S1053811913005351. Mapping the Connectome

  53. [53] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023. URL https://arxiv.org/abs/2312.00752

  55. [55] Shubhi Bansal, Sreeharish A, Madhava Prasath J, Manikandan M, Sreekanth Madisetty, Mohammad Zia Ur Rehman, Chandravardhan Singh Raghaw, Gaurav Duggal, and Nagendra Kumar. A comprehensive survey of mamba architectures for medical image analysis: Classification, segmentation, restoration and beyond. arXiv preprint arXiv:2410.02362, 2025. URL https://arxiv.org/abs/2410.02362

  56. [56] Pierre Bellec. COBRE preprocessed with NIAK 0.12.4. January 2015. doi: 10.6084/m9.figshare.1160600.v15. URL https://figshare.com/articles/dataset/COBRE_preprocessed_with_NIAK_0_12_4/1160600

  57. [57] The ADHD-200 Consortium. A model to advance the translational potential of neuroimaging in clinical neuroscience. http://fcon_1000.projects.nitrc.org/indi/adhd200/, 2011. Accessed: 2025-08-18

  58. [58] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. arXiv preprint arXiv:1703.01365, 2017. URL https://arxiv.org/abs/1703.01365

  60. [60] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  61. [61] Daisuke Sasabayashi, Toshiaki Takahashi, Yuki Takayanagi, Kiyotaka Nemoto, Masafumi Ueno, Akira Furuichi, Yasuhiro Higuchi, Yuki Mizukami, Hiroki Kobayashi, Yuki Yuasa, Kiyoto Noguchi, and Michio Suzuki. Resting state hyperconnectivity of the default mode network in schizophrenia and clinical high-risk state for psychosis. Cerebral Cortex, 33(13):8456–8464, 2023. doi: 10.1093/cercor/bhad131

  63. [63] Hamed Karbasforoushan and Nathan D. Woodward. Resting-state networks in schizophrenia. Current Topics in Medicinal Chemistry, 12(21):2404–2414, 2012

  64. [64] Yarui Wei, Kangkang Xue, Meng Yang, Huan Wang, Jingli Chen, Shaoqiang Han, Xiaoxiao Wang, Hong Li, Yong Zhang, Xueqin Song, et al. Aberrant cerebello-thalamo-cortical functional and effective connectivity in first-episode schizophrenia with auditory verbal hallucinations. Schizophrenia Bulletin, 48(6):1336–1343, 2022

  65. [65] Bernis Sutcubasi, Baris Metin, Mustafa Kerem Kurban, Zeynep Elcin Metin, Birsu Beser, and Edmund Sonuga-Barke. Resting-state network dysconnectivity in ADHD: A system-neuroscience-based meta-analysis. World Journal of Biological Psychiatry, 21(9):662–672, 2020. doi: 10.1080/15622975.2020.1775889

  66. [66] Damien A. Fair, Jonathan Posner, Bonnie J. Nagel, Deepti Bathula, Taciana G. Costa Dias, Kathryn L. Mills, Michael S. Blythe, Aishat Giwa, Colleen F. Schmitt, and Joel T. Nigg. Atypical default network connectivity in youth with attention-deficit/hyperactivity disorder. Biological Psychiatry, 68(12):1084–1091, 2010. doi: 10.1016/j.biopsych.2010.07.003

  67. [67] Ting Chen et al. A simple framework for contrastive learning of visual representations. ICML, 2020

  68. [68] Kaiming He et al. Masked autoencoders are scalable vision learners. CVPR, 2022

  69. [69] Randall Balestriero and Yann LeCun. Learning by reconstruction produces uninformative features for perception. arXiv preprint arXiv:2402.11337, 2024

  70. [70] Jun Chen, Faizan Farooq Khan, Ming Hu, Ammar Sherif, Zongyuan Ge, Boyang Li, and Mohamed Elhoseiny. Local masked reconstruction for efficient self-supervised learning on high-resolution images. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 8046–8056. IEEE, 2025

  71. [71] Qi Zhang, Yifei Wang, and Yisen Wang. How mask matters: Towards theoretical understandings of masked autoencoders. Advances in Neural Information Processing Systems, 35:27127–27139, 2022

  72. [72] Vinod Menon. Large-scale brain networks and psychopathology: a unifying triple network model. Trends in Cognitive Sciences, 15(10):483–506, 2011

  73. [73] Martijn P Van Den Heuvel and Alex Fornito. Brain networks in schizophrenia. Neuropsychology Review, 24(1):32–48, 2014

  74. [74] Albert Gu et al. Efficiently modeling long sequences with structured state spaces. ICLR, 2022

  75. [75] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In First Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=tEYskw1VY2

  76. [76] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017

  77. [77] Huan Jing, Chunguo Zhang, Haohao Yan, Xiaoling Li, Jiaquan Liang, Wenting Liang, Yangpan Ou, Weibin Wu, Huagui Guo, Wen Deng, et al. Deviant spontaneous neural activity as a potential early-response predictor for therapeutic interventions in patients with schizophrenia. Frontiers in Neuroscience, 17:1243168, 2023

  78. [78] Vyara Zaykova, Sevdalina Kandilarova, Rositsa Paunova, Ferihan Popova, and Drozdstoy Stoyanov. Lateralized brain connectivity in auditory verbal hallucinations: fMRI insights into the superior and middle temporal gyri. Frontiers in Human Neuroscience, 19:1650178, 2025

  79. [79] Arveen Kaur, Deepak M Basavanagowda, Bindu Rathod, Nupur Mishra, Sehrish Fuad, Sadia Nosher, Zaid A Alrashid, Devyani Mohan, and Stacey E Heindl. Structural and functional alterations of the temporal lobe in schizophrenia: a literature review. Cureus, 12(10), 2020

  80. [80] Tiantian Liu, Jian Zhang, Xiaonan Dong, Zhucheng Li, Xiaorui Shi, Yizhou Tong, Ruobing Yang, Jinglong Wu, Changming Wang, and Tianyi Yan. Occipital alpha connectivity during resting-state electroencephalography in patients with ultra-high risk for psychosis and schizophrenia. Frontiers in Psychiatry, 10:553, 2019
