BrainFIBRE: A Foundation Model via Information Decomposition for Brain Microstructure
Pith reviewed 2026-07-02 14:58 UTC · model grok-4.3
The pith
BrainFIBRE uses self-supervised partial information decomposition on three NODDI maps to build a foundation model that produces interpretable microstructure features and outperforms prior methods on age, sex, and brain-health predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BrainFIBRE establishes a versatile foundation for neuroimaging analysis at the microstructural level by pretraining a Mixture-of-Experts architecture on NODDI maps with SPID and CCC. CCC perturbs modality alignment so that the model can disentangle unique, synergistic, and redundant information in a fully self-supervised manner. The paper reports state-of-the-art performance and interpretable representations on multiple prediction tasks across cohorts.
What carries the argument
Self-supervised Partial Information Decomposition (SPID), extended with Counterfactual Candidate Construction (CCC): CCC drops or swaps modalities to create the contrastive signal that lets a Mixture-of-Experts model separate unique, synergistic, and redundant information across the three NODDI maps.
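To make the dropping and swapping idea concrete, here is a minimal sketch of how perturbed candidates might be built from a batch of aligned maps. It is an illustration under assumptions, not the paper's procedure: the function names (`drop_modality`, `swap_modality`, `build_candidates`), the use of `None` for a dropped map, and the cyclic-shift swap are all hypothetical choices, and strings stand in for 3D volumes.

```python
import random

MODALITIES = ("NDI", "ODI", "FWF")

def drop_modality(subject, name):
    """Return a copy of one subject's maps with `name` marked as dropped (None)."""
    return {k: (None if k == name else v) for k, v in subject.items()}

def swap_modality(batch, name, rng):
    """Return a copy of the batch where map `name` comes from a different subject."""
    n = len(batch)
    shift = rng.randrange(1, n)  # 1..n-1, so no subject keeps its own map
    return [
        {k: (batch[(i + shift) % n][k] if k == name else v) for k, v in subj.items()}
        for i, subj in enumerate(batch)
    ]

def build_candidates(batch, rng):
    """Build aligned, dropped, and swapped versions of a batch (hypothetical layout)."""
    name_drop = rng.choice(MODALITIES)
    name_swap = rng.choice(MODALITIES)
    return {
        "aligned": [dict(s) for s in batch],
        "dropped": (name_drop, [drop_modality(s, name_drop) for s in batch]),
        "swapped": (name_swap, swap_modality(batch, name_swap, rng)),
    }

# Toy batch of 4 subjects; each string is a placeholder for a 3D map.
batch = [{m: f"{m}_subject{i}" for m in MODALITIES} for i in range(4)]
candidates = build_candidates(batch, random.Random(0))
```

A contrastive objective could then treat the aligned version as the positive and the dropped or swapped versions as counterfactual candidates; how the paper actually scores them is not stated in the material shown here.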
If this is right
- The representations support accurate prediction of age, sex, cerebrovascular and neurodegenerative markers, and cognition on both Caucasian and Asian cohorts.
- The model produces neurobiologically interpretable features that expose task-specific and cohort-specific patterns of information interaction among the maps.
- A single pretrained model serves as a reusable base for multiple downstream neuroimaging tasks at the microstructural level.
- The approach operates entirely without labels during pretraining yet transfers to supervised clinical prediction.
Where Pith is reading between the lines
- Similar decomposition techniques could be tested on other sets of aligned imaging modalities such as structural and functional MRI.
- The interpretable interaction patterns might be examined for links to specific disease progression trajectories in longitudinal data.
- If the disentanglement generalizes, the same pretraining recipe could be applied to derive foundation models from additional biophysical parameter maps beyond NODDI.
Load-bearing premise
Standard representation learning cannot disentangle the unique information in each NODDI map from shared and synergistic interactions, and the proposed SPID plus CCC method can achieve this separation without any downstream labels.
What would settle it
A head-to-head comparison on held-out cohorts where BrainFIBRE shows no accuracy gain over standard multimodal encoders on the listed prediction tasks or where the learned representations fail to display the claimed task- and cohort-specific interaction patterns upon inspection.
Original abstract
Diffusion MRI probes brain microstructure with particular sensitivity to early cerebrovascular and neurodegenerative changes. Neurite Orientation Dispersion and Density Imaging (NODDI) decomposes the diffusion signal into three biophysically interpretable maps: neurite density index (NDI), orientation dispersion index (ODI), and free water fraction (FWF), capturing neurite packing, fiber coherence, and extracellular fluid. These 3D maps offer a rich substrate for transferable microstructural representations, yet integrating them is challenging: standard representation learning struggles to disentangle the unique information in each map from their shared and synergistic interactions. We present BrainFIBRE, the first foundation model for brain microstructure, pretrained on NODDI-derived maps from 55,592 UK Biobank participants. We propose Self-supervised Partial Information Decomposition (SPID), which extends PID-guided multimodal learning to the self-supervised regime for the first time. A novel Counterfactual Candidate Construction (CCC) paradigm perturbs inter-modality alignment through modality dropping and swapping, providing the contrastive signal for a Mixture-of-Experts architecture to disentangle unique, synergistic, and redundant information without any downstream label. On both Caucasian and Asian cohorts, BrainFIBRE achieves state-of-the-art performance across diverse tasks predicting age, sex, cerebrovascular and neurodegenerative markers, and cognition, while yielding neurobiologically interpretable representations that reveal task- and cohort-specific interaction patterns. BrainFIBRE establishes a versatile foundation for neuroimaging analysis at the microstructural level.
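For orientation, the standard two-source partial information decomposition (in the sense of Williams and Beer) can be written as follows. The symbols are notation chosen here for illustration and are not taken from the paper: X_1 and X_2 are two sources (for example, two of the NODDI maps), Y is a target, and R, U_1, U_2, S are the redundant, unique, and synergistic terms. With three maps the decomposition has more terms than shown.

```latex
\begin{aligned}
I(X_1, X_2; Y) &= R + U_1 + U_2 + S,\\
I(X_1; Y) &= R + U_1,\\
I(X_2; Y) &= R + U_2,\\
\Rightarrow\quad I(X_1, X_2; Y) - I(X_1; Y) - I(X_2; Y) &= S - R.
\end{aligned}
```

The last line shows why mutual information alone cannot separate synergy from redundancy: only their difference is identified, so an additional assumption or training signal is needed to pin down each term.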
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BrainFIBRE as the first foundation model for brain microstructure, pretrained self-supervised on NODDI-derived NDI, ODI, and FWF maps from 55,592 UK Biobank participants. It proposes Self-supervised Partial Information Decomposition (SPID) together with a Counterfactual Candidate Construction (CCC) paradigm that uses modality dropping and swapping inside a Mixture-of-Experts architecture to disentangle unique, synergistic, and redundant information without downstream labels. The model is reported to achieve state-of-the-art performance on age, sex, cerebrovascular/neurodegenerative marker, and cognition prediction tasks on both Caucasian and Asian cohorts while producing neurobiologically interpretable interaction patterns.
Significance. If the reported performance and interpretability results are supported by rigorous validation, the work would constitute a meaningful advance by establishing the first large-scale foundation model operating directly on microstructural NODDI maps and by extending partial information decomposition into a fully self-supervised multimodal regime. The scale of the pretraining cohort and the cross-cohort evaluation are positive features.
major comments (2)
- [Methods (SPID and CCC description)] The central claim that SPID + CCC achieves label-free disentanglement of unique/synergistic/redundant information rests on the construction of the contrastive signal via modality dropping/swapping; without an explicit statement of the loss terms and a quantitative verification that the learned representations actually recover the PID decomposition (e.g., via controlled synthetic experiments or information-theoretic diagnostics), it is impossible to confirm that the performance gains are attributable to the claimed disentanglement rather than to standard contrastive pretraining.
- [Experiments and Results] The SOTA claims on downstream tasks are load-bearing for the paper's contribution, yet the abstract provides no information on the number or nature of baselines, statistical testing, or ablation controls that isolate the contribution of SPID/CCC versus a standard MoE or simple concatenation; this absence prevents assessment of whether the reported gains are robust.
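The kind of controlled synthetic check mentioned in the first comment can be sketched on logic gates whose information structure is known in advance. The helper names (`mutual_information`, `profile`) are hypothetical, and the code only computes the three mutual informations that constrain the PID terms; it does not estimate the PID atoms themselves, which would require choosing a redundancy measure.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """I(A;B) in bits, treating the listed (a, b) samples as equally likely."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in p_ab.items()
    )

def profile(gate):
    """Return (I(X1;Y), I(X2;Y), I(X1,X2;Y)) for uniform binary inputs."""
    rows = [(x1, x2, gate(x1, x2)) for x1, x2 in product((0, 1), repeat=2)]
    i1 = mutual_information([(x1, y) for x1, _, y in rows])
    i2 = mutual_information([(x2, y) for _, x2, y in rows])
    i12 = mutual_information([((x1, x2), y) for x1, x2, y in rows])
    return i1, i2, i12

xor_profile = profile(lambda a, b: a ^ b)   # target depends on both inputs jointly
unique_profile = profile(lambda a, b: a)    # target copies X1 and ignores X2
```

For XOR, neither input alone carries information about the target while the pair carries one bit, so under a nonnegative decomposition all of it must be synergy. For the copy gate, the full bit is unique to X1. A learned representation that claims to separate these terms could be checked against such cases.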
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment below and will revise the manuscript accordingly to improve clarity on the methodological details and experimental validation.
- Referee: [Methods (SPID and CCC description)] The central claim that SPID + CCC achieves label-free disentanglement of unique/synergistic/redundant information rests on the construction of the contrastive signal via modality dropping/swapping; without an explicit statement of the loss terms and a quantitative verification that the learned representations actually recover the PID decomposition (e.g., via controlled synthetic experiments or information-theoretic diagnostics), it is impossible to confirm that the performance gains are attributable to the claimed disentanglement rather than to standard contrastive pretraining.
  Authors: We agree that an explicit statement of the loss terms would strengthen the presentation. The Methods section outlines the SPID objective as a multi-term contrastive loss based on the CCC perturbations, but we will add a new subsection with the full mathematical formulation of the loss functions for the unique, redundant, and synergistic components. For quantitative verification, the supplementary materials include synthetic experiments demonstrating recovery of PID terms; we will add a reference to these in the main Methods section and include a brief summary of the diagnostics.
  Revision: yes
- Referee: [Experiments and Results] The SOTA claims on downstream tasks are load-bearing for the paper's contribution, yet the abstract provides no information on the number or nature of baselines, statistical testing, or ablation controls that isolate the contribution of SPID/CCC versus a standard MoE or simple concatenation; this absence prevents assessment of whether the reported gains are robust.
  Authors: We acknowledge that the abstract does not detail the experimental controls. The full manuscript's Experiments section describes the baselines (including standard MoE, concatenation, and other self-supervised methods), reports statistical significance testing, and includes ablation studies. To address this, we will revise the abstract to briefly mention the evaluation setup, number of baselines, and presence of ablations and statistical tests.
  Revision: yes
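One way the statistical testing discussed in this exchange could look, for a full model versus an ablated variant evaluated on the same held-out subjects, is a paired sign-flip permutation test. This is a generic sketch, not the test used in the manuscript (which is not specified here); the function name `paired_sign_flip_test` is hypothetical and the error values are synthetic placeholders with no connection to the paper's results.

```python
import random
from statistics import mean

def paired_sign_flip_test(errors_a, errors_b, n_perm=10_000, seed=0):
    """Two-sided paired permutation test on per-subject error differences."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null, the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Synthetic placeholder errors for 12 subjects (illustration only).
errors_ablated = [5.0 + 0.1 * i for i in range(12)]
errors_full = [e - 0.5 for e in errors_ablated]

p_value = paired_sign_flip_test(errors_full, errors_ablated)
p_same = paired_sign_flip_test(errors_full, errors_full)
```

Pairing by subject matters here because both models are scored on the same individuals; an unpaired test would discard that structure and lose power.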
Circularity Check
No significant circularity identified
The paper introduces SPID and CCC as novel self-supervised mechanisms for information decomposition on NODDI maps, with claims of disentanglement and downstream SOTA performance. No equations, training details, or self-citations are visible in the provided material that would reduce any prediction or result to a fitted quantity or prior author work by construction. The central premise is presented as an extension of PID to self-supervised multimodal learning without load-bearing reductions to inputs or renamings of known results. The derivation chain is therefore self-contained on its own terms.