BrainFIBRE: A Foundation Model via Information Decomposition for Brain Microstructure

Jianxiong Zhou; Ji Fang; Juan Helen Zhou; Kwun Kei Ng; Yi Lin; Zijian Dong

arxiv: 2607.00573 · v1 · pith:652J4G2Nnew · submitted 2026-07-01 · 💻 cs.CV

Pith reviewed 2026-07-02 14:58 UTC · model grok-4.3

classification 💻 cs.CV

keywords brain microstructure · NODDI · foundation model · partial information decomposition · self-supervised learning · diffusion MRI · neuroimaging · multimodal representation learning

The pith

BrainFIBRE uses self-supervised partial information decomposition on three NODDI maps to build a foundation model that produces interpretable microstructure features and outperforms prior methods on age, sex, and brain-health predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents BrainFIBRE as the first foundation model pretrained on NODDI-derived maps from over 55,000 UK Biobank participants. It claims that a new Self-supervised Partial Information Decomposition method combined with Counterfactual Candidate Construction can separate unique information in each of the three maps from their shared and synergistic parts without any labels. This separation is said to yield representations that are both more accurate for downstream prediction tasks and more neurobiologically interpretable than standard representation learning. The model is evaluated on separate Caucasian and Asian cohorts for tasks including age and sex prediction, cerebrovascular and neurodegenerative markers, and cognition. If the approach holds, it would supply a reusable microstructural foundation that reveals task- and cohort-specific interaction patterns.

Core claim

BrainFIBRE establishes a versatile foundation for neuroimaging analysis at the microstructural level by pretraining a Mixture-of-Experts architecture on NODDI maps with SPID and CCC, which perturbs modality alignment to disentangle unique, synergistic, and redundant information in a fully self-supervised manner, delivering state-of-the-art performance and interpretable representations on multiple prediction tasks across cohorts.

What carries the argument

Self-supervised Partial Information Decomposition (SPID) extended with Counterfactual Candidate Construction (CCC) that drops or swaps modalities to create contrastive signals for a Mixture-of-Experts model to separate unique, synergistic, and redundant information across the three NODDI maps.
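
A minimal sketch of what "dropping" and "swapping" a modality could look like on a batch of per-subject embeddings. The array layout, the function names `drop_modality` and `swap_modality`, and the choice of zeroing a dropped modality are illustrative assumptions; the summary above does not specify how the paper actually builds its counterfactual candidates.

```python
import numpy as np

def drop_modality(batch, m):
    """Return a copy of `batch` with modality `m` zeroed out (hypothetical 'drop')."""
    out = batch.copy()
    out[:, m] = 0.0
    return out

def swap_modality(batch, m, shift=1):
    """Return a copy where modality `m` is taken from a different subject.

    Rolling along the subject axis pairs each subject with a neighbour,
    which breaks the alignment between modality `m` and the other two.
    """
    out = batch.copy()
    out[:, m] = np.roll(batch[:, m], shift, axis=0)
    return out

rng = np.random.default_rng(0)
# Assumed layout: (subjects, modalities in the order NDI/ODI/FWF, feature dim).
batch = rng.normal(size=(4, 3, 8))
dropped = drop_modality(batch, m=2)   # FWF removed, NDI and ODI untouched
swapped = swap_modality(batch, m=0)   # NDI now belongs to another subject
```

In a contrastive setup, the original batch and such perturbed copies would serve as the candidates that the experts must tell apart; how they are scored is outside what this page shows.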

If this is right

The representations support accurate prediction of age, sex, cerebrovascular and neurodegenerative markers, and cognition on both Caucasian and Asian cohorts.
The model produces neurobiologically interpretable features that expose task-specific and cohort-specific patterns of information interaction among the maps.
A single pretrained model serves as a reusable base for multiple downstream neuroimaging tasks at the microstructural level.
The approach operates entirely without labels during pretraining yet transfers to supervised clinical prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar decomposition techniques could be tested on other sets of aligned imaging modalities such as structural and functional MRI.
The interpretable interaction patterns might be examined for links to specific disease progression trajectories in longitudinal data.
If the disentanglement generalizes, the same pretraining recipe could be applied to derive foundation models from additional biophysical parameter maps beyond NODDI.

Load-bearing premise

Standard representation learning cannot disentangle the unique information in each NODDI map from shared and synergistic interactions, and the proposed SPID plus CCC method can achieve this separation without any downstream labels.

What would settle it

A head-to-head comparison on held-out cohorts where BrainFIBRE shows no accuracy gain over standard multimodal encoders on the listed prediction tasks or where the learned representations fail to display the claimed task- and cohort-specific interaction patterns upon inspection.

Figures

Figures reproduced from arXiv: 2607.00573 by Jianxiong Zhou, Ji Fang, Juan Helen Zhou, Kwun Kei Ng, Yi Lin, Zijian Dong.

**Figure 2.** Overview of BrainFIBRE. (A) By the design of Self-supervised Partial Information Decomposition (SPID), BrainFIBRE extends classical PID to isolate unique, synergistic, and redundant information for self-supervised pretraining. Three NODDI maps are processed by unimodal encoders into embeddings h_m, which are routed to five interaction experts: three for uniqueness and two for synergy and redundancy. A Re…

**Figure 3.** Distribution of expert weights across the test set.

**Figure 4.** Ablation study on age (UKB) and WMH volume (Asian) prediction.

**Figure 5.** Spatial attention patterns captured by different interaction experts.

**Figure 6.** Empirical validation of SPID disentanglement via z-score normalized CCA. In this normalized space, the absolute magnitude of the z-score represents the significance of deviation from the average baseline alignment. Specifically, a large positive value indicates a prominent correlation, implying stronger feature alignment. Conversely, a large negative value denotes an exceptionally weak correlation, indica…

**Figure 7.** White matter tract attention rollout. BrainFIBRE learns differentiated modality–tract specialization, whereas ViT-LF produces more homogeneous responses.

**Figure 8.** Distribution of expert weights across the test set.

**Figure 9.** Spatial attention patterns captured by different interaction experts.

Original abstract

Diffusion MRI probes brain microstructure with particular sensitivity to early cerebrovascular and neurodegenerative changes. Neurite Orientation Dispersion and Density Imaging (NODDI) decomposes the diffusion signal into three biophysically interpretable maps: neurite density index (NDI), orientation dispersion index (ODI), and free water fraction (FWF), capturing neurite packing, fiber coherence, and extracellular fluid. These 3D maps offer a rich substrate for transferable microstructural representations, yet integrating them is challenging: standard representation learning struggles to disentangle the unique information in each map from their shared and synergistic interactions. We present BrainFIBRE, the first foundation model for brain microstructure, pretrained on NODDI-derived maps from 55,592 UK Biobank participants. We propose Self-supervised Partial Information Decomposition (SPID), which extends PID-guided multimodal learning to the self-supervised regime for the first time. A novel Counterfactual Candidate Construction (CCC) paradigm perturbs inter-modality alignment through modality dropping and swapping, providing the contrastive signal for a Mixture-of-Experts architecture to disentangle unique, synergistic, and redundant information without any downstream label. On both Caucasian and Asian cohorts, BrainFIBRE achieves state-of-the-art performance across diverse tasks predicting age, sex, cerebrovascular and neurodegenerative markers, and cognition, while yielding neurobiologically interpretable representations that reveal task- and cohort-specific interaction patterns. BrainFIBRE establishes a versatile foundation for neuroimaging analysis at the microstructural level.
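
As background for the decomposition named in the abstract, the classical two-source PID of Williams and Beer splits the information that two sources X_1 and X_2 carry about a target Y into a redundant part R, unique parts U_1 and U_2, and a synergistic part S. The block states only that textbook identity. How BrainFIBRE extends it to three NODDI maps without a labeled target is not shown in the abstract, so the symbols here are illustrative and not the paper's notation.

```latex
\begin{align}
I(X_1, X_2; Y) &= R + U_1 + U_2 + S, \\
I(X_1; Y)      &= R + U_1, \\
I(X_2; Y)      &= R + U_2 .
\end{align}
```

Subtracting the last two lines from the first gives I(X_1, X_2; Y) - I(X_1; Y) - I(X_2; Y) = S - R. Mutual information alone therefore fixes only the difference between synergy and redundancy, which is why any PID-based method needs an additional definition or training signal to separate the two.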

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

BrainFIBRE applies partial information decomposition in a self-supervised way to NODDI maps from a large UK Biobank sample and reports strong downstream results on age, sex, and clinical markers.

The core contribution is SPID, which adapts partial information decomposition to self-supervised multimodal learning on the three NODDI maps, paired with a modality-drop-and-swap scheme they call CCC inside a mixture-of-experts backbone. The model is pretrained on 55k subjects and then evaluated on both Caucasian and Asian cohorts for a range of prediction tasks. That scale and the cross-cohort testing are the parts that stand out.

The work does a clean job of framing the problem: standard contrastive or reconstruction objectives do not explicitly separate unique, redundant, and synergistic information across the NDI, ODI, and FWF maps. The authors supply a concrete mechanism to generate the necessary contrastive signal without labels, and the abstract indicates the resulting representations are both predictive and interpretable in neurobiological terms. If the full methods and ablations back this up, it is a useful incremental step for microstructural imaging.

The main soft spot is that the abstract alone does not show the actual decomposition validation or the ablation controls that would confirm the PID terms are doing the claimed work rather than the MoE architecture simply learning better features. The SOTA numbers are reported without visible baseline details or statistical tests in the summary material, so those claims need the full tables to be convincing. No circularity or obvious fitting artifact jumps out from the description.

This paper is aimed at neuroimaging groups that already work with NODDI or similar multi-map diffusion data and want a reusable encoder. It is worth a serious referee because the dataset size, the explicit information-decomposition framing, and the cross-population testing give it enough substance to merit detailed review even if revisions are needed on the validation side.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces BrainFIBRE as the first foundation model for brain microstructure, pretrained self-supervised on NODDI-derived NDI, ODI, and FWF maps from 55,592 UK Biobank participants. It proposes Self-supervised Partial Information Decomposition (SPID) together with a Counterfactual Candidate Construction (CCC) paradigm that uses modality dropping and swapping inside a Mixture-of-Experts architecture to disentangle unique, synergistic, and redundant information without downstream labels. The model is reported to achieve state-of-the-art performance on age, sex, cerebrovascular/neurodegenerative marker, and cognition prediction tasks on both Caucasian and Asian cohorts while producing neurobiologically interpretable interaction patterns.

Significance. If the reported performance and interpretability results are supported by rigorous validation, the work would constitute a meaningful advance by establishing the first large-scale foundation model operating directly on microstructural NODDI maps and by extending partial information decomposition into a fully self-supervised multimodal regime. The scale of the pretraining cohort and the cross-cohort evaluation are positive features.

major comments (2)

[Methods (SPID and CCC description)] The central claim that SPID + CCC achieves label-free disentanglement of unique/synergistic/redundant information rests on the construction of the contrastive signal via modality dropping/swapping; without an explicit statement of the loss terms and a quantitative verification that the learned representations actually recover the PID decomposition (e.g., via controlled synthetic experiments or information-theoretic diagnostics), it is impossible to confirm that the performance gains are attributable to the claimed disentanglement rather than to standard contrastive pretraining.
[Experiments and Results] The SOTA claims on downstream tasks are load-bearing for the paper's contribution, yet the abstract provides no information on the number or nature of baselines, statistical testing, or ablation controls that isolate the contribution of SPID/CCC versus a standard MoE or simple concatenation; this absence prevents assessment of whether the reported gains are robust.
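
The kind of controlled synthetic check the first comment asks for can be illustrated with the textbook XOR case, where the target is purely synergistic: neither input alone carries any information about the output, but both together determine it. This is a generic sanity check on discrete variables, not the paper's diagnostic; the helper `mutual_information` is a hypothetical name introduced for this sketch.

```python
import math
from itertools import product

def mutual_information(pairs):
    """I(A;B) in bits, treating `pairs` as equally likely (a, b) outcomes."""
    n = len(pairs)
    joint, pa, pb = {}, {}, {}
    for a, b in pairs:
        joint[(a, b)] = joint.get((a, b), 0.0) + 1.0 / n
        pa[a] = pa.get(a, 0.0) + 1.0 / n
        pb[b] = pb.get(b, 0.0) + 1.0 / n
    return sum(p * math.log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items())

# All four equally likely input pairs with y = x1 XOR x2.
samples = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

i_x1 = mutual_information([(x1, y) for x1, _, y in samples])            # one source alone
i_x2 = mutual_information([(x2, y) for _, x2, y in samples])            # the other source alone
i_joint = mutual_information([((x1, x2), y) for x1, x2, y in samples])  # both sources together
```

Here each single-source term is 0 bits and the joint term is 1 bit, so all information is synergistic. A decomposition method that assigns weight to a uniqueness or redundancy component on such data would be failing the check.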

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and will revise the manuscript accordingly to improve clarity on the methodological details and experimental validation.

Referee: [Methods (SPID and CCC description)] The central claim that SPID + CCC achieves label-free disentanglement of unique/synergistic/redundant information rests on the construction of the contrastive signal via modality dropping/swapping; without an explicit statement of the loss terms and a quantitative verification that the learned representations actually recover the PID decomposition (e.g., via controlled synthetic experiments or information-theoretic diagnostics), it is impossible to confirm that the performance gains are attributable to the claimed disentanglement rather than to standard contrastive pretraining.

Authors: We agree that an explicit statement of the loss terms would strengthen the presentation. The Methods section outlines the SPID objective as a multi-term contrastive loss based on the CCC perturbations, but we will add a new subsection with the full mathematical formulation of the loss functions for the unique, redundant, and synergistic components. For quantitative verification, the supplementary materials include synthetic experiments demonstrating recovery of PID terms; we will add a reference to these in the main Methods section and include a brief summary of the diagnostics.

revision: yes

Referee: [Experiments and Results] The SOTA claims on downstream tasks are load-bearing for the paper's contribution, yet the abstract provides no information on the number or nature of baselines, statistical testing, or ablation controls that isolate the contribution of SPID/CCC versus a standard MoE or simple concatenation; this absence prevents assessment of whether the reported gains are robust.

Authors: We acknowledge that the abstract does not detail the experimental controls. The full manuscript's Experiments section describes the baselines (including standard MoE, concatenation, and other self-supervised methods), reports statistical significance testing, and includes ablation studies. To address this, we will revise the abstract to briefly mention the evaluation setup, number of baselines, and presence of ablations and statistical tests.

revision: yes
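
One common way to back a "gain over baseline" claim is a paired test on per-subject errors from the same test set. The sketch below uses a sign-flip permutation test; it is offered as an example of the kind of statistical testing the referee asks about, not as the test the authors report. The function name and the error arrays are hypothetical.

```python
import numpy as np

def paired_sign_flip_test(err_a, err_b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-subject errors.

    Under the null hypothesis that both models are equally accurate, the
    sign of each paired difference is arbitrary, so random sign flips
    generate the null distribution of the mean difference.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(err_a, dtype=float) - np.asarray(err_b, dtype=float)
    observed = abs(diff.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = np.abs((signs * diff).mean(axis=1))
    # Add-one correction keeps the estimated p-value strictly positive.
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

# Hypothetical absolute errors for 20 subjects; the baseline is worse by 1 unit each.
model_err = np.linspace(2.0, 4.0, 20)
baseline_err = model_err + 1.0
p_value = paired_sign_flip_test(baseline_err, model_err)
p_same = paired_sign_flip_test(model_err, model_err)
```

Pairing matters because errors from two models on the same subject are strongly correlated; an unpaired test would discard that structure and lose power.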

Circularity Check

0 steps flagged

No significant circularity identified

The paper introduces SPID and CCC as novel self-supervised mechanisms for information decomposition on NODDI maps, with claims of disentanglement and downstream SOTA performance. No equations, training details, or self-citations are visible in the provided material that would reduce any prediction or result to a fitted quantity or prior author work by construction. The central premise is presented as an extension of PID to self-supervised multimodal learning without load-bearing reductions to inputs or renamings of known results. The derivation chain is therefore self-contained on its own terms.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated premise that NODDI maps form a rich transferable substrate and that the proposed contrastive signal from modality dropping/swapping isolates the desired information components.

pith-pipeline@v0.9.1-grok · 5809 in / 1181 out tokens · 22208 ms · 2026-07-02T14:58:41.314539+00:00 · methodology

[1] [1]

Neuroimage166, 400–424 (2018)

Alfaro-Almagro, F., Jenkinson, M., Bangerter, N.K., Andersson, J.L., Griffanti, L., Douaud, G., Sotiropoulos, S.N., Jbabdi, S., Hernandez-Fernandez, M., Vallee, E., et al.: Image processing and quality control for the first 10,000 brain imaging datasets from uk biobank. Neuroimage166, 400–424 (2018)

2018

[2] [2]

Entropy16(4), 2161–2183 (2014)

Bertschinger, N., Rauh, J., Olbrich, E., Jost, J., Ay, N.: Quantifying unique infor- mation. Entropy16(4), 2161–2183 (2014)

2014

[3] [3]

Neuroimage185, 335–348 (2019)

Bookheimer, S.Y., Salat, D.H., Terpstra, M., Ances, B.M., Barch, D.M., Buckner, R.L., Burgess, G.C., Curtiss, S.W., Diaz-Santos, M., Elam, J.S., et al.: The lifespan human connectome project in aging: an overview. Neuroimage185, 335–348 (2019)

2019

[4] [4]

In: The Twelfth International Conference on Learning Representations

Caro, J.O., de Oliveira Fonseca, A.H., Rizvi, S.A., Rosati, M., Averill, C., Cross, J.L., Mittal, P., Zappala, E., Dhodapkar, R.M., Abdallah, C., et al.: Brainlm: A foundation model for brain activity recordings. In: The Twelfth International Conference on Learning Representations

[5] [5]

Neurobiology of aging71, 161–170 (2018)

Chad, J.A., Pasternak, O., Salat, D.H., Chen, J.J.: Re-examining age-related dif- ferences in white matter microstructure with free-water corrected diffusion tensor imaging. Neurobiology of aging71, 161–170 (2018)

2018

[6] [6]

In: International conference on machine learning

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con- trastive learning of visual representations. In: International conference on machine learning. pp. 1597–1607. PmLR (2020)

2020

[7] [7]

Advances in Neural Information Processing Systems37, 86048–86073 (2024)

Dong, Z., Li, R., Wu, Y., Nguyen, T.T., Chong, J., Ji, F., Tong, N., Chen, C., Zhou, J.H.: Brain-jepa: Brain dynamics foundation model with gradient positioning and spatiotemporal masking. Advances in Neural Information Processing Systems37, 86048–86073 (2024)

2024

[8] [8]

In: The Thirty-ninth Annual Conference on Neural Information Processing Systems

Dong, Z., Ruilin, L., Chong, J.S.X., Dehestani, N., Teng, Y., Lin, Y., Li, Z., Zhang, Y., Xie, Y., Ooi, L.Q.R., et al.: Brain harmony: A multimodal foundation model unifying morphology and function into 1d tokens. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems

[9] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

[10] Duering, M., Finsterwalder, S., Baykara, E., Tuladhar, A.M., Gesierich, B., Konieczny, M.J., Malik, R., Franzmeier, N., Ewers, M., Jouvent, E., et al.: Free water determines diffusion alterations and clinical status in cerebral small vessel disease. Alzheimer’s & Dementia 14(6), 764–774 (2018)

[11] Glasser, M.F., Sotiropoulos, S.N., Wilson, J.A., Coalson, T.S., Fischl, B., Andersson, J.L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J.R., et al.: The minimal preprocessing pipelines for the human connectome project. Neuroimage 80, 105–124 (2013)

[12] Greenman, D., Bennett, I.J.: Aging of gray matter microstructure: A brain-wide characterization of age group differences using noddi. Neurobiology of Aging 149, 34–43 (2025)

[13] He, J., Qiu, J., Zeng, A., Yang, Z., Zhai, J., Tang, J.: Fastmoe: A fast mixture-of-expert training system. arXiv preprint arXiv:2103.13262 (2021)

[14] Ji, F., Chai, Y.L., Liu, S., Kan, C.N., Ong, M., Richards, A.M., Tan, B.Y., Venketasubramanian, N., Pasternak, O., Chen, C., et al.: Associations of blood cardiovascular biomarkers with brain free water and its relationship to cognitive decline: a diffusion-mri study. Neurology 101(2), e151–e163 (2023)

[15] Ji, F., Pasternak, O., Liu, S., Loke, Y.M., Choo, B.L., Hilal, S., Xu, X., Ikram, M.K., Venketasubramanian, N., Chen, C.L.H., et al.: Distinct white matter microstructural abnormalities and extracellular water increases relate to cognitive impairment in alzheimer’s disease with and without cerebrovascular disease. Alzheimer’s Research & Therapy 9(1), 63 (2017)

[16] Ji, F., Wei, J.L.K., Leng, S., Zhong, L., Tan, R.S., Gao, F., Ng, K.K., Leong, R.L., Pasternak, O., Chee, M.W., et al.: Heart-brain mapping: Cardiac atrial function is associated with distinct cerebral regions with high free water in older adults. Journal of Cerebral Blood Flow & Metabolism 44(7), 1218–1230 (2024)

[17] Jiang, W., Wang, Y., Lu, B.l., Li, D.: Neurolm: A universal multi-task foundation model for bridging the gap between language and eeg signals. In: The Thirteenth International Conference on Learning Representations

[18] Jiang, W., Zhao, L., Lu, B.l.: Large brain model for learning generic representations with tremendous eeg data in bci. In: The Twelfth International Conference on Learning Representations

[19] Jones, D.K., Knösche, T.R., Turner, R.: White matter integrity, fiber count, and other fallacies: the do’s and don’ts of diffusion mri. Neuroimage 73, 239–254 (2013)

[20] Khajehnejad, M., Habibollahi, F., Razi, A.: Brainsymphony: A transformer-driven fusion of fmri time series and structural connectivity. arXiv preprint arXiv:2506.18314 (2025)

[21] Le Bihan, D., Mangin, J.F., Poupon, C., Clark, C.A., Pappata, S., Molko, N., Chabriat, H.: Diffusion tensor imaging: concepts and applications. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine 13(4), 534–546 (2001)

[22] Lei, H., Cheng, X., Qin, Q., Wang, D., Fan, K., Huang, H., Gu, Q., Wu, Y., Jiang, Z., Chen, Y., et al.: M3-jepa: Multimodal alignment via multi-gate moe based on the joint-embedding predictive architecture. arXiv preprint arXiv:2409.05929 (2024)

[23] Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., et al.: Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437 (2024)

[24] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)

[25] Merluzzi, A.P., Dean III, D.C., Adluru, N., Suryawanshi, G.S., Okonkwo, O.C., Oh, J.M., Hermann, B.P., Sager, M.A., Asthana, S., Zhang, H., et al.: Age-dependent differences in brain tissue microstructure assessed with neurite orientation dispersion and density imaging. Neurobiology of aging 43, 79–88 (2016)

[26] Miller, K.L., Alfaro-Almagro, F., Bangerter, N.K., Thomas, D.L., Yacoub, E., Xu, J., Bartsch, A.J., Jbabdi, S., Sotiropoulos, S.N., Andersson, J.L., et al.: Multimodal population brain imaging in the uk biobank prospective epidemiological study. Nature neuroscience 19(11), 1523–1536 (2016)

[27] Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)

[28] Peters, B.D., Ikuta, T., DeRosse, P., John, M., Burdick, K.E., Gruner, P., Prendergast, D.M., Szeszko, P.R., Malhotra, A.K.: Age-related differences in white matter tract microstructure are associated with cognitive performance from childhood to adulthood. Biological psychiatry 75(3), 248–256 (2014)

[29] Rui, S., Chen, L., Tang, Z., Wang, L., Liu, M., Zhang, S., Wang, X.: Multi-modal vision pre-training for medical image analysis. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5164–5174 (2025)

[30] Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., Dean, J.: Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In: International Conference on Learning Representations (2017)

[31] Shen, L., Chen, G., Shao, R., Guan, W., Nie, L.: Mome: Mixture of multimodal experts for generalist multimodal large language models. Advances in neural information processing systems 37, 42048–42070 (2024)

[32] Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., et al.: Uk biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS medicine 12(3), e1001779 (2015)

[33] Wang, T., Wan, Z., Cao, S., Yu, J., He, Y., Xie, Y., Zhang, F., Wu, Y.: Deep learning empowered microstructure codebook: New paradigm for multi-parameter tissue characterization estimation. Human Brain Mapping 47(5), e70513 (2026)

[34] Williams, P.L., Beer, R.D.: Nonnegative decomposition of multivariate information. arXiv preprint arXiv:1004.2515 (2010)

[35] Xin, J., Yun, S., Peng, J., Choi, I., Ballard, J.L., Chen, T., Long, Q.: I2moe: Interpretable multimodal interaction-aware mixture-of-experts. arXiv preprint arXiv:2505.19190 (2025)

[36] Xu, X., Chew, K., Wong, Z., Phua, A., Chong, E., Teo, C., Sathe, N., Chooi, Y., Chia, W., Henry, C., et al.: The singapore geriatric intervention study to reduce cognitive decline and physical frailty (singer): study design and protocol. The Journal of Prevention of Alzheimer’s Disease 9(1), 40–48 (2022)

[37] Yu, H., Qi, Z., Jang, L.K., Salakhutdinov, R., Morency, L.P., Liang, P.P.: Mmoe: Enhancing multimodal models with mixtures of multimodal interaction experts. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. pp. 10006–10030 (2024)

[38] Yu, X., Przybelski, S.A., Reid, R.I., Lesnick, T.G., Raghavan, S., Graff-Radford, J., Lowe, V.J., Kantarci, K., Knopman, D.S., Petersen, R.C., et al.: Noddi in gray matter is a sensitive marker of aging and early ad changes. Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring 16(3), e12627 (2024)

[39] Zhang, H., Schneider, T., Wheeler-Kingshott, C.A., Alexander, D.C.: Noddi: practical in vivo neurite orientation dispersion and density imaging of the human brain. Neuroimage 61(4), 1000–1016 (2012)

A Theoretical Grounding of SPID as a Self-Supervised PID Objective

For notational brevity, we write O, N, F for ODI, NDI, and FWF, respectively....