arxiv: 2605.14048 · v1 · submitted 2026-05-13 · 💻 cs.AI · cs.LG

Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning

Leo Milecki , Qingyu Hu , Bahram Jafrasteh , Mert R. Sabuncu , Qingyu Zhao

Pith reviewed 2026-05-15 05:23 UTC · model grok-4.3

keywords functional connectivity, masked autoencoders, self-supervised learning, brain networks, bilinear factorization, representation learning, cross-cohort generalization, neuroimaging

The pith

Partitioning brain functional connectivity matrices into network-specific patches and embedding them via bilinear factorization produces more stable, transferable representations for predicting behavior and psychopathology.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard masked autoencoders for functional connectivity data can be improved by tokenizing matrices according to large-scale brain networks rather than treating them as uniform regions or graphs. This network-aware approach partitions connectivity into intra- and inter-network blocks using an anatomical parcellation, then applies a bilinear factorization to embed the resulting heterogeneous patches while keeping network identity intact. The method reduces parameter scaling from quadratic to linear in the number of networks. Experiments across three developmental cohorts demonstrate that these representations generalize better in cross-cohort settings than both image-style and graph-based self-supervised baselines. Ablation results indicate that both the network partitioning and the bilinear embedding step are necessary for the observed gains.
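
The partitioning step described above can be illustrated with a minimal sketch. It assumes the FC matrix is symmetric, so only network pairs with l <= m are kept, and that each region carries a single network label. The function name `partition_fc`, the toy labels, and the toy matrix are hypothetical and are not taken from the paper.

```python
from itertools import combinations_with_replacement

def partition_fc(fc, labels):
    """Split a square FC matrix into blocks keyed by network pair (l, m), l <= m."""
    networks = sorted(set(labels))
    # Region indices belonging to each network.
    members = {n: [i for i, lab in enumerate(labels) if lab == n] for n in networks}
    patches = {}
    for l, m in combinations_with_replacement(networks, 2):
        # Rows come from network l, columns from network m.
        patches[(l, m)] = [[fc[i][j] for j in members[m]] for i in members[l]]
    return patches

# Toy example: 5 regions assigned to 3 hypothetical networks.
labels = ["A", "A", "B", "C", "C"]
fc = [[round(1.0 if i == j else 0.1 * (i + j), 2) for j in range(5)] for i in range(5)]
patches = partition_fc(fc, labels)
shapes = {pair: (len(p), len(p[0])) for pair, p in patches.items()}
```

With three networks this yields six patches (three intra-network, three inter-network), and their shapes differ with network size, which is the heterogeneity a shared fixed-size tokenizer cannot absorb directly.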

Core claim

NERVE redefines FC tokenization by dividing matrices into patches defined by network pairs and embeds those patches through structured bilinear factorization. This preserves distinct functional roles of each network block, avoids quadratic parameter growth, and yields representations that remain stable and transferable when tested on unseen cohorts for behavior and psychopathology prediction tasks.

What carries the argument

Structured bilinear factorization that embeds heterogeneous FC patches defined by network pairs while preserving network identity and achieving linear parameter scaling.
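
One way such an embedding could work is sketched below. This is an assumed reading, not the paper's confirmed formulation: each network n owns a single factor of shape (size_n, K), and the token for patch (l, m) is t[c] = sum over i, j of U_l[i][c] * X[i][j] * U_m[j][c]. The names `init_factors` and `bilinear_token`, the initialization scale, and the toy blocks are all hypothetical.

```python
import random

def init_factors(sizes, k, seed=0):
    """One (size_n x k) factor per network; sizes maps network -> region count."""
    rng = random.Random(seed)
    return {n: [[rng.gauss(0.0, 0.02) for _ in range(k)] for _ in range(s)]
            for n, s in sizes.items()}

def bilinear_token(patch, u_l, u_m):
    """Return a length-k token: t[c] = sum_ij u_l[i][c] * patch[i][j] * u_m[j][c]."""
    k = len(u_l[0])
    token = [0.0] * k
    for i, row in enumerate(patch):
        for j, x in enumerate(row):
            for c in range(k):
                token[c] += u_l[i][c] * x * u_m[j][c]
    return token

sizes = {"A": 2, "B": 3}                      # hypothetical network sizes
factors = init_factors(sizes, k=4)
intra = [[1.0, 0.3], [0.3, 1.0]]              # A-A block, 2 x 2
inter = [[0.1, 0.2, 0.0], [0.4, 0.0, 0.5]]    # A-B block, 2 x 3
t_intra = bilinear_token(intra, factors["A"], factors["A"])
t_inter = bilinear_token(inter, factors["A"], factors["B"])
```

Under this assumption both tokens have length K even though the blocks have different shapes, and the factor for network A is reused by every patch that involves A, so no patch-specific weights are needed.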

If this is right

The network-aware representations transfer more reliably across independent developmental cohorts than structurally agnostic alternatives.
Bilinear factorization reduces parameter count while maintaining the ability to distinguish distinct functional roles of network pairs.
Anatomically grounded parcellation is required for the performance advantage; removing it degrades cross-cohort stability.
The same tokenization scheme improves prediction of both behavioral traits and psychopathology scores in held-out data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar bilinear tokenization could be adapted to other graph-structured neuroimaging data such as structural connectivity or dynamic FC.
The linear scaling property may allow the method to handle finer-grained parcellations without prohibitive compute cost.
Improved cross-cohort stability suggests the learned embeddings capture more invariant features of brain organization rather than cohort-specific noise.
The approach could be tested for its effect on causal modeling tasks that link connectivity patterns to specific behavioral outcomes.

Load-bearing premise

That an anatomically grounded parcellation into intra- and inter-network blocks aligns with the brain's intrinsic modular organization and that this alignment benefits downstream representation learning.

What would settle it

A controlled experiment in which FC matrices are randomly partitioned into patches of matching sizes but without using network labels, then trained with the same bilinear embedding and MAE objective, and evaluated on the same cross-cohort prediction tasks.
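
The size-matched random partition for such a control could be generated as in the sketch below, which permutes the region-to-network labels so that every pseudo-network keeps its original size while losing its anatomical meaning. The function name `shuffled_labels`, the seed, and the toy label counts are hypothetical.

```python
import random
from collections import Counter

def shuffled_labels(labels, seed=0):
    """Randomly reassign regions to networks while keeping each network's size."""
    rng = random.Random(seed)
    permuted = list(labels)   # copy so the original assignment is untouched
    rng.shuffle(permuted)
    return permuted

labels = ["A"] * 4 + ["B"] * 3 + ["C"] * 5   # toy assignment of 12 regions
control = shuffled_labels(labels, seed=42)

# Network sizes, and therefore patch shapes, are preserved by construction.
assert Counter(control) == Counter(labels)
```

Because patch shapes are unchanged, the same bilinear embedding and MAE objective can be trained on the control labels without modification, which isolates the contribution of the network labels themselves.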

Figures

Figures reproduced from arXiv: 2605.14048 by Bahram Jafrasteh, Leo Milecki, Mert R. Sabuncu, Qingyu Hu, Qingyu Zhao.

**Figure 1.** Overview of NERVE. A. The functional connectivity (FC) matrix is partitioned into patches defined by pairs of functional brain networks. B. Network-aware Bilinear Tokenization. Each functional network is assigned learnable network-specific weights at initialization, and patch tokens are computed through structured bilinear interactions between network weights during the forward pass. C. MAE Framework. We apply a s…

read the original abstract

Masked autoencoders (MAEs) have recently shown promise for self-supervised representation learning of resting-state brain functional connectivity (FC). However, a fundamental question remains unresolved: how should FC matrices be tokenized to align with the intrinsic modular organization of large-scale brain networks? Existing approaches typically adopt region-centric or graph-based schemes that treat FC as structurally homogeneous elements and overlook the large-scale network brain organization. We introduce NERVE (Network-Aware Representations of Brain Functional Connectivity via Bilinear Tokenization), a self-supervised learning framework that redefines FC tokenization by partitioning FC matrices into patches of intra- and inter-network connectivity blocks. Unlike image-based MAE, where fixed-size patches share a common tokenizer, FC patches defined by network pairs are heterogeneous in size and correspond to distinct functional roles. To resolve this problem, NERVE embeds FC patches through a novel structured bilinear factorization. This formulation preserves network identity and reduces parameter complexity from quadratic to linear scaling in the number of networks. We evaluate NERVE across three large-scale developmental cohorts (ABCD, PNC, and CCNP) for behavior and psychopathology prediction. Compared to structurally agnostic MAE variants and graph-based self-supervised baselines, the proposed network-aware formulation yields more stable and transferable representations, particularly in cross-cohort evaluation. Ablation studies confirm that the proposed bilinear network embedding and anatomically grounded parcellation are critical for performance. These findings highlight the importance of incorporating domain-specific structural priors into self-supervised learning for functional connectomics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The bilinear tokenization for network-pair FC patches is the actual new piece, and it looks like it delivers more stable cross-cohort results than plain MAE or graph baselines.

The main thing to know is that NERVE uses a structured bilinear factorization to embed FC patches defined by intra- and inter-network blocks. This keeps network identity while dropping parameter scaling from quadratic to linear in the number of networks, which directly tackles the size and role differences that standard tokenizers ignore in brain connectivity data. The paper reports better transfer on behavior and psychopathology prediction across ABCD, PNC, and CCNP, with ablations showing that both the bilinear step and the anatomical parcellation contribute to the gains. That multi-cohort setup is a plus for a field where site effects usually dominate. The construction is straightforward, and the inductive bias is tested downstream rather than asserted.

The soft spots are the usual ones at abstract level: exact baseline implementations, full statistical details, and error bars are not visible here, so the improvement is plausible but its magnitude is not yet pinned down. The claim that the parcellation aligns with intrinsic modules is treated as an empirical question, which is fair, but it would be stronger with a post-hoc check on what the embeddings actually capture.

This is for people working on self-supervised methods for neuroimaging who need a way to inject known network structure without extra complexity. A reader focused on functional connectomics or domain-aware tokenization would get concrete value from the design. Send it to peer review; the core idea is clean enough and the evaluation hits the relevant tasks.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces NERVE, a self-supervised framework for brain functional connectivity (FC) representation learning. It partitions FC matrices into heterogeneous intra- and inter-network patches using an anatomically grounded parcellation and embeds them via structured bilinear factorization to preserve network identity while reducing parameter scaling from quadratic to linear. The method is evaluated on three developmental cohorts (ABCD, PNC, CCNP) for behavior and psychopathology prediction, claiming more stable and transferable representations than structurally agnostic MAE variants and graph-based baselines, with ablations confirming the importance of the bilinear embedding and parcellation.

Significance. If the cross-cohort gains hold under rigorous verification, the work would demonstrate a practical way to inject domain-specific network priors into self-supervised tokenization for connectomics, addressing a gap in existing MAE and graph approaches. The linear scaling benefit and emphasis on transferability are notable strengths for multi-site neuroimaging applications.

major comments (2)

[Abstract and Evaluation] Abstract and Evaluation sections: the central claim of superior stability and transferability in cross-cohort settings is supported by ablations but lacks explicit reporting of baseline implementations, hyperparameter search details, exact sample sizes per cohort, and statistical tests (e.g., confidence intervals or p-values on performance differences), leaving the magnitude of gains unverified at the level needed for the claim.
[Methods] Methods (bilinear factorization description): the reduction to linear parameter scaling is presented as a direct consequence of the structured factorization, but the manuscript should include the explicit equations showing how network-specific factors are shared across patches to confirm it does not implicitly reintroduce quadratic terms via the parcellation atlas.

minor comments (2)

[Abstract] Abstract: sample sizes and key demographics for the three cohorts are not stated, which would help readers assess the scale and generalizability of the reported results.
[Methods] Notation: define the precise dimensions and initialization of the bilinear factors (e.g., network embedding matrices) to clarify how heterogeneity in patch sizes is handled without additional padding or masking steps.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major point below and have revised the manuscript to incorporate the requested details and clarifications.

Referee: [Abstract and Evaluation] Abstract and Evaluation sections: the central claim of superior stability and transferability in cross-cohort settings is supported by ablations but lacks explicit reporting of baseline implementations, hyperparameter search details, exact sample sizes per cohort, and statistical tests (e.g., confidence intervals or p-values on performance differences), leaving the magnitude of gains unverified at the level needed for the claim.

Authors: We agree that additional details are required to substantiate the claims. In the revised manuscript, we will expand the Evaluation section to report: exact sample sizes for ABCD, PNC, and CCNP cohorts; full specifications of baseline implementations (including any adaptations from original papers); hyperparameter search ranges and selection procedures; and statistical tests with p-values and confidence intervals on performance differences. These additions will allow verification of the magnitude of gains in stability and transferability. The abstract will be updated to reference the enhanced evaluation protocol.
revision: yes
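
A confidence interval on a performance difference, as requested in this comment, could be obtained with a paired percentile bootstrap along the following lines. This is a generic sketch, not the authors' evaluation protocol; the function name `paired_bootstrap_ci` is hypothetical and the scores are made-up numbers for illustration only.

```python
import random

def paired_bootstrap_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(scores_a - scores_b) over paired evaluation units."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        # Resample the paired differences with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, lo, hi

# Made-up per-fold scores, for illustration only (not results from the paper).
model = [0.31, 0.28, 0.35, 0.30, 0.33]
baseline = [0.27, 0.29, 0.30, 0.26, 0.31]
mean_diff, lo, hi = paired_bootstrap_ci(model, baseline)
```

Pairing by fold or by subject matters here: resampling the differences, rather than each method's scores independently, keeps the shared variance between methods out of the interval.
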
Referee: [Methods] Methods (bilinear factorization description): the reduction to linear parameter scaling is presented as a direct consequence of the structured factorization, but the manuscript should include the explicit equations showing how network-specific factors are shared across patches to confirm it does not implicitly reintroduce quadratic terms via the parcellation atlas.

Authors: We thank the referee for this observation. To rigorously demonstrate the linear scaling, we will add explicit equations in the Methods section. These will define the structured bilinear factorization where network-specific factors (e.g., left and right factors U_n and V_n for each network n) are shared across all intra- and inter-network patches involving that network. The total parameter count will be shown to scale as O(K * N) where N is the number of networks and K is the embedding dimension, confirming no quadratic terms arise from the atlas-based parcellation.
revision: yes
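
The scaling argument in this response can be made concrete by counting weight sets. The count below assumes a symmetric FC matrix, so that patches (l, m) and (m, l) coincide, and uses N = 7 purely as an illustrative number of networks.

```latex
\begin{align*}
\#\text{weight sets (one per patch)} &= \binom{N}{2} + N = \frac{N(N+1)}{2} = O(N^2), \\
\#\text{weight sets (one per network)} &= N = O(N), \\
\text{e.g. } N = 7: &\quad \frac{7 \cdot 8}{2} = 28 \ \text{versus}\ 7 .
\end{align*}
```

If each network's factor contributes a number of parameters proportional to K, the shared-factor total is proportional to K times N, consistent with the O(K * N) stated in the response.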

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces NERVE as a new self-supervised framework with two explicit design choices: anatomically grounded network partitioning of FC matrices into intra-/inter-network patches, and a structured bilinear factorization to embed those heterogeneous patches while preserving network identity. These are presented as inductive biases whose value is measured via downstream empirical evaluation on ABCD, PNC, and CCNP cohorts, with ablations confirming their contribution. No equation reduces a claimed prediction to a fitted parameter by construction, no load-bearing premise rests solely on self-citation, and the central claim (improved cross-cohort stability) is not asserted a priori but reported as an observed outcome. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that large-scale brain networks provide a meaningful partitioning of FC matrices and on the design choice of bilinear factorization to handle heterogeneous patch sizes; no new physical entities are postulated.

free parameters (1)

network parcellation atlas
Choice of anatomical or functional atlas that defines the network blocks; treated as an input rather than learned.

axioms (1)

domain assumption: Brain functional connectivity exhibits modular organization at the scale of large-scale networks
Invoked to justify partitioning into intra- and inter-network blocks rather than uniform patches.

pith-pipeline@v0.9.0 · 5579 in / 1228 out tokens · 31733 ms · 2026-05-15T05:23:48.152636+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Paper passage: we propose a bilinear network-aware tokenization... W_{l,m}=U_l ⊙ U_m ... replaces quadratic growth in patch-specific parameters with a linear scaling in the number of networks

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

Paper passage: structured bilinear interactions between network weights... preserves network identity

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
