
arxiv: 2605.20883 · v1 · pith:PMNNKPJBnew · submitted 2026-05-20 · 💻 cs.LG

Learning fMRI activations dictionaries across individual geometries via optimal transport

Pith reviewed 2026-05-21 06:35 UTC · model grok-4.3

keywords: dictionary learning · fMRI · optimal transport · Gromov-Wasserstein · brain geometry · amortized optimization · graph representations

The pith

A dictionary learning method for fMRI uses optimal transport to handle differences in individual brain geometries without projecting to a template.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a way to learn dictionaries of brain activity patterns from fMRI scans that respects the unique shape of each person's brain. Standard approaches map every brain onto one average template, which discards personal geometric details. Instead, this method compares brain graphs using the Fused Gromov-Wasserstein distance and approximates the required transport plans with a neural network to keep computation feasible. Dictionary atoms are made to vary with the balance between matching features and matching structure. Experiments on the HCP dataset indicate that the resulting representations capture varying degrees of geometric difference while retaining key information for tasks like classification.

Core claim

By combining the Fused Gromov-Wasserstein distance with amortized neural approximation of transport plans, dictionary atoms that depend on the FGW trade-off parameter can be learned for graphs with different geometries and features; numerical experiments on the HCP dataset show that the approach captures different levels of geometric variability in the data and provides representations that preserve essential information.

What carries the argument

The Fused Gromov-Wasserstein distance, which compares graphs with different geometries and features by balancing feature alignment and structural consistency, together with amortized optimization that trains a neural network to predict approximate optimal transport plans.
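As a rough illustration of the balance described above, the sketch below evaluates an FGW-style objective for a fixed coupling on small graphs. The function name `fgw_cost`, the square loss on structure, and the convention that `alpha = 0` means features only and `alpha = 1` means structure only are assumptions made for this example; the paper's exact formulation and sign convention are not reproduced on this page.

```python
import numpy as np

def fgw_cost(M, C1, C2, T, alpha):
    """FGW-style objective of a given coupling T (square loss on structure).

    M     : (n, m) feature cost between nodes of the two graphs
    C1    : (n, n) structure matrix of graph 1 (e.g. shortest-path distances)
    C2    : (m, m) structure matrix of graph 2
    T     : (n, m) coupling (transport plan)
    alpha : trade-off in [0, 1]; assumed convention: 0 = features only, 1 = structure only
    """
    feature_term = np.sum(M * T)
    # diff[i, j, k, l] = (C1[i, k] - C2[j, l]) ** 2; dense, so small graphs only.
    diff = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2
    structure_term = np.einsum("ijkl,ij,kl->", diff, T, T)
    return (1.0 - alpha) * feature_term + alpha * structure_term

# Toy example: two copies of a 3-node path graph with scalar node features.
C = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
F = np.array([[0.0], [1.0], [3.0]])
M = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)

T_identity = np.eye(3) / 3        # node i matched to node i
T_reversed = np.eye(3)[::-1] / 3  # node i matched to node 2 - i

for alpha in (0.0, 0.5, 1.0):
    print(alpha, fgw_cost(M, C, C, T_identity, alpha), fgw_cost(M, C, C, T_reversed, alpha))
```

In this toy case the reversed matching preserves the path structure but mismatches the features, so its cost shrinks to zero as `alpha` approaches 1 while the identity matching costs zero throughout. The dense four-index tensor is only practical for tiny graphs; fMRI-scale graphs would need a factorized computation or a dedicated solver.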

If this is right

  • Dictionary atoms depend on the trade-off parameter that controls the balance between feature alignment and structural consistency.
  • The amortized neural approximation makes repeated FGW computations feasible on large fMRI graphs.
  • The learned representations capture different levels of geometric variability across subjects.
  • Essential information for downstream tasks such as classification is preserved in the representations.
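One simple way to make atoms depend on the trade-off parameter is to interpolate linearly between two sets of endpoint atoms. The sketch below is a hypothetical illustration only: the paper mentions a linear-interpolation and an MLP parameterization, but its equations are not reproduced on this page, so the function names, array shapes, and uniform weights here are all assumptions.

```python
import numpy as np

def atoms_at(alpha, atoms_start, atoms_end):
    """Hypothetical alpha-dependent atoms: linear interpolation between two endpoints.

    atoms_start, atoms_end : (K, n, d) arrays, K atoms with n nodes and d features
    alpha                  : trade-off parameter in [0, 1]
    """
    return (1.0 - alpha) * atoms_start + alpha * atoms_end

def reconstruct(weights, atoms):
    """Linear combination of K atoms: weights (K,), atoms (K, n, d) -> (n, d)."""
    return np.tensordot(weights, atoms, axes=1)

rng = np.random.default_rng(0)
K, n, d = 4, 5, 2
atoms_start = rng.normal(size=(K, n, d))  # stand-ins for learned parameters
atoms_end = rng.normal(size=(K, n, d))
weights = np.full(K, 1.0 / K)             # uniform coefficients on the simplex

recon_mid = reconstruct(weights, atoms_at(0.5, atoms_start, atoms_end))
print(recon_mid.shape)
```

Because both steps are linear, the reconstruction at an intermediate `alpha` is the same interpolation of the two endpoint reconstructions; a nonlinear parameterization such as an MLP would not have this property.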

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Retaining subject-specific geometry could improve accuracy in population-level analyses that currently lose individual detail through template alignment.
  • The same amortized transport approximation might reduce cost when applying dictionary learning to other mismatched graph datasets, such as connectomes from different species.
  • Downstream models trained on these representations may generalize better to new subjects because geometric variability is explicitly modeled rather than removed.

Load-bearing premise

That the Fused Gromov-Wasserstein distance combined with amortized neural approximation of transport plans is sufficient to learn dictionaries whose atoms meaningfully depend on the trade-off parameter while remaining computationally tractable for fMRI-scale graphs.

What would settle it

An experiment showing that the learned dictionary atoms do not change meaningfully with the FGW trade-off parameter or that the resulting representations fail to preserve essential information better than those obtained by projecting brains to a common template.

Figures

Figures reproduced from arXiv: 2605.20883 by Bertrand Thirion, Rémi Flamary, Sonia Mazelet.

Figure 1: Visualization of the learned dictionary atoms for different values of …
Figure 2: Original contrast (top) and its reconstruction (bottom) for a given subject and the contrast story - math.
  Accompanying text (truncated): AGDL. We train AGDL with both the linear interpolation (6) and the MLP (7) on the HCP dataset by minimizing the loss in Eq. (5), using a batch size of 32, a learning rate of 0.001, and training for 1000 epochs. We train the model stochastically by sampling 100 graphs per epoch. The atoms are initialized wi…
Figure 4: Visualization of the reconstructions obtained by moving along the first two principal …
Figure 9: t-SNE visualization of AGDL dictionary coefficients colored by contrast. Axes: t-SNE dimension 1, t-SNE dimension 2. Legend: GAMBLING win - loss; EMOTION fear - neut; MOTOR rf - avg, rh - avg; RELATIONAL match - relation; WM bk2_avg - bk0_avg; SOCIAL mental - rnd; LANGUAGE story - math.
Figure 3: t-SNE visualization of the learned embeddings for α = 0.22 (best contrast classification performance). Each point corresponds to a graph, colored according to its contrast.
  Accompanying text (truncated): PCA in the embedding space. To further illustrate subject variability in the learned representation, we choose two classical contrasts, right hand - average and fear - neutral, and perform PCA in the embedding space across many subject…
Figure 5: Performance for contrast classification (left) and subject classification (right) for different …
Figure 6: t-SNE visualization of the training set SVC scores for …
Figure 7: Distribution of the activations for the first …
Figure 8: FUGW loss on validation dataset for ULOT training along epochs
Figure 9: All learned dictionary atoms with AGDL for …
Figure 10: t-SNE visualization of the learned embeddings for all contrasts for …
Figure 11: t-SNE visualization of the learned atoms, where each point corresponds to an atom. The …
Original abstract

Dictionary learning is a powerful tool for creating interpretable representations. When applied to functional magnetic resonance imaging (fMRI) data, the resulting patterns of brain activity can be used for various downstream tasks, such as brain state classification or population-level analysis. However, a major challenge is the variability in brain geometry across individuals. This is usually addressed by projecting each individual brain geometry onto a common template, which removes subject-specific information. In this work, we introduce a novel approach to dictionary learning on fMRI data that explicitly accounts for this variability. We use the optimal transport-based Fused Gromov-Wasserstein (FGW) distance to compare graphs with different geometries and features. To address the challenge of computing multiple FGW distances for large graphs such as those arising from fMRI data, we rely on amortized optimization to learn a neural network that predicts an approximation of the optimal transport plans, which substantially reduces the computational cost. Additionally, we learn dictionary atoms that depend on the FGW trade-off parameter, which controls the balance between feature alignment and structural consistency. Numerical experiments on the HCP dataset demonstrate that the proposed approach captures different levels of geometric variability in the data and provides representations that preserve essential information.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a dictionary learning method for fMRI data that uses the Fused Gromov-Wasserstein (FGW) distance to compare graphs with varying individual brain geometries and node features. It addresses computational cost on large HCP-scale graphs via amortized neural-network approximation of transport plans and makes dictionary atoms explicitly dependent on the FGW trade-off parameter that balances feature alignment against structural consistency. Experiments on the HCP dataset are claimed to show that the resulting representations capture different levels of geometric variability while preserving essential information for downstream tasks.

Significance. If the numerical claims hold, the work provides a principled way to retain subject-specific geometric information in fMRI dictionary learning instead of projecting onto a common template, which could improve interpretability and performance in population-level analyses and brain-state classification. The combination of FGW with amortization is a technically interesting extension of optimal-transport tools to neuroimaging graphs.

major comments (2)
  1. [§3.2] (amortized FGW approximation): The central claim that dictionary atoms 'meaningfully depend on the trade-off parameter' while remaining tractable requires that the neural-network predictor preserve sensitivity to the FGW balance parameter. No approximation-error bounds, ablation against exact FGW solvers, or sensitivity plots on representative HCP graph sizes are reported, leaving open the possibility that amortization bias erases the intended dependence on the trade-off parameter.
  2. [§4] (numerical experiments on HCP): The abstract asserts that the method 'captures different levels of geometric variability,' yet the provided description contains no quantitative metrics, error bars, ablation studies, or baseline comparisons (e.g., against template-based dictionary learning). Without these, it is impossible to assess whether the learned atoms actually vary controllably with the trade-off parameter or merely reflect post-hoc choices.
minor comments (2)
  1. [§2] The notation for the FGW objective and the precise manner in which the trade-off parameter enters the dictionary-learning loss should be stated explicitly, ideally with an equation reference.
  2. [§4] Figure captions and axis labels in the experimental section should indicate which values of the trade-off parameter are shown and how variability is quantified.
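The sensitivity concern in the first major comment can be probed with a very simple diagnostic: compare the plans a predictor returns at neighboring values of the trade-off parameter. The sketch below is a minimal illustration under stated assumptions, not the authors' evaluation; `plan_sensitivity` is a hypothetical helper, the plans are toy stand-ins for predictor outputs, and unit total mass is assumed (unbalanced plans would need normalizing first).

```python
import numpy as np

def plan_sensitivity(plans_by_alpha):
    """Total-variation distance between plans at consecutive alpha values.

    plans_by_alpha : dict mapping alpha -> (n, m) nonnegative plan with unit total mass
    Returns a list of (alpha_low, alpha_high, distance) tuples.
    A predictor that ignores alpha would give distances close to zero everywhere.
    """
    alphas = sorted(plans_by_alpha)
    distances = []
    for low, high in zip(alphas[:-1], alphas[1:]):
        tv = 0.5 * np.abs(plans_by_alpha[low] - plans_by_alpha[high]).sum()
        distances.append((low, high, float(tv)))
    return distances

# Toy plans standing in for predictor outputs on a pair of 3-node graphs.
identity_plan = np.eye(3) / 3
reversed_plan = np.eye(3)[::-1] / 3
plans = {0.0: identity_plan, 0.5: identity_plan, 1.0: reversed_plan}

result = plan_sensitivity(plans)
for low, high, tv in result:
    print(f"alpha {low} -> {high}: TV distance {tv:.3f}")
```

Running the same function on plans from an exact solver, where one is affordable, would give a reference curve against which the amortized predictor's sensitivity could be compared.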

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which help clarify how to better demonstrate the properties of the amortized FGW approximation and the experimental results. We address each major point below and will revise the manuscript to incorporate additional evidence where feasible.

point-by-point responses
  1. Referee: [§3.2] (amortized FGW approximation): The central claim that dictionary atoms 'meaningfully depend on the trade-off parameter' while remaining tractable requires that the neural-network predictor preserve sensitivity to the FGW balance parameter. No approximation-error bounds, ablation against exact FGW solvers, or sensitivity plots on representative HCP graph sizes are reported, leaving open the possibility that amortization bias erases the intended dependence on the trade-off parameter.

    Authors: We agree that explicit verification of sensitivity is important. The neural network is conditioned on the trade-off parameter α as an explicit input, and the training objective directly minimizes discrepancy to exact FGW plans; this design choice is intended to retain dependence on α. However, we acknowledge the absence of dedicated sensitivity analysis and ablations in the current version. In the revision we will add (i) sensitivity plots of atom variation across α for both the amortized predictor and exact FGW on representative HCP subgraph sizes, and (ii) an ablation comparing reconstruction and downstream performance when using the amortized versus exact solver on smaller instances. Theoretical approximation-error bounds are not currently available and would require substantial additional analysis beyond the scope of the present work. revision: partial

  2. Referee: [§4] (numerical experiments on HCP): The abstract asserts that the method 'captures different levels of geometric variability,' yet the provided description contains no quantitative metrics, error bars, ablation studies, or baseline comparisons (e.g., against template-based dictionary learning). Without these, it is impossible to assess whether the learned atoms actually vary controllably with the trade-off parameter or merely reflect post-hoc choices.

    Authors: The current manuscript presents qualitative visualizations of atoms at different α values together with qualitative downstream-task preservation, but we concur that quantitative support is needed to substantiate the claims. In the revised version we will include (i) quantitative metrics such as atom stability across α, reconstruction error on held-out subjects, and brain-state classification accuracy with error bars, (ii) direct comparisons against a standard template-based dictionary-learning baseline, and (iii) ablation results showing performance as a function of the trade-off parameter. These additions will make the controllability of geometric variability explicit and allow readers to evaluate the claims quantitatively. revision: yes
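The promised reporting of accuracy with error bars as a function of the trade-off parameter could be summarized along the following lines. This is a hedged sketch: `summarize_accuracy` is a hypothetical helper, the per-fold layout is an assumption, and the numbers are invented solely to exercise the function; they are not results from the paper.

```python
import numpy as np

def summarize_accuracy(alphas, acc):
    """Summarize cross-validated accuracy per alpha.

    alphas : sequence of trade-off values, length n_alphas
    acc    : (n_alphas, n_folds) accuracies
    Returns per-alpha mean, standard error of the mean, and the alpha with the best mean.
    """
    acc = np.asarray(acc, dtype=float)
    mean = acc.mean(axis=1)
    sem = acc.std(axis=1, ddof=1) / np.sqrt(acc.shape[1])
    best_alpha = alphas[int(np.argmax(mean))]
    return mean, sem, best_alpha

# Invented numbers, for illustration only.
alphas = [0.0, 0.5, 1.0]
acc = [[0.70, 0.72, 0.74],
       [0.80, 0.82, 0.84],
       [0.60, 0.62, 0.64]]

mean, sem, best_alpha = summarize_accuracy(alphas, acc)
for a, m, s in zip(alphas, mean, sem):
    print(f"alpha={a}: {m:.3f} +/- {s:.3f}")
print("best alpha:", best_alpha)
```

Selecting the best value on the same folds used for reporting would bias the estimate upward, so a held-out split for model selection would be the safer design.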

standing simulated objections not resolved
  • Theoretical approximation-error bounds for the neural-network FGW predictor

Circularity Check

0 steps flagged

No circularity; established OT machinery applied to new domain with independent amortization


The derivation relies on the pre-existing Fused Gromov-Wasserstein distance (external to this paper) to compare graphs, with an amortized neural network introduced as a separate computational approximation to enable scaling. Dictionary atoms are explicitly parameterized by the trade-off parameter as a controllable input rather than derived tautologically from fitted outputs. Experiments on the external HCP dataset provide validation outside the method's own equations. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that FGW distance is an appropriate metric for comparing fMRI graphs with mismatched geometries and that the neural approximation preserves enough accuracy for dictionary learning. No new physical entities are postulated.

free parameters (1)
  • FGW trade-off parameter
    Controls the balance between feature alignment and structural consistency; dictionary atoms are learned to depend on it.
axioms (1)
  • domain assumption: Optimal transport plans exist and can be approximated by a neural network for the graph sizes arising in fMRI parcellations.
    Invoked to justify the amortized optimization step that makes the method tractable.

pith-pipeline@v0.9.0 · 5743 in / 1406 out tokens · 48024 ms · 2026-05-21T06:35:28.579237+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
