pith. machine review for the scientific record. sign in

arxiv: 2605.13798 · v1 · submitted 2026-05-13 · 💻 cs.CV

Recognition: 1 theorem link

· Lean Theorem

VoxCor: Training-Free Volumetric Features for Multimodal Voxel Correspondence

Ender Konukoglu, Ertunc Erdil, Guney Tombak

Authors on Pith no claims yet

Pith reviewed 2026-05-14 19:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords volumetric featuresmultimodal correspondencetraining-freevoxel correspondencecross-modal transfervision transformersmedical image registrationfeature projection
0
0 comments X

The pith

A training-free fit-transform method creates reusable volumetric features from frozen 2D vision transformers for cross-modal voxel correspondence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that frozen 2D ViT models can be turned into consistent 3D volumetric representations by running triplanar inference and fitting a compact weighted partial least squares projection on initial voxel correspondences. This produces features that transfer to new volumes via linear projection alone, without fine-tuning or registration at test time. Direct nearest-neighbor search then yields voxel correspondences usable for registration, segmentation, and landmark tasks. A sympathetic reader would care because existing pipelines require per-pair adaptation or handcrafted descriptors, limiting reuse across scanners and modalities.

Core claim

VoxCor is a training-free fit-transform method that combines triplanar ViT inference with a closed-form weighted partial least squares projection fitted on correspondences to select modality-stable anatomical directions; at transform time new volumes receive the same triplanar features followed by the fixed projection, after which correspondences are obtained by nearest-neighbor search, yielding improved performance in the hardest cross-subject cross-modality settings and registration results competitive with handcrafted descriptors and learned 3D features.

What carries the argument

The closed-form weighted partial least squares (WPLS) projection on triplanar ViT features, which uses fitting-time correspondences to identify modality-stable anatomical directions.

If this is right

  • Voxel correspondences on new volumes can be obtained directly by nearest-neighbor search without any registration step.
  • Registration performance becomes competitive with handcrafted descriptors and learned 3D features.
  • Encoder sensitivity decreases for dense correspondence transfer across modalities.
  • The same features support downstream tasks such as voxelwise k-nearest-neighbor segmentation and segmentation-center landmark localization.
  • The resulting representations serve as a reusable feature layer for multimodal analysis beyond single-pair registration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fitting procedure could be repeated on other 2D foundation models to produce modality-stable 3D features without redesigning the projection step.
  • Fitting correspondences from a wider range of anatomical sites might allow the method to handle previously unseen body regions with minimal extra data.
  • Because no per-volume optimization occurs at test time, the approach could be inserted into real-time clinical pipelines that currently avoid learned features due to compute cost.
  • Combining the projected features with classical intensity-based registration as a coarse-to-fine step might further reduce residual errors in difficult cross-subject cases.

Load-bearing premise

The modality-stable anatomical directions identified by the WPLS projection on fitting-time correspondences generalize to new volumes and unseen modality combinations without further adaptation.

What would settle it

A clear drop in nearest-neighbor correspondence accuracy or deformable registration Dice scores when the fitted projection is applied to a new cross-modality volume pair absent from the fitting correspondences.

Figures

Figures reproduced from arXiv: 2605.13798 by Ender Konukoglu, Ertunc Erdil, Guney Tombak.

Figure 1
Figure 1. Figure 1: VoxCor pipeline, organized into a fit phase run once on a paired training set and a [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Evaluation protocol (shown for Abdomen MR–CT with L2OCV). [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Direct ConvexAdam (CA, plain boxes) versus Globally-Initialized ConvexAdam [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Reusable Dataset-Fit (plain boxes) versus pair-specific Pair-Fit (hatched boxes) under [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: DINOv3 voxelwise kNN segmentation Dice radar plots under the [PITH_FULL_IMAGE:figures/full_fig_p020_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative direct feature-space correspondence on Abdomen MR–CT (right kid [PITH_FULL_IMAGE:figures/full_fig_p022_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Semantic-versus-geometric correspondence in the Generalization (G) category. Each [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: All-encoder voxelwise kNN segmentation Dice radar plots under the [PITH_FULL_IMAGE:figures/full_fig_p039_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: kNN sensitivity to the number of neighbors under the [PITH_FULL_IMAGE:figures/full_fig_p041_9.png] view at source ↗
read the original abstract

Cross-modal 3D medical image analysis requires voxelwise representations that remain anatomically consistent across imaging contrasts, scanners, and acquisition protocols. Recent work has shown that frozen 2D Vision Transformer (ViT) foundation models can support such representations, but typical pipelines extract features along a single anatomical axis and adapt those features inside a registration solver for one image pair at a time, leaving complementary viewing directions unused and producing representations that do not transfer to new volumes. We introduce VoxCor, a training-free fit--transform method for reusable volumetric feature representations from frozen 2D ViT foundation models. During an offline fitting phase, VoxCor combines triplanar ViT inference with a compact closed-form weighted partial least squares (WPLS) projection that uses fitting-time voxel correspondences to select modality-stable anatomical directions in the triplanar feature space. At transform time, new volumes are mapped by triplanar ViT inference and linear projection alone, without fine-tuning or registration. Voxel correspondences can then be queried directly by nearest-neighbor search. We evaluate VoxCor on intra-subject Abdomen MR--CT and inter-subject HCP T2w--T1w tasks using deformable registration, voxelwise k-nearest-neighbor segmentation, and segmentation-center landmark localization. VoxCor improves the hardest cross-subject, cross-modality transfer settings, reduces encoder sensitivity for dense correspondence transfer, and yields registration performance competitive with handcrafted descriptors and learned 3D features. This positions VoxCor as a reusable feature layer for downstream multimodal analysis beyond pairwise registration. Code, configuration files, and implementation details are publicly available on GitHub at \href{https://github.com/guneytombak/VoxCor}{guneytombak/VoxCor}.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces VoxCor, a training-free fit-transform method that extracts reusable volumetric features from frozen 2D ViT foundation models via triplanar inference followed by a closed-form weighted partial least squares (WPLS) projection fitted once on voxel correspondences. These features support direct nearest-neighbor voxel correspondence across modalities and subjects without per-pair adaptation or fine-tuning, and are evaluated on intra-subject Abdomen MR-CT and inter-subject HCP T2w-T1w tasks for deformable registration, kNN segmentation, and landmark localization, claiming gains in the hardest cross-subject cross-modality settings, reduced encoder sensitivity, and competitive performance versus handcrafted and learned 3D descriptors.

Significance. If the WPLS-derived directions prove to generalize beyond the fitting distribution, VoxCor would supply a practical, reusable feature layer for multimodal 3D medical imaging that avoids task-specific training or per-pair solvers, simplifying pipelines for registration and dense correspondence. The training-free design and public code release are notable strengths for reproducibility.

major comments (3)
  1. [Abstract] Abstract: the claims of performance improvements, reduced encoder sensitivity, and competitive registration results are stated without any quantitative numbers, error bars, data-split details, baseline specifications, or subject counts for fitting versus test phases, making it impossible to verify whether the data support the central claims.
  2. [Abstract] Abstract and evaluation description: the manuscript does not state whether the fitting set used to learn the WPLS projection is disjoint from the test volumes or how many subjects are used for fitting, which is load-bearing for the claim that modality-stable directions generalize to new volumes and unseen modality combinations.
  3. [Method (WPLS)] Method section on WPLS projection: because the projection is fitted using external voxel correspondences from a fitting set, the selected directions may encode dataset-specific anatomical or acquisition biases rather than truly invariant features; without explicit held-out validation this risks circularity in the 'training-free reusable feature' positioning.
minor comments (1)
  1. [Abstract] The GitHub link is given but the main text could include a brief reproducibility checklist (exact ViT backbone, triplanar axis choices, and WPLS hyperparameters) to aid readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to improve clarity and support for the central claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claims of performance improvements, reduced encoder sensitivity, and competitive registration results are stated without any quantitative numbers, error bars, data-split details, baseline specifications, or subject counts for fitting versus test phases, making it impossible to verify whether the data support the central claims.

    Authors: We agree that the abstract lacks the necessary quantitative support. In the revised manuscript we will insert specific performance metrics (e.g., Dice scores, landmark errors), standard deviations or error bars, baseline specifications, and explicit subject counts for the fitting versus test phases so that readers can directly assess the strength of the reported improvements. revision: yes

  2. Referee: [Abstract] Abstract and evaluation description: the manuscript does not state whether the fitting set used to learn the WPLS projection is disjoint from the test volumes or how many subjects are used for fitting, which is load-bearing for the claim that modality-stable directions generalize to new volumes and unseen modality combinations.

    Authors: The fitting set is disjoint from all test volumes; the WPLS projection is learned once on a separate cohort (10 subjects for Abdomen, 20 subjects for HCP) and then applied without further adaptation. We will add these exact subject counts and an explicit statement of disjointness to both the abstract and the evaluation section to make the generalization claim verifiable. revision: yes

  3. Referee: [Method (WPLS)] Method section on WPLS projection: because the projection is fitted using external voxel correspondences from a fitting set, the selected directions may encode dataset-specific anatomical or acquisition biases rather than truly invariant features; without explicit held-out validation this risks circularity in the 'training-free reusable feature' positioning.

    Authors: We acknowledge the risk of dataset-specific bias. To address it we will add a new held-out validation experiment in the revised manuscript that applies the fitted WPLS directions to completely unseen subjects and modality pairs (including cross-dataset transfer) and reports the resulting correspondence accuracy, thereby demonstrating that the selected directions capture modality-stable anatomical structure rather than fitting-set idiosyncrasies. revision: yes

Circularity Check

0 steps flagged

No circularity: closed-form WPLS fit on external correspondences yields independent transform-time features

full rationale

The derivation consists of an offline closed-form WPLS projection computed from externally supplied fitting-time voxel correspondences, followed by a linear transform applied unchanged to new volumes. No equation reduces the output to a redefinition of its own fitted parameters, no self-citation chain is load-bearing for the central claim, and no ansatz or uniqueness result is smuggled in. The reusability claim is therefore an empirical generalization statement rather than a definitional equivalence.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method rests on standard linear-algebra assumptions for partial least squares and the domain assumption that triplanar 2D features contain complementary stable anatomical information across modalities.

free parameters (1)
  • WPLS projection weights
    Weights are determined from fitting-time voxel correspondences to emphasize modality-stable directions.
axioms (1)
  • domain assumption Triplanar features from a frozen 2D ViT capture complementary anatomical information that can be linearly combined into modality-stable volumetric descriptors
    Invoked when the method assumes that the three orthogonal views together provide sufficient information for cross-modal consistency.

pith-pipeline@v0.9.0 · 5621 in / 1294 out tokens · 45763 ms · 2026-05-14T19:30:58.368894+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

36 extracted references · 11 canonical work pages · 4 internal anchors

  1. [1]

    A survey of medical image registration.Medical image analysis, 2(1):1–36, 1998

    JB Antoine Maintz and Max A Viergever. A survey of medical image registration.Medical image analysis, 2(1):1–36, 1998

  2. [2]

    Deformable medical image registration: A survey.IEEE transactions on medical imaging, 32(7):1153–1190, 2013

    Aristeidis Sotiras, Christos Davatzikos, and Nikos Paragios. Deformable medical image registration: A survey.IEEE transactions on medical imaging, 32(7):1153–1190, 2013

  3. [3]

    A review of atlas-based segmentation for magnetic resonance brain images.Computer methods and programs in biomedicine, 104(3):e158–e177, 2011

    Mariano Cabezas, Arnau Oliver, Xavier Lladó, Jordi Freixenet, and Meritxell Bach Cuadra. A review of atlas-based segmentation for magnetic resonance brain images.Computer methods and programs in biomedicine, 104(3):e158–e177, 2011

  4. [4]

    Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain.Medical image analysis, 12(1):26–41, 2008

    Brian B Avants, Charles L Epstein, Murray Grossman, and James C Gee. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain.Medical image analysis, 12(1):26–41, 2008

  5. [5]

    Elastix: a toolbox for intensity-based medical image registration.IEEE transactions on medical imaging, 29(1):196–205, 2009

    Stefan Klein, Marius Staring, Keelin Murphy, Max A Viergever, and Josien PW Pluim. Elastix: a toolbox for intensity-based medical image registration.IEEE transactions on medical imaging, 29(1):196–205, 2009

  6. [6]

    Diffeomorphic demons: Efficient non-parametric image registration.NeuroImage, 45(1):S61–S72, 2009

    Tom Vercauteren, Xavier Pennec, Aymeric Perchant, and Nicholas Ayache. Diffeomorphic demons: Efficient non-parametric image registration.NeuroImage, 45(1):S61–S72, 2009

  7. [7]

    A fast diffeomorphic image registration algorithm.Neuroimage, 38(1): 95–113, 2007

    John Ashburner. A fast diffeomorphic image registration algorithm.Neuroimage, 38(1): 95–113, 2007

  8. [8]

    Voxel- morph: a learning framework for deformable medical image registration.IEEE transactions on medical imaging, 38(8):1788–1800, 2019

    Guha Balakrishnan, Amy Zhao, Mert R Sabuncu, John Guttag, and Adrian V Dalca. Voxel- morph: a learning framework for deformable medical image registration.IEEE transactions on medical imaging, 38(8):1788–1800, 2019

  9. [9]

    End-to-end unsupervised deformable image registration with a convolutional neural net- work

    Bob D De Vos, Floris F Berendsen, Max A Viergever, Marius Staring, and Ivana Išgum. End-to-end unsupervised deformable image registration with a convolutional neural net- work. InDeep Learning in Medical Image Analysis and Multimodal Learning for Clini- cal Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS ...

  10. [10]

    Cross-modal attention for multi-modal image registration.Medical Image Analysis, 82:102612, 2022

    Xinrui Song, Hanqing Chao, Xuanang Xu, Hengtao Guo, Sheng Xu, Baris Turkbey, Brad- ford J Wood, Thomas Sanford, Ge Wang, and Pingkun Yan. Cross-modal attention for multi-modal image registration.Medical Image Analysis, 82:102612, 2022

  11. [11]

    Prince, and Yong Du

    Junyu Chen, Yihao Liu, Shuwen Wei, Zhangxing Bian, Shalini Subramanian, Aaron Carass, Jerry L. Prince, and Yong Du. A survey on deep learning in medical image registration: New technologies, uncertainty, evaluation metrics, and beyond.Medical Image Analysis, 100:103385, 2025. ISSN 1361-8415. doi: 10.1016/j.media.2024.103385. URLhttps://www. sciencedirect....

  12. [12]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations, 2021. URL https:...

  13. [13]

    Emerging properties in self-supervised vision transformers

    MathildeCaron, HugoTouvron, IshanMisra, HervéJégou, JulienMairal, PiotrBojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. InPro- ceedings of the IEEE/CVF international conference on computer vision, pages 9650–9660, 2021. 27

  14. [14]

    DINOv2: Learning Robust Visual Features without Supervision

    MaximeOquab, TimothéeDarcet, ThéoMoutakanni, HuyVo, MarcSzafraniec, VasilKhali- dov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193, 2023

  15. [15]

    DINOv3

    Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. DINOv3.arXiv preprint arXiv:2508.10104, 2025

  16. [16]

    Segment anything

    AlexanderKirillov, EricMintun, NikhilaRavi, HanziMao, ChloeRolland, LauraGustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. InProceedings of the IEEE/CVF international conference on computer vision, pages 4015– 4026, 2023

  17. [17]

    SAM 2: Segment Anything in Images and Videos

    Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment anything in images and videos.arXiv preprint arXiv:2408.00714, 2024

  18. [18]

    SAM 3: Segment Anything with Concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. SAM 3: Segment anything with concepts.arXiv preprint arXiv:2511.16719, 2025

  19. [19]

    Segment anything in medical images.Nature Communications, 15:654, 2024

    Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. Segment anything in medical images.Nature Communications, 15:654, 2024

  20. [20]

    Medsam2: Segment anything in 3d medical images and videos.arXiv preprint arXiv:2504.03600, 2025

    Jun Ma, Zongxin Yang, Sumin Kim, Bihui Chen, Mohammed Baharoon, Adibvafa Fallah- pour, Reza Asakereh, Hongwei Lyu, and Bo Wang. MedSAM2: Segment anything in 3d medical images and videos.arXiv preprint arXiv:2504.03600, 2025

  21. [21]

    Towards general purpose vision foundation models for medical image analysis: An experimental study of dinov2 on radiology benchmarks.arXiv preprint arXiv:2312.02366, 2023

    Mohammed Baharoon, Waseem Qureshi, Jiahong Ouyang, Yanwu Xu, Kilian Phol, Ab- dulrhman Aljouie, and Wei Peng. Towards general purpose vision foundation models for medical image analysis: An experimental study of dinov2 on radiology benchmarks.arXiv preprint arXiv:2312.02366, 2023

  22. [22]

    Do vision foundation models enhance domain generalization in medical image segmentation?arXiv preprint arXiv:2409.07960, 2024

    Kerem Cekmeceli, Meva Himmetoglu, Guney I Tombak, Anna Susmelj, Ertunc Erdil, and Ender Konukoglu. Do vision foundation models enhance domain generalization in medical image segmentation?arXiv preprint arXiv:2409.07960, 2024

  23. [23]

    DINO-Reg: General purpose image encoder for training-free multi-modal deformable medical image registration

    Xinrui Song, Xuanang Xu, and Pingkun Yan. DINO-Reg: General purpose image encoder for training-free multi-modal deformable medical image registration. InInternational Con- ference on Medical Image Computing and Computer-Assisted Intervention, pages 608–617. Springer, 2024

  24. [24]

    Wong, Clinton J

    Neel Dey, Benjamin Billot, Hallee E. Wong, Clinton J. Wang, Mengwei Ren, P. Ellen Grant, Adrian V. Dalca, and Polina Golland. Learning general-purpose biomedical vol- ume representations using randomized synthesis. InInternational Conference on Learning Representations, 2025

  25. [25]

    Totalsegmentator: robust segmentation of 104 anatomic structures in ct images.Radiology: Artificial Intelligence, 5(5), 2023

    Jakob Wasserthal, Hanns-Christian Breit, Manfred T Meyer, Maurice Pradella, Daniel Hinck, Alexander W Sauter, Tobias Heye, Daniel T Boll, Joshy Cyriac, Shan Yang, et al. Totalsegmentator: robust segmentation of 104 anatomic structures in ct images.Radiology: Artificial Intelligence, 5(5), 2023

  26. [26]

    Are vision foundation models ready for out-of-the-box medical image registration?arXiv preprint arXiv:2507.11569, 2025

    Hanxue Gu, Yaqian Chen, Nicholas Konz, Qihang Li, and Maciej A Mazurowski. Are vision foundation models ready for out-of-the-box medical image registration?arXiv preprint arXiv:2507.11569, 2025. 28

  27. [27]

    VISTA3D: A unified segmentation foundation model for 3d medical imaging

    Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu, and Wenqi Li. VISTA3D: A unified segmentation foundation model for 3d medical imaging. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

  28. [28]

    SegVol: Universal and interactive volu- metric medical image segmentation

    Yuxin Du, Fan Bai, Tiejun Huang, and Bo Zhao. SegVol: Universal and interactive volu- metric medical image segmentation. InAdvances in Neural Information Processing Systems, 2024

  29. [29]

    Alessa Hering, Lasse Hansen, Tony C. W. Mok, et al. Learn2reg: Comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning. IEEE Transactions on Medical Imaging, 42(3):697–712, 2023

  30. [30]

    Van Essen, Stephen M

    David C. Van Essen, Stephen M. Smith, Deanna M. Barch, Timothy E. J. Behrens, Essa Yacoub, Kamil Ugurbil, and WU-Minn HCP Consortium. The wu-minn human connectome project: an overview.Neuroimage, 80:62–79, 2013

  31. [31]

    Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-blockcase. TechnicalReportTechnicalReport371, DepartmentofStatistics, University of Washington, 2000

  32. [32]

    A whitening approach to probabilistic canonical correlation analysis for omics data integration.BMC Bioinformatics, 20(1):15, 2019

    Takoua Jendoubi and Korbinian Strimmer. A whitening approach to probabilistic canonical correlation analysis for omics data integration.BMC Bioinformatics, 20(1):15, 2019. doi: 10.1186/s12859-018-2572-9

  33. [33]

    Heinrich, Mark Jenkinson, Bartłomiej W

    Mattias P. Heinrich, Mark Jenkinson, Bartłomiej W. Papież, Michael Brady, and Julia A. Schnabel. Towards realtime multimodal fusion for image-guided interventions using self- similarities. InMedical Image Computing and Computer-Assisted Intervention – MICCAI 2013, volume 8151 ofLecture Notes in Computer Science, pages 187–194. Springer, 2013. doi: 10.1007...

  34. [34]

    Convex- adam: Self-configuring dual-optimisation-based 3d multitask medical image registration

    Hanna Siebert, Christoph Großbröhmer, Lasse Hansen, and Mattias P Heinrich. Convex- adam: Self-configuring dual-optimisation-based 3d multitask medical image registration. IEEE Transactions on Medical Imaging, 2024

  35. [35]

    Freesurfer.NeuroImage, 62(2):774–781, 2012

    Bruce Fischl. Freesurfer.NeuroImage, 62(2):774–781, 2012

  36. [36]

    xformers: A modu- lar and hackable transformer modelling library.https://github.com/facebookresearch/ xformers, 2022

    Benjamin Lefaudeux, Francisco Massa, Diana Liskovich, Wenhan Xiong, Vittorio Caggiano, Sean Naren, Min Xu, Jieru Hu, Marta Tintore, Susan Zhang, Patrick Labatut, Daniel Haziza, Luca Wehrstedt, Jeremy Reizenstein, and Grigory Sizov. xformers: A modu- lar and hackable transformer modelling library.https://github.com/facebookresearch/ xformers, 2022. 29 A Me...