pith. machine review for the scientific record.

arxiv: 2604.19474 · v1 · submitted 2026-04-21 · 📡 eess.IV

Recognition: unknown

Harmonizing MR Images Across 100+ Scanners: Multi-site Validation with Traveling Subjects and Real-world Protocols


Pith reviewed 2026-05-10 01:23 UTC · model grok-4.3

classification 📡 eess.IV
keywords MR image harmonization, multi-site validation, traveling subjects, artifact encoder, attention mechanisms, brain MRI, image imputation, whole brain segmentation

The pith

An enhanced HACA3^+ algorithm aligns MR brain images from over 100 scanners more consistently than the original version.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HACA3^+, an updated approach to making magnetic resonance images consistent across many different scanners and hospitals. It improves on an earlier method by adding a stronger artifact encoder, attention mechanisms that distinguish background from foreground, and training on scans from 64 sites and more than 100 scanners. This matters for multi-center studies because scanner differences otherwise distort combined datasets and reduce the reliability of machine learning tools for diagnosis or research. The authors validate the changes using traveling subjects who receive scans at multiple sites under real clinical protocols for four common image types. They also test effects on brain segmentation and image imputation tasks while running ablations to confirm each addition helps.

Core claim

By integrating an improved artifact encoder, background- and foreground-sensitive attention mechanisms, and training data from 100+ scanners across 64 independent sites, HACA3^+ delivers superior inter-site harmonization for T1-weighted, T2-weighted, proton-density, and FLAIR images, as measured in traveling-subject experiments and reflected in gains for whole-brain segmentation and image imputation.

What carries the argument

HACA3^+, the enhanced harmonization model that processes input images through an improved artifact encoder and attention mechanisms trained on broad multi-site data to reduce scanner-specific variations.
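HACA3^+ itself is a deep disentangling model, but the goal it pursues can be illustrated with a much older baseline. The sketch below (illustrative only, not the authors' method) uses classical histogram matching: each voxel keeps its rank order, which carries the anatomy, while adopting the intensity distribution of a reference scan, which carries the scanner "look."

```python
import numpy as np

def match_histogram(source, reference):
    """Map source intensities onto the reference intensity distribution.

    Each voxel keeps its quantile rank (the anatomy) but takes the value
    sitting at the same quantile of the reference scan (the contrast).
    """
    src = source.ravel()
    ranks = np.argsort(np.argsort(src))          # quantile rank of each voxel
    ref_sorted = np.sort(reference.ravel())
    # position of each source quantile on the reference value grid
    pos = ranks * (ref_sorted.size - 1) / (src.size - 1)
    matched = np.interp(pos, np.arange(ref_sorted.size), ref_sorted)
    return matched.reshape(source.shape)

rng = np.random.default_rng(0)
scanner_a = rng.normal(100, 15, size=(32, 32))   # brighter, lower-contrast scan
scanner_b = rng.normal(60, 30, size=(32, 32))    # darker, higher-contrast scan
harmonized = match_histogram(scanner_a, scanner_b)
print(round(harmonized.mean(), 2), round(harmonized.std(), 2))
```

After matching, the harmonized image has scanner B's intensity statistics while every pair of voxels keeps its brightness ordering from scanner A; the deep model's job is to do this while also separating true artifacts from anatomy, which rank-preserving matching cannot.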

If this is right

  • Whole-brain segmentation accuracy rises when images are processed with HACA3^+ rather than the prior version.
  • Image imputation tasks show measurable gains on harmonized outputs from multiple sites.
  • Each added component (artifact encoder, attention, expanded data) contributes measurably to the overall improvement, per ablation results.
  • The method supports realistic clinical protocols across four standard contrasts without requiring protocol changes.
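The segmentation prediction in the first bullet is typically scored with the Dice similarity coefficient (DSC), the metric reported in the paper's Figure 6. A minimal sketch with hypothetical masks:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks (1.0 = identical)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0          # both masks empty: define as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Hypothetical example: the same region segmented from two versions of a scan
truth = np.zeros((10, 10), dtype=bool); truth[2:8, 2:8] = True   # 36 voxels
pred  = np.zeros((10, 10), dtype=bool); pred[3:8, 2:8]  = True   # 30 voxels
print(round(dice(truth, pred), 3))   # → 0.909
```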

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Larger pooled datasets from clinical trials become feasible once scanner effects are reduced to this level.
  • The same training strategy could be tested on other MRI contrasts or body regions to check transfer.
  • If the gains hold on external scanners, the approach could serve as a preprocessing step before federated learning across hospitals.

Load-bearing premise

The improvements in artifact handling and attention will generalize to scanners and acquisition protocols outside the 64 sites used for training and validation.

What would settle it

A test on a scanner outside the 64 training sites: if its harmonized images' downstream segmentation or imputation performance fails to exceed the original HACA3 results, the generalization claim falls.

Figures

Figures reproduced from arXiv: 2604.19474 by Aaron Carass, Blake E. Dewey, Ellen M. Mowry, Jerry L. Prince, Jiachen Zhuo, Jinwei Zhang, Kathleen M. Bartz, Lianrui Zuo, Muhammad Faizyab Ali Chaudhary, Murat Bilgel, Samuel W. Remedios, Savannah P. Hays, Scott D. Newsome, Shiv Saidha.

Figure 1: Qualitative results from the (A) T1-w degraded anterior simulation, (B) T2-w degraded anterior simulation, and (C) FLAIR degraded left/right simulation. Each row shows the different source images input to the model. We compare HACA3 and HACA3+. HACA3 uses the background mask from the first source image, which in this scenario is the T1-w image. As a result, HACA3 does not attempt to impute in (A)…
Figure 2: The PSNR and SSIM of the harmonized limited-FOV images compared with the acquired full-FOV image for each image contrast. We used two limited-FOV simulations: anterior and left/right. Each contrast was tested separately; for example, when the T1-w image had limited FOV, the T2-w and FLAIR were full FOV. For each image contrast, HACA3+ significantly (p-value < 0.0001) outperformed HACA3.
Figure 3: Qualitative results on real, clinically acquired MR images. Each row corresponds to a different person with MS. (A) shows the harmonized T1-w result using a limited-FOV T1-w input image. (B) shows the harmonized T2-w result using a limited-FOV T2-w input image. (C) shows the harmonized FLAIR result using a limited-FOV FLAIR input image.
Figure 4: Inter-site harmonization results using HACA3 and HACA3+. The input source images were acquired at a single non-Johns Hopkins site; the input PD was not included, to demonstrate the imputation ability. The target images were acquired at the Johns Hopkins site. There were no statistically significant differences on this dataset, most likely due to the smaller number of subjects.
Figure 5: PSNR and SSIM of each harmonized MR image contrast over the TREAT-MS Traveling Subjects dataset. Acquired images were harmonized to the Johns Hopkins site; PSNR and SSIM were calculated using the acquired Johns Hopkins image as the reference.
Figure 6: The DSC for each brain region computed using the segmentation result on the unharmonized, HACA3-harmonized, and HACA3+-harmonized T1-w images. Significant differences are indicated using a paired Wilcoxon test with Benjamini–Hochberg correction (∗∗: p-value < 0.01, ∗: p-value < 0.05).
Figure 7: Representative harmonization results for an ON-Harmony subject using different model configurations for T1-w and FLAIR images. The full HACA3+ model produces outputs that are more consistent with the target domain contrast compared to ablated variants and the baseline HACA3.
Figure 8: The PSNR and SSIM results from the ablation experiment using the ON-Harmony dataset. The harmonization target for each configuration was the subject's initial scanning session. Ablation #1 is HACA3 with only the attention and artifact enhancements. Ablation #2 is HACA3 with the larger training dataset, attention enhancement, and no artifact encoder.
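Figures 2, 5, and 8 score harmonization with PSNR and SSIM against a reference acquisition. A minimal numpy sketch of both metrics (SSIM here uses a single global window, a simplification of the sliding-window variant these figures presumably report):

```python
import numpy as np

def psnr(ref, img):
    """Peak signal-to-noise ratio in dB against a reference image (higher = closer)."""
    ref, img = ref.astype(float), img.astype(float)
    data_range = ref.max() - ref.min()
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def ssim_global(ref, img):
    """SSIM from global image statistics (1.0 means identical structure)."""
    ref, img = ref.astype(float), img.astype(float)
    data_range = ref.max() - ref.min()
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), img.mean()
    cov = ((ref - mu_x) * (img - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (ref.var() + img.var() + c2)
    return num / den

rng = np.random.default_rng(0)
ref = rng.uniform(0, 255, size=(64, 64))            # stand-in reference scan
noisy = ref + rng.normal(0, 5, size=ref.shape)      # stand-in harmonized scan
print(round(psnr(ref, noisy), 1), round(ssim_global(ref, noisy), 3))
```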
Original abstract

Reliable harmonization of heterogeneous magnetic resonance (MR) image datasets, especially those acquired in pragmatic clinical trials, is critical to advance multi-center neuroimaging studies and translational machine learning in healthcare. We present an enhanced and rigorously validated version of the HACA3 harmonization algorithm, which we refer to as HACA3^+, incorporating key methodological enhancements: (1) an improved artifact encoder to better isolate and mitigate image artifacts, (2) background- and foreground-sensitive attention mechanisms to increase harmonization specificity, and (3) extensive training using data spanning 100+ scanners from 64 independent sites, providing a broader diversity of scanners than other harmonization methods. Our study focuses on four commonly acquired MR image contrasts (T1-weighted, T2-weighted, proton density, & fluid-attenuated inversion recovery), reflecting realistic clinical protocols. We perform inter-site harmonization experiments using traveling subjects to assess the generalization and robustness of the harmonization model. We compare the results of the publicly available version of HACA3 and our implementation, HACA3^+. Downstream relevance is further established through whole brain segmentation and image imputation. Finally, we justify each enhancement through an ablation experiment. Pre-trained weights and code for HACA3^+ are made publicly available at https://github.com/shays15/haca3-plus.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents HACA3^+, an enhanced version of the HACA3 harmonization algorithm for multi-contrast MR images (T1w, T2w, PD, FLAIR). It incorporates an improved artifact encoder, background/foreground-sensitive attention mechanisms, and training on data spanning 100+ scanners from 64 sites. The central claims are that HACA3^+ achieves superior inter-site harmonization relative to the original HACA3, as demonstrated by traveling-subject experiments, ablation studies justifying each enhancement, and improved performance on downstream tasks including whole-brain segmentation and image imputation. Pre-trained weights and code are released publicly.

Significance. If the generalization claims hold, the work provides a practically useful advance for multi-center neuroimaging and translational ML by enabling more reliable harmonization under real-world clinical protocols. Strengths include the use of traveling subjects for direct validation of inter-site performance, ablation experiments to isolate contributions of the artifact encoder and attention modules, and evaluation on downstream tasks. Public code release supports reproducibility and adoption.

major comments (2)
  1. [Traveling subject experiments] Traveling subject experiments section: the manuscript does not explicitly state whether the scanners/sites used for traveling-subject validation are completely disjoint from the 64 training sites. The central claim of superior generalization to unseen scanners and protocols (relative to original HACA3) is load-bearing on out-of-distribution performance; any overlap would reduce the validation to in-distribution testing and weaken the superiority argument.
  2. [Ablation experiments] Results and ablation sections: while ablations are performed, the reported metrics for harmonization quality (e.g., similarity measures or error reductions on traveling subjects) lack accompanying statistical tests (p-values, confidence intervals) comparing HACA3^+ to HACA3 and to ablated variants. This makes it difficult to assess whether the observed gains are robust and attributable to the proposed enhancements.
minor comments (2)
  1. [Abstract and Methods] The abstract and methods should specify the exact number of traveling subjects, number of scans per subject, and the precise quantitative metrics (e.g., PSNR, SSIM, or harmonization-specific scores) used to claim superiority.
  2. [Figures] Figure captions for traveling-subject results could more clearly label which panels show before/after harmonization and which scanners are involved to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will incorporate the suggested clarifications and additions to strengthen the presentation of our results.

Point-by-point responses
  1. Referee: Traveling subject experiments section: the manuscript does not explicitly state whether the scanners/sites used for traveling-subject validation are completely disjoint from the 64 training sites. The central claim of superior generalization to unseen scanners and protocols (relative to original HACA3) is load-bearing on out-of-distribution performance; any overlap would reduce the validation to in-distribution testing and weaken the superiority argument.

    Authors: We agree that explicit confirmation of disjointness is essential for supporting the out-of-distribution generalization claim. The traveling-subject scans were acquired on hardware and under protocols not represented in the 64-site training set, ensuring a true unseen-scanner evaluation. In the revised manuscript we will add a clear statement in the Traveling subject experiments section specifying that these sites are completely disjoint from the training data. revision: yes

  2. Referee: Results and ablation sections: while ablations are performed, the reported metrics for harmonization quality (e.g., similarity measures or error reductions on traveling subjects) lack accompanying statistical tests (p-values, confidence intervals) comparing HACA3^+ to HACA3 and to ablated variants. This makes it difficult to assess whether the observed gains are robust and attributable to the proposed enhancements.

    Authors: We acknowledge that the absence of statistical testing limits the interpretability of the reported improvements. In the revised manuscript we will add paired statistical comparisons (Wilcoxon signed-rank tests) together with p-values and 95% confidence intervals for the primary harmonization metrics on the traveling-subject cohort, both for HACA3^+ versus the original HACA3 and versus each ablated variant. revision: yes
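The promised analysis is standard; as an illustration of the kind of paired comparison involved (hypothetical numbers, not the paper's data), a paired bootstrap yields a confidence interval for the mean per-subject PSNR gain:

```python
import numpy as np

def paired_bootstrap_ci(x, y, n_boot=10_000, alpha=0.05, seed=0):
    """CI for the mean paired difference x - y (e.g. PSNR of method A
    minus PSNR of method B on the same traveling subjects)."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    rng = np.random.default_rng(seed)
    # resample subjects with replacement and recompute the mean difference
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    return tuple(np.quantile(boot_means, [alpha / 2, 1 - alpha / 2]))

# Hypothetical per-subject PSNR values (dB), paired by subject
method_a = [31.2, 30.8, 32.5, 29.9, 31.7, 30.4]
method_b = [30.1, 30.5, 31.2, 29.4, 30.9, 30.0]
lo, hi = paired_bootstrap_ci(method_a, method_b)
gain = np.mean(np.asarray(method_a) - np.asarray(method_b))
print(f"mean gain {gain:.2f} dB, 95% CI [{lo:.2f}, {hi:.2f}]")
```

An interval excluding zero would support a robust gain; the scipy `wilcoxon` test the rebuttal mentions would complement this with a rank-based p-value.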

Circularity Check

0 steps flagged

No significant circularity; empirical validation stands independent of training inputs

Full rationale

The paper presents methodological enhancements to HACA3 (artifact encoder, attention mechanisms, expanded 64-site training) and supports its claims through separate traveling-subject harmonization experiments, ablation studies, and downstream evaluations on segmentation and imputation. These constitute external empirical checks rather than any derivation or prediction that reduces by construction to the training data or prior self-citations. No equations or steps equate outputs to inputs, and the validation is explicitly framed as testing generalization, so the evidential chain does not close on itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach assumes separability of scanner artifacts from content, with no new physical entities introduced; the main dependencies are on the training data diversity and the architectural choices.

free parameters (1)
  • Deep neural network parameters
    The weights of the harmonization network are learned from the training data spanning 64 sites.
axioms (1)
  • domain assumption Scanner-specific effects in MR images can be isolated and removed while preserving anatomical and contrast information.
    Fundamental to the harmonization task and the design of the artifact encoder and attention modules.

pith-pipeline@v0.9.0 · 5606 in / 1276 out tokens · 50670 ms · 2026-05-10T01:23:45.501831+00:00 · methodology

discussion (0)

