Do We Really Need Diffusion? A Fast U-Net for Paired Medical Image Translation
Pith reviewed 2026-06-27 01:45 UTC · model grok-4.3
The pith
Lightweight U-Net outperforms DDPM for paired MRI translation with higher accuracy and 208 times faster inference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The lightweight 4-level U-Net outperforms the DDPM on Pearson correlation (r = 0.975 vs. 0.962) and mean absolute error (MAE = 0.014 +/- 0.015 vs. 0.019 +/- 0.019) while reducing inference time by a factor of 208 (25.2 ms vs. 5227.2 ms per image with 50 DDIM steps). Both models exceed the identity baseline (r = 0.769, MAE = 0.070), confirming they learn a meaningful cross-modal mapping from T2-weighted images to signal fat fraction maps on the NAKO cohort data.
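As a rough illustration of how the three reported quantities relate, the sketch below computes Pearson r, MAE, and an identity baseline on a toy example. The function names (`pearson_r`, `mae`), the toy pixel values, and the scaling of all images to [0, 1] are illustrative assumptions and are not taken from the paper. The identity baseline is read here as scoring the unmodified T2w input directly against the SFF target.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mae(pred, target):
    """Mean absolute error between prediction and target."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

# Toy flattened "images" with hypothetical intensities in [0, 1].
t2w_input = [0.10, 0.40, 0.35, 0.80, 0.60, 0.20]   # stands in for the T2w input
sff_target = [0.05, 0.30, 0.50, 0.90, 0.40, 0.10]  # stands in for the SFF map
model_pred = [0.06, 0.31, 0.47, 0.88, 0.42, 0.11]  # stands in for a model output

# Identity baseline: treat the input itself as the prediction.
identity_r, identity_mae = pearson_r(t2w_input, sff_target), mae(t2w_input, sff_target)
model_r, model_mae = pearson_r(model_pred, sff_target), mae(model_pred, sff_target)

print(f"identity: r={identity_r:.3f}, MAE={identity_mae:.3f}")
print(f"model:    r={model_r:.3f}, MAE={model_mae:.3f}")
```

A model that has learned a useful mapping should score a higher r and a lower MAE than the identity row, which is the comparison the core claim relies on.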
What carries the argument
The 4-level lightweight U-Net that performs direct supervised paired image-to-image translation from T2-weighted MRI to signal fat fraction maps.
If this is right
- Real-time clinical estimation of signal fat fraction becomes feasible on standard hardware.
- Diffusion models are not required for this paired medical image translation task.
- Similar lightweight U-Nets may replace diffusion approaches in other paired medical imaging problems.
- Large paired datasets enable high-accuracy models without generative model complexity.
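The real-time point in the list above can be checked with simple arithmetic on the two reported per-image latencies. The sketch below converts them into throughput and a speedup ratio. The 100-slice volume is a hypothetical example size, not a figure from the paper.

```python
# Reported per-image inference times from the abstract.
unet_ms = 25.2     # lightweight U-Net
ddpm_ms = 5227.2   # DDPM with 50 DDIM steps

speedup = ddpm_ms / unet_ms        # about 207.4 from the rounded timings
unet_fps = 1000.0 / unet_ms        # images per second
ddpm_fps = 1000.0 / ddpm_ms

# Hypothetical volume size, used only to make the gap tangible.
slices_per_volume = 100
unet_volume_s = slices_per_volume * unet_ms / 1000.0
ddpm_volume_s = slices_per_volume * ddpm_ms / 1000.0

print(f"speedup: {speedup:.1f}x")
print(f"U-Net: {unet_fps:.1f} images/s, DDPM: {ddpm_fps:.2f} images/s")
print(f"{slices_per_volume} slices: U-Net {unet_volume_s:.1f} s, DDPM {ddpm_volume_s:.1f} s")
```

Dividing the rounded timings gives about 207.4 rather than exactly 208; the small difference is presumably a rounding effect in the reported numbers.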
Where Pith is reading between the lines
- The result suggests direct regression with convolutional networks can outperform generative diffusion for paired data tasks.
- The same lightweight architecture could be tested on different MRI contrasts to check generalization; unpaired settings would additionally require a training objective that does not rely on pixel-wise paired targets.
- Integration into scanner software might allow immediate fat fraction output during routine T2-weighted acquisitions.
Load-bearing premise
The DDPM implementation and training protocol represent a fair state-of-the-art baseline without undisclosed hyper-parameter disadvantages.
What would settle it
Retraining the DDPM with extensive hyperparameter search, alternative sampling schedules, or more than 50 DDIM steps and checking whether accuracy and effective speed can match or exceed the U-Net results.
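To see what a sweep over DDIM steps could and could not change on the speed side, the sketch below extrapolates the reported 50-step timing to other step counts. It assumes, loudly, that sampling time scales linearly with the number of steps (one network pass per step, negligible fixed overhead). That assumption is not stated in the paper, and the helper `ddim_time_ms` and the list of step counts are hypothetical. Accuracy as a function of steps is not modeled at all.

```python
UNET_MS = 25.2  # reported U-Net time per image

def ddim_time_ms(steps, ref_steps=50, ref_ms=5227.2):
    """Extrapolated DDPM time per image, ASSUMING linear scaling in steps."""
    return ref_ms / ref_steps * steps

# Hypothetical ablation grid over DDIM step counts.
for steps in (1, 5, 10, 25, 50, 100, 250):
    t = ddim_time_ms(steps)
    print(f"{steps:>4} steps: {t:8.1f} ms  ({t / UNET_MS:6.1f}x U-Net)")
```

Under this assumption a single DDIM step already costs roughly four U-Net inferences, so a step ablation would mainly settle the accuracy question; closing the speed gap would need a smaller denoiser rather than fewer steps.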
Original abstract
Magnetic resonance imaging-signal fat fraction (MRI-SFF) quantifies tissue fat and serves as an established biomarker for metabolic and musculoskeletal disorders. The acquisition requires, however, specialized MRI sequences, which are not available routinely. We investigate whether SFF can be estimated from widely available T2-weighted (T2w) MRI via image-to-image translation (I2I). We further compare a lightweight 4-level U-Net to a state-of-the-art Denoising Diffusion Probabilistic Model (DDPM) using a dataset of 230 048 paired 2D images (183 517 train, 23 621 val, 22 910 test) from the German National Cohort (NAKO). Both models clearly outperform the identity baseline (Pearson correlation r = 0.769, mean absolute error MAE = 0.070 +/- 0.054), which confirms that the models learn a non-trivial cross-modal mapping. Interestingly, the lightweight U-Net outperforms the DDPM in both correlation (r = 0.975 vs. 0.962) and error (MAE = 0.014 +/- 0.015 vs. 0.019 +/- 0.019), while reducing inference time by a factor of 208 (25.2 ms vs. 5 227.2 ms per image using 50 Denoising Diffusion Implicit Model (DDIM) steps). The strong clinical performance at substantially reduced computational cost enables real-time clinical use.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a lightweight 4-level U-Net outperforms a DDPM for paired T2w-to-MRI-SFF image translation on 230k NAKO images (r=0.975 vs 0.962; MAE=0.014 vs 0.019), while being 208x faster at inference (25.2 ms vs 5227 ms with 50 DDIM steps), and that both models substantially beat the identity baseline.
Significance. If the DDPM baseline is shown to be properly optimized, the result would indicate that diffusion models are not required for this paired medical I2I task and that a simple U-Net suffices for high-accuracy, real-time clinical use. The large paired dataset and direct numerical comparison are strengths.
major comments (2)
- [Abstract / Methods] Abstract and Methods: No training schedule, learning-rate schedule, noise schedule, number of diffusion steps during training, or ablation on DDIM sampling steps is reported for the DDPM. This prevents verification that the 0.013 r and 0.005 MAE gaps reflect inherent model differences rather than unequal optimization, which is load-bearing for the headline claim of U-Net superiority.
- [Abstract] Abstract: The DDPM is described only as 'state-of-the-art' with 50 DDIM steps at inference; without hyper-parameter search details or confirmation of classifier-free guidance tuning, the fairness of the baseline cannot be assessed from the given information.
minor comments (1)
- [Abstract] Abstract: The reported MAE values include standard deviations; it is unclear whether these are computed over images or over pixels and whether paired statistical tests (e.g., Wilcoxon) were performed to support the claimed superiority.
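The distinction raised in this comment can be made concrete with a small sketch. It computes MAE with a standard deviation taken over images and over pixels, then runs a paired significance test on per-image MAEs. All data are synthetic, and the function names are hypothetical. A paired sign-flip permutation test is used here in place of the Wilcoxon test the referee mentions, only to keep the sketch dependency-free.

```python
import random
import statistics

def mae_with_std(abs_err_images):
    """abs_err_images: list of images, each a list of per-pixel absolute errors.
    Returns ((mean, std) over images, (mean, std) over pixels)."""
    image_maes = [statistics.fmean(img) for img in abs_err_images]
    pixels = [e for img in abs_err_images for e in img]
    per_image = (statistics.fmean(image_maes), statistics.pstdev(image_maes))
    per_pixel = (statistics.fmean(pixels), statistics.pstdev(pixels))
    return per_image, per_pixel

def paired_sign_flip_p(a, b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on the mean of a - b."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.fmean(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null, each paired difference is equally likely to flip sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic errors: 8 images of 16 pixels; model B is uniformly 0.005 worse.
rng = random.Random(42)
unet_err = [[rng.uniform(0.0, 0.03) for _ in range(16)] for _ in range(8)]
ddpm_err = [[e + 0.005 for e in img] for img in unet_err]

unet_img, unet_pix = mae_with_std(unet_err)
print(f"per-image: {unet_img[0]:.4f} +/- {unet_img[1]:.4f}")
print(f"per-pixel: {unet_pix[0]:.4f} +/- {unet_pix[1]:.4f}")

unet_maes = [statistics.fmean(img) for img in unet_err]
ddpm_maes = [statistics.fmean(img) for img in ddpm_err]
p_value = paired_sign_flip_p(unet_maes, ddpm_maes)
print(f"paired sign-flip p-value: {p_value:.4f}")
```

With equally sized images the two means coincide, but the per-pixel spread also includes within-image variation and is therefore larger. This is why the aggregation level behind a reported "+/-" matters when judging whether a 0.005 MAE gap is meaningful.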
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for greater transparency in the DDPM baseline implementation. We agree that these details are important for verifying the fairness of the comparison and will revise the manuscript accordingly.
- Referee: [Abstract / Methods] Abstract and Methods: No training schedule, learning-rate schedule, noise schedule, number of diffusion steps during training, or ablation on DDIM sampling steps is reported for the DDPM. This prevents verification that the 0.013 r and 0.005 MAE gaps reflect inherent model differences rather than unequal optimization, which is load-bearing for the headline claim of U-Net superiority.
  Authors: We acknowledge the omission of these implementation details. In the revised manuscript we will add a dedicated subsection in Methods describing the DDPM training schedule, learning-rate schedule, noise schedule, number of training diffusion steps, and an ablation study on the number of DDIM inference steps. These additions will allow readers to confirm that the reported performance gap is not attributable to unequal optimization effort. Revision: yes.
- Referee: [Abstract] Abstract: The DDPM is described only as 'state-of-the-art' with 50 DDIM steps at inference; without hyper-parameter search details or confirmation of classifier-free guidance tuning, the fairness of the baseline cannot be assessed from the given information.
  Authors: We will expand the Methods section to document the hyper-parameter choices used for the DDPM, including any grid or random search performed and the classifier-free guidance scale that was selected. This will substantiate the claim that the baseline reflects standard state-of-the-art practice for this task. Revision: yes.
Circularity Check
No circularity: direct empirical comparison on held-out data
The paper reports measured Pearson correlations and MAE values on a fixed held-out test set of 22 910 images after training both models. No equations, fitted parameters, or self-citations are used to derive the reported performance numbers; the results are obtained by standard supervised training and evaluation. The comparison therefore does not reduce to any input by construction.