
arXiv: 2607.01401 · v1 · pith:ST74AMRPnew · submitted 2026-07-01 · 💻 cs.LG · cs.AI · cs.CV

NeuroBridge: Bridging Multi-Task MRI Knowledge for Neurodegenerative Disease Diagnosis

Pith reviewed 2026-07-03 21:22 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords multi-task learning, MRI, Alzheimer's disease, hippocampal segmentation, neurodegenerative disease, deep learning, medical imaging, opportunistic screening

The pith

Multi-task learning on MRI that adds hippocampal segmentation, atrophy classification and reconstruction improves Alzheimer's diagnosis accuracy over single-task baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops NeuroBridge to handle the subtle and heterogeneous structural changes in brain MRI that complicate classification of Alzheimer's disease (AD), mild cognitive impairment (MCI) and related dementias. It combines large-scale self-supervised pretraining with three auxiliary objectives (hippocampal segmentation, hippocampal atrophy classification and reconstruction), then applies gated fusion during fine-tuning for the primary diagnosis objective. Evaluated on the ADNI and OASIS cohorts, the method records the highest performance across the classification tasks the paper evaluates, including 88.17 percent accuracy for AD versus cognitively normal controls on ADNI and 82.78 percent on OASIS, with the largest gains in MCI-related and mixed-diagnosis settings and effective cross-cohort transfer. A sympathetic reader cares because routine MRI scans could support more reliable early detection and probability-based screening without new hardware or separate models.

Core claim

NeuroBridge integrates self-supervised MRI pretraining with hippocampal segmentation, hippocampal atrophy classification and reconstruction objectives, followed by gated fusion fine-tuning, and thereby achieves the highest performance across evaluated classification tasks while demonstrating strong cross-cohort generalization, systematic associations between predicted-class probability and accuracy, and the feasibility of probability-based opportunistic screening.
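
The abstract does not say how the three auxiliary objectives are combined. One common choice, shown here purely as an assumption, is a weighted sum; the weights \lambda are hypothetical symbols that do not appear in the paper.

```latex
\mathcal{L}_{\text{aux}}
  = \lambda_{\text{seg}}\,\mathcal{L}_{\text{seg}}
  + \lambda_{\text{atr}}\,\mathcal{L}_{\text{atr}}
  + \lambda_{\text{rec}}\,\mathcal{L}_{\text{rec}},
\qquad
\lambda_{\text{seg}},\ \lambda_{\text{atr}},\ \lambda_{\text{rec}} \ge 0
```

Under this reading, setting one weight to zero removes the corresponding task, which is the kind of variant an ablation would compare.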

What carries the argument

Gated fusion fine-tuning that merges representations learned from the auxiliary clinical tasks with the primary diagnosis objective.
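
The paper's gating equation is not reproduced on this page, so the following is only a minimal sketch of one generic form of gated fusion. It assumes two equal-length feature vectors and a per-dimension sigmoid gate; the function name `gated_fusion` and the parameters `weights` and `bias` are illustrative and are not taken from NeuroBridge.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(primary, auxiliary, weights, bias):
    """Blend two feature vectors with a per-dimension gate (assumed form).

    gate[i]  = sigmoid(weights[i] . concat(primary, auxiliary) + bias[i])
    fused[i] = gate[i] * primary[i] + (1 - gate[i]) * auxiliary[i]
    """
    if len(primary) != len(auxiliary):
        raise ValueError("feature vectors must have the same length")
    joint = list(primary) + list(auxiliary)
    fused = []
    for i, (p, a) in enumerate(zip(primary, auxiliary)):
        # Each output dimension has its own weight row over the joint features.
        logit = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        gate = sigmoid(logit)
        fused.append(gate * p + (1.0 - gate) * a)
    return fused


# Hypothetical 2-dimensional features; zero weights give a gate of 0.5,
# so the fused vector is the element-wise average of the two inputs.
zero_weights = [[0.0] * 4, [0.0] * 4]
fused = gated_fusion([1.0, 2.0], [3.0, 6.0], zero_weights, [0.0, 0.0])
```

In a trained model the weights and bias would be learned, which lets the gate decide per dimension how much auxiliary-task signal reaches the classifier.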

If this is right

  • Accuracy reaches 88.17 percent for AD versus cognitively normal controls on ADNI and 82.78 percent on OASIS.
  • The largest improvements appear in MCI-related and mixed-diagnosis classification settings.
  • Models trained on one cohort transfer effectively to the other cohort.
  • Predicted-class probabilities show systematic correlation with actual diagnostic accuracy.
  • Probability thresholds enable opportunistic screening on existing MRI scans.
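
The last two points can be illustrated with a small sketch: binning predictions by predicted-class probability to compare accuracy across bins, and flagging scans above a probability threshold. The bin edges, threshold and function names below are hypothetical choices, not values or procedures from the paper.

```python
def accuracy_by_confidence(confidences, correct, edges=(0.5, 0.7, 0.9, 1.0)):
    """Report (lower, upper, count, accuracy) for each confidence bin.

    Bins are half-open [lower, upper), except the last one, which also
    includes its upper edge so that a probability of 1.0 is counted.
    """
    rows = []
    lower = edges[0]
    for upper in edges[1:]:
        is_last = upper == edges[-1]
        hits = [c for p, c in zip(confidences, correct)
                if lower <= p < upper or (is_last and p == upper)]
        accuracy = sum(hits) / len(hits) if hits else None
        rows.append((lower, upper, len(hits), accuracy))
        lower = upper
    return rows


def flag_for_review(probabilities, threshold):
    """Return indices of scans whose predicted probability meets the threshold."""
    return [i for i, p in enumerate(probabilities) if p >= threshold]


# Hypothetical predicted-class probabilities and whether each prediction was right.
rows = accuracy_by_confidence(
    [0.55, 0.65, 0.75, 0.85, 0.95, 1.0],
    [False, True, True, True, True, True],
)
flagged = flag_for_review([0.1, 0.8, 0.6, 0.95], threshold=0.8)
```

If accuracy rises from the lowest to the highest bin, the probabilities carry usable information about reliability, which is what a screening threshold depends on.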

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same auxiliary-task structure might be tested on other neurodegenerative conditions that also affect the hippocampus.
  • Probability outputs could be used to prioritize follow-up clinical review without additional imaging.
  • If the gated fusion step proves robust, similar multi-task pretraining could be applied to other MRI-based diagnostic problems.

Load-bearing premise

The auxiliary tasks of hippocampal segmentation, atrophy classification and reconstruction supply clinically relevant signals that meaningfully improve the primary diagnosis task.

What would settle it

An ablation study that removes the auxiliary tasks, retrains on identical ADNI and OASIS data splits, and obtains equal or higher accuracy on the same AD-versus-CN and MCI tasks would falsify the claim that the multi-task setup drives the observed gains.

Original abstract

INTRODUCTION: Accurate MRI-based identification of Alzheimer's disease (AD), mild cognitive impairment (MCI), and related dementias remains challenging because disease-related structural changes are often subtle and heterogeneous. We developed NeuroBridge, a clinically guided multi-task MRI framework for neurodegenerative disease diagnosis. METHODS: NeuroBridge integrates large-scale self-supervised MRI pretraining with hippocampal segmentation, hippocampal atrophy classification, and reconstruction objectives, followed by gated fusion fine-tuning. Performance was evaluated across ADNI and OASIS cohorts, including cross-cohort transfer, probability-based analysis, and opportunistic screening. RESULTS: NeuroBridge achieved the highest performance across evaluated classification tasks, reaching 88.17% accuracy for AD versus cognitively normal controls in ADNI and 82.78% in OASIS. The largest gains occurred in MCI-related and mixed-diagnosis settings. The framework demonstrated strong cross-cohort generalization, systematic associations between predicted-class probability and accuracy, and the feasibility of probability-based opportunistic screening. DISCUSSION: Clinically guided multi-task representation learning improves neurodegenerative MRI diagnosis beyond conventional single-task approaches. NeuroBridge provides a robust and scalable framework for dementia assessment and MRI-based opportunistic screening.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces NeuroBridge, a multi-task MRI framework that combines large-scale self-supervised pretraining with auxiliary objectives (hippocampal segmentation, atrophy classification, and reconstruction) and gated fusion fine-tuning for neurodegenerative disease diagnosis. It reports state-of-the-art accuracies of 88.17% (ADNI) and 82.78% (OASIS) for AD vs. cognitively normal classification, along with strong cross-cohort generalization, probability-accuracy correlations, and feasibility for opportunistic screening, attributing gains to clinically guided multi-task learning over single-task baselines.

Significance. If the performance gains and generalization claims hold after proper controls, the work would demonstrate a practical way to inject domain-specific clinical signals into representation learning for MRI-based dementia diagnosis, with potential downstream value for scalable screening. The multi-cohort evaluation and probability-based analysis are positive elements, but the current lack of isolation for the auxiliary-task contributions limits the strength of the central methodological claim.

major comments (2)
  1. [Results] Results section (and abstract): The central claim that 'clinically guided multi-task representation learning improves ... beyond conventional single-task approaches' is load-bearing but unsupported by ablation experiments. No quantitative comparison isolates the contribution of the hippocampal segmentation, atrophy classification, and reconstruction auxiliaries versus self-supervised pretraining or gated fusion alone; without these, the attribution of the reported 88.17% / 82.78% accuracies and cross-cohort gains specifically to the clinical tasks cannot be assessed.
  2. [Methods / Results] Methods and Results sections: The reported accuracies lack accompanying baseline details, statistical tests (e.g., McNemar or paired t-tests), error bars, cohort demographics, exclusion criteria, or hyperparameter sensitivity analysis. These omissions make it impossible to evaluate whether the claimed superiority over single-task approaches is robust or merely reflects differences in training scale or data splits.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'the largest gains occurred in MCI-related and mixed-diagnosis settings' is stated without accompanying per-task numbers or tables, reducing clarity.
  2. [Methods] Notation: The gated fusion mechanism is described at a high level but would benefit from an explicit equation or diagram showing how task-specific features are combined before the final classifier.
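
For reference, the McNemar test named in major comment 2 compares two classifiers on the same test cases using only the cases where they disagree. The sketch below computes the exact two-sided version from the binomial distribution; the function names and toy labels are illustrative and no result from the paper is reproduced.

```python
from math import comb


def discordant_counts(labels, preds_a, preds_b):
    """Count cases where exactly one of two models is correct.

    Returns (b, c): b = A right and B wrong, c = A wrong and B right.
    """
    b = sum(1 for y, pa, pb in zip(labels, preds_a, preds_b)
            if pa == y and pb != y)
    c = sum(1 for y, pa, pb in zip(labels, preds_a, preds_b)
            if pa != y and pb == y)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis each discordant case favours either model
    with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Toy example with hypothetical labels and predictions.
b, c = discordant_counts([1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 1, 1])
p_value = mcnemar_exact(b, c)
```

The test requires per-case predictions from both models on an identical test split, which is one reason the referee also asks for details of the data splits.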

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below and commit to revisions that strengthen the empirical support for our claims.

Point-by-point responses
  1. Referee: [Results] Results section (and abstract): The central claim that 'clinically guided multi-task representation learning improves ... beyond conventional single-task approaches' is load-bearing but unsupported by ablation experiments. No quantitative comparison isolates the contribution of the hippocampal segmentation, atrophy classification, and reconstruction auxiliaries versus self-supervised pretraining or gated fusion alone; without these, the attribution of the reported 88.17% / 82.78% accuracies and cross-cohort gains specifically to the clinical tasks cannot be assessed.

    Authors: We acknowledge that the manuscript presents comparisons to single-task baselines but does not include explicit ablation experiments that isolate the individual contributions of the hippocampal segmentation, atrophy classification, and reconstruction auxiliaries from the self-supervised pretraining and gated fusion stages. This gap limits the precision with which performance gains can be attributed specifically to the clinically guided components. We will add these ablation studies, including quantitative results for variants with and without each auxiliary task, to the revised Results section. revision: yes

  2. Referee: [Methods / Results] Methods and Results sections: The reported accuracies lack accompanying baseline details, statistical tests (e.g., McNemar or paired t-tests), error bars, cohort demographics, exclusion criteria, or hyperparameter sensitivity analysis. These omissions make it impossible to evaluate whether the claimed superiority over single-task approaches is robust or merely reflects differences in training scale or data splits.

    Authors: We agree that additional methodological and statistical details are required for a rigorous evaluation. In the revised manuscript we will expand the Methods and Results sections to include full descriptions of all baselines, statistical significance tests (McNemar and paired t-tests), error bars derived from multiple runs, complete cohort demographics and exclusion criteria, and hyperparameter sensitivity analyses. These additions will directly address concerns about robustness and potential confounding factors such as training scale or data splits. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results on external cohorts are independent of model definitions

Full rationale

The paper's central claims consist of measured classification accuracies (88.17% AD vs CN on ADNI, 82.78% on OASIS) and cross-cohort generalization obtained by training the described multi-task framework on the named public datasets and evaluating on held-out splits. These quantities are not algebraically equivalent to any internal parameters, loss terms, or self-citations; they are external empirical outcomes. The auxiliary tasks (segmentation, atrophy classification, reconstruction) are distinct objectives whose contribution is asserted via experimental comparison rather than by definitional identity. No derivation step reduces to a fitted input renamed as prediction or to a self-citation chain that itself lacks independent verification. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, no explicit free parameters, invented entities, or non-standard axioms are stated. The approach rests on the standard domain assumption that MRI contains extractable structural signals for disease classification when auxiliary tasks are chosen appropriately.

axioms (1)
  • Domain assumption: MRI scans contain structural information sufficient for distinguishing disease states when combined with appropriate learning objectives.
    Implicit in the decision to use MRI for diagnosis and the choice of hippocampal tasks.

pith-pipeline@v0.9.1-grok · 5741 in / 1485 out tokens · 31591 ms · 2026-07-03T21:22:07.976001+00:00 · methodology

