
arXiv: 2605.24609 · v1 · pith:44HQYUHT · submitted 2026-05-23 · ⚛️ physics.med-ph · cs.AI · cs.CV

Catching MRI outliers: unsupervised detection and localization of MRI artefacts and clinical anomalies using deep learning

Pith reviewed 2026-06-30 11:56 UTC · model grok-4.3

classification ⚛️ physics.med-ph · cs.AI · cs.CV
keywords unsupervised anomaly detection · MRI quality control · pelvic MRI · brain MRI · deep learning · tokenization · radiotherapy · outlier detection

The pith

A two-stage unsupervised framework detects and localizes anomalies in pelvic and brain MRI by tokenizing slices and scoring deviations from normal token distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops and tests a fully automated system that learns only from normal MRI examples drawn from public collections and then flags both synthetic and real anomalies in new scans. It works by first turning image slices into discrete tokens and then modeling the expected distribution of those tokens so that unusual ones produce high surprisal scores. These scores are combined with perceptual image differences to generate both a detection decision and a spatial heatmap. The approach is evaluated on hidden cohorts and produces AUCs of 0.97 for pelvic cases and 0.81 for brain cases while showing spatial agreement with ground-truth anomaly locations. Such a system could act as an automated quality-control step before AI tools are applied in radiotherapy planning.

Core claim

The two-stage framework, trained solely on reference images from public pelvic and brain datasets, compresses slices into discrete tokens, models the distribution of normal tokens, and estimates anomaly evidence by combining perceptual differences with token-surprisal scores based on negative log-likelihood. On held-out evaluation data the system reaches AUCs of 0.97 (pelvic MRI with synthetic and real anomalies) and 0.81 (brain MRI with clinically annotated abnormalities) while heatmaps align with ground-truth locations, supporting its use as an automated MRI quality-control layer.
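The abstract reports each AUC with a 95% confidence interval but does not say how the intervals were obtained. A minimal sketch, assuming one scalar anomaly score per case and a percentile bootstrap that resamples normal and anomalous cases separately; the function names `auc` and `bootstrap_auc_ci` are hypothetical and do not come from the paper's code.

```python
import random


def auc(normal_scores, anomaly_scores):
    """Probability that a random anomalous case scores higher than a random
    normal case, counting ties as one half (the Mann-Whitney form of the AUC)."""
    wins = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomaly_scores) * len(normal_scores))


def bootstrap_auc_ci(normal_scores, anomaly_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; each class is resampled with replacement
    so every replicate keeps the original class sizes."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(normal_scores, k=len(normal_scores)),
            rng.choices(anomaly_scores, k=len(anomaly_scores)))
        for _ in range(n_boot)
    )
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


normal = [0.1, 0.2, 0.3, 0.4]
anomalous = [0.35, 0.5, 0.6, 0.7]
print(auc(normal, anomalous))  # 0.9375
print(bootstrap_auc_ci(normal, anomalous, n_boot=500))
```

In this toy example, 15 of the 16 anomaly/normal pairs are ranked correctly (only 0.35 versus 0.4 is not), which gives 0.9375.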

What carries the argument

Two-stage tokenization-plus-distribution-modeling pipeline that converts MRI slices to discrete tokens and scores anomalies via combined perceptual difference and negative-log-likelihood surprisal.
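The two stages can be made concrete in a few lines. Stage one assigns each patch feature vector to its nearest codebook entry (vector quantization), and stage two scores each token by its surprisal, -log p(token). This is a deliberately simplified stand-in, not the paper's model: it assumes the patch features and codebook are already given, and it replaces the authors' learned token-distribution model with an independent per-position categorical distribution counted from normal slices with add-alpha smoothing. VQ-VAE encoders paired with transformer priors, as in Pinaya et al., are a common choice in this line of work. All names are hypothetical.

```python
import math
from collections import Counter


def quantize(features, codebook):
    """Stage 1 stand-in: index of the nearest codebook vector (squared L2) for each patch feature."""
    tokens = []
    for f in features:
        dists = [sum((fi - ci) ** 2 for fi, ci in zip(f, c)) for c in codebook]
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens


def fit_position_model(normal_token_grids, vocab_size, alpha=1.0):
    """Stage 2 stand-in: per-position categorical distribution over tokens,
    counted on normal slices only, with add-alpha smoothing so unseen tokens
    keep a small non-zero probability."""
    n_slices = len(normal_token_grids)
    model = []
    for pos in range(len(normal_token_grids[0])):
        counts = Counter(grid[pos] for grid in normal_token_grids)
        total = n_slices + alpha * vocab_size
        model.append([(counts[k] + alpha) / total for k in range(vocab_size)])
    return model


def surprisal(tokens, model):
    """Token-wise negative log-likelihood -log p(token) under the normal model."""
    return [-math.log(model[pos][tok]) for pos, tok in enumerate(tokens)]


codebook = [[0.0], [1.0]]
print(quantize([[0.1], [0.9], [0.2]], codebook))  # [0, 1, 0]
model = fit_position_model([[0, 0], [0, 0], [0, 1]], vocab_size=2)
# Token 1 at position 0 never occurs in the normal slices, so it scores highest.
print(surprisal([1, 0], model))
```

A learned autoregressive prior would condition each token on its neighbours rather than treating positions independently; the surprisal formula itself is unchanged.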

If this is right

  • The method supplies both a binary detection flag and a spatial heatmap that can highlight regions likely to compromise downstream AI tasks.
  • Unsupervised training on normal images alone removes the need for labeled anomaly examples during model development.
  • The same two-stage architecture applies to both pelvic and brain MRI, with the token model for each anatomy trained on its own set of normal reference images.
  • Transparent visualization of flagged regions supports interpretability for clinical quality-control review.
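The detection flag and heatmap can be illustrated with a small fusion step. This is a sketch under stated assumptions: the perceptual-difference map is already computed at pixel resolution, the token-surprisal grid is upsampled by nearest-neighbour repetition, each map is z-scored against (mean, std) statistics measured on normal reference slices, the two are combined with a fixed weight, and a slice is flagged when the mean of its top-k heatmap values exceeds a threshold. The weighting, normalization, top-k pooling, and all function names are hypothetical choices, not the paper's documented rule.

```python
def upsample_nearest(grid, factor):
    """Repeat each token-level value factor x factor times to reach pixel resolution."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def zscore(m, mean, std):
    """Express a map in units of its spread on normal reference slices."""
    return [[(v - mean) / std for v in row] for row in m]


def fuse(perc_map, surp_map, perc_ref, surp_ref, weight=0.5):
    """Weighted sum of the two z-scored evidence maps; perc_ref and surp_ref
    are (mean, std) pairs measured on normal reference data."""
    p = zscore(perc_map, *perc_ref)
    s = zscore(surp_map, *surp_ref)
    return [[weight * a + (1 - weight) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(p, s)]


def slice_score(heatmap, top_k=4):
    """Mean of the k highest heatmap values, so a compact anomaly is not averaged away."""
    flat = sorted((v for row in heatmap for v in row), reverse=True)
    k = min(top_k, len(flat))
    return sum(flat[:k]) / k


def detect(perc_map, token_surprisal, factor, perc_ref, surp_ref, threshold, weight=0.5):
    """Return (flag, heatmap) for one slice."""
    heat = fuse(perc_map, upsample_nearest(token_surprisal, factor),
                perc_ref, surp_ref, weight)
    return slice_score(heat) > threshold, heat


# 4x4 pixel slice with a 2x2 patch of perceptual difference in one corner,
# and a 2x2 token grid whose bottom-right token is surprising.
perc = [[0.0] * 4 for _ in range(4)]
for r in (2, 3):
    for c in (2, 3):
        perc[r][c] = 3.0
tokens_nll = [[0.0, 0.0], [0.0, 5.0]]
flag, heat = detect(perc, tokens_nll, factor=2, perc_ref=(0.0, 1.0),
                    surp_ref=(0.0, 1.0), threshold=2.0)
print(flag, heat[3][3], heat[0][0])  # True 4.0 0.0
```

Normalizing against reference statistics, rather than min-max scaling each slice, is the important design choice here: per-slice scaling would stretch the background variation of a clean slice to the same range as a genuine anomaly.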

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The token-based representation might transfer to other MRI contrasts or body sites if the normal-token distribution can be re-estimated from appropriate reference scans.
  • Integration into a radiotherapy planning pipeline could reduce the volume of images requiring manual inspection before AI segmentation or dose calculation.
  • Because the method relies on public datasets, its reported performance sets a baseline that future work can compare against when new reference collections become available.

Load-bearing premise

Public reference datasets adequately represent the distribution of normal anatomy encountered in the target radiotherapy workflow, and the synthetic anomalies used for evaluation are representative of real clinical anomalies.

What would settle it

Performance on a new set of real clinical pelvic MRI cases drawn directly from the radiotherapy workflow falling substantially below the reported AUC of 0.97 would undercut the central claim.

Original abstract

Artificial intelligence is increasingly integrated into radiotherapy workflows, yet such pipelines remain vulnerable to out-of-distribution image data that may introduce unexpected behavior in clinical tasks. Deep learning-based anomaly detection for pelvic magnetic resonance imaging (MRI) remains largely unexplored, and transparent evaluation of its feasibility for full automation is limited. We developed and evaluated a fully automated, unsupervised anomaly-detection framework for pelvic and brain MRI. A two-stage framework was trained on reference images from public datasets: LUND-PROBE for pelvic MRI, and IXI, fastMRI, and fastMRI+ for brain MRI. In the first stage, MRI slices were compressed into discrete tokens; in the second, the distribution of normal tokens was modeled. Anomaly evidence was estimated by combining perceptual image differences with token-surprisal scores based on negative log-likelihood. Automated detection was evaluated on pelvic MRI with synthetic global and real clinical anomalies, and on brain MRI with clinically annotated fastMRI+ abnormalities. Sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and false-positive behavior in held-out normal cases were assessed. The framework achieved robust detection across hidden evaluation cohorts, with AUCs of 0.97 (95% CI, 0.95-0.98) and 0.81 (95% CI, 0.74-0.87) for pelvic and brain MRI, respectively. Heatmap analysis showed strong spatial agreement between detected anomalies and ground-truth locations, supporting localization accuracy and interpretability. These results support the potential of unsupervised anomaly detection as an automated MRI quality-control layer for radiotherapy workflows, with transparent visualization of image regions likely to compromise downstream AI-based tasks.
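The abstract does not specify how the synthetic global anomalies were generated. As a hypothetical sketch, the corruptions named in the simulated rebuttal (global intensity shifts, noise patterns, localized signal voids) can be applied to a slice stored as a list of rows, each returning the corrupted slice and a binary ground-truth mask for localization scoring. The signal void is a local corruption and is included only because the rebuttal names it. Parameter values and function names are illustrative assumptions.

```python
import random


def full_mask(img):
    """Ground-truth mask covering every pixel, used for global corruptions."""
    return [[1] * len(row) for row in img]


def intensity_shift(img, scale=1.3, offset=0.1):
    """Global anomaly: every pixel's contrast and brightness change."""
    return [[v * scale + offset for v in row] for row in img], full_mask(img)


def add_noise(img, sigma=0.1, seed=0):
    """Global anomaly: additive Gaussian noise over the whole slice."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img], full_mask(img)


def signal_void(img, top, left, height, width):
    """Local anomaly: zero a rectangle, mimicking a signal dropout."""
    out = [list(row) for row in img]  # copy so the input slice is untouched
    mask = [[0] * len(row) for row in img]
    for r in range(top, min(top + height, len(img))):
        for c in range(left, min(left + width, len(img[r]))):
            out[r][c] = 0.0
            mask[r][c] = 1
    return out, mask


slice_ = [[1.0] * 4 for _ in range(4)]
voided, mask = signal_void(slice_, top=1, left=1, height=2, width=2)
print(voided[1][1], voided[0][0], sum(map(sum, mask)))  # 0.0 1.0 4
```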

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript presents a two-stage unsupervised anomaly detection framework for pelvic and brain MRI. It trains a token-compression model followed by a token-distribution model exclusively on public datasets (LUND-PROBE for pelvic; IXI, fastMRI, fastMRI+ for brain), then scores anomalies by combining perceptual image differences with negative-log-likelihood surprisal. Automated detection is evaluated on held-out pelvic cases containing synthetic global and real clinical anomalies and on brain cases with clinically annotated fastMRI+ abnormalities, yielding AUCs of 0.97 (95% CI 0.95-0.98) and 0.81 (95% CI 0.74-0.87) respectively, together with spatially localized heatmaps.

Significance. If the performance claims survive domain-matched validation, the work would supply a practical, fully unsupervised and interpretable quality-control layer that could be inserted upstream of downstream radiotherapy AI tasks to flag out-of-distribution MRI inputs.

major comments (3)
  1. [Abstract/Methods] Abstract and Methods: The central performance claims rest on the untested premise that the cited public reference datasets adequately represent the distribution of normal pelvic and brain anatomy encountered in radiotherapy workflows (different field strengths, coil configurations, positioning, and patient populations). No domain-shift quantification, statistical comparison, or radiotherapy-specific normal reference set is described; mismatch would directly invalidate calibration of the token-surprisal and perceptual-difference scores.
  2. [Methods] Methods: No information is supplied on the tokenization architecture, the probabilistic model used to capture the normal token distribution, training hyperparameters or procedure, baseline anomaly-detection methods, or the rule used to set decision thresholds. These omissions are load-bearing because they prevent any assessment of whether the reported AUCs are reproducible or whether thresholds were chosen post-hoc on the evaluation set.
  3. [Evaluation] Evaluation: Pelvic performance is assessed with synthetic global anomalies whose realism relative to actual clinical anomalies is not demonstrated; combined with the domain-shift issue above, this weakens the evidential support for the claim that the framework is ready for radiotherapy quality control.
minor comments (1)
  1. The phrase 'hidden evaluation cohorts' is used without an explicit statement of how these cohorts were constructed or how they differ from the training distribution beyond being held out.
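One hypothetical way to quantify the domain shift raised in major comment 1: score held-out normal slices from the public reference data and normal slices from the target radiotherapy scanner with the same trained model, then compare the two score distributions. The sketch computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the empirical CDFs) and a permutation p-value; the choice of test and the function names are assumptions, not something the paper reports.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: maximum gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_pvalue(a, b, n_perm=1000, seed=0):
    """Fraction of random relabelings whose D is at least the observed D."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


reference_normals = [0.1, 0.2, 0.25, 0.3, 0.35]  # hypothetical scores
local_normals = [0.4, 0.45, 0.5, 0.6, 0.7]        # hypothetical scores
print(ks_statistic(reference_normals, local_normals))  # 1.0
print(permutation_pvalue(reference_normals, local_normals, n_perm=500))
```

A small D with a large p-value is consistent with, though it does not prove, the score calibration transferring; a large D suggests that thresholds set on public data should be re-estimated on local normal cases.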

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate where revisions will be made to improve clarity, reproducibility, and evidential support.

Point-by-point responses
  1. Referee: [Abstract/Methods] Abstract and Methods: The central performance claims rest on the untested premise that the cited public reference datasets adequately represent the distribution of normal pelvic and brain anatomy encountered in radiotherapy workflows (different field strengths, coil configurations, positioning, and patient populations). No domain-shift quantification, statistical comparison, or radiotherapy-specific normal reference set is described; mismatch would directly invalidate calibration of the token-surprisal and perceptual-difference scores.

    Authors: We acknowledge the importance of domain shift considerations. The public datasets (LUND-PROBE, IXI, fastMRI, fastMRI+) were selected because they provide large-scale, standardized, high-quality acquisitions that are widely used benchmarks in MRI research. However, we agree that explicit discussion of potential mismatches with radiotherapy-specific protocols is warranted. In the revised manuscript we will add a new subsection in Methods/Discussion that qualitatively compares acquisition parameters (field strength, coil type, patient positioning) between the training sets and typical radiotherapy MRI, note this as a limitation, and suggest prospective validation on in-house radiotherapy data. No new quantitative domain-shift experiments are feasible within the current study scope, but the added text will temper the claims accordingly. revision: partial

  2. Referee: [Methods] Methods: No information is supplied on the tokenization architecture, the probabilistic model used to capture the normal token distribution, training hyperparameters or procedure, baseline anomaly-detection methods, or the rule used to set decision thresholds. These omissions are load-bearing because they prevent any assessment of whether the reported AUCs are reproducible or whether thresholds were chosen post-hoc on the evaluation set.

    Authors: We apologize for the insufficient detail in the submitted Methods section. The full paper contains a high-level description, but we agree it lacks the necessary specifics for reproducibility. In the revision we will expand the Methods section to include: (i) the exact tokenization architecture and its hyperparameters, (ii) the probabilistic model (including its formulation), (iii) the complete training procedure and hyperparameter values, (iv) any baseline methods evaluated, and (v) the precise rule for threshold selection (performed on a held-out validation subset of normal cases, not the test set). These additions will directly address reproducibility concerns. revision: yes

  3. Referee: [Evaluation] Evaluation: Pelvic performance is assessed with synthetic global anomalies whose realism relative to actual clinical anomalies is not demonstrated; combined with the domain-shift issue above, this weakens the evidential support for the claim that the framework is ready for radiotherapy quality control.

    Authors: We agree that the realism of the synthetic anomalies should be explicitly justified. The synthetic anomalies were constructed to replicate common clinical artifacts (global intensity shifts, localized signal voids, noise patterns) observed in the real clinical anomaly cases within the evaluation cohort. In the revision we will add a supplementary figure and accompanying text that visually and quantitatively compares the synthetic anomalies to the real clinical anomalies present in the pelvic test set. We will also revise the Discussion to frame the framework as a promising quality-control approach whose clinical readiness requires further multi-center validation, rather than claiming immediate deployment readiness. revision: partial
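The rebuttal states that thresholds were set on a held-out validation subset of normal cases rather than on the test set. A minimal sketch of one such rule, assuming a target specificity on the normal validation scores (the 0.95 default is a hypothetical choice), followed by sensitivity and specificity on a test set, where a case is flagged when its score is strictly above the threshold. Function names are hypothetical.

```python
import math


def threshold_from_normals(normal_val_scores, target_specificity=0.95):
    """Smallest validation score t such that at least target_specificity of the
    normal validation cases satisfy score <= t (i.e. would not be flagged)."""
    s = sorted(normal_val_scores)
    k = math.ceil(target_specificity * len(s)) - 1
    return s[max(0, min(k, len(s) - 1))]


def sens_spec(normal_scores, anomaly_scores, threshold):
    """Sensitivity and specificity; a case is flagged when score > threshold."""
    tp = sum(x > threshold for x in anomaly_scores)
    tn = sum(x <= threshold for x in normal_scores)
    return tp / len(anomaly_scores), tn / len(normal_scores)


val_normals = [float(i) for i in range(1, 21)]  # hypothetical validation scores
t = threshold_from_normals(val_normals, 0.95)
print(t, sens_spec([2.0, 5.0, 25.0], [18.0, 30.0, 40.0], t))
```

Because the threshold depends only on normal validation cases, the false-positive rate on held-out normal test cases gives an honest check of how well that target specificity generalizes.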

Circularity Check

0 steps flagged

No circularity: training on external public datasets and evaluation on held-out anomaly cohorts are independent

Full rationale

The paper trains an unsupervised token-based model on reference images drawn from independent public datasets (LUND-PROBE, IXI, fastMRI, fastMRI+) and evaluates detection performance on separate hidden cohorts containing synthetic or clinically annotated anomalies. No equations, fitted parameters, or self-citations are described that would make the reported AUCs equivalent to the training inputs by construction. The derivation chain consists of standard unsupervised density modeling followed by out-of-distribution scoring; the evaluation metrics are computed against externally labeled ground truth and therefore remain falsifiable outside the fitted values.
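Independence between training and evaluation also depends on splitting by patient rather than by slice, so that no subject contributes slices to both sides. A hypothetical sketch that assigns patients deterministically by hashing their IDs and then checks that the splits are disjoint; the ID format and the 20% evaluation fraction are illustrative assumptions.

```python
import hashlib


def split_for(patient_id, eval_fraction=0.2):
    """Assign a patient, and therefore every slice it owns, to exactly one split."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 1000
    return "eval" if bucket < eval_fraction * 1000 else "train"


def split_slices(slices, eval_fraction=0.2):
    """slices: iterable of (patient_id, slice_index) pairs."""
    train, evaluation = [], []
    for pid, idx in slices:
        target = evaluation if split_for(pid, eval_fraction) == "eval" else train
        target.append((pid, idx))
    return train, evaluation


def check_disjoint(train, evaluation):
    """Raise if any patient contributes slices to both splits."""
    overlap = {p for p, _ in train} & {p for p, _ in evaluation}
    if overlap:
        raise ValueError(f"patients in both splits: {sorted(overlap)}")


slices = [(f"P{i:03d}", s) for i in range(50) for s in range(5)]
train, evaluation = split_slices(slices)
check_disjoint(train, evaluation)  # passes: the split is made per patient
print(len(train), len(evaluation))
```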

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that the chosen public datasets capture the relevant normal distribution.

pith-pipeline@v0.9.1-grok · 5874 in / 1034 out tokens · 22452 ms · 2026-06-30T11:56:06.544770+00:00


Reference graph

Works this paper leans on

36 extracted references · 33 canonical work pages · 1 internal anchor

  1. [1]

    Artificial intelligence in radiation oncology

    Huynh E, Hosny A, Guthier C, Bitterman DS, Petit SF, Haas-Kogan DA, et al. Artificial intelligence in radiation oncology. Nat Rev Clin Oncol 2020;17:771–81. https://doi.org/10.1038/s41571-020-0417-8

  2. [2]

    The Evolving Role of Artificial Intelligence in Radiotherapy Treatment Planning—A Literature Review

    Kalsi S, French H, Chhaya S, Madani H, Mir R, Anosova A, et al. The Evolving Role of Artificial Intelligence in Radiotherapy Treatment Planning—A Literature Review. Clinical Oncology 2024;36:596–605. https://doi.org/10.1016/j.clon.2024.06.005

  3. [3]

    Artificial intelligence in radiotherapy: Current applications and future trends

    Giraud P, Bibault J-E. Artificial intelligence in radiotherapy: Current applications and future trends. Diagnostic and Interventional Imaging 2024;105:475–80. https://doi.org/10.1016/j.diii.2024.06.001

  4. [4]

    Real world AI-driven segmentation: Efficiency gains and workflow challenges in radiotherapy

    Malone C, Nicholson J, Ryan S, Thirion P, Woods R, McBride P, et al. Real world AI-driven segmentation: Efficiency gains and workflow challenges in radiotherapy. Radiotherapy and Oncology 2025;209:110977. https://doi.org/10.1016/j.radonc.2025.110977

  5. [5]

    The role of artificial intelligence in radiotherapy clinical practice

    Landry G, Kurz C, Traverso A. The role of artificial intelligence in radiotherapy clinical practice. BJR Open 2023;5:20230030. https://doi.org/10.1259/bjro.20230030

  6. [6]

    AI in Radiation Oncology: A Comprehensive Review of Current Applications and Future Directions

    Zafar F, Vilsan J, Mani S, Al Yousif AR, Cano-Reyes SE, Abraham G, et al. AI in Radiation Oncology: A Comprehensive Review of Current Applications and Future Directions. Cureus 2025;17:e92964. https://doi.org/10.7759/cureus.92964

  7. [7]

    Artificial intelligence-powered innovations in radiotherapy: boosting efficiency and efficacy

    Chen J, Zhu X, Jin J-Y, Kong F-MS, Yang G. Artificial intelligence-powered innovations in radiotherapy: boosting efficiency and efficacy. Med Rev (2021) 2025;5:348–51. https://doi.org/10.1515/mr-2025-0007

  8. [8]

    Overview of artificial intelligence-based applications in radiotherapy: Recommendations for implementation and quality assurance

    Vandewinckele L, Claessens M, Dinkla A, Brouwer C, Crijns W, Verellen D, et al. Overview of artificial intelligence-based applications in radiotherapy: Recommendations for implementation and quality assurance. Radiotherapy and Oncology 2020;153:55–66. https://doi.org/10.1016/j.radonc.2020.09.008

  9. [9]

    Artificial intelligence for quality assurance in radiotherapy

    Simon L, Robert C, Meyer P. Artificial intelligence for quality assurance in radiotherapy. Cancer/Radiothérapie 2021;25:623–6. https://doi.org/10.1016/j.canrad.2021.06.012

  10. [10]

    Artificial intelligence (AI)-based multi-organ contour quality assurance with uncertainty estimation for online adaptive radiotherapy (oART)

    Yan S, Xie J, Chen N, Nguyen D, Su F-C, Yang D, et al. Artificial intelligence (AI)-based multi-organ contour quality assurance with uncertainty estimation for online adaptive radiotherapy (oART). Mach Learn: Health 2026;2:015001. https://doi.org/10.1088/3049-477X/ae3320

  11. [11]

    Quality Assurance for AI-Based Applications in Radiation Therapy

    Claessens M, Oria CS, Brouwer CL, Ziemer BP, Scholey JE, Lin H, et al. Quality Assurance for AI-Based Applications in Radiation Therapy. Seminars in Radiation Oncology 2022;32:421– https://doi.org/10.1016/j.semradonc.2022.06.011

  13. [14]

    Kleber CEJ, Karius R, Naessens LE, Van Toledo CO, A. C. Van Osch J, Boomsma MF, et al. Advancements in supervised deep learning for metal artifact reduction in computed tomography: A systematic review. European Journal of Radiology 2024;181:111732. https://doi.org/10.1016/j.ejrad.2024.111732

  14. [15]

    Data drift in medical machine learning: implications and potential remedies

    Sahiner B, Chen W, Samala RK, Petrick N. Data drift in medical machine learning: implications and potential remedies. The British Journal of Radiology 2023;96:20220878. https://doi.org/10.1259/bjr.20220878

  15. [16]

    A review of deep learning-based Unsupervised Anomaly Detection in brain MRI

    Behrendt F, Bhattacharya D, Maack L, Krüger J, Opfer R, Schlaefer A. A review of deep learning-based Unsupervised Anomaly Detection in brain MRI. Medical Image Analysis 2026;112:104076. https://doi.org/10.1016/j.media.2026.104076

  16. [17]

    Unsupervised brain imaging 3D anomaly detection and segmentation with transformers

    Pinaya WHL, Tudosiu P-D, Gray R, Rees G, Nachev P, Ourselin S, et al. Unsupervised brain imaging 3D anomaly detection and segmentation with transformers. Medical Image Analysis 2022;79:102475. https://doi.org/10.1016/j.media.2022.102475

  17. [18]

    Anomaly detection in brain MRI: a comprehensive review

    Kim J, Shin Y. Anomaly detection in brain MRI: a comprehensive review. Biomed Eng Lett 2026;16:369–85. https://doi.org/10.1007/s13534-026-00551-6

  18. [20]

    Evaluating normative representation learning in generative AI for robust anomaly detection in brain imaging

    Bercea CI, Wiestler B, Rueckert D, Schnabel JA. Evaluating normative representation learning in generative AI for robust anomaly detection in brain imaging. Nat Commun 2025;16:1624. https://doi.org/10.1038/s41467-025-56321-y

  19. [21]

    Autoencoders for unsupervised anomaly segmentation in brain MR images: A comparative study

    Baur C, Denner S, Wiestler B, Navab N, Albarqouni S. Autoencoders for unsupervised anomaly segmentation in brain MR images: A comparative study. Medical Image Analysis 2021;69:101952. https://doi.org/10.1016/j.media.2020.101952

  20. [22]

    Applications of Artificial Intelligence in Prostate Cancer Radiotherapy: A Narrative Review

    Piras A, Comelli A, D’Aviero A, Dispensa N, Pavan N, Di Maida F, et al. Applications of Artificial Intelligence in Prostate Cancer Radiotherapy: A Narrative Review. Radiation 2026;6:15. https://doi.org/10.3390/radiation6020015

  21. [23]

    Evaluation of a deep learning magnetic resonance imaging reconstruction method for synthetic computed tomography generation in prostate radiotherapy

    Olsson LE, Af Wetterstedt S, Scherman J, Gunnlaugsson A, Persson E, Jamtheim Gustafsson C. Evaluation of a deep learning magnetic resonance imaging reconstruction method for synthetic computed tomography generation in prostate radiotherapy. Physics and Imaging in Radiation Oncology 2024;29:100557. https://doi.org/10.1016/j.phro.2024.100557

  22. [24]

    Magnetic Resonance Imaging only Workflow for Radiotherapy Simulation and Planning in Prostate Cancer

    Kerkmeijer LGW, Maspero M, Meijer GJ, Van Der Voort Van Zyp JRN, De Boer HCJ, Van Den Berg CAT. Magnetic Resonance Imaging only Workflow for Radiotherapy Simulation and Planning in Prostate Cancer. Clinical Oncology 2018;30:692–701. https://doi.org/10.1016/j.clon.2018.08.009

  23. [25]

    MR-guided radiotherapy for prostate cancer: state of the art and future perspectives

    Sritharan K, Tree A. MR-guided radiotherapy for prostate cancer: state of the art and future perspectives. The British Journal of Radiology 2022;95:20210800. https://doi.org/10.1259/bjr.20210800

  24. [26]

    MRI-only treatment planning: benefits and challenges

    Owrangi AM, Greer PB, Glide-Hurst CK. MRI-only treatment planning: benefits and challenges. Phys Med Biol 2018;63:05TR01. https://doi.org/10.1088/1361-6560/aaaca4

  25. [27]

    Auto-Segmentation and Auto-Planning in Automated Radiotherapy for Prostate Cancer

    Huang S, Wu J, Lin X, Wang G, Song T, Chen L, et al. Auto-Segmentation and Auto-Planning in Automated Radiotherapy for Prostate Cancer. Bioengineering (Basel) 2025;12:620. https://doi.org/10.3390/bioengineering12060620

  26. [28]

    Assessment of a fully-automated diagnostic AI software in prostate MRI: Clinical evaluation and histopathological correlation

    Bayerl N, Adams LC, Cavallaro A, Bäuerle T, Schlicht M, Wullich B, et al. Assessment of a fully-automated diagnostic AI software in prostate MRI: Clinical evaluation and histopathological correlation. European Journal of Radiology 2024;181:111790. https://doi.org/10.1016/j.ejrad.2024.111790

  27. [29]

    Errors in radiation oncology: a study in pathways and dosimetric impact

    Klein EE, Drzymala RE, Purdy JA, Michalski J. Errors in radiation oncology: a study in pathways and dosimetric impact. J Appl Clin Med Phys 2005;6:81–94. https://doi.org/10.1120/jacmp.v6i3.2105

  28. [30]

    Clinical adoption of deep learning target auto-segmentation for radiation therapy: challenges, clinical risks, and mitigation strategies

    De Biase A, Sijtsema NM, Janssen T, Hurkmans C, Brouwer C, Van Ooijen P. Clinical adoption of deep learning target auto-segmentation for radiation therapy: challenges, clinical risks, and mitigation strategies. BJR|Artificial Intelligence 2024;1:ubae015. https://doi.org/10.1093/bjrai/ubae015

  29. [31]

    LUND-PROBE – LUND Prostate Radiotherapy Open Benchmarking and Evaluation dataset

    Rogowski V, Olsson LE, Scherman J, Persson E, Kadhim M, Af Wetterstedt S, et al. LUND-PROBE – LUND Prostate Radiotherapy Open Benchmarking and Evaluation dataset. Sci Data 2025;12:611. https://doi.org/10.1038/s41597-025-04954-5

  30. [32]

    https://brain-development.org/ixi-dataset; 2026 [accessed 25 May 2026]

    IXI Dataset. https://brain-development.org/ixi-dataset; 2026 [accessed 25 May 2026]

  31. [34]

    fastMRI+, Clinical pathology annotations for knee and brain fully sampled magnetic resonance imaging data

    Zhao R, Yaman B, Zhang Y, Stewart R, Dixon A, Knoll F, et al. fastMRI+, Clinical pathology annotations for knee and brain fully sampled magnetic resonance imaging data. Sci Data 2022;9:152. https://doi.org/10.1038/s41597-022-01255-z

  32. [37]

    https://github.com/MustafaKadhim/Self-supervised-anomaly-detection-for-medical-images; 2026 [accessed 25 May 2026]

    GitHub Repository. https://github.com/MustafaKadhim/Self-supervised-anomaly-detection-for-medical-images; 2026 [accessed 25 May 2026]

  33. [38]

    Anomaly detection in radiotherapy plans using deep autoencoder networks

    Huang P, Shang J, Xu Y, Hu Z, Zhang K, Dai J, et al. Anomaly detection in radiotherapy plans using deep autoencoder networks. Front Oncol 2023;13:1142947. https://doi.org/10.3389/fonc.2023.1142947

  34. [39]

    Deep learning-based automatic contour quality assurance for auto-segmented abdominal MR-Linac contours

    Zarenia M, Zhang Y, Sarosiek C, Conlin R, Amjad A, Paulson E. Deep learning-based automatic contour quality assurance for auto-segmented abdominal MR-Linac contours. Phys Med Biol 2024;69:215029. https://doi.org/10.1088/1361-6560/ad87a6

  35. [40]

    Proof of concept of a fully unsupervised anomaly detection framework in CBCT‐guided radiotherapy

    Luximon DC, Ritter M, Petragallo R, Pijanowski J, Neylon J, Ritter T, et al. Proof of concept of a fully unsupervised anomaly detection framework in CBCT‐guided radiotherapy. Medical Physics 2025;52:e18020. https://doi.org/10.1002/mp.18020

  36. [41]

    Towards Universal Unsupervised Anomaly Detection in Medical Imaging

    Bercea CI, Wiestler B, Rueckert D, Schnabel JA. Towards Universal Unsupervised Anomaly Detection in Medical Imaging 2024. https://doi.org/10.48550/ARXIV.2401.10637

Declaration of Competing Interest: The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: CJG is a part time con...