arXiv: 2604.12778 · v1 · submitted 2026-04-14 · physics.med-ph · cs.AI · cs.CV

Recognition: unknown

DoseRAD2026 Challenge dataset: AI accelerated photon and proton dose calculation for radiotherapy

Fan Xiao, Nikolaos Delopoulos, Niklas Wahl, Lennart Volz, Lina Bucher, Matteo Maspero, Miguel Palacios, Muheng Li, Samir Schulz, Viktor Rogowski, Ye Zhang, Zoltan Perko, Christopher Kurz, George Dedes, Guillaume Landry, Adrian Thummerer

Pith reviewed 2026-05-10 13:29 UTC · model grok-4.3

keywords: dose calculation, radiotherapy, photon therapy, proton therapy, MRI-guided, dataset, benchmark, AI acceleration

The pith

The DoseRAD2026 dataset supplies paired CT-MRI scans and Monte Carlo photon and proton dose maps from 115 patients as a public benchmark for AI dose calculation in radiotherapy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the DoseRAD2026 dataset and challenge, which consists of paired CT and MRI images from 115 patients treated on an MRI-linac for thoracic or abdominal lesions, together with corresponding ground-truth dose distributions. These doses were generated for thousands of individual photon beams and proton beamlets using open-source Monte Carlo algorithms after deformable registration and air-cavity correction. A sympathetic reader would value this because current dose engines are often too slow for real-time adaptive or MRI-guided workflows, and a standardized public benchmark could let AI methods learn to deliver comparable accuracy at much higher speed. The data are split into training and test sets with beam-level outputs and configuration files, released under a non-commercial license to support method development across photon and proton modalities.

Core claim

The paper claims that the DoseRAD2026 dataset, built from the SynthRAD2025 collection and containing 40,500 photon beams plus 81,000 proton beamlets with paired CT-MRI images, provides reliable ground-truth Monte Carlo doses that can serve as training targets and evaluation standards for fast AI-based dose calculation algorithms in both photon and proton radiotherapy.

What carries the argument

The dataset of paired CT-MRI volumes, beam-level Monte Carlo dose maps, and JSON beam configuration files produced after deformable image registration and air-cavity correction.

If this is right

AI models trained on the data can be evaluated for beam-level dose estimation in photon therapy.
The paired MRI data support development of dose calculation methods that operate directly on MRI without CT.
The challenge format with a withheld test set allows standardized comparison of different fast-calculation approaches.
Real-time adaptive radiotherapy workflows gain a resource for training models that recalculate doses rapidly on daily imaging.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread adoption of models trained on this benchmark could shorten overall treatment planning time in MRI-linac clinics.
The thoracic and abdominal focus may help AI methods handle respiratory motion better than current analytical approximations.
Public release of both photon and proton subsets in the same format encourages unified research across particle and conventional radiotherapy.

Load-bearing premise

The deformable registration between CT and MRI, the air-cavity corrections, and the Monte Carlo simulations produce paired data and dose values accurate enough to act as reliable targets for training AI models that will be used in real clinical scenarios.

What would settle it

An independent experiment that measures actual delivered doses in a phantom or patient cohort and finds that AI models trained on the dataset systematically deviate from those measurements by more than clinical tolerance limits.

Figures

Figures reproduced from arXiv: 2604.12778 by Adrian Thummerer, Christopher Kurz, Fan Xiao, George Dedes, Guillaume Landry, Lennart Volz, Lina Bucher, Matteo Maspero, Miguel Palacios, Muheng Li, Niklas Wahl, Nikolaos Delopoulos, Samir Schulz, Viktor Rogowski, Ye Zhang, Zoltan Perko.

**Figure 2.** Representative proton beamlet dose distributions overlaid on CT (Task 3, top …

**Figure 3.** Folder structure for the photon dataset. B, beam; CP, control point.

**Figure 4.** Folder structure for the proton dataset. B, beam; R, ray; L, beamlet.

Original abstract

Purpose: Accurate dose calculation is essential in radiotherapy for precise tumor irradiation while sparing healthy tissue. With the growing adoption of MRI-guided and real-time adaptive radiotherapy, fast and accurate dose calculation on CT and MRI is increasingly needed. The DoseRAD2026 dataset and challenge provide a public benchmark of paired CT and MRI data with beam-level photon and proton Monte Carlo dose distributions for developing and evaluating advanced dose calculation methods.

Acquisition and validation methods: The dataset comprises paired CT and MRI from 115 patients (75 training, 40 testing) treated on an MRI-linac for thoracic or abdominal lesions, derived from the SynthRAD2025 dataset. Pre-processing included deformable image registration, air-cavity correction, and resampling. Ground-truth photon (6 MV) and proton dose distributions were computed using open-source Monte Carlo algorithms, yielding 40,500 photon beams and 81,000 proton beamlets.

Data format and usage notes: Data are organized into photon and proton subsets with paired CT-MRI images, beam-level dose distributions, and JSON beam configuration files. Files are provided in compressed MetaImage (.mha) format. The dataset is released under CC BY-NC 4.0, with training data available from April 2026 and the test set withheld until March 2030.

Potential applications: The dataset supports benchmarking of fast dose calculation methods, including beam-level dose estimation for photon and proton therapy, MRI-based dose calculation in MRI-guided workflows, and real-time adaptive radiotherapy.
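
To make the "compressed MetaImage (.mha)" format concrete, here is a minimal sketch of reading such a file with the Python standard library only. It assumes a single-file image (`ElementDataFile = LOCAL`) whose payload is zlib-compressed, which is how ITK's MetaIO typically writes compressed volumes. The helper names `parse_mha` and `make_demo_mha` and the tiny 2x2x1 volume are invented for illustration and are not part of the dataset.

```python
import struct
import zlib

# Subset of MetaImage element types mapped to struct format characters.
ELEMENT_FORMATS = {"MET_FLOAT": "f", "MET_SHORT": "h", "MET_UCHAR": "B"}


def parse_mha(raw: bytes):
    """Parse a single-file MetaImage blob into (header dict, flat voxel tuple)."""
    header = {}
    pos = 0
    while True:
        end = raw.index(b"\n", pos)
        key, _, value = raw[pos:end].decode("ascii").partition("=")
        header[key.strip()] = value.strip()
        pos = end + 1
        # ElementDataFile is the last header line; the voxel data follow it.
        if key.strip() == "ElementDataFile":
            break
    if header["ElementDataFile"] != "LOCAL":
        raise ValueError("voxel data are stored in a separate file")
    payload = raw[pos:]
    if header.get("CompressedData", "False") == "True":
        payload = zlib.decompress(payload)
    count = 1
    for n in header["DimSize"].split():
        count *= int(n)
    msb = header.get("BinaryDataByteOrderMSB", "False") == "True"
    fmt = (">" if msb else "<") + str(count) + ELEMENT_FORMATS[header["ElementType"]]
    return header, struct.unpack(fmt, payload)


def make_demo_mha() -> bytes:
    """Build a tiny compressed float volume (hypothetical data, demo only)."""
    body = zlib.compress(struct.pack("<4f", 0.0, 0.5, 1.0, 2.0))
    lines = [
        "ObjectType = Image",
        "NDims = 3",
        "BinaryData = True",
        "BinaryDataByteOrderMSB = False",
        "CompressedData = True",
        f"CompressedDataSize = {len(body)}",
        "ElementSpacing = 1 1 1",
        "DimSize = 2 2 1",
        "ElementType = MET_FLOAT",
        "ElementDataFile = LOCAL",
    ]
    return ("\n".join(lines) + "\n").encode("ascii") + body


header, dose = parse_mha(make_demo_mha())
print(header["DimSize"], dose)
```

In practice an established reader such as SimpleITK's `ReadImage` is likely the safer choice, since it also handles orientation, offsets, and the remaining element types. The sketch is only meant to show that the format is a short text header followed by a raw or compressed voxel buffer.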

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a dataset release that adds beam-level photon and proton Monte Carlo doses to existing CT-MRI pairs, but skips the quantitative checks needed to trust the ground truths.

The main point is that DoseRAD2026 supplies paired CT and MRI scans from 115 thoracic and abdominal patients along with Monte Carlo photon and proton dose maps computed at the beam level. It extends the SynthRAD2025 collection by running open-source MC simulations for 6 MV photons and proton beamlets, producing 121,500 beam-level dose distributions in total (40,500 photon beams and 81,000 proton beamlets), and packages everything with JSON configs in a challenge-ready format under CC BY-NC 4.0. That combination of modalities and scale is new for this kind of public benchmark and should be useful for groups working on fast AI dose engines for MRI-guided workflows. The data split and long test-set hold-out also follow standard challenge practice. What the authors did cleanly is keep the release practical and reproducible by sticking to open tools and clear file organization.

The soft spot is the validation. The paper outlines the deformable registration, air-cavity correction, and resampling steps but gives no numbers on registration error, landmark accuracy, or organ overlap. It also skips any dose-level checks, such as gamma pass rates against measurements or other codes, and offers no uncertainty estimates for density overrides. In regions with motion and large density gradients, those gaps mean users cannot easily judge how reliable the paired images and dose targets actually are.

This paper is mainly for medical physicists and AI researchers who need public benchmarks to train and compare dose-calculation models. Anyone running a challenge or building MRI-based planning tools could get practical value from the data itself. I would send it to peer review so referees can ask for the missing metrics and decide whether the dataset meets the bar for a reliable training resource.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the DoseRAD2026 dataset and associated challenge, comprising paired CT-MRI scans from 115 patients (75 training, 40 testing) with corresponding beam-level Monte Carlo dose distributions for 40,500 photon (6 MV) beams and 81,000 proton beamlets. Derived from the SynthRAD2025 dataset after deformable image registration, air-cavity correction, and resampling, the data are provided in MetaImage format with JSON beam configurations under CC BY-NC 4.0 (training data from April 2026, test set withheld until March 2030) to benchmark AI-accelerated dose calculation methods for photon and proton radiotherapy, including MRI-guided workflows.

Significance. If the registration and Monte Carlo doses prove accurate, the dataset would offer a substantial public benchmark for developing fast dose calculation algorithms in MRI-linac and adaptive radiotherapy settings, leveraging open-source MC tools and providing an unusually large number of beam-level examples. This could support reproducible AI research and address the need for paired CT-MRI ground truth in regions with motion and density gradients.

major comments (2)

[Acquisition and validation methods] Acquisition and validation methods (abstract and § on pre-processing): The manuscript describes deformable image registration, air-cavity correction, and resampling at a high level but reports no quantitative metrics such as target registration error, landmark accuracy, or Dice scores for organs at risk in the thoracic/abdominal cohort. This omission directly undermines the claim that the paired data constitute reliable training targets, as even modest alignment errors in high-gradient regions would propagate into the dose distributions.
[Ground-truth photon and proton dose distributions] Ground-truth photon and proton dose distributions (abstract and § on Monte Carlo): Open-source MC algorithms are used for 6 MV photons and proton beamlets, yet no validation against measurements, other codes, gamma-index pass rates, or uncertainty estimates for air-cavity handling and density overrides is provided. Without these, the doses cannot be confidently treated as ground truth for clinical AI benchmarking.
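
For readers unfamiliar with the gamma-index pass rate named in the second major comment, the following is a minimal 1D sketch of the usual definition: for each reference point, take the minimum over evaluated points of the combined distance and dose deviation, each scaled by its tolerance, and count the point as passing when that minimum is at most 1. The 3%/3 mm criteria, the 10% low-dose cutoff, and the toy profiles are assumptions chosen for illustration, not values from the paper. Production tools usually interpolate the evaluated dose, which this discrete version does not.

```python
import math


def gamma_pass_rate(ref, evl, spacing_mm, dose_tol=0.03, dist_tol_mm=3.0, cutoff=0.1):
    """Global 1D gamma pass rate of profile `evl` against reference `ref`."""
    d_max = max(ref)
    dd = dose_tol * d_max  # global dose criterion in absolute dose units
    passed = total = 0
    for i, d_ref in enumerate(ref):
        if d_ref < cutoff * d_max:  # ignore low-dose reference points
            continue
        total += 1
        # Minimum over all evaluated points of the scaled distance/dose norm.
        gamma = min(
            math.sqrt(
                ((j - i) * spacing_mm / dist_tol_mm) ** 2
                + ((d_ev - d_ref) / dd) ** 2
            )
            for j, d_ev in enumerate(evl)
        )
        if gamma <= 1.0:
            passed += 1
    return passed / total if total else float("nan")


# Toy profiles (hypothetical): the evaluated dose is the reference shifted by one voxel.
reference = [0.0, 0.2, 0.6, 1.0, 0.6, 0.2, 0.0]
shifted = [0.0, 0.0, 0.2, 0.6, 1.0, 0.6, 0.2]

rate_fine = gamma_pass_rate(reference, shifted, spacing_mm=2.0)    # 2 mm shift
rate_coarse = gamma_pass_rate(reference, shifted, spacing_mm=4.0)  # 4 mm shift
print(rate_fine, rate_coarse)
```

With a 2 mm voxel spacing the one-voxel shift stays within the 3 mm distance tolerance and every point passes; with 4 mm spacing the same shift exceeds it and every point fails. This illustrates why the referee ties dose validation to spatial accuracy.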

minor comments (2)

[Data format and usage notes] Data format and usage notes: The organization into photon/proton subsets with JSON files is helpful, but explicit details on beamlet sampling, energy spectra, and how the exact counts of 40,500/81,000 were obtained would aid reproducibility.
[Abstract] Abstract: The unusually long embargo on the test set (until 2030) should be justified or accompanied by an earlier partial release plan to maximize community utility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript describing the DoseRAD2026 dataset. We address each major comment below and will revise the manuscript to incorporate additional validation details where feasible.

Referee: [Acquisition and validation methods] Acquisition and validation methods (abstract and § on pre-processing): The manuscript describes deformable image registration, air-cavity correction, and resampling at a high level but reports no quantitative metrics such as target registration error, landmark accuracy, or Dice scores for organs at risk in the thoracic/abdominal cohort. This omission directly undermines the claim that the paired data constitute reliable training targets, as even modest alignment errors in high-gradient regions would propagate into the dose distributions.

Authors: We agree that quantitative metrics would strengthen confidence in the paired CT-MRI data. The dataset is derived from SynthRAD2025 after the described pre-processing steps, and we will revise the manuscript to reference the validation metrics (including Dice scores and any landmark or registration error measures) reported in the SynthRAD2025 publication. We will also add a brief discussion of potential residual alignment effects in high-gradient regions and note any additional metrics computable from our pipeline. revision: yes
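
The two registration metrics discussed in this exchange can be sketched in a few lines. This is a hedged illustration using the standard definitions: the Dice score as twice the overlap divided by the summed sizes of two masks, and the target registration error as the mean Euclidean distance between paired landmarks. The voxel sets and landmark coordinates are invented toy data, and the function names are hypothetical rather than taken from the authors' pipeline.

```python
import math


def dice_score(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two collections of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))


def target_registration_error(fixed_pts, warped_pts):
    """Mean Euclidean distance between paired landmarks, in their own units."""
    if len(fixed_pts) != len(warped_pts):
        raise ValueError("landmark lists must be paired one-to-one")
    dists = [math.dist(p, q) for p, q in zip(fixed_pts, warped_pts)]
    return sum(dists) / len(dists)


# Toy organ masks on CT and on registered MRI (hypothetical voxel indices).
organ_ct = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
organ_mr = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}

# Toy landmark pairs in millimetres (hypothetical).
landmarks_ct = [(10.0, 20.0, 30.0), (15.0, 25.0, 35.0)]
landmarks_mr = [(10.0, 20.0, 33.0), (15.0, 29.0, 35.0)]

dice = dice_score(organ_ct, organ_mr)
tre = target_registration_error(landmarks_ct, landmarks_mr)
print(dice, tre)
```

Here three of four voxels overlap, giving a Dice score of 0.75, and the landmark offsets of 3 mm and 4 mm average to a registration error of 3.5 mm.
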
Referee: [Ground-truth photon and proton dose distributions] Ground-truth photon and proton dose distributions (abstract and § on Monte Carlo): Open-source MC algorithms are used for 6 MV photons and proton beamlets, yet no validation against measurements, other codes, gamma-index pass rates, or uncertainty estimates for air-cavity handling and density overrides is provided. Without these, the doses cannot be confidently treated as ground truth for clinical AI benchmarking.

Authors: We acknowledge the value of explicit validation details for the Monte Carlo doses. The open-source algorithms have been validated in the literature against measurements and other codes; we will revise the Monte Carlo section to include relevant citations, any available gamma-index pass rates from our computations, and uncertainty estimates or details on air-cavity handling and density overrides. This will better support their use as ground truth for AI benchmarking. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset release paper with no derivations or predictions

This manuscript releases a public dataset of paired CT-MRI images and Monte Carlo-computed photon/proton dose distributions. It describes standard pre-processing steps (deformable registration, air-cavity correction, resampling) and use of open-source MC codes to generate 40,500 photon beams and 81,000 proton beamlets, but presents no equations, first-principles derivations, fitted parameters, or predictions that reduce to the inputs by construction. The central contribution is data provision under CC BY-NC 4.0, not a load-bearing claim whose validity collapses into self-definition, self-citation, or renaming. The paper is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard assumptions in radiotherapy physics and medical imaging; no new free parameters or invented entities are introduced as this is a data release rather than a theoretical derivation.

axioms (2)

Domain assumption: Monte Carlo simulations provide accurate ground-truth dose distributions.
Invoked in the acquisition and validation methods section of the abstract.
Domain assumption: Deformable image registration can accurately align CT and MRI for dose calculation purposes.
Mentioned in pre-processing.

pith-pipeline@v0.9.0 · 5626 in / 1367 out tokens · 51846 ms · 2026-05-10T13:29:46.404610+00:00 · methodology
