How Sensitive Are Radiomic AI Models to Acquisition Parameters?
Pith reviewed 2026-06-30 21:01 UTC · model grok-4.3
The pith
Radiomic AI models for lung cancer in CT improve from 0.79 sensitivity and 0.47 specificity to 0.90 and 0.79 under specific acquisition settings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a mixed-effects framework quantifies the influence of clinically relevant acquisition parameters on radiomic AI performance while accounting for subject-level random effects, and that adjusting parameters to tube current >= 200 mA, spiral pitch <= 1.5 and slice thickness <= 1.25 mm raises sensitivity from 0.79+-0.04 to 0.90+-0.10 and specificity from 0.47+-0.10 to 0.79+-0.13, with the gains reproduced when parameters chosen on one dataset are tested on the other.
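The stratified comparison behind these numbers can be sketched as follows. This is a minimal illustration, not the authors' code: the `Scan` record, its field names, and the helper functions are hypothetical, and treating every scan outside the reported region as "low quality" is an assumption, since the abstract does not define that group precisely. Only the three thresholds come from the paper.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract; all names below are illustrative.
MIN_TUBE_CURRENT_MA = 200.0
MAX_SPIRAL_PITCH = 1.5
MAX_SLICE_THICKNESS_MM = 1.25


@dataclass
class Scan:
    tube_current_ma: float
    spiral_pitch: float
    slice_thickness_mm: float
    is_malignant: bool         # ground-truth label
    predicted_malignant: bool  # model output


def meets_protocol(scan: Scan) -> bool:
    """True when all three acquisition parameters fall in the reported region."""
    return (
        scan.tube_current_ma >= MIN_TUBE_CURRENT_MA
        and scan.spiral_pitch <= MAX_SPIRAL_PITCH
        and scan.slice_thickness_mm <= MAX_SLICE_THICKNESS_MM
    )


def sensitivity_specificity(scans):
    """Return (sensitivity, specificity); None where a class is absent."""
    tp = sum(s.is_malignant and s.predicted_malignant for s in scans)
    fn = sum(s.is_malignant and not s.predicted_malignant for s in scans)
    tn = sum(not s.is_malignant and not s.predicted_malignant for s in scans)
    fp = sum(not s.is_malignant and s.predicted_malignant for s in scans)
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


def stratified_metrics(scans):
    """Compute metrics separately for in-protocol and out-of-protocol scans."""
    high = [s for s in scans if meets_protocol(s)]
    low = [s for s in scans if not meets_protocol(s)]
    return {
        "high_quality": sensitivity_specificity(high),
        "low_quality": sensitivity_specificity(low),
    }
```

Pooling scans this way ignores the subject-level correlation that the paper's mixed-effects framework is meant to handle, so it shows only how the two groups are formed and scored.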
What carries the argument
A mixed-effects framework that models fixed effects of acquisition parameters on performance metrics while treating subject variations as random effects, applied to lung cancer classification across two independent multicentre CT datasets.
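One generic way to write such a model is shown below, as a sketch only. The abstract does not state the response variable, the link function, or how the covariates are coded, so the binary outcome, the logit link, and the linear terms are all assumptions made for illustration.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0
  + \beta_1\,\mathrm{current}_{ij}
  + \beta_2\,\mathrm{pitch}_{ij}
  + \beta_3\,\mathrm{thickness}_{ij}
  + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Here $y_{ij} = 1$ when the classifier is correct on scan $j$ of subject $i$, the $\beta_k$ are the fixed effects of tube current, spiral pitch, and slice thickness, and $u_i$ is the subject-level random intercept.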
If this is right
- Standardising on the identified parameter ranges improves cross-dataset robustness while, according to the abstract, keeping radiation dose low.
- The framework can flag clinically relevant parameter regions that stabilise model output across centres.
- Low-quality scans suffer measurable drops that are recoverable by moving to the higher-quality end of the identified ranges.
- The same performance lift appears across several state-of-the-art architectures when the parameters are adjusted.
- Selecting parameter ranges on the authors' own collected dataset and testing them on the public set demonstrates that the gains reproduce across databases.
Where Pith is reading between the lines
- Hospitals could adopt these thresholds as default protocols for radiomic AI workflows to reduce the need for later domain adaptation.
- The same mixed-effects approach might reveal whether identical parameter ranges matter for other cancer sites or imaging modalities.
- If the gains hold, future studies could test whether controlling acquisition upfront reduces reliance on post-hoc correction methods.
- A direct follow-up would apply the framework to a third independent dataset to check whether the same three parameters remain optimal.
Load-bearing premise
The mixed-effects model can separate the effects of acquisition parameters from subject-level differences and other unmeasured factors in multicentre data.
What would settle it
Re-analysis on a fresh multicentre cohort shows no statistically significant performance difference between the recommended parameter ranges and other commonly used settings.
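A simple version of that check is a two-proportion z-test on, say, the sensitivity observed in each group. The sketch below uses only the standard library. The function name is hypothetical, and the test assumes independent scans, which the paper's subject-level random effects suggest is not strictly true, so it is a rough screen rather than the analysis the paper would need.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for equality of two proportions.

    Returns (z, p_value). Assumes independent observations in each group.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis p_a == p_b.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups are all successes or all failures: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With hypothetical counts, `two_proportion_z_test(45, 50, 35, 50)` compares 45 of 50 correctly detected cancers in one group against 35 of 50 in the other. A clustered or mixed-effects test would be the more faithful choice when subjects contribute several scans.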
Original abstract
A main barrier for the deployment of AI radiomic systems in clinical routine is their drop in performance under heterogeneous multicentre acquisition protocols. This work presents a performance-oriented framework for quantifying scan parameter sensitivity of radiomic AI models, while identifying clinically significant parameter regions associated with improved cross-dataset robustness. We formulate a mixed-effects framework for quantifying the influence that clinically relevant acquisition parameters have on model performance, while accounting for subject-level random effects. We have applied our framework to lung cancer diagnosis in CT scans using two independent multicentre datasets (a public database and own-collected data) and several SoA architectures. To evaluate across-database reproducibility, CT parameters have been adjusted using the data collected and tested on the public set. The optimal configuration selected is the current of the X-ray tube >= 200 mA, spiral pitch <= 1.5, slice thickness <= 1.25 mm, which balances diagnostic quality with low radiation dose. This configuration pushes metrics from 0.79+-0.04 sensitivity, 0.47+-0.10 specificity in low quality scans to 0.90+-0.10 sensitivity, 0.79+-0.13 specificity in high quality ones.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a performance-oriented framework using a mixed-effects model to quantify the sensitivity of radiomic AI models to CT acquisition parameters (tube current, spiral pitch, slice thickness) for lung cancer diagnosis across multicentre datasets. It identifies an optimal configuration (tube current >= 200 mA, pitch <= 1.5, slice thickness <= 1.25 mm) that improves sensitivity from 0.79+-0.04 to 0.90+-0.10 and specificity from 0.47+-0.10 to 0.79+-0.13, and evaluates across-database reproducibility by deriving parameters from collected data and testing on a public set.
Significance. If the framework correctly isolates parameter effects, the work would provide actionable guidance for standardizing acquisition protocols to enhance cross-centre robustness of radiomic AI systems, addressing a key deployment barrier. The use of two independent multicentre datasets and an across-database test is a methodological strength for assessing generalizability.
major comments (2)
- [Abstract] Abstract (mixed-effects framework): The model accounts only for subject-level random effects while quantifying influence of acquisition parameters. In multicentre CT data, these parameters are typically confounded with center-specific factors (scanner vendor, reconstruction kernel, patient population) that are not stated as included in the model. If center-level variation is absorbed into the parameter coefficients, the identified optimal region (>=200 mA, <=1.5 pitch, <=1.25 mm) and the reported lift cannot be attributed cleanly to the parameters themselves.
- [Abstract] Abstract (across-database reproducibility evaluation): The optimal parameter thresholds appear derived from analysis of the collected dataset, raising the possibility that the reported improvements reflect data-driven selection rather than independent prediction. The abstract provides no details on model validation, statistical significance testing of the framework, handling of multiple comparisons, or potential data selection effects, limiting assessment of whether the central claim is supported.
minor comments (1)
- The abstract mentions 'several SoA architectures' without specifying which models were used, which would aid in assessing the generality of the findings.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our manuscript. We address each of the major comments below and outline the revisions we will make to strengthen the paper.
- Referee: [Abstract] Abstract (mixed-effects framework): The model accounts only for subject-level random effects while quantifying influence of acquisition parameters. In multicentre CT data, these parameters are typically confounded with center-specific factors (scanner vendor, reconstruction kernel, patient population) that are not stated as included in the model. If center-level variation is absorbed into the parameter coefficients, the identified optimal region (>=200 mA, <=1.5 pitch, <=1.25 mm) and the reported lift cannot be attributed cleanly to the parameters themselves.
  Authors: We agree that the potential confounding with center-specific factors is an important consideration not explicitly addressed in the current model. The mixed-effects framework was designed to account for subject-level variability while estimating the fixed effects of the acquisition parameters. To provide a cleaner attribution, we will revise the analysis to include center as a random effect in addition to subject. This will help isolate the parameter effects from center-level variations.
  Revision: yes
- Referee: [Abstract] Abstract (across-database reproducibility evaluation): The optimal parameter thresholds appear derived from analysis of the collected dataset, raising the possibility that the reported improvements reflect data-driven selection rather than independent prediction. The abstract provides no details on model validation, statistical significance testing of the framework, handling of multiple comparisons, or potential data selection effects, limiting assessment of whether the central claim is supported.
  Authors: The across-database test uses thresholds derived from the collected dataset and applies them to the independent public dataset, which supports the reproducibility claim. However, we acknowledge that the abstract lacks sufficient methodological details. In the revised manuscript, we will expand the abstract to include information on the validation procedure, statistical significance testing, handling of multiple comparisons, and clarification on data selection to better support the central claims.
  Revision: yes
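The revision proposed in the first response, adding center as a random effect alongside subject, could take a form like the one below. This is a hedged sketch: the logit link, the binary correctness outcome, and the random-intercept-only structure are assumptions, since neither the abstract nor the rebuttal specifies them.

```latex
\operatorname{logit}\,\Pr(y_{ijc} = 1)
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ijc} + u_i + v_c,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad v_c \sim \mathcal{N}(0, \sigma_v^2)
```

Here $y_{ijc} = 1$ when the classifier is correct on scan $j$ of subject $i$ acquired at center $c$, $\mathbf{x}_{ijc}$ collects tube current, spiral pitch, and slice thickness, $u_i$ is the subject intercept, and $v_c$ is the center intercept. If a center uses a single fixed protocol, the parameter effects and $v_c$ are only weakly separable, which is the confounding the referee describes.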
Circularity Check
No significant circularity detected
The paper presents an empirical sensitivity analysis via a mixed-effects model applied to two multicentre CT datasets. Optimal parameter thresholds are identified from the collected data and evaluated for reproducibility on the public set, with performance metrics reported as observed differences between parameter-defined scan quality groups. No equations, self-citations, or uniqueness claims are present that reduce any result to its inputs by construction. The workflow is a standard data-driven observational study; the across-database step supplies an independent test set. No load-bearing self-definitional, fitted-prediction, or ansatz-smuggling patterns appear in the provided text.
Axiom & Free-Parameter Ledger
free parameters (1)
- Optimal parameter thresholds = tube current >= 200 mA, spiral pitch <= 1.5, slice thickness <= 1.25 mm
axioms (1)
- Domain assumption: the mixed-effects model quantifies acquisition parameter influence on model performance while accounting for subject-level random effects