How Sensitive Are Radiomic AI Models to Acquisition Parameters?

C. Sanchez; D. Gil; I. Sanchez

arxiv: 2605.14667 · v1 · pith:7FE5JU6Hnew · submitted 2026-05-14 · 💻 cs.AI

Pith reviewed 2026-06-30 21:01 UTC · model grok-4.3

keywords: radiomics · CT acquisition parameters · mixed-effects model · lung cancer diagnosis · AI robustness · multicentre data · sensitivity analysis · cross-dataset reproducibility

The pith

Radiomic AI models for lung cancer in CT improve from 0.79 sensitivity and 0.47 specificity to 0.90 and 0.79 under specific acquisition settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a mixed-effects statistical framework to measure how CT acquisition parameters influence the accuracy of radiomic AI models for lung cancer diagnosis while separating out patient-to-patient differences. It applies the framework to two separate multicentre datasets and finds that raising tube current to at least 200 mA, keeping spiral pitch at or below 1.5, and limiting slice thickness to 1.25 mm or less produces the largest gains in cross-centre performance. A reader would care because variable scan protocols are a primary reason these AI tools fail when moved from one hospital to another, and the work points to concrete parameter ranges that can be adopted without raising patient radiation exposure.
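
As a rough illustration of what adopting these ranges could look like in practice, the sketch below flags whether a scan's protocol falls inside the quoted region. The `ScanProtocol` record, the function names, and the idea of splitting a cohort into "inside" and "outside" groups are assumptions made for this example; only the three threshold values come from the text above.

```python
from dataclasses import dataclass

# Threshold values quoted in the text; the constant names are illustrative.
MIN_TUBE_CURRENT_MA = 200.0
MAX_SPIRAL_PITCH = 1.5
MAX_SLICE_THICKNESS_MM = 1.25


@dataclass(frozen=True)
class ScanProtocol:
    """Hypothetical record of the three acquisition parameters of one CT scan."""
    tube_current_ma: float
    spiral_pitch: float
    slice_thickness_mm: float


def in_recommended_region(scan: ScanProtocol) -> bool:
    """Return True when all three parameters fall inside the quoted ranges."""
    return (
        scan.tube_current_ma >= MIN_TUBE_CURRENT_MA
        and scan.spiral_pitch <= MAX_SPIRAL_PITCH
        and scan.slice_thickness_mm <= MAX_SLICE_THICKNESS_MM
    )


def split_by_region(scans):
    """Partition scans into (inside, outside) the recommended region."""
    inside = [s for s in scans if in_recommended_region(s)]
    outside = [s for s in scans if not in_recommended_region(s)]
    return inside, outside
```

The boundaries are inclusive because the thresholds are stated as ">=" and "<=". Whether the paper's low-quality group is exactly the complement of this region is not stated here, so the "outside" group should not be read as the paper's definition.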

Core claim

The paper claims that a mixed-effects framework quantifies the influence of clinically relevant acquisition parameters on radiomic AI performance while accounting for subject-level random effects, and that adjusting parameters to tube current >= 200 mA, spiral pitch <= 1.5 and slice thickness <= 1.25 mm raises sensitivity from 0.79+-0.04 to 0.90+-0.10 and specificity from 0.47+-0.10 to 0.79+-0.13, with the gains reproduced when parameters chosen on one dataset are tested on the other.

What carries the argument

A mixed-effects framework that models fixed effects of acquisition parameters on performance metrics while treating subject variations as random effects, applied to lung cancer classification across two independent multicentre CT datasets.
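
For orientation, one generic way to write such a model is shown below. It assumes a binary per-sample outcome and a logistic link; the paper's exact specification is not given in this summary, so every symbol here is illustrative rather than the authors' notation.

```latex
\operatorname{logit}\Pr(y_{ij} = 1)
  = \beta_0
  + \beta_1\, I_{ij}
  + \beta_2\, P_{ij}
  + \beta_3\, T_{ij}
  + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Here $y_{ij} = 1$ means the AI model is correct on sample $j$ of subject $i$; $I_{ij}$, $P_{ij}$ and $T_{ij}$ stand for tube current, spiral pitch and slice thickness; the $\beta$ terms are the fixed effects of interest; and $u_i$ is the subject-level random intercept that absorbs patient-to-patient differences.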

If this is right

Standardising on the identified parameter ranges improves cross-dataset robustness without increasing radiation dose.
The framework can flag clinically relevant parameter regions that stabilise model output across centres.
Low-quality scans suffer measurable drops that are recoverable by moving to the higher-quality end of the identified ranges.
The same performance lift appears across several state-of-the-art architectures when the parameters are adjusted.
Adjusting parameters on one collected dataset and validating on a public set demonstrates reproducibility of the gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hospitals could adopt these thresholds as default protocols for radiomic AI workflows to reduce the need for later domain adaptation.
The same mixed-effects approach might reveal whether identical parameter ranges matter for other cancer sites or imaging modalities.
If the gains hold, future studies could test whether controlling acquisition upfront reduces reliance on post-hoc correction methods.
A direct follow-up would apply the framework to a third independent dataset to check whether the same three parameters remain optimal.

Load-bearing premise

The mixed-effects model can separate the effects of acquisition parameters from subject-level differences and other unmeasured factors in multicentre data.

What would settle it

Re-analysis on a fresh multicentre cohort shows no statistically significant performance difference between the recommended parameter ranges and other commonly used settings.
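
One simple way such a re-analysis could be run is a permutation test on per-scan correctness. This is a stand-in technique chosen for illustration, not the paper's mixed-effects analysis, and it ignores subject-level clustering; the function name and the 0/1 encoding are assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean between two groups.

    Each element is 1 if the model classified that scan correctly, else 0.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign scans to the two groups at random, keeping group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

A large p-value for scans inside versus outside the recommended ranges on a fresh cohort would be the kind of null result described above.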

Figures

Figures reproduced from arXiv: 2605.14667 by C. Sanchez, D. Gil, I. Sanchez.

**Figure 1.** Scheme of the main steps in the optimization of the acquisition parameters. Source excerpt (truncated): "… HQ conditions, we also require that performance in HQ and LQ is, respectively, significantly higher and lower than performance obtained in the whole set. To do so, a multi-comparison test between LQ, HQ and ALL metrics is applied to identify significant differences for each metric in the pairs (LQ, ALL), (HQ, ALL) …"

**Figure 2.** Distribution of CT acquisition parameters across samples of LUNA Database (a) and Radiolung Database (b). Source excerpt (truncated): "… Performance Sensitivity. The impact of Pareto optimal parameters was evaluated in both datasets using the model-specific formulations of Section 2.2. Independent models were adjusted for each set. Different fixed-effects models were adjusted for each dataset to assess method-specific sensitivities. …"

**Figure 3.** Acquisition parameter optimization procedure applied to Radiolung database. Source excerpt (truncated): "… [200,1.5,1.25] with CI=[0.09,0.17] and 0.13 average. The configuration chosen for the remaining experiments is [200,1.5,1.25]. …"

**Figure 4.** Performance decay for the LUNA16 (a) and Radiolung (b) databases.

**Figure 5.** Performance boxplots for weighted F1 score (a) and accuracy (b). Source excerpt (truncated): "… in radiomic domain and DenseNet169 in intensity domain are the backbones that achieve best performance for small and non-small nodules in HQ conditions. This suggests stronger representation capacity of foundation models when combined with radiomic 3d representation domains. 6.3. Limitations We are aware that OR estimates differences in the …"

Original abstract

A main barrier for the deployment of AI radiomic systems in clinical routine is their drop in performance under heterogeneous multicentre acquisition protocols. This work presents a performance-oriented framework for quantifying the scan-parameter sensitivity of radiomic AI models, while identifying clinically significant parameter regions associated with improved cross-dataset robustness. We formulate a mixed-effects framework for quantifying the influence that clinically relevant acquisition parameters have on model performance, while accounting for subject-level random effects. We have applied our framework to lung cancer diagnosis in CT scans using two independent multicentre datasets (a public database and our own collected data) and several SoA architectures. To evaluate across-database reproducibility, CT parameters have been adjusted using the collected data and tested on the public set. The optimal configuration selected is X-ray tube current >= 200 mA, spiral pitch <= 1.5, and slice thickness <= 1.25 mm, which balances diagnostic quality with low radiation dose. This configuration pushes metrics from 0.79+-0.04 sensitivity and 0.47+-0.10 specificity in low-quality scans to 0.90+-0.10 sensitivity and 0.79+-0.13 specificity in high-quality ones.
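
For readers less familiar with the two headline metrics, the sketch below shows how sensitivity and specificity are computed from binary labels and predictions. The function name and the convention that 1 marks a malignant case are assumptions for this example; the toy labels are made up and are not data from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed convention: 1 = malignant (positive), 0 = benign (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 malignant and 4 benign cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 1, 1]
print(sensitivity_specificity(labels, preds))  # (0.75, 0.5)
```

Read this way, a specificity of 0.47 means that roughly half of the benign cases in the low-quality scans were flagged as malignant.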

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The mixed-effects model only includes subject random effects, so center and scanner confounders likely drive the reported gains rather than the acquisition parameters themselves.

The main takeaway is that this paper applies a mixed-effects model to link CT acquisition parameters to radiomic AI performance on lung cancer diagnosis, then picks an optimal region and shows metric improvement when testing across two multicentre datasets.

It does a solid job of using independent data sources and running an across-database check: parameters tuned on their collected scans are evaluated on the public set. That step is a practical way to look for reproducibility, and the reported lift from 0.79 to 0.90 sensitivity and 0.47 to 0.79 specificity under the chosen settings gives a concrete target.

The soft spot sits in the model itself. The abstract states it accounts only for subject-level random effects while treating tube current, pitch, and slice thickness as fixed. In real multicentre CT, those parameters are tied to site, vendor, and reconstruction choices, so the coefficients probably capture those unmodeled factors. The optimal thresholds (>=200 mA, <=1.5 pitch, <=1.25 mm) and the performance numbers therefore cannot be cleanly attributed to the parameters. The wide standard deviations and lack of any mention of center effects or multiple-comparison handling add to the uncertainty.

This is aimed at imaging researchers who need to make radiomics work across hospitals. It deserves a serious referee because the deployment barrier it targets is real and the cross-dataset test is a reasonable start, but the authors will have to show the model isolates the parameters or add center-level terms. I would bring it to a reading group to walk through the exact random-effects specification.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a performance-oriented framework using a mixed-effects model to quantify the sensitivity of radiomic AI models to CT acquisition parameters (tube current, spiral pitch, slice thickness) for lung cancer diagnosis across multicentre datasets. It identifies an optimal configuration (tube current >= 200 mA, pitch <= 1.5, slice thickness <= 1.25 mm) that improves sensitivity from 0.79+-0.04 to 0.90+-0.10 and specificity from 0.47+-0.10 to 0.79+-0.13, and evaluates across-database reproducibility by deriving parameters from collected data and testing on a public set.

Significance. If the framework correctly isolates parameter effects, the work would provide actionable guidance for standardizing acquisition protocols to enhance cross-centre robustness of radiomic AI systems, addressing a key deployment barrier. The use of two independent multicentre datasets and an across-database test is a methodological strength for assessing generalizability.

major comments (2)

[Abstract] (mixed-effects framework): The model accounts only for subject-level random effects while quantifying influence of acquisition parameters. In multicentre CT data, these parameters are typically confounded with center-specific factors (scanner vendor, reconstruction kernel, patient population) that are not stated as included in the model. If center-level variation is absorbed into the parameter coefficients, the identified optimal region (>=200 mA, <=1.5 pitch, <=1.25 mm) and the reported lift cannot be attributed cleanly to the parameters themselves.
[Abstract] (across-database reproducibility evaluation): The optimal parameter thresholds appear derived from analysis of the collected dataset, raising the possibility that the reported improvements reflect data-driven selection rather than independent prediction. The abstract provides no details on model validation, statistical significance testing of the framework, handling of multiple comparisons, or potential data selection effects, limiting assessment of whether the central claim is supported.

minor comments (1)

The abstract mentions 'several SoA architectures' without specifying which models were used, which would aid in assessing the generality of the findings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We address each of the major comments below and outline the revisions we will make to strengthen the paper.

Referee: [Abstract] (mixed-effects framework): The model accounts only for subject-level random effects while quantifying influence of acquisition parameters. In multicentre CT data, these parameters are typically confounded with center-specific factors (scanner vendor, reconstruction kernel, patient population) that are not stated as included in the model. If center-level variation is absorbed into the parameter coefficients, the identified optimal region (>=200 mA, <=1.5 pitch, <=1.25 mm) and the reported lift cannot be attributed cleanly to the parameters themselves.

Authors: We agree that the potential confounding with center-specific factors is an important consideration not explicitly addressed in the current model. The mixed-effects framework was designed to account for subject-level variability while estimating the fixed effects of the acquisition parameters. To provide a cleaner attribution, we will revise the analysis to include center as a random effect in addition to subject. This will help isolate the parameter effects from center-level variations. (Revision: yes.)

Referee: [Abstract] (across-database reproducibility evaluation): The optimal parameter thresholds appear derived from analysis of the collected dataset, raising the possibility that the reported improvements reflect data-driven selection rather than independent prediction. The abstract provides no details on model validation, statistical significance testing of the framework, handling of multiple comparisons, or potential data selection effects, limiting assessment of whether the central claim is supported.

Authors: The across-database test uses thresholds derived from the collected dataset and applies them to the independent public dataset, which supports the reproducibility claim. However, we acknowledge that the abstract lacks sufficient methodological details. In the revised manuscript, we will expand the abstract to include information on the validation procedure, statistical significance testing, handling of multiple comparisons, and clarification on data selection to better support the central claims. (Revision: yes.)
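
The revision promised in the first response, adding center as a random effect alongside subject, could take a form like the one below. This is a generic sketch assuming a binary outcome and a logistic link; the symbols are illustrative and not taken from the manuscript.

```latex
\operatorname{logit}\Pr(y_{cij} = 1)
  = \beta_0
  + \beta_1\, I_{cij}
  + \beta_2\, P_{cij}
  + \beta_3\, T_{cij}
  + v_c + u_{ci},
\qquad v_c \sim \mathcal{N}(0, \sigma_v^2),
\quad u_{ci} \sim \mathcal{N}(0, \sigma_u^2)
```

Here $c$ indexes centers, $i$ subjects within a center and $j$ samples within a subject; $I$, $P$ and $T$ denote tube current, spiral pitch and slice thickness; $v_c$ is the center-level random intercept and $u_{ci}$ the subject-level one. If a parameter barely varies within a center, its fixed effect and $v_c$ become hard to separate, so the extra term helps attribution only to the extent that protocols differ inside centers.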

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical sensitivity analysis via a mixed-effects model applied to two multicentre CT datasets. Optimal parameter thresholds are identified from the collected data and evaluated for reproducibility on the public set, with performance metrics reported as observed differences between parameter-defined scan quality groups. No equations, self-citations, or uniqueness claims are present that reduce any result to its inputs by construction. The workflow is a standard data-driven observational study; the across-database step supplies an independent test set. No load-bearing self-definitional, fitted-prediction, or ansatz-smuggling patterns appear in the provided text.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review; ledger limited to explicitly stated elements. The mixed-effects model is presented as the core method without additional free parameters or invented entities described.

free parameters (1)

Optimal parameter thresholds = tube current >= 200 mA, spiral pitch <= 1.5, slice thickness <= 1.25 mm
Thresholds derived from the analysis to identify regions of improved performance.

axioms (1)

Domain assumption: the mixed-effects model quantifies acquisition parameter influence on model performance while accounting for subject-level random effects.
Invoked as the basis for the performance-oriented framework in the abstract.

pith-pipeline@v0.9.1-grok · 5739 in / 1343 out tokens · 54509 ms · 2026-06-30T21:01:07.726185+00:00 · methodology

