Useful nonrobust features are ubiquitous in biomedical images

Christopher Hansen, Claus-C. Gl\"uer, Coenraad Mouton, Jan-Bernd H\"ovener, Nicolai Krekiehn, Niklas C. Koser, Randle Rabe

Authors on Pith no claims yet

Pith reviewed 2026-05-08 08:55 UTC · model grok-4.3

classification 📡 eess.IV cs.CVcs.LG

keywords nonrobust featuresbiomedical imagesmedical image classificationadversarial trainingdistribution shiftsrobustnessaccuracy trade-off

0 comments

The pith

Nonrobust features are useful predictors in biomedical images but create a trade-off with robustness to distribution shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines deep networks for medical imaging to see if they learn nonrobust features that are predictive yet vulnerable to small changes and not interpretable by humans. It finds that models trained exclusively on these features still classify images well above chance level in five standard tasks. Adversarially trained models that focus on robust features lose some accuracy on normal tests but do better when the image distribution shifts in controlled ways. The work establishes that nonrobust features enhance typical performance at the cost of stability under changes, pointing to a need to balance the two based on how the system will be used.

Core claim

Deep networks for medical imaging classification learn useful nonrobust features that allow models trained only on them to achieve well above chance accuracy on standard tasks, while adversarially trained models relying on robust features show improved performance under distribution shifts despite lower in-distribution accuracy, indicating a robustness-accuracy trade-off.

What carries the argument

Nonrobust features, defined as predictive input patterns in images that are not human interpretable and highly susceptible to small adversarial perturbations. They enable higher standard accuracy while causing degradation under distribution shifts.

Load-bearing premise

That the performance differences between standard and adversarially trained models are caused by the presence or absence of nonrobust features rather than other aspects of the training process.

What would settle it

Observing that models trained exclusively on nonrobust features achieve only chance accuracy on the classification tasks, or that adversarially trained models do not show better performance under the controlled distribution shifts.

read the original abstract

We study whether deep networks for medical imaging learn useful nonrobust features - predictive input patterns that are not human interpretable and highly susceptible to small adversarial perturbations - and how these features impact test performance. We show that models trained only on nonrobust features achieve well above chance accuracy across five MedMNIST classification tasks, confirming their predictive value in-distribution. Conversely, adversarially trained models that primarily rely on robust features sacrifice in-distribution accuracy but yield markedly better performance under controlled distribution shifts (MedMNIST-C). Overall, nonrobust features boost standard accuracy yet degrade out-of-distribution performance, revealing a practical robustness-accuracy trade-off in medical imaging classification tasks that should be tailored to the requirements of the deployment setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that deep networks for medical imaging learn useful nonrobust features (predictive patterns susceptible to small adversarial perturbations) that are ubiquitous across biomedical images. Models trained only on these features achieve well above chance accuracy on five MedMNIST classification tasks, confirming in-distribution predictive value, while adversarially trained models relying on robust features sacrifice standard accuracy but perform better under controlled distribution shifts (MedMNIST-C), revealing a robustness-accuracy trade-off that should be tailored to deployment needs.

Significance. If the central empirical results hold after addressing methodological gaps, the work extends the nonrobust features concept from natural images to the medical domain and provides actionable evidence that nonrobust features boost in-distribution performance at the cost of OOD robustness. This has direct implications for medical AI deployment where distribution shifts are common, and the use of standard datasets like MedMNIST with reproducible training experiments is a strength.

major comments (3)

[Methods] Methods section: The nonrobust feature isolation procedure (via adversarial training or feature splitting) is not described with sufficient controls or ablations to demonstrate that the above-chance accuracies arise from inherent predictive nonrobust patterns rather than artifacts of the isolation process, such as leakage of label information or interactions with MedMNIST properties like small size and class imbalance.
[Experiments] Experiments and Results sections: The claim of 'well above chance accuracy' across tasks lacks reported statistical details (e.g., number of random seeds, variance across runs, or significance tests), and no comparison is made to baseline models or alternative isolation methods, weakening the ubiquity and trade-off conclusions.
[Results] Results on MedMNIST-C: While robust models show better OOD performance, the manuscript does not include targeted ablations (e.g., perturbing only nonrobust features) to confirm that the observed degradation is specifically caused by reliance on nonrobust features rather than other model properties.

minor comments (2)

[Abstract] Abstract: Specify the exact five MedMNIST tasks used, as this aids reproducibility and context for the ubiquity claim.
[Figures] Notation and figures: Ensure consistent use of terms like 'nonrobust features' and improve clarity of any plots showing accuracy vs. robustness trade-offs by adding error bars or confidence intervals.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have revised the manuscript to address the methodological and experimental concerns, adding controls, statistical reporting, baselines, and targeted ablations as described below.

read point-by-point responses

Referee: [Methods] Methods section: The nonrobust feature isolation procedure (via adversarial training or feature splitting) is not described with sufficient controls or ablations to demonstrate that the above-chance accuracies arise from inherent predictive nonrobust patterns rather than artifacts of the isolation process, such as leakage of label information or interactions with MedMNIST properties like small size and class imbalance.

Authors: We agree that additional controls strengthen the isolation claims. The revised Methods section now provides the full adversarial training procedure (PGD with epsilon=0.03, 10 steps, step size 0.007) and feature splitting details. We added three ablations: (1) label-shuffled training yields chance-level accuracy, ruling out leakage; (2) class-balanced subsampling of MedMNIST shows consistent above-chance results, addressing imbalance; (3) resolution scaling experiments on a subset confirm small image size does not drive the effect. These demonstrate the predictive value arises from inherent nonrobust patterns. revision: yes
Referee: [Experiments] Experiments and Results sections: The claim of 'well above chance accuracy' across tasks lacks reported statistical details (e.g., number of random seeds, variance across runs, or significance tests), and no comparison is made to baseline models or alternative isolation methods, weakening the ubiquity and trade-off conclusions.

Authors: We accept that statistical details and baselines improve rigor. The revised Experiments section reports all accuracies as mean ± std over 5 random seeds, with one-sample t-tests vs. chance (p < 0.01 for all tasks). We added baselines: linear SVM on raw pixels, robust-feature-only models, and an alternative non-adversarial feature splitting method. Nonrobust models consistently exceed chance and the linear baseline while underperforming full models, supporting ubiquity; the accuracy-robustness trade-off is now shown with error bars across methods. revision: yes
Referee: [Results] Results on MedMNIST-C: While robust models show better OOD performance, the manuscript does not include targeted ablations (e.g., perturbing only nonrobust features) to confirm that the observed degradation is specifically caused by reliance on nonrobust features rather than other model properties.

Authors: We agree a direct causal link requires targeted ablation. The revised Results include a new experiment using feature splitting to isolate components, then selectively perturbing only nonrobust features on MedMNIST-C test sets. This produces OOD drops matching full attacks on standard models, while robust-feature perturbations affect adversarially trained models more. This confirms the degradation is specifically attributable to nonrobust feature reliance rather than unrelated model properties. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical study with direct experimental evaluation

full rationale

The paper is an empirical investigation that trains models on isolated nonrobust features and evaluates accuracy on MedMNIST tasks and MedMNIST-C shifts. No mathematical derivation chain, equations, or self-referential predictions exist that reduce to inputs by construction. Results follow from standard training procedures and held-out test evaluation rather than any fitted parameter renamed as a prediction or ansatz smuggled via self-citation. The isolation of nonrobust features relies on established adversarial methods, but the paper presents no load-bearing self-citation or uniqueness theorem that collapses the central claim. This matches the default expectation for experimental papers: self-contained against external benchmarks with no circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical ML study; it relies on standard assumptions of deep network training and adversarial robustness definitions from prior literature rather than new axioms or invented entities.

pith-pipeline@v0.9.0 · 5447 in / 1166 out tokens · 34078 ms · 2026-05-08T08:55:46.627952+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Useful nonrobust features are ubiquitous in biomedical images

INTRODUCTION Deep neural networks (DNNs) are increasingly being de- ployed in medical imaging domains such as radiology, digital pathology, and ophthalmology. Given the high-stakes of these application domains, it is critical that these models are reliable, trustworthy and (ideally) explainable. Despite this, prior work on natural images has shown that ne...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

Letfdenote a deep neural network (DNN) with parametersθtrained on a dataset of sample-label pairsD={(x i, yi)}N i=1 using a suitable loss functionL

ROBUST AND NONROBUST FEA TURES In this section we introduce key notation and formalize robust and nonrobust features. Letfdenote a deep neural network (DNN) with parametersθtrained on a dataset of sample-label pairsD={(x i, yi)}N i=1 using a suitable loss functionL. We define auseful featurelearned byffor a classcas a function gc :R d →Rwhich is correlate...
[3]

nonrobust fea- tures

EXPERIMENTAL SETUP In this section we describe the datasets, models, and proce- dures used to isolate and evaluate robust vs. nonrobust fea- tures. 3.1. Isolating nonrobust features Our first avenue of inquiry concerns determining whether well generalizing nonrobust features can be found in the vari- ous medical imaging datasets we consider. More specific...
[4]

RESULTS In Fig. 2 we report the in-distribution (solid bars) and OOD performance (dashed bars) across five MedMNIST datasets, comparing base models (trained on all features; blue), robust models (green), and nonrobust-only models (red). Balanced accuracy is used throughout due to class imbalance. 4.1. Comparing in-distribution test performance When consid...
[5]

While still inferior to robust features, they perform well above chance

DISCUSSION Let us consider our various observations and revisit the two originally posed questions: 1.Do DNNs learn useful nonrobust features from medi- cal images?Yes - nonrobust features alone yield accurate classification on several datasets. While still inferior to robust features, they perform well above chance. By con- trast, under OOD shifts, accur...
[6]

This clarifies a robustness- accuracy trade-off that should be taken into account

CONCLUSION In conclusion, we have shown that (1) nonrobust features are present in medical imaging datasets and exploited by DNNs, (2) these features are predictive in-distribution, and suppress- ing them harms in-distribution accuracy, yet (3) their presence degrades OOD performance, whereas robust features alone are markedly more invariant. This clarifi...
[7]

Ethical approval was not required as confirmed by the license attached with the open access data

COMPLIANCE WITH ETHICAL STANDARDS This research study was conducted retrospectively using hu- man subject data made available in open access by the MedM- NIST and MedMNIST-C datasets [5, 12]. Ethical approval was not required as confirmed by the license attached with the open access data
[8]

ACKNOWLEDGEMENTS This project was supported by the modular AI Imaging Pipelines (mAIPipes) Grant, Application No. 22024025 KI- F¨orderrichtlinie Schleswig-Holstein, Germany, and supported in part by the National Research Foundation of South Africa (Ref Numbers PSTD23042898868, RCDL240215206999)
[9]

Adversarial examples are not bugs, they are features,

Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Lo- gan Engstrom, Brandon Tran, and Aleksander Madry, “Adversarial examples are not bugs, they are features,” inAdvances in Neural Information Processing Systems, 2019

2019
[10]

What can the neu- ral tangent kernel tell us about adversarial robustness?,

Nikolaos Tsilivis and Julia Kempe, “What can the neu- ral tangent kernel tell us about adversarial robustness?,” inAdvances in Neural Information Processing Systems, 2022

2022
[11]

Robustness may be at odds with accuracy,

Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry, “Robustness may be at odds with accuracy,” inInternational Confer- ence on Learning Representations, 2019

2019
[12]

Ro- bustbench: a standardized adversarial robustness bench- mark,

Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, and Matthias Hein, “Ro- bustbench: a standardized adversarial robustness bench- mark,” inAdvances in Neural Information Processing Systems, 2021

2021
[13]

Medmnist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification,

Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, and Bingbing Ni, “Medmnist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification,”Scientific Data, vol. 10, 2023

2023
[14]

Wide resid- ual networks,

Sergey Zagoruyko and Nikos Komodakis, “Wide resid- ual networks,” inProceedings of the British Machine Vision Conference (BMVC), 2016

2016
[15]

Towards deep learning models resistant to adversarial attacks,

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations, 2018

2018
[16]

Theoreti- cally principled trade-off between robustness and accu- racy,

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan, “Theoreti- cally principled trade-off between robustness and accu- racy,” inInternational Conference on Machine Learn- ing, 2019

2019
[17]

Overfitting in adversarially robust deep learning,

Leslie Rice, Eric Wong, and Zico Kolter, “Overfitting in adversarially robust deep learning,” inInternational Conference on Machine Learning, 2020

2020
[18]

Better diffusion models fur- ther improve adversarial training,

Zekai Wang, Tianyu Pang, Chao Du, Min Lin, Weiwei Liu, and Shuicheng Yan, “Better diffusion models fur- ther improve adversarial training,” inInternational Con- ference on Machine Learning, 2023

2023
[19]

Bag of tricks for adversarial training,

Tianyu Pang, Xiao Yang, Yinpeng Dong, Hang Su, and Jun Zhu, “Bag of tricks for adversarial training,” in International Conference on Learning Representations, 2021

2021
[20]

Medmnist-c: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions,

Francesco Di Salvo, Sebastian Doerrich, and Christian Ledig, “Medmnist-c: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions,”arXiv preprint arXiv:2406.17536, 2024

work page arXiv 2024
[21]

Reliable evalu- ation of adversarial robustness with an ensemble of di- verse parameter-free attacks,

Francesco Croce and Matthias Hein, “Reliable evalu- ation of adversarial robustness with an ensemble of di- verse parameter-free attacks,” inInternational Confer- ence on Machine Learning, 2020. A. ADDITIONAL RESULTS A.1. Area under the curve In the main paper, we rely on balanced accuracy to compare model performance across datasets. For completeness, we...

2020
[22]

All Features

Models that rely on nonrobust features are almost completely broken under these attacks, achieving0%balanced accuracy across datasets. 3 By contrast, adversarially trained models retain substantial performance, achieving between57%and74%balanced accuracy. Table 1 compares the adversarial performance of these models to the clean performance. Table 1. Clean...
[23]

This is expected, as larger perturbations impose a stronger constraint on the features that can be reliably used for classification by the model

Firstly, we observe that robust models trained with the larger perturbation budget consistently exhibit slightly lower test performance. This is expected, as larger perturbations impose a stronger constraint on the features that can be reliably used for classification by the model. Secondly, the performance of the nonrobust-only models varies across datas...