Recognition: no theorem link
ROAST: Risk-aware Outlier-exposure for Adversarial Selective Training of Anomaly Detectors Against Evasion Attacks
Pith reviewed 2026-05-14 23:21 UTC · model grok-4.3
The pith
ROAST improves anomaly detector recall against evasion attacks by selectively training on data from less vulnerable patients.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a risk-aware outlier-exposure selective training framework called ROAST, which identifies and trains on data from patients less vulnerable to evasion attacks while injecting adversarial samples for those patients, increases recall by 16.2% in black-box settings and 5.89% in white-box settings on average, reduces training time by 88.3%, and has minimal impact on precision compared to training indiscriminately on all data.
What carries the argument
Risk-aware selective training that classifies patients by vulnerability to attacks and applies outlier exposure selectively to the low-vulnerability subset to enhance robustness.
Load-bearing premise
The risk metric can reliably identify patients less vulnerable to evasion attacks, and excluding data from more vulnerable patients does not remove information critical for effective anomaly detection.
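Operationally, the premise amounts to a patient-level filter before training. The sketch below illustrates that filter; the risk metric `risk_score` and the threshold `tau` are hypothetical placeholders, since the review does not specify how ROAST actually computes them:

```python
import numpy as np

def select_training_patients(features_by_patient, risk_score, tau=0.5):
    """Split patients by a hypothetical vulnerability score and keep only
    the low-risk patients' data for AD training (tau is a placeholder).

    features_by_patient: dict patient_id -> (n_samples, n_features) array
    risk_score: dict patient_id -> float in [0, 1], higher = more vulnerable
    """
    low_risk = [p for p in features_by_patient if risk_score[p] <= tau]
    high_risk = [p for p in features_by_patient if risk_score[p] > tau]
    train_data = np.vstack([features_by_patient[p] for p in low_risk])
    return train_data, low_risk, high_risk

# Toy usage: three patients, ten samples each
rng = np.random.default_rng(0)
feats = {p: rng.normal(size=(10, 4)) for p in ["a", "b", "c"]}
risk = {"a": 0.2, "b": 0.9, "c": 0.4}
X, low, high = select_training_patients(feats, risk)
# Only patients "a" and "c" contribute rows; "b" is excluded from training
```

The premise is exactly that this exclusion step discards noise rather than signal; if the dropped high-risk patients carry anomaly patterns absent from the kept subset, the filter removes information the AD needs.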
What would settle it
Testing on a dataset where the risk classification fails to predict actual attack success rates, resulting in no recall improvement or even degradation when using the selective training method.
Original abstract
Safety-critical domains like healthcare rely on deep neural networks (DNNs) for prediction, yet DNNs remain vulnerable to evasion attacks. Anomaly detectors (ADs) are widely used to protect DNNs, but conventional ADs are trained indiscriminately on benign data from all patients, overlooking physiological differences that introduce noise, degrade robustness, and reduce recall. In this paper, we propose ROAST, a novel risk-aware outlier exposure (OE) selective training framework that improves AD recall while largely preserving precision. ROAST identifies patients who are less vulnerable to attack and focuses training on these cleaner, more reliable data, thereby reducing false negatives and improving recall. To preserve precision, the framework applies OE by injecting adversarial samples into the training set of the less vulnerable patients, avoiding noisy data from others. Experiments show that ROAST increases recall by 16.2% (black-box attack setting) and 5.89% (white-box attack setting) on average while reducing the training time by 88.3% on average compared to indiscriminate training, with minimal impact on precision.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ROAST, a risk-aware outlier-exposure selective training framework for anomaly detectors (ADs) in safety-critical healthcare DNNs. It identifies patients less vulnerable to evasion attacks via an unspecified risk metric, trains ADs exclusively on their data augmented with OE adversarial samples, and claims this yields average recall gains of 16.2% (black-box) and 5.89% (white-box) while cutting training time by 88.3% with minimal precision loss compared to indiscriminate training on all patients.
Significance. If the risk metric proves independent and the selection does not discard critical population variability, ROAST could provide an efficient, targeted way to harden ADs against evasion in domains with heterogeneous data. The reported efficiency gains and recall improvements are practically relevant for resource-constrained healthcare settings, but the absence of methodological transparency on the core selection mechanism limits the result's immediate generalizability and reproducibility.
major comments (3)
- §3.2 (Risk Metric Definition): the patient vulnerability risk metric is never formally defined or shown to be independent of the physiological features and model outputs used by the AD itself; without this, the selective training step risks circularity and may simply filter noisier samples rather than attack-resistant ones.
- §4 (Experiments): average recall and training-time improvements are reported without error bars, dataset sizes, attack strength parameters, or statistical significance tests, and the only baseline is indiscriminate training; this leaves the central claim of 16.2%/5.89% recall gains unverified and non-reproducible.
- §4.2 (Ablation and Validation): no ablation on risk-metric construction, no cross-patient hold-out validation, and no analysis of whether excluding high-vulnerability patients removes information needed for generalization to unseen vulnerable cases; these omissions directly undermine the claim that selective training improves robustness without harming coverage.
minor comments (3)
- Abstract: the phrase 'minimal impact on precision' is not quantified; reporting the actual precision deltas (with confidence intervals) would make the trade-off claim concrete.
- §2 (Related Work): the discussion of prior OE methods lacks explicit comparison of how ROAST's selective component differs from standard OE in terms of data filtering.
- §3.3 (Notation): the OE injection procedure would benefit from an explicit equation showing how adversarial samples are mixed into the selected patient subset.
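As a concrete starting point for the requested equation, a hedged sketch following the standard outlier-exposure objective of Hendrycks et al. (the notation is ours, not necessarily ROAST's): let $\mathcal{D}_{\mathrm{low}}$ denote benign data from the selected low-vulnerability patients and $\mathcal{D}_{\mathrm{adv}}$ the injected adversarial samples; the AD $f_{\theta}$ would then be trained as

```latex
\min_{\theta}\;
\mathbb{E}_{(x,y)\sim\mathcal{D}_{\mathrm{low}}}\!\left[\mathcal{L}\big(f_{\theta}(x),y\big)\right]
\;+\;
\lambda\,
\mathbb{E}_{x'\sim\mathcal{D}_{\mathrm{adv}}}\!\left[\mathcal{L}_{\mathrm{OE}}\big(f_{\theta}(x')\big)\right]
```

where $\mathcal{L}$ is the AD's training loss on benign data, $\mathcal{L}_{\mathrm{OE}}$ is the outlier-exposure penalty pushing adversarial samples toward the anomalous decision, and $\lambda$ controls the mixing weight between the two terms.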
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which have helped us identify areas for improvement in clarity and rigor. We address each major comment point-by-point below. Where revisions are needed, we will incorporate them in the next version of the manuscript to enhance reproducibility and address concerns about the risk metric and experimental validation.
Point-by-point responses
Referee: §3.2 (Risk Metric Definition): the patient vulnerability risk metric is never formally defined or shown to be independent of the physiological features and model outputs used by the AD itself; without this, the selective training step risks circularity and may simply filter noisier samples rather than attack-resistant ones.
Authors: We agree that a more explicit formal definition is required to eliminate any ambiguity. In the revised manuscript, we will add a formal mathematical definition of the risk metric in §3.2, specifying it as a pre-computed score based on historical attack success rates derived from auxiliary attack simulations on patient subgroups, using only demographic and historical data independent of the current AD's feature set or outputs. We will include an empirical analysis (e.g., correlation coefficients) demonstrating statistical independence from the physiological features and model predictions used by the AD, thereby ruling out circularity and confirming that selection targets inherently attack-resistant patients rather than merely low-noise samples. revision: yes
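The promised independence analysis could be as simple as correlating the per-patient risk scores against the AD's input features. A minimal sketch, with synthetic data and a Spearman check standing in for whatever protocol the authors actually adopt:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_patients = 50

# Hypothetical per-patient quantities: a vulnerability risk score and the
# mean of one physiological feature consumed by the AD. Here they are
# independent by construction; on real data this is what must be verified.
risk = rng.uniform(size=n_patients)
feature_mean = rng.normal(size=n_patients)

rho, pval = spearmanr(risk, feature_mean)
# |rho| near 0 with a large p-value is consistent with the claimed
# independence; a strong correlation would signal possible circularity.
print(f"Spearman rho={rho:.3f}, p={pval:.3f}")
```

In practice this check would be repeated for every AD input feature (and for the AD's output scores), with a multiple-comparison correction across features.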
Referee: §4 (Experiments): average recall and training-time improvements are reported without error bars, dataset sizes, attack strength parameters, or statistical significance tests, and the only baseline is indiscriminate training; this leaves the central claim of 16.2%/5.89% recall gains unverified and non-reproducible.
Authors: We acknowledge the need for greater experimental detail to support reproducibility. In the revision, we will augment §4 with error bars computed over 5 independent runs with different random seeds, explicit dataset sizes and splits, precise attack strength parameters (including perturbation budgets ε and attack iterations), and results of statistical significance tests (e.g., paired t-tests with p-values). We will also introduce additional baselines, such as standard outlier exposure without patient selection, to better contextualize the reported gains of 16.2% (black-box) and 5.89% (white-box) recall. revision: yes
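The promised significance test is a standard paired comparison across seeds. A sketch with made-up recall numbers (not the paper's results) showing the shape of the analysis:

```python
import numpy as np
from scipy.stats import ttest_rel

# Recall over 5 seeds for each method; illustrative values only.
recall_roast = np.array([0.81, 0.79, 0.83, 0.80, 0.82])
recall_indiscriminate = np.array([0.66, 0.64, 0.67, 0.65, 0.66])

# Paired t-test: each seed yields one matched pair of recall values.
t_stat, p_value = ttest_rel(recall_roast, recall_indiscriminate)
mean_gain = (recall_roast - recall_indiscriminate).mean()
print(f"mean recall gain={mean_gain:.3f}, t={t_stat:.2f}, p={p_value:.4f}")
```

With only 5 seeds the normality assumption behind the t-test is hard to check, so reporting a Wilcoxon signed-rank test alongside it would be a reasonable precaution.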
Referee: §4.2 (Ablation and Validation): no ablation on risk-metric construction, no cross-patient hold-out validation, and no analysis of whether excluding high-vulnerability patients removes information needed for generalization to unseen vulnerable cases; these omissions directly undermine the claim that selective training improves robustness without harming coverage.
Authors: We will strengthen §4.2 by adding a dedicated ablation study varying the risk-metric construction (e.g., different historical data windows and weighting schemes). We will incorporate cross-patient hold-out validation, training on subsets of patients and testing on completely unseen patients. To address generalization to vulnerable cases, we will include an analysis evaluating AD performance specifically on held-out high-vulnerability patients, demonstrating that selective training on low-vulnerability data with OE does not degrade coverage or robustness on the excluded population compared to full training. revision: yes
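The cross-patient hold-out the rebuttal promises requires splitting at the patient level, never at the sample level, so that no patient's data leaks across the split. A minimal sketch of such a split (the split policy and fraction are our assumptions):

```python
import numpy as np

def patient_holdout_split(patient_ids, test_fraction=0.2, seed=0):
    """Split rows at the patient level: every row of a given patient lands
    entirely in train or entirely in test. Returns two boolean row masks."""
    rng = np.random.default_rng(seed)
    unique = np.array(sorted(set(patient_ids)))
    rng.shuffle(unique)
    n_test = max(1, int(round(test_fraction * len(unique))))
    test_patients = set(unique[:n_test])
    mask = np.array([p in test_patients for p in patient_ids])
    return ~mask, mask  # train rows, test rows

# Toy usage: 4 patients, 3 samples each
pids = np.repeat(["p1", "p2", "p3", "p4"], 3)
train_mask, test_mask = patient_holdout_split(pids)
# No patient id occurs on both sides of the split
assert set(pids[train_mask]).isdisjoint(set(pids[test_mask]))
```

For the generalization analysis, the same idea extends to vulnerability strata: train only on low-vulnerability patients, then evaluate on held-out high-vulnerability patients to measure any coverage loss.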
Circularity Check
No significant circularity: ROAST is an empirical selective-training heuristic with no equations or self-referential derivations
Full rationale
The provided abstract and description contain no mathematical derivations, equations, or fitted parameters that are later renamed as predictions. ROAST is presented as a practical framework that selects patients via an (unspecified) risk metric and applies outlier exposure; performance deltas are reported as direct experimental outcomes rather than quantities forced by construction from the selection rule itself. No self-citation chains, uniqueness theorems, or ansatzes appear in the text that would reduce the central claim to its own inputs. This is the common honest case of a method paper whose validity rests on external validation rather than internal definitional closure.