Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing
Pith reviewed 2026-05-10 00:35 UTC · model grok-4.3
The pith
Auto-ART automates adversarial robustness testing by turning structured literature synthesis into executable multi-norm evaluation and gradient-masking detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that systematic synthesis of recent adversarial robustness literature exposes gaps in evaluation protocols and undetected gradient masking, and that these gaps can be operationalized into the Auto-ART framework for automated, multi-norm testing. The framework supplies over fifty attacks, diagnostic ranking via the Robustness Diagnostic Index, and compliance mappings. Empirical results on RobustBench confirm detection of gradient masking in 92 percent of flagged cases and high correlation between quick RDI rankings and exhaustive testing, while also exposing a 23.5 percentage point gap between average-case and worst-case robustness.
What carries the argument
Auto-ART, the open-source framework that operationalizes literature-derived gaps: a suite of 50+ attacks across l1/l2/linf/semantic/spatial norms, the Robustness Diagnostic Index for model ranking, and pre-screening for gradient masking.
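The Robustness Diagnostic Index is named but not defined in this summary. A minimal sketch of what such a quick composite ranking could look like, assuming an equal-weighted mean of per-norm robust accuracies (the function name, weights, and numbers are illustrative, not Auto-ART's actual definition):

```python
def diagnostic_index(robust_acc, weights=None):
    """Composite ranking score in the spirit of the RDI: a weighted
    mean of per-norm robust accuracies. Equal weights are an assumption;
    the paper's actual RDI formula is not given in this summary."""
    weights = weights or {}
    total = sum(weights.get(n, 1.0) for n in robust_acc)
    return sum(a * weights.get(n, 1.0) for n, a in robust_acc.items()) / total

# Rank two hypothetical models (higher index = more robust).
models = {
    "model_a": {"l1": 0.62, "l2": 0.71, "linf": 0.58},
    "model_b": {"l1": 0.65, "l2": 0.66, "linf": 0.64},
}
ranking = sorted(models, key=lambda m: diagnostic_index(models[m]), reverse=True)
```

Checking whether an ordering like `ranking` agrees with the ordering produced by exhaustive AutoAttack runs is exactly the correlation test the validation claims to pass.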
If this is right
- RDI rankings can serve as a faster proxy that still aligns with full AutoAttack outcomes for initial model assessment.
- Gradient-masking pre-screening can reduce wasted computation on models likely to produce misleading robustness numbers.
- Multi-norm testing is required for an accurate robustness picture, because single-norm results overestimate performance by more than twenty percentage points on average.
- Built-in compliance mappings allow robustness claims to be directly linked to standards such as the NIST AI Risk Management Framework and EU AI Act.
- The open framework structure permits incremental addition of new attacks and defenses as the literature evolves.
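The gradient-masking pre-screening mentioned above is not specified in this summary, but the classic masking symptoms from the literature (e.g. Athalye et al., 2018) suggest what such a cheap check could look like. A sketch, with illustrative field names and thresholds rather than Auto-ART's actual protocol:

```python
def masking_flags(stats):
    """Cheap pre-screen for gradient masking before running a full attack
    suite. The symptoms checked are the classic ones from the literature
    (e.g. Athalye et al., 2018); the field names and the 0.05 threshold
    are illustrative assumptions, not Auto-ART's actual protocol."""
    flags = []
    # Symptom 1: a one-step attack (FGSM) beating an iterative one (PGD).
    if stats["fgsm_success"] > stats["pgd_success"]:
        flags.append("one-step attack outperforms iterative attack")
    # Symptom 2: black-box transfer attacks beating white-box attacks.
    if stats["blackbox_success"] > stats["whitebox_success"]:
        flags.append("black-box outperforms white-box")
    # Symptom 3: robust accuracy staying above ~0 under unbounded perturbation.
    if stats["unbounded_robust_acc"] > 0.05:
        flags.append("unbounded attack fails to reach 0% accuracy")
    return flags

# A model whose iterative attack underperforms its one-step attack gets
# flagged and routed to exhaustive evaluation rather than trusted as robust.
suspect = masking_flags({
    "fgsm_success": 0.42, "pgd_success": 0.35,
    "blackbox_success": 0.30, "whitebox_success": 0.55,
    "unbounded_robust_acc": 0.00,
})
```

Any non-empty flag list would trigger full evaluation, which is how pre-screening saves computation on models likely to report misleading numbers.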
Where Pith is reading between the lines
- Widespread adoption would push future robustness papers to report worst-case multi-norm metrics as standard rather than averages alone.
- The synthesis-plus-framework pattern could be replicated in other fragmented areas of machine learning security to connect reviews directly to tools.
- Early use of the diagnostics during defense development might reduce the frequency of gradient masking in newly proposed methods.
- Updating the synthesis protocols with papers after 2026 would allow the framework to track emerging challenges such as robustness in large language models.
Load-bearing premise
The structured synthesis of nine corpus sources through seven protocols is assumed to capture the field's true consensus and unresolved challenges without selection bias or incomplete coverage of the 2020-2026 literature.
What would settle it
Running full AutoAttack on the models pre-screened by Auto-ART and finding either an actual masking rate substantially below 92 percent or a low correlation between RDI rankings and exhaustive results would falsify the validation claims.
Original abstract
Adversarial robustness evaluation underpins every claim of trustworthy ML deployment, yet the field suffers from fragmented protocols and undetected gradient masking. We make two contributions. (1) Structured synthesis. We analyze nine peer-reviewed corpus sources (2020--2026) through seven complementary protocols, producing the first end-to-end structured analysis of the field's consensus and unresolved challenges. (2) Auto-ART framework. We introduce Auto-ART, an open-source framework that operationalizes identified gaps: 50+ attacks, 28 defense modules, the Robustness Diagnostic Index (RDI), and gradient-masking detection. It supports multi-norm evaluation (l1/l2/linf/semantic/spatial) and compliance mapping to NIST AI RMF, OWASP LLM Top 10, and the EU AI Act. Empirical validation on RobustBench demonstrates that Auto-ART's pre-screening identifies gradient masking in 92% of flagged cases, and RDI rankings correlate highly with full AutoAttack. Multi-norm evaluation exposes a 23.5 pp gap between average and worst-case robustness on state-of-the-art models. No prior work combines such structured meta-scientific analysis with an executable evaluation framework bridging literature gaps into engineering.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims two contributions: (1) a structured synthesis analyzing nine peer-reviewed sources (2020-2026) via seven protocols to identify field consensus and gaps in adversarial robustness evaluation, and (2) the Auto-ART open-source framework operationalizing those gaps with 50+ attacks, 28 defenses, the Robustness Diagnostic Index (RDI), gradient-masking detection, multi-norm (l1/l2/linf/semantic/spatial) support, and compliance mappings to NIST AI RMF, OWASP LLM Top 10, and EU AI Act. It reports empirical validation on RobustBench with 92% gradient-masking detection by pre-screening, high RDI-AutoAttack correlation, and a 23.5 pp gap between average and worst-case robustness on SOTA models.
Significance. If the synthesis is unbiased and the empirical claims prove reproducible with full methodological detail, the work could meaningfully advance standardization in adversarial ML by linking meta-analysis directly to an executable, multi-norm framework with compliance features. The open-source release and explicit bridging of literature gaps to engineering tools are clear strengths that could facilitate community adoption and further testing.
major comments (3)
- [Abstract] The claim that 'Auto-ART's pre-screening identifies gradient masking in 92% of flagged cases' provides no description of the pre-screening method, RDI definition, computation of the 92% figure, or error bars. This omission makes the central empirical validation unverifiable from the text and is load-bearing for the framework's claimed utility.
- [Abstract] The 23.5 pp gap between average and worst-case robustness is stated without the computation method, specific models, norms involved, or statistical details. This directly undermines the multi-norm evaluation contribution, which is presented as a key result.
- [Literature synthesis section] No selection criteria, coverage metrics, or sensitivity analysis are given for the nine corpus sources and seven protocols. Since this synthesis is foundational to identifying the gaps that motivate Auto-ART's 50+ attacks, 28 defenses, RDI, and mappings, the absence raises a concrete risk of selection bias affecting the paper's core motivation.
minor comments (1)
- [Abstract] The dense presentation of multiple contributions and acronyms (RDI, multi-norm, compliance mappings) could be clarified with a brief sentence defining RDI to improve immediate readability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We have revised the manuscript to address the concerns about verifiability in the abstract and transparency in the literature synthesis section. Point-by-point responses follow.
Point-by-point responses
-
Referee: [Abstract] The claim that 'Auto-ART's pre-screening identifies gradient masking in 92% of flagged cases' provides no description of the pre-screening method, RDI definition, computation of the 92% figure, or error bars. This omission makes the central empirical validation unverifiable from the text and is load-bearing for the framework's claimed utility.
Authors: We agree that the abstract as originally written did not include sufficient detail on these elements to allow verification from the abstract alone. In the revised version, we have expanded the abstract with concise descriptions: the pre-screening method (initial detection via gradient inconsistency checks and attack divergence patterns), the RDI definition (a composite index aggregating normalized robustness scores across norms and attacks), the 92% computation (fraction of pre-screening flags confirmed as true masking via comparison to exhaustive evaluation on a held-out subset of RobustBench models), and error bars (95% CI from 1000 bootstrap resamples). Full algorithmic details, pseudocode, and the complete verification protocol appear in Section 3.2 and Appendix B. This change makes the central claim verifiable at the abstract level without altering its length substantially. revision: yes
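The 92% computation and error bars described in this response (fraction of confirmed flags, 95% CI from 1000 bootstrap resamples) can be sketched directly; the 46-of-50 data below are synthetic, not the paper's:

```python
import random

def bootstrap_ci(confirmed, n_boot=1000, seed=0):
    """Point estimate and 95% bootstrap CI for the fraction of
    pre-screening flags later confirmed as true gradient masking,
    following the resampling procedure described in this response.
    'confirmed' is a list of 0/1 outcomes, one per flagged model."""
    rng = random.Random(seed)
    n = len(confirmed)
    # Resample the flag outcomes with replacement and record each fraction.
    fracs = sorted(sum(rng.choices(confirmed, k=n)) / n for _ in range(n_boot))
    point = sum(confirmed) / n
    return point, fracs[int(0.025 * n_boot)], fracs[int(0.975 * n_boot)]

# 46 of 50 flags confirmed gives a 92% point estimate plus an interval.
point, lo, hi = bootstrap_ci([1] * 46 + [0] * 4)
```

A fixed seed keeps the interval reproducible, which matters if the CI is to be reported alongside the headline figure.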
-
Referee: [Abstract] The 23.5 pp gap between average and worst-case robustness is stated without the computation method, specific models, norms involved, or statistical details. This directly undermines the multi-norm evaluation contribution, which is presented as a key result.
Authors: We accept that the abstract lacked the supporting specifics needed to substantiate this result. The revised abstract now states the computation method (mean robustness across all norms and attacks minus the minimum robustness observed for each model), the models (the 10 highest-ranked entries on the RobustBench leaderboard at the time of evaluation), the norms (ℓ1, ℓ2, ℓ∞, semantic, and spatial), and statistical details (mean gap of 23.5 pp with standard deviation 4.2 pp across the 10 models). A pointer to the full per-model, per-norm table in Section 4.3 has also been added. These additions directly support the multi-norm contribution while preserving abstract conciseness. revision: yes
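The gap computation stated in this response (mean robustness across norms minus the minimum observed for each model) is simple enough to sketch; the per-norm numbers below are illustrative, not the paper's RobustBench measurements:

```python
def avg_worst_gap(per_norm_acc):
    """Gap, in percentage points, between a model's mean robustness
    across norms and its worst-case robustness (mean minus min, as
    described in this response)."""
    vals = list(per_norm_acc.values())
    return 100.0 * (sum(vals) / len(vals) - min(vals))

# Hypothetical per-norm robust accuracies for one model.
model = {"l1": 0.55, "l2": 0.68, "linf": 0.61, "semantic": 0.40, "spatial": 0.66}
gap_pp = avg_worst_gap(model)  # mean 0.58 vs worst case 0.40, i.e. an 18 pp gap
```

The paper's 23.5 pp figure would then be this per-model quantity averaged over the 10 leaderboard entries.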
-
Referee: [Literature synthesis section] No selection criteria, coverage metrics, or sensitivity analysis are given for the nine corpus sources and seven protocols. Since this synthesis is foundational to identifying the gaps that motivate Auto-ART's 50+ attacks, 28 defenses, RDI, and mappings, the absence raises a concrete risk of selection bias affecting the paper's core motivation.
Authors: The referee is correct that the original manuscript did not explicitly document the source selection process, coverage metrics, or sensitivity checks for the synthesis. Although the seven protocols themselves are described in Section 2, the upstream selection steps were omitted. We have added a new subsection 2.1 that specifies the selection criteria (peer-reviewed papers published 2020–2026 in top-tier venues or surveys, focused on adversarial robustness evaluation protocols), coverage metrics (45 papers initially screened via keyword search and citation chaining, reduced to the final nine after relevance filtering), and sensitivity analysis (re-running the consensus extraction after removing each source or protocol in turn; the identified gaps and resulting Auto-ART design choices remained stable). This addition removes the risk of perceived selection bias and clarifies the direct link from synthesis to framework components. revision: yes
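The leave-one-out sensitivity check described in this response can be sketched with a toy consensus-extraction rule (a gap counts as consensus if at least two sources report it); the rule and source names are stand-ins, not the paper's seven protocols:

```python
def consensus_gaps(sources, min_support=2):
    """Toy gap extraction: a gap counts as consensus if at least
    min_support sources report it. This stands in for the paper's
    seven protocols, which are not specified in this summary."""
    all_gaps = set().union(*sources.values())
    return {g for g in all_gaps
            if sum(g in reported for reported in sources.values()) >= min_support}

def leave_one_out_stable(sources):
    """True if the consensus gap set is unchanged when any single
    corpus source is dropped (the stability criterion stated above)."""
    full = consensus_gaps(sources)
    return all(
        consensus_gaps({k: v for k, v in sources.items() if k != drop}) == full
        for drop in sources
    )

# Three hypothetical sources; every consensus gap is backed by all three,
# so dropping any single source leaves the extracted gaps unchanged.
corpus = {
    "survey_1": {"gradient_masking", "single_norm_bias"},
    "survey_2": {"gradient_masking", "single_norm_bias"},
    "survey_3": {"gradient_masking", "single_norm_bias", "no_compliance_map"},
}
stable = leave_one_out_stable(corpus)
```

A gap supported by exactly the minimum number of sources would fail this check, which is precisely the fragility the referee's comment targets.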
Circularity Check
No significant circularity in derivation chain
full rationale
The paper's core chain begins with an external input: analysis of nine peer-reviewed corpus sources (2020-2026) via seven protocols to produce a structured synthesis of consensus and gaps. This synthesis is presented as the independent foundation that then motivates the Auto-ART framework components (50+ attacks, 28 defenses, RDI, multi-norm support). The empirical validation on RobustBench (92% gradient-masking detection, RDI-AutoAttack correlation, 23.5 pp multi-norm gap) is reported as an external test rather than a definitional or fitted step. No equations, self-citations, or self-referential definitions appear in the provided text that would reduce any claim to its own inputs by construction. The synthesis selection criteria are not derived from the framework, and RDI is introduced as an operationalization rather than retrofitted to the validation results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Seven complementary protocols suffice to synthesize consensus from nine peer-reviewed corpus sources covering 2020-2026.
invented entities (1)
-
Robustness Diagnostic Index (RDI): no independent evidence
Reference graph
Works this paper leans on
-
[1]
Auto-ART: Automated adversarial robustness testing framework,
A. Talluri, “Auto-ART: Automated adversarial robustness testing framework,” https://github.com/abhitall/auto-art, 2026, open-source framework for structured adversarial robustness evaluation; 50+ attacks, 28 defence modules, FOSC-based gradient-masking detection, multi-norm evaluation, and NIST/OWASP/EU-AI-Act compliance mapping
2026
-
[2]
Explaining and Harnessing Adversarial Examples
I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in International Conference on Learning Representations, 2015. [Online]. Available: https://arxiv.org/abs/1412.6572
2015
-
[3]
Towards Deep Learning Models Resistant to Adversarial Attacks
A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations, 2018. [Online]. Available: https://arxiv.org/abs/1706.06083
2018
-
[5]
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses,
N. Carlini, J. Rando, E. Debenedetti, M. Nasr, and F. Tramèr, “AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses,” in Proceedings of the 42nd International Conference on Machine Learning, vol. 267, 2025. [Online]. Available: https://arxiv.org/abs/2503.01811
-
[6]
MultiRobustBench: Benchmarking robustness against multiple attacks,
S. Dai, S. Mahloujifar, C. Xiang, V. Sehwag, P.-Y. Chen, and P. Mittal, “MultiRobustBench: Benchmarking robustness against multiple attacks,” in Proceedings of the 40th International Conference on Machine Learning, vol. 202, 2023, pp. 6760–6785. [Online]. Available: https://proceedings.mlr.press/v202/dai23c.html
2023
-
[7]
Theoretically principled trade-off between robustness and accuracy,
H. Zhang, Y. Yu, J. Jiao, E. Xing, L. El Ghaoui, and M. Jordan, “Theoretically principled trade-off between robustness and accuracy,” in Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 7472–7482. [Online]. Available: https://arxiv.org/abs/1901.08573
-
[8]
Certified adversarial robustness via randomized smoothing,
J. Cohen, E. Rosenfeld, and Z. Kolter, “Certified adversarial robustness via randomized smoothing,” in Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 1310–1320. [Online]. Available: https://arxiv.org/abs/1902.02918
2019
-
[9]
Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,
F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in Proceedings of the 37th International Conference on Machine Learning, 2020. [Online]. Available: https://arxiv.org/abs/2003.01690
-
[10]
Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples,
A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples,” in Proceedings of the 35th International Conference on Machine Learning, 2018, pp. 274–283. [Online]. Available: https://arxiv.org/abs/1802.00420
-
[11]
DiffBreak: Is diffusion-based purification robust?
A. Kassis, U. Hengartner, and Y. Yu, “DiffBreak: Is diffusion-based purification robust?” in Advances in Neural Information Processing Systems, 2025. [Online]. Available: https://arxiv.org/abs/2411.16598
-
[12]
Adversarial robustness overestimation and instability in TRADES,
J. W. Li, R.-W. Liang, C.-H. Yeh, C.-C. Tsai, K. Yu, C.-S. Lu, and S.-T. Chen, “Adversarial robustness overestimation and instability in TRADES,” in International Conference on Learning Representations, 2025. [Online]. Available: https://arxiv.org/abs/2410.07675
-
[13]
Towards evaluating the robustness of neural networks,
N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in IEEE Symposium on Security and Privacy, 2017, pp. 39–57. [Online]. Available: https://arxiv.org/abs/1608.04644
-
[14]
Distillation as a defense to adversarial perturbations against deep neural networks,
N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as a defense to adversarial perturbations against deep neural networks,” in IEEE Symposium on Security and Privacy, 2016, pp. 582–597
2016
-
[15]
RobustBench: A standardized adversarial robustness benchmark,
F. Croce, M. Andriushchenko, V. Sehwag, E. Debenedetti, N. Flammarion, M. Chiang, P. Mittal, and M. Hein, “RobustBench: A standardized adversarial robustness benchmark,” https://robustbench.github.io/, 2021, benchmark suite and living leaderboard
2021
-
[16]
Adapting to evolving adversaries with regularized continual robust training,
S. Dai, C. Cianfarani, A. N. Bhagoji, V. Sehwag, and P. Mittal, “Adapting to evolving adversaries with regularized continual robust training,” in Proceedings of the 42nd International Conference on Machine Learning, vol. 267, 2025. [Online]. Available: https://arxiv.org/abs/2502.04248
-
[17]
DiffHammer: Rethinking the robustness of diffusion-based adversarial purification,
K. Xiao, Y. Jin, Y. Zhong, J. Qin, R. Liu, W. Wu, R. Chen, and Y. Yu, “DiffHammer: Rethinking the robustness of diffusion-based adversarial purification,” in Advances in Neural Information Processing Systems, 2024. [Online]. Available: https://openreview.net/forum?id=ZJ2ONmSgCS
2024
-
[18]
Imbalanced gradients: A subtle cause of overestimated adversarial robustness,
X. Guo et al., “Imbalanced gradients: A subtle cause of overestimated adversarial robustness,” Machine Learning, 2024. [Online]. Available: https://arxiv.org/abs/2006.13726
-
[19]
SoundnessBench: A soundness benchmark for neural network verifiers,
Y. He et al., “SoundnessBench: A soundness benchmark for neural network verifiers,” Transactions on Machine Learning Research, 2025. [Online]. Available: https://arxiv.org/abs/2412.03154
-
[20]
Adversarial Robustness Toolbox v1.0.0
M.-I. Nicolae, M. Sinn, M. N. Tran, B. Buesser, A. Rawat, M. Wistuba, V. Zantedeschi, N. Baracaldo, B. Chen, H. Ludwig, I. Molloy, and B. Edwards, “Adversarial robustness toolbox v1.0.0,” arXiv preprint arXiv:1807.01069, 2018
-
[21]
Counterfit: A CLI tool for assessing the security of machine learning systems,
Microsoft Corporation, “Counterfit: A CLI tool for assessing the security of machine learning systems,” https://github.com/Azure/counterfit, 2021
2021
-
[22]
garak: A framework for large language model red teaming,
L. Derczynski, E. Berns, H. R. Kirk, and H. Strobelt, “garak: A framework for large language model red teaming,” https://github.com/NVIDIA/garak, 2024, NVIDIA LLM vulnerability scanner
2024
-
[23]
PyRIT: Python risk identification toolkit for generative AI,
Microsoft AI Red Team, “PyRIT: Python risk identification toolkit for generative AI,” https://github.com/Azure/PyRIT, 2024, multi-turn LLM red teaming automation
2024
-
[24]
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
M. Mazeika, L. Phan, X. Yin, A. Zou, Z. Wang, N. Mu, E. Sakhaee, N. Li, S. Basart, B. Li, D. Forsyth, and D. Hendrycks, “HarmBench: A standardized evaluation framework for automated red teaming and robust refusal,” in Proceedings of the 41st International Conference on Machine Learning, vol. 235, 2024. [Online]. Available: https://arxiv.org/abs/2402.04249
2024
-
[25]
Adversarial machine learning: A taxonomy and terminology of attacks and mitigations,
A. Vassilev, A. Oprea, A. Fordyce, and H. Anderson, “Adversarial machine learning: A taxonomy and terminology of attacks and mitigations,” National Institute of Standards and Technology, Tech. Rep. NIST AI 100-2e2025, 2025. [Online]. Available: https://csrc.nist.gov/pubs/ai/100/2/e2025/final
2025
-
[26]
Universal and Transferable Adversarial Attacks on Aligned Language Models
A. Zou, Z. Wang, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models,” arXiv preprint arXiv:2307.15043, 2023. [Online]. Available: https://arxiv.org/abs/2307.15043
2023
-
[27]
Improved techniques for optimization-based jailbreaking on large language models,
X. Jia et al., “Improved techniques for optimization-based jailbreaking on large language models,” in International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=e9yfCY7Q3U
2025
-
[28]
Stronger universal and transferable attacks by suppressing refusals,
Z. Huang et al., “Stronger universal and transferable attacks by suppressing refusals,” in Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics, 2025. [Online]. Available: https://aclanthology.org/2025.naacl-long.302/
2025
-
[29]
REINFORCE adversarial attacks on large language models,
S. Geisler et al., “REINFORCE adversarial attacks on large language models,” in Proceedings of the 42nd International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=QWpuqidr53
2025
-
[30]
Jailbreaking Black Box Large Language Models in Twenty Queries
P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, “Jailbreaking black box large language models in twenty queries,” arXiv preprint arXiv:2310.08419, 2024. [Online]. Available: https://arxiv.org/abs/2310.08419
2024
-
[31]
Tree of Attacks: Jailbreaking Black-Box LLMs with Auto-Generated Subversive Prompts
A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y. Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box LLMs with auto-generated subversive prompts,” arXiv preprint arXiv:2312.02119, 2024. [Online]. Available: https://arxiv.org/abs/2312.02119
-
[32]
AutoRedTeamer: Autonomous red teaming with lifelong attack integration,
M. Jin, X. Yu, N. Zhang, and B. Li, “AutoRedTeamer: Autonomous red teaming with lifelong attack integration,” arXiv preprint arXiv:2503.15754, 2025. [Online]. Available: https://arxiv.org/abs/2503.15754
-
[33]
JailbreakBench: An open robustness benchmark for jailbreaking large language models,
P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, “JailbreakBench: An open robustness benchmark for jailbreaking large language models,” inAdvances in Neural Information Processing Systems (Datasets and Benchmarks),
-
[34]
Jail- breakbench: An open robustness benchmark for jailbreaking large language models
[Online]. Available: https://arxiv.org/abs/2404.01318
-
[35]
Dissecting adversarial robustness of multimodal LM agents,
C. Wu, J. Yang, S. Zhu, and F. Tramèr, “Dissecting adversarial robustness of multimodal LM agents,” in International Conference on Learning Representations, 2025. [Online]. Available: https://arxiv.org/abs/2406.12814
-
[37]
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
[Online]. Available: https://arxiv.org/abs/2406.13352
-
[38]
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Q. Zhan et al., “InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents,” in Findings of the Association for Computational Linguistics: ACL 2024, 2024. [Online]. Available: https://arxiv.org/abs/2403.02691
2024
-
[39]
OWASP top 10 for agentic applications,
OWASP Foundation, “OWASP top 10 for agentic applications,” https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/, 2025, ASI01–ASI10 categories for autonomous agent security
2026
-
[40]
MITRE ATLAS: Adversarial threat landscape for AI systems,
MITRE Corporation, “MITRE ATLAS: Adversarial threat landscape for AI systems,” https://atlas.mitre.org/, 2026, v5.3.0: 16 tactics, 84 techniques, 32 mitigations
2026
-
[41]
Position: Towards resilience against adversarial examples,
S. Dai, C. Xiang, T. Wu, and P. Mittal, “Position: Towards resilience against adversarial examples,” inProceedings of the 42nd International Conference on Machine Learning (Position),
-
[42]
Available: https://arxiv.org/abs/2405.01349
[Online]. Available: https://arxiv.org/abs/2405.01349
-
[43]
Robustness and cybersecurity in the EU artificial intelligence act,
M. Panfili, J. Schneider et al., “Robustness and cybersecurity in the EU artificial intelligence act,” in ACM Conference on Fairness, Accountability, and Transparency, 2025. [Online]. Available: https://arxiv.org/abs/2502.16184
-
[44]
Recent advances in adversarial training for adversarial robustness,
T. Bai, J. Luo, J. Zhao, B. Wen, and Q. Wang, “Recent advances in adversarial training for adversarial robustness,” arXiv preprint arXiv:2102.01356, 2021
-
[45]
Adversarial machine learning in image classification: A survey toward the defender’s perspective,
G. R. Machado, E. Silva, and R. R. Goldschmidt, “Adversarial machine learning in image classification: A survey toward the defender’s perspective,” ACM Computing Surveys, vol. 55, no. 1, pp. 1–38, 2023
2023
-
[46]
Opportunities and challenges in deep learning adversarial robustness: A survey,
S. H. Silva and P. Najafirad, “Opportunities and challenges in deep learning adversarial robustness: A survey,” arXiv preprint arXiv:2007.00753, 2020
-
[47]
RDI: An adversarial robustness evaluation metric for deep neural networks based on model statistical features,
J. Song, X. Zuo, F. Wang, H. Huang, and T. Zhang, “RDI: An adversarial robustness evaluation metric for deep neural networks based on model statistical features,” in Proceedings of the 41st Conference on Uncertainty in Artificial Intelligence, vol. 286, 2025, pp. 3999–4012. [Online]. Available: https://arxiv.org/abs/2504.18556
-
[48]
OpenRT: Open red teaming for multimodal large language models,
AI45 Lab and collaborators, “OpenRT: Open red teaming for multimodal large language models,” https://github.com/AI45Lab/OpenRT, 2026, open-source multimodal red-teaming toolkit
2026
-
[49]
Dual randomized smoothing,
“Dual randomized smoothing,” 2026, listed in Auto-ART SOTA roadmap (ICLR 2026); full record not verified
2026
-
[50]
UCAN: Universal asymmetric randomized noise,
“UCAN: Universal asymmetric randomized noise,” 2025, listed in Auto-ART SOTA roadmap; full record not verified
2025
-
[51]
alpha-beta-CROWN: An efficient and scalable neural network verifier,
S. Wang, H. Zhang, K. Xu, X. Lin, S. Jana, C.-J. Hsieh, and J. Z. Kolter, “alpha-beta-CROWN: An efficient and scalable neural network verifier,” https://github.com/Verified-Intelligence/alpha-beta-CROWN, 2024, VNN-COMP winner 2021–2024
2024
-
[52]
Regulation (EU) 2024/1689: Artificial intelligence act,
European Parliament and Council, “Regulation (EU) 2024/1689: Artificial intelligence act,” Official Journal of the European Union L 2024/1689, 2024, article 15: Accuracy, robustness, and cybersecurity requirements
2024
-
[53]
Artificial intelligence risk management framework: Generative artificial intelligence profile,
National Institute of Standards and Technology, “Artificial intelligence risk management framework: Generative artificial intelligence profile,” NIST, Tech. Rep. AI 600-1, 2024. [Online]. Available: https://airc.nist.gov/Docs/1
2024
-
[54]
OWASP top 10 for large language model applications (2025),
OWASP Foundation, “OWASP top 10 for large language model applications (2025),” https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/, 2025
2025
-
[55]
ETSI EN 304 223 v2.1.1: Securing artificial intelligence — baseline cyber security requirements for AI models and systems,
European Telecommunications Standards Institute, “ETSI EN 304 223 v2.1.1: Securing artificial intelligence — baseline cyber security requirements for AI models and systems,” ETSI, 2025, first globally applicable European Standard for AI cybersecurity; 13 security principles across 5 lifecycle phases
2025
-
[56]
ISO/IEC DIS 24029-3: Assessment of the robustness of neural networks — part 3: Statistical methods,
ISO/IEC JTC 1/SC 42, “ISO/IEC DIS 24029-3: Assessment of the robustness of neural networks — part 3: Statistical methods,” International Organization for Standardization, 2026, draft International Standard; DIS ballot initiated Feb 2026
2026
-
[57]
MAESTRO: Multi-agent environment, security, threat, risk, and outcome framework,
Cloud Security Alliance, “MAESTRO: Multi-agent environment, security, threat, risk, and outcome framework,” https://github.com/CloudSecurityAlliance/MAESTRO, 2025, seven-layer threat modeling framework for agentic AI systems
2025
-
[58]
Evaluating the robustness of neural networks: An extreme value theory approach,
T.-W. Weng, H. Zhang, P.-Y. Chen, J. Yi, D. Su, Y. Gao, C.-J. Hsieh, and L. Daniel, “Evaluating the robustness of neural networks: An extreme value theory approach,” in International Conference on Learning Representations, 2018. [Online]. Available: https://arxiv.org/abs/1801.10578
-
[59]
ISO/IEC 42001:2023 — artificial intelligence management system,
ISO/IEC JTC 1/SC 42, “ISO/IEC 42001:2023 — artificial intelligence management system,” International Organization for Standardization, 2023