pith. machine review for the scientific record.

arxiv: 2604.03843 · v1 · submitted 2026-04-04 · 💻 cs.CR · cs.LG

Recognition: 2 theorem links


Explainability-Guided Adversarial Attacks on Transformer-Based Malware Detectors Using Control Flow Graphs

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 17:06 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords adversarial evasion · transformer malware detection · control flow graphs · integrated gradients · explainable AI · white-box attacks · PE malware classifiers

The pith

A white-box attack uses integrated gradients to swap influential function calls in linearized control flow graphs and force misclassification by transformer malware detectors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that explainability tools can guide targeted perturbations on sequences derived from control flow graphs, allowing an adversary to replace key function calls with synthetic imports and evade detection without changing program behavior. This matters because transformer models achieve high accuracy on such representations yet remain vulnerable to these structure-preserving changes. A sympathetic reader would care because the same attribution methods used for interpretability also surface precise attack surfaces in security pipelines. Experiments confirm the approach works on both small and large Windows PE datasets even when models reach high accuracy.

Core claim

The central claim is that token- and word-level attributions from integrated gradients on a RoBERTa model processing linearized control flow graph sequences can identify positively attributed function calls; iteratively replacing those calls with synthetic external imports produces adversarial examples that induce misclassification while preserving overall program structure, as shown by reliable evasion on small- and large-scale PE datasets.

What carries the argument

Explainability-guided perturbation using integrated gradients attributions to select and replace influential function-call tokens in sequences obtained by linearizing control flow graphs.
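To make the attribution machinery concrete, here is a minimal pure-Python sketch of integrated gradients on a toy differentiable scorer. The weights, inputs, and the scorer itself are hypothetical stand-ins; the paper applies Captum-derived attributions to a RoBERTa detector, not this toy.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w):
    # toy differentiable scorer standing in for the detector's output probability
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def grad(x, w):
    # dF/dx_i = sigma'(w.x) * w_i for the toy scorer above
    s = model(x, w)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, w, baseline=None, steps=300):
    # IG_i = (x_i - b_i) * integral_0^1 dF/dx_i(b + a*(x - b)) da,
    # approximated with a midpoint Riemann sum over `steps` points
    b = baseline or [0.0] * len(x)
    total = [0.0] * len(x)
    for k in range(steps):
        a = (k + 0.5) / steps
        point = [bi + a * (xi - bi) for bi, xi in zip(b, x)]
        total = [t + g for t, g in zip(total, grad(point, w))]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, b, total)]

w = [2.0, -1.0, 0.5]   # hypothetical per-token weights
x = [1.0, 1.0, 1.0]    # token presence indicators
ig = integrated_gradients(x, w)
# completeness axiom: attributions sum to F(x) - F(baseline)
gap = abs(sum(ig) - (model(x, w) - model([0.0] * 3, w)))
```

Tokens with positive attribution (here the first coordinate) are the ones the attack would target for replacement; negatively attributed tokens are left alone.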

Load-bearing premise

Linearizing control flow graphs into sequences of function calls creates token-level sensitivities that targeted replacements can exploit without changing the program's overall behavior.
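The premise can be illustrated with a small hypothetical CFG. The node names, API-call names, and depth-first traversal order below are invented for illustration; the paper's Graphene pipeline may linearize differently.

```python
# hypothetical CFG: each basic block lists its function calls and successor blocks
cfg = {
    "entry":   {"calls": ["GetModuleHandleW"], "succ": ["decrypt", "exit"]},
    "decrypt": {"calls": ["VirtualAlloc", "RtlDecompressBuffer"], "succ": ["drop"]},
    "drop":    {"calls": ["CreateFileW", "WriteFile"], "succ": ["exit"]},
    "exit":    {"calls": ["ExitProcess"], "succ": []},
}

def linearize(cfg, entry="entry"):
    """Depth-first preorder walk emitting one token per function call."""
    seen, stack, tokens = set(), [entry], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        tokens.extend(cfg[node]["calls"])
        # push successors in reverse so the first successor is visited first
        stack.extend(reversed(cfg[node]["succ"]))
    return tokens

sequence = linearize(cfg)
```

The flat token sequence is what the transformer consumes; any single call token becomes a unit the attack can swap without touching the graph's edges.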

What would settle it

If a model retrained on adversarial examples generated by the same integrated-gradients replacement procedure resisted fresh attacks from that procedure, the claim that the attack reliably succeeds would be falsified.

Figures

Figures reproduced from arXiv: 2604.03843 by Andrew Wheeler, Kshitiz Aryal, Maanak Gupta.

Figure 1
Figure 1. Generalized threat model of malware detection system.
Figure 2
Figure 2. Overview of the adversarial generation procedure.
Figure 3
Figure 3. Architecture of the Graphene framework.

From the text adjacent to this figure: once the allotted rounds had been completed, all remaining rounds were skipped and the next sample was started. Taking A as the adversarial generation function, the procedure is recursive, i.e. the output of the previous iteration serves as the input for the next:

x_i = A(x_i) if i = 0;  x_i = A(x_{i−1}) if i > 0.  (1)
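The recursion in Equation 1, together with the early-stop rule quoted alongside Figure 3, can be sketched as a loop. The detector, per-token scores, and synthetic-import token below are hypothetical stand-ins for the paper's RoBERTa model and integrated-gradients attributions.

```python
# hypothetical per-token scores a surrogate attribution might assign
SCORES = {"VirtualAlloc": 0.9, "WriteProcessMemory": 0.8, "CreateRemoteThread": 0.7,
          "ExitProcess": -0.2, "SYNTH_IMPORT_0": -0.1}

def detect(tokens):
    # toy detector: flag as malicious if the total score crosses a threshold
    return sum(SCORES.get(t, 0.0) for t in tokens) > 0.5

def attack_step(tokens):
    """One application of A: replace the most positively attributed call."""
    score, idx = max((SCORES.get(t, 0.0), i) for i, t in enumerate(tokens))
    if score <= 0.0:
        return tokens  # nothing positively attributed remains
    out = list(tokens)
    out[idx] = "SYNTH_IMPORT_0"  # synthetic external import
    return out

def generate(tokens, max_rounds=10):
    # x_i = A(x_{i-1}); stop early once the detector flips,
    # skipping any remaining rounds (as the quoted passage describes)
    x = tokens
    for _ in range(max_rounds):
        x = attack_step(x)
        if not detect(x):
            break
    return x

adv = generate(["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread", "ExitProcess"])
```

In this toy run two replacements suffice to push the sequence below the detection threshold, after which the remaining rounds are skipped.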
read the original abstract

Transformer-based malware detection systems operating on graph modalities such as control flow graphs (CFGs) achieve strong performance by modeling structural relationships in program behavior. However, their robustness to adversarial evasion attacks remains underexplored. This paper examines the vulnerability of a RoBERTa-based malware detector that linearizes CFGs into sequences of function calls, a design choice that enables transformer modeling but may introduce token-level sensitivities and ordering artifacts exploitable by adversaries. By evaluating evasion strategies within this graph-to-sequence framework, we provide insight into the practical robustness of transformer-based malware detectors beyond aggregate detection accuracy. This paper proposes a white-box adversarial evasion attack that leverages explainability mechanisms to identify and perturb most influential graph components. Using token- and word-level attributions derived from integrated gradients, the attack iteratively replaces positively attributed function calls with synthetic external imports, producing adversarial CFG representations without altering overall program structure. Experimental evaluation on small- and large-scale Windows Portable Executable (PE) datasets demonstrates that the proposed method can reliably induce misclassification, even against models trained to high accuracy. Our results highlight that explainability tools, while valuable for interpretability, can also expose critical attack surfaces in transformer-based malware detectors.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom & free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a white-box adversarial evasion attack against a RoBERTa-based malware detector that linearizes control-flow graphs (CFGs) from Windows PE binaries into sequences of function calls. The attack uses integrated-gradients attributions to identify positively influential tokens and iteratively replaces them with synthetic external imports, producing perturbed sequences that preserve overall program structure. Experiments on small- and large-scale PE datasets are reported to show reliable misclassification even against high-accuracy models, highlighting that explainability tools can expose attack surfaces in graph-to-sequence transformer detectors.

Significance. If the quantitative results hold, the work is significant because it supplies a concrete, explainability-guided attack that exploits the linearization step common to many transformer-based malware detectors. It thereby supplies a falsifiable test of robustness for an increasingly popular modeling choice and supplies a practical method that future defense papers can use as a baseline.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (results): the central claim that the method 'reliably induce[s] misclassification' is load-bearing yet unsupported by any reported success rates, number of attacked samples, dataset statistics, or comparison to non-explainability baselines; without these numbers the experimental evaluation cannot be assessed.
  2. [§3.2] §3.2 (attack construction): the claim that replacements with synthetic imports preserve overall program structure is asserted but not demonstrated; the paper must show that the resulting binaries remain executable and functionally equivalent, otherwise the attack is not realizable and the practical significance is overstated.
minor comments (2)
  1. [Figure 2 and §4.1] Figure 2 and §4.1: axis labels and legend entries are too small to read; enlarge fonts and add a caption that states the exact perturbation budget used.
  2. [§2] Notation: the symbols for token attribution (e.g., IG(v)) and replacement function are introduced without a consolidated table; add a short notation table in §2.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which will help improve the clarity and rigor of our paper. We provide point-by-point responses to the major comments below.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (results): the central claim that the method 'reliably induce[s] misclassification' is load-bearing yet unsupported by any reported success rates, number of attacked samples, dataset statistics, or comparison to non-explainability baselines; without these numbers the experimental evaluation cannot be assessed.

    Authors: We agree with this observation. Upon review, the manuscript does include some experimental details in §4, but they lack the specific quantitative metrics mentioned. In the revised version, we will explicitly report success rates (such as the fraction of adversarial examples that cause misclassification), the exact number of samples attacked, full dataset statistics, and comparisons against non-explainability baselines like random perturbation. These will be added to both the abstract and the results section to support the central claim. revision: yes

  2. Referee: [§3.2] §3.2 (attack construction): the claim that replacements with synthetic imports preserve overall program structure is asserted but not demonstrated; the paper must show that the resulting binaries remain executable and functionally equivalent, otherwise the attack is not realizable and the practical significance is overstated.

    Authors: This is a valid point. The current manuscript asserts preservation of program structure based on the nature of the perturbations (replacing function calls with synthetic imports without changing control flow edges), but does not provide empirical validation. We will revise §3.2 to include demonstrations, such as verification that the modified CFGs correspond to executable binaries on a sample of cases, and discuss any limitations in achieving functional equivalence. If full equivalence cannot be guaranteed for all cases, we will qualify the claims accordingly. revision: yes
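The success-rate reporting the referee requests reduces to a small computation once per-sample predictions are logged. A sketch with hypothetical field names, following the common convention of counting only samples the model originally detected:

```python
def evasion_rate(results):
    """results: list of dicts with 'orig_pred' and 'adv_pred' in {'malicious', 'benign'}.
    Only samples the model originally caught count toward the denominator."""
    attacked = [r for r in results if r["orig_pred"] == "malicious"]
    if not attacked:
        return 0.0
    evaded = sum(1 for r in attacked if r["adv_pred"] == "benign")
    return evaded / len(attacked)

results = [
    {"orig_pred": "malicious", "adv_pred": "benign"},
    {"orig_pred": "malicious", "adv_pred": "benign"},
    {"orig_pred": "malicious", "adv_pred": "malicious"},
    {"orig_pred": "benign",    "adv_pred": "benign"},   # missed originally; excluded
]
rate = evasion_rate(results)  # 2 of 3 originally-detected samples evade
```

Reporting this rate alongside the number of attacked samples and a random-perturbation baseline would let readers assess the "reliably induces misclassification" claim.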

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an empirical proposal for an explainability-guided adversarial attack on a RoBERTa-based malware detector operating on linearized CFGs. No mathematical derivations, equations, fitted parameters, or first-principles results are described that could reduce to the inputs by construction. The central claims rest on experimental evaluations of misclassification rates on PE datasets, which are independent of the attack construction. The linearization step is presented as an explicit design choice creating exploitable artifacts rather than a self-referential definition. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing manner.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, or new postulated entities are described in the abstract; the contribution is an empirical attack method built on existing components (RoBERTa, integrated gradients, CFG linearization).

pith-pipeline@v0.9.0 · 5515 in / 1055 out tokens · 37933 ms · 2026-05-13T17:06:50.470933+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 1 internal anchor

  1. [1]

    Android malware detection using control flow graphs and text analysis,

    A. Muzaffar, A. H. Riaz, and H. Ragab Hassen, “Android malware detection using control flow graphs and text analysis,” in Proceedings of the International Conference on Applied Cybersecurity (ACS) 2023, H. Zantout and H. Ragab Hassen, Eds., 2023.

  2. [2]

    Technique for IoT malware detection based on control flow graph analysis,

    K. Bobrovnikova, S. Lysenko, B. Savenko, P. Gaj, and O. Savenko, “Technique for IoT malware detection based on control flow graph analysis,” Radioelectronic and Computer Systems, no. 1, pp. 141–153, Feb. 2022. [Online]. Available: http://nti.khai.edu/ojs/index.php/reks/article/view/reks.2022.1.11

  3. [3]

    Attention Is All You Need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” Aug. 2023, arXiv:1706.03762 [cs]. [Online]. Available: http://arxiv.org/abs/1706.03762

  4. [4]

    A Combination Method for Android Malware Detection Based on Control Flow Graphs and Machine Learning Algorithms,

    Z. Ma, H. Ge, Y. Liu, M. Zhao, and J. Ma, “A Combination Method for Android Malware Detection Based on Control Flow Graphs and Machine Learning Algorithms,” IEEE Access, vol. 7, pp. 21235–21245, 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8629067/

  5. [5]

    Adversarial malware binaries: Evading deep learning for malware detection in executables,

    B. Kolosnjaji, A. Demontis, B. Biggio, D. Maiorca, G. Giacinto, C. Eckert, and F. Roli, “Adversarial malware binaries: Evading deep learning for malware detection in executables,” in 2018 26th European Signal Processing Conference (EUSIPCO). IEEE, 2018, pp. 533–537.

  6. [6]

    A survey on adversarial attacks for malware analysis,

    K. Aryal, M. Gupta, M. Abdelsalam, P. Kunwar, and B. Thuraisingham, “A survey on adversarial attacks for malware analysis,” IEEE Access, vol. 13, pp. 428–459, 2024.

  7. [7]

    Explainability guided adversarial evasion attacks on malware detectors,

    K. Aryal, M. Gupta, M. Abdelsalam, and M. Saleh, “Explainability guided adversarial evasion attacks on malware detectors,” in 2024 33rd International Conference on Computer Communications and Networks (ICCCN). IEEE, 2024, pp. 1–9.

  8. [8]

    Intra-section code cave injection for adversarial evasion attacks on windows pe malware file,

    ——, “Intra-section code cave injection for adversarial evasion attacks on Windows PE malware file,” Computers & Security, p. 104690, 2025.

  9. [9]

    Malware Detection by Eating a Whole EXE,

    E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. Nicholas, “Malware Detection by Eating a Whole EXE,” Oct. 2017, arXiv:1710.09435 [stat]. [Online]. Available: http://arxiv.org/abs/1710.09435

  10. [10]

    A Generative Adversarial Network Based Approach to Malware Generation Based on Behavioural Graphs,

    R. A. J. McLaren, K. O. Babaagba, and Z. Tan, “A Generative Adversarial Network Based Approach to Malware Generation Based on Behavioural Graphs,” in Machine Learning, Optimization, and Data Science, G. Nicosia, V. Ojha, E. La Malfa, G. La Malfa, P. Pardalos, G. Di Fatta, G. Giuffrida, and R. Umeton, Eds. Cham: Springer Nature Switzerland, 2023, vol. 1381...

  11. [11]

    LiteXGNN: A Lightweight Explainable Graph Neural Network for Malware Detection with Control Flow Graph Explainability,

    K. K. W. Yan, J. Suaboot, M. Puongmanee, and W. Werapun, “LiteXGNN: A Lightweight Explainable Graph Neural Network for Malware Detection with Control Flow Graph Explainability,” in 2025 9th International Conference on Information Technology (InCIT). Phuket, Thailand: IEEE, Nov. 2025, pp. 658–664. [Online]. Available: https://ieeexplore.ieee.org/document/11276013/

  12. [12]

    On the consistency of GNN explanations for malware detection,

    H. Shokouhinejad, G. Higgins, R. Razavi-Far, H. Mohammadian, and A. A. Ghorbani, “On the consistency of GNN explanations for malware detection,” Information Sciences, vol. 721, p. 122603, Dec

  13. [13]

    https://linkinghub.elsevier.com/retrieve/pii/S0020025525007364

    [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0020025525007364

  14. [14]

    Enhancing android malware detection explainability through function call graph APIs,

    D. Soi, A. Sanna, D. Maiorca, and G. Giacinto, “Enhancing android malware detection explainability through function call graph APIs,” Journal of Information Security and Applications, vol. 80, p. 103691, Feb. 2024. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S2214212623002752

  15. [15]

    Hybrid AI for Predictive Cyber Risk Assessment: Federated Graph-Transformer Architecture With Explainability,

    J. Govea, R. Gutierrez, W. Villegas-Ch, and A. Maldonado Navarro, “Hybrid AI for Predictive Cyber Risk Assessment: Federated Graph-Transformer Architecture With Explainability,” IEEE Access, vol. 13, pp. 122187–122206, 2025. [Online]. Available: https://ieeexplore.ieee.org/document/11077151/

  16. [16]

    Graphene: Leveraging Transformers with Control Flow Modalities for Malware Detection,

    A. Wheeler, K. Aryal, and M. Gupta, “Graphene: Leveraging Transformers with Control Flow Modalities for Malware Detection,” in 5th IEEE International Conference on AI in Cybersecurity (ICAIC). Houston, TX, United States: IEEE, 2026.

  17. [17]

    Subgraph-Based Adversarial Examples Against Graph-Based IoT Malware Detection Systems,

    A. Abusnaina, H. Alasmary, M. Abuhamad, S. Salem, D. Nyang, and A. Mohaisen, “Subgraph-Based Adversarial Examples Against Graph-Based IoT Malware Detection Systems,” in Computational Data and Social Networks, A. Tagarelli and H. Tong, Eds. Cham: Springer International Publishing, 2019, vol. 11917, pp. 268–281, series title: Lecture Notes in Computer Scienc...

  18. [18]

    Shield Broken: Black-Box Adversarial Attacks on LLM-Based Vulnerability Detectors,

    Y. Jiang, S. Huang, C. Treude, X. Su, and T. Wang, “Shield Broken: Black-Box Adversarial Attacks on LLM-Based Vulnerability Detectors,” IEEE Transactions on Software Engineering, vol. 52, no. 1, pp. 246–265, Jan. 2026. [Online]. Available: https://ieeexplore.ieee.org/document/11271845/

  19. [19]

    SoK: Leveraging transformers for malware analysis,

    P. Kunwar, K. Aryal, M. Gupta, M. Abdelsalam, and E. Bertino, “SoK: Leveraging transformers for malware analysis,” IEEE Transactions on Dependable and Secure Computing, 2025.

  20. [20]

    Explainability-informed targeted malware misclassification,

    Q. Card, K. Aryal, and M. Gupta, “Explainability-informed targeted malware misclassification,” in 2024 33rd International Conference on Computer Communications and Networks (ICCCN). IEEE, 2024, pp. 1–8.

  21. [21]

    Captum

    Meta Open Source, “Captum.” [Online]. Available: https://captum.ai/

  22. [22]

    PE malware machine learning dataset

    M. Lester, “PE malware machine learning dataset.” [Online]. Available: https://practicalsecurityanalytics.com/pe-malware-machine-learning-dataset/

  23. [23]

    https://bazaar.abuse.ch/

    [Online]. Available: https://bazaar.abuse.ch/

  24. [24]

    radare2

    “radare2.” [Online]. Available: https://radare.org/n/radare2.html