
arxiv: 2604.21310 · v1 · submitted 2026-04-23 · 💻 cs.CR · cs.AI

Adversarial Evasion in Non-Stationary Malware Detection: Minimizing Drift Signals through Similarity-Constrained Perturbations

Pith reviewed 2026-05-09 21:56 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords adversarial malware · drift detection · evasion attacks · non-stationary environments · similarity constraints · feature space perturbations · malware detection

The pith

Similarity constraints on adversarial malware perturbations can reduce drift signals while still evading classifiers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether attackers can create adversarial malware samples that fool deep learning detectors yet stay hidden from systems that watch for distributional shifts in non-stationary environments. It generates perturbations directly in the classifier's standardized feature space and adds regularizers that force the perturbed samples to stay distributionally close to clean malware. Experiments compare several regularizers and show that the approach lowers measurable drift in classifier output probabilities, with the strength of the perturbation budget controlling how much evasion succeeds versus how much drift appears.

Core claim

By augmenting the adversarial optimization with similarity regularizers that enforce distributional closeness to clean malware in standardized feature space, the generated samples achieve targeted misclassification while producing lower drift signals across multiple metrics; ℓ₂ regularization yields the strongest reduction in these signals, and larger perturbation budgets improve attack success at the cost of increased drift indicators.

What carries the argument

similarity-constrained optimization objective that balances targeted misclassification against drift-signal minimization by penalizing deviations from clean-malware distributions in the standardized feature space
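
One way to write such an objective down is sketched here. This is an assumed generic form, not the paper's stated equation: the perturbation symbol η, the loss L_cls, the target label y_target, and the regularizer R are illustrative names, while δ (budget) and λ (regularizer weight) follow the symbols that appear in the figure captions. The norm used for the budget constraint is not specified in the text above, so it is left generic.

```latex
% Assumed generic form of a similarity-constrained targeted attack.
% x: clean malware sample in standardized feature space
% f: classifier, \eta: perturbation, \delta: perturbation budget
\min_{\eta}\;
  \mathcal{L}_{\mathrm{cls}}\bigl(f(x+\eta),\, y_{\mathrm{target}}\bigr)
  \;+\; \lambda\, R(x+\eta,\, x)
\quad \text{subject to} \quad \lVert \eta \rVert \le \delta,
\qquad \text{e.g. } R(x+\eta,\, x) = \lVert \eta \rVert_2^2
\text{ for an } \ell_2 \text{ regularizer.}
```

Under this reading, λ = 0 recovers an unconstrained targeted attack within the budget, and larger λ trades evasion strength for closeness to the clean sample.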

If this is right

  • Similarity constraints measurably lower output drift signals compared with unconstrained adversarial generation.
  • ℓ₂ regularization produces the largest reduction in drift among the tested regularizers.
  • Increasing the perturbation budget raises attack success rates but also increases the size of observable drift indicators.
  • The evasion-detectability trade-off is directly tunable through the choice and strength of the similarity regularizer.
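
The trade-off in the bullets above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: a three-feature logistic model stands in for the deep classifier, the budget is treated as a per-feature clip, the similarity term is a squared ℓ₂ penalty on the perturbation, and the weights `W`, `B`, the sample `X`, and the helper names are hypothetical. It is not the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def malware_prob(w, b, x):
    """Malware-class probability under a toy logistic model (hypothetical stand-in)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def constrained_attack(w, b, x, budget, lam, steps=200, lr=0.05):
    """Minimize -log(1 - p(x + eta)) + lam * ||eta||_2^2 by gradient descent,
    clipping each coordinate of eta to [-budget, budget] (assumed budget form)."""
    eta = [0.0] * len(x)
    for _ in range(steps):
        p = malware_prob(w, b, [xi + ei for xi, ei in zip(x, eta)])
        # Gradient of the targeted loss is p * w; gradient of the penalty is 2 * lam * eta.
        grad = [p * wi + 2.0 * lam * ei for wi, ei in zip(w, eta)]
        eta = [max(-budget, min(budget, ei - lr * gi)) for ei, gi in zip(eta, grad)]
    return eta

def l2_norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

# Toy model and one "clean malware" sample in standardized feature space (made-up values).
W, B = [2.0, -1.0, 0.5], 0.0
X = [1.5, -0.5, 1.0]

def attack_summary(budget, lam):
    """Return (malware probability after the attack, l2 size of the perturbation)."""
    eta = constrained_attack(W, B, X, budget, lam)
    p_adv = malware_prob(W, B, [xi + ei for xi, ei in zip(X, eta)])
    return p_adv, l2_norm(eta)

for budget, lam in [(0.5, 0.0), (2.0, 0.0), (2.0, 0.1)]:
    p_adv, shift = attack_summary(budget, lam)
    print(f"budget={budget} lam={lam} p_malware={p_adv:.3f} l2_shift={shift:.3f}")
```

In this toy setting the small budget leaves the sample classified as malware, the large budget flips it, and adding the ℓ₂ penalty at the large budget still flips it while moving the sample less. Whether a smaller feature-space shift also means lower output drift is the empirical question the paper tests; the sketch does not show that.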

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Monitoring systems that rely solely on output-probability drift may need additional checks for similarity-preserving changes inside the feature space.
  • Attackers could combine this method with other evasion techniques to create malware that persists longer before triggering retraining or alerts.
  • Defenders might counter by training detectors to recognize the specific distributional patterns introduced by these regularizers.

Load-bearing premise

Constraining perturbations to keep distributional similarity with clean malware in the standardized feature space will reliably minimize the drift signals that real-world monitoring mechanisms would detect.

What would settle it

Running the same similarity-constrained samples through an independent drift monitor that uses features or statistics outside the paper's controlled feature space and output-probability metrics, then checking whether drift signals remain low.
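
A minimal sketch of what a drift check on output probabilities might compute is given here. The three statistics (two-sample Kolmogorov-Smirnov, Jensen-Shannon divergence, population stability index) are standard choices and not necessarily the paper's exact metric set; the bin count, the smoothing constant, and the sample lists `REF`, `MILD`, and `STRONG` are made-up illustrative values.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
               for v in a + b)

def histogram(probs, bins=10, eps=1e-6):
    """Smoothed histogram of probabilities on [0, 1]; eps avoids log(0)."""
    counts = [0] * bins
    for p in probs:
        counts[min(int(p * bins), bins - 1)] += 1
    total = len(probs) + eps * bins
    return [(c + eps) / total for c in counts]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def js_divergence(ref, cur, bins=10):
    """Jensen-Shannon divergence (natural log) between binned score distributions."""
    r, c = histogram(ref, bins), histogram(cur, bins)
    m = [(ri + ci) / 2.0 for ri, ci in zip(r, c)]
    return 0.5 * kl(r, m) + 0.5 * kl(c, m)

def psi(ref, cur, bins=10):
    """Population stability index between binned score distributions."""
    r, c = histogram(ref, bins), histogram(cur, bins)
    return sum((ci - ri) * math.log(ci / ri) for ri, ci in zip(r, c))

# Made-up malware-class probabilities: a reference window and two later windows.
REF = [0.90, 0.92, 0.95, 0.97, 0.99]
MILD = [0.88, 0.91, 0.94, 0.96, 0.98]    # scores barely move
STRONG = [0.10, 0.20, 0.30, 0.60, 0.95]  # scores collapse toward "benign"

for name, window in [("mild", MILD), ("strong", STRONG)]:
    print(name, ks_statistic(REF, window), js_divergence(REF, window), psi(REF, window))
```

With only five scores per window the absolute values mean little (PSI in particular is dominated by the smoothing constant), so only the ordering between the two windows is informative. An independent monitor of the kind described above would apply such statistics to features outside the classifier's standardized space, which this sketch does not attempt.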

Figures

Figures reproduced from arXiv: 2604.21310 by Lan Zhang, Pawan Acharya.

Figure 1: Architecture overview of the methodology components.
Figure 2: FGSM ASR vs. budget δ
Figure 6: ASR vs. budget δ (λ_ℓ₂ = 1)
Original abstract

Deep learning has emerged as a powerful approach for malware detection, demonstrating impressive accuracy across various data representations. However, these models face critical limitations in real-world, non-stationary environments where both malware characteristics and detection systems continuously evolve. Our research investigates a fundamental security question: Can an attacker generate adversarial malware samples that simultaneously evade classification and remain inconspicuous to drift monitoring mechanisms? We propose a novel approach that generates targeted adversarial examples in the classifier's standardized feature space, augmented with sophisticated similarity regularizers. By carefully constraining perturbations to maintain distributional similarity with clean malware, we create an optimization objective that balances targeted misclassification with drift signal minimization. We quantify the effectiveness of this approach by comprehensively comparing classifier output probabilities using multiple drift metrics. Our experiments demonstrate that similarity constraints can reduce output drift signals, with $\ell_2$ regularization showing the most promising results. We observe that perturbation budget significantly influences the evasion-detectability trade-off, with increased budget leading to higher attack success rates and more substantial drift indicators.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes generating targeted adversarial perturbations to malware samples directly in a classifier's standardized feature space, augmented with similarity regularizers (particularly ℓ₂), to simultaneously achieve misclassification and minimize drift signals in classifier output probabilities. Experiments are said to show that these constraints reduce output drift compared to unconstrained attacks, with ℓ₂ regularization performing best, and that perturbation budget controls the evasion-detectability trade-off in non-stationary malware detection settings.

Significance. If the results hold, the work would provide evidence that similarity-constrained adversarial examples can evade both classification and drift monitoring in evolving malware environments, highlighting a potential gap in current non-stationary detection systems. This could inform the development of more robust monitoring mechanisms that account for feature-space similarity rather than raw distributional shifts.

major comments (2)
  1. Abstract: The central claim that 'similarity constraints can reduce output drift signals, with ℓ₂ regularization showing the most promising results' rests entirely on experiments, yet the manuscript supplies no datasets, feature representations, specific drift metrics on classifier probabilities, baselines, quantitative results, or validation details. This omission is load-bearing because the reported trade-off with perturbation budget cannot be assessed or reproduced from the given description.
  2. Optimization and evaluation sections: The approach balances misclassification against distributional similarity to clean malware in standardized feature space, but provides no evidence that the chosen drift metrics on output probabilities are representative of signals used by actual real-world non-stationary monitoring systems (which may operate on raw binaries, behavioral traces, or unstandardized statistics). Without this link, the claim that the method minimizes detectable drift signals does not follow.
minor comments (1)
  1. Abstract: The phrasing 'comprehensively comparing classifier output probabilities using multiple drift metrics' is vague; specify the exact metrics and comparison method to improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our experimental claims and their connection to practical non-stationary detection. We respond to each major comment below and indicate the revisions made to the manuscript.

Point-by-point responses
  1. Referee: Abstract: The central claim that 'similarity constraints can reduce output drift signals, with ℓ₂ regularization showing the most promising results' rests entirely on experiments, yet the manuscript supplies no datasets, feature representations, specific drift metrics on classifier probabilities, baselines, quantitative results, or validation details. This omission is load-bearing because the reported trade-off with perturbation budget cannot be assessed or reproduced from the given description.

    Authors: We agree that the abstract is high-level and does not enumerate the experimental components. The body of the manuscript contains dedicated sections on the datasets and feature representations employed, the specific drift metrics computed on classifier output probabilities, the baselines used for comparison, quantitative results with tables and figures, and analysis of the perturbation-budget trade-off. To address the referee's concern and improve self-containment, we have revised the abstract to briefly reference the dataset, the drift metrics, and the principal quantitative observations while preserving length constraints.

    revision: yes

  2. Referee: Optimization and evaluation sections: The approach balances misclassification against distributional similarity to clean malware in standardized feature space, but provides no evidence that the chosen drift metrics on output probabilities are representative of signals used by actual real-world non-stationary monitoring systems (which may operate on raw binaries, behavioral traces, or unstandardized statistics). Without this link, the claim that the method minimizes detectable drift signals does not follow.

    Authors: The drift metrics on output probabilities were selected because they align with common practice in the concept-drift literature for malware classifiers, where shifts in model confidence often serve as triggers for retraining or alerting. We have added a discussion paragraph in the revised evaluation section that cites representative prior work on probability-based drift detection in security settings and explains why these signals remain relevant even when some deployed systems incorporate raw-binary or behavioral features. We acknowledge that our evaluation does not directly benchmark against proprietary real-world pipelines, which would require access not available to us; the added discussion therefore frames the results as evidence for feature-space similarity constraints rather than a universal claim about all monitoring systems.

    revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical trade-off is tested rather than defined by construction

full rationale

The paper defines an optimization objective that adds a similarity regularizer (e.g., ℓ₂) to a misclassification loss in standardized feature space, then separately measures the resulting change in output-probability drift metrics. Because the similarity term is not mathematically identical to any of the drift metrics, and because the reported reduction is an experimental observation rather than a definitional identity, the central result does not reduce to its inputs by construction. No load-bearing self-citations, uniqueness theorems, or fitted-parameter renamings appear in the provided description.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on an optimization objective that introduces tunable parameters for balancing evasion and similarity, plus the assumption that feature-space similarity controls observable drift.

free parameters (2)
  • perturbation budget
    Controls the maximum allowed change; directly affects the reported evasion-detectability trade-off.
  • regularization weight
    Scales the similarity penalty term in the objective; chosen to achieve the claimed drift reduction.
axioms (1)
  • domain assumption: Standardized feature space permits meaningful distributional similarity measures between clean and perturbed malware.
    Invoked to justify the regularizer that keeps perturbations inconspicuous to drift monitors.

