arxiv: 2604.20495 · v1 · submitted 2026-04-22 · 💻 cs.CR · cs.LG


Towards Certified Malware Detection: Provable Guarantees Against Evasion Attacks

Nandakrishna Giri, Asmitha K. A., Serena Nicolazzo, Antonino Nocera, Vinod P


Pith reviewed 2026-05-10 00:05 UTC · model grok-4.3

classification 💻 cs.CR cs.LG

keywords: certified robustness · malware detection · randomized smoothing · evasion attacks · feature ablation · Wilson score interval · metamorphic malware · adversarial machine learning


The pith

Randomized smoothing with majority voting and Wilson intervals provides certifiable robustness for malware detectors against feature perturbations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework that adds mathematical robustness guarantees to machine learning-based static malware detectors. It creates multiple ablated versions of an executable by removing features and adding targeted noise, classifies each version, and decides the final label by majority vote. The distribution of votes for the top class is then analyzed with the Wilson score interval to produce a formal certificate that the label stays fixed under any changes to features inside a defined radius. The method requires no changes to the base classifier and is tested on clean executables plus variants produced by metamorphic mutation engines. A reader would care because current static detectors can be evaded by simple transformations, and this approach aims to close that gap with provable limits on attack size.

Core claim

By generating multiple ablated variants of an executable, classifying them with a smoothed classifier, and identifying the final label based on the majority vote, the system derives a formal certificate from the top-class voting distribution and the Wilson score interval. This certificate guarantees robustness within a specific radius against feature-space perturbations, providing provable guarantees against metamorphic evasion attacks without requiring modifications to the underlying machine learning architecture.

What carries the argument

Randomized smoothing via feature ablation and targeted noise injection, with majority voting over the resulting classifications and Wilson score interval analysis on the vote distribution to compute the robustness certificate.
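The mechanism can be pictured with a small, self-contained sketch. It is not the authors' implementation, and several details are assumptions made for illustration:

- the feature vector is a plain list of floats;
- features are split into fixed-size contiguous groups;
- each variant zeroes one randomly chosen group and adds Gaussian noise of standard deviation `sigma` to the remaining features, standing in for the paper's targeted noise;
- `toy_detector` is a hypothetical base classifier.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple


def make_variant(x: List[float], group_size: int, sigma: float,
                 rng: random.Random) -> List[float]:
    """Return one ablated and noised copy of x (hypothetical scheme)."""
    n_groups = (len(x) + group_size - 1) // group_size
    g = rng.randrange(n_groups)                  # feature group to ablate
    start, stop = g * group_size, min((g + 1) * group_size, len(x))
    return [
        0.0 if start <= i < stop                 # ablation: zero the group
        else v + rng.gauss(0.0, sigma)           # noise on remaining features
        for i, v in enumerate(x)
    ]


def smoothed_predict(x: List[float], base: Callable[[List[float]], int],
                     n: int, group_size: int, sigma: float,
                     seed: int = 0) -> Tuple[int, int]:
    """Classify n freshly drawn variants; return (majority label, its votes)."""
    rng = random.Random(seed)
    votes = Counter(base(make_variant(x, group_size, sigma, rng))
                    for _ in range(n))
    label, count = votes.most_common(1)[0]
    return label, count


def toy_detector(features: List[float]) -> int:
    """Hypothetical base classifier: 1 (malware) if the mean feature > 0.5."""
    return int(sum(features) / len(features) > 0.5)


label, count = smoothed_predict([1.0] * 12, toy_detector, n=100,
                                group_size=4, sigma=0.1)
print(label, count)
```

Each variant uses fresh draws from a single seeded generator, and the detector is deterministic. The resulting `count` out of `n` is therefore the binomial sample that a confidence bound such as the Wilson score interval consumes.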

If this is right

The smoothed classifier maintains detection performance on clean executables comparable to the base model.
Robustness guarantees apply to feature-space changes without any modification to the original machine learning architecture.
The certificate covers perturbations generated by metamorphic engines such as PyMetaEngine on executable files.
Certification works directly on the discrete feature representations typical of static malware analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ablation-and-vote approach could be tested on other security tasks that use discrete input features, such as network packet classification.
Varying the number of ablations or the noise level would likely change the certified radius, offering a tunable trade-off between certification strength and computational cost.
Attacks designed to exploit the specific ablation pattern rather than generic metamorphic mutations would serve as a direct test of whether the coverage assumption holds in practice.

Load-bearing premise

The chosen feature ablations and noise injection must sufficiently cover the distribution of real metamorphic evasion attacks, and the majority-vote plus Wilson interval construction must yield a valid robustness certificate for the discrete feature space of executables.

What would settle it

Discovery of a metamorphic variant that alters the malware label while remaining inside the certified perturbation radius would show the certificate does not hold.

Figures

Figures reproduced from arXiv: 2604.20495 by Antonino Nocera, Asmitha K. A., Nandakrishna Giri, Serena Nicolazzo, Vinod P.

**Figure 2.** The Feature Mutation Process, demonstrating group-wise feature ablation and targeted noise injection for …

**Figure 3.** The Inference phase pipeline.

Since $p_{\text{lower}} > 0.5$, the prediction of the majority class is certifiably robust. The corresponding certified radius is then given by:

$$R = \sigma \cdot \Phi^{-1}(p_{\text{lower}}) = 0.3 \cdot \Phi^{-1}(0.78) \approx 0.23. \tag{3}$$

Formally, this implies that for any perturbed input $x'$ such that $\|x' - x\|_2 < R$, the prediction of the smoothed classifier remains invariant. Hence, the classifier is provably robust within an…
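Eq. (3) can be derived from the binary form of the Gaussian randomized-smoothing bound of Cohen et al. [4]. The derivation assumes that the runner-up class probability is bounded above by $1 - p_{\text{lower}}$, which is an assumption about how the paper instantiates the bound. It then takes two steps. The value $\Phi^{-1}(0.78) \approx 0.7722$ is computed here, not quoted from the paper.

$$
\begin{aligned}
R &= \frac{\sigma}{2}\left(\Phi^{-1}(p_{\text{lower}}) - \Phi^{-1}(1 - p_{\text{lower}})\right) \\
  &= \sigma\,\Phi^{-1}(p_{\text{lower}}) \qquad \text{since } \Phi^{-1}(1 - p) = -\Phi^{-1}(p) \\
  &= 0.3 \cdot \Phi^{-1}(0.78) \approx 0.3 \times 0.7722 \approx 0.2317 \approx 0.23
\end{aligned}
$$

The same expression is non-positive whenever $p_{\text{lower}} \le 0.5$. This is why a certificate is issued only when $p_{\text{lower}} > 0.5$.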

**Figure 4.** Confusion matrices comparing Base Classifier (BC) and Smoothed Classifier (SC) under clean and synthetic noise conditions.

**Figure 5.** Magnified t-SNE projection of 15 micro-mutated pairs, illustrating the exact perturbation trajectory between …

**Figure 6.** Demonstrating the certified radius, calculated by the EMBER-trained Smoothed Classifier, across various …

**Figure 7.** Confusion matrices comparing the Base and Smoothed MalConv architectures on clean data and under the …

**Figure 8.** Intersection of misclassified samples across the baseline and smoothed MalConv architectures, highlighting …

Original abstract

Machine learning-based static malware detectors remain vulnerable to adversarial evasion techniques, such as metamorphic engine mutations. To address this vulnerability, we propose a certifiably robust malware detection framework based on randomized smoothing through feature ablation and targeted noise injection. During evaluation, our system analyzes an executable by generating multiple ablated variants, classifies them by using a smoothed classifier, and identifies the final label based on the majority vote. By analyzing the top-class voting distribution and the Wilson score interval, we derive a formal certificate that guarantees robustness within a specific radius against feature-space perturbations. We evaluate our approach by comparing the performance of the base classifier and the smoothed classifier on both clean executables and ablated variants generated using PyMetaEngine. Our results demonstrate that the proposed smoothed classifier successfully provides certifiable robustness against metamorphic evasion attacks without requiring modifications to the underlying machine learning architecture.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adapts randomized smoothing to static malware detection via feature ablation and Wilson-score intervals. However, the claimed formal certificate rests on an approximate confidence bound rather than the exact or conservative bounds that standard certification proofs require.


The core contribution is a smoothed classifier for PE executables that generates ablated variants, takes a majority vote, and then applies the Wilson score interval to the vote counts to produce a certified radius against feature perturbations. They test this on clean files and on variants produced by PyMetaEngine without retraining the underlying model. That is a reasonable domain-specific move and the empirical comparison between base and smoothed accuracy is straightforward to follow.

Referee Report

2 major / 2 minor

Summary. The paper claims to develop a certifiably robust malware detection framework for static ML-based detectors of PE executables. It applies randomized smoothing via feature ablation and targeted noise injection: multiple ablated variants are generated and classified by a base model, the final label is set by majority vote, and a formal robustness certificate is derived by feeding the empirical top-class vote fraction into the Wilson score interval to obtain a lower bound p_lower on the smoothed probability, with the certified radius defined as the largest r such that p_lower exceeds the decision threshold (e.g., 1/2). The approach is evaluated by comparing base and smoothed classifiers on clean samples and on metamorphic variants produced by PyMetaEngine.

Significance. If the certificate construction can be made rigorous, the work would offer a meaningful advance by adapting randomized-smoothing ideas to the discrete, high-dimensional feature space of malware binaries and by providing the first explicit robustness radius against metamorphic evasion. The evaluation on real metamorphic engines supplies a practical testbed that is absent from most theoretical smoothing papers. The absence of any architectural changes to the base detector is also a practical strength.

major comments (2)

[Abstract / certificate construction] Abstract and certificate derivation: the formal certificate is obtained by plugging the empirical top-class vote fraction into the Wilson score interval to produce p_lower and then taking the largest r with p_lower > 1/2. The Wilson interval is a normal approximation whose coverage probability can fall below the nominal 1-δ level for small ablation counts or extreme vote fractions; the manuscript supplies neither an explicit conversion to a conservative one-sided bound (e.g., Clopper-Pearson or Hoeffding) nor a proof that the resulting p_lower is a valid lower bound on the true smoothed probability with probability at least 1-δ.
[Abstract / method] Abstract and method: the binomial model underlying the Wilson interval requires that each ablation is an independent draw from the noise distribution. In the discrete feature space of PE executables, the concrete ablation procedure and the feature extractor may introduce statistical dependence among the ablated samples, violating the independence assumption required for the interval to be valid. No verification or correction for this dependence is reported.
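The coverage concern in the first major comment can be made concrete by computing the exact coverage of a one-sided Wilson lower bound. Coverage here is the probability, over $K \sim \mathrm{Binomial}(n, p)$, that the bound does not exceed the true $p$. The sketch is illustrative only. It assumes a one-sided 95% bound ($\alpha = 0.05$) and uses $n = 10$ ablations with an arbitrary true probability $p = 0.78$ as a small-sample case, not values from the paper's experiments.

```python
from math import comb, sqrt
from statistics import NormalDist
from typing import Callable


def wilson_lower(k: int, n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) Wilson score lower bound for k successes in n trials."""
    z = NormalDist().inv_cdf(1 - alpha)
    p_hat = k / n
    centre = p_hat + z * z / (2 * n)
    margin = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def exact_coverage(bound: Callable[[int, int, float], float],
                   n: int, p: float, alpha: float = 0.05) -> float:
    """P(bound(K, n) <= p) for K ~ Binomial(n, p), summed over every outcome k."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if bound(k, n, alpha) <= p
    )


coverage = exact_coverage(wilson_lower, n=10, p=0.78)
print(f"Wilson coverage at n=10, p=0.78: {coverage:.3f} (nominal 0.95)")
```

With these values, the only outcome that breaks the bound is $k = n$. There the Wilson bound equals $1/(1 + z^2/n) \approx 0.787 > 0.78$, so the coverage is $1 - 0.78^{10} \approx 0.917$, below the nominal 0.95.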

minor comments (2)

[Abstract] The abstract does not state the number of ablations used to compute the vote fraction or the precise formula for the certified radius; both are needed to reproduce the claimed guarantees.
[Evaluation] The evaluation section should report the empirical coverage of the Wilson interval on held-out data (i.e., how often the certified label is correct when the certificate is issued) to allow readers to assess the practical tightness of the bound.
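The second minor comment asks for a simple held-out report, which could be produced as follows. The sketch assumes per-sample predictions, ground-truth labels, and certified radii are already available, with a radius of zero meaning the smoothed classifier abstained. The function name and the abstain-at-zero convention are hypothetical. The last returned value follows the "certified accuracy at radius r" metric common in the randomized-smoothing literature.

```python
from typing import Sequence, Tuple


def certificate_report(pred: Sequence[int], truth: Sequence[int],
                       radius: Sequence[float], r: float = 0.0
                       ) -> Tuple[float, float, float]:
    """Summarise certificates on held-out data.

    A certificate counts as issued when radius > 0 (i.e. p_lower > 0.5).
    Returns (issue rate, accuracy among issued certificates,
    certified accuracy at radius r).
    """
    n = len(pred)
    issued = [rad > 0 for rad in radius]
    correct = [p == t for p, t in zip(pred, truth)]
    n_issued = sum(issued)
    correct_issued = sum(1 for i, c in zip(issued, correct) if i and c)
    certified_at_r = sum(
        1 for i, c, rad in zip(issued, correct, radius) if i and c and rad >= r
    )
    precision = correct_issued / n_issued if n_issued else float("nan")
    return n_issued / n, precision, certified_at_r / n


print(certificate_report([1, 1, 0, 0], [1, 0, 0, 1], [0.3, 0.2, 0.0, 0.1], r=0.25))
```

In the toy call, three of four samples receive a certificate and only one of those three is correct. That sample's radius of 0.3 also clears r = 0.25, so the certified accuracy is one in four.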

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions planned for the next version of the manuscript.


Referee: [Abstract / certificate construction] Abstract and certificate derivation: the formal certificate is obtained by plugging the empirical top-class vote fraction into the Wilson score interval to produce p_lower and then taking the largest r with p_lower > 1/2. The Wilson interval is a normal approximation whose coverage probability can fall below the nominal 1-δ level for small ablation counts or extreme vote fractions; the manuscript supplies neither an explicit conversion to a conservative one-sided bound (e.g., Clopper-Pearson or Hoeffding) nor a proof that the resulting p_lower is a valid lower bound on the true smoothed probability with probability at least 1-δ.

Authors: We agree that the Wilson score interval is an approximation whose finite-sample coverage can fall short of the nominal level. In the revised manuscript we will replace it with the one-sided Clopper-Pearson interval, which supplies an exact conservative lower bound for any sample size and any vote fraction. We will also add a short appendix proof establishing that the resulting p_lower is a valid (1-δ)-lower bound on the smoothed probability. These changes will be reflected in both the abstract and the certificate derivation section. revision: yes
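The one-sided Clopper-Pearson bound proposed in the response equals the $\alpha$-quantile of $\mathrm{Beta}(k, n-k+1)$. It can also be found, without external libraries, by bisection on the binomial upper tail. This minimal sketch assumes $\alpha = 0.05$, and the function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson_lower(k: int, n: int, alpha: float = 0.05,
                          tol: float = 1e-12) -> float:
    """Exact one-sided (1 - alpha) lower bound: solves P(X >= k | p) = alpha."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    # The upper tail grows monotonically with p, so bisection brackets the root.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_upper_tail(k, n, mid) < alpha:
            lo = mid      # tail still below alpha: root lies above mid
        else:
            hi = mid
    return lo             # lower end of the bracket keeps the bound conservative


print(round(clopper_pearson_lower(10, 10), 4))   # k = n = 10 gives 0.7411
```

For $k = n = 10$, the bound reduces to $\alpha^{1/n} \approx 0.741$. This is below the Wilson value of about 0.787 for the same counts, which is the price of guaranteed coverage.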
Referee: [Abstract / method] Abstract and method: the binomial model underlying the Wilson interval requires that each ablation is an independent draw from the noise distribution. In the discrete feature space of PE executables, the concrete ablation procedure and the feature extractor may introduce statistical dependence among the ablated samples, violating the independence assumption required for the interval to be valid. No verification or correction for this dependence is reported.

Authors: Each ablated variant is produced by an independent random draw from the ablation distribution; the feature extractor and base classifier then act deterministically on that variant. Consequently the sequence of classification outcomes is i.i.d. conditional on the original input. We will add a clarifying paragraph in the method section that makes this independence explicit and briefly discusses why the deterministic nature of the extractor does not induce dependence across independent ablations. revision: partial
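The independence argument can be checked cheaply with a Wald-Wolfowitz runs test. This diagnostic is an assumption, not something the paper or the rebuttal describes. It is applied to the 0/1 sequence of per-variant votes (1 = top class), in the order the variants were generated. Under i.i.d. draws, the z-score is approximately standard normal for moderately large counts. A large |z| flags serial dependence, such as a reused random seed. The test cannot prove independence, and it needs both outcomes to appear in the sequence.

```python
from math import sqrt
from typing import Sequence


def runs_test_z(votes: Sequence[int]) -> float:
    """Wald-Wolfowitz runs test z-score for a 0/1 vote sequence.

    Under independent, identically distributed votes the statistic is
    approximately standard normal; a large |z| suggests serial dependence.
    """
    n1 = sum(votes)                 # votes for the top class
    n2 = len(votes) - n1            # votes for any other class
    if n1 == 0 or n2 == 0:
        raise ValueError("runs test needs both outcomes in the sequence")
    runs = 1 + sum(1 for a, b in zip(votes, votes[1:]) if a != b)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - mean) / sqrt(var)


print(round(runs_test_z([0, 1] * 10), 2))           # alternating votes
print(round(runs_test_z([1] * 10 + [0] * 10), 2))   # blocked votes
```

On 20 votes, a perfectly alternating sequence gives z ≈ +4.1 (too many runs). Ten 1s followed by ten 0s gives z ≈ -4.1 (too few runs). Either result would be a red flag for the i.i.d. assumption.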

Circularity Check

0 steps flagged

No circularity: certificate applies standard Wilson interval to empirical votes without self-referential reduction


The derivation computes an empirical top-class vote fraction from ablated samples, then applies the Wilson score interval to obtain a lower bound p_lower and selects the largest radius r where p_lower exceeds the decision threshold. This is a direct, one-way application of a pre-existing statistical procedure to observed counts; the interval formula and radius selection rule are independent of the paper's data and do not redefine any quantity in terms of itself. No self-citations, fitted parameters renamed as predictions, or ansatzes smuggled via prior work appear as load-bearing steps. The construction therefore remains self-contained against external statistical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that randomized smoothing and statistical intervals transfer from continuous to discrete executable feature spaces; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: majority vote over ablated and noised variants yields a valid robustness certificate via the Wilson score interval.
Invoked when deriving the formal certificate from the voting distribution.

pith-pipeline@v0.9.0 · 5454 in / 1173 out tokens · 38166 ms · 2026-05-10T00:05:42.559872+00:00 · methodology


Reference graph

Works this paper leans on

13 extracted references · 3 canonical work pages

[1]

Quantifying the impact of adversarial evasion attacks on machine learning based android malware classifiers

Zainab Abaid, Mohamed Ali Kaafar, and Sanjay Jha. Quantifying the impact of adversarial evasion attacks on machine learning based Android malware classifiers. In 2017 IEEE 16th International Symposium on Network Computing and Applications (NCA), pages 1–10. IEEE, 2017

2017
[2]

EMBER: An open dataset for training static PE malware machine learning models

Hyrum S. Anderson and Phil Roth. EMBER: An open dataset for training static PE malware machine learning models, 2018

2018
[3]

Deep learning vs. adversarial noise: a battle in malware image analysis

KA Asmitha, Vinod Puthuvath, KA Rafidha Rehiman, and SL Ananth. Deep learning vs. adversarial noise: a battle in malware image analysis. Cluster Computing, 27(7):9191–9220, 2024

2024
[4]

Certified adversarial robustness via randomized smoothing

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320. PMLR, 2019

2019
[5]

Certrob: Detecting pdf malware with certified adversarial robustness via randomization smoothing

Lijun Gao and Zheng Yan. Certrob: Detecting pdf malware with certified adversarial robustness via randomization smoothing. In 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pages 944–951. IEEE, 2024

2024
[6]

Certified robustness of static deep learning-based malware detectors against patch and append attacks

Daniel Gibert, Giulio Zizzo, and Quan Le. Certified robustness of static deep learning-based malware detectors against patch and append attacks. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pages 173–184, 2023

2023
[7]

Adversarial robustness of deep learning-based malware detectors via (de)randomized smoothing

Daniel Gibert, Giulio Zizzo, Quan Le, and Jordi Planes. Adversarial robustness of deep learning-based malware detectors via (de)randomized smoothing. IEEE Access, 12:61152–61162, 2024

2024
[8]

Confidence-aware training of smoothed classifiers for certified robustness

Jongheon Jeong, Seojin Kim, and Jinwoo Shin. Confidence-aware training of smoothed classifiers for certified robustness. In Proceedings of the AAAI conference on artificial intelligence, volume 37(7), pages 8005–8013, 2023

2023
[9]

Certified robustness to adversarial examples with differential privacy

Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018

work page arXiv 2018
[10]

(De)randomized smoothing for certifiable defense against patch attacks

Alexander Levine and Soheil Feizi. (De)randomized smoothing for certifiable defense against patch attacks. Advances in Neural Information Processing Systems, 33:6465–6475, 2020

2020
[11]

Certified adversarial robustness with additive noise

Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified adversarial robustness with additive noise. Advances in Neural Information Processing Systems, 32, 2019

2019
[12]

Malware detection by eating a whole EXE

Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, and Charles Nicholas. Malware detection by eating a whole EXE. arXiv preprint arXiv:1710.09435, 2017

work page arXiv 2017
[13]

DRSM: De-randomized smoothing on malware classifier providing certified robustness

Shoumik Saha, Wenxiao Wang, Yigitcan Kaya, Soheil Feizi, and Tudor Dumitras. DRSM: De-randomized smoothing on malware classifier providing certified robustness. arXiv preprint arXiv:2303.13372, 2023

work page arXiv 2023