DroidBreaker: Practical and Functional Problem-Space Attacks on Machine-Learning Android Malware Detectors

Angelo Sotgiu; Battista Biggio; Christian Scano; Davide Maiorca; Diego Soi; Fabio Roli; Giorgio Giacinto; Luca Demetrio

arXiv: 2606.26707 · v1 · pith:HWRMLZWA · submitted 2026-06-25 · 💻 cs.CR · cs.LG

Christian Scano, Diego Soi, Angelo Sotgiu, Luca Demetrio, Davide Maiorca, Giorgio Giacinto, Fabio Roli, Battista Biggio

Pith reviewed 2026-06-26 04:36 UTC · model grok-4.3

keywords Android malware detection · adversarial examples · problem-space attacks · evasion attacks · APK manipulation · semantics preservation · machine learning security

The pith

Targeted, build-safe changes to Android apps can evade machine-learning malware detectors while preserving original behavior through execution-trace matching.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that existing problem-space attacks on Android malware detectors often fail in practice: they inject large benign modules that create side effects, break builds, or alter app semantics, and they rely on weak validation that only checks installation. DroidBreaker instead selects the most influential APK components for manipulation, applies fine-grained operations on API calls, permissions, modules, and URLs, and enforces functionality by requiring identical execution logs and API traces before and after the change. When tested, this produces high evasion rates against both white-box and black-box models with few queries and also sharply lowers detection rates on commercial VirusTotal scanners.

Core claim

DroidBreaker demonstrates that a problem-space attack can remain both practical and functional by restricting edits to model-influential components, using only build-safe manipulations, and validating semantic equivalence via direct comparison of runtime logs and API traces, thereby achieving high evasion in white- and black-box settings while reducing detections by commercial scanners.

What carries the argument

The semantics-preserving functionality test that enforces runtime equivalence by comparing execution logs and API-level traces between original and modified APKs, paired with fine-grained build-safe manipulations of the most model-influential components.
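As a rough illustration of what comparing API-level traces could look like, the sketch below checks two ordered traces for strict equality and reports where they first diverge. It is a minimal sketch, not the paper's implementation: the trace format (one API signature string per call) and the function names `first_divergence` and `traces_equivalent` are assumptions made here for illustration.

```python
from typing import List, Optional


def first_divergence(original: List[str], modified: List[str]) -> Optional[int]:
    """Return the index of the first differing call, or None if the traces match.

    Each trace is assumed to be an ordered list of API signature strings
    (a hypothetical format chosen for this sketch).
    """
    for index, (before, after) in enumerate(zip(original, modified)):
        if before != after:
            return index
    # One trace is a strict prefix of the other: they diverge where the shorter ends.
    if len(original) != len(modified):
        return min(len(original), len(modified))
    return None


def traces_equivalent(original: List[str], modified: List[str]) -> bool:
    """Strict ordered equality of the two traces."""
    return first_divergence(original, modified) is None


if __name__ == "__main__":
    original = [
        "android.telephony.TelephonyManager.getDeviceId",
        "java.net.URL.openConnection",
    ]
    print(traces_equivalent(original, list(original)))
    print(first_divergence(original, original[:1]))
```

Strict ordered equality is the most conservative choice; a real harness would likely need to tolerate nondeterminism such as thread interleaving, which this sketch does not attempt.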

If this is right

Attacks succeed with few model queries in both white-box and black-box access models.
Modified APKs produce minimal side-effect features and remain buildable.
Detection counts drop substantially when the same APKs are submitted to commercial scanners on VirusTotal.
The same manipulation set works across recent corpora of Android applications.
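One way to make "minimal side-effect features" measurable is to count the features that changed between the original and modified APK but were not targeted by the attack. The sketch below is an assumption about how such a count could be defined; the feature names and the `side_effect_features` helper are hypothetical and do not come from the paper.

```python
from typing import Set


def side_effect_features(original: Set[str], modified: Set[str], intended: Set[str]) -> Set[str]:
    """Features that differ between the two APKs but were not deliberately changed."""
    changed = original ^ modified  # symmetric difference: added or removed features
    return changed - intended


if __name__ == "__main__":
    # Hypothetical feature sets extracted from the original and modified APK.
    original = {"perm::SEND_SMS", "api::sendTextMessage"}
    modified = {"perm::SEND_SMS", "perm::CAMERA", "api::java.lang.reflect.Method.invoke"}
    # The attack meant to add one permission and hide one API call.
    intended = {"perm::CAMERA", "api::sendTextMessage"}
    print(side_effect_features(original, modified, intended))
```

In this example the only side effect is the reflection call that appears once the API call is hidden, which matches the kind of side effect the API Reflection figure caption describes.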

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Defenses that ignore runtime behavior may need to add behavioral equivalence checks or broader feature monitoring.
The same component-selection and trace-matching approach could be adapted to test robustness of detectors on other mobile platforms.
Automated pipelines that generate functional adversarial samples become feasible once the build and semantics constraints are solved.

Load-bearing premise

That matching execution logs and API traces between the original and modified APK is sufficient to guarantee the app still performs its intended behavior in practice.
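How much this premise can carry depends on what a trace records. The toy example below, built on a hypothetical trace format of (API name, argument summary) pairs, shows two runs that match when only API names are compared yet differ once arguments are included. It is an illustration of the concern, not a finding about the paper's test.

```python
# Hypothetical trace events: (API name, summarized argument).
original_trace = [
    ("java.net.URL.<init>", "https://a.example/report"),
    ("java.net.URL.openConnection", ""),
]
modified_trace = [
    ("java.net.URL.<init>", "https://b.example/report"),
    ("java.net.URL.openConnection", ""),
]


def api_names(trace):
    """Project a trace onto API names only, discarding arguments."""
    return [name for name, _args in trace]


# Comparing names only hides the changed URL; comparing full events exposes it.
names_match = api_names(original_trace) == api_names(modified_trace)
full_match = original_trace == modified_trace
print(names_match, full_match)  # prints: True False
```

Whether the paper's comparison includes arguments is not stated in the abstract, so the granularity shown here is an open question rather than a criticism.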

What would settle it

A test set of modified APKs that pass the log-and-trace equivalence check yet fail to carry out the same user-facing or malicious actions when installed and run on real devices.

Figures

Figures reproduced from arXiv: 2606.26707 by Angelo Sotgiu, Battista Biggio, Christian Scano, Davide Maiorca, Diego Soi, Fabio Roli, Giorgio Giacinto, Luca Demetrio.

**Figure 2.** DROIDBREAKER workflow. (1) Attack Initialization selects the top-k transformations, discarding those that break repackaging or have negligible impact on the model. (2) Attack Optimization manipulates APK components via query-efficient white- and black-box attacks. (3) Functionality Testing uses dynamic analysis to ensure semantics preservation. This step does not require querying the target detector and gu…
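The initialization step in the workflow caption (discard transformations that break repackaging or barely affect the model, then keep the top k) can be sketched as a filter followed by a ranking. Everything below is an assumption for illustration: the `Transformation` record, its `build_safe` flag and `impact` score, and the `min_impact` threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transformation:
    name: str
    build_safe: bool  # hypothetical flag: repackaging succeeded in a dry run
    impact: float     # hypothetical non-negative score: estimated effect on the model


def select_top_k(candidates: List[Transformation], k: int, min_impact: float = 1e-3) -> List[Transformation]:
    """Drop unusable candidates, then keep the k with the highest impact."""
    usable = [t for t in candidates if t.build_safe and t.impact >= min_impact]
    return sorted(usable, key=lambda t: t.impact, reverse=True)[:k]


if __name__ == "__main__":
    candidates = [
        Transformation("candidate_a", True, 0.40),
        Transformation("candidate_b", True, 0.0001),  # negligible impact
        Transformation("candidate_c", False, 0.90),   # breaks repackaging
        Transformation("candidate_d", True, 0.25),
    ]
    print([t.name for t in select_top_k(candidates, k=2)])
```

Filtering before ranking means a high-impact transformation that cannot be repackaged never reaches the optimization stage, which is the ordering the caption implies.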

**Figure 3.** DROIDBREAKER URL (string) injection (in blue).

[…] of the corresponding classes inside APKs’ DEX code. Hardware Feature and Permission Injection. We inject Hardware Features (C2) and Permissions (C3) by adding new entries in the manifest. These injections are inherently semantics-preserving because they do not introduce executable code or runtime behavior that could disrupt functionality. API Injection. We inj…

**Figure 5.** White-box ASR vs. number of modified features on […]

**Figure 6.** Boxplots of the distribution of the number of mod […]

**Figure 7.** Boxplots of the distribution of the number of modified components by D […]

**Figure 8.** ASR of DROIDBREAKER against APIGraph with ELSA dataset, using build-safe (green line) and build-safe + model-influential transformations (orange line).

4) Ablation Study on Model-influential Transformations: We conduct an ablation study to quantify the impact of model-influential transformations on the attack efficacy. Specifically, […]

**Figure 10.** App Module (C1) renaming in the manifest (line 6) without encoding the tokens that are also within the package name.

[…] to reject the app with a fatal error invalidating its functionality. C. DROIDBREAKER Manipulation Details. We outline here the implementation details of the manipulations presented in Sect. III-A, explaining how they preserve functionality and differ from prior work. Component Obfuscation. …

**Figure 11.** Obfuscation through Call Indirection. (a) shows the […]

**Figure 12.** DROIDBREAKER API Reflection. (a) shows the smali code of the original function b with the call to the target API in blue (C5), while (b) shows the obfuscated counterpart with the corresponding side-effect in red.

**Figure 13.** DROIDBREAKER manifest components Injection, i.e., Permissions (C3), Hardware Features (C2), App Modules (C1), and Intent Filters (C4).

    .class public Lcom/apiinjectionmanager/ApiInjection;
    .super Ljava/lang/Object;

    .method public static inject()V
    .registers 6
    const/4 v0, 0x1
    const/4 v1, 0x0
    /* Logging */
    if-nez v0, :impossible
    invoke-static {v2}, Landroid/os/Binder;->getCallingPid()I…

**Figure 14.** DROIDBREAKER API Injection. inject holds the injected API (C5).

Abstract

Adversarial APKs are Android applications modified in the problem space to evade machine-learning malware detectors. In this work, we first show that, despite claims, existing problem-space attacks remain largely impractical. Most techniques leverage software transplantation to inject entire benign modules, introducing many side-effect features and often causing build-time failures. Fine-grained methods that inject only a narrow subset of components exhibit limited effectiveness, while those that also use obfuscation rely on brittle bytecode rewriting, producing APKs that are syntactically valid but semantically unusable. Prior work further overestimates attack success rates by running smoke tests that only validate installation and basic execution, without assessing whether the modified APK still preserves its intended behavior. To overcome these limitations, we present DROIDBREAKER, a practical (build-safe) and functional (semantics-preserving) problem-space attack framework that provides: (i) query-efficient white- and black-box attacks by manipulating only the APK components most influential to the target model; (ii) a set of fine-grained, build-safe manipulations (including injection and obfuscation of API calls, app modules, permissions, and URLs) with minimal side effects; and (iii) a semantics-preserving functionality test that enforces runtime equivalence by comparing execution logs and API-level traces between the initial and the modified APK. Evaluated on a recent corpus of Android applications, DROIDBREAKER achieves high evasion rates with few queries and minimal side effects in both white-box and black-box settings, and drastically reduces detections by commercial malware scanners hosted on VirusTotal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DroidBreaker flags real flaws in how prior problem-space attacks on Android detectors were tested and built, but the abstract supplies zero numbers or validation details for its own claims.

The central takeaway is that many existing problem-space attacks on ML Android malware detectors fall short on practicality. Injecting whole benign modules creates too many side effects and build failures, narrow injections lack power, and obfuscation often breaks the app. Smoke tests that only check installation miss whether behavior is preserved.

The paper does a solid job laying out those specific shortcomings and proposing fixes: query-efficient attacks that target only the most influential APK components, fine-grained build-safe changes to APIs, modules, permissions, and URLs, plus a runtime test that compares execution logs and API traces for equivalence between original and modified apps.

The soft spot is exactly what the stress-test note flags. The abstract asserts high evasion rates with few queries, minimal side effects, and sharp drops in VirusTotal detections, yet it gives no pass rates for the functionality test, no trace similarity scores, no measured deltas in API calls or permissions, and no failure cases. Without those, the claim that the test ensures usable APKs rests on assertion rather than evidence.

This is for people working on mobile malware detection and adversarial robustness. A reader who needs to understand current attack limitations would get value from the problem breakdown and the proposed test structure. The work shows clear engagement with prior limitations rather than hand-waving.

It deserves peer review. The identified gaps are legitimate and the engineering approach is described clearly enough to be checked once the experiments are in front of referees.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces DroidBreaker, a problem-space attack framework targeting machine-learning Android malware detectors. It critiques prior work for relying on software transplantation (causing side effects and build failures), limited fine-grained methods, brittle obfuscation, and overestimation via smoke tests that ignore semantic preservation. DroidBreaker claims to overcome these via (i) query-efficient white- and black-box attacks manipulating only model-influential APK components, (ii) fine-grained build-safe manipulations (API calls, modules, permissions, URLs) with minimal side effects, and (iii) a semantics-preserving functionality test enforcing runtime equivalence via execution logs and API traces. On a recent Android corpus, it reports high evasion rates with few queries, minimal side effects, and drastic reductions in VirusTotal detections by commercial scanners.

Significance. If the empirical claims of high evasion under preserved functionality are quantitatively validated, the work would be significant for adversarial ML in security. It supplies a concrete engineering framework with explicit manipulation primitives and a testable semantics criterion, directly addressing practicality gaps that prior problem-space attacks have left open. This could inform both attack realism assessments and the design of detectors that account for build-safe, trace-preserving modifications.

major comments (1)

[Abstract] Abstract: the central claims that DROIDBREAKER 'achieves high evasion rates with few queries and minimal side effects' and 'drastically reduces detections by commercial malware scanners' are stated without any reported metrics (evasion percentages, query counts, side-effect deltas such as Δ API calls or permission changes, functionality-test pass rates, or trace-similarity scores). The semantics-preserving test is described but supplies no quantitative validation, leaving the load-bearing premise of usable, behavior-preserving APKs unverified.
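The quantities the comment asks for are simple to compute once per-sample outcomes are recorded. The sketch below shows one possible way to derive them; the `AttackResult` record and its fields are hypothetical, and no numbers here come from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AttackResult:
    evaded: bool                     # the detector labeled the modified APK benign
    queries: int                     # model queries spent on this sample
    passed_functionality_test: bool  # log and trace equivalence held


def summarize(results: List[AttackResult]) -> Dict[str, float]:
    """Aggregate per-sample outcomes into the metrics a reader would expect."""
    if not results:
        raise ValueError("no results to summarize")
    total = len(results)
    evaded = [r for r in results if r.evaded]
    functional = [r for r in evaded if r.passed_functionality_test]
    return {
        "evasion_rate": len(evaded) / total,
        # Evasions that also preserve behavior, over all attacked samples.
        "functional_evasion_rate": len(functional) / total,
        # Share of evasive samples that pass the functionality test.
        "functionality_pass_rate": len(functional) / len(evaded) if evaded else 0.0,
        "mean_queries": mean(r.queries for r in results),
    }


if __name__ == "__main__":
    # Made-up outcomes, for illustration only.
    demo = [
        AttackResult(True, 10, True),
        AttackResult(True, 20, True),
        AttackResult(True, 30, False),
        AttackResult(False, 40, False),
    ]
    print(summarize(demo))
```

Reporting the functional evasion rate alongside the raw evasion rate makes visible how much of the headline success survives the functionality test.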

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the need for quantitative support in the abstract. We address this point directly below and agree that revisions are warranted.

Referee: [Abstract] Abstract: the central claims that DROIDBREAKER 'achieves high evasion rates with few queries and minimal side effects' and 'drastically reduces detections by commercial malware scanners' are stated without any reported metrics (evasion percentages, query counts, side-effect deltas such as Δ API calls or permission changes, functionality-test pass rates, or trace-similarity scores). The semantics-preserving test is described but supplies no quantitative validation, leaving the load-bearing premise of usable, behavior-preserving APKs unverified.

Authors: We agree that the abstract would be strengthened by including specific metrics to substantiate the claims. The evaluation section of the manuscript reports these results (evasion rates, query efficiency, side-effect measurements, and functionality-test outcomes including pass rates and trace similarity), but the abstract itself does not. We will revise the abstract to incorporate representative quantitative findings from the experiments, such as evasion percentages, average query counts, deltas in manipulated features, and the pass rate of the runtime equivalence test. This change directly addresses the concern without requiring new experiments.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical engineering framework with no derivations or self-referential predictions

The paper presents DROIDBREAKER as an empirical attack framework evaluated on Android APKs, with claims resting on experimental evasion rates and VirusTotal results rather than any mathematical derivation chain. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the abstract or described structure. The functionality test and manipulations are presented as engineering contributions validated by evaluation, not reduced to inputs by construction. This is a standard non-finding for an applied security paper whose central claims are falsifiable via external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or derivation is present; the paper is an empirical security engineering contribution. No free parameters, axioms, or invented entities are described in the abstract.

Reference graph

Works this paper leans on

47 extracted references · 2 canonical work pages

[1]

Unmasking the veiled: A comprehensive analysis of android evasive malware,

A. Ruggia, D. Nisi, S. Dambra, A. Merlo, D. Balzarotti, and S. Aonzo, “Unmasking the veiled: A comprehensive analysis of android evasive malware,” in Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, 2024, pp. 383–398

2024
[2]

Drebin: Effective and explainable detection of android malware in your pocket

D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, K. Rieck, and C. Siemens, “Drebin: Effective and explainable detection of android malware in your pocket,” in NDSS, vol. 14, 2014, pp. 23–26

2014
[3]

Androzoo: Collecting millions of android apps for the research community,

K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon, “Androzoo: Collecting millions of android apps for the research community,” in Proceedings of the 13th international conference on mining software repositories, 2016, pp. 468–471

2016
[4]

Yes, machine learning can be more secure! a case study on android malware detection,

A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto, and F. Roli, “Yes, machine learning can be more secure! a case study on android malware detection,” IEEE Transactions on Dependable and Secure Computing, vol. 16, no. 4, pp. 711–724, 2017

2017
[5]

Mamadroid: Detecting android malware by building markov chains of behavioral models,

E. Mariconti, L. Onwuzurike, P. Andriotis, E. De Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” 2017

2017
[6]

TESSERACT: Eliminating experimental bias in malware classification across space and time,

F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, and L. Cavallaro, “TESSERACT: Eliminating experimental bias in malware classification across space and time,” in 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 729–746. [Online]. Available: https://www.usenix.org/system/files/sec19-pendlebury.pdf

2019
[7]

Adversarial examples for malware detection,

K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial examples for malware detection,” in Computer Security – ESORICS 2017: 22nd European Symposium on Research in Computer Security, Oslo, Norway, September 11-15, 2017, Proceedings, Part II. Springer, 2017, pp. 62–79

2017
[9]

Intriguing properties of adversarial ml attacks in the problem space,

F. Pierazzi, F. Pendlebury, J. Cortellazzi, and L. Cavallaro, “Intriguing properties of adversarial ml attacks in the problem space,” in 2020 IEEE symposium on security and privacy (SP). IEEE, 2020, pp. 1332–1349

2020
[10]

Adversarial deep ensemble: Evasion attacks and defenses for malware detection,

D. Li and Q. Li, “Adversarial deep ensemble: Evasion attacks and defenses for malware detection,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3886–3900, 2020

2020
[11]

Malware detection in adversarial settings: Exploiting feature evolutions and confusions in android apps,

W. Yang, D. Kong, T. Xie, and C. A. Gunter, “Malware detection in adversarial settings: Exploiting feature evolutions and confusions in android apps,” in Proceedings of the 33rd Annual Computer Security Applications Conference, 2017, pp. 288–302

2017
[12]

Gendroid: A query-efficient black-box android adversarial attack framework,

G. Xu, H. Shao, J. Cui, H. Bai, J. Li, G. Bai, S. Liu, W. Meng, and X. Zheng, “Gendroid: A query-efficient black-box android adversarial attack framework,” Computers & Security, vol. 132, p. 103359, 2023

2023
[13]

Efficient query-based attack against ml-based android malware detection under zero knowledge setting,

P. He, Y. Xia, X. Zhang, and S. Ji, “Efficient query-based attack against ml-based android malware detection under zero knowledge setting,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 90–104

2023
[14]

Evadedroid: A practical evasion attack on machine learning for black-box android malware detection,

H. Bostani and V. Moonsamy, “Evadedroid: A practical evasion attack on machine learning for black-box android malware detection,” Computers & Security, vol. 139, p. 103676, 2024

2024
[15]

Automated software transplantation,

E. T. Barr, M. Harman, Y. Jia, A. Marginean, and J. Petke, “Automated software transplantation,” in Proceedings of the 2015 International Symposium on Software Testing and Analysis, 2015, pp. 257–269

2015
[16]

Eagle: Evasion attacks guided by local explanations against android malware classification,

Z. Shu and G. Yan, “Eagle: Evasion attacks guided by local explanations against android malware classification,” IEEE Transactions on Dependable and Secure Computing, vol. 21, no. 4, pp. 3165–3182, 2024

2024
[17]

Android HIV: A study of repackaging malware for evading machine-learning detection,

X. Chen, C. Li, D. Wang, S. Wen, J. Zhang, S. Nepal, Y. Xiang, and K. Ren, “Android HIV: A study of repackaging malware for evading machine-learning detection,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 987–1001, 2019

2019
[18]

Black-box adversarial example attack towards fcg based android malware detection under incomplete feature information,

H. Li, Z. Cheng, B. Wu, L. Yuan, C. Gao, W. Yuan, and X. Luo, “Black-box adversarial example attack towards fcg based android malware detection under incomplete feature information,” in Proceedings of the 32nd USENIX Conference on Security Symposium, ser. SEC ’23. USA: USENIX Association, 2023. [Online]. Available: https://www.usenix.org/system/files/sec2...

2023
[19]

Virustotal,

“Virustotal,” https://www.virustotal.com/, accessed on May 2025

2025
[20]

Structural attack against graph based android malware detection,

K. Zhao, H. Zhou, Y. Zhu, X. Zhan, K. Zhou, J. Li, L. Yu, W. Yuan, and X. Luo, “Structural attack against graph based android malware detection,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 3218–3235

2021
[21]

On evaluating adversarial robustness,

N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, A. Madry, and A. Kurakin, “On evaluating adversarial robustness,” arXiv preprint arXiv:1902.06705, 2019

2019 · arXiv:1902.06705
[22]

Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,

S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. Le Traon, D. Octeau, and P. McDaniel, “Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,” SIGPLAN Not., vol. 49, no. 6, pp. 259–269, Jun. 2014. [Online]. Available: https://doi.org/10.1145/2666356.2594299

2014 · doi:10.1145/2666356.2594299
[23]

Cuckoodroid,

“Cuckoodroid,” https://github.com/idanr1986/cuckoo-droid, accessed on August 2025

2025
[24]

Android runtime and dalvik,

Google, “Android runtime and dalvik,” https://source.android.com/docs/core/runtime?hl=en, accessed on December 2025

2025
[25]

“Monkey,” https://developer.android.com/studio/test/other-testing-tools/monkey, accessed on May 2025

2025
[26]

Get in Researchers; We’re Measuring Reproducibility

D. Olszewski, A. Lu, C. Stillman, K. Warren, C. Kitroser, A. Pascual, D. Ukirde, K. Butler, and P. Traynor, “‘Get in researchers; we’re measuring reproducibility’: A reproducibility study of machine learning papers in tier 1 security conferences,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’23. New Yo...

2023 · doi:10.1145/3576915.3623130
[27]

Droidbot: a lightweight ui-guided test input generator for android,

Y. Li, Z. Yang, Y. Guo, and X. Chen, “Droidbot: a lightweight ui-guided test input generator for android,” in 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), 2017, pp. 23–26

2017
[28]

“Frida,” https://frida.re/, accessed on May 2025

2025
[29]

“Soot,” https://soot-oss.github.io/soot/, accessed on December 2025

2025
[30]

Evading android runtime analysis via sandbox detection,

T. Vidas and N. Christin, “Evading android runtime analysis via sandbox detection,” in Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, ser. ASIA CCS ’14. New York, NY, USA: Association for Computing Machinery, 2014, pp. 447–458

2014
[31]

Robust android malware detection competition,

“Robust android malware detection competition,” https://ramd-competition.github.io/, accessed on May 2025

2025
[32]

Avclass,

“Avclass,” https://github.com/malicialab/avclass, accessed on May 2025

2025
[33]

Android malware detectors,

“Android malware detectors,” https://github.com/pralab/android-detectors, accessed on May 2025

2025
[34]

Enhancing state-of-the-art classifiers with api semantics to detect evolved android malware,

X. Zhang, Y. Zhang, M. Zhong, D. Ding, Y. Cao, Y. Zhang, M. Zhang, and M. Yang, “Enhancing state-of-the-art classifiers with api semantics to detect evolved android malware,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, pp. 757–770

2020
[35]

Androguard,

Androguard Project, “Androguard,” version 4.1.3. Accessed: May 2025. [Online]. Available: https://github.com/androguard/androguard

2025
[36]

On demystifying the android application framework: Re-visiting android permission specification analysis,

M. Backes, S. Bugiel, E. Derr, P. McDaniel, D. Octeau, and S. Weisgerber, “On demystifying the android application framework: Re-visiting android permission specification analysis,” 2016, Conference paper, pp. 1101–1116. [Online]. Available: https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_backes-android.pdf

2016
[37]

Obfuscapk: An open-source black-box obfuscation tool for android apps,

S. Aonzo, G. C. Georgiu, L. Verderame, and A. Merlo, “Obfuscapk: An open-source black-box obfuscation tool for android apps,” SoftwareX, vol. 11, p. 100403, 2020

2020
[38]

Dynamic security analysis on android: A systematic literature review,

T. Sutter, T. Kehrer, M. Rennhard, B. Tellenbach, and J. Klein, “Dynamic security analysis on android: A systematic literature review,” IEEE Access, vol. 12, pp. 57261–57287, 2024

2024
[39]

Stateful detection of black-box adversarial attacks,

S. Chen, N. Carlini, and D. Wagner, “Stateful detection of black-box adversarial attacks,” in Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, ser. SPAI ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 30–39

2020
[40]

Evading Black-box Classifiers Without Breaking Eggs,

E. Debenedetti, N. Carlini, and F. Tramer, “Evading Black-box Classifiers Without Breaking Eggs,” in 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2024, pp. 408–424

2024
[41]

Functionality-preserving black-box optimization of adversarial windows malware,

L. Demetrio, B. Biggio, G. Lagorio, F. Roli, and A. Armando, “Functionality-preserving black-box optimization of adversarial windows malware,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3469–3478, 2021

2021
[42]

Motivating the rules of the game for adversarial example research,

J. Gilmer, R. P. Adams, I. J. Goodfellow, D. Andersen, and G. E. Dahl, “Motivating the rules of the game for adversarial example research,” CoRR, vol. abs/1807.06732, 2018

2018 · arXiv:1807.06732
[43]

The dark side of native code on android,

A. Ruggia, A. Possemato, S. Dambra, A. Merlo, S. Aonzo, and D. Balzarotti, “The dark side of native code on android,” ACM Trans. Priv. Secur., vol. 28, no. 2, Feb. 2025

2025
[44]

“Jadx,” https://github.com/skylot/jadx, accessed on May 2025

2025
[45]

Apktool,

“Apktool,” https://apktool.org/, accessed on May 2025

2025
[46]

Smali code,

“Smali code,” https://sallam.gitbook.io/sec-88/android-appsec/smali/smali-cheat-sheet, accessed on July 2025

2025
[47]

Apksigner,

Android Developers, “Apksigner,” Android developer documentation, accessed: May 2025. [Online]. Available: https://developer.android.com/tools/apksigner?hl=en

2025

Appendix A. Problem-Space Attacks Details

A. Requirements for Problem-space Attacks

Pierazzi et al. [8] formalized four key requirements for problem-space attacks: (i) practical manipulations, i.e., the...

[1] [1]

Unmasking the veiled: A comprehensive analysis of android evasive malware,

A. Ruggia, D. Nisi, S. Dambra, A. Merlo, D. Balzarotti, and S. Aonzo, “Unmasking the veiled: A comprehensive analysis of android evasive malware,” inProceedings of the 19th ACM Asia Conference on Com- puter and Communications Security, 2024, pp. 383–398

2024

[2] [2]

Drebin: Effective and explainable detection of android malware in your pocket

D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, K. Rieck, and C. Siemens, “Drebin: Effective and explainable detection of android malware in your pocket.” inNdss, vol. 14, 2014, pp. 23–26

2014

[3] [3]

Androzoo: Collecting millions of android apps for the research community,

K. Allix, T. F. Bissyand ´e, J. Klein, and Y . Le Traon, “Androzoo: Collecting millions of android apps for the research community,” in Proceedings of the 13th international conference on mining software repositories, 2016, pp. 468–471

2016

[4] [4]

Yes, machine learning can be more secure! a case study on android malware detection,

A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto, and F. Roli, “Yes, machine learning can be more secure! a case study on android malware detection,”IEEE Transactions on Dependable and Secure Computing, vol. 16, no. 4, pp. 711–724, 2017

2017

[5] [5]

Mamadroid: Detecting android malware by building markov chains of behavioral models,

E. Mariconti, L. Onwuzurike, P. Andriotis, E. De Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” 2017

2017

[6] [6]

TESSERACT: Eliminating experimental bias in malware classification across space and time,

F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, and L. Cavallaro, “TESSERACT: Eliminating experimental bias in malware classification across space and time,” in28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 729–746. [Online]. Available: https://www.usenix.org/system/files/ sec19-pendlebury.pdf

2019

[7] [7]

Adversarial examples for malware detection,

K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial examples for malware detection,” inComputer Security– ESORICS 2017: 22nd European Symposium on Research in Computer Security, Oslo, Norway, September 11-15, 2017, Proceedings, Part II

2017

[8] [8]

Springer, 2017, pp. 62–79

2017

[9] [9]

Intriguing properties of adversarial ml attacks in the problem space,

F. Pierazzi, F. Pendlebury, J. Cortellazzi, and L. Cavallaro, “Intriguing properties of adversarial ml attacks in the problem space,” in2020 IEEE symposium on security and privacy (SP). IEEE, 2020, pp. 1332–1349

2020

[10] [10]

Adversarial deep ensemble: Evasion attacks and defenses for malware detection,

D. Li and Q. Li, “Adversarial deep ensemble: Evasion attacks and defenses for malware detection,”IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3886–3900, 2020

2020

[11] [11]

Malware detection in adversarial settings: Exploiting feature evolutions and confusions in android apps,

W. Yang, D. Kong, T. Xie, and C. A. Gunter, “Malware detection in adversarial settings: Exploiting feature evolutions and confusions in android apps,” inProceedings of the 33rd Annual Computer Security Applications Conference, 2017, pp. 288–302

2017

[12] [12]

Gendroid: A query-efficient black-box android adversarial attack framework,

G. Xu, H. Shao, J. Cui, H. Bai, J. Li, G. Bai, S. Liu, W. Meng, and X. Zheng, “Gendroid: A query-efficient black-box android adversarial attack framework,” Computers & Security, vol. 132, p. 103359, 2023

2023

[13] [13]

Efficient query-based attack against ml-based android malware detection under zero knowledge setting,

P. He, Y. Xia, X. Zhang, and S. Ji, “Efficient query-based attack against ml-based android malware detection under zero knowledge setting,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 90–104

2023

[14] [14]

Evadedroid: A practical evasion attack on machine learning for black-box android malware detection,

H. Bostani and V. Moonsamy, “Evadedroid: A practical evasion attack on machine learning for black-box android malware detection,” Computers & Security, vol. 139, p. 103676, 2024

2024

[15] [15]

Automated software transplantation,

E. T. Barr, M. Harman, Y. Jia, A. Marginean, and J. Petke, “Automated software transplantation,” in Proceedings of the 2015 International Symposium on Software Testing and Analysis, 2015, pp. 257–269

2015

[16] [16]

Eagle: Evasion attacks guided by local explanations against android malware classification,

Z. Shu and G. Yan, “Eagle: Evasion attacks guided by local explanations against android malware classification,” IEEE Transactions on Dependable and Secure Computing, vol. 21, no. 4, pp. 3165–3182, 2024

2024

[17] [17]

Android HIV: A study of repackaging malware for evading machine-learning detection,

X. Chen, C. Li, D. Wang, S. Wen, J. Zhang, S. Nepal, Y. Xiang, and K. Ren, “Android HIV: A study of repackaging malware for evading machine-learning detection,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 987–1001, 2019

2019

[18] [18]

Black-box adversarial example attack towards fcg based android malware detection under incomplete feature information,

H. Li, Z. Cheng, B. Wu, L. Yuan, C. Gao, W. Yuan, and X. Luo, “Black-box adversarial example attack towards fcg based android malware detection under incomplete feature information,” in Proceedings of the 32nd USENIX Conference on Security Symposium, ser. SEC ’23. USA: USENIX Association, 2023. [Online]. Available: https://www.usenix.org/system/files/sec2...

2023

[19] [19]

Virustotal,

“Virustotal,” https://www.virustotal.com/, accessed on May 2025

2025

[20] [20]

Structural attack against graph based android malware detection,

K. Zhao, H. Zhou, Y. Zhu, X. Zhan, K. Zhou, J. Li, L. Yu, W. Yuan, and X. Luo, “Structural attack against graph based android malware detection,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 3218–3235

2021

[21] [21]

On evaluating adversarial robustness,

N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, A. Madry, and A. Kurakin, “On evaluating adversarial robustness,” arXiv preprint arXiv:1902.06705, 2019

2019

[22] [22]

Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,

S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. Le Traon, D. Octeau, and P. McDaniel, “Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,” SIGPLAN Not., vol. 49, no. 6, pp. 259–269, Jun. 2014. [Online]. Available: https://doi.org/10.1145/2666356.2594299

doi:10.1145/2666356.2594299 2014

[23] [23]

Cuckoodroid,

“Cuckoodroid,” https://github.com/idanr1986/cuckoo-droid, accessed on August 2025

2025

[24] [24]

Android runtime and dalvik,

Google, “Android runtime and dalvik,” https://source.android.com/docs/core/runtime?hl=en, accessed on December 2025

2025

[25] [25]

“Monkey,” https://developer.android.com/studio/test/other-testing-tools/monkey, accessed on May 2025

2025

[26] [26]

Get in Researchers; We’re Measuring Reproducibility

D. Olszewski, A. Lu, C. Stillman, K. Warren, C. Kitroser, A. Pascual, D. Ukirde, K. Butler, and P. Traynor, “‘Get in researchers; we’re measuring reproducibility’: A reproducibility study of machine learning papers in tier 1 security conferences,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’23. New Yo...

doi:10.1145/3576915.3623130 2023

[27] [27]

Droidbot: a lightweight ui-guided test input generator for android,

Y. Li, Z. Yang, Y. Guo, and X. Chen, “Droidbot: a lightweight ui-guided test input generator for android,” in 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), 2017, pp. 23–26

2017

[28] [28]

“Frida,” https://frida.re/, accessed on May 2025

2025

[29] [29]

“Soot,” https://soot-oss.github.io/soot/, accessed on December 2025

2025

[30] [30]

Evading android runtime analysis via sandbox detection,

T. Vidas and N. Christin, “Evading android runtime analysis via sandbox detection,” in Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, ser. ASIA CCS ’14. New York, NY, USA: Association for Computing Machinery, 2014, pp. 447–458

2014

[31] [31]

Robust android malware detection competition,

“Robust android malware detection competition,” https://ramd-competition.github.io/, accessed on May 2025

2025

[32] [32]

Avclass,

“Avclass,” https://github.com/malicialab/avclass, accessed on May 2025

2025

[33] [33]

Android malware detectors,

“Android malware detectors,” https://github.com/pralab/android-detectors, accessed on May 2025

2025

[34] [34]

Enhancing state-of-the-art classifiers with api semantics to detect evolved android malware,

X. Zhang, Y. Zhang, M. Zhong, D. Ding, Y. Cao, Y. Zhang, M. Zhang, and M. Yang, “Enhancing state-of-the-art classifiers with api semantics to detect evolved android malware,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, pp. 757–770

2020

[35] [35]

Androguard,

Androguard Project, “Androguard,” version 4.1.3. Accessed: May 2025. [Online]. Available: https://github.com/androguard/androguard

2025

[36] [36]

On demystifying the android application framework: Re-visiting android permission specification analysis,

M. Backes, S. Bugiel, E. Derr, P. McDaniel, D. Octeau, and S. Weisgerber, “On demystifying the android application framework: Re-visiting android permission specification analysis,” 2016, Conference paper, pp. 1101–1116. [Online]. Available: https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_backes-android.pdf

2016

[37] [37]

Obfuscapk: An open-source black-box obfuscation tool for android apps,

S. Aonzo, G. C. Georgiu, L. Verderame, and A. Merlo, “Obfuscapk: An open-source black-box obfuscation tool for android apps,” SoftwareX, vol. 11, p. 100403, 2020

2020

[38] [38]

Dynamic security analysis on android: A systematic literature review,

T. Sutter, T. Kehrer, M. Rennhard, B. Tellenbach, and J. Klein, “Dynamic security analysis on android: A systematic literature review,” IEEE Access, vol. 12, pp. 57 261–57 287, 2024

2024

[39] [39]

Stateful detection of black-box adversarial attacks,

S. Chen, N. Carlini, and D. Wagner, “Stateful detection of black-box adversarial attacks,” in Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, ser. SPAI ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 30–39

2020

[40] [40]

Evading Black-box Classifiers Without Breaking Eggs,

E. Debenedetti, N. Carlini, and F. Tramer, “Evading Black-box Classifiers Without Breaking Eggs,” in 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2024, pp. 408–424

2024

[41] [41]

Functionality-preserving black-box optimization of adversarial windows malware,

L. Demetrio, B. Biggio, G. Lagorio, F. Roli, and A. Armando, “Functionality-preserving black-box optimization of adversarial windows malware,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3469–3478, 2021

2021

[42] [42]

Motivating the rules of the game for adversarial example research,

J. Gilmer, R. P. Adams, I. J. Goodfellow, D. Andersen, and G. E. Dahl, “Motivating the rules of the game for adversarial example research,” CoRR, vol. abs/1807.06732, 2018

2018

[43] [43]

The dark side of native code on android,

A. Ruggia, A. Possemato, S. Dambra, A. Merlo, S. Aonzo, and D. Balzarotti, “The dark side of native code on android,” ACM Trans. Priv. Secur., vol. 28, no. 2, Feb. 2025

2025

[44] [44]

“Jadx,” https://github.com/skylot/jadx, accessed on May 2025

2025

[45] [45]

Apktool,

“Apktool,” https://apktool.org/, accessed on May 2025

2025

[46] [46]

Smali code,

“Smali code,” https://sallam.gitbook.io/sec-88/android-appsec/smali/smali-cheat-sheet, accessed on July 2025

2025

[47] [47]

Apksigner,

Android Developers, “Apksigner,” Android developer documentation, accessed: May 2025. [Online]. Available: https://developer.android.com/tools/apksigner?hl=en

2025

APPENDIX A
PROBLEM-SPACE ATTACKS DETAILS

A. Requirements for Problem-space Attacks

Pierazzi et al. [8] formalized four key requirements for problem-space attacks: (i) practical manipulations, i.e., the...