DroidBreaker: Practical and Functional Problem-Space Attacks on Machine-Learning Android Malware Detectors
Pith reviewed 2026-06-26 04:36 UTC · model grok-4.3
The pith
Targeted, build-safe changes to Android apps can evade machine-learning malware detectors while preserving original behavior through execution-trace matching.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DroidBreaker demonstrates that a problem-space attack can remain both practical and functional by restricting edits to model-influential components, using only build-safe manipulations, and validating semantic equivalence via direct comparison of runtime logs and API traces, thereby achieving high evasion in white- and black-box settings while reducing detections by commercial scanners.
What carries the argument
The semantics-preserving functionality test that enforces runtime equivalence by comparing execution logs and API-level traces between original and modified APKs, paired with fine-grained build-safe manipulations of the most model-influential components.
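A minimal sketch of what such a trace comparison could look like, assuming each run is reduced to an ordered list of API-call names. The function name `preserves_trace` and the subsequence criterion are illustrative assumptions, not the paper's actual test: because the attack injects extra calls, the sketch only requires that the original calls still occur in the same order.

```python
from typing import Sequence


def preserves_trace(original: Sequence[str], modified: Sequence[str]) -> bool:
    """Return True if every call in `original` appears in `modified` in the same order.

    Extra (injected) calls in `modified` are tolerated; dropped or reordered
    original calls are not. This criterion is an assumption for illustration.
    """
    remaining = iter(modified)
    # `call in remaining` advances the iterator until it finds `call`,
    # so matches must occur in order (ordered-subsequence check).
    return all(call in remaining for call in original)


# Hypothetical traces, made up for this example.
original_trace = ["TelephonyManager.getDeviceId", "URL.openConnection"]
padded_trace = ["Log.d", "TelephonyManager.getDeviceId", "Log.d", "URL.openConnection"]
broken_trace = ["URL.openConnection", "TelephonyManager.getDeviceId"]

print(preserves_trace(original_trace, padded_trace))
print(preserves_trace(original_trace, broken_trace))
```

A check of this shape only covers the code paths that the test run actually exercised, which is the limitation the load-bearing premise below points at.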
If this is right
- Attacks succeed with few model queries in both white-box and black-box access models.
- Modified APKs produce minimal side-effect features and remain buildable.
- Detection counts drop substantially when the same APKs are submitted to commercial scanners on VirusTotal.
- The same manipulation set works across recent corpora of Android applications.
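The first consequence, few queries from touching only influential components, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the detector is a hand-written linear scorer, the per-feature `influence` values are taken as given (in a black-box setting they would have to be estimated), and the attack only injects features, whereas the paper also describes obfuscation.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, int]


def greedy_evasion(
    features: Features,
    influence: Dict[str, float],
    is_malicious: Callable[[Features], bool],
    max_queries: int,
) -> Tuple[Features, int, bool]:
    """Inject absent features, most benign-leaning first, until the verdict flips.

    Returns (candidate features, queries used, whether evasion succeeded).
    """
    candidate = dict(features)
    queries = 0
    # Most negative influence pushes the score hardest toward "benign".
    ranked = sorted(
        (f for f, w in influence.items() if w < 0 and not candidate.get(f)),
        key=lambda f: influence[f],
    )
    for name in ranked:
        if queries >= max_queries:
            break
        candidate[name] = 1  # injection only: existing features are never removed
        queries += 1  # one detector query per injected feature
        if not is_malicious(candidate):
            return candidate, queries, True
    return candidate, queries, False


# Hypothetical linear detector and sample, made up for this example.
weights = {"SEND_SMS": 2.0, "READ_CONTACTS": 1.0,
           "benign_api_a": -1.5, "benign_api_b": -1.0, "benign_url": -0.8}


def toy_detector(x: Features) -> bool:
    return sum(weights[f] * v for f, v in x.items()) > 0


malware = {"SEND_SMS": 1, "READ_CONTACTS": 1}
adversarial, used, evaded = greedy_evasion(malware, weights, toy_detector, max_queries=10)
print(used, evaded)
```

Ranking by influence is what keeps the query count low in this toy: features that barely move the score are never tried before the ones that move it most.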
Where Pith is reading between the lines
- Defenses that ignore runtime behavior may need to add behavioral equivalence checks or broader feature monitoring.
- The same component-selection and trace-matching approach could be adapted to test robustness of detectors on other mobile platforms.
- Automated pipelines that generate functional adversarial samples become feasible once the build and semantics constraints are solved.
Load-bearing premise
That matching execution logs and API traces between the original and modified APK is sufficient to guarantee the app still performs its intended behavior in practice.
What would settle it
A test set of modified APKs that pass the log-and-trace equivalence check yet fail to carry out the same user-facing or malicious actions when installed and run on real devices.
Original abstract
Adversarial APKs are Android applications modified in the problem space to evade machine-learning malware detectors. In this work, we first show that, despite claims, existing problem-space attacks remain largely impractical. Most techniques leverage software transplantation to inject entire benign modules, introducing many side-effect features and often causing build-time failures. Fine-grained methods that inject only a narrow subset of components exhibit limited effectiveness, while those that also use obfuscation rely on brittle bytecode rewriting, producing APKs that are syntactically valid but semantically unusable. Prior work further overestimates attack success rates by running smoke tests that only validate installation and basic execution, without assessing whether the modified APK still preserves its intended behavior. To overcome these limitations, we present DROIDBREAKER, a practical (build-safe) and functional (semantics-preserving) problem-space attack framework that provides: (i) query-efficient white- and black-box attacks by manipulating only the APK components most influential to the target model; (ii) a set of fine-grained, build-safe manipulations (including injection and obfuscation of API calls, app modules, permissions, and URLs) with minimal side effects; and (iii) a semantics-preserving functionality test that enforces runtime equivalence by comparing execution logs and API-level traces between the initial and the modified APK. Evaluated on a recent corpus of Android applications, DROIDBREAKER achieves high evasion rates with few queries and minimal side effects in both white-box and black-box settings, and drastically reduces detections by commercial malware scanners hosted on VirusTotal.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DroidBreaker, a problem-space attack framework targeting machine-learning Android malware detectors. It critiques prior work for relying on software transplantation (causing side effects and build failures), limited fine-grained methods, brittle obfuscation, and overestimation via smoke tests that ignore semantic preservation. DroidBreaker claims to overcome these via (i) query-efficient white- and black-box attacks manipulating only model-influential APK components, (ii) fine-grained build-safe manipulations (API calls, modules, permissions, URLs) with minimal side effects, and (iii) a semantics-preserving functionality test enforcing runtime equivalence via execution logs and API traces. On a recent Android corpus, it reports high evasion rates with few queries, minimal side effects, and drastic reductions in VirusTotal detections by commercial scanners.
Significance. If the empirical claims of high evasion under preserved functionality are quantitatively validated, the work would be significant for adversarial ML in security. It supplies a concrete engineering framework with explicit manipulation primitives and a testable semantics criterion, directly addressing practicality gaps that prior problem-space attacks have left open. This could inform both attack realism assessments and the design of detectors that account for build-safe, trace-preserving modifications.
major comments (1)
- [Abstract] The central claims that DROIDBREAKER 'achieves high evasion rates with few queries and minimal side effects' and 'drastically reduces detections by commercial malware scanners' are stated without any reported metrics (evasion percentages, query counts, side-effect deltas such as Δ API calls or permission changes, functionality-test pass rates, or trace-similarity scores). The semantics-preserving test is described but supplies no quantitative validation, leaving the load-bearing premise of usable, behavior-preserving APKs unverified.
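To make the requested metrics concrete, here is a hedged sketch of how they might be aggregated from per-sample attack records. The record fields, the `summarize` function, and the sample values are all hypothetical and do not come from the paper; side effects are approximated as the number of changed features beyond those the attack intended to change.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AttackResult:
    evaded: bool                      # detector verdict flipped to benign
    queries: int                      # detector queries spent on this sample
    intended_changes: int             # features the attack meant to alter
    observed_changes: int             # features that actually differ after rebuild
    passed_functionality_test: bool   # runtime equivalence check succeeded


def summarize(results: List[AttackResult]) -> Dict[str, float]:
    """Aggregate the kinds of metrics the referee asks for."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    return {
        "evasion_rate": sum(r.evaded for r in results) / n,
        # Evasion only counts here if the APK also still behaves the same.
        "functional_evasion_rate": sum(
            r.evaded and r.passed_functionality_test for r in results) / n,
        "mean_queries": mean(r.queries for r in results),
        "mean_side_effects": mean(
            r.observed_changes - r.intended_changes for r in results),
        "functionality_pass_rate": sum(
            r.passed_functionality_test for r in results) / n,
    }


# Made-up records, for illustration only.
sample_results = [
    AttackResult(True, 4, 3, 3, True),
    AttackResult(True, 10, 5, 7, False),
    AttackResult(False, 20, 6, 6, True),
    AttackResult(True, 6, 2, 2, True),
]
print(summarize(sample_results))
```

Reporting the functional evasion rate next to the raw evasion rate exposes the gap that smoke-test-only evaluations would hide.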
Simulated Author's Rebuttal
We thank the referee for highlighting the need for quantitative support in the abstract. We address this point directly below and agree that revisions are warranted.
- Referee: [Abstract] The central claims that DROIDBREAKER 'achieves high evasion rates with few queries and minimal side effects' and 'drastically reduces detections by commercial malware scanners' are stated without any reported metrics (evasion percentages, query counts, side-effect deltas such as Δ API calls or permission changes, functionality-test pass rates, or trace-similarity scores). The semantics-preserving test is described but supplies no quantitative validation, leaving the load-bearing premise of usable, behavior-preserving APKs unverified.
  Authors: We agree that the abstract would be strengthened by including specific metrics to substantiate the claims. The evaluation section of the manuscript reports these results (evasion rates, query efficiency, side-effect measurements, and functionality-test outcomes including pass rates and trace similarity), but the abstract itself does not. We will revise the abstract to incorporate representative quantitative findings from the experiments, such as evasion percentages, average query counts, deltas in manipulated features, and the pass rate of the runtime equivalence test. This change directly addresses the concern without requiring new experiments.
  Revision: yes
Circularity Check
No circularity: empirical engineering framework with no derivations or self-referential predictions
The paper presents DROIDBREAKER as an empirical attack framework evaluated on Android APKs, with claims resting on experimental evasion rates and VirusTotal results rather than any mathematical derivation chain. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the abstract or described structure. The functionality test and manipulations are presented as engineering contributions validated by evaluation, not reduced to inputs by construction. This is a standard non-finding for an applied security paper whose central claims are falsifiable via external benchmarks.
Reference graph
Works this paper leans on
- [1] A. Ruggia, D. Nisi, S. Dambra, A. Merlo, D. Balzarotti, and S. Aonzo, “Unmasking the veiled: A comprehensive analysis of android evasive malware,” in Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, 2024, pp. 383–398.
- [2] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, K. Rieck, and C. Siemens, “Drebin: Effective and explainable detection of android malware in your pocket,” in NDSS, vol. 14, 2014, pp. 23–26.
- [3] K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon, “Androzoo: Collecting millions of android apps for the research community,” in Proceedings of the 13th international conference on mining software repositories, 2016, pp. 468–471.
- [4] A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto, and F. Roli, “Yes, machine learning can be more secure! a case study on android malware detection,” IEEE Transactions on Dependable and Secure Computing, vol. 16, no. 4, pp. 711–724, 2017.
- [5] E. Mariconti, L. Onwuzurike, P. Andriotis, E. De Cristofaro, G. Ross, and G. Stringhini, “Mamadroid: Detecting android malware by building markov chains of behavioral models,” 2017.
- [6] F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, and L. Cavallaro, “TESSERACT: Eliminating experimental bias in malware classification across space and time,” in 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 729–746. [Online]. Available: https://www.usenix.org/system/files/sec19-pendlebury.pdf
- [7] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial examples for malware detection,” in Computer Security – ESORICS 2017: 22nd European Symposium on Research in Computer Security, Oslo, Norway, September 11-15, 2017, Proceedings, Part II. Springer, 2017, pp. 62–79.
- [8] F. Pierazzi, F. Pendlebury, J. Cortellazzi, and L. Cavallaro, “Intriguing properties of adversarial ml attacks in the problem space,” in 2020 IEEE symposium on security and privacy (SP). IEEE, 2020, pp. 1332–1349.
- [9] D. Li and Q. Li, “Adversarial deep ensemble: Evasion attacks and defenses for malware detection,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3886–3900, 2020.
- [10] W. Yang, D. Kong, T. Xie, and C. A. Gunter, “Malware detection in adversarial settings: Exploiting feature evolutions and confusions in android apps,” in Proceedings of the 33rd Annual Computer Security Applications Conference, 2017, pp. 288–302.
- [11] G. Xu, H. Shao, J. Cui, H. Bai, J. Li, G. Bai, S. Liu, W. Meng, and X. Zheng, “Gendroid: A query-efficient black-box android adversarial attack framework,” Computers & Security, vol. 132, p. 103359, 2023.
- [12] P. He, Y. Xia, X. Zhang, and S. Ji, “Efficient query-based attack against ml-based android malware detection under zero knowledge setting,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 90–104.
- [13] H. Bostani and V. Moonsamy, “Evadedroid: A practical evasion attack on machine learning for black-box android malware detection,” Computers & Security, vol. 139, p. 103676, 2024.
- [14] E. T. Barr, M. Harman, Y. Jia, A. Marginean, and J. Petke, “Automated software transplantation,” in Proceedings of the 2015 International Symposium on Software Testing and Analysis, 2015, pp. 257–269.
- [15] Z. Shu and G. Yan, “Eagle: Evasion attacks guided by local explanations against android malware classification,” IEEE Transactions on Dependable and Secure Computing, vol. 21, no. 4, pp. 3165–3182, 2024.
- [16] X. Chen, C. Li, D. Wang, S. Wen, J. Zhang, S. Nepal, Y. Xiang, and K. Ren, “Android HIV: A study of repackaging malware for evading machine-learning detection,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 987–1001, 2019.
- [17] H. Li, Z. Cheng, B. Wu, L. Yuan, C. Gao, W. Yuan, and X. Luo, “Black-box adversarial example attack towards fcg based android malware detection under incomplete feature information,” in Proceedings of the 32nd USENIX Conference on Security Symposium, ser. SEC ’23. USA: USENIX Association, 2023. [Online]. Available: https://www.usenix.org/system/files/sec2...
- [18] “Virustotal,” https://www.virustotal.com/, accessed on May 2025.
- [19] K. Zhao, H. Zhou, Y. Zhu, X. Zhan, K. Zhou, J. Li, L. Yu, W. Yuan, and X. Luo, “Structural attack against graph based android malware detection,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 3218–3235.
- [20] N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, A. Madry, and A. Kurakin, “On evaluating adversarial robustness,” arXiv preprint arXiv:1902.06705, 2019.
- [21] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. Le Traon, D. Octeau, and P. McDaniel, “Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,” SIGPLAN Not., vol. 49, no. 6, pp. 259–269, Jun. 2014. [Online]. Available: https://doi.org/10.1145/2666356.2594299
- [22] “Cuckoodroid,” https://github.com/idanr1986/cuckoo-droid, accessed on August 2025.
- [23] Google, “Android runtime and dalvik,” https://source.android.com/docs/core/runtime?hl=en, accessed on December 2025.
- [24] “Monkey,” https://developer.android.com/studio/test/other-testing-tools/monkey, accessed on May 2025.
- [25] D. Olszewski, A. Lu, C. Stillman, K. Warren, C. Kitroser, A. Pascual, D. Ukirde, K. Butler, and P. Traynor, “‘Get in researchers; we’re measuring reproducibility’: A reproducibility study of machine learning papers in tier 1 security conferences,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’23. New Yo...
- [26] Y. Li, Z. Yang, Y. Guo, and X. Chen, “Droidbot: a lightweight ui-guided test input generator for android,” in 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), 2017, pp. 23–26.
- [27] “Frida,” https://frida.re/, accessed on May 2025.
- [28] “Soot,” https://soot-oss.github.io/soot/, accessed on December 2025.
- [29] T. Vidas and N. Christin, “Evading android runtime analysis via sandbox detection,” in Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, ser. ASIA CCS ’14. New York, NY, USA: Association for Computing Machinery, 2014, pp. 447–458.
- [30] “Robust android malware detection competition,” https://ramd-competition.github.io/, accessed on May 2025.
- [31] “Avclass,” https://github.com/malicialab/avclass, accessed on May 2025.
- [32] “Android malware detectors,” https://github.com/pralab/android-detectors, accessed on May 2025.
- [33] X. Zhang, Y. Zhang, M. Zhong, D. Ding, Y. Cao, Y. Zhang, M. Zhang, and M. Yang, “Enhancing state-of-the-art classifiers with api semantics to detect evolved android malware,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, pp. 757–770.
- [34] Androguard Project, “Androguard,” version 4.1.3. Accessed: May 2025. [Online]. Available: https://github.com/androguard/androguard
- [35] M. Backes, S. Bugiel, E. Derr, P. McDaniel, D. Octeau, and S. Weisgerber, “On demystifying the android application framework: Re-visiting android permission specification analysis,” 2016, Conference paper, pp. 1101–1116. [Online]. Available: https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_backes-android.pdf
- [36] S. Aonzo, G. C. Georgiu, L. Verderame, and A. Merlo, “Obfuscapk: An open-source black-box obfuscation tool for android apps,” SoftwareX, vol. 11, p. 100403, 2020.
- [37] T. Sutter, T. Kehrer, M. Rennhard, B. Tellenbach, and J. Klein, “Dynamic security analysis on android: A systematic literature review,” IEEE Access, vol. 12, pp. 57 261–57 287, 2024.
- [38] S. Chen, N. Carlini, and D. Wagner, “Stateful detection of black-box adversarial attacks,” in Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, ser. SPAI ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 30–39.
- [39] E. Debenedetti, N. Carlini, and F. Tramer, “Evading Black-box Classifiers Without Breaking Eggs,” in 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2024, pp. 408–424.
- [40] L. Demetrio, B. Biggio, G. Lagorio, F. Roli, and A. Armando, “Functionality-preserving black-box optimization of adversarial windows malware,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3469–3478, 2021.
- [41] J. Gilmer, R. P. Adams, I. J. Goodfellow, D. Andersen, and G. E. Dahl, “Motivating the rules of the game for adversarial example research,” CoRR, vol. abs/1807.06732, 2018.
- [42] A. Ruggia, A. Possemato, S. Dambra, A. Merlo, S. Aonzo, and D. Balzarotti, “The dark side of native code on android,” ACM Trans. Priv. Secur., vol. 28, no. 2, Feb. 2025.
- [43] “Jadx,” https://github.com/skylot/jadx, accessed on May 2025.
- [44] “Apktool,” https://apktool.org/, accessed on May 2025.
- [45] “Smali code,” https://sallam.gitbook.io/sec-88/android-appsec/smali/smali-cheat-sheet, accessed on July 2025.
- [46] Android Developers, “Apksigner,” Android developer documentation, accessed: May 2025. [Online]. Available: https://developer.android.com/tools/apksigner?hl=en
APPENDIX A. PROBLEM-SPACE ATTACKS DETAILS
A. Requirements for Problem-space Attacks
Pierazzi et al. [8] formalized four key requirements for problem-space attacks: (i) practical manipulations, i.e., the...