Don't Trust Us: A privacy-by-design android malware detection pipeline

Diego Soi; Emmanuele Massidda; Giorgio Giacinto

arxiv: 2606.03714 · v1 · pith:DSK3MLNZnew · submitted 2026-06-02 · 💻 cs.CR


Emmanuele Massidda, Diego Soi, Giorgio Giacinto

Pith reviewed 2026-06-28 09:31 UTC · model grok-4.3

classification 💻 cs.CR

keywords android malware · privacy by design · static analysis · dynamic analysis · SVM classifier · sandbox · APK · malware detection


The pith

Android malware detection achieves strong performance without accessing any sensitive user data through a privacy-by-design pipeline of static analysis and conditional sandboxed dynamic checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that malware detection should avoid sensitive data entirely rather than manage it after collection. It implements this by extracting features from APKs via static analysis in the Drebin style, classifying them with an SVM using dual-reject thresholds, and sending only uncertain cases to a sandbox where dynamic analysis extracts no user information. On a temporally split 2024-2025 dataset the static stage alone reaches an F1 score of 0.87 while deferring just 6.7 percent of samples. The approach demonstrates that privacy can be ensured by never collecting the data instead of anonymizing or encrypting it later. This removes the need for users to trust the system with privileged access to their devices.

Core claim

The pipeline performs static analysis on each APK to extract features, vectorizes them, and applies an SVM equipped with dual-reject thresholds that either makes a confident classification or defers the sample to sandboxed dynamic analysis. The dynamic stage operates without extracting sensitive data or device identifiers. On the test set this yields an F1 score of 0.87 with only 6.7% of samples deferred, confirming that effective detection does not require access to user information.
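As a rough illustration of the "extract, then vectorize" step, the sketch below maps sets of string features to binary indicator vectors, which is the general shape of a Drebin-style representation. The feature strings, the helper names build_vocabulary and vectorize, and the sample data are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: Drebin-style binary vectorization of static features.
# Feature strings and helper names are illustrative, not the paper's code.

def build_vocabulary(feature_sets):
    """Assign a stable column index to every feature seen in training."""
    vocab = sorted(set().union(*feature_sets))
    return {feat: idx for idx, feat in enumerate(vocab)}

def vectorize(features, vocabulary):
    """Return a binary indicator vector; unseen features are ignored."""
    vec = [0] * len(vocabulary)
    for feat in features:
        idx = vocabulary.get(feat)
        if idx is not None:
            vec[idx] = 1
    return vec

# Two made-up training apps, each described by a set of static features.
train = [
    {"permission::android.permission.SEND_SMS", "api_call::sendTextMessage"},
    {"permission::android.permission.INTERNET"},
]
vocab = build_vocabulary(train)

# A new app: one known feature and one feature never seen in training.
x = vectorize({"permission::android.permission.INTERNET", "url::example.com"}, vocab)
```

Every input here comes from the APK itself, so this stage needs nothing from the device or its user. A real implementation would likely use sparse vectors, since such vocabularies tend to be very large.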

What carries the argument

The dual-reject threshold rule applied to the SVM classifier after static feature extraction from APKs, which routes uncertain samples to a sandboxed dynamic analysis stage that collects no genuine user data.
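A minimal sketch of what a dual-reject rule on a classifier score can look like, assuming the SVM exposes a signed decision score where larger values indicate maliciousness. The function name dual_reject and the threshold values are placeholders; the summary above does not report the thresholds used in the paper.

```python
# Hypothetical sketch of a dual-reject decision rule on an SVM decision score.
# Threshold values below are placeholders, not values reported in the paper.

def dual_reject(score, t_benign, t_malicious):
    """Commit when the score is confidently low or high; otherwise defer."""
    if t_benign > t_malicious:
        raise ValueError("t_benign must not exceed t_malicious")
    if score <= t_benign:
        return "benign"
    if score >= t_malicious:
        return "malicious"
    return "defer"  # routed to sandboxed dynamic analysis

decisions = [dual_reject(s, t_benign=-0.5, t_malicious=0.5) for s in (-1.2, 0.1, 0.9)]
```

With these placeholder thresholds the three scores yield "benign", "defer" and "malicious". Widening the gap between the two thresholds trades a higher deferral rate for fewer confident mistakes.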

If this is right

Over 93% of applications can be classified as malicious or benign using only static features extracted from the APK file itself.
Strong detection performance remains possible even when no device identifiers, network artifacts, or runtime traces are collected.
Sandboxed dynamic analysis can still provide high-confidence maliciousness recognition without compromising user privacy.
The requirement for user trust in data handling is eliminated because no sensitive data enters the pipeline at any stage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar pipelines could be adapted for other mobile operating systems whose application packages can be analyzed statically in the same way as APKs.
Improving the static classifier to reduce the deferral rate below 6.7% would further minimize any dynamic analysis overhead.
Because the evaluation uses a temporal split rather than a random one, the reported performance is more likely to reflect how the method copes with evolving malware patterns.
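For readers unfamiliar with temporal splitting, the sketch below shows the basic idea: the classifier is trained only on samples that appeared before a cutoff date and tested on samples that appeared afterwards. The cutoff date, record fields and sample hashes are assumptions made for illustration; the abstract only states that the dataset spans 2024 to 2025.

```python
from datetime import date

# Hypothetical sketch of a temporal train/test split.
# The cutoff date and sample records are assumptions for illustration only.

def temporal_split(samples, cutoff):
    """Train on samples first seen before the cutoff, test on the rest."""
    train = [s for s in samples if s["first_seen"] < cutoff]
    test = [s for s in samples if s["first_seen"] >= cutoff]
    return train, test

samples = [
    {"sha256": "a1", "first_seen": date(2024, 3, 5)},
    {"sha256": "b2", "first_seen": date(2024, 11, 20)},
    {"sha256": "c3", "first_seen": date(2025, 2, 14)},
]
train, test = temporal_split(samples, cutoff=date(2025, 1, 1))
```

Unlike a random split, this keeps future samples out of training, so the test score is not inflated by knowledge of malware that did not yet exist at training time.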

Load-bearing premise

The sandboxed dynamic analysis can be performed without ever extracting or requiring genuine user information or device identifiers.

What would settle it

Observing that the dynamic sandbox stage requires access to device identifiers or user data to reach the reported high-confidence detection would show the privacy claim does not hold.

Original abstract

Android malware detection increasingly relies on collecting and processing sensitive user data, including device identifiers, network artifacts, and runtime traces, while privacy is too often treated as a secondary concern. Existing privacy-aware approaches typically enforce privacy after data collection, for example, through anonymization, encryption, or federated learning, yet still require access to user information and therefore demand a high level of user trust in systems that already operate with privileged access to device activity. We argue that this requirement should be removed rather than managed. Android malware detection should be privacy-aware by design, so that effective analysis does not depend on sensitive data being accessed in the first place. To this end, we first formalize a set of design requirements for privacy-by-design detection and then implement each requirement in a comprehensive pipeline. First, static analysis is performed to extract relevant data from each APK, following the Drebin representation, which is then submitted to an SVM after vectorization. The model is equipped with a dual-reject threshold rule that either commits to a confident decision or defers uncertain samples to a dynamic analysis stage within a sandboxed environment, so that genuine user information never enters the analysis loop. Results confirm that, on a temporally split dataset spanning from 2024 to 2025, the pipeline achieves an F1 score of 0.87 with the first static analysis stage, deferring only 6.7% of test samples to secondary dynamic analysis. Additionally, dynamic sandboxing helps recognize applications' maliciousness with high confidence without extracting any sensitive data. These results demonstrate that strong detection performance is achievable without sacrificing user privacy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a workable static-first pipeline with low deferral to sandboxed dynamic checks, but the no-sensitive-data claim for the dynamic stage is asserted without enough supporting detail.


The core contribution is a pipeline that runs Drebin-style static features through an SVM with dual-reject thresholds, then routes only the uncertain 6.7% of samples to a sandboxed dynamic stage that is required to extract nothing sensitive. On a 2024-2025 temporal split, the authors report an F1 of 0.87 at the static stage. That combination of numbers and the explicit privacy-by-design requirements is the part worth looking at.

The formalization of the design rules and the decision to keep genuine user data out of the loop for the bulk of samples is a clean way to frame the problem. The low deferral rate matters in practice because it limits how often the more expensive or privacy-risky step is invoked.

The main gap is that the abstract states the dynamic stage extracts no sensitive data or identifiers but gives no list of the runtime features collected, no description of the sandbox instrumentation, and no argument showing why those features cannot embed device IDs or user paths. If the full paper supplies that enumeration and a short check that the chosen signals stay safe, the privacy guarantee becomes verifiable; right now it is an assertion. Dataset size, baseline comparisons, and error bars are also missing from the summary, which makes it hard to judge how much the 0.87 F1 actually moves the needle.

The work is aimed at researchers who build Android detectors under strict data-minimization constraints. It is coherent on its own terms and engages the literature honestly, so it is worth sending to referees who can check the dynamic-feature details and the experimental setup. I would not cite it yet without seeing those sections filled in.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a privacy-by-design Android malware detection pipeline that performs static analysis on APKs using the Drebin feature representation, vectorizes the features, and classifies them with an SVM equipped with dual-reject thresholds. Samples falling between the thresholds (reported at 6.7% on the test set) are deferred to a sandboxed dynamic analysis stage that the authors claim extracts no sensitive user data or device identifiers. On a temporally split dataset spanning 2024–2025 the pipeline reports an F1 score of 0.87 while asserting that genuine user information never enters the analysis loop.
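The two headline numbers can be computed as in the sketch below. The abstract does not say whether the 0.87 F1 is measured over all test samples or only over those the static stage commits to; this sketch assumes the latter, and the labels and predictions are made-up data.

```python
# Hypothetical sketch: F1 on committed samples and the deferral rate.
# Labels: 1 = malicious, 0 = benign; a prediction of None means "deferred".

def selective_metrics(y_true, y_pred):
    """Return (F1 over committed samples, fraction of samples deferred)."""
    committed = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(1 for t, p in committed if t == 1 and p == 1)
    fp = sum(1 for t, p in committed if t == 0 and p == 1)
    fn = sum(1 for t, p in committed if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    deferral_rate = 1 - len(committed) / len(y_true)
    return f1, deferral_rate

y_true = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, None, 1, 0, 1, 1]
f1, deferral = selective_metrics(y_true, y_pred)
```

On this toy data one sample in ten is deferred, and the remaining nine give four true positives, one false positive and one false negative, for an F1 of 0.8.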

Significance. If the privacy guarantee for the dynamic stage can be substantiated, the work would be significant because it attempts to remove the need for user trust in data collection rather than managing privacy after collection. The use of a temporal split and the dual-reject mechanism are positive design choices that align with realistic deployment constraints.

major comments (1)

[Abstract] The central claim that the sandboxed dynamic analysis stage extracts no sensitive data or device identifiers is asserted without any enumeration of the runtime features collected, description of sandbox instrumentation, or argument showing why the chosen features cannot embed user information (e.g., network flows, file paths, or process lists). This assertion is load-bearing for the privacy-by-design guarantee.

minor comments (2)

[Abstract] The reported F1 score of 0.87 and 6.7% deferral rate are given without dataset size, baseline comparisons, error bars, or implementation details, making it impossible to assess whether the numbers support the performance claim.
The manuscript should provide the exact definition and selection procedure for the dual-reject thresholds, including any hyper-parameter values or cross-validation used.
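To make the request concrete, here is one way such thresholds could be selected on a held-out validation set: choose each threshold so that the region it commits to reaches a target purity. This is an assumed procedure for illustration only, not the authors' method, and the scores, labels and target are made up.

```python
# Hypothetical sketch: pick dual-reject thresholds on a validation set so that
# each committed region meets a target precision. Not the paper's procedure.

def pick_thresholds(scores, labels, target=0.95):
    """labels: 1 = malicious, 0 = benign. Returns (t_benign, t_malicious)."""
    candidates = sorted(set(scores))
    t_malicious = None
    for t in candidates:  # lowest threshold whose upper region is pure enough
        upper = [y for s, y in zip(scores, labels) if s >= t]
        if sum(upper) / len(upper) >= target:
            t_malicious = t
            break
    t_benign = None
    for t in reversed(candidates):  # highest threshold with a pure lower region
        lower = [y for s, y in zip(scores, labels) if s <= t]
        if lower.count(0) / len(lower) >= target:
            t_benign = t
            break
    return t_benign, t_malicious

scores = [-2.0, -1.5, -0.4, -0.1, 0.2, 0.3, 1.1, 1.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
t_benign, t_malicious = pick_thresholds(scores, labels, target=1.0)
```

On this toy data the thresholds come out as -0.4 and 0.3, so the two samples scored -0.1 and 0.2 would be deferred. A production procedure would also need to handle the case where the two regions overlap and to report how the target was chosen, which is the detail the referee is asking for.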

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. The comment on substantiating the privacy claims for the dynamic analysis stage is well-taken and directly addresses a load-bearing aspect of the privacy-by-design argument. We respond point-by-point below.


Referee: [Abstract] The central claim that the sandboxed dynamic analysis stage extracts no sensitive data or device identifiers is asserted without any enumeration of the runtime features collected, description of sandbox instrumentation, or argument showing why the chosen features cannot embed user information (e.g., network flows, file paths, or process lists). This assertion is load-bearing for the privacy-by-design guarantee.

Authors: We agree that the abstract (and supporting sections) must provide explicit support for this claim rather than asserting it. In the revised manuscript we will: (1) enumerate the exact runtime features collected inside the sandbox (restricted to generic system-call sequences, memory-access patterns, and CPU usage signatures that contain no user identifiers, file paths tied to personal data, or network payloads); (2) describe the sandbox instrumentation (an isolated, non-rooted Android emulator with no access to device accounts, contacts, or external storage containing user files); and (3) add a short argument explaining why these features cannot embed user information (they are deliberately filtered at collection time to exclude any artifact that could be linked to a specific user or device). These additions will appear in both the abstract and the methods section describing the dynamic stage. We view this as a necessary clarification rather than a change in the underlying design.

Revision: yes
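Filtering "at collection time" could, for instance, take the form of an allowlist applied to each sandbox event before it is stored. The sketch below is an assumption about what that might look like; the field names, the allowlist contents and the example event are invented and do not describe the authors' instrumentation.

```python
# Hypothetical sketch of allowlist filtering applied when a sandbox event is
# recorded. Field names and the allowlist are assumptions for illustration.

ALLOWED_FIELDS = {"syscall", "return_code", "cpu_ms"}

def filter_event(event):
    """Keep only allowlisted fields; anything else is dropped before storage."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}

raw = {
    "syscall": "openat",
    "return_code": 0,
    "cpu_ms": 3,
    "path": "/sdcard/DCIM/photo.jpg",  # could reveal user files: dropped
    "device_id": "emulator-5554",      # identifier: dropped
}
clean = filter_event(raw)
```

An allowlist fails closed: a field nobody anticipated is discarded by default, whereas a denylist would let it through. That property is what would make a privacy claim of this kind checkable.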

Circularity Check

0 steps flagged

No circularity; empirical pipeline evaluated on held-out temporal split

full rationale

The manuscript presents a privacy-by-design pipeline consisting of static APK analysis (Drebin features), SVM classification with dual-reject thresholds, and optional sandboxed dynamic analysis. Reported performance (F1 0.87, 6.7% deferral) is measured directly on a temporally split 2024-2025 dataset rather than derived from any fitted parameter or self-referential equation. No equations, derivations, or load-bearing self-citations appear. The privacy assertion (sandbox extracts no sensitive data) is a design claim, not a mathematical reduction to inputs. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review performed on abstract only; full methods, assumptions, and any fitted thresholds are not visible.

free parameters (1)

dual-reject thresholds
The two decision thresholds that determine when to commit versus defer are almost certainly chosen or tuned on data.

axioms (1)

Domain assumption: Drebin static features extracted from APKs are sufficient to produce high-confidence decisions for the majority of samples.
The pipeline's first stage and low deferral rate rest on this representation working well without runtime or user data.




[1] [1]

Android developers documentation on accessibility ser- vices.https://developer.android.com/guide/ topics/ui/accessibility/service, accessed online on March 2026

2026

[2] [2]

https://support.avast.com/en-us/article/ mobile-security-permissions/#mac, accessed online on March 2026

Android permissions required by avast mobile security. https://support.avast.com/en-us/article/ mobile-security-permissions/#mac, accessed online on March 2026

2026

[3] [3]

ARES ’24

Aldini, A., Petrelli, T.: Image-based detection and clas- sification of android malware through cnn models. In: Proceedings of the 19th International Conference on Availability, Reliability and Security. ARES ’24, Asso- ciation for Computing Machinery, New York, NY , USA (2024). https://doi.org/10.1145/3664476.3670441

work page doi:10.1145/3664476.3670441 2024

[4] [4]

IEEE Access12, 173168–173191 (2024)

Altaha, S.J., Aljughaiman, A., Gul, S.: A survey on android malware detection tech- niques using supervised machine learning. IEEE Access12, 173168–173191 (2024). https://doi.org/10.1109/ACCESS.2024.3485706

work page doi:10.1109/access.2024.3485706 2024

[5] [5]

In: Proceedings of the 3rd ACM on International Workshop on Security And Privacy An- alytics

Alzaylaee, M.K., Yerima, S.Y ., Sezer, S.: Emulator vs real phone: Android malware detection using ma- chine learning. In: Proceedings of the 3rd ACM on International Workshop on Security And Privacy An- alytics. p. 65–72. IWSPA ’17, Association for Com- puting Machinery, New York, NY , USA (Mar 2017). https://doi.org/10.1145/3041008.3041010

work page doi:10.1145/3041008.3041010 2017

[6] [6]

Computers & Security89, 101663 (Feb 2020)

Alzaylaee, M.K., Yerima, S.Y ., Sezer, S.: Dl-droid: Deep learning based android malware detection using real devices. Computers & Security89, 101663 (Feb 2020). https://doi.org/10.1016/j.cose.2019.101663

work page doi:10.1016/j.cose.2019.101663 2020

[7] [7]

any.run: Any.run - interactive online malware sandbox, https://any.run/

[8] [8]

In: Proceed- ings 2014 Network and Distributed System Security Symposium

Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., Rieck, K.: Drebin: Effective and explainable detec- tion of android malware in your pocket. In: Proceed- ings 2014 Network and Distributed System Security Symposium. Internet Society, San Diego, CA (2014). https://doi.org/10.14722/ndss.2014.23247

work page doi:10.14722/ndss.2014.23247 2014

[9] [9]

Computers & Security130, 103277 (2023)

Bhat, P., Behal, S., Dutta, K.: A system call-based android malware detection approach with homoge- neous & heterogeneous ensemble machine learn- ing. Computers & Security130, 103277 (2023). https://doi.org/10.1016/j.cose.2023.103277

work page doi:10.1016/j.cose.2023.103277 2023

[10] [10]

IEEE Transactions on Information Foren- sics and Security13(5), 1286–1300 (2018)

Chen, J., Wang, C., Zhao, Z., Chen, K., Du, R., Ahn, G.J.: Uncovering the face of android ran- somware: Characterization and real-time detec- tion. IEEE Transactions on Information Foren- sics and Security13(5), 1286–1300 (2018). https://doi.org/10.1109/TIFS.2017.2787905

work page doi:10.1109/tifs.2017.2787905 2018

[11] [11]

In: Proceed- ings of the 4th IEEE Conference on Secure and Trustworthy Machine Learning

Chow, T., D’Onghia, M., Linhardt, L., Kan, Z., Arp, D., Cavallaro, L., Pierazzi, F.: Beyond the tesser- act: Trustworthy dataset curation for sound evalu- ations of android malware classifiers. In: Proceed- ings of the 4th IEEE Conference on Secure and Trustworthy Machine Learning. IEEE, Munich, Ger- many (2026),https://discovery.ucl.ac.uk/id/ eprint/1022...

arXiv 2026

[12] [12]

Information and Software Technology189, 107892 (2026)

Ciaramella, G., Martinelli, F., Peluso, C., San- tone, A., Mercaldo, F.: A method for real-world privacy-preserving android malware detection through federated machine learning. Information and Software Technology189, 107892 (2026). https://doi.org/10.1016/j.infsof.2025.107892

work page doi:10.1016/j.infsof.2025.107892 2026

[13] Conti, M., Mancini, L.V., Spolaor, R., Verde, N.V.: Can’t you hear me knocking: Identification of user actions on android apps via traffic analysis. In: Proceedings of the 5th ACM Conference on Data and Application Security and Privacy. pp. 297–304. CODASPY ’15, Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2699026.2699119

[14] Cui, Y., Sun, Y., Lin, Z.: Droidhook: a novel api-hook based android malware dynamic analysis sandbox. Automated Software Engineering 30 (2023). https://doi.org/10.1007/s10515-023-00378-w

[15] Da Costa, F.H., Medeiros, I., Menezes, T., Da Silva, J.V., Da Silva, I.L., Bonifácio, R., Narasimhan, K., Ribeiro, M.: Exploring the use of static and dynamic analysis to improve the performance of the mining sandbox approach for android malware identification. Journal of Systems and Software 183, 111092 (Jan 2022). https://doi.org/10.1016/j.jss.2021.111092

[16] Dijkhuizen, N.V., Ham, J.V.D.: A survey of network traffic anonymisation techniques and implementations. 51(3) (May 2018). https://doi.org/10.1145/3182660

[17] Faghihi, F., Zulkernine, M., Ding, S.: Camodroid: An android application analysis environment resilient against sandbox evasion. Journal of Systems Architecture 125, 102452 (Apr 2022). https://doi.org/10.1016/j.sysarc.2022.102452

[18] Fumera, G., Roli, F., Giacinto, G.: Reject option with multiple thresholds. Pattern Recognition 33(12), 2099–2101 (Dec 2000). https://doi.org/10.1016/S0031-3203(00)00059-5

[19] Geifman, Y., El-Yaniv, R.: Selective classification for deep neural networks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 4885–4894. NIPS’17, Curran Associates Inc., Red Hook, NY, USA (Dec 2017), 10.5555/3295222.3295241

[20] Goldsteen, A., Ezov, G., Shmelkin, R., Moffie, M., Farkash, A.: Data minimization for gdpr compliance in machine learning models. AI and Ethics 2(3), 477–491 (Aug 2022). https://doi.org/10.1007/s43681-021-00095-8

[21] Google: Federated learning: Collaborative machine learning without centralized training data. https://blog.research.google/2017/04/federatedlearning-collaborative.html, accessed online on March 2026

[22] Google: Google play protect. https://developers.google.com/android/play-protect, accessed online on March 2026

[23] Hsu, R.H., Wang, Y.C., Fan, C.I., Sun, B., Ban, T., Takahashi, T., Wu, T.W., Kao, S.W.: A privacy-preserving federated learning system for android malware detection based on edge computing. In: 2020 15th Asia Joint Conference on Information Security (AsiaJCIS). pp. 128–136 (Aug 2020). https://doi.org/10.1109/AsiaJCIS50894.2020.00031

[24] Hu, H., Salcic, Z., Dobbie, G., Zhang, X.: Membership inference attacks on machine learning: A survey. ACM Computing Surveys 54 (2021). https://doi.org/10.1145/3523273

[25] Hu, Y., Wang, H., Li, L., Guo, Y., Xu, G., He, R.: Want to earn a few extra bucks? a first look at money-making apps. In: 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). pp. 332–343 (2019). https://doi.org/10.1109/SANER.2019.8668035

[26] Khraisat, A., Gondal, I., Vamplew, P., Kamruzzaman, J.: Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2(1), 20 (2019). https://doi.org/10.1186/s42400-019-0038-7

[27] Lee, S.: Distributed detection of malicious android apps while preserving privacy using federated learning. Sensors 23(4), 2198 (Jan 2023). https://doi.org/10.3390/s23042198

[28] Li, J., Zhai, L., Zhang, X., Quan, D.: Research of android malware detection based on network traffic monitoring. In: 2014 9th IEEE Conference on Industrial Electronics and Applications. pp. 1739–1744 (2014). https://doi.org/10.1109/ICIEA.2014.6931449

[29] Mariconti, E., Onwuzurike, L., Andriotis, P., De Cristofaro, E., Ross, G., Stringhini, G.: Mamadroid: Detecting android malware by building markov chains of behavioral models. In: Proceedings 2017 Network and Distributed System Security Symposium. Internet Society, San Diego, CA (2017). https://doi.org/10.14722/ndss.2017.23353

[30] Norouzian, M.R., Xu, P., Eckert, C., Zarras, A.: Hybroid: Toward android malware detection and categorization with program code and network traffic. In: Information Security: 24th International Conference, ISC 2021, Virtual Event, November 10–12, 2021, Proceedings. pp. 259–278. Springer-Verlag, Berlin, Heidelberg (2021). https://doi.org/10.1007/978-3-030-91356-4_14

[31] Pendlebury, F., Pierazzi, F., Jordaney, R., Kinder, J., Cavallaro, L.: TESSERACT: Eliminating experimental bias in malware classification across space and time. In: 28th USENIX Security Symposium (USENIX Security 19). pp. 729–746. USENIX Association, Santa Clara, CA (Aug 2019), https://www.usenix.org/conference/usenixsecurity19/presentation/pendlebury

[32] PRALab: End-to-end implementation of ml-based android malware detectors. (2026), https://github.com/pralab/android-detectors

[33] Prasad, A., Chandra, S., Alenazy, W.M., Ali, G., Shah, S., ElAffendi, M.: Andromd: An android malware detection framework based on source code analysis and permission scanning. Results in Engineering 28, 107050 (2025). https://doi.org/10.1016/j.rineng.2025.107050

[34] Reardon, J., Feal, Á., Wijesekera, P., On, A.E.B., Vallina-Rodriguez, N., Egelman, S.: 50 ways to leak your data: An exploration of apps’ circumvention of the android permissions system. In: 28th USENIX Security Symposium (USENIX Security 19). pp. 603–620. USENIX Association, Santa Clara, CA (Aug 2019), https://www.usenix.org/conference/usenixsecurity19/p...

[35] Shokri, R., Shmatikov, V.: Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. pp. 1310–1321. CCS ’15, Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2810103.2813687

[36] Soi, D., Sanna, A., Maiorca, D., Giacinto, G.: Enhancing android malware detection explainability through function call graph apis. Journal of Information Security and Applications 80, 103691 (2024). https://doi.org/10.1016/j.jisa.2023.103691

[37] Spreitzenbarth, M., Schreck, T., Echtler, F., Arp, D., Hoffmann, J.: Mobile-sandbox: combining static and dynamic analysis with machine-learning techniques. International Journal of Information Security 14 (2015). https://doi.org/10.1007/s10207-014-0250-0

[38] Sutter, T., Kehrer, T., Rennhard, M., Tellenbach, B., Klein, J.: Dynamic security analysis on android: A systematic literature review. IEEE Access 12, 57261–57287 (2024). https://doi.org/10.1109/ACCESS.2024.3390612

[39] Wolford, B.: What is gdpr, the eu’s new data protection law? (Nov 2018), https://gdpr.eu/what-is-gdpr/

[40] Yao, W., Li, Y., Lin, W., Hu, T., Chowdhury, I., Masood, R., Seneviratne, S.: Security apps under the looking glass: An empirical analysis of android security apps. In: 2020 IEEE 45th Conference on Local Computer Networks (LCN). pp. 381–384 (Nov 2020). https://doi.org/10.1109/LCN48667.2020.9314784, ISSN: 0742-1303

[41] Zhang, X., Zhang, Y., Zhong, M., Ding, D., Cao, Y., Zhang, Y., Zhang, M., Yang, M.: Enhancing state-of-the-art classifiers with api semantics to detect evolved android malware. In: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security. pp. 757–770. CCS ’20, Association for Computing Machinery, New York, NY, USA (Nov 2020). https://doi.org/10.1145/3372297.3417291

[42] Zhou, Z., Zhu, J., Yu, F., Li, X., Peng, X., Liu, T., Han, B.: Model inversion attacks: A survey of approaches and countermeasures (2025), https://arxiv.org/abs/2411.10023