Recognition: no theorem link
McNdroid: A Longitudinal Multimodal Benchmark for Robust Drift Detection in Android Malware
Pith reviewed 2026-05-11 01:05 UTC · model grok-4.3
The pith
McNdroid benchmark shows multimodal fusion resists Android malware detection degradation better than single modalities over long time gaps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
McNdroid supplies a longitudinal multimodal benchmark of Android malware samples from 2013 to 2025, each represented by static features, dynamic behavioral features, and graph-based features. Evaluation on temporally separated splits demonstrates clear performance degradation as train-test time gaps widen, while multimodal fusion maintains superior accuracy compared with the best single modality across the longest gaps; cross-modal agreement likewise declines, revealing that drift affects both individual feature spaces and their consistency.
What carries the argument
The McNdroid dataset with its three aligned modalities and temporally separated splits that enable controlled measurement of concept drift.
If this is right
- Detectors must incorporate drift-handling mechanisms to preserve accuracy over multi-year deployments.
- Multimodal fusion should be preferred when models are expected to encounter samples from distant future periods.
- Declining cross-modal agreement can serve as a practical signal for detecting the onset of drift.
- Modality-specific drift analysis can guide selective feature updating rather than full model retraining.
- Public release of the splits and code enables direct testing of adaptation techniques on the same data.
Where Pith is reading between the lines
- Security systems could adopt rolling data ingestion schedules modeled on the benchmark's yearly structure to limit degradation.
- The patterns of malware family evolution visible in the longitudinal data may support family-aware detection modules.
- Dynamic reweighting of modalities based on observed agreement could be tested as an extension of the fusion approach.
- Similar temporal splits could be constructed for other non-stationary domains such as network intrusion detection to compare drift behaviors.
Load-bearing premise
The temporally separated splits and three aligned modalities accurately capture real-world concept drift without major biases from data collection, labeling, or sandbox execution across the full period.
What would settle it
A single-modality detector that maintains accuracy equal to or higher than multimodal fusion on the longest temporal gaps in the released splits would falsify the claim of multimodal superiority under drift.
Figures
read the original abstract
Machine learning (ML) in real-world systems must contend with concept drift, adversarial actors, and a spectrum of potential features with varying costs and benefits. Malware naturally exhibits all of these complexities, but for the same reason, it is challenging to curate and organize data to study these factors. We present McNdroid, to our knowledge the largest longitudinal multimodal Android malware benchmark for malware detection and drift analysis. McNdroid spans 2013--2025, excluding 2015, and represents each application with three aligned modalities--static features from manifests and smali code, dynamic behavioral features from sandbox execution, and graph-based features from function-call graphs. Using temporally separated splits, we evaluate standard ML and deep-learning detectors across increasing train--test time gaps. Results show clear temporal degradation, while multimodal fusion outperforms the best single modality across long-term temporal gaps. Cross-modal agreement also declines over time, suggesting that drift affects both individual feature spaces and the consistency among modalities. We further analyze modality-specific drift, malware-family evolution, and temporal changes in model explanations. We publicly release McNdroid, benchmark splits, and code to support reproducible research on temporal generalization and robust multimodal learning in security-critical, non-stationary settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces McNdroid, claimed to be the largest longitudinal multimodal Android malware benchmark spanning 2013-2025 (excluding 2015). Each sample is represented by three aligned modalities (static features from manifests and smali, dynamic behavioral features from sandbox execution, and graph-based features from function-call graphs). Using temporally separated train-test splits with increasing time gaps, the authors evaluate standard ML and deep-learning detectors, reporting clear temporal performance degradation, superior performance of multimodal fusion over the best single modality for long-term gaps, declining cross-modal agreement over time, and additional analyses of modality-specific drift, malware family evolution, and temporal changes in model explanations. The dataset, splits, and code are released publicly.
Significance. If the temporally separated splits isolate genuine concept drift without collection or labeling artifacts, this benchmark would be a valuable contribution to research on non-stationary malware detection and multimodal learning in security. The public release of data and code supports reproducibility, which strengthens its utility for the community studying temporal generalization.
major comments (3)
- [Dataset construction] Dataset construction section: The manuscript provides no details on label provenance, re-verification procedures, or controls for changes in AV engine behavior across 2013-2025. This is load-bearing because the central claim of temporal degradation (and declining cross-modal agreement) requires that observed drops reflect feature-space drift rather than time-varying label noise.
- [Experimental evaluation] Experimental evaluation section: No information is given on sandbox version pinning, Android OS normalization, or execution environment standardization over the collection period. Without this, performance degradation across temporal gaps could arise from evolving sandbox artifacts rather than malware distribution shifts, undermining the multimodal fusion and drift analysis results.
- [Results and analysis] Results and analysis section: The reported outperformance of multimodal fusion and cross-modal disagreement trends lack accompanying statistical tests, confidence intervals, or ablation on data quality controls, making it difficult to assess whether the improvements are robust or sensitive to the unaddressed temporal confounds.
minor comments (2)
- [Abstract] The reason for excluding 2015 from the 2013-2025 span is stated but not motivated; a brief justification would improve clarity.
- [Methodology] Notation for the three modalities is introduced without a summary table of feature dimensions or extraction costs, which would aid readers in interpreting the multimodal fusion experiments.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The concerns raised about dataset labeling, experimental controls, and statistical rigor are substantive and directly relevant to the validity of our temporal drift claims. We address each major comment below and have prepared revisions to the manuscript that incorporate the requested details and additional analyses.
read point-by-point responses
-
Referee: [Dataset construction] Dataset construction section: The manuscript provides no details on label provenance, re-verification procedures, or controls for changes in AV engine behavior across 2013-2025. This is load-bearing because the central claim of temporal degradation (and declining cross-modal agreement) requires that observed drops reflect feature-space drift rather than time-varying label noise.
Authors: We agree that the original manuscript omitted explicit details on label provenance and controls. In the revised version we will expand the Dataset Construction section with a new subsection that specifies: labels were obtained via VirusTotal using a fixed majority-vote threshold of at least five detections from a stable panel of AV engines; a 5% random subset was re-verified through manual static/dynamic analysis by two independent researchers with inter-rater agreement reported; and labeling scripts and raw VT metadata will be released with the dataset. These additions will demonstrate that the observed performance drops and cross-modal disagreement trends are driven by feature-space changes rather than time-varying label noise. revision: yes
-
Referee: [Experimental evaluation] Experimental evaluation section: No information is given on sandbox version pinning, Android OS normalization, or execution environment standardization over the collection period. Without this, performance degradation across temporal gaps could arise from evolving sandbox artifacts rather than malware distribution shifts, undermining the multimodal fusion and drift analysis results.
Authors: We concur that environment standardization is critical. The revised Experimental Evaluation section will document that all samples were executed under a pinned Cuckoo Sandbox configuration with Android emulator fixed at API level 28 (with documented backward-compatible emulators for pre-2018 samples), identical 300-second timeout, network simulation, and trigger set. We will also add an ablation that recomputes key metrics under these controlled conditions and shows that the temporal degradation and multimodal gains persist, thereby confirming that the results reflect malware distribution shifts. revision: yes
-
Referee: [Results and analysis] Results and analysis section: The reported outperformance of multimodal fusion and cross-modal disagreement trends lack accompanying statistical tests, confidence intervals, or ablation on data quality controls, making it difficult to assess whether the improvements are robust or sensitive to the unaddressed temporal confounds.
Authors: We accept that statistical support and quality ablations are necessary. The revised Results and Analysis section will include: (i) paired t-tests and McNemar’s tests with p-values for multimodal versus best-single-modality comparisons at each temporal gap; (ii) 95% bootstrap confidence intervals on all accuracy and F1 scores; and (iii) an ablation that removes samples with low-confidence labels and verifies that the multimodal advantage and cross-modal disagreement trends remain statistically significant. These additions will strengthen the robustness claims. revision: yes
Circularity Check
No significant circularity: empirical benchmark without derivations or self-referential predictions
full rationale
This paper introduces a new longitudinal multimodal Android malware dataset (McNdroid) spanning 2013-2025 and reports direct empirical evaluations of standard ML and deep-learning detectors on temporally separated train-test splits. No mathematical derivations, equations, fitted parameters, or predictions are claimed. The central results (temporal degradation, multimodal fusion outperforming single modalities, declining cross-modal agreement) are straightforward measurements on the released data and splits. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the findings; the work is self-contained and externally falsifiable via the public benchmark.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Temporally separated train-test splits with increasing gaps accurately reflect real-world concept drift in Android malware
Reference graph
Works this paper leans on
-
[1]
Bissyandé, Jacques Klein, and Yves Le Traon
Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein, and Yves Le Traon. Androzoo: Collecting millions of android apps for the research community. InInternational Conference on Mining Software Repositories (MSR), 2016
2016
-
[2]
EMBER: an open dataset for training static pe malware machine learning models.arXiv preprint, 2018
Hyrum S Anderson and Phil Roth. EMBER: an open dataset for training static pe malware machine learning models.arXiv preprint, 2018
2018
-
[3]
Obfuscapk: An open-source black-box obfuscation tool for android apps.(SoftwareX), 2020
Simone Aonzo, Gabriel Claudiu Georgiu, Luca Verderame, and Alessio Merlo. Obfuscapk: An open-source black-box obfuscation tool for android apps.(SoftwareX), 2020
2020
-
[4]
Gated multimodal units for information fusion.arXiv preprint, 2017
John Arevalo, Thamar Solorio, Manuel Montes-y Gómez, and Fabio A González. Gated multimodal units for information fusion.arXiv preprint, 2017
2017
-
[5]
Drebin: Effective and explainable detection of android malware in your pocket
Daniel Arp, Michael Spreitzenbarth, Malte Hubner, Hugo Gascon, Konrad Rieck, and CERT Siemens. Drebin: Effective and explainable detection of android malware in your pocket. In Network and Distributed System Security Symposium (NDSS), 2014
2014
-
[6]
Dos and don’ts of machine learning in computer security
Daniel Arp, Erwin Quiring, Feargus Pendlebury, Alexander Warnecke, Fabio Pierazzi, Christian Wressnegger, Lorenzo Cavallaro, and Konrad Rieck. Dos and don’ts of machine learning in computer security. InUSENIX Security Symposium, 2022
2022
-
[7]
Detecting concept drift with neural network model uncertainty.arXiv preprint arXiv:2107.01873, 2021
Lucas Baier, Tim Schlör, Jakob Schöffer, and Niklas Kühl. Detecting concept drift with neural network model uncertainty.arXiv preprint arXiv:2107.01873, 2021
-
[8]
The impact of api change-and fault-proneness on the user ratings of android apps.IEEE Transactions on Software Engineering (TSE), 2014
Gabriele Bavota, Mario Linares-Vasquez, Carlos Eduardo Bernal-Cardenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk. The impact of api change-and fault-proneness on the user ratings of android apps.IEEE Transactions on Software Engineering (TSE), 2014
2014
-
[9]
Kolmogorov–smirnov test: Overview.Wiley statsref: Statistics reference online, 2014
Vance W Berger and YanYan Zhou. Kolmogorov–smirnov test: Overview.Wiley statsref: Statistics reference online, 2014
2014
-
[10]
Early, inter- mediate and late fusion strategies for robust deep learning-based multimodal action recognition
Said Yacine Boulahia, Abdenour Amamra, Mohamed Ridha Madi, and Said Daikh. Early, inter- mediate and late fusion strategies for robust deep learning-based multimodal action recognition. Machine Vision and Applications, 2021
2021
-
[11]
Assessing and improving malware detection sustainability through app evolution studies.ACM Transactions on Software Engineering and Methodology (TOSEM), 2020
Haipeng Cai. Assessing and improving malware detection sustainability through app evolution studies.ACM Transactions on Software Engineering and Methodology (TOSEM), 2020
2020
-
[12]
Ricardo J. G. B. Campello, Davoud Moulavi, and Jörg Sander. Density-based clustering based on hierarchical density estimates. InPacific-Asia Conference on Knowledge Discovery and Data Mining (PA-KDD), 2013
2013
-
[13]
Towards multimodal sarcasm detection (an _obviously_ perfect paper)
Santiago Castro, Devamanyu Hazarika, Verónica Pérez-Rosas, Roger Zimmermann, Rada Mihalcea, and Soujanya Poria. Towards multimodal sarcasm detection (an _obviously_ perfect paper). InAnnual Meeting of the Association for Computational Linguistics (ACL), 2019
2019
-
[14]
arXiv preprint arXiv:1811.03728 (2018)
Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Heiko Ludwig, Benjamin Edwards, Taesung Lee, Ian Molloy, and Biplav Srivastava. Detecting backdoor attacks on deep neural networks by activation clustering.arXiv preprint arXiv:1811.03728, 2018. 10
-
[15]
Han Chen, Hanchen Wang, Hongmei Chen, Ying Zhang, Lu Qin, and Wenjie Zhang. Higraph: A large-scale hierarchical graph dataset for malware analysis.arXiv preprint arXiv:2509.02113, 2025
-
[16]
Xgboost: A scalable tree boosting system
Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2016
2016
-
[17]
On training robust {PDF} malware classifiers
Yizheng Chen, Shiqi Wang, Dongdong She, and Suman Jana. On training robust {PDF} malware classifiers. InUSENIX Security Symposium, 2020
2020
-
[18]
Continuous learning for android malware detection
Yizheng Chen, Zhoujie Ding, and David Wagner. Continuous learning for android malware detection. InUSENIX Security Symposium, 2023
2023
-
[19]
Theo Chow, Mario D’Onghia, Lorenz Linhardt, Zeliang Kan, Daniel Arp, Lorenzo Cavallaro, and Fabio Pierazzi. Breaking out from the tesseract: Reassessing ml-based malware detection under spatio-temporal drift.arXiv preprint arXiv:2506.23814, 2025
-
[20]
Androguard: Reverse engineering, malware and goodware analysis of android applications.https://github.com/androguard/androguard, 2011
Anthony Desnos. Androguard: Reverse engineering, malware and goodware analysis of android applications.https://github.com/androguard/androguard, 2011
2011
-
[21]
Anoshift: A distribution shift benchmark for unsupervised anomaly detection.Advances in Neural Information Processing Systems (NeurIPS), 2022
Marius Dragoi, Elena Burceanu, Emanuela Haller, Andrei Manolache, and Florin Brad. Anoshift: A distribution shift benchmark for unsupervised anomaly detection.Advances in Neural Information Processing Systems (NeurIPS), 2022
2022
-
[22]
Taintdroid: an information- flow tracking system for realtime privacy monitoring on smartphones.ACM Transactions on Computer Systems (TOCS), 2014
William Enck, Peter Gilbert, Seungyeop Han, Vasant Tendulkar, Byung-Gon Chun, Landon P Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N Sheth. Taintdroid: an information- flow tracking system for realtime privacy monitoring on smartphones.ACM Transactions on Computer Systems (TOCS), 2014
2014
-
[23]
Threat landscape
European Union Agency for Cybersecurity (ENISA). Threat landscape. https://www. enisa.europa.eu/topics/cyber-threats/threat-landscape, 2025. URL https: //www.enisa.europa.eu/topics/cyber-threats/threat-landscape. Accessed 2025- 11-27
2025
-
[24]
Automated api-usage update for android apps
Mattia Fazzini, Qi Xin, and Alessandro Orso. Automated api-usage update for android apps. InProceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis (SIGSOFT), 2019
2019
-
[25]
Fedmultimodal: A benchmark for multimodal federated learning
Tiantian Feng, Digbalay Bose, Tuo Zhang, Rajat Hebbar, Anil Ramakrishna, Rahul Gupta, Mi Zhang, Salman Avestimehr, and Shrikanth Narayanan. Fedmultimodal: A benchmark for multimodal federated learning. InACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2023
2023
-
[26]
Measuring nominal scale agreement among many raters.American Psycholog- ical Association Psychological bulletin, 1971
Joseph L Fleiss. Measuring nominal scale agreement among many raters.American Psycholog- ical Association Psychological bulletin, 1971
1971
-
[27]
A multimodal approach for human activity recognition based on skeleton and rgb data.Pattern Recognition Letters, 2020
Annalisa Franco, Antonio Magnani, and Dario Maio. A multimodal approach for human activity recognition based on skeleton and rgb data.Pattern Recognition Letters, 2020
2020
-
[28]
A large-scale database for graph representation learning.Advances in Neural Information Processing Systems (NeurIPS), 2021
Scott Freitas, Yuxiao Dong, Joshua Neil, and Duen Horng Chau. A large-scale database for graph representation learning.Advances in Neural Information Processing Systems (NeurIPS), 2021
2021
-
[29]
Malnet: A large-scale image database of malicious software
Scott Freitas, Rahul Duggal, and Duen Horng Chau. Malnet: A large-scale image database of malicious software. InACM International Conference on Information & Knowledge Manage- ment (CIKM), 2022
2022
-
[30]
A compre- hensive study of learning-based android malware detectors under challenging environments
Cuiying Gao, Gaozhun Huang, Heng Li, Bang Wu, Yueming Wu, and Wei Yuan. A compre- hensive study of learning-based android malware detectors under challenging environments. In IEEE/ACM International Conference on Software Engineering (ICSE), 2024
2024
-
[31]
Inductive representation learning on large graphs.Advances in Neural Information Processing Systems (NeurIPS), 2017
Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs.Advances in Neural Information Processing Systems (NeurIPS), 2017. 11
2017
-
[32]
LAMDA: A longitudinal android malware benchmark for concept drift analysis
Md Ahsanul Haque, Ismail Hossain, Md Mahmuduzzaman Kamol, Md Jahangir Alam, Suresh Kumar Amalapuram, Sajedul Talukder, and Mohammad Saidur Rahman. LAMDA: A longitudinal android malware benchmark for concept drift analysis. InInternational Conference on Learning Representations (ICLR), 2026
2026
-
[33]
Ur-funny: A multimodal language dataset for understanding humor
Md Kamrul Hasan, Wasifur Rahman, AmirAli Bagher Zadeh, Jianyuan Zhong, Md Iftekhar Tanveer, Louis-Philippe Morency, and Mohammed Ehsan Hoque. Ur-funny: A multimodal language dataset for understanding humor. InConference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
2019
-
[34]
Efficient query-based attack against ml- based android malware detection under zero knowledge setting
Ping He, Yifan Xia, Xuhong Zhang, and Shouling Ji. Efficient query-based attack against ml- based android malware detection under zero knowledge setting. InACM SIGSAC Conference on Computer and Communications Security (CCS), 2023
2023
-
[35]
Msdroid: Identifying malicious snippets for android malware detection.IEEE Transactions on Dependable and Secure Computing (TDSC), 2023
Yiling He, Yiping Liu, Lei Wu, Ziqi Yang, Kui Ren, and Zhan Qin. Msdroid: Identifying malicious snippets for android malware detection.IEEE Transactions on Dependable and Secure Computing (TDSC), 2023
2023
-
[36]
Hindroid: An intelligent android malware detection system based on structured heterogeneous information network
Shifu Hou, Yanfang Ye, Yangqiu Song, and Melih Abdulhayoglu. Hindroid: An intelligent android malware detection system based on structured heterogeneous information network. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017
2017
-
[37]
An invariant form for the prior probability in estimation problems.Proceedings of the Royal Society of London
Harold Jeffreys. An invariant form for the prior probability in estimation problems.Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 1946
1946
-
[38]
Mimic-iii, a freely accessible critical care database.Scientific data, 2016
Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. Mimic-iii, a freely accessible critical care database.Scientific data, 2016
2016
-
[39]
Robert J Joyce, Gideon Miller, Phil Roth, Richard Zak, Elliott Zaresky-Williams, Hyrum Anderson, Edward Raff, and James Holt. EMBER2024–A Benchmark Dataset for Holistic Evaluation of Malware Classifiers.arXiv preprint arXiv:2506.05074, 2025
-
[40]
The Kinetics Human Action Video Dataset
Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijaya- narasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, et al. The kinetics human action video dataset.arXiv preprint arXiv:1705.06950, 2017
work page internal anchor Pith review arXiv 2017
-
[41]
Lightgbm: A highly efficient gradient boosting decision tree.Advances in Neural Information Processing Systems (NeurIPS), 2017
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree.Advances in Neural Information Processing Systems (NeurIPS), 2017
2017
-
[42]
A multimodal deep learning method for android malware detection using various features.IEEE Transactions on Information Forensics and Security (TIFS), 2018
TaeGuen Kim, BooJoong Kang, Mina Rho, Sakir Sezer, and Eul Gyu Im. A multimodal deep learning method for android malware detection using various features.IEEE Transactions on Information Forensics and Security (TIFS), 2018
2018
-
[43]
The droid is in the details: Environment-aware evasion of android sandboxes
Brian Kondracki, Babak Amin Azad, Najmeh Miramirkhani, and Nick Nikiforakis. The droid is in the details: Environment-aware evasion of android sandboxes. InNetwork and Distributed System Security Symposium (NDSS), 2022
2022
-
[44]
Feature shift detection: Localizing which features have shifted via conditional distribution tests.Advances in Neural Information Processing Systems (NeurIPS), 2020
Sean Kulinski, Saurabh Bagchi, and David I Inouye. Feature shift detection: Localizing which features have shifted via conditional distribution tests.Advances in Neural Information Processing Systems (NeurIPS), 2020
2020
-
[45]
Darkgate malware exploits samba file shares in short-lived cam- paign.The Hacker News, 2024
Ravie Lakshmanan. Darkgate malware exploits samba file shares in short-lived cam- paign.The Hacker News, 2024. URL https://thehackernews.com/2024/07/ darkgate-malware-exploits-samba-file.html
2024
-
[46]
Toward devel- oping a systematic approach to generate benchmark android malware datasets and classification
Arash Habibi Lashkari, Andi Fitriah A Kadir, Laya Taheri, and Ali A Ghorbani. Toward devel- oping a systematic approach to generate benchmark android malware datasets and classification. In2018 International Carnahan conference on security technology (ICCST), 2018. 12
2018
-
[47]
Multi- modal sensor fusion with differentiable filters
Michelle A Lee, Brent Yi, Roberto Martín-Martín, Silvio Savarese, and Jeannette Bohg. Multi- modal sensor fusion with differentiable filters. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020
2020
-
[48]
Making sense of vision and touch: Learning multimodal representations for contact-rich tasks.IEEE Transactions on Robotics (T-RO), 2020
Michelle A Lee, Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese, Li Fei-Fei, Animesh Garg, and Jeannette Bohg. Making sense of vision and touch: Learning multimodal representations for contact-rich tasks.IEEE Transactions on Robotics (T-RO), 2020
2020
-
[49]
Evedroid: Event-aware android malware detection against model degrading for iot devices.IEEE Internet of Things Journal (IoT-J), 2019
Tao Lei, Zhan Qin, Zhibo Wang, Qi Li, and Dengpan Ye. Evedroid: Event-aware android malware detection against model degrading for iot devices.IEEE Internet of Things Journal (IoT-J), 2019
2019
-
[50]
Enrico: A dataset for topic modeling of mobile ui designs
Luis A Leiva, Asutosh Hota, and Antti Oulasvirta. Enrico: A dataset for topic modeling of mobile ui designs. InInternational Conference on Human-Computer Interaction with Mobile Devices and Services, 2020
2020
-
[51]
Revisiting concept drift in windows malware detection: Adaptation to real drifted malware with minimal samples
Adrian Shuai Li, Arun Iyengar, Ashish Kundu, and Elisa Bertino. Revisiting concept drift in windows malware detection: Adaptation to real drifted malware with minimal samples. In Network and Distributed System Security Symposium (NDSS), 2025
2025
-
[52]
Robust android malware detection against adversarial example attacks
Heng Li, Shiyao Zhou, Wei Yuan, Xiapu Luo, Cuiying Gao, and Shuiyan Chen. Robust android malware detection against adversarial example attacks. InWeb Conference (WWW), 2021
2021
-
[53]
Multibench: Multiscale benchmarks for multimodal representation learning.Advances in Neural Information Processing Systems (NeurIPS), 2021
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A Lee, Yuke Zhu, et al. Multibench: Multiscale benchmarks for multimodal representation learning.Advances in Neural Information Processing Systems (NeurIPS), 2021
2021
-
[54]
Multizoo and multibench: A standardized toolkit for multimodal deep learning.Journal of Machine Learning Research, 2023
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, and Ruslan Salakhutdinov. Multizoo and multibench: A standardized toolkit for multimodal deep learning.Journal of Machine Learning Research, 2023
2023
-
[55]
Unraveling the key of machine learning-based android malware detection.ACM Transactions on Software Engineering and Methodology (TOSEM), 2026
Jiahao Liu, Jun Zeng, Fabio Pierazzi, Ziqi Yang, Lorenzo Cavallaro, and Zhenkai Liang. Unraveling the key of machine learning-based android malware detection.ACM Transactions on Software Engineering and Methodology (TOSEM), 2026
2026
-
[56]
A unified approach to interpreting model predictions
Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (NeurIPS), 2017
2017
-
[57]
Dynamic android malware category classification using semi-supervised deep learning
Samaneh Mahdavifar, Andi Fitriah Abdul Kadir, Rasool Fatemi, Dima Alhadidi, and Ali A Ghorbani. Dynamic android malware category classification using semi-supervised deep learning. In2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on ...
2020
-
[58]
Mamadroid: Detecting android malware by building markov chains of behavioral models
Enrico Mariconti, Lucky Onwuzurike, Panagiotis Andriotis, Emiliano De Cristofaro, Gianluca Ross, and Gianluca Stringhini. Mamadroid: Detecting android malware by building markov chains of behavioral models. InNetwork and Distributed System Security Symposium (NDSS), 2017
2017
-
[59]
Dung Thuy Nguyen, Ngoc N Tran, Taylor T Johnson, and Kevin Leach. Pbp: Post-training backdoor purification for malware classifiers.arXiv preprint arXiv:2412.03441, 2024
-
[60]
Your mobile app, their playground: The dark side of virtualization, 2025
Fernando Ortega and Vishnu Pratapagiri. Your mobile app, their playground: The dark side of virtualization, 2025. URL https://zimperium.com/blog/ your-mobile-app-their-playground-the-dark-side-of-the-virtualization
2025
-
[61]
MalCL: Lever- aging gan-based generative replay to combat catastrophic forgetting in malware classification
Jimin Park, AHyun Ji, Minji Park, Mohammad Saidur Rahman, and Se Eun Oh. MalCL: Lever- aging gan-based generative replay to combat catastrophic forgetting in malware classification. InAAAI Conference on Artificial Intelligence, 2025
2025
-
[62]
TESSERACT: Eliminating experimental bias in malware classification across space and time
Feargus Pendlebury, Fabio Pierazzi, Roberto Jordaney, Johannes Kinder, and Lorenzo Cavallaro. TESSERACT: Eliminating experimental bias in malware classification across space and time. InUSENIX Security Symposium, 2019. 13
2019
-
[63]
Mfas: Multimodal fusion architecture search
Juan-Manuel Pérez-Rúa, Valentin Vielzeuf, Stéphane Pateux, Moez Baccouche, and Frédéric Jurie. Mfas: Multimodal fusion architecture search. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPRW), 2019
2019
-
[64]
Coull, and Matthew Wright
Mohammad Saidur Rahman, Scott E. Coull, and Matthew Wright. On the Limitations of Continual Learning for Malware Classification. InConference on Lifelong Learning Agents (CoLLAs), 2022
2022
-
[65]
MADAR: Efficient continual learning for malware analysis with distribution-aware replay
Mohammad Saidur Rahman, Scott Coull, Qi Yu, and Matthew Wright. MADAR: Efficient continual learning for malware analysis with distribution-aware replay. InConference on Applied Machine Learning in Information Security (CAMLIS), 2025
2025
-
[66]
Droidchameleon: Evaluating android anti- malware against transformation attacks
Vaibhav Rastogi, Yan Chen, and Xuxian Jiang. Droidchameleon: Evaluating android anti- malware against transformation attacks. InACM SIGSAC Conference on Computer and Com- munications Security (CCS), 2013
2013
-
[67]
Ex- perience replay for continual learning
David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Ex- perience replay for continual learning. InNeural Information Processing Systems (NeurIPS), 2019
2019
-
[68]
Avclass2: Massive malware tag extraction from av labels
Silvia Sebastián and Juan Caballero. Avclass2: Massive malware tag extraction from av labels. InAnnual Computer Security Applications Conference (ACSAC), 2020
2020
-
[69]
Integrated static and dynamic analysis for malware detection
PV Shijo and AJPCS Salim. Integrated static and dynamic analysis for malware detection. Procedia Computer Science, 2015
2015
-
[70]
Hossein Shokouhinejad, Roozbeh Razavi-Far, Hesamodin Mohammadian, Mahdi Rabbani, Samuel Ansong, Griffin Higgins, and Ali A Ghorbani. Recent advances in malware detection: Graph learning and explainability.arXiv preprint arXiv:2502.10556, 2025
-
[71]
Detection of malicious pdf files based on hierarchical document structure
Nedim Šrndic and Pavel Laskov. Detection of malicious pdf files based on hierarchical document structure. InNetwork and Distributed System Security Symposium (NDSS), 2013
2013
-
[72]
Copperdroid: automatic reconstruction of android malware behaviors
Kimberly Tam, Salahuddin J Khan, Aristide Fattori, and Lorenzo Cavallaro. Copperdroid: automatic reconstruction of android malware behaviors. InNetwork and Distributed System Security Symposium (NDSS), 2015
2015
-
[73]
Continual learning and catastrophic forgetting.arXiv preprint arXiv:2403.05175, 2024
Gido M van de Ven, Nicholas Soures, and Dhireesha Kudithipudi. Continual learning and catastrophic forgetting.arXiv preprint arXiv:2403.05175, 2024
-
[74]
Centralnet: a multi- layer approach for multimodal fusion
Valentin Vielzeuf, Alexis Lechervy, Stéphane Pateux, and Frédéric Jurie. Centralnet: a multi- layer approach for multimodal fusion. InEuropean Conference on Computer Vision (ECCV) workshops, 2018
2018
-
[75]
VirusTotal — virustotal.com
Virustotal. VirusTotal — virustotal.com. https://www.virustotal.com/gui/ intelligence-overview. [Accessed 21-10-2025]
2025
-
[76]
VirusTotal – Stats, 2025
VirusTotal. VirusTotal – Stats, 2025. https://www.virustotal.com/gui/stats
2025
-
[77]
Feature hashing for large scale multitask learning
Kilian Weinberger, Anirban Dasgupta, John Langford, Alex Smola, and Josh Attenberg. Feature hashing for large scale multitask learning. InInternational Conference on Machine Learning (ICML), 2009
2009
-
[78]
Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen, and Michael R. Lyu. Why an android app is classified as malware: Toward malware classification interpretation. ACM Transactions on Software Engineering and Methodology (TOSEM), 2021
2021
-
[79]
Malscan: Fast market-wide mobile malware scanning by social-network centrality analysis
Yueming Wu, Xiaodi Li, Deqing Zou, Wei Yang, Xin Zhang, and Hai Jin. Malscan: Fast market-wide mobile malware scanning by social-network centrality analysis. InIEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019
2019
-
[80]
Homdroid: detecting android covert malware by social-network homophily analysis
Yueming Wu, Deqing Zou, Wei Yang, Xiang Li, and Hai Jin. Homdroid: detecting android covert malware by social-network homophily analysis. InProceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2021. 14
2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.