Trident: Improving Malware Detection with LLMs and Behavioral Features
Pith reviewed 2026-05-09 19:29 UTC · model grok-4.3
The pith
LLMs enable generation of behavioral malware detection rules that, when combined with static features in Trident, outperform traditional methods and resist concept drift without retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By using large language models to distill behavior-based detection rules from sandbox reports of a small number of labeled samples, it becomes feasible to build a hybrid detection system called Trident. This system votes among a static decision tree, the LLM-generated rules, and raw LLM analysis of the reports. As a result, detection improves over static-only baselines and pure rule-based approaches, with resilience to concept drift comparable to active learning but without any retraining step, while keeping false positive rates practical.
What carries the argument
Trident's majority-voting ensemble that fuses a static decision tree, LLM-generated behavioral rules from sandbox reports, and direct LLM analysis of those reports.
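The fusion step is described only as majority voting; a minimal sketch of what an unweighted two-of-three vote could look like (the function and argument names are illustrative, not taken from the paper):

```python
def trident_verdict(static_tree_vote, rule_vote, llm_vote):
    """Unweighted majority vote among three binary detectors.

    Each argument is True (malicious) or False (benign). The names
    are illustrative; the paper specifies only that the three
    signals are fused by majority vote.
    """
    votes = [static_tree_vote, rule_vote, llm_vote]
    return sum(votes) >= 2  # at least two of three must flag the sample

# Example: the static model misses a drifted sample, but the behavior
# rules and the direct LLM analysis both flag it.
assert trident_verdict(False, True, True) is True
assert trident_verdict(False, False, True) is False
```

One consequence of this design, if accurate, is that any single component can be wrong on a sample without flipping the verdict, which is one plausible mechanism for the drift resilience claimed.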
If this is right
- Detection performance improves when behavioral features are included via LLM processing of sandbox reports.
- Resilience to concept drift increases to levels comparable with active learning methods.
- No retraining is needed to maintain performance as malware evolves.
- Practical false positive rates are preserved in the combined system.
Where Pith is reading between the lines
- Security teams with sandbox access could generate fresh rules from new reports more quickly than collecting and labeling large new training sets.
- The same LLM intermediary step might make dynamic analysis usable in other detection settings where log data is semi-structured.
- Stable extracted rules could eventually run in lightweight engines without calling the full LLM at detection time.
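The last point above can be made concrete: once rules are distilled, matching them against a sandbox report is a cheap pattern operation that needs no LLM at detection time. The rule format below is hypothetical — the paper does not specify how its rules are encoded:

```python
import re

# Hypothetical distilled rules: each pairs a human-readable name with a
# regex matched against the raw text of a sandbox behavior report.
RULES = [
    ("deletes shadow copies",
     re.compile(r"vssadmin\s+delete\s+shadows", re.IGNORECASE)),
    ("disables real-time monitoring",
     re.compile(r"Set-MpPreference\s+-DisableRealtimeMonitoring", re.IGNORECASE)),
]

def matched_rules(report_text: str) -> list:
    """Return the names of behavioral rules that fire on a report."""
    return [name for name, pattern in RULES if pattern.search(report_text)]

report = "cmd.exe spawned: vssadmin delete shadows /all /quiet"
print(matched_rules(report))  # ['deletes shadow copies']
```

A lightweight engine like this would only need the LLM at rule-generation time, not per sample at detection time.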
Load-bearing premise
Large language models can extract reliable, generalizable malware detection rules from limited sandbox behavior reports without hallucinating irrelevant patterns or failing to identify key behaviors.
What would settle it
Collecting a new test set of malware samples that emerged after the original training data and measuring whether Trident's detection rate falls below that of the static decision tree or its false positive rate exceeds practical thresholds.
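The settling experiment above amounts to a time-based split evaluation: freeze the detector, score only samples first seen after a cutoff, and check detection and false-positive rates against fixed thresholds. A minimal sketch, where the sample format, thresholds, and toy detector are all illustrative assumptions:

```python
from datetime import date

def drift_check(samples, cutoff, detect_fn, min_tpr=0.90, max_fpr=0.01):
    """Evaluate a fixed detector on samples first seen after `cutoff`.

    `samples` is a list of (first_seen: date, is_malware: bool, features);
    `detect_fn` maps features -> bool. Threshold values are illustrative.
    """
    post = [s for s in samples if s[0] > cutoff]
    tp = sum(1 for _, mal, feat in post if mal and detect_fn(feat))
    fp = sum(1 for _, mal, feat in post if not mal and detect_fn(feat))
    n_mal = sum(1 for _, mal, _ in post if mal)
    n_ben = len(post) - n_mal
    tpr = tp / n_mal if n_mal else 0.0
    fpr = fp / n_ben if n_ben else 0.0
    return tpr >= min_tpr and fpr <= max_fpr, tpr, fpr

samples = [
    (date(2025, 1, 1), True, "old-family"),   # pre-cutoff, ignored
    (date(2025, 6, 1), True, "new-family"),   # emerged after training data
    (date(2025, 6, 2), False, "benign"),
]
ok, tpr, fpr = drift_check(samples, date(2025, 3, 1), lambda f: f != "benign")
print(ok, tpr, fpr)  # True 1.0 0.0
```

Running this with Trident versus the static decision tree on genuinely post-training samples is what would distinguish real drift resilience from leakage of family-specific patterns.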
Original abstract
Traditionally, machine learning methods for PE malware detection have relied on static features like byte histograms, string information, and PE header contents. One barrier to incorporating dynamic analysis features has been the semi-structured nature of sandbox behavior reports. We show that, using the latest generation of large language models with reasoning, it is possible to efficiently process these behavior reports and utilize them as part of a malware detection pipeline. Specifically, we leverage LLMs to generate behavior-based malware detection rules based on a small training set of labeled malware. We find that these detection rules, derived from behavioral features, are much more robust to concept drift than standard static-feature methods, while maintaining practical false positive rates. Finally, we introduce Trident, a system which combines a classic decision tree model over static features, our behavior-based detection rules, and direct LLM analysis of sandbox reports through majority voting. Trident outperforms standard methods using static features, outperforms behavior-based rules alone, and is as resilient to concept drift as active learning methods without requiring retraining.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Trident, a malware detection pipeline that uses LLMs to distill behavior-based detection rules from a small set of labeled sandbox reports, combines these rules with a decision-tree classifier over static PE features, and incorporates direct LLM analysis of reports via majority voting. It claims that the resulting system outperforms both standard static-feature methods and behavior-based rules alone while matching the concept-drift resilience of active-learning approaches without requiring periodic retraining.
Significance. If the empirical claims are substantiated with rigorous metrics and controls, the work would be significant for the malware-detection community: it offers a practical way to incorporate dynamic behavioral signals that have historically been difficult to exploit at scale, potentially improving robustness against evolving threats without the labeling overhead of active learning.
Major comments (3)
- [Abstract] The central performance and drift-resistance claims are stated without any quantitative metrics, baseline comparisons, dataset sizes, false-positive rates, or description of the experimental protocol (train/test splits, time-based drift evaluation, etc.). These details are load-bearing for the claim that Trident is both more accurate and as drift-resilient as active learning.
- [Abstract] The method for generating behavior rules relies on the 'latest generation of large language models with reasoning' applied to a small training set, yet no information is supplied on prompt templates, few-shot examples, rule-validation steps, or hallucination-mitigation procedures. Without these controls it is impossible to assess whether the reported generalization and drift resistance are genuine or artifacts of family-specific patterns present in the training reports.
- [Abstract] The majority-voting ensemble is described only at a high level; the paper must specify how conflicts between the static decision tree, the LLM-derived rules, and the direct LLM analysis are resolved, and whether the voting weights or thresholds were tuned on the same data used for final evaluation.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight opportunities to strengthen the abstract by incorporating quantitative results and methodological clarifications. We have revised the abstract accordingly while preserving its conciseness, and we address each point below.
Point-by-point responses
Referee: [Abstract] The central performance and drift-resistance claims are stated without any quantitative metrics, baseline comparisons, dataset sizes, false-positive rates, or description of the experimental protocol (train/test splits, time-based drift evaluation, etc.). These details are load-bearing for the claim that Trident is both more accurate and as drift-resilient as active learning.
Authors: We agree that the abstract benefits from explicit quantitative anchors. The revised abstract now reports concrete performance gains (e.g., accuracy and F1 improvements over static baselines), false-positive rates on the order of those observed in production settings, training and test set sizes, and a brief outline of the time-based split used for drift evaluation. These additions directly support the comparison to active-learning resilience without retraining. revision: yes
Referee: [Abstract] The method for generating behavior rules relies on the 'latest generation of large language models with reasoning' applied to a small training set, yet no information is supplied on prompt templates, few-shot examples, rule-validation steps, or hallucination-mitigation procedures. Without these controls it is impossible to assess whether the reported generalization and drift resistance are genuine or artifacts of family-specific patterns present in the training reports.
Authors: The full manuscript details the prompting approach, including few-shot examples drawn from diverse families, structured JSON output constraints, and a post-generation validation step that discards rules failing to generalize on a held-out validation set. We have added a concise clause to the abstract noting the use of controlled prompting and cross-family validation to mitigate hallucination risks and family-specific overfitting. revision: yes
Referee: [Abstract] The majority-voting ensemble is described only at a high level; the paper must specify how conflicts between the static decision tree, the LLM-derived rules, and the direct LLM analysis are resolved, and whether the voting weights or thresholds were tuned on the same data used for final evaluation.
Authors: We have clarified the ensemble procedure in the revised abstract: decisions are reached by unweighted majority vote among the three components, with no learned weights or thresholds tuned on the final test data. Thresholds for rule application were fixed on a separate validation split prior to any drift or final-performance evaluation, ensuring no leakage. revision: yes
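The validation step the rebuttal describes — discarding generated rules that fail to generalize on held-out data — can be sketched as a precision filter. The rule representation, thresholds, and example data below are hypothetical, not taken from the paper:

```python
def filter_rules(candidate_rules, validation_set, min_precision=0.95, min_hits=5):
    """Keep only generated rules that generalize on a held-out set.

    `candidate_rules` maps rule name -> predicate over a report string;
    `validation_set` is a list of (report, is_malware). A rule survives
    only if it fires often enough and almost exclusively on malware.
    Threshold values are illustrative assumptions.
    """
    kept = {}
    for name, rule in candidate_rules.items():
        hits = [is_mal for report, is_mal in validation_set if rule(report)]
        if len(hits) >= min_hits and sum(hits) / len(hits) >= min_precision:
            kept[name] = rule
    return kept

rules = {
    "mentions ransom note": lambda r: "ransom" in r,
    "uses cmd.exe": lambda r: "cmd.exe" in r,  # too generic: fires on benign
}
val = ([("drops ransom note to desktop", True)] * 5
       + [("runs cmd.exe installer", False)] * 5
       + [("cmd.exe deletes backups", True)])
print(sorted(filter_rules(rules, val)))  # ['mentions ransom note']
```

A filter of this shape would address the referee's hallucination concern mechanically: a rule that matched only a spurious, family-specific artifact would fail the held-out precision check and be dropped.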
Circularity Check
No circularity: purely empirical pipeline with no derivations or self-referential definitions
Full rationale
The paper describes an empirical system (LLM rule generation from sandbox reports, combined via majority vote with static decision trees) and reports experimental outcomes on detection performance and drift resilience. No equations, parameter fits, uniqueness theorems, or ansatzes appear in the provided text. Claims of outperformance rest on direct evaluation against baselines rather than any reduction to inputs by construction. Self-citations, if present, are not load-bearing for any derivation chain. This is the standard case of a non-circular empirical ML paper.