Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns

Constantinos Patsakis; Fran Casino; Nikolaos Lykousas

arxiv: 2404.19715 · v1 · pith:6VYALQRFnew · submitted 2024-04-30 · 💻 cs.CR


classification 💻 cs.CR

keywords llms · malware · capabilities · code · deobfuscation · potential · malicious · pipelines


The integration of large language models (LLMs) into various pipelines is increasingly widespread, effectively automating many manual tasks and often surpassing human capabilities. Cybersecurity researchers and practitioners have recognised this potential and are actively exploring its applications, given the vast volume of heterogeneous data that must be processed to identify anomalies, potential bypasses, attacks, and fraudulent incidents. In addition, LLMs' advanced capabilities in generating functional code, comprehending code context, and summarising its operations can be leveraged for reverse engineering and malware deobfuscation. To this end, we examine the deobfuscation capabilities of state-of-the-art LLMs. Rather than merely discussing a hypothetical scenario, we evaluate four LLMs on real-world malicious scripts used in the notorious Emotet malware campaign. Our results indicate that, while not yet fully accurate, some LLMs can efficiently deobfuscate such payloads. Fine-tuning LLMs for this task is therefore a promising direction for future AI-powered threat intelligence pipelines in the fight against obfuscated malware.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and Encryption
cs.CR · 2026-05 · unverdicted · novelty 6.0

LLMs recover IoCs from lightweight-obfuscated JavaScript but performance collapses under encryption in a new benchmark of 336 programs across 12 concealment levels.