MalGEN: A Testbed for Modeling and Evaluating Malware Behaviors

Bikash Saha, Sandeep Kumar Shukla

classification cs.CR

keywords attack, behaviors, detection, malgen, artifacts, defenses, evaluate, executable

Modern cybersecurity requires systematic ways to evaluate how detection systems respond to evolving and previously unseen attack behaviors. Existing malware repositories largely capture known patterns and provide limited support for stress-testing defenses against novel threats. To address this, we present MalGEN, a modular testbed that models adversarial workflows and generates executable artifacts in a controlled environment. The framework decomposes high-level attack objectives into structured stages, enabling the synthesis of diverse and multi-stage behaviors. We evaluate MalGEN across 1,920 benchmark settings covering multiple platforms and behavioral objectives, resulting in 977 executable samples. Analysis shows that the generated artifacts exhibit a wide range of malicious techniques and multi-stage attack patterns. However, 45.71% of these samples remain undetected by existing detection engines, which reveals notable gaps in current defenses. These findings provide practical insights into the limitations of widely used detection approaches and support the development of more robust security evaluation and testing practices.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code
cs.CR 2026-05 unverdicted novelty 6.0

A commercial LLM can cheaply produce large numbers of structurally diverse yet behaviorally equivalent malware payloads using functional prompts or history-augmented prompts.
The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code
cs.CR 2026-05 unverdicted novelty 6.0

A single commercial LLM can cheaply generate large populations of behaviorally equivalent yet structurally diverse malware payloads.