Malware Detection by Eating a Whole EXE

Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, Charles Nicholas

classification stat.ML cs.CR cs.LG

keywords problem, building, detection, work, challenges, learning, malware, network

In this work we introduce malware detection from raw byte sequences as a fruitful research area to the larger machine learning community. Building a neural network for such a problem presents a number of interesting challenges that have not occurred in tasks such as image processing or NLP. In particular, we note that detection from raw bytes presents a sequence problem with over two million time steps and a problem where batch normalization appears to hinder the learning process. We present our initial work in building a solution to tackle this problem, which has linear complexity dependence on the sequence length, and allows for interpretable sub-regions of the binary to be identified. In doing so we will discuss the many challenges in building a neural network to process data at this scale, and the methods we used to work around them.
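
The abstract's two structural claims, cost linear in the sequence length and identifiable sub-regions of the binary, can be illustrated with a toy sketch. This is not the paper's network: the per-byte weight table, the non-overlapping windows, and the function names `window_scores` and `most_active_region` are all assumptions made for illustration. Each byte is scored once, windows are reduced by a global maximum, and the winning window's byte offsets are returned so the decision can be traced back to a region of the file.

```python
from typing import List, Tuple


def window_scores(data: bytes, byte_weight: List[float], window: int) -> List[float]:
    """Score each non-overlapping window by summing per-byte weights.

    Every byte is visited exactly once, so the cost is O(len(data)).
    """
    scores = []
    for start in range(0, len(data), window):
        chunk = data[start:start + window]
        scores.append(sum(byte_weight[b] for b in chunk))
    return scores


def most_active_region(
    data: bytes, byte_weight: List[float], window: int
) -> Tuple[float, int, int]:
    """Global max pooling over window scores, plus the byte range that won."""
    if not data:
        raise ValueError("data must be non-empty")
    scores = window_scores(data, byte_weight, window)
    best = max(range(len(scores)), key=scores.__getitem__)
    start = best * window
    return scores[best], start, min(start + window, len(data))


# Toy weights (hypothetical, not learned): only byte 0xCC contributes.
weights = [0.0] * 256
weights[0xCC] = 1.0

# 16-byte stand-in for a binary; the "suspicious" bytes sit at offsets 8..11.
blob = b"\x00" * 8 + b"\xcc" * 4 + b"\x00" * 4
score, lo, hi = most_active_region(blob, weights, window=4)
print(score, lo, hi)
```

In a trained model the weight table would be replaced by learned filters, but the cost argument is the same: with a stride equal to the window size, doubling the file length doubles the work, and the index of the maximum points at the bytes responsible for the score.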

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Quantifiable Uncertainty: A Stochastic Consensus Multi-Agent RAG Framework for Robust Malware Detection
cs.CR 2026-05 unverdicted novelty 7.0

MAGMA combines RAG with a stochastic consistency ensemble over dual code embeddings to derive Function Evidence Strength and Evidence Conflict Score metrics, enabling reject-option decisions and achieving 98.4% malwar...

Trident: Improving Malware Detection with LLMs and Behavioral Features
cs.CR 2026-04 unverdicted novelty 6.0

Trident combines static decision trees, LLM-generated behavioral rules from sandbox reports, and direct LLM analysis via majority voting to outperform static methods while resisting concept drift without retraining.

Adversarial Malware Generation in Linux ELF Binaries via Semantic-Preserving Transformations
cs.CR 2026-04 unverdicted novelty 6.0

A new adversarial generator for Linux ELF malware achieves 67.74% evasion against MalConv by inserting benign-like strings, with the detector showing mean confidence drop of 0.50.

Towards Certified Malware Detection: Provable Guarantees Against Evasion Attacks
cs.CR 2026-04 unverdicted novelty 5.0

A randomized smoothing framework with feature ablation and Wilson score intervals provides formal certificates guaranteeing malware classifier robustness within a perturbation radius.

Explainability-Guided Adversarial Attacks on Transformer-Based Malware Detectors Using Control Flow Graphs
cs.CR 2026-04 unverdicted novelty 5.0

An explainability-guided attack perturbs key function calls in linearized CFGs to evade transformer malware detectors on Windows PE datasets.