Cybersecurity is the True Frontier for Generative AI Success or Failure

David J. Elkind; Edward Raff; Maor Ashkenazi; Sagar Samtani; Sven Krasser

arxiv: 2606.28929 · v1 · pith:4OBOREZWnew · submitted 2026-06-27 · 💻 cs.CR · cs.LG

Pith reviewed 2026-06-30 09:43 UTC · model grok-4.3

keywords: cybersecurity, generative AI, machine learning, test cases, adversarial settings, AI benchmarks, LLM agents

The pith

Cybersecurity is a better test-case for general AI progress than natural language or computer vision due to its greater complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that cybersecurity workflows involve orchestrating hundreds of tools, processing data at enormous scale (a single sample can run to billions of tokens), bearing high labeling costs driven in part by adversaries trying to subvert detection, coping with expert disagreement about ground truth, and making fast, low-latency decisions with explanations in a changing environment. Taken together, these demands create more complexity than natural language processing or computer vision present. A sympathetic reader would care because this would position cybersecurity as a stronger indicator of movement toward general AI capabilities.

Core claim

Cybersecurity is a real-life test-bed for many machine learning problems at once, especially given modern strides in using Large Language Models to automate processes as agents. Workflows require orchestrating hundreds of standard and bespoke tools through various formats, and a single malware sample can be viewed as a sequence of billions of tokens. Labeling is enormously costly, in part because adversaries are attempting to subvert detection methods, and even experts may disagree on labels. Deployed models must run quickly on billions of items a day, where low latency is critical, in a continuously changing environment, and explainability is required for analysts facing large numbers of false positives. The amount of complexity in cybersecurity is greater than that of natural language and computer vision.

What carries the argument

The direct comparison showing cybersecurity's combined requirements for tool orchestration, data scale, adversarial labeling, latency, explainability, and environmental change exceed those of natural language and computer vision.

If this is right

AI systems succeeding in cybersecurity would need to handle simultaneous demands of scale, adversaries, and explainability.
Measuring progress toward general AI through cybersecurity tasks would capture handling of ambiguous labels and dynamic conditions at once.
Deployed models would require low-latency operation on billions of items daily while providing reasoning for decisions.
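
To make the "billions of items daily" figure concrete, the following is a back-of-the-envelope sketch, not something taken from the paper. It converts a daily volume into an average per-second throughput and a per-item time budget. The daily volume of 1e9 (the low end of "billions") and the worker count of 1,000 are assumptions chosen only for illustration, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def required_throughput(items_per_day: float) -> float:
    """Average items per second needed to keep pace with a daily volume."""
    return items_per_day / SECONDS_PER_DAY


def per_item_budget_ms(items_per_day: float, workers: int) -> float:
    """Average milliseconds each of `workers` parallel workers may spend per item."""
    return 1000.0 * workers / required_throughput(items_per_day)


if __name__ == "__main__":
    daily_volume = 1e9  # assumption: low end of "billions of items a day"
    workers = 1000      # assumption: illustrative fleet size, not from the paper
    print(f"{required_throughput(daily_volume):,.0f} items/s on average")
    print(f"{per_item_budget_ms(daily_volume, workers):.1f} ms per item per worker")
```

Under these assumptions the average load is roughly 11,574 items per second, which leaves about 86.4 ms per item for each of 1,000 workers. This is an average only: it ignores peak load and says nothing about tail latency, which a real deployment would also have to budget for.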

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the claim holds, current AI advances in language or vision may translate less reliably to high-stakes adversarial settings.
Generative AI agents would face particular tests in orchestrating multiple tools under real constraints not present in simpler tasks.
One could develop cross-domain metrics to check whether performance in cybersecurity predicts broader capabilities.
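
One simple form such a cross-domain check could take is a rank correlation between how a set of models order on cybersecurity tasks and how they order on a general benchmark. The sketch below is an illustration under stated assumptions, not a metric proposed by the paper or by Pith: it uses Spearman's rank correlation, implemented with the standard library only, and the function names and score lists are hypothetical placeholders rather than real results.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined if either list is constant


if __name__ == "__main__":
    # Hypothetical placeholder scores for four models; NOT measured results.
    cyber_scores = [0.2, 0.5, 0.4, 0.9]
    general_scores = [0.3, 0.6, 0.5, 0.8]
    print(spearman(cyber_scores, general_scores))
```

A coefficient near 1 would mean the two domains rank models the same way, while a value near 0 would indicate the kind of dissociation that bears on the paper's claim. With only a handful of models any such estimate would be very noisy, so this is a starting point rather than an evaluation protocol.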

Load-bearing premise

That having more overlapping difficult requirements automatically makes cybersecurity a superior measure of general AI ability.

What would settle it

Demonstration that AI systems achieve strong results on cybersecurity tasks without corresponding gains in generality on standard benchmarks from other domains, or fail in cybersecurity despite broad success elsewhere.

Original abstract

Cybersecurity is a real-life test-bed for many machine learning problems at once, especially when considering modern strides in using Large Language Models (LLMs) to automate processes as "agents." Cybersecurity workflows require orchestrating hundreds of standard and bespoke tools through various formats. The scale of cybersecurity data is enormous; for example, a single malware sample can be viewed as a sequence of billions of tokens. The cost of labeling any file by experts is enormous and labor-intensive, in part because an adversary (possibly a well-funded nation state actor) is attempting to subvert your detection methods. Even skilled experts may disagree on the correct label, creating ambiguity in what constitutes ground truth. When deployed, models must run quickly on billions of items a day, where low-latency is critical for operational success, in a continuously changing environment. In addition, explainability is not optional: analysts demand clear reasoning for model decisions to cope with the large number of false-positive alerts they face daily, and to quickly develop remediation and understand how something went wrong. In short, the amount of complexity cybersecurity is greater than that of natural language and computer vision, and thus we posit that cybersecurity is the better test-case for general AI progress than other, well-studied fields.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This position paper lists real cybersecurity challenges for LLM agents but asserts without evidence that they make the domain the best test for general AI progress.

The main takeaway is that cybersecurity's complexities are presented as making it the superior test-bed for general AI progress over NLP or CV, but the paper supplies no metrics, definitions, or comparisons to support that leap.

It does a solid job cataloging practical difficulties: orchestrating many tools in varied formats, handling billion-token sequences like malware, costly and ambiguous expert labeling amid adversaries, strict low-latency needs on massive daily volumes, and required explainability for analysts. These descriptions match known domain realities.

Nothing here counts as new. Claims that complex real-world settings reveal AI capabilities better than controlled benchmarks have circulated in prior robustness and deployment literature; this is a focused restatement for generative agents.

The soft spot is the central assumption. The text lists complexities but never defines general AI progress, measures complexity across fields, or shows why these traits would better indicate or advance generality. The inference is stated rather than derived or evidenced.

This is for readers already working on AI in security who want a concise reminder of deployment hurdles. It might spark informal discussion but offers little for anyone seeking new results or rigorous evaluation.

I would not send it for peer review. The argument stays too thin on substance for a position paper to justify referee effort.

Referee Report

1 major / 1 minor

Summary. The manuscript claims that cybersecurity workflows involve greater complexity than NLP or CV—including orchestrating hundreds of tools, billion-token sequences, adversarial labeling with expert disagreement, low-latency requirements on billions of items daily, and mandatory explainability—and therefore posits that cybersecurity is the superior test-case for general AI progress.

Significance. If the unargued premise that higher listed complexity necessarily indicates a better test-bed for generality were established with metrics or comparative analysis, the claim could redirect AI research priorities toward cybersecurity as a forcing function for more capable systems. As written, the manuscript supplies only a qualitative list without definitions, data, or derivations.

major comments (1)

[Abstract] Abstract (final sentence): the inference that 'the amount of complexity cybersecurity is greater than that of natural language and computer vision, and thus we posit that cybersecurity is the better test-case for general AI progress' rests on the axiom that greater domain complexity correlates with superior suitability as a test for generality; no definition of 'general AI progress,' no quantitative complexity measures across domains, and no argument linking the listed properties to generality are provided.

minor comments (1)

[Abstract] Abstract: the phrasing 'the amount of complexity cybersecurity is greater' contains a grammatical omission that reduces readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the review. We address the major comment on the abstract below, noting that the manuscript is a position paper.

Referee: [Abstract] Abstract (final sentence): the inference that 'the amount of complexity cybersecurity is greater than that of natural language and computer vision, and thus we posit that cybersecurity is the better test-case for general AI progress' rests on the axiom that greater domain complexity correlates with superior suitability as a test for generality; no definition of 'general AI progress,' no quantitative complexity measures across domains, and no argument linking the listed properties to generality are provided.

Authors: The manuscript is a position paper that qualitatively contrasts operational demands rather than claiming empirical proof. The listed properties are linked to generality because they simultaneously require multi-tool orchestration, long-context reasoning over billion-token inputs, robustness to adversarial label subversion and expert disagreement, sub-second inference at massive scale, and mandatory human-interpretable outputs. These are capabilities that current NLP and CV benchmarks test in isolation rather than in combination under live adversarial conditions. 'General AI progress' is used to mean advancement toward systems that integrate these capabilities without domain-specific retraining. No quantitative cross-domain metrics are supplied because the paper's purpose is to highlight these distinctions to redirect attention, not to derive a formal complexity ranking. We do not intend revisions, as adding such analysis would change the paper's scope.

revision: no

Circularity Check

0 steps flagged

No circularity: central claim is direct assertion without equations, fits, or self-citation reductions

The paper asserts that cybersecurity's enumerated properties (tool orchestration, billion-token sequences, adversarial labeling, low latency, explainability) exceed those of NLP/CV and therefore make it the superior test-bed for general AI progress. This inference is presented as a posit without any derivation chain, equations, parameter fitting, or citations. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear. The text supplies no quantitative complexity metric or formal definition of 'general AI progress,' but the absence of any reductive step means the claim does not collapse to its inputs by construction. This is a standard non-circular position paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on two unproven domain assumptions linking complexity to test suitability for general AI; no free parameters or invented entities are introduced.

axioms (2)

domain assumption: Cybersecurity involves greater complexity than natural language processing and computer vision.
This premise is stated directly in the abstract as the basis for the main claim.

ad hoc to paper: A domain with higher complexity is a better test-case for general AI progress.
This links the listed challenges to the conclusion about AI testing without further justification.

pith-pipeline@v0.9.1-grok · 5763 in / 1168 out tokens · 41200 ms · 2026-06-30T09:43:41.323397+00:00 · methodology

Reference graph

Works this paper leans on

108 extracted references · 47 canonical work pages · 4 internal anchors

[1]

(2024, 09) Employers must act as cybersecurity workforce growth stalls and skills gaps widen employers must act as cybersecurity workforce growth stalls and skills gaps widen

International Information System Security Certification Consortium. (2024, 09) Employers must act as cybersecurity workforce growth stalls and skills gaps widen employers must act as cybersecurity workforce growth stalls and skills gaps widen. [Online]. Available: https://www.isc2.org/Insights/2024/09/Employers- Must-Act-Cybersecurity-Workforce-Growth- St...

2024
[2]

Sok: The impact of unlabelled data in cyberthreat detec- tion,

G. Apruzzese, P. Laskov, and A. Tastemirova, “Sok: The impact of unlabelled data in cyberthreat detec- tion,” in2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). IEEE, 2022, pp. 20–42

2022
[3]

Ma- chine learning (in) security: A stream of problems,

F. Ceschin, M. Botacin, A. Bifet, B. Pfahringer, L. S. Oliveira, H. M. Gomes, and A. Gr ´egio, “Ma- chine learning (in) security: A stream of problems,” Digital Threats: Research and Practice, vol. 5, no. 1, pp. 1–32, 2024

2024
[4]

Transforming cybersecurity with agen- tic ai to combat emerging cyber threats,

N. Kshetri, “Transforming cybersecurity with agen- tic ai to combat emerging cyber threats,”Telecom- munications Policy, p. 102976, 2025

2025
[5]

Frontier ai’s impact on the cybersecurity landscape,

W. Guo, Y . Potter, T. Shi, Z. Wang, A. Zhang, and D. Song, “Frontier ai’s impact on the cybersecurity landscape,” 2025. [Online]. Available: https://arxiv. org/abs/2504.05408

work page arXiv 2025
[6]

Ai and cybersecurity: a risk society perspective,

S.-N. Vulpe, R. Rughinis , , D. T , urcanu, and D. Rosner, “Ai and cybersecurity: a risk society perspective,”Frontiers in Computer Science, vol. 6, Oct. 2024. [Online]. Available: http://dx.doi.org/ 10.3389/fcomp.2024.1462250

work page doi:10.3389/fcomp.2024.1462250 2024
[7]

2026 crowdstrike global threat report: Ai accelerates adversaries and reshapes the attack surface,

CrowdStrike, “2026 crowdstrike global threat report: Ai accelerates adversaries and reshapes the attack surface,” CrowdStrike, Tech. Rep., 02 2026. [Online]. Available: https://www.crowdstrike.com/ en-us/global-threat-report/

2026
[8]

A scalable implementation of malware detection based on network connection behaviors,

L. Shi, J. Que, Z. Zhong, B. Meyer, P. Crenshaw, and Y . He, “A scalable implementation of malware detection based on network connection behaviors,” inProceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowl- edge Discovery (CyberC), Oct 2013, pp. 59–66

2013
[9]

Learning and classification of malware behavior,

K. Rieck, T. Holz, C. Willems, P. D ¨ussel, and P. Laskov, “Learning and classification of malware behavior,” inDetection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125

2008
[10]

Large-scale malware classification using random projections and neural networks,

G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” inAcoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 3422–3426

2013
[11]

Panorama: capturing system-wide infor- mation flow for malware detection and analysis,

H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: capturing system-wide infor- mation flow for malware detection and analysis,” inProceedings of the 14th ACM conference on Computer and communications security. ACM, 2007, pp. 116–127

2007
[12]

Effective and efficient malware detection at the end host

C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X.-y. Zhou, and X. Wang, “Effective and efficient malware detection at the end host.” in USENIX security symposium, 2009, pp. 351–366

2009
[13]

Mcboost: Boosting scalability in malware collection and analysis using statistical classification of executa- bles,

R. Perdisci, A. Lanzi, and W. Lee, “Mcboost: Boosting scalability in malware collection and analysis using statistical classification of executa- bles,” inComputer Security Applications Confer- ence, 2008. ACSAC 2008. Annual. IEEE, 2008, pp. 301–310

2008
[14]

Learning to detect and classify malicious executables in the wild,

J. Z. Kolter and M. A. Maloof, “Learning to detect and classify malicious executables in the wild,”The Journal of Machine Learning Research, vol. 7, pp. 2721–2744, 2006

2006
[15]

MetaAware: Identi- fying metamorphic malware,

Q. Zhang and D. S. Reeves, “MetaAware: Identi- fying metamorphic malware,” inComputer Secu- rity Applications Conference, 2007. ACSAC 2007. Twenty-Third Annual. IEEE, 2007, pp. 411–420

2007
[16]

Deep neural network based malware detection using two dimensional bi- nary program features,

J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional bi- nary program features,” in2015 10th International Conference on Malicious and Unwanted Software (MALWARE), 2015, pp. 11–20

2015
[17]

Unveiling Zeus: Automated Classification of Malware Samples,

A. Mohaisen and O. Alrawi, “Unveiling Zeus: Automated Classification of Malware Samples,” in Proceedings of the 22Nd International Conference on World Wide Web. New York, NY , USA: ACM, 2013, pp. 829–832, series Title: WWW ’13 Companion. [Online]. Available: http://doi. acm.org/10.1145/2487788.2488056

work page doi:10.1145/2487788.2488056 2013
[18]

An Observational Investiga- tion of Reverse Engineers’ Processes,

D. V otipka, S. M. Rabin, K. Micinski, J. S. Foster, and M. M. Mazurek, “An Observational Investiga- tion of Reverse Engineers’ Processes,” inUSENIX Security Symposium, 2019

2019
[19]

Time is money: Considerations for measuring the radiological reading time,

R. Sexauer and C. Bestler, “Time is money: Considerations for measuring the radiological reading time,”Journal of Imaging, vol. 8, no. 8, p. 208, Jul. 2022. [Online]. Available: http://dx.doi.org/10.3390/jimaging8080208

work page doi:10.3390/jimaging8080208 2022
[20]

Ra- diologists’ variation of time to read across different procedure types,

D. Forsberg, B. Rosipko, and J. L. Sunshine, “Ra- diologists’ variation of time to read across different procedure types,”J. Digit. Imaging, vol. 30, no. 1, pp. 86–94, Feb. 2017

2017
[21]

{RE-Mind}: a First Look Inside the Mind of a Reverse Engineer,

A. Mantovani, S. Aonzo, Y . Fratantonio, and D. Balzarotti, “{RE-Mind}: a First Look Inside the Mind of a Reverse Engineer,” 2022, pp. 2727–2745. [Online]. Available: https://www.usenix.org/conference/ usenixsecurity22/presentation/mantovani

2022
[22]

Superset Decompilation,

C. Liu, Y . Sun, T. Gilray, and K. Micin- ski, “Superset Decompilation,” Mar. 2026, arXiv:2603.28002 [cs]. [Online]. Available: http://arxiv.org/abs/2603.28002

work page arXiv 2026
[23]

Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation,

S. Mohseni, S. Mohammadi, D. Tilwani, Y . Saxena, G. K. Ndawula, S. Vema, E. Raff, and M. Gaur, “Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 23, pp. 24 893–24 901, Apr. 2025, number: 23. [Online]. Available: https://ojs.a...

2025
[24]

Code Obfuscation against Symbolic Execution Attacks,

S. Banescu, C. Collberg, V . Ganesh, Z. Newsham, and A. Pretschner, “Code Obfuscation against Symbolic Execution Attacks,” inProceedings of the 32nd Annual Conference on Computer Security Applications. New York, NY , USA: Association for Computing Machinery, 2016, pp. 189–200, series Title: ACSAC ’16. [Online]. Available: https://doi.org/10.1145/2991079.2991114

work page doi:10.1145/2991079.2991114 2016
[25]

Obfuscation of executable code to improve resistance to static disassembly,

C. Linn and S. Debray, “Obfuscation of executable code to improve resistance to static disassembly,” inProceedings of the 10th ACM conference on Computer and communication security - CCS ’03. New York, New York, USA: ACM Press, 2003, p. 290. [Online]. Available: http: //portal.acm.org/citation.cfm?doid=948109.948149

work page arXiv 2003
[26]

Assemblage: Automatic Binary Dataset Construction for Machine Learning,

C. Liu, R. Saul, Y . Sun, E. Raff, M. Fuchs, T. Southard Pantano, J. Holt, and K. Micinski, “Assemblage: Automatic Binary Dataset Construction for Machine Learning,”Advances in Neural Information Processing Systems, vol. 37, pp. 58 698–58 715, Dec. 2024. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 2024/hash/6bbefc73a187dd42e0dc0...

2024
[27]

Is Function Similarity Over-Engineered? Building a Benchmark,

R. Saul, C. Liu, N. Fleischmann, R. Zak, K. Micinski, E. Raff, and J. Holt, “Is Function Similarity Over-Engineered? Building a Benchmark,”Advances in Neural Information Processing Systems, vol. 37, pp. 21 636– 21 655, Dec. 2024. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 2024/hash/2663c994c84a79b338bca613fe1ae223- Abstract-Dat...

2024
[28]

SynCode: LLM Generation with Grammar Augmentation,

S. Ugare, T. Suresh, H. Kang, S. Misailovic, and G. Singh, “SynCode: LLM Generation with Grammar Augmentation,”Transactions on Machine Learning Research, Nov. 2024. [Online]. Available: https://openreview.net/forum?id=HiUZtgAPoH

2024
[29]

UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8,

P. Firestone, S. Ugare, G. Singh, and S. Mis- ailovic, “UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8,” Aug. 2025. [Online]. Available: https:// openreview.net/forum?id=8ExXncFpf6#discussion

2025
[30]

Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning,

S. Geng, M. Josifoski, M. Peyrard, and R. West, “Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning,” inProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali, Eds. Singapore: Association for Computational Linguistics, Dec. 2023, pp. 10 932– 10 952. [Online]. Available...

2023
[31]

PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models,

T. Scholak, N. Schucher, and D. Bah- danau, “PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models,” inProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, M.-F. Moens, X. Huang, L. Specia, and S. W.-t. Yih, Eds. Online and Punta Cana, Dominican Republic: Association for Computationa...

2021
[32]

Learning the PE Header, Malware Detection with Minimal Domain Knowledge,

E. Raff, J. Sylvester, and C. Nicholas, “Learning the PE Header, Malware Detection with Minimal Domain Knowledge,” inProceedings of the 10th ACM Workshop on Artificial Intelligence and Security. New York, NY , USA: ACM, 2017, pp. 121–132, series Title: AISec ’17. [Online]. Avail- able: http://doi.acm.org/10.1145/3128572.3140442

work page doi:10.1145/3128572.3140442 2017
[33]

A Qualitative Evaluation of Reverse Engineering Tool Usability,

J. Mattei, M. McLaughlin, S. Katcher, and D. V otipka, “A Qualitative Evaluation of Reverse Engineering Tool Usability,” inProceedings of the 38th Annual Computer Security Applications Conference, ser. ACSAC ’22. New York, NY , USA: Association for Computing Machinery, Dec. 2022, pp. 619–631. [Online]. Available: https://dl.acm.org/doi/10.1145/3564625.3567993

work page doi:10.1145/3564625.3567993 2022
[34]

Decomperson: How Humans Decompile and What We Can Learn From It,

K. Burk, F. Pagani, C. Kruegel, and G. Vigna, “Decomperson: How Humans Decompile and What We Can Learn From It,” 2022, pp. 2765–

2022
[35]

Available: https://www.usenix.org/ conference/usenixsecurity22/presentation/burk

[Online]. Available: https://www.usenix.org/ conference/usenixsecurity22/presentation/burk
[36]

”I’m trying to learn. . . and I’m shooting myself in the foot

J. Mattei, C. Pellegrini, M. Soto, M. S. Bohuk, and D. V otipka, “”I’m trying to learn. . . and I’m shooting myself in the foot”: Beginners’ Struggles When Solving Binary Exploitation Exercises,” 2025, pp. 2867–

2025
[37]

Available: https://www.usenix.org/ conference/usenixsecurity25/presentation/mattei

[Online]. Available: https://www.usenix.org/ conference/usenixsecurity25/presentation/mattei
[38]

E. Raff, D. Farris, and S. Biderman,How Large Language Models Work. Shelter Island: Manning, 2025

2025
[39]

Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making,

H. Suresh, N. Lao, and I. Liccardi, “Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making,” in Proceedings of the 12th ACM Conference on Web Science, ser. WebSci ’20. New York, NY , USA: Association for Computing Machinery, Jul. 2020, pp. 315–324. [Online]. Available: https://dl.acm.org/doi/10.1145/3394231.3397922

work page doi:10.1145/3394231.3397922 2020
[40]

Using LLMs as a reverse engineering sidekick,

G. Venere, “Using LLMs as a reverse engineering sidekick,” Jul. 2025. [Online]. Available: https://blog.talosintelligence.com/using- llm-as-a-reverse-engineering-sidekick/

2025
[41]

Quintero

B. Quintero. (2024, 04) From assistant to analyst: The power of gemini 1.5 pro for malware analysis. [Online]. Avail- able: https://cloud.google.com/blog/topics/threat- intelligence/gemini-for-malware-analysis

2024
[42]

Automatic Y ARA Rule Generation Using Biclustering,

E. Raff, R. Zak, G. L. Munoz, W. Fleming, H. S. Anderson, B. Filar, C. Nicholas, and J. Holt, “Automatic Y ARA Rule Generation Using Biclustering,” in13th ACM Workshop on Artificial Intelligence and Security (AISec’20), 2020, arXiv: 2009.03779. [Online]. Available: http://arxiv.org/abs/2009.03779

work page arXiv 2020
[43]

Xandra: An Autonomous Cyber Battle System for the Cyber Grand Challenge,

A. Nguyen-Tuong, D. Melski, J. W. Davidson, M. Co, W. Hawkins, J. D. Hiser, D. Morris, D. Nguyen, and E. Rizzi, “Xandra: An Autonomous Cyber Battle System for the Cyber Grand Challenge,”IEEE Security & Privacy, vol. 16, no. 2, pp. 42–51, Mar. 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8328984

work page arXiv 2018
[44]

Rise of the HaCRS: Augmenting Autonomous Cyber Reasoning Systems with Human Assistance,

Y . Shoshitaishvili, M. Weissbacher, L. Dresel, C. Salls, R. Wang, C. Kruegel, and G. Vigna, “Rise of the HaCRS: Augmenting Autonomous Cyber Reasoning Systems with Human Assistance,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’17. New York, NY , USA: Association for Computing Machinery, Oct. 2017, pp...

work page arXiv 2017
[45]

Mechanical Phish: Resilient Autonomous Hacking,

Y . Shoshitaishvili, A. Bianchi, K. Borgolte, A. Cama, J. Corbetta, F. Disperati, A. Dutcher, J. Grosen, P. Grosen, A. Machiry, C. Salls, N. Stephens, R. Wang, and G. Vigna, “Mechanical Phish: Resilient Autonomous Hacking,”IEEE Security & Privacy, vol. 16, no. 2, pp. 12–22, Mar. 2018. [Online]. Available: https://ieeexplore. ieee.org/abstract/document/8328966

work page arXiv 2018
[46]

N. Waisman. (2025, 06) The road to top 1: How xbow did it. [Online]. Available: https: //xbow.com/blog/top-1-how-xbow-did-it

2025
[47]

Angora: Efficient Fuzzing by Principled Search

P. Chen and H. Chen, “2018 ieee symposium on security and privacy (SP),” pp. 711–725, May 2018, arXiv: 1803.01307

work page internal anchor Pith review Pith/arXiv arXiv 2018
[48]

american fuzzy lop,

M. Zalewski, “american fuzzy lop,” Nov. 2013. [Online]. Available: https://lcamtuf.coredump.cx/ afl/

2013
[49]

Magma: A Ground-Truth Fuzzing Benchmark,

A. Hazimeh, A. Herrera, and M. Payer, “Magma: A Ground-Truth Fuzzing Benchmark,”Proc. ACM Meas. Anal. Comput. Syst., vol. 4, no. 3, pp. 49:1–49:29, Nov. 2020. [Online]. Available: https://doi.org/10.1145/3428334

work page doi:10.1145/3428334 2020
[50]

AFL++ : Combining Incremental Steps of Fuzzing Research,

A. Fioraldi, D. Maier, H. Eißfeldt, and M. Heuse, “AFL++ : Combining Incremental Steps of Fuzzing Research,” 2020. [Online]. Available: https://www. usenix.org/conference/woot20/presentation/fioraldi

2020
[51]

ELFuzz: Efficient input generation via LLM-driven synthesis over fuzzer space

C. Chen, B. Dolan-Gavitt, and Z. Lin, “ELFuzz: Efficient input generation via LLM-driven synthesis over fuzzer space.” in34th USENIX Security Symposium (USENIX Security 25), Jul. 2025, pp. 6279–6298. [Online]. Available: http://arxiv.org/ abs/2506.10323

work page arXiv 2025
[52]

L. Wired. (2025, 06) ghidramcp. [Online]. Avail- able: https://github.com/LaurieWired/GhidraMCP

2025
[53]

H. C. Yuceel. (April 09, 2024) The MITRE ATT&CK T1027 obfuscated files or information technique. [Online]. Available: https: //www.picussecurity.com/resource/the-mitre-attck- t1027-obfuscated-files-or-information-technique

2024
[54]

Transformers for End-to-End InfoSec Tasks: A Feasibility Study,

E. M. Rudd, M. S. Rahman, and P. Tully, “Transformers for End-to-End InfoSec Tasks: A Feasibility Study,” inProceedings of the 1st Workshop on Robust Malware Analysis. New York, NY , USA: Association for Computing Machinery, 2022, pp. 21–31, series Title: WoRMA ’22. [Online]. Available: https://doi.org/10.1145/ 3494110.3528242

work page arXiv 2022
[55]

Malware Detection by Eating a Whole EXE

E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. Nicholas, “Malware Detection by Eating a Whole EXE,” inAAAI Workshop on Artificial Intelligence for Cyber Security, Oct. 2018, arXiv: 1710.09435. [Online]. Available: http://arxiv.org/abs/1710.09435

work page internal anchor Pith review Pith/arXiv arXiv 2018
[56]

Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection,

E. Raff, W. Fleshman, R. Zak, H. S. Anderson, B. Filar, and M. McLean, “Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection,” inThe Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021, arXiv: 2012.09390. [Online]. Available: http://arxiv.org/ abs/2012.09390

work page arXiv 2021
[57]

Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection,

M. M. Alam, E. Raff, S. R. Biderman, T. Oates, and J. Holt, “Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection,” inProceedings of The 27th International Conference on Artificial Intelligence and Statistics. PMLR, Apr. 2024, pp. 4042–4050. [Online]. Available: https://proceedings.mlr.press/ v238/mahmudul-alam24a.html

2024
[58]

Recasting Self-Attention with Holographic Reduced Representations,

M. M. Alam, E. Raff, S. Biderman, T. Oates, and J. Holt, “Recasting Self-Attention with Holographic Reduced Representations,” inProceedings of the 40th International Conference on Machine Learning. PMLR, Jul. 2023, pp. 490–507. [Online]. Available: https://proceedings.mlr.press/ v202/alam23a.html

2023
[59]

Linformer: Self-Attention with Linear Complexity

S. Wang, B. Li, M. Khabsa, H. Fang, and H. Ma, “Linformer: Self-Attention with Linear Complexity,” vol. 2048, no. 2019, 2020, arXiv: 2006.04768. [Online]. Available: http://arxiv.org/ abs/2006.04768

work page internal anchor Pith review Pith/arXiv arXiv 2048
[60]

Rethinking Attention with Performers

K. Choromanski, V . Likhosherstov, D. Dohan, X. Song, A. Gane, T. Sarlos, P. Hawkins, J. Davis, A. Mohiuddin, L. Kaiser, D. Belanger, L. Colwell, and A. Weller, “Rethinking Attention with Performers,” pp. 1–38, 2020, arXiv: 2009.14794. [Online]. Available: http://arxiv.org/abs/2009.14794

work page internal anchor Pith review Pith/arXiv arXiv 2020
[61]

J. Hurwitz, C. Nicholas, and E. Raff, “Large Language Models and Normalized Compression Distance: Better Compression Yet Worse Accuracy,” in ECAI 2025. IOS Press, 2025, pp. 4273–4280. [Online]. Available: https://ebooks.iospress.nl/doi/10.3233/FAIA251322
[62]

P. Kunwar, K. Aryal, M. Gupta, M. Abdelsalam, and E. Bertino, “SoK: Leveraging transformers for malware analysis,” IEEE Transactions on Dependable and Secure Computing, 2025
[63]

Y. Li and Y. Li, “IoT malware threat hunting method based on improved transformer,” International Journal of Network Security, vol. 25, no. 2, pp. 267–276, 2023
[64]

F. Pi, S. Tian, X. Pei, P. Chen, X. Wang, and X. Wang, “Adatrans: An adaptive transformer for IoT malware detection based on sensitive API call graph and inter-component communication analysis,” Journal of Intelligent & Fuzzy Systems, vol. 45, no. 6, pp. 11 439–11 452, 2023
[65]

F. Störtz. (2025, 03) Byte back: Next-generation malware classification using binary transformers. [Online]. Available: https://www.crowdstrike.com/en-us/blog/byte-back-next-gen-malware-classification/
[66]

M. Horton, S. Mehta, A. Farhadi, and M. Rastegari, “Bytes are all you need: Transformers operating directly on file bytes,” ArXiv, vol. abs/2306.00238, 2023
[67]

A. Pagnoni, R. Pasunuru, P. Rodriguez, J. Nguyen, B. Muller, M. Li, C. Zhou, L. Yu, J. Weston, L. Zettlemoyer et al., “Byte latent transformer: Patches scale better than tokens,” arXiv preprint arXiv:2412.09871, 2024
[68]

S. Qin, F. Yang, H. Wang, B. Zhang, Z. Gao, C. Zhang, and K. Chen, “Tady: A Neural Disassembler without Structural Constraint Violations,” in 34th USENIX Security Symposium (USENIX Security 25). arXiv, Jun. 2025, pp. 451–468, arXiv:2506.13323 [cs]. [Online]. Available: http://arxiv.org/abs/2506.13323
[69]

K. Pei, J. Guan, D. Williams-King, J. Yang, and S. Jana, “XDA: Accurate, Robust Disassembly with Transfer Learning,” in Proceedings 2021 Network and Distributed System Security Symposium. Virtual: Internet Society, 2021. [Online]. Available: https://www.ndss-symposium.org/wp-content/uploads/ndss2021_1B-3_23112_paper.pdf
[70]

S. Yu, Y. Qu, X. Hu, and H. Yin, “DeepDi: Learning a Relational Graph Convolutional Network Model on Instructions for Fast and Accurate Disassembly,” 2022, pp. 2709– [Online]. Available: https://www.usenix.org/conference/usenixsecurity22/presentation/yu-sheng
[72]

A. Flores-Montoya, J. Lim, A. Seitz, A. Sood, E. Raff, and J. Holt, “Disassembly as Weighted Interval Scheduling with Learned Weights,” in 2025 IEEE Symposium on Security and Privacy (SP), May 2025, pp. 3033–3050, ISSN: 2375- [Online]. Available: https://ieeexplore.ieee.org/document/11023516
[74]

R. J. Joyce, E. Raff, and C. Nicholas, “A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels,” in Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec ’21). Association for Computing Machinery, 2021, arXiv: 2109.11126v1
[75]

R. J. Joyce, D. Everett, M. Fuchs, E. Raff, and J. Holt, “ClarAVy: A Tool for Scalable and Accurate Malware Family Labeling,” in Companion Proceedings of the ACM on Web Conference 2025, ser. WWW ’25. New York, NY, USA: Association for Computing Machinery, May 2025, pp. 277–286. [Online]. Available: https://dl.acm.org/doi/10.1145/3701716.3715212
[76]

J. Oliver, C. Cheng, and Y. Chen, “TLSH – A Locality Sensitive Hash,” in 2013 Fourth Cybercrime and Trustworthy Computing Workshop. IEEE, Nov. 2013, pp. 7–13. [Online]. Available: http://ieeexplore.ieee.org/document/6754635/
[77]

J. Gonzalez, “If at first you don’t succeed, trie, trie again: Correcting TLSH scalability claims for large-dataset malware forensics,” Forensic Science International: Digital Investigation, vol. 53, p. 301922, Jul. 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666281725000617
[78]

J. Vanegue, S. Heelan, and R. Rolles, “SMT solvers for software security,” in Proceedings of the 6th USENIX conference on Offensive Technologies, ser. WOOT’12. USA: USENIX Association, Aug. 2012, p. 9. [Online]. Available: https://www.usenix.org/system/files/conference/woot12/woot12-final26.pdf
[79]

Y. Li, A. Albarghouthi, Z. Kincaid, A. Gurfinkel, and M. Chechik, “Symbolic optimization with SMT solvers,” SIGPLAN Not., vol. 49, no. 1, pp. 607–618, Jan. 2014. [Online]. Available: https://dl.acm.org/doi/10.1145/2578855.2535857
[80]

S. Gupta, A. Saxena, A. Mahajan, and S. Bansal, “Effective Use of SMT Solvers for Program Equivalence Checking Through Invariant-Sketching and Query-Decomposition,” in Theory and Applications of Satisfiability Testing – SAT 2018, O. Beyersdorff and C. M. Wintersteiger, Eds. Cham: Springer International Publishing, 2018, pp. 365–382
