pith.

arxiv: 2606.28929 · v1 · pith:4OBOREZW · submitted 2026-06-27 · 💻 cs.CR · cs.LG

Cybersecurity is the True Frontier for Generative AI Success or Failure

Pith reviewed 2026-06-30 09:43 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords: cybersecurity · generative AI · machine learning · test cases · adversarial settings · AI benchmarks · LLM agents

The pith

Cybersecurity is a better test-case for general AI progress than natural language or computer vision due to its greater complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that cybersecurity workflows involve orchestrating hundreds of tools, processing data at enormous scale (a single sample can span billions of tokens), bearing high labeling costs driven up by adversaries, resolving expert disagreement over ground truth, and making fast, low-latency decisions that come with explanations, all in a changing environment. Together, these demands create more complexity than natural language processing or computer vision present. A sympathetic reader would care because this would position cybersecurity as a stronger indicator of progress toward general AI capabilities.
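To make the "billions of tokens per sample" figure concrete, the sketch below assumes a byte-level view in which every byte of a file is one token; that tokenization is an assumption for illustration, not something the summary specifies. The file sizes are hypothetical, and `byte_tokens` and `attention_pairs` are illustrative helpers, not functions from the paper.

```python
# Minimal sketch: sequence length of a file under a byte-level view, where
# each byte is one token (an assumption; no tokenizer is fixed by the paper).
# File sizes are hypothetical examples, not measurements from the paper.

MIB = 1024 ** 2
GIB = 1024 ** 3


def byte_tokens(size_bytes: int) -> int:
    """Tokens needed if every byte of the file is one token."""
    return size_bytes


def attention_pairs(n_tokens: int) -> int:
    """Query-key scores computed by full self-attention over one sequence."""
    return n_tokens * n_tokens


for label, size in [("1 MiB executable", MIB), ("2 GiB installer", 2 * GIB)]:
    n = byte_tokens(size)
    print(f"{label}: {n:,} tokens, {attention_pairs(n):.2e} attention scores")
```

Under this assumption a 2 GiB file is about 2.1 billion tokens, and full self-attention over it would compute roughly 4.6 × 10^18 pairwise scores, which illustrates why naive quadratic attention does not scale to whole binaries.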

Core claim

Cybersecurity is a real-life test-bed for many machine learning problems at once, especially given modern strides in using Large Language Models as agents to automate processes. Workflows require orchestrating hundreds of standard and bespoke tools across many formats, and a single malware sample can be viewed as a sequence of billions of tokens. Labeling is enormously expensive because adversaries actively try to subvert detection methods, and even experts may disagree on the correct label. Models must run on billions of items a day, where low latency is critical, and must explain their decisions to analysts who face many false positives in a continuously changing environment. The amount of complexity in cybersecurity is therefore greater than that of natural language and computer vision, and the paper posits that cybersecurity is the better test-case for general AI progress.
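One common way to quantify the expert disagreement described above is Cohen's kappa, which corrects the raw agreement rate between two annotators for agreement expected by chance. The paper does not use this measure; it is swapped in here for illustration, and the analyst labels below are hypothetical.

```python
from collections import Counter
import math


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two analysts gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each analyst's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return math.nan  # both used the same single class; kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Hypothetical verdicts from two analysts on the same eight files.
analyst_a = ["malicious", "malicious", "benign", "benign",
             "malicious", "benign", "malicious", "benign"]
analyst_b = ["malicious", "benign", "benign", "benign",
             "malicious", "malicious", "malicious", "benign"]

print(cohens_kappa(analyst_a, analyst_b))
```

In this made-up example the analysts agree on 6 of 8 files (75%), but chance agreement is 50%, so kappa is only 0.5; raw agreement alone overstates how settled the ground truth is.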

What carries the argument

The direct comparison asserting that cybersecurity's combined requirements for tool orchestration, data scale, adversarial labeling, latency, explainability, and environmental change exceed those of natural language processing and computer vision.

If this is right

  • AI systems succeeding in cybersecurity would need to handle simultaneous demands of scale, adversaries, and explainability.
  • Measuring progress toward general AI through cybersecurity tasks would capture handling of ambiguous labels and dynamic conditions at once.
  • Deployed models would require low-latency operation on billions of items daily while providing reasoning for decisions.
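The latency requirement in the last point can be made concrete with simple arithmetic. The sketch below assumes a hypothetical deployment of one billion items a day spread evenly over 1,000 parallel workers; both numbers, and the helper names, are illustrative assumptions rather than figures from the paper.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def items_per_second(items_per_day: float) -> float:
    """Average arrival rate, assuming load is spread evenly over the day."""
    return items_per_day / SECONDS_PER_DAY


def per_item_budget_ms(items_per_day: float, workers: int) -> float:
    """Milliseconds each worker may spend per item to keep up with the average rate."""
    return 1000.0 * workers / items_per_second(items_per_day)


# Hypothetical deployment: one billion items a day over 1,000 workers.
rate = items_per_second(1e9)
budget = per_item_budget_ms(1e9, workers=1000)
print(f"{rate:,.0f} items/s on average; {budget:.1f} ms per item per worker")
```

Even under these generous assumptions the system must absorb about 11,600 items per second, leaving each worker under 90 ms per item; real traffic is bursty, so budgets at peak load would be tighter still.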

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the claim holds, current AI advances in language or vision may translate less reliably to high-stakes adversarial settings.
  • Generative AI agents would face particular tests in orchestrating multiple tools under real constraints not present in simpler tasks.
  • One could develop cross-domain metrics to check whether performance in cybersecurity predicts broader capabilities.

Load-bearing premise

That having more overlapping difficult requirements automatically makes cybersecurity a superior measure of general AI ability.

What would settle it

Demonstration that AI systems achieve strong results on cybersecurity tasks without corresponding gains in generality on standard benchmarks from other domains, or fail in cybersecurity despite broad success elsewhere.

Original abstract

Cybersecurity is a real-life test-bed for many machine learning problems at once, especially when considering modern strides in using Large Language Models (LLMs) to automate processes as "agents." Cybersecurity workflows require orchestrating hundreds of standard and bespoke tools through various formats. The scale of cybersecurity data is enormous; for example, a single malware sample can be viewed as a sequence of billions of tokens. The cost of labeling any file by experts is enormous and labor-intensive, in part because an adversary (possibly a well-funded nation state actor) is attempting to subvert your detection methods. Even skilled experts may disagree on the correct label, creating ambiguity in what constitutes ground truth. When deployed, models must run quickly on billions of items a day, where low-latency is critical for operational success, in a continuously changing environment. In addition, explainability is not optional: analysts demand clear reasoning for model decisions to cope with the large number of false-positive alerts they face daily, and to quickly develop remediation and understand how something went wrong. In short, the amount of complexity cybersecurity is greater than that of natural language and computer vision, and thus we posit that cybersecurity is the better test-case for general AI progress than other, well-studied fields.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript claims that cybersecurity workflows involve greater complexity than NLP or CV—including orchestrating hundreds of tools, billion-token sequences, adversarial labeling with expert disagreement, low-latency requirements on billions of items daily, and mandatory explainability—and therefore posits that cybersecurity is the superior test-case for general AI progress.

Significance. If the unargued premise that higher listed complexity necessarily indicates a better test-bed for generality were established with metrics or comparative analysis, the claim could redirect AI research priorities toward cybersecurity as a forcing function for more capable systems. As written, the manuscript supplies only a qualitative list without definitions, data, or derivations.

major comments (1)
  1. [Abstract] Abstract (final sentence): the inference that 'the amount of complexity cybersecurity is greater than that of natural language and computer vision, and thus we posit that cybersecurity is the better test-case for general AI progress' rests on the axiom that greater domain complexity correlates with superior suitability as a test for generality; no definition of 'general AI progress,' no quantitative complexity measures across domains, and no argument linking the listed properties to generality are provided.
minor comments (1)
  1. [Abstract] Abstract: the phrasing 'the amount of complexity cybersecurity is greater' contains a grammatical omission that reduces readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the review. We address the major comment on the abstract below, noting that the manuscript is a position paper.

Point-by-point responses
  1. Referee: [Abstract] Abstract (final sentence): the inference that 'the amount of complexity cybersecurity is greater than that of natural language and computer vision, and thus we posit that cybersecurity is the better test-case for general AI progress' rests on the axiom that greater domain complexity correlates with superior suitability as a test for generality; no definition of 'general AI progress,' no quantitative complexity measures across domains, and no argument linking the listed properties to generality are provided.

    Authors: The manuscript is a position paper that qualitatively contrasts operational demands rather than claiming empirical proof. The listed properties are linked to generality because they simultaneously require multi-tool orchestration, long-context reasoning over billion-token inputs, robustness to adversarial label subversion and expert disagreement, sub-second inference at massive scale, and mandatory human-interpretable outputs—capabilities that current NLP and CV benchmarks test in isolation rather than in combination under live adversarial conditions. 'General AI progress' is used to mean advancement toward systems that integrate these capabilities without domain-specific retraining. No quantitative cross-domain metrics are supplied because the paper's purpose is to highlight these distinctions to redirect attention, not to derive a formal complexity ranking. We do not intend revisions, as adding such analysis would change the paper's scope. revision: no

Circularity Check

0 steps flagged

No circularity: the central claim is a direct assertion without equations, fits, or self-citation reductions.

Full rationale

The paper asserts that cybersecurity's enumerated properties (tool orchestration, billion-token sequences, adversarial labeling, low latency, explainability) exceed those of NLP/CV and therefore make it the superior test-bed for general AI progress. This inference is presented as a posit without any derivation chain, equations, parameter fitting, or citations. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear. The text supplies no quantitative complexity metric or formal definition of 'general AI progress,' but the absence of any reductive step means the claim does not collapse to its inputs by construction. This is a standard non-circular position paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on two unproven assumptions, one about the domain and one ad hoc to the paper, that link complexity to suitability as a test of general AI; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Cybersecurity involves greater complexity than natural language processing and computer vision.
    This premise is stated directly in the abstract as the basis for the main claim.
  • ad hoc to paper: A domain with higher complexity is a better test-case for general AI progress.
    This links the listed challenges to the conclusion about AI testing without further justification.

pith-pipeline@v0.9.1-grok · 5763 in / 1168 out tokens · 41200 ms · 2026-06-30T09:43:41.323397+00:00 · methodology


Reference graph

Works this paper leans on

108 extracted references · 47 canonical work pages · 4 internal anchors

  1. [1]

    (2024, 09) Employers must act as cybersecurity workforce growth stalls and skills gaps widen employers must act as cybersecurity workforce growth stalls and skills gaps widen

    International Information System Security Certification Consortium. (2024, 09) Employers must act as cybersecurity workforce growth stalls and skills gaps widen employers must act as cybersecurity workforce growth stalls and skills gaps widen. [Online]. Available: https://www.isc2.org/Insights/2024/09/Employers- Must-Act-Cybersecurity-Workforce-Growth- St...

  2. [2]

    Sok: The impact of unlabelled data in cyberthreat detec- tion,

    G. Apruzzese, P. Laskov, and A. Tastemirova, “Sok: The impact of unlabelled data in cyberthreat detec- tion,” in2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). IEEE, 2022, pp. 20–42

  3. [3]

    Ma- chine learning (in) security: A stream of problems,

    F. Ceschin, M. Botacin, A. Bifet, B. Pfahringer, L. S. Oliveira, H. M. Gomes, and A. Gr ´egio, “Ma- chine learning (in) security: A stream of problems,” Digital Threats: Research and Practice, vol. 5, no. 1, pp. 1–32, 2024

  4. [4]

    Transforming cybersecurity with agen- tic ai to combat emerging cyber threats,

    N. Kshetri, “Transforming cybersecurity with agen- tic ai to combat emerging cyber threats,”Telecom- munications Policy, p. 102976, 2025

  5. [5]

    Frontier ai’s impact on the cybersecurity landscape,

    W. Guo, Y . Potter, T. Shi, Z. Wang, A. Zhang, and D. Song, “Frontier ai’s impact on the cybersecurity landscape,” 2025. [Online]. Available: https://arxiv. org/abs/2504.05408

  6. [6]

    Ai and cybersecurity: a risk society perspective,

    S.-N. Vulpe, R. Rughinis , , D. T , urcanu, and D. Rosner, “Ai and cybersecurity: a risk society perspective,”Frontiers in Computer Science, vol. 6, Oct. 2024. [Online]. Available: http://dx.doi.org/ 10.3389/fcomp.2024.1462250

  7. [7]

    2026 crowdstrike global threat report: Ai accelerates adversaries and reshapes the attack surface,

    CrowdStrike, “2026 crowdstrike global threat report: Ai accelerates adversaries and reshapes the attack surface,” CrowdStrike, Tech. Rep., 02 2026. [Online]. Available: https://www.crowdstrike.com/ en-us/global-threat-report/

  8. [8]

    A scalable implementation of malware detection based on network connection behaviors,

    L. Shi, J. Que, Z. Zhong, B. Meyer, P. Crenshaw, and Y . He, “A scalable implementation of malware detection based on network connection behaviors,” inProceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowl- edge Discovery (CyberC), Oct 2013, pp. 59–66

  9. [9]

    Learning and classification of malware behavior,

    K. Rieck, T. Holz, C. Willems, P. D ¨ussel, and P. Laskov, “Learning and classification of malware behavior,” inDetection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2008, pp. 108–125

  10. [10]

    Large-scale malware classification using random projections and neural networks,

    G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” inAcoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 3422–3426

  11. [11]

    Panorama: capturing system-wide infor- mation flow for malware detection and analysis,

    H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, “Panorama: capturing system-wide infor- mation flow for malware detection and analysis,” inProceedings of the 14th ACM conference on Computer and communications security. ACM, 2007, pp. 116–127

  12. [12]

    Effective and efficient malware detection at the end host

    C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X.-y. Zhou, and X. Wang, “Effective and efficient malware detection at the end host.” in USENIX security symposium, 2009, pp. 351–366

  13. [13]

    Mcboost: Boosting scalability in malware collection and analysis using statistical classification of executa- bles,

    R. Perdisci, A. Lanzi, and W. Lee, “Mcboost: Boosting scalability in malware collection and analysis using statistical classification of executa- bles,” inComputer Security Applications Confer- ence, 2008. ACSAC 2008. Annual. IEEE, 2008, pp. 301–310

  14. [14]

    Learning to detect and classify malicious executables in the wild,

    J. Z. Kolter and M. A. Maloof, “Learning to detect and classify malicious executables in the wild,”The Journal of Machine Learning Research, vol. 7, pp. 2721–2744, 2006

  15. [15]

    MetaAware: Identi- fying metamorphic malware,

    Q. Zhang and D. S. Reeves, “MetaAware: Identi- fying metamorphic malware,” inComputer Secu- rity Applications Conference, 2007. ACSAC 2007. Twenty-Third Annual. IEEE, 2007, pp. 411–420

  16. [16]

    Deep neural network based malware detection using two dimensional bi- nary program features,

    J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional bi- nary program features,” in2015 10th International Conference on Malicious and Unwanted Software (MALWARE), 2015, pp. 11–20

  17. [17]

    Unveiling Zeus: Automated Classification of Malware Samples,

    A. Mohaisen and O. Alrawi, “Unveiling Zeus: Automated Classification of Malware Samples,” in Proceedings of the 22Nd International Conference on World Wide Web. New York, NY , USA: ACM, 2013, pp. 829–832, series Title: WWW ’13 Companion. [Online]. Available: http://doi. acm.org/10.1145/2487788.2488056

  18. [18]

    An Observational Investiga- tion of Reverse Engineers’ Processes,

    D. V otipka, S. M. Rabin, K. Micinski, J. S. Foster, and M. M. Mazurek, “An Observational Investiga- tion of Reverse Engineers’ Processes,” inUSENIX Security Symposium, 2019

  19. [19]

    Time is money: Considerations for measuring the radiological reading time,

    R. Sexauer and C. Bestler, “Time is money: Considerations for measuring the radiological reading time,”Journal of Imaging, vol. 8, no. 8, p. 208, Jul. 2022. [Online]. Available: http://dx.doi.org/10.3390/jimaging8080208

  20. [20]

    Ra- diologists’ variation of time to read across different procedure types,

    D. Forsberg, B. Rosipko, and J. L. Sunshine, “Ra- diologists’ variation of time to read across different procedure types,”J. Digit. Imaging, vol. 30, no. 1, pp. 86–94, Feb. 2017

  21. [21]

    {RE-Mind}: a First Look Inside the Mind of a Reverse Engineer,

    A. Mantovani, S. Aonzo, Y . Fratantonio, and D. Balzarotti, “{RE-Mind}: a First Look Inside the Mind of a Reverse Engineer,” 2022, pp. 2727–2745. [Online]. Available: https://www.usenix.org/conference/ usenixsecurity22/presentation/mantovani

  22. [22]

    Superset Decompilation,

    C. Liu, Y . Sun, T. Gilray, and K. Micin- ski, “Superset Decompilation,” Mar. 2026, arXiv:2603.28002 [cs]. [Online]. Available: http://arxiv.org/abs/2603.28002

  23. [23]

    Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation,

    S. Mohseni, S. Mohammadi, D. Tilwani, Y . Saxena, G. K. Ndawula, S. Vema, E. Raff, and M. Gaur, “Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 23, pp. 24 893–24 901, Apr. 2025, number: 23. [Online]. Available: https://ojs.a...

  24. [24]

    Code Obfuscation against Symbolic Execution Attacks,

    S. Banescu, C. Collberg, V . Ganesh, Z. Newsham, and A. Pretschner, “Code Obfuscation against Symbolic Execution Attacks,” inProceedings of the 32nd Annual Conference on Computer Security Applications. New York, NY , USA: Association for Computing Machinery, 2016, pp. 189–200, series Title: ACSAC ’16. [Online]. Available: https://doi.org/10.1145/2991079.2991114

  25. [25]

    Obfuscation of executable code to improve resistance to static disassembly,

    C. Linn and S. Debray, “Obfuscation of executable code to improve resistance to static disassembly,” inProceedings of the 10th ACM conference on Computer and communication security - CCS ’03. New York, New York, USA: ACM Press, 2003, p. 290. [Online]. Available: http: //portal.acm.org/citation.cfm?doid=948109.948149

  26. [26]

    Assemblage: Automatic Binary Dataset Construction for Machine Learning,

    C. Liu, R. Saul, Y . Sun, E. Raff, M. Fuchs, T. Southard Pantano, J. Holt, and K. Micinski, “Assemblage: Automatic Binary Dataset Construction for Machine Learning,”Advances in Neural Information Processing Systems, vol. 37, pp. 58 698–58 715, Dec. 2024. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 2024/hash/6bbefc73a187dd42e0dc0...

  27. [27]

    Is Function Similarity Over-Engineered? Building a Benchmark,

    R. Saul, C. Liu, N. Fleischmann, R. Zak, K. Micinski, E. Raff, and J. Holt, “Is Function Similarity Over-Engineered? Building a Benchmark,”Advances in Neural Information Processing Systems, vol. 37, pp. 21 636– 21 655, Dec. 2024. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 2024/hash/2663c994c84a79b338bca613fe1ae223- Abstract-Dat...

  28. [28]

    SynCode: LLM Generation with Grammar Augmentation,

    S. Ugare, T. Suresh, H. Kang, S. Misailovic, and G. Singh, “SynCode: LLM Generation with Grammar Augmentation,”Transactions on Machine Learning Research, Nov. 2024. [Online]. Available: https://openreview.net/forum?id=HiUZtgAPoH

  29. [29]

    UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8,

    P. Firestone, S. Ugare, G. Singh, and S. Mis- ailovic, “UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8,” Aug. 2025. [Online]. Available: https:// openreview.net/forum?id=8ExXncFpf6#discussion

  30. [30]

    Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning,

    S. Geng, M. Josifoski, M. Peyrard, and R. West, “Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning,” inProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali, Eds. Singapore: Association for Computational Linguistics, Dec. 2023, pp. 10 932– 10 952. [Online]. Available...

  31. [31]

    PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models,

    T. Scholak, N. Schucher, and D. Bah- danau, “PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models,” inProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, M.-F. Moens, X. Huang, L. Specia, and S. W.-t. Yih, Eds. Online and Punta Cana, Dominican Republic: Association for Computationa...

  32. [32]

    Learning the PE Header, Malware Detection with Minimal Domain Knowledge,

    E. Raff, J. Sylvester, and C. Nicholas, “Learning the PE Header, Malware Detection with Minimal Domain Knowledge,” inProceedings of the 10th ACM Workshop on Artificial Intelligence and Security. New York, NY , USA: ACM, 2017, pp. 121–132, series Title: AISec ’17. [Online]. Avail- able: http://doi.acm.org/10.1145/3128572.3140442

  33. [33]

    A Qualitative Evaluation of Reverse Engineering Tool Usability,

    J. Mattei, M. McLaughlin, S. Katcher, and D. V otipka, “A Qualitative Evaluation of Reverse Engineering Tool Usability,” inProceedings of the 38th Annual Computer Security Applications Conference, ser. ACSAC ’22. New York, NY , USA: Association for Computing Machinery, Dec. 2022, pp. 619–631. [Online]. Available: https://dl.acm.org/doi/10.1145/3564625.3567993

  34. [34]

    Decomperson: How Humans Decompile and What We Can Learn From It,

    K. Burk, F. Pagani, C. Kruegel, and G. Vigna, “Decomperson: How Humans Decompile and What We Can Learn From It,” 2022, pp. 2765–

  35. [35]

    Available: https://www.usenix.org/ conference/usenixsecurity22/presentation/burk

    [Online]. Available: https://www.usenix.org/ conference/usenixsecurity22/presentation/burk

  36. [36]

    ”I’m trying to learn. . . and I’m shooting myself in the foot

    J. Mattei, C. Pellegrini, M. Soto, M. S. Bohuk, and D. V otipka, “”I’m trying to learn. . . and I’m shooting myself in the foot”: Beginners’ Struggles When Solving Binary Exploitation Exercises,” 2025, pp. 2867–

  37. [37]

    Available: https://www.usenix.org/ conference/usenixsecurity25/presentation/mattei

    [Online]. Available: https://www.usenix.org/ conference/usenixsecurity25/presentation/mattei

  38. [38]

    E. Raff, D. Farris, and S. Biderman,How Large Language Models Work. Shelter Island: Manning, 2025

  39. [39]

    Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making,

    H. Suresh, N. Lao, and I. Liccardi, “Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making,” in Proceedings of the 12th ACM Conference on Web Science, ser. WebSci ’20. New York, NY , USA: Association for Computing Machinery, Jul. 2020, pp. 315–324. [Online]. Available: https://dl.acm.org/doi/10.1145/3394231.3397922

  40. [40]

    Using LLMs as a reverse engineering sidekick,

    G. Venere, “Using LLMs as a reverse engineering sidekick,” Jul. 2025. [Online]. Available: https://blog.talosintelligence.com/using- llm-as-a-reverse-engineering-sidekick/

  41. [41]

    Quintero

    B. Quintero. (2024, 04) From assistant to analyst: The power of gemini 1.5 pro for malware analysis. [Online]. Avail- able: https://cloud.google.com/blog/topics/threat- intelligence/gemini-for-malware-analysis

  42. [42]

    Automatic Y ARA Rule Generation Using Biclustering,

    E. Raff, R. Zak, G. L. Munoz, W. Fleming, H. S. Anderson, B. Filar, C. Nicholas, and J. Holt, “Automatic Y ARA Rule Generation Using Biclustering,” in13th ACM Workshop on Artificial Intelligence and Security (AISec’20), 2020, arXiv: 2009.03779. [Online]. Available: http://arxiv.org/abs/2009.03779

  43. [43]

    Xandra: An Autonomous Cyber Battle System for the Cyber Grand Challenge,

    A. Nguyen-Tuong, D. Melski, J. W. Davidson, M. Co, W. Hawkins, J. D. Hiser, D. Morris, D. Nguyen, and E. Rizzi, “Xandra: An Autonomous Cyber Battle System for the Cyber Grand Challenge,”IEEE Security & Privacy, vol. 16, no. 2, pp. 42–51, Mar. 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8328984

  44. [44]

    Rise of the HaCRS: Augmenting Autonomous Cyber Reasoning Systems with Human Assistance,

    Y . Shoshitaishvili, M. Weissbacher, L. Dresel, C. Salls, R. Wang, C. Kruegel, and G. Vigna, “Rise of the HaCRS: Augmenting Autonomous Cyber Reasoning Systems with Human Assistance,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’17. New York, NY , USA: Association for Computing Machinery, Oct. 2017, pp...

  45. [45]

    Mechanical Phish: Resilient Autonomous Hacking,

    Y . Shoshitaishvili, A. Bianchi, K. Borgolte, A. Cama, J. Corbetta, F. Disperati, A. Dutcher, J. Grosen, P. Grosen, A. Machiry, C. Salls, N. Stephens, R. Wang, and G. Vigna, “Mechanical Phish: Resilient Autonomous Hacking,”IEEE Security & Privacy, vol. 16, no. 2, pp. 12–22, Mar. 2018. [Online]. Available: https://ieeexplore. ieee.org/abstract/document/8328966

  46. [46]

    N. Waisman. (2025, 06) The road to top 1: How xbow did it. [Online]. Available: https: //xbow.com/blog/top-1-how-xbow-did-it

  47. [47]

    Angora: Efficient Fuzzing by Principled Search

    P. Chen and H. Chen, “2018 ieee symposium on security and privacy (SP),” pp. 711–725, May 2018, arXiv: 1803.01307

  48. [48]

    american fuzzy lop,

    M. Zalewski, “american fuzzy lop,” Nov. 2013. [Online]. Available: https://lcamtuf.coredump.cx/ afl/

  49. [49]

    Magma: A Ground-Truth Fuzzing Benchmark,

    A. Hazimeh, A. Herrera, and M. Payer, “Magma: A Ground-Truth Fuzzing Benchmark,”Proc. ACM Meas. Anal. Comput. Syst., vol. 4, no. 3, pp. 49:1–49:29, Nov. 2020. [Online]. Available: https://doi.org/10.1145/3428334

  50. [50]

    AFL++ : Combining Incremental Steps of Fuzzing Research,

    A. Fioraldi, D. Maier, H. Eißfeldt, and M. Heuse, “AFL++ : Combining Incremental Steps of Fuzzing Research,” 2020. [Online]. Available: https://www. usenix.org/conference/woot20/presentation/fioraldi

  51. [51]

    ELFuzz: Efficient input generation via LLM-driven synthesis over fuzzer space

    C. Chen, B. Dolan-Gavitt, and Z. Lin, “ELFuzz: Efficient input generation via LLM-driven synthesis over fuzzer space.” in34th USENIX Security Symposium (USENIX Security 25), Jul. 2025, pp. 6279–6298. [Online]. Available: http://arxiv.org/ abs/2506.10323

  52. [52]

    L. Wired. (2025, 06) ghidramcp. [Online]. Avail- able: https://github.com/LaurieWired/GhidraMCP

  53. [53]

    H. C. Yuceel. (April 09, 2024) The MITRE ATT&CK T1027 obfuscated files or information technique. [Online]. Available: https: //www.picussecurity.com/resource/the-mitre-attck- t1027-obfuscated-files-or-information-technique

  54. [54]

    Transformers for End-to-End InfoSec Tasks: A Feasibility Study,

    E. M. Rudd, M. S. Rahman, and P. Tully, “Transformers for End-to-End InfoSec Tasks: A Feasibility Study,” inProceedings of the 1st Workshop on Robust Malware Analysis. New York, NY , USA: Association for Computing Machinery, 2022, pp. 21–31, series Title: WoRMA ’22. [Online]. Available: https://doi.org/10.1145/ 3494110.3528242

  55. [55]

    Malware Detection by Eating a Whole EXE

    E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. Nicholas, “Malware Detection by Eating a Whole EXE,” inAAAI Workshop on Artificial Intelligence for Cyber Security, Oct. 2018, arXiv: 1710.09435. [Online]. Available: http://arxiv.org/abs/1710.09435

  56. [56]

    Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection,

    E. Raff, W. Fleshman, R. Zak, H. S. Anderson, B. Filar, and M. McLean, “Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection,” inThe Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021, arXiv: 2012.09390. [Online]. Available: http://arxiv.org/ abs/2012.09390

  57. [57]

    Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection,

    M. M. Alam, E. Raff, S. R. Biderman, T. Oates, and J. Holt, “Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection,” inProceedings of The 27th International Conference on Artificial Intelligence and Statistics. PMLR, Apr. 2024, pp. 4042–4050. [Online]. Available: https://proceedings.mlr.press/ v238/mahmudul-alam24a.html

  58. [58]

    Recasting Self-Attention with Holographic Reduced Representations,

    M. M. Alam, E. Raff, S. Biderman, T. Oates, and J. Holt, “Recasting Self-Attention with Holographic Reduced Representations,” inProceedings of the 40th International Conference on Machine Learning. PMLR, Jul. 2023, pp. 490–507. [Online]. Available: https://proceedings.mlr.press/ v202/alam23a.html

  59. [59]

    Linformer: Self-Attention with Linear Complexity

    S. Wang, B. Li, M. Khabsa, H. Fang, and H. Ma, “Linformer: Self-Attention with Linear Complexity,” vol. 2048, no. 2019, 2020, arXiv: 2006.04768. [Online]. Available: http://arxiv.org/ abs/2006.04768

  60. [60]

    Rethinking Attention with Performers

    K. Choromanski, V . Likhosherstov, D. Dohan, X. Song, A. Gane, T. Sarlos, P. Hawkins, J. Davis, A. Mohiuddin, L. Kaiser, D. Belanger, L. Colwell, and A. Weller, “Rethinking Attention with Performers,” pp. 1–38, 2020, arXiv: 2009.14794. [Online]. Available: http://arxiv.org/abs/2009.14794

  61. [61]

    Large Language Models and Normalized Compression Distance: Better Compression Yet Worse Accuracy,

    J. Hurwitz, C. Nicholas, and E. Raff, “Large Language Models and Normalized Compression Distance: Better Compression Yet Worse Accuracy,” inECAI 2025. IOS Press, 2025, pp. 4273–4280. [Online]. Available: https://ebooks.iospress.nl/doi/10.3233/FAIA251322

  62. [62]

    Sok: Leveraging transformers for malware analysis,

    P. Kunwar, K. Aryal, M. Gupta, M. Abdelsalam, and E. Bertino, “Sok: Leveraging transformers for malware analysis,”IEEE Transactions on Depend- able and Secure Computing, 2025

  63. [63]

    Iot malware threat hunting method based on improved transformer,

    Y . Li and Y . Li, “Iot malware threat hunting method based on improved transformer,”Interna- tional Journal of Network Security, vol. 25, no. 2, pp. 267–276, 2023

  64. [64]

    Adatrans: An adaptive transformer for iot malware detection based on sensitive api call graph and inter-component communication analy- sis,

    F. Pi, S. Tian, X. Pei, P. Chen, X. Wang, and X. Wang, “Adatrans: An adaptive transformer for iot malware detection based on sensitive api call graph and inter-component communication analy- sis,”Journal of Intelligent & Fuzzy Systems, vol. 45, no. 6, pp. 11 439–11 452, 2023

  65. [65]

    St ¨ortz

    F. St ¨ortz. (2025, 03) Byte back: Next-generation malware classification us- ing binary transformers. [Online]. Avail- able: https://www.crowdstrike.com/en-us/blog/ byte-back-next-gen-malware-classification/

  66. [66]

    Bytes are all you need: Transformers operating directly on file bytes,

    M. Horton, S. Mehta, A. Farhadi, and M. Rastegari, “Bytes are all you need: Transformers operating directly on file bytes,”ArXiv, vol. abs/2306.00238, 2023

  67. [67]

    arXiv preprint arXiv:2412.09871 , year=

    A. Pagnoni, R. Pasunuru, P. Rodriguez, J. Nguyen, B. Muller, M. Li, C. Zhou, L. Yu, J. Weston, L. Zettlemoyeret al., “Byte latent transformer: Patches scale better than tokens,”arXiv preprint arXiv:2412.09871, 2024

  68. [68]

    Tady: A Neural Disassembler without Structural Constraint Violations,

    S. Qin, F. Yang, H. Wang, B. Zhang, Z. Gao, C. Zhang, and K. Chen, “Tady: A Neural Disassembler without Structural Constraint Violations,” in 34th USENIX Security Symposium (USENIX Security 25). arXiv, Jun. 2025, pp. 451–468, arXiv:2506.13323 [cs]. [Online]. Available: http://arxiv.org/abs/2506.13323

  69. [69]

    XDA: Accurate, Robust Disassembly with Transfer Learning,

    K. Pei, J. Guan, D. Williams-King, J. Yang, and S. Jana, “XDA: Accurate, Robust Disassembly with Transfer Learning,” in Proceedings 2021 Network and Distributed System Security Symposium. Virtual: Internet Society, 2021. [Online]. Available: https://www.ndss-symposium.org/wp-content/uploads/ndss2021_1B-3_23112_paper.pdf

  70. [70]

    DeepDi: Learning a Relational Graph Convolutional Network Model on Instructions for Fast and Accurate Disassembly,

    S. Yu, Y. Qu, X. Hu, and H. Yin, “DeepDi: Learning a Relational Graph Convolutional Network Model on Instructions for Fast and Accurate Disassembly,” 2022, pp. 2709–. [Online]. Available: https://www.usenix.org/conference/usenixsecurity22/presentation/yu-sheng

  72. [72]

    Disassembly as Weighted Interval Scheduling with Learned Weights,

    A. Flores-Montoya, J. Lim, A. Seitz, A. Sood, E. Raff, and J. Holt, “Disassembly as Weighted Interval Scheduling with Learned Weights,” in 2025 IEEE Symposium on Security and Privacy (SP), May 2025, pp. 3033–3050, ISSN: 2375-. [Online]. Available: https://ieeexplore.ieee.org/document/11023516

  74. [74]

    A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels,

    R. J. Joyce, E. Raff, and C. Nicholas, “A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels,” in Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec ’21). Association for Computing Machinery, 2021, arXiv: 2109.11126v1

  75. [75]

    ClarAVy: A Tool for Scalable and Accurate Malware Family Labeling,

    R. J. Joyce, D. Everett, M. Fuchs, E. Raff, and J. Holt, “ClarAVy: A Tool for Scalable and Accurate Malware Family Labeling,” in Companion Proceedings of the ACM on Web Conference 2025, ser. WWW ’25. New York, NY, USA: Association for Computing Machinery, May 2025, pp. 277–286. [Online]. Available: https://dl.acm.org/doi/10.1145/3701716.3715212

  76. [76]

    TLSH – A Locality Sensitive Hash,

    J. Oliver, C. Cheng, and Y. Chen, “TLSH – A Locality Sensitive Hash,” in 2013 Fourth Cybercrime and Trustworthy Computing Workshop. IEEE, Nov. 2013, pp. 7–13. [Online]. Available: http://ieeexplore.ieee.org/document/6754635/

  77. [77]

    If at first you don’t succeed, trie, trie again: Correcting TLSH scalability claims for large-dataset malware forensics,

    J. Gonzalez, “If at first you don’t succeed, trie, trie again: Correcting TLSH scalability claims for large-dataset malware forensics,” Forensic Science International: Digital Investigation, vol. 53, p. 301922, Jul. 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666281725000617

  78. [78]

    SMT solvers for software security,

    J. Vanegue, S. Heelan, and R. Rolles, “SMT solvers for software security,” in Proceedings of the 6th USENIX conference on Offensive Technologies, ser. WOOT’12. USA: USENIX Association, Aug. 2012, p. 9. [Online]. Available: https://www.usenix.org/system/files/conference/woot12/woot12-final26.pdf

  79. [79]

    Symbolic optimization with SMT solvers,

    Y. Li, A. Albarghouthi, Z. Kincaid, A. Gurfinkel, and M. Chechik, “Symbolic optimization with SMT solvers,” SIGPLAN Not., vol. 49, no. 1, pp. 607–618, Jan. 2014. [Online]. Available: https://dl.acm.org/doi/10.1145/2578855.2535857

  80. [80]

    Effective Use of SMT Solvers for Program Equivalence Checking Through Invariant-Sketching and Query-Decomposition,

    S. Gupta, A. Saxena, A. Mahajan, and S. Bansal, “Effective Use of SMT Solvers for Program Equivalence Checking Through Invariant-Sketching and Query-Decomposition,” in Theory and Applications of Satisfiability Testing – SAT 2018, O. Beyersdorff and C. M. Wintersteiger, Eds. Cham: Springer International Publishing, 2018, pp. 365–382
