Cybersecurity is the True Frontier for Generative AI Success or Failure
Pith reviewed 2026-06-30 09:43 UTC · model grok-4.3
The pith
Cybersecurity is a better test-case for general AI progress than natural language or computer vision due to its greater complexity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Cybersecurity is a real-life test-bed for many machine learning problems at once, especially when considering modern strides in using Large Language Models to automate processes as agents. Workflows require orchestrating hundreds of standard and bespoke tools through various formats, and a single malware sample can be viewed as a sequence of billions of tokens. The cost of labeling is enormous, in part because adversaries are attempting to subvert detection methods, and even experts may disagree on labels. Deployed models must run quickly on billions of items a day, where low latency is critical, in a continuously changing environment. Explainability is required for analysts facing large numbers of false positives. The amount of complexity in cybersecurity is greater than that of natural language and computer vision.
What carries the argument
The direct comparison showing cybersecurity's combined requirements for tool orchestration, data scale, adversarial labeling, latency, explainability, and environmental change exceed those of natural language and computer vision.
If this is right
- AI systems succeeding in cybersecurity would need to handle simultaneous demands of scale, adversaries, and explainability.
- Measuring progress toward general AI through cybersecurity tasks would capture handling of ambiguous labels and dynamic conditions at once.
- Deployed models would require low-latency operation on billions of items daily while providing reasoning for decisions.
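To make the scale in the last point concrete, here is a rough back-of-the-envelope sketch. The paper gives no figure more precise than "billions of items a day," so the volume of one billion items and the worker counts below are illustrative assumptions, and the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def required_throughput(items_per_day: float) -> float:
    """Average items per second needed to keep up with the daily volume."""
    return items_per_day / SECONDS_PER_DAY


def per_item_budget_ms(items_per_day: float, parallel_workers: int) -> float:
    """Average time (ms) one worker may spend per item.

    Assumes perfect load balancing and a uniform arrival rate; real
    traffic is bursty, so the true budget would be tighter.
    """
    return parallel_workers * 1000.0 / required_throughput(items_per_day)


if __name__ == "__main__":
    volume = 1e9  # assumed lower bound for "billions of items a day"
    print(f"required throughput: {required_throughput(volume):,.0f} items/s")
    for workers in (1, 100, 10_000):  # hypothetical fleet sizes
        budget = per_item_budget_ms(volume, workers)
        print(f"{workers:>6} workers -> {budget:.4f} ms per item")
```

Under these assumptions a single worker would have well under a millisecond per item, which is why the low-latency requirement and the explainability requirement pull against each other.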
Where Pith is reading between the lines
- If the claim holds, current AI advances in language or vision may translate less reliably to high-stakes adversarial settings.
- Generative AI agents would face particular tests in orchestrating multiple tools under real constraints not present in simpler tasks.
- One could develop cross-domain metrics to check whether performance in cybersecurity predicts broader capabilities.
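One simple form such a cross-domain check could take is a rank correlation between per-system scores on a cybersecurity benchmark and on a benchmark from another domain. The sketch below is only an illustration: neither the paper nor this review proposes a specific metric, Spearman correlation is a choice made here, and the score lists are invented placeholders, not measurements.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder scores for five hypothetical systems (NOT real results).
cyber_scores = [0.42, 0.55, 0.61, 0.70, 0.83]
general_scores = [0.50, 0.48, 0.66, 0.72, 0.90]

if __name__ == "__main__":
    print(f"Spearman rho: {spearman(cyber_scores, general_scores):.3f}")
```

A high correlation across many systems would be consistent with cybersecurity performance tracking broader capability; a low one would support the objection that the two are separable. A handful of systems, as here, would not be enough to decide either way.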
Load-bearing premise
That having more overlapping difficult requirements automatically makes cybersecurity a superior measure of general AI ability.
What would settle it
Demonstration that AI systems achieve strong results on cybersecurity tasks without corresponding gains in generality on standard benchmarks from other domains, or fail in cybersecurity despite broad success elsewhere.
Original abstract
Cybersecurity is a real-life test-bed for many machine learning problems at once, especially when considering modern strides in using Large Language Models (LLMs) to automate processes as "agents." Cybersecurity workflows require orchestrating hundreds of standard and bespoke tools through various formats. The scale of cybersecurity data is enormous; for example, a single malware sample can be viewed as a sequence of billions of tokens. The cost of labeling any file by experts is enormous and labor-intensive, in part because an adversary (possibly a well-funded nation state actor) is attempting to subvert your detection methods. Even skilled experts may disagree on the correct label, creating ambiguity in what constitutes ground truth. When deployed, models must run quickly on billions of items a day, where low-latency is critical for operational success, in a continuously changing environment. In addition, explainability is not optional: analysts demand clear reasoning for model decisions to cope with the large number of false-positive alerts they face daily, and to quickly develop remediation and understand how something went wrong. In short, the amount of complexity cybersecurity is greater than that of natural language and computer vision, and thus we posit that cybersecurity is the better test-case for general AI progress than other, well-studied fields.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that cybersecurity workflows involve greater complexity than NLP or CV—including orchestrating hundreds of tools, billion-token sequences, adversarial labeling with expert disagreement, low-latency requirements on billions of items daily, and mandatory explainability—and therefore posits that cybersecurity is the superior test-case for general AI progress.
Significance. If the unargued premise that higher listed complexity necessarily indicates a better test-bed for generality were established with metrics or comparative analysis, the claim could redirect AI research priorities toward cybersecurity as a forcing function for more capable systems. As written, the manuscript supplies only a qualitative list without definitions, data, or derivations.
major comments (1)
- [Abstract] Abstract (final sentence): the inference that 'the amount of complexity cybersecurity is greater than that of natural language and computer vision, and thus we posit that cybersecurity is the better test-case for general AI progress' rests on the axiom that greater domain complexity correlates with superior suitability as a test for generality; no definition of 'general AI progress,' no quantitative complexity measures across domains, and no argument linking the listed properties to generality are provided.
minor comments (1)
- [Abstract] Abstract: the phrasing 'the amount of complexity cybersecurity is greater' contains a grammatical omission that reduces readability.
Simulated Author's Rebuttal
We thank the referee for the review. We address the major comment on the abstract below, noting that the manuscript is a position paper.
Point-by-point responses
Referee: [Abstract] Abstract (final sentence): the inference that 'the amount of complexity cybersecurity is greater than that of natural language and computer vision, and thus we posit that cybersecurity is the better test-case for general AI progress' rests on the axiom that greater domain complexity correlates with superior suitability as a test for generality; no definition of 'general AI progress,' no quantitative complexity measures across domains, and no argument linking the listed properties to generality are provided.
Authors: The manuscript is a position paper that qualitatively contrasts operational demands rather than claiming empirical proof. The listed properties are linked to generality because they must be met simultaneously: multi-tool orchestration, long-context reasoning over billion-token inputs, robustness to adversarial label subversion and expert disagreement, sub-second inference at massive scale, and mandatory human-interpretable outputs. Current NLP and CV benchmarks test these capabilities in isolation rather than in combination under live adversarial conditions. 'General AI progress' is used to mean advancement toward systems that integrate these capabilities without domain-specific retraining. No quantitative cross-domain metrics are supplied because the paper's purpose is to highlight these distinctions to redirect attention, not to derive a formal complexity ranking. We do not intend revisions, as adding such analysis would change the paper's scope.
Revision: no
Circularity Check
No circularity: central claim is direct assertion without equations, fits, or self-citation reductions
Full rationale
The paper asserts that cybersecurity's enumerated properties (tool orchestration, billion-token sequences, adversarial labeling, low latency, explainability) exceed those of NLP/CV and therefore make it the superior test-bed for general AI progress. This inference is presented as a posit without any derivation chain, equations, parameter fitting, or citations. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear. The text supplies no quantitative complexity metric or formal definition of 'general AI progress,' but the absence of any reductive step means the claim does not collapse to its inputs by construction. This is a standard non-circular position paper.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Cybersecurity involves greater complexity than natural language processing and computer vision
- ad hoc to paper: A domain with higher complexity is a better test-case for general AI progress