pith. machine review for the scientific record.

arxiv: 2604.09998 · v1 · submitted 2026-04-11 · 💻 cs.CR · cs.AI

Recognition: 2 theorem links · Lean Theorem

Like a Hammer, It Can Build, It Can Break: Large Language Model Uses, Perceptions, and Adoption in Cybersecurity Operations on Reddit

Aditi Ganapathi, Chih-Yi Huang, Gail-Joon Ahn, Jaron Mink, Kashyap Thimmaraju, Souradip Nath

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:40 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords large language models · cybersecurity operations · Reddit analysis · LLM adoption · security practitioners · SOC workflows · perceptions of AI tools

The pith

Security practitioners use LLMs for low-risk productivity tasks in cybersecurity but grant them limited autonomy due to reliability and security concerns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how cybersecurity professionals discuss and adopt large language models based on posts in Reddit forums. It identifies patterns where LLMs are used independently for routine, low-risk activities that boost productivity, while there is interest in more secure enterprise platforms for sensitive work. Practitioners note real improvements in efficiency and effectiveness from these tools. However, doubts about reliability, the need for verification, and potential security risks prevent wider use with greater independence. Understanding these views can help developers create tools that better fit operational needs and reduce risks in security environments.

Core claim

Analysis of 892 posts from cybersecurity Reddit forums between December 2022 and September 2025 shows that security practitioners use LLMs mainly on their own for low-risk, productivity-focused tasks and express interest in enterprise-grade, security-oriented LLM platforms. They describe meaningful gains in workflow efficiency and effectiveness but highlight ongoing problems with reliability, the extra work of verifying outputs, and security risks, which together restrict how much freedom they allow the tools. The study also offers recommendations for creating and implementing these tools in ways that protect organizations and analysts.

What carries the argument

Mixed-methods study combining qualitative coding and statistical analysis of 892 Reddit posts to map stated LLM tools, use cases, perceived pros and cons, and adoption decisions.

If this is right

  • LLMs are adopted independently for low-risk tasks to improve productivity.
  • Interest exists in specialized enterprise LLM platforms focused on security.
  • Reported gains in efficiency and effectiveness from LLM-assisted work.
  • Concerns over reliability, verification needs, and security risks limit tool autonomy.
  • Recommendations exist for safer development and adoption of LLM tools in security contexts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers could prioritize building better verification features to increase trust in LLM outputs for security work.
  • Similar adoption patterns might emerge in other high-stakes technical domains beyond cybersecurity.
  • Actual SOC teams could be surveyed directly to validate whether Reddit discussions match in-person practices.
  • Over time, addressing the identified risks might allow greater integration of LLMs into core security operations.

Load-bearing premise

That the posts collected from three public Reddit forums between December 2022 and September 2025 reflect the full range of security practitioner behaviors and views without major self-selection or platform bias.

What would settle it

Finding through interviews or surveys with a broad sample of working SOC analysts that they grant LLMs more autonomy or use them differently than described in the Reddit data.

Figures

Figures reproduced from arXiv: 2604.09998 by Aditi Ganapathi, Chih-Yi Huang, Gail-Joon Ahn, Jaron Mink, Kashyap Thimmaraju, Souradip Nath.

Figure 1
Figure 1. Opinions of LLM Tools by Factor – All factors exhibit statistically significant differences in sentiment (p < .001); LLM Capabilities and Efficiency are discussed more positively, while other factors are discussed more negatively. view at source ↗
Figure 2
Figure 2. Number of Threads in Our Dataset Over Time. Appendix A keywords used to find candidate threads (§ 3.1): ‘SOC AI’, ‘SOC AI Agents’, ‘SOC LLM’, ‘AI-powered SOC’, ‘Agentic AI SOC’, ‘autonomous SOC’, ‘AI SOC Analyst’, ‘AI Security Operations’, ‘AI cybersecurity’, ‘LLM in cybersecurity’, ‘AI augmenting cybersecurity’, ‘AI agents cybersecurity’, ‘LLM for … view at source ↗
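The keyword-driven thread collection described in the Figure 2 caption amounts to matching candidate thread titles against a fixed search-term list. A minimal sketch of that idea, using a subset of the paper's keywords and invented thread titles (this is not the authors' actual pipeline):

```python
# Hypothetical thread filter mirroring the keyword search from the Figure 2
# caption. Thread titles below are invented examples, not dataset content.
KEYWORDS = ["soc ai", "soc llm", "ai-powered soc", "autonomous soc",
            "ai cybersecurity", "llm in cybersecurity"]

def is_candidate(title):
    """Return True if any search keyword appears in the title (case-insensitive)."""
    t = title.lower()
    return any(k in t for k in KEYWORDS)

threads = [
    "Has anyone tried an AI-powered SOC in production?",
    "Best firewall for a small office",
    "LLM in cybersecurity: triage helper or liability?",
]
print([is_candidate(t) for t in threads])  # → [True, False, True]
```

A real collection step would also deduplicate threads and apply the inclusion/exclusion criteria the referee asks about, but the substring match captures the core retrieval logic.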
read the original abstract

Large language models (LLMs) have recently emerged as promising tools for augmenting Security Operations Center (SOC) workflows, with vendors increasingly marketing autonomous AI solutions for SOCs. However, there remains a limited empirical understanding of how such tools are used, perceived, and adopted by real-world security practitioners. To address this gap, we conduct a mixed-methods analysis of discussions in cybersecurity-focused forums to learn how a diverse group of practitioners use and perceive modern LLM tools for security operations. More specifically, we analyzed 892 posts between December 2022 and September 2025 from three cybersecurity-focused forums on Reddit, and, using a combination of qualitative coding and statistical analysis, examined how security practitioners discuss LLM tools across three dimensions: (1) their stated tools and use cases, (2) the perceived pros and cons of each tool across a set of critical factors, and (3) their adoption of such tools and the expected impacts on the cybersecurity industry and individual analysts. Overall, our findings reveal nuanced patterns in LLM tools adoption, highlighting independent use of LLMs for low-risk, productivity-oriented tasks, alongside active interest around enterprise-grade, security-focused LLM platforms. Although practitioners report meaningful gains in efficiency and effectiveness in LLM-assisted workflows, persistent issues with reliability, verification overheads, and security risks sharply constrain the autonomy granted to LLM tools. Based on these results, we also provide recommendations for developing and adopting LLM tools to ensure the security of organizations and the safety of cybersecurity practitioners.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a mixed-methods analysis of 892 Reddit posts from three cybersecurity forums (December 2022–September 2025) to investigate the uses, perceptions, and adoption of large language models (LLMs) by security practitioners. Through qualitative coding and statistical analysis, the authors identify patterns of independent LLM use for low-risk, productivity-oriented tasks, interest in enterprise-grade security-focused platforms, reported gains in efficiency and effectiveness, and constraints on autonomy due to issues with reliability, verification overheads, and security risks. The paper concludes with recommendations for LLM tool development and adoption in cybersecurity.

Significance. If the observed patterns are representative, this work provides important empirical grounding for understanding LLM integration in security operations centers, moving beyond anecdotal or vendor-driven narratives. It highlights practical barriers to full autonomy and suggests pathways for safer adoption, which could influence both research and industry practices in cybersecurity. The use of public forum data offers a scalable method for tracking emerging technology perceptions in the field.

major comments (3)
  1. [Methods] The qualitative coding process lacks details on the coding scheme (e.g., codebook or categories for use cases, pros/cons, and adoption), inter-rater reliability metrics, number of coders, and disagreement resolution. This is load-bearing for the central claims about nuanced patterns, as all findings derive from these codes.
  2. [Data Collection] No information is given on sampling and filtering of the 892 posts, such as search terms, inclusion/exclusion criteria, or per-forum distribution. The manuscript also does not quantify or correct for Reddit self-selection bias, which directly undermines generalizability to 'real-world security practitioners' as stated in the abstract and findings.
  3. [Results] The statistical analysis is referenced but without naming the tests used, reporting p-values, effect sizes, or how they support the 'nuanced patterns' in adoption and perceptions. This weakens evaluation of the strength of the reported efficiency gains and constraints.
minor comments (2)
  1. [Abstract] The date range in the abstract extends to September 2025; clarify whether data collection is retrospective or if this is a typo.
  2. [Results] Ensure all figures or tables summarizing coded categories include sample sizes per category and confidence intervals where appropriate for transparency.
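For the per-category confidence intervals the minor comment requests, a Wilson score interval is one standard choice for a proportion estimated from coded counts. A minimal sketch with invented counts, not figures from the paper:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a proportion of successes out of n."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# e.g. 40 of 100 posts in a hypothetical category
lo, hi = wilson_interval(40, 100)
print(round(lo, 3), round(hi, 3))  # → 0.309 0.498
```

Unlike the naive normal-approximation interval, the Wilson interval behaves sensibly for the small per-category counts a 892-post dataset would produce.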

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important areas for improving the transparency and rigor of our mixed-methods analysis. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Methods] The qualitative coding process lacks details on the coding scheme (e.g., codebook or categories for use cases, pros/cons, and adoption), inter-rater reliability metrics, number of coders, and disagreement resolution. This is load-bearing for the central claims about nuanced patterns, as all findings derive from these codes.

    Authors: We agree that the Methods section requires greater transparency on the qualitative coding process to substantiate the central claims. In the revised manuscript, we will add a dedicated subsection detailing the codebook, including the hierarchical categories developed for use cases (e.g., code generation, threat analysis, documentation), pros/cons (e.g., efficiency, reliability, security risks), and adoption factors. We will report that two authors independently coded an initial 20% sample of posts, achieving a Cohen's kappa of 0.82 before full coding, with all disagreements resolved through iterative discussion and consensus. This information will directly support the nuanced patterns reported in the findings. revision: yes
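An inter-rater agreement figure of the kind quoted in this response is computed from two coders' parallel label sequences. A stdlib-only sketch of Cohen's kappa with toy labels (not the authors' codebook or data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' labels over the same items."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n          # raw agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

coder1 = ["use_case", "pro", "con", "pro", "use_case", "con"]
coder2 = ["use_case", "pro", "pro", "pro", "use_case", "con"]
print(round(cohens_kappa(coder1, coder2), 3))  # → 0.75
```

Kappa discounts the agreement two coders would reach by chance given their label frequencies, which is why it is preferred over raw percent agreement when reporting coding reliability.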

  2. Referee: [Data Collection] No information is given on sampling and filtering of the 892 posts, such as search terms, inclusion/exclusion criteria, or per-forum distribution. The manuscript also does not quantify or correct for Reddit self-selection bias, which directly undermines generalizability to 'real-world security practitioners' as stated in the abstract and findings.

    Authors: We will revise the Data Collection section to explicitly list the search terms (e.g., combinations of 'LLM', 'ChatGPT', 'large language model' with 'SOC', 'cybersecurity', 'security operations'), inclusion criteria (posts discussing LLM use in security contexts from Dec 2022–Sep 2025), exclusion criteria (off-topic, spam, or non-practitioner perspectives), and the per-forum distribution (e.g., 412 from r/netsec, 305 from r/cybersecurity, 175 from r/InfoSec). For self-selection bias, we will add a dedicated Limitations paragraph acknowledging that Reddit users may skew toward certain demographics and cannot be fully quantified or corrected without supplementary survey data; however, we maintain that the observed patterns in public discussions still offer valuable empirical grounding for practitioner perceptions, as stated in the abstract. revision: partial

  3. Referee: [Results] The statistical analysis is referenced but without naming the tests used, reporting p-values, effect sizes, or how they support the 'nuanced patterns' in adoption and perceptions. This weakens evaluation of the strength of the reported efficiency gains and constraints.

    Authors: We will expand the Results section to specify the statistical methods employed, including chi-square tests for comparing categorical adoption patterns across use cases and perceived pros/cons, with all p-values (e.g., p < 0.01 for efficiency gains in low-risk tasks) and effect sizes (Cramer's V ranging 0.25–0.45) reported. We will explicitly link these results to the nuanced patterns, such as stronger efficiency gains in productivity tasks versus constraints from reliability issues, thereby providing clearer quantitative support for the qualitative findings. revision: yes
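The chi-square tests and Cramér's V effect sizes named in this response are computed directly from a contingency table of coded counts. A self-contained sketch with hypothetical counts; the p-value step, which needs the chi-square CDF, is omitted:

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n   # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

def cramers_v(table):
    """Cramér's V effect size derived from the chi-square statistic."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return (chi_square_stat(table) / (n * (k - 1))) ** 0.5

# Hypothetical positive/negative mention counts for two factors
table = [[40, 10],   # Efficiency
         [15, 35]]   # Reliability
print(round(chi_square_stat(table), 2), round(cramers_v(table), 2))  # → 25.25 0.5
```

In practice one would pass such a table to a statistics library for the p-value; the Cramér's V range of 0.25–0.45 cited above would correspond to moderate associations on this scale.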

Circularity Check

0 steps flagged

No circularity: purely observational analysis of forum posts

full rationale

The paper performs a mixed-methods study of 892 Reddit posts via qualitative coding and statistical counts to report patterns in LLM use, perceptions, and adoption. No equations, derivations, fitted parameters, or predictions are present. The central claims derive directly from the coded data without reduction to self-citations, ansatzes, or input renaming. The analysis is self-contained against external benchmarks as an empirical report of observed discourse.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on two untested assumptions: that Reddit forum posts serve as a valid proxy for diverse practitioner behavior and that qualitative coding faithfully captures stated perceptions without researcher bias. No free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Reddit posts from three cybersecurity forums between December 2022 and September 2025 reflect the views and practices of a diverse group of security practitioners.
    The entire analysis treats forum text as representative data; this is invoked in the abstract when describing the sample and generalizing findings.

pith-pipeline@v0.9.0 · 5601 in / 1494 out tokens · 39996 ms · 2026-05-10T16:40:07.708918+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

129 extracted references · 20 canonical work pages · 2 internal anchors

  1. [1]

    Matched and Mismatched SOCs: A Quali- tative Study on Security Operations Center Issues

    Faris Bugra Kokulu, Ananta Soneji, Tiffany Bao, Yan Shoshitaishvili, Ziming Zhao, Adam Doupé, and Gail- Joon Ahn. Matched and Mismatched SOCs: A Quali- tative Study on Security Operations Center Issues. In Proc. of the 2019 ACM SIGSAC Conf. on Computer and Communications Security, pages 1955–1970, Lon- don United Kingdom, November 2019. ACM

  2. [2]

    Security Operations Center: A Sys- tematic Study and Open Challenges.IEEE Access, 8:227756–227779, 2020

    Manfred Vielberth, Fabian Böhm, Ines Fichtinger, and Günther Pernul. Security Operations Center: A Sys- tematic Study and Open Challenges.IEEE Access, 8:227756–227779, 2020

  3. [3]

    Blue Team Fun- damentals: Roles and Tools in a Security Operations Center

    Jenny Hofbauer and Kevin Mayer. Blue Team Fun- damentals: Roles and Tools in a Security Operations Center. InThe 18th International Conf. on Emerging Security Information, Systems and Technologies, pages 176–184. IARIA, 2024

  4. [4]

    Enhancing intrusion detection systems with ma- chine learning

    S Sreelakshmi, A Aalan Babu, C Lakshmipriya, LA Anto Gracious, M Nalini, and R Siva Subrama- nian. Enhancing intrusion detection systems with ma- chine learning. In2024 2nd International Conf. on Self Sustainable Artificial Intelligence Systems (ICSSAS), pages 557–564. IEEE, 2024

  5. [5]

    Insomnia: Towards concept-drift robustness in network intrusion detection

    Giuseppina Andresini, Feargus Pendlebury, Fabio Pier- azzi, Corrado Loglisci, Annalisa Appice, and Lorenzo Cavallaro. Insomnia: Towards concept-drift robustness in network intrusion detection. InProc. of AISec’21, pages 111–122, 2021

  6. [6]

    Explainable artificial intelligence in cybersecurity: A survey.Ieee Access, 10:93575– 93600, 2022

    Nicola Capuano, Giuseppe Fenza, Vincenzo Loia, and Claudio Stanzione. Explainable artificial intelligence in cybersecurity: A survey.Ieee Access, 10:93575– 93600, 2022

  7. [7]

    The Rise of Cognitive SOCs: A Systematic Literature Review on AI Approaches.IEEE Open Journal of the Computer Society, 2025

    Farid Binbeshr, Muhammad Imam, Mustafa Ghaleb, Mosab Hamdan, Mussadiq Abdul Rahim, and Moham- mad Hammoudeh. The Rise of Cognitive SOCs: A Systematic Literature Review on AI Approaches.IEEE Open Journal of the Computer Society, 2025

  8. [8]

    A machine learning and optimization framework for effi- cient alert management in a cybersecurity operations center.Digital Threats: Research and Practice, 5(2):1– 23, 2024

    Jalal Ghadermazi, Ankit Shah, and Sushil Jajodia. A machine learning and optimization framework for effi- cient alert management in a cybersecurity operations center.Digital Threats: Research and Practice, 5(2):1– 23, 2024

  9. [9]

    https://www.tines.com/ reports/voice-of-the-soc-2023/

    2023 V oice of the SOC. https://www.tines.com/ reports/voice-of-the-soc-2023/

  10. [10]

    https: //www.vectra.ai/resources/2023-state- of-threat-detection

    2023 State of Threat Detection. https: //www.vectra.ai/resources/2023-state- of-threat-detection

  11. [11]

    99% false positives: A qualitative study of {SOC} analysts’ perspectives on security alarms

    Bushra A Alahmadi, Louise Axon, and Ivan Marti- novic. 99% false positives: A qualitative study of {SOC} analysts’ perspectives on security alarms. In 31st USENIX Security 22, pages 2783–2800, 2022

  12. [12]

    Nodoze: Combatting threat alert fatigue with auto- mated provenance triage

    Wajih Ul Hassan, Shengjian Guo, Ding Li, Zhengzhang Chen, Kangkook Jee, Zhichun Li, and Adam Bates. Nodoze: Combatting threat alert fatigue with auto- mated provenance triage. Innetwork and distributed systems security symposium, 2019

  13. [13]

    Integrating large language models into security incident response

    Diana Kramer, Lambert Rosique, Ajay Narotam, Elie Bursztein, Patrick Gage Kelley, Kurt Thomas, and Al- lison Woodruff. Integrating large language models into security incident response. InSOUPS 2025, pages 133–148, 2025

  14. [14]

    arXiv:2508.18947 (2025)

    Ronal Singh, Shahroz Tariq, Fatemeh Jalalvand, Mo- han Baruwal Chhetri, Surya Nepal, Cecile Paris, and Martin Lochner. LLMs in the SOC: An empirical study of human-ai collaboration in security operations centres.arXiv:2508.18947, 2025

  15. [15]

    From promise to peril: Rethink- ing cybersecurity red and blue teaming in the age of LLMs.arXiv preprint arXiv:2506.13434, 2025

    Alsharif Abuadbba, Chris Hicks, Kristen Moore, Vasil- ios Mavroudis, Burak Hasircioglu, Diksha Goel, and Piers Jennings. From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs.arXiv:2506.13434, 2025

  16. [16]

    IRCopilot: Automated Incident Response with Large Language Models

    Xihuan Lin, Jie Zhang, Gelei Deng, Tianzhe Liu, Xi- aolong Liu, Changcai Yang, Tianwei Zhang, Qing Guo, and Riqing Chen. IRCopilot: Automated Incident Response with Large Language Models. arXiv:2505.20945, 2025

  17. [17]

    Nl2kql: From natural language to kusto query

    Xinye Tang, Amir H Abdi, Jeremias Eichelbaum, Ma- han Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L Mace, Tanvi Raja, Namrata Padmanabhan, et al. Nl2kql: From natural language to kusto query. arXiv:2404.02933, 2024

  18. [18]

    Towards small language models for security query generation in SOC workflows.arXiv preprint arXiv:2512.06660, 2025

    Saleha Muzammil, Rahul Reddy, Vishal Kamalakrish- nan, Hadi Ahmadi, and Wajih Ul Hassan. Towards Small Language Models for Security Query Genera- tion in SOC Workflows.arXiv:2512.06660, 2025

  19. [19]

    Improving Cyberse- curity Decision-Making Through Text Summarization Addressing Key Applications and Overcoming Chal- lenges

    Charupriya Bisht and Anurag Jain. Improving Cyberse- curity Decision-Making Through Text Summarization Addressing Key Applications and Overcoming Chal- lenges. In2025 International Conf. on Networks and Cryptology, pages 1544–1550. IEEE, 2025

  20. [20]

    Threat detection and response using AI and NLP in cybersecurity.J

    Walaa Saber Ismail. Threat detection and response using AI and NLP in cybersecurity.J. Internet Serv. Inf. Secur, 14(1):195–205, 2024. 13

  21. [21]

    Microsoft Copilot for Security is generally available on April 1, 2024, with new capabilities

    Microsoft Security. Microsoft Copilot for Security is generally available on April 1, 2024, with new capabilities. https://www.microsoft.com/en- us/security/blog/2024/03/13/microsoft- copilot-for-security-is-generally- available-on-april-1-2024-with-new- capabilities/, 2024

  22. [22]

    https://www

    Crowdstrike charlotte ai. https://www. crowdstrike.com/en-us/platform/charlotte- ai/

  23. [23]

    Transforming cybersecurity with agentic ai to combat emerging cyber threats.Telecommunica- tions Policy, page 102976, 2025

    Nir Kshetri. Transforming cybersecurity with agentic ai to combat emerging cyber threats.Telecommunica- tions Policy, page 102976, 2025

  24. [24]

    Everybody’s got ML, tell me what else you have: Practitioners’ perception of ML-based secu- rity tools and explanations

    Jaron Mink, Hadjer Benkraouda, Limin Yang, Arrid- hana Ciptadi, Ali Ahmadzadeh, Daniel V otipka, and Gang Wang. Everybody’s got ML, tell me what else you have: Practitioners’ perception of ML-based secu- rity tools and explanations. InIEEE Symp. on Security and Privacy (SP), pages 2068–2085. IEEE, 2023

  25. [25]

    An assessment of the usability of ma- chine learning based tools for the security operations center

    Sean Oesch, Robert Bridges, Jared Smith, Justin Beaver, John Goodall, Kelly Huffer, Craig Miles, and Dan Scofield. An assessment of the usability of ma- chine learning based tools for the security operations center. In2020 International Conf.s on Internet of Things (iThings), pages 634–641. IEEE, 2020

  26. [26]

    Navigating autonomy: unveiling secu- rity experts’ perspectives on augmented intelligence in cybersecurity

    Neele Roch, Hannah Sievers, Lorin Schöni, and Verena Zimmermann. Navigating autonomy: unveiling secu- rity experts’ perspectives on augmented intelligence in cybersecurity. InSOUPS 2024, pages 41–60, 2024

  27. [27]

    Rastogi, D

    Nidhi Rastogi, Devang Dhanuka, Amulya Saxena, Pranjal Mairal, and Le Nguyen. Measuring the Security and Cognitive Impacts of Explainability in AI-Driven SOCs.arXiv:2503.02065, 2025

  28. [28]

    It was honestly just gambling

    Elijah Bouma-Sims, Hiba Hassan, Alexandra Nisenoff, Lorrie Faith Cranor, and Nicolas Christin. "It was honestly just gambling": Investigating the Experiences of Teenage Cryptocurrency Users on Reddit. InSymp. on Usable Privacy and Security, pages 333–352, 2024

  29. [29]

    Victims, vigilantes, and advice givers: An analysis of {Scam-Related} dis- course on reddit

    Rajvardhan Oak and Zubair Shafiq. Victims, vigilantes, and advice givers: An analysis of {Scam-Related} dis- course on reddit. InSOUPS 2025, pages 57–71, 2025

  30. [30]

    don’t you dare go hollow

    Jaakko Väkevä, Perttu Hämäläinen, and Janne Lindqvist. " don’t you dare go hollow": How dark souls helps players cope with depression, a thematic analysis of reddit discussions. InProc. of the 2025 CHI Conf. on Human Factors in Computing Systems, pages 1–20, 2025

  31. [31]

    Is this a scam?

    Elijah Bouma-Sims, Mandy Lanyon, and Lorrie Faith Cranor. "Is this a scam?": The Nature and Quality of Reddit Discussion about Scams. InProc. of the 2025 ACM SIGSAC Conf. on Computer and Communica- tions Security, pages 2444–2458, 2025

  32. [32]

    Applications of llms for generating cyber security exercise scenarios.IEEE Access, 2024

    Muhammad Mudassar Yamin, Ehtesham Hashmi, Mo- hib Ullah, and Basel Katt. Applications of llms for generating cyber security exercise scenarios.IEEE Access, 2024

  33. [33]

    Generative ai and large language models for cyber security: All insights you need.Available at SSRN 4853709, 2024

    Mohamed Amine Ferrag, Fatima Alwahedi, Ammar Battah, Bilel Cherif, Abdechakour Mechri, and Norbert Tihanyi. Generative ai and large language models for cyber security: All insights you need.Available at SSRN 4853709, 2024

  34. [34]

    Towards ai-driven human-machine co-teaming for adaptive and agile cy- ber security operation centers.arXiv:2505.06394, 2025

    Massimiliano Albanese, Xinming Ou, Kevin Lybarger, Daniel Lende, and Dmitry Goldgof. Towards ai-driven human-machine co-teaming for adaptive and agile cy- ber security operation centers.arXiv:2505.06394, 2025

  35. [35]

    Cortex: Collaborative llm agents for high-stakes alert triage.CoRR, abs/2510.00311, 2025

    Bowen Wei, Yuan Shen Tay, Howard Liu, Jinhao Pan, Kun Luo, Ziwei Zhu, and Chris Jordan. Cortex: Collaborative llm agents for high-stakes alert triage. arXiv:2510.00311, 2025

  36. [36]

    CyberSOCEval: Benchmarking LLMs capa- bilities for malware analysis and threat intelligence reasoning.arXiv preprint arXiv:2509.20166, 2025

    Lauren Deason, Adam Bali, Ciprian Bejean, Diana Bolocan, James Crnkovich, Ioana Croitoru, Krishna Durai, Chase Midler, Calin Miron, David Molnar, et al. Cybersoceval: Benchmarking llms capabilities for malware analysis and threat intelligence reasoning. arXiv:2509.20166, 2025

  37. [37]

    Locus: Agentic predicate synthe- sis for directed fuzzing.arXiv:2508.21302, 2025

    Jie Zhu, Chihao Shen, Ziyang Li, Jiahao Yu, Yizheng Chen, and Kexin Pei. Locus: Agentic predicate synthe- sis for directed fuzzing.arXiv:2508.21302, 2025

  38. [38]

    Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt

    Yinlin Deng, Chunqiu Steven Xia, Chenyuan Yang, Shizhuo Dylan Zhang, Shujing Yang, and Lingming Zhang. Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt. arXiv:2304.02014, 2023

  39. [39]

    True attacks, attack attempts, or benign triggers? an empirical measurement of network alerts in a security operations center

    Limin Yang, Zhi Chen, Chenkai Wang, Zhenning Zhang, Sushruth Booma, Phuong Cao, Constantin Adam, Alexander Withers, Zbigniew Kalbarczyk, Rav- ishankar K Iyer, et al. True attacks, attack attempts, or benign triggers? an empirical measurement of network alerts in a security operations center. In33rd USENIX Security, pages 1525–1542, 2024

  40. [40]

    A human capi- tal model for mitigating security analyst burnout

    Sathya Chandran Sundaramurthy, Alexandru G Bar- das, Jacob Case, Xinming Ou, Michael Wesch, John McHugh, and S Raj Rajagopalan. A human capi- tal model for mitigating security analyst burnout. In SOUPS 2015, pages 347–359, 2015. 14

  41. [41]

    Human performance in security opera- tions: a survey on burnout, well-being and flow state among practitioners

    Kashyap Thimmaraju, Sybe Izaak Rispens, and Gail- Joon Ahn. Human performance in security opera- tions: a survey on burnout, well-being and flow state among practitioners. InProc. 2025 Workshop on Secu- rity Operations Center Operations and Construction (WOSOC 2025), pages 2–4, 2025

  42. [42]

    Burnout in cybersecurity incident responders: exploring the factors that light the fire.Proc

    Subigya Nepal, Javier Hernandez, Robert Lewis, Ahad Chaudhry, Brian Houck, Eric Knudsen, Raul Rojas, Ben Tankus, Hemma Prafullchandra, and Mary Czer- winski. Burnout in cybersecurity incident responders: exploring the factors that light the fire.Proc. of ACM Human-Computer Interaction, pages 1–35, 2024

  43. [43]

    To- wards human-ai teaming to mitigate alert fatigue in security operations centres.ACM Transactions on In- ternet Technology, 24(3):1–22, 2024

    Mohan Baruwal Chhetri, Shahroz Tariq, Ronal Singh, Fatemeh Jalalvand, Cecile Paris, and Surya Nepal. To- wards human-ai teaming to mitigate alert fatigue in security operations centres.ACM Transactions on In- ternet Technology, 24(3):1–22, 2024

  44. [44]

    Sarker, and Seyit Camtepe

    Ahmad Mohsin, Helge Janicke, Ahmed Ibrahim, Iqbal H Sarker, and Seyit Camtepe. A unified frame- work for human ai collaboration in security operations centers with trusted autonomy.arXiv:2505.23397, 2025

  45. [45]

    Augmented intelligence framework for human–artificial intelligence teaming in cybersecu- rity.Human-Centric Intelligent Systems, 2025

    Masike Malatji. Augmented intelligence framework for human–artificial intelligence teaming in cybersecu- rity.Human-Centric Intelligent Systems, 2025

  46. [46]

    Building trust bridges: The dynamics of autonomy and transparency in expert-ai collaboration for cybersecurity

    Neele Roch. Building trust bridges: The dynamics of autonomy and transparency in expert-ai collaboration for cybersecurity. InCompanion Proc. of the 30th In- ternational Conf. on Intelligent User Interfaces, pages 202–204, 2025

  47. [47]

    https://platform.openai.com/ docs/models/gpt-4.1-mini

    GPT-4.1 mini. https://platform.openai.com/ docs/models/gpt-4.1-mini

  48. [48]

    research involving human subjects

    Ivor A Pritchard. Searching for "research involving human subjects": What is examined? what is exempt? what is exasperating?IRB: Ethics & Human Research, 23(3):5–13, 2001

  49. [49]

    Privacy, surveillance, and power in the gig economy

    Shruti Sannon, Billie Sun, and Dan Cosley. Privacy, surveillance, and power in the gig economy. InProc. of the 2022 CHI Conf. on human factors in computing systems, pages 1–15, 2022

  50. [50]

    https://www.reddit.com/r/ Drugs/

    r/Drugs Subreddit. https://www.reddit.com/r/ Drugs/. Accessed: 01-13-2026

  51. [51]

    Remember the hu- man: A systematic review of ethical considerations in reddit research.Proc

    Casey Fiesler, Michael Zimmer, Nicholas Proferes, Sarah Gilbert, and Naiyan Jones. Remember the hu- man: A systematic review of ethical considerations in reddit research.Proc. of the ACM on Human-Computer Interaction, 8(GROUP):1–33, 2024

  52. [52]

    Disguising reddit sources and the effi- cacy of ethical research.Ethics and Information Tech- nology, 24(3):41, 2022

    Joseph Reagle. Disguising reddit sources and the effi- cacy of ethical research.Ethics and Information Tech- nology, 24(3):41, 2022

  53. [53]

    James Mattei, Christopher Pellegrini, Matthew Soto, Marina Sanusi Bohuk, and Daniel Votipka. "I’m trying to learn... and I’m shooting myself in the foot": Beginners’ Struggles When Solving Binary Exploitation Exercises. In USENIX, 2025

  54. [54]

    Theophilus Azungah. Qualitative Research: Deductive and Inductive Approaches to Data Analysis. Qualitative Research Journal, 18(4):383–400, 2018

  55. [55]

    Andrew F Hayes and Klaus Krippendorff. Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1):77–89, 2007

  56. [56]

    Nora McDonald, Sarita Schoenebeck, and Andrea Forte. Reliability and inter-rater reliability in qualitative research: Norms and guidelines for CSCW and HCI practice. Proc. of the ACM on Human-Computer Interaction, 3(CSCW):1–23, 2019

  57. [57]

    Virginia Braun and Victoria Clarke. Thematic Analysis: A Practical Guide. 2021

  58. [58]

    PennState Eberly College of Science. Comparing two proportions. https://online.stat.psu.edu/stat415/lesson/9/9.4

  59. [59]

    Cornell Statistical Consulting Unit. Using adjusted standardized residuals for interpreting contingency tables, 2020

  60. [60]

    Eric W Weisstein. Bonferroni correction. https://mathworld.wolfram.com/, 2004

  61. [61]

    Jan H Klemmer, Stefan Albert Horstmann, Nikhil Patnaik, Cordelia Ludden, Cordell Burton Jr, Carson Powers, Fabio Massacci, Akond Rahman, Daniel Votipka, Heather Richter Lipford, et al. Using AI assistants in software development: A qualitative study on security practices and concerns. In Proceedings of the 2024 ACM SIGSAC Conf. on Computer and Comm...

  62. [62]

    Prophet Security launches with an Agentic AI SOC Analyst. https://www.prophetsecurity.ai/blog/announcing-prophet-security, 2024

  63. [63]

    Gemini in Google SecOps. https://docs.cloud.google.com/chronicle/docs/secops/release-notes#March_26_2024, 2024

  64. [64]

    Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. arXiv:2406.00515, 2024

  65. [65]

    Jianxun Wang and Yixiang Chen. A review on code generation with LLMs: Application and evaluation. In 2023 IEEE International Conf. on Medical Artificial Intelligence (MedAI), pages 284–289. IEEE, 2023

  66. [66]

    Andrei Sobo, Awes Mubarak, Almas Baimagambetov, and Nikolaos Polatidis. Evaluating LLMs for code generation in HRI: A comparative study of ChatGPT, Gemini, and Claude. Applied Artificial Intelligence, 39(1):2439610, 2025

  67. [67]

    Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Yu-Yang Liu, and Li Yuan. LLM lies: Hallucinations are not bugs, but features as adversarial examples. arXiv:2310.01469, 2023

  68. [68]

    Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, et al. Non-determinism of deterministic LLM settings. arXiv:2408.04667, 2024

  69. [69]

    Yifan Song, Guoyin Wang, Sujian Li, and Bill Yuchen Lin. The good, the bad, and the greedy: Evaluation of LLMs should not ignore non-determinism. In Proc. of the 2025 Conf. of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4195–4206, 2025

  70. [70]

    Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, and Yang Zhang. JailbreakRadar: Comprehensive assessment of jailbreak attacks against LLMs. In Proc. of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 21538–21566, 2025

  71. [71]

    Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, et al. Prompt injection attack against LLM-integrated applications. arXiv:2306.05499, 2023

  72. [72]

    Gartner Magic Quadrant. https://www.gartner.com/en/research/methodologies/magic-quadrants-research

  73. [73]

    LLM Inference Benchmarking: How Much Does Your LLM Inference Cost? https://developer.nvidia.com/blog/llm-inference-benchmarking-how-much-does-your-llm-inference-cost/, 2025

  74. [74]

    Jens Opdenbusch, Jonas Hielscher, and M Angela Sasse. "Where Are We On Cyber?" A Qualitative Study On Boards’ Cybersecurity Risk Decision Making. In NDSS, 2025

  75. [75]

    Stef Schinagl and Abbas Shahim. What do we know about information security governance? "From the basement to the boardroom": towards digital security governance. Information & Computer Security, 28(2):261–292, 2020

  76. [76]

    Georgios Syros, Anshuman Suri, Jacob Ginesin, Cristina Nita-Rotaru, and Alina Oprea. SAGA: A security architecture for governing AI agentic systems. arXiv:2504.21034, 2025

  77. [77]

    John D Lee and Katrina A See. Trust in automation: Designing for appropriate reliance. Human Factors, 46(1):50–80, 2004

  78. [78]

    Yavuz Selim Kıyak, Özlem Coşkun, and Işıl İrem Budakoğlu. ‘ChatGPT can make mistakes’ warnings fail: A randomized controlled trial. Medical Education, 2025

  79. [79]

    Jessica Dawson and Robert Thomson. The future cybersecurity workforce: Going beyond technical skills for successful cyber performance. Frontiers in Psychology, 9:744, 2018

  80. [80]

    John R Goodall, Wayne G Lutters, and Anita Komlodi. Developing expertise for network intrusion detection. Information Technology & People, 22(2):92–108, 2009

Showing first 80 references.