arxiv: 2509.13021 · v2 · submitted 2025-09-16 · 💻 cs.CR · cs.AI

xOffense: An Autonomous Multi-Agent Framework for Penetration Testing with Domain-Adapted Large Language Models

Pith reviewed 2026-05-18 16:33 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords penetration testing · multi-agent systems · large language models · autonomous security · fine-tuned models · cybersecurity automation · vulnerability exploitation

The pith

A fine-tuned mid-scale LLM in a multi-agent setup automates penetration testing and reaches 79.17 percent sub-task completion on AutoPenBench and AI-Pentest-Benchmark.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents xOffense as a complete shift from expert-driven manual penetration testing to an automated system where specialized agents handle reconnaissance, scanning, and exploitation. It relies on a fine-tuned open-source model to generate commands and sustain reasoning chains across multiple steps. A reader would care because this promises security testing that runs without constant human oversight and scales simply by adding compute. The work shows the system beats prior tools on two established benchmarks by completing nearly four-fifths of sub-tasks.

Core claim

By fine-tuning Qwen3-32B on Chain-of-Thought penetration testing data and placing it inside an orchestration layer that coordinates dedicated agents for reconnaissance, vulnerability scanning, and exploitation, xOffense produces autonomous workflows that reach 79.17 percent sub-task completion on AutoPenBench and AI-Pentest-Benchmark, exceeding VulnBot and PentestGPT.

What carries the argument

The orchestration layer that assigns and coordinates specialized agents powered by the fine-tuned LLM to generate precise tool commands and maintain consistent multi-step reasoning.
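
The coordination pattern described above can be sketched roughly as follows, assuming a simple sequential hand-off between phases. The `Agent` and `Orchestrator` classes, the `Planner` callable, and the shared `findings` dictionary are hypothetical names for illustration. The paper's actual orchestration layer is not reproduced here, and a stub stands in for the fine-tuned model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for the fine-tuned LLM: maps a phase and the
# findings so far to a command string. No real model is called.
Planner = Callable[[str, Dict[str, List[str]]], str]


@dataclass
class Agent:
    phase: str      # e.g. "reconnaissance"
    plan: Planner   # produces the next command for this phase

    def act(self, findings: Dict[str, List[str]]) -> str:
        return self.plan(self.phase, findings)


@dataclass
class Orchestrator:
    agents: List[Agent]
    findings: Dict[str, List[str]] = field(default_factory=dict)

    def run(self) -> List[str]:
        """Run each agent in order, recording the command it proposes."""
        issued = []
        for agent in self.agents:
            command = agent.act(self.findings)
            self.findings.setdefault(agent.phase, []).append(command)
            issued.append(command)
        return issued


def stub_planner(phase: str, findings: Dict[str, List[str]]) -> str:
    # Placeholder output only; a real planner would return tool commands.
    return f"<{phase} command after {len(findings)} prior phase(s)>"


phases = ["reconnaissance", "vulnerability scanning", "exploitation"]
orchestrator = Orchestrator([Agent(p, stub_planner) for p in phases])
commands = orchestrator.run()
```

Each agent sees the findings accumulated by earlier phases, which is one simple way to keep multi-step reasoning consistent across hand-offs.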

If this is right

  • Penetration testing becomes executable as a fully machine-driven process that scales with available compute rather than expert hours.
  • Results gain reproducibility because the same model and orchestration produce consistent command sequences.
  • Security assessments can shift from occasional manual reviews to routine, on-demand automated runs.
  • Domain-adapted mid-scale models prove capable of handling the full chain from reconnaissance to exploitation when structured with agent roles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent orchestration pattern could transfer to related security tasks such as continuous monitoring or post-breach analysis.
  • Real-world deployment would need checks for cases where network defenses or ethical limits require human judgment to avoid unintended actions.
  • Hybrid human-AI loops might emerge as a practical next step to handle the minority of sub-tasks the model does not complete.

Load-bearing premise

The fine-tuned LLM will generate precise tool commands and sustain consistent multi-step reasoning across varied penetration testing scenarios without requiring human correction or intervention.

What would settle it

A new benchmark or live target set where the framework completes under 60 percent of sub-tasks or requires repeated human intervention to continue would show the autonomy claim does not hold.
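
The test above reduces to arithmetic on sub-task counts plus a record of human interventions. The sketch below assumes the rate is completed sub-tasks divided by total sub-tasks and treats any intervention as disqualifying, which is stricter than "repeated". The function names and counts are made up for illustration and are not the paper's data.

```python
def completion_rate(completed: int, total: int) -> float:
    """Fraction of sub-tasks completed; total must be positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= completed <= total:
        raise ValueError("completed must lie between 0 and total")
    return completed / total


def autonomy_claim_fails(completed: int, total: int,
                         human_interventions: int,
                         threshold: float = 0.60) -> bool:
    """True if the run falls under the threshold or needed human help."""
    under_threshold = completion_rate(completed, total) < threshold
    return under_threshold or human_interventions > 0


# Hypothetical counts, chosen only to exercise both outcomes.
strong_run = autonomy_claim_fails(completed=8, total=10, human_interventions=0)
weak_run = autonomy_claim_fails(completed=5, total=10, human_interventions=0)
```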

Figures

Figures reproduced from arXiv: 2509.13021 by Dong Huu Nguyen Khoa, Le Tran Gia Bao, Nguyen Huu Quyen, Nguyen Vu Khai Tam, Phan The Duy, Phung Duc Luong, Van-Hau Pham.

Figure 1: The Overall Architecture of the xOffense Framework.

Figure 2: Task Coordination Graph (TCG) illustrating task dependencies and execution status. Completed tasks are shown in dark, the current task in orange, and pending tasks in light blue.

Algorithm 1: Check and Reflection Procedure
Require: TCG (Task Coordination Graph), Knowledge Repository KR
  while not all tasks completed do
    t ← NextTask(TCG)
    r ← Execute(t)
    if CheckSuccess(r) then
      MarkCompleted(t)
      …

Figure 3: Comparison of subtask completion rates across six real-world vulnerable machines in a No-RAG setting.

Figure 4: Comparison of subtask completion rates across six real-world vulnerable machines with RAG setting.
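
The Check and Reflection procedure excerpted with Figure 2 is cut off after `MarkCompleted(t)`, so its failure branch is not visible here. The sketch below fills that branch with an assumed reflect-and-retry step capped by `max_retries`. That branch, the `Task` class, the list-based knowledge repository, and the stopping rule (stop when no runnable task remains) are assumptions. Only `NextTask`, `Execute`, `CheckSuccess`, and `MarkCompleted` come from the excerpt, rendered here in snake_case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    name: str
    depends_on: List[str] = field(default_factory=list)
    status: str = "pending"   # "pending", "completed", or "failed"
    attempts: int = 0


def next_task(tcg: Dict[str, Task]) -> Optional[Task]:
    """Return a pending task whose dependencies are all completed."""
    for task in tcg.values():
        if task.status == "pending" and all(
            tcg[d].status == "completed" for d in task.depends_on
        ):
            return task
    return None


def check_and_reflect(tcg: Dict[str, Task],
                      execute: Callable[[Task], str],
                      check_success: Callable[[str], bool],
                      knowledge: List[str],
                      max_retries: int = 2) -> Dict[str, Task]:
    """Loop over the graph until no runnable task remains."""
    while (task := next_task(tcg)) is not None:
        result = execute(task)
        task.attempts += 1
        if check_success(result):
            task.status = "completed"            # MarkCompleted(t)
        else:
            # Assumed reflection step: record the failure, retry up to a cap.
            knowledge.append(f"{task.name} failed: {result}")
            if task.attempts > max_retries:
                task.status = "failed"
    return tcg


# Toy run: "scan" fails once, then succeeds; "exploit" depends on it.
graph = {
    "recon": Task("recon"),
    "scan": Task("scan", depends_on=["recon"]),
    "exploit": Task("exploit", depends_on=["scan"]),
}
notes: List[str] = []
outcomes = {"recon": ["ok"], "scan": ["error", "ok"], "exploit": ["ok"]}
check_and_reflect(
    graph,
    execute=lambda t: outcomes[t.name][t.attempts],
    check_success=lambda r: r == "ok",
    knowledge=notes,
)
```

Stopping when no task is runnable, rather than when every task is completed, keeps the loop from spinning forever if a task fails permanently and blocks its dependents.
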
Abstract

This work introduces xOffense, an AI-driven, multi-agent penetration testing framework that shifts the process from labor-intensive, expert-driven manual efforts to fully automated, machine-executable workflows capable of scaling seamlessly with computational infrastructure. At its core, xOffense leverages a fine-tuned, mid-scale open-source LLM (Qwen3-32B) to drive reasoning and decision-making in penetration testing. The framework assigns specialized agents to reconnaissance, vulnerability scanning, and exploitation, with an orchestration layer ensuring seamless coordination across phases. Fine-tuning on Chain-of-Thought penetration testing data further enables the model to generate precise tool commands and perform consistent multi-step reasoning. We evaluate xOffense on two rigorous benchmarks: AutoPenBench and AI-Pentest-Benchmark. The results demonstrate that xOffense consistently outperforms contemporary methods, achieving a sub-task completion rate of 79.17%, decisively surpassing leading systems such as VulnBot and PentestGPT. These findings highlight the potential of domain-adapted mid-scale LLMs, when embedded within structured multi-agent orchestration, to deliver superior, cost-efficient, and reproducible solutions for autonomous penetration testing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces xOffense, an AI-driven multi-agent framework for penetration testing that employs a fine-tuned Qwen3-32B LLM to enable fully automated workflows. Specialized agents handle reconnaissance, vulnerability scanning, and exploitation, coordinated by an orchestration layer. The authors report that xOffense achieves a 79.17% sub-task completion rate on AutoPenBench and AI-Pentest-Benchmark, outperforming systems like VulnBot and PentestGPT.

Significance. If validated, the results would indicate that domain-adapted mid-scale LLMs combined with multi-agent orchestration can provide superior, cost-efficient solutions for autonomous penetration testing. This could have implications for scaling security testing without heavy reliance on human experts.

major comments (2)
  1. [Abstract and Results] The abstract and results section state a 79.17% sub-task completion rate and benchmark superiority but supply no details on experimental controls, statistical significance, exact task definitions, or potential data leakage between fine-tuning and evaluation sets. This information is required to verify the central performance claim.
  2. [Experimental Evaluation] The claim of fully automated workflows and consistent multi-step reasoning is load-bearing for the autonomy and cost-efficiency advantages, yet the experimental evaluation provides no breakdown of command failure rates, retry counts, fraction of trajectories completed with zero human input, or total commands issued.
minor comments (2)
  1. [Related Work] Clarify the precise differences between the proposed orchestration layer and prior multi-agent pentesting systems in the related work section.
  2. [Methodology] Expand the description of the Chain-of-Thought fine-tuning dataset, including its size, source, and construction process.
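
For the statistical detail requested in the first major comment, one common option for a single completion rate is a Wilson score interval. The sketch below is a generic illustration, not the authors' analysis. It treats sub-tasks as independent trials, an assumption that sub-tasks within one target machine may well violate, and the counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts for illustration only.
low, high = wilson_interval(successes=80, trials=100)
```

An interval of this kind shows how much a headline percentage could move with a different sample of sub-tasks, which matters more when the number of sub-tasks is small.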

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and insightful comments on our manuscript. We value the opportunity to clarify and strengthen our presentation of the experimental results and evaluation methodology. Below, we address each major comment point by point.

Point-by-point responses
  1. Referee: [Abstract and Results] The abstract and results section state a 79.17% sub-task completion rate and benchmark superiority but supply no details on experimental controls, statistical significance, exact task definitions, or potential data leakage between fine-tuning and evaluation sets. This information is required to verify the central performance claim.

    Authors: We fully agree that these details are crucial for the credibility of our central claims. The current manuscript provides high-level results but lacks the requested granularity. In the revised manuscript, we will add comprehensive information in the Experimental Setup and Results sections. Specifically, we will describe the experimental controls (e.g., fixed environment setups and multiple runs), report statistical significance using paired t-tests or a comparable test, with p-values, provide exact definitions of sub-tasks drawn from the benchmark papers, and explicitly address data leakage by detailing how the fine-tuning dataset was curated separately from the evaluation benchmarks with no overlap. We will also include error bars or confidence intervals around the 79.17% figure to better contextualize the results. revision: yes

  2. Referee: [Experimental Evaluation] The claim of fully automated workflows and consistent multi-step reasoning is load-bearing for the autonomy and cost-efficiency advantages, yet the experimental evaluation provides no breakdown of command failure rates, retry counts, fraction of trajectories completed with zero human input, or total commands issued.

    Authors: This is a valid observation, as our evaluation focused on overall success rates rather than these granular automation metrics. To address this, we will revise the Experimental Evaluation section to include a detailed breakdown. This will encompass: observed command failure rates across all agents, average retry counts for failed commands, confirmation that 100% of trajectories were completed with zero human input as the framework operates autonomously, and the total number of commands issued during the benchmark evaluations. These additions will directly support our claims regarding fully automated workflows and cost-efficiency. revision: yes
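
The automation metrics listed in the second response could be tallied from per-trajectory logs along these lines. The `Trajectory` record and its fields are a hypothetical log schema, not the paper's, retries are averaged per trajectory rather than per failed command, and the sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    commands_issued: int       # total commands, retries included
    commands_failed: int       # commands that returned an error
    retries: int               # commands re-issued after a failure
    human_interventions: int   # manual corrections during the run


def automation_metrics(runs: List[Trajectory]) -> Dict[str, float]:
    """Aggregate the four breakdowns the referee asks for."""
    if not runs:
        raise ValueError("need at least one trajectory")
    total = sum(r.commands_issued for r in runs)
    failed = sum(r.commands_failed for r in runs)
    zero_human = sum(r.human_interventions == 0 for r in runs)
    return {
        "total_commands": total,
        "command_failure_rate": failed / total if total else 0.0,
        "mean_retries": sum(r.retries for r in runs) / len(runs),
        "zero_human_fraction": zero_human / len(runs),
    }


# Invented sample logs, for illustration only.
sample = [Trajectory(40, 4, 3, 0), Trajectory(60, 6, 5, 1)]
metrics = automation_metrics(sample)
```

Reporting `zero_human_fraction` from logged interventions, rather than asserting it from the design, is what would let a reader check the autonomy claim.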

Circularity Check

0 steps flagged

No circularity detected in empirical framework and benchmark evaluation

full rationale

The paper presents an empirical system description and benchmark results rather than a mathematical derivation chain. It introduces a multi-agent framework, describes fine-tuning an LLM on external Chain-of-Thought data, and reports measured sub-task completion rates on independent benchmarks (AutoPenBench, AI-Pentest-Benchmark). No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations that reduce the central performance claim to its own inputs appear in the provided text. The evaluation relies on external test sets and comparisons to prior systems, making the reported 79.17% rate an independent measurement rather than a constructed tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework depends on the practical assumption that a fine-tuned mid-scale LLM can reliably drive autonomous tool use and reasoning in security workflows; no new mathematical axioms or invented physical entities are introduced.

axioms (1)
  • domain assumption: A fine-tuned mid-scale LLM can generate precise tool commands and maintain consistent multi-step reasoning for penetration testing tasks.
    Invoked to justify autonomous operation of the specialized agents without human oversight.

pith-pipeline@v0.9.0 · 5760 in / 1188 out tokens · 43344 ms · 2026-05-18T16:33:26.772594+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

    cs.CR 2026-04 unverdicted novelty 8.0

    The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.

  2. PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy

    cs.CR 2026-04 unverdicted novelty 6.0

    PoC-Adapt improves automated PoC exploit generation reliability by 25% and lowers cost using semantic state validation and RL adaptive policies, verifying 12 PoCs from 80 recent CVE attempts at $0.42 each.
