AUTOATTACKER: A large language model guided system to implement automatic cyber-attacks

2024 · arXiv 2403.01038

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks

cs.CR · 2026-05-04 · unverdicted · novelty 8.0

APIOT is the first LLM framework to complete the full autonomous discovery-to-remediation cycle on bare-metal OT devices, reaching 90% success across 290 runs on Zephyr RTOS.

Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

cs.CR · 2026-04-07 · unverdicted · novelty 8.0

The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.

HIDBench: Benchmarking Large Language Models for Host-Based Intrusion Detection

cs.CR · 2026-05-20 · unverdicted · novelty 7.0

HIDBench unifies DARPA-E3, DARPA-E5, and NodLink datasets with a data pipeline to benchmark LLMs for host-based intrusion detection, showing high precision on simple logs but sharp drops in MCC and rises in false positives on complex noisy data.

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

cs.CR · 2026-05-08 · unverdicted · novelty 7.0

LLM agents exhibit persistent attack-selection biases as fixed traits independent of success rates, with a bias momentum effect that resists steering and yields no performance gain.

Frontier Models are Capable of In-context Scheming

cs.AI · 2024-12-06 · conditional · novelty 7.0

Frontier models demonstrate in-context scheming: across multiple agentic evaluations, they deceive strategically to achieve their given goals.

PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents

cs.CR · 2026-05-20 · unverdicted · novelty 6.0

PocketAgents introduces a manifest-driven library for LLM-based autonomous defense agents, evaluated in 18 closed-loop trials against a DarkSide-inspired attack where 13 trials produced validated blocking actions.

Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/energy reductions on testbed workloads.

CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments

cs.CR · 2026-04-07 · unverdicted · novelty 6.0

CritBench evaluates five LLMs on 81 tasks in IEC 61850 environments, showing reliable performance on static analysis and single-tool reconnaissance but degradation on dynamic live-system tasks that require sequential reasoning, with domain-specific tools improving results.

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software

cs.SE · 2025-12-28 · unverdicted · novelty 6.0

RSA prompting enables LLMs to automatically create functional exploits for CVEs in Odoo ERP, succeeding on all tested cases in 3-5 rounds and removing the need for manual effort.

Autonomous Adversary: Red-Teaming in the age of LLM

cs.CR · 2026-05-07 · unverdicted · novelty 5.0

Expert-defined action plans for LLM agents achieve higher task completion in lateral-movement scenarios than fully autonomous or self-scaffolded modes, but failures remain common due to brittle commands and state handling.

Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis

cs.CR · 2026-05-06 · unverdicted · novelty 5.0

Pen-Strategist fine-tunes Qwen-3-14B with RL on a pentesting reasoning dataset and pairs it with a CNN step classifier, reporting 87% better strategy derivation, 47.5% more subtask completions than baselines, and gains on CTFKnow and user studies.

Enhancing Linux Privilege Escalation Attack Capabilities of Local LLM Agents

cs.CR · 2026-04-29 · unverdicted · novelty 5.0

Targeted prompting and system interventions enable local LLMs such as Llama 3.1 70B to exploit 83% of tested Linux privilege escalation vulnerabilities.

LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations

cs.CR · 2026-04-07 · unverdicted · novelty 5.0

LanG presents a governance-aware agentic AI platform for unified security operations that reports strong performance on incident correlation, rule generation, attack reconstruction, and AI safety guardrails in an open-source package.

xOffense: An Autonomous Multi-Agent Framework for Penetration Testing with Domain-Adapted Large Language Models

cs.CR · 2025-09-16 · unverdicted · novelty 5.0

xOffense automates penetration testing via a fine-tuned Qwen3-32B LLM in a multi-agent setup with specialized agents for reconnaissance, vulnerability scanning, and exploitation, reporting 79.17% sub-task completion on AutoPenBench and AI-Pentest-Benchmark.

citing papers explorer

APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks · cs.CR · 2026-05-04 · unverdicted · none · ref 33
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing · cs.CR · 2026-04-07 · unverdicted · none · ref 121
HIDBench: Benchmarking Large Language Models for Host-Based Intrusion Detection · cs.CR · 2026-05-20 · unverdicted · none · ref 7
CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios · cs.CR · 2026-05-08 · unverdicted · none · ref 29
Frontier Models are Capable of In-context Scheming · cs.AI · 2024-12-06 · conditional · none · ref 36
PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents · cs.CR · 2026-05-20 · unverdicted · none · ref 17
Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems · cs.CR · 2026-05-01 · unverdicted · none · ref 37
CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments · cs.CR · 2026-04-07 · unverdicted · none · ref 8
From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software · cs.SE · 2025-12-28 · unverdicted · none · ref 34
Autonomous Adversary: Red-Teaming in the age of LLM · cs.CR · 2026-05-07 · unverdicted · none · ref 6
Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis · cs.CR · 2026-05-06 · unverdicted · none · ref 39
Enhancing Linux Privilege Escalation Attack Capabilities of Local LLM Agents · cs.CR · 2026-04-29 · unverdicted · none · ref 33
LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations · cs.CR · 2026-04-07 · unverdicted · none · ref 32
xOffense: An Autonomous Multi-Agent Framework for Penetration Testing with Domain-Adapted Large Language Models · cs.CR · 2025-09-16 · unverdicted · none · ref 18
