LLM Agents can Autonomously Exploit One-day Vulnerabilities

Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang · 2024 · cs.CR · arXiv 2404.08144

Mixed citation behavior. Most common role is background (60%).

27 Pith papers cite it.

abstract

LLMs have become increasingly powerful, both in their benign and malicious uses. With the increase in capabilities, researchers have been increasingly interested in their ability to exploit cybersecurity vulnerabilities. In particular, recent work has conducted preliminary studies on the ability of LLM agents to autonomously hack websites. However, these studies are limited to simple vulnerabilities. In this work, we show that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. To show this, we collected a dataset of 15 one-day vulnerabilities that include ones categorized as critical severity in the CVE description. When given the CVE description, GPT-4 is capable of exploiting 87% of these vulnerabilities compared to 0% for every other model we test (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit). Fortunately, our GPT-4 agent requires the CVE description for high performance: without the description, GPT-4 can exploit only 7% of the vulnerabilities. Our findings raise questions around the widespread deployment of highly capable LLM agents.

citation-role summary

background: 8 · method: 1 · other: 1

citation-polarity summary

background: 6 · support: 2 · unclear: 1 · use method: 1

representative citing papers

APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks

cs.CR · 2026-05-04 · unverdicted · novelty 8.0

APIOT is the first LLM framework to complete the full autonomous discovery-to-remediation cycle on bare-metal OT devices, reaching 90% success across 290 runs on Zephyr RTOS.

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

cs.CR · 2026-05-08 · unverdicted · novelty 7.0

LLM agents exhibit persistent attack-selection biases as fixed traits independent of success rates, with a bias momentum effect that resists steering and yields no performance gain.

Agentic Vulnerability Reasoning on Windows COM Binaries

cs.CR · 2026-05-06 · accept · novelty 7.0

SLYP agentic pipeline discovers race condition vulnerabilities in Windows COM binaries and generates debugger-verified PoCs, scoring 0.973 F1 on a 40-case benchmark and finding 28 new confirmed vulnerabilities in production services.

PHANTOM: Polymorphic Honeytoken Adaptation with Narrative-Tailored Organisational Mimicry

cs.CR · 2026-05-04 · unverdicted · novelty 7.0

PHANTOM raises honeytoken believability from 0.576 to 0.778 by adding organization-specific mimicry, lifting human acceptance to 100% and detection resistance to 0.870.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

LLMVD.js uses LLM agents to confirm 84% of taint-style vulnerabilities on public benchmarks (vs. <22% for prior tools) and generates validated exploits for 36 of 260 new packages (vs. ≤2 for traditional tools).

SoK: Honeypots & LLMs, More Than the Sum of Their Parts?

cs.CR · 2025-10-29 · unverdicted · novelty 7.0

A systematization of knowledge paper that taxonomizes honeypot detection vectors, synthesizes LLM-honeypot literature into canonical architecture and evaluation methods, and proposes a roadmap for autonomous deception systems.

APT-Agent: Automated Penetration Testing using Large Language Models

cs.CR · 2026-05-24 · unverdicted · novelty 6.0

APT-Agent automates penetration testing with LLMs using rectification and memory modules, achieving 84.29% end-to-end success on Metasploitable 2 versus lower rates for baselines.

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

An agentic pipeline localizes the security-relevant function in 10 of 20 Ubuntu binary security updates and produces an accepted root-cause classification in 11 of 20, limited mainly by binary differencing coverage.

Towards Optimal Agentic Architectures for Offensive Security Tasks

cs.CR · 2026-04-20 · unverdicted · novelty 6.0

Empirical comparison of agentic topologies for offensive security shows MAS-Indep reaching 64.2% validated detection while simpler baselines remain competitive on efficiency, with whitebox and web targets outperforming blackbox and binary ones.

An Independent Safety Evaluation of Kimi K2.5

cs.CR · 2026-04-03 · conditional · novelty 6.0

Kimi K2.5 matches closed models on dual-use tasks but refuses fewer CBRNE requests and shows some sabotage and self-replication tendencies.

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software

cs.SE · 2025-12-28 · unverdicted · novelty 6.0

RSA prompting enables LLMs to automatically create functional exploits for CVEs in Odoo ERP, succeeding on all tested cases in 3-5 rounds and removing the need for manual effort.

Direct Causation in International Humanitarian Law and the Challenge of AI-Mediated Civilian Cyber Operations

cs.AI · 2026-06-28 · unverdicted · novelty 5.0

Autonomous AI cyber systems deployed by civilians fail the one-causal-step and integral-part requirements of the IHL direct participation test because harm arises from post-disengagement system decisions.

Demand-Driven Vulnerability Detection for Cloud Security Posture Management: Removing Human Rule Authoring from the Disclosure-to-Protection Critical Path

cs.CR · 2026-06-06 · unverdicted · novelty 5.0

Proposes demand-driven, tenant-local derivation of CSPM rules from catalogue-asset intersections to eliminate vendor rule authoring and release cadence delays.

ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

cs.CR · 2026-05-31 · accept · novelty 5.0

Analysis of 67,453 OpenClaw skills shows three scanners overlap on at most 10.4% of combined positives, with 81.9% flagged by only one scanner and distinct profiles for malicious versus suspicious skills.

A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection

cs.SE · 2026-04-06 · unverdicted · novelty 5.0

Vulnsage, a multi-agent framework, generates 34.64% more exploits than prior tools and verified 146 zero-day vulnerabilities in real-world open-source libraries.

xOffense: An Autonomous Multi-Agent Framework for Penetration Testing with Domain-Adapted Large Language Models

cs.CR · 2025-09-16 · unverdicted · novelty 5.0

xOffense automates penetration testing via a fine-tuned Qwen3-32B LLM in a multi-agent setup with specialized agents for reconnaissance, vulnerability scanning, and exploitation, reporting 79.17% sub-task completion on AutoPenBench and AI-Pentest-Benchmark.

Hephaestus: Toward a Cybersecurity AI Scientist

cs.CR · 2026-06-29 · unverdicted · novelty 4.0

The paper proposes the Cybersecurity AI Scientist as a modular multi-agent architecture for automating cybersecurity research, distinguished by its focus on non-stationary threats and anchored in a four-zeros risk-trust-incident-energy frame.

Needles at Scale: LLM-Assisted Target Selection for Windows Vulnerability Research

cs.CR · 2026-05-31 · unverdicted · novelty 4.0

Symbolicate-Enrich-Sample recovers symbols and call graphs from Windows binaries, enriches functions with LLM labels on reachability and risk, and produces a prioritized ~22K-function shortlist from 7.2M total via importance sampling.

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

cs.CR · 2026-05-28 · unverdicted · novelty 4.0

Proposes a typed Security Context enforced across LLM agent components, Runtime Core, Tool Adapter Layer, and HITL gates for auditable, scoped cybersecurity workflows.

Token Economics for LLM Agents: A Dual-View Study from Computing and Economics

cs.AI · 2026-05-09 · unverdicted · novelty 4.0

The paper delivers a unified survey of token economics for LLM agents, conceptualizing tokens as production factors, exchange mediums, and units of account across micro, meso, macro, and security dimensions using established economic theories.

Agentic AI and the Industrialization of Cyber Offense: Forecast, Consequences, and Defensive Priorities for Enterprises and the Mittelstand

cs.CR · 2026-05-06 · unverdicted · novelty 4.0

Agentic AI makes cyber attacks cheaper and faster, requiring immediate improvements in identity management, phishing-resistant authentication, patching, and agent governance for large enterprises and the Mittelstand.

CyberAId: AI-Driven Cybersecurity for Financial Service Providers

cs.AI · 2026-05-03 · unverdicted · novelty 4.0

CyberAId is a proposed on-premise multi-agent system that coordinates LLM subagents with classical security tools to improve threat response and regulatory alignment in financial services.

citing papers explorer

Showing 27 of 27 citing papers.

APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks · cs.CR · 2026-05-04 · unverdicted · none · ref 12
CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios · cs.CR · 2026-05-08 · unverdicted · none · ref 5
Agentic Vulnerability Reasoning on Windows COM Binaries · cs.CR · 2026-05-06 · accept · none · ref 13
PHANTOM: Polymorphic Honeytoken Adaptation with Narrative-Tailored Organisational Mimicry · cs.CR · 2026-05-04 · unverdicted · none · ref 7
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework · cs.CR · 2026-04-25 · unverdicted · none · ref 98
Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning · cs.CR · 2026-04-22 · unverdicted · none · ref 21
SoK: Honeypots & LLMs, More Than the Sum of Their Parts? · cs.CR · 2025-10-29 · unverdicted · none · ref 104
APT-Agent: Automated Penetration Testing using Large Language Models · cs.CR · 2026-05-24 · unverdicted · none · ref 14
uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs · cs.CR · 2026-05-15 · unverdicted · none · ref 9
Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches · cs.CR · 2026-05-07 · unverdicted · none · ref 18
Towards Optimal Agentic Architectures for Offensive Security Tasks · cs.CR · 2026-04-20 · unverdicted · none · ref 5
An Independent Safety Evaluation of Kimi K2.5 · cs.CR · 2026-04-03 · conditional · none · ref 78
From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software · cs.SE · 2025-12-28 · unverdicted · none · ref 7
Direct Causation in International Humanitarian Law and the Challenge of AI-Mediated Civilian Cyber Operations · cs.AI · 2026-06-28 · unverdicted · none · ref 2
Demand-Driven Vulnerability Detection for Cloud Security Posture Management: Removing Human Rule Authoring from the Disclosure-to-Protection Critical Path · cs.CR · 2026-06-06 · unverdicted · none · ref 46
ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree · cs.CR · 2026-05-31 · accept · none · ref 10
A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection · cs.SE · 2026-04-06 · unverdicted · none · ref 15
xOffense: An Autonomous Multi-Agent Framework for Penetration Testing with Domain-Adapted Large Language Models · cs.CR · 2025-09-16 · unverdicted · none · ref 22
Hephaestus: Toward a Cybersecurity AI Scientist · cs.CR · 2026-06-29 · unverdicted · none · ref 14
Needles at Scale: LLM-Assisted Target Selection for Windows Vulnerability Research · cs.CR · 2026-05-31 · unverdicted · none · ref 5
An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations · cs.CR · 2026-05-28 · unverdicted · none · ref 15
Token Economics for LLM Agents: A Dual-View Study from Computing and Economics · cs.AI · 2026-05-09 · unverdicted · none · ref 157
Agentic AI and the Industrialization of Cyber Offense: Forecast, Consequences, and Defensive Priorities for Enterprises and the Mittelstand · cs.CR · 2026-05-06 · unverdicted · none · ref 13
CyberAId: AI-Driven Cybersecurity for Financial Service Providers · cs.AI · 2026-05-03 · unverdicted · none · ref 4
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges · cs.AI · 2025-10-27 · unverdicted · none · ref 101
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
Large Language Model-Based Agents for Software Engineering: A Survey · cs.SE · 2024-09-04 · unverdicted · none · ref 179
A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation · cs.CR · 2026-06-09 · unverdicted · none · ref 40
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.
