hub Mixed citations

LLM Agents can Autonomously Exploit One-day Vulnerabilities

Mixed citation behavior. The most common role is background (60% of classified citations).

27 Pith papers citing it
abstract

LLMs have become increasingly powerful, both in their benign and malicious uses. With the increase in capabilities, researchers have been increasingly interested in their ability to exploit cybersecurity vulnerabilities. In particular, recent work has conducted preliminary studies on the ability of LLM agents to autonomously hack websites. However, these studies are limited to simple vulnerabilities. In this work, we show that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. To show this, we collected a dataset of 15 one-day vulnerabilities, including ones categorized as critical severity in the CVE description. When given the CVE description, GPT-4 is capable of exploiting 87% of these vulnerabilities, compared to 0% for every other model we test (GPT-3.5, open-source LLMs) and for open-source vulnerability scanners (ZAP and Metasploit). Fortunately, our GPT-4 agent requires the CVE description for high performance: without the description, GPT-4 can exploit only 7% of the vulnerabilities. Our findings raise questions around the widespread deployment of highly capable LLM agents.

citation-role summary

background 8 · method 1 · other 1

representative citing papers

Agentic Vulnerability Reasoning on Windows COM Binaries

cs.CR · 2026-05-06 · accept · novelty 7.0

SLYP agentic pipeline discovers race condition vulnerabilities in Windows COM binaries and generates debugger-verified PoCs, scoring 0.973 F1 on a 40-case benchmark and finding 28 new confirmed vulnerabilities in production services.

SoK: Honeypots & LLMs, More Than the Sum of Their Parts?

cs.CR · 2025-10-29 · unverdicted · novelty 7.0

A systematization of knowledge paper that taxonomizes honeypot detection vectors, synthesizes LLM-honeypot literature into canonical architecture and evaluation methods, and proposes a roadmap for autonomous deception systems.

Towards Optimal Agentic Architectures for Offensive Security Tasks

cs.CR · 2026-04-20 · unverdicted · novelty 6.0

Empirical comparison of agentic topologies for offensive security shows MAS-Indep reaching 64.2% validated detection while simpler baselines remain competitive on efficiency, with whitebox and web targets outperforming blackbox and binary ones.

An Independent Safety Evaluation of Kimi K2.5

cs.CR · 2026-04-03 · conditional · novelty 6.0

Kimi K2.5 matches closed models on dual-use tasks but refuses fewer CBRNE requests and shows some sabotage and self-replication tendencies.

Hephaestus: Toward a Cybersecurity AI Scientist

cs.CR · 2026-06-29 · unverdicted · novelty 4.0

The paper proposes the Cybersecurity AI Scientist as a modular multi-agent architecture for automating cybersecurity research, distinguished by its focus on non-stationary threats and anchored in a four-zeros risk-trust-incident-energy frame.
