PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages

Aryaz Eghbali; Deniz Simsek; Michael Pradel

arXiv: 2506.04962 · v4 · pith:WKWCYUFSnew · submitted 2025-06-05 · cs.CR · cs.SE

keywords: exploits, vulnerabilities, vulnerability, pocgen, reports, packages, analysis, approach

Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are informal and often incomplete, and because it requires a detailed understanding of how inputs passed to potentially vulnerable APIs may reach security-relevant sinks. In this paper, we present PoCGen, a novel approach to autonomously generate and validate PoC exploits for vulnerabilities in npm packages. The approach is the first to address this task by combining the complementary strengths of large language models (LLMs), e.g., to understand informal vulnerability reports, with static analysis, e.g., to identify taint paths, and dynamic analysis, e.g., to validate generated exploits. PoCGen successfully generates exploits for 77% of the vulnerabilities in the SecBench.js dataset. This success rate significantly outperforms a recent baseline (by 45 absolute percentage points), while imposing an average cost of only $0.02 per generated exploit. Moreover, PoCGen generates six successful exploits for recent real-world vulnerabilities, five of which are now included in their respective vulnerability reports.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?
cs.CR 2026-05 unverdicted novelty 7.0

SEC-bench Pro benchmark with 183 real vulnerabilities shows frontier LLM coding agents achieve at most 38.8% success on SpiderMonkey and 32% on V8.

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning
cs.CR 2026-04 unverdicted novelty 7.0

LLMVD.js uses LLM agents to confirm 84% of taint-style vulnerabilities on public benchmarks (vs. <22% for prior tools) and generates validated exploits for 36 of 260 new packages (vs. ≤2 for traditional tools).

ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse
cs.SE 2026-05 unverdicted novelty 6.0

ContraFix couples differential runtime evidence from execution variants with reusable repair skills to achieve 84.0% resolution on SEC-Bench and 73.8% on PatchEval using GPT-5-mini, outperforming baselines at lower cost.

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs
cs.CR 2026-05 unverdicted novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.

AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection
cs.SE 2026-04 conditional novelty 6.0

AnyPoC introduces a multi-agent system for generating and validating PoC tests from LLM bug reports, producing 1.3x more valid PoCs, rejecting 9.8x more false positives, and discovering 122 new bugs across 12 major projects.

Program Analysis Guided LLM Agent for Proof-of-Concept Generation
cs.SE 2026-04 unverdicted novelty 6.0

PAGENT integrates static and dynamic program analysis guidance with an LLM agent to improve automated proof-of-concept generation success by 132% over prior agentic methods.

PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy
cs.CR 2026-04 unverdicted novelty 6.0

PoC-Adapt improves automated PoC exploit generation reliability by 25% and lowers cost using semantic state validation and RL adaptive policies, verifying 12 PoCs from 80 recent CVE attempts at $0.42 each.

Triggering and Detecting Exploitable Library Vulnerability from the Client by Directed Greybox Fuzzing
cs.CR 2026-04 conditional novelty 6.0

LiveFuzz extends directed greybox fuzzing with abstract path mapping and risk-based mutation to expose library vulnerabilities from client programs on a 61-case dataset, reaching more target paths and triggering three...

V2E: Validating Smart Contract Vulnerabilities through Profit-driven Exploit Generation and Execution
cs.SE 2026-04 unverdicted novelty 5.0

V2E automates PoC generation, triggerability and profitability validation, and iterative refinement using LLMs to confirm exploitable smart contract vulnerabilities, outperforming baselines on 264 labeled contracts.

A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection
cs.SE 2026-04 unverdicted novelty 5.0

Vulnsage, a multi-agent framework, generates 34.64% more exploits than prior tools and verified 146 zero-day vulnerabilities in real-world open-source libraries.