4 Pith papers cite this work.
Citation summary:
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Roles: background (1)
Polarities: background (1)
Citing papers explorer
-
Evaluating LLM Agents on Automated Software Analysis Tasks
A custom LLM agent achieves 94% manually verified success on a new benchmark of 35 software analysis setups, outperforming baselines at 77%, but struggles with stage mixing, error localization, and overestimating its own success.
-
Guiding Symbolic Execution with Static Analysis and LLMs for Vulnerability Discovery
SAILOR combines static analysis and LLM-orchestrated synthesis to automatically generate symbolic execution harnesses, discovering 379 previously unknown memory-safety vulnerabilities across 10 large open-source C/C++ projects where the strongest baseline found only 12.
-
ENCRUST: Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation
ENCRUST decouples C-to-Rust translation via ABI wrappers and agentic refinement to reduce unsafe constructs across 15 real programs while preserving full test correctness.
-
FunFuzz: An LLM-Powered Evolutionary Fuzzing Framework
FunFuzz uses parallel LLM islands with candidate migration and adaptive prompting to achieve higher compiler coverage and more unique internal failures than prior LLM fuzzers on GCC and Clang over 24-hour runs.