Recognition: unknown
BugScope: Learn to Find Bugs Like Human
read the original abstract
Software auditing is an increasingly critical task in the era of rapid code generation. While LLM-based auditors have demonstrated strong potential, their effectiveness remains limited by misalignment with the highly complex, domain-specific nature of bug detection. In this work, we introduce BugScope, a framework that mirrors how human auditors learn specific bug patterns from representative examples and apply this knowledge during code auditing. BugScope structures auditing into three steps: seed identification, context retrieval, and bug detection, and aligns LLMs to each step by analyzing real bug reports and mutated examples, and distilling concise, reusable guidelines. On a curated dataset of 33 real-world bugs from 21 widely used open-source projects, BugScope achieves 86.05\% precision and 87.88\% recall, corresponding to an F1 score of 0.87. By comparison, leading industrial tools such as Claude Code (with Claude Opus 4.6) and Cursor BugBot achieve F1 scores of only 0.51 and 0.43, respectively. Beyond benchmarks, large-scale evaluation on real-world projects such as the Linux kernel uncovered 184 previously unknown bugs, of which 78 have already been fixed and 7 explicitly confirmed by developers. Our code is available at https://github.com/jinyaoguo/BugScope
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Veritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binaries
Veritas detects memory corruption vulnerabilities in stripped binaries by combining static value-flow slicing, dual-view LLM reasoning, and multi-agent runtime validation, reporting 90% recall, zero false positives on...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.