ADI equips AI debugging agents with function-level interaction via a new execution trace structure, raising SWE-bench Verified resolution to 63.8% at $1.28 per task and delivering 6-18% gains when added to existing agents.
Title resolution pending
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 5representative citing papers
False-positive bug reports in the Linux kernel consume effort comparable to real bugs and can be filtered by LLMs using retrieval-augmented generation at 88% F1.
TestGeneralizer generalizes an initial test into a set of executable tests covering more diverse scenarios, delivering +31.66% mutation-based and +23.08% LLM-assessed scenario coverage gains over ChatTester on 12 open-source Java projects.
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.
IntentionTest retrieves a reusable test from the project and edits it with an LLM to match a supplied validation intention, yielding tests that kill 28.1-37.6% more mutants, share 16.9-23.9% more coverage, and produce 23.7-49.0% more passing tests than four baselines on 3,680 cases.
citing papers explorer
-
Empowering Autonomous Debugging Agents with Efficient Dynamic Analysis
ADI equips AI debugging agents with function-level interaction via a new execution trace structure, raising SWE-bench Verified resolution to 63.8% at $1.28 per task and delivering 6-18% gains when added to existing agents.
-
Characterizing and Mitigating False-Positive Bug Reports in the Linux Kernel
False-positive bug reports in the Linux kernel consume effort comparable to real bugs and can be filtered by LLMs using retrieval-augmented generation at 88% F1.
-
Generalizing Test Cases for Comprehensive Test Scenario Coverage
TestGeneralizer generalizes an initial test into a set of executable tests covering more diverse scenarios, delivering +31.66% mutation-based and +23.08% LLM-assessed scenario coverage gains over ChatTester on 12 open-source Java projects.
-
Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.
-
Generating Project-Specific Test Cases with Requirement Validation Intention
IntentionTest retrieves a reusable test from the project and edits it with an LLM to match a supplied validation intention, yielding tests that kill 28.1-37.6% more mutants, share 16.9-23.9% more coverage, and produce 23.7-49.0% more passing tests than four baselines on 3,680 cases.