AgentSZZ is an LLM-agent framework that identifies bug-inducing commits with up to 27.2% higher F1 scores than prior methods by enabling adaptive exploration and causal tracing, especially for cross-file and ghost commits.
Eshkevari, Davood Mazinanian, and Danny Dig
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.SE (4)
years: 2026 (4)
roles: method (1)
polarities: use method (1)
representative citing papers
AgenticSZZ reframes bug-inducing commit identification as temporal knowledge graph search navigated by an LLM agent, reporting F1 scores of 0.47-0.79 and up to 34% improvement over prior SZZ methods on three datasets.
A two-stage LLM pipeline for taxonomy-based labeling of code changes in patches achieves up to 84% recall and 81% precision on a manually curated benchmark of natural and synthetic patches.
ARGUS extracts fragmented code change rationales from multiple documents using LLMs and generates summaries that developers rate as useful for review and maintenance.
citing papers explorer
- AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits
  AgentSZZ is an LLM-agent framework that identifies bug-inducing commits with up to 27.2% higher F1 scores than prior methods by enabling adaptive exploration and causal tracing, especially for cross-file and ghost commits.
- AgenticSZZ: Temporal Knowledge Graph-Guided Agentic Bug-Inducing Commit Identification
  AgenticSZZ reframes bug-inducing commit identification as temporal knowledge graph search navigated by an LLM agent, reporting F1 scores of 0.47-0.79 and up to 34% improvement over prior SZZ methods on three datasets.
- Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models
  A two-stage LLM pipeline for taxonomy-based labeling of code changes in patches achieves up to 84% recall and 81% precision on a manually curated benchmark of natural and synthetic patches.
- Fine-grained Multi-Document Extraction and Generation of Code Change Rationale
  ARGUS extracts fragmented code change rationales from multiple documents using LLMs and generates summaries that developers rate as useful for review and maintenance.