LLM agents resolve fewer than half of issues while satisfying design constraints despite passing tests, as shown by a benchmark of 495 issues and 1787 constraints from six repositories.
1999.Mathematical methods of statistics
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
method 1
citation-polarity summary
fields
cs.SE 2years
2026 2verdicts
UNVERDICTED 2roles
method 1polarities
use method 1representative citing papers
Case study of 18,020 Kubernetes PRs shows label-diff congruence is prevalent and stable, with higher congruence linked to fewer review participants among core developers and more among one-time contributors.
citing papers explorer
-
Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution
LLM agents resolve fewer than half of issues while satisfying design constraints despite passing tests, as shown by a benchmark of 495 issues and 1787 constraints from six repositories.
-
Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes
Case study of 18,020 Kubernetes PRs shows label-diff congruence is prevalent and stable, with higher congruence linked to fewer review participants among core developers and more among one-time contributors.