Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks
The two main benchmarks for LLM-instructed code editing over-represent Python, miss common real-world domains and edit types, and have test-coverage issues that limit what they measure.
7 Pith papers cite this work.
fields: cs.SE (7)
years: 2026
7 representative citing papers
citing papers explorer
-
Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks
The two main benchmarks for LLM-instructed code editing over-represent Python, miss common real-world domains and edit types, and have test-coverage issues that limit what they measure.
-
ReproBreak: A Dataset of Reproducible Web Locator Breaks
ReproBreak provides 449 verified locator breaks from real web test commits along with scripts to reproduce them automatically.
-
LLM-Assisted Empirical Software Engineering: Systematic Literature Review and Research Agenda
A systematic review of 50 studies identifies 69 LLM-assisted tasks in empirical software engineering, concentrated in data processing and analysis with gaps in human-centered integration and reproducibility reporting.
-
Exploring Creativity in Human-Human-LLM Collaborative Software Design
Creativity in human-LLM collaborative software design emerges primarily from human traits and interactions, with LLMs providing supplementary novel ideas but occasionally hindering progress.
-
The Impact of Documentation on Test Engagement in Pull Requests in OSS
Across 160 OSS repositories, testing documentation shows a weak positive correlation (ρ = 0.36) with test-engagement ratios in pull requests, strengthening to moderate in high-activity repositories.
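As a hypothetical illustration of the statistic this summary cites (not the authors' code, and toy data rather than the paper's dataset), Spearman's ρ between a per-repository documentation score and a test-engagement ratio is just the Pearson correlation of the two variables' ranks:

```python
# Toy sketch: Spearman's rho between a per-repo documentation score
# and its test-engagement ratio. Variable names and data are invented.
from statistics import mean

def ranks(xs):
    # 1-based ranks, averaging ranks over ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Pearson correlation computed on the rank vectors.
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

doc_score = [0.1, 0.4, 0.2, 0.8, 0.5, 0.9]     # hypothetical per-repo scores
engagement = [0.05, 0.30, 0.20, 0.55, 0.25, 0.70]
rho = spearman(doc_score, engagement)           # ~0.94 on this toy data
```

A ρ of 0.36, as the paper reports, indicates a much weaker monotone association than this toy example.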
-
Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts
STAF applies sentence-transformer embeddings to classify static code analysis (SCA) findings as actionable or not, reaching 89% F1 and beating prior filters by 11% within projects and by 6% across projects.
-
Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning
MNAL reduces human effort in bug report labeling by up to 95.8% for readability and 196% for identifiability while improving identification performance and working with various neural models.
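MNAL's mutualistic algorithm is not described here, but pool-based active learning with uncertainty sampling, the broad family such labeling-effort-reduction methods belong to, can be sketched. Everything below (the 1-D features, the nearest-centroid "model", the fixed budget) is an illustrative assumption, not the paper's method:

```python
# Toy sketch: pool-based active learning with uncertainty sampling.
# Illustrates the general idea of cutting human labeling effort, not
# MNAL's actual mutualistic algorithm.

def train(labeled):
    # Nearest-centroid model: decision threshold halfway between the
    # means of the positive and negative examples (needs one of each).
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def active_learn(pool, oracle, seed, budget):
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(budget):
        t = train(labeled)
        # Query the example the current model is least certain about,
        # i.e. the one closest to the decision threshold.
        q = min(unlabeled, key=lambda x: abs(x - t))
        unlabeled.remove(q)
        labeled.append((q, oracle(q)))
    return train(labeled)

pool = [i / 20 for i in range(21)]   # candidate reports as 1-D scores
oracle = lambda x: int(x > 0.47)     # stand-in for the human labeler
threshold = active_learn(pool, oracle, seed=[0.0, 1.0], budget=6)
```

With only 8 of 21 labels queried, the learned threshold lands close to the hidden labeling boundary; asking the human only about the most uncertain examples is what drives the effort reduction that papers like this one report.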