Using a corpus of 5542 fault-injected traces from 38 DL programs, the study finds a 0.19 balanced accuracy gap in fault diagnosis between within-program and cross-program evaluation caused by program-specific feature structures.
An empirical comparison of model validation techniques for defect prediction models,
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.SE 2representative citing papers
Empirical analysis of 338 PRs with self-admitted ChatGPT usage shows low full integration (median 25%), selective adaptation patterns, and broader influence on developer reasoning during reviews.
citing papers explorer
-
Evaluation-Strategy Gap in Fault Diagnosis of Deep Learning Programs
Using a corpus of 5542 fault-injected traces from 38 DL programs, the study finds a 0.19 balanced accuracy gap in fault diagnosis between within-program and cross-program evaluation caused by program-specific feature structures.
-
PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes
Empirical analysis of 338 PRs with self-admitted ChatGPT usage shows low full integration (median 25%), selective adaptation patterns, and broader influence on developer reasoning during reviews.