Zhang, Mark Harman, Lei Ma, and Yang Liu
3 Pith papers cite this work.
citing papers explorer
- Layer-Isolated Evaluation: Gating the Deterministic Scaffold of a Production LLM Agent with a No-LLM, Regression-Locked Test Harness
Layer-isolated evaluation decomposes LLM agents into per-layer deterministic no-LLM test slices whose locked baselines localize regressions that aggregate pass rates mask.
- Comparing ML-Specific and General Python Code Smells Across Project Characteristics
ML-specific code smells occur 41-94 times less often than general Python smells in 279 projects, with associations to commit frequency and domain but none for general smells or most other project characteristics.
- Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap
A research roadmap analyzing the current state of search-based software engineering with foundation models, outlining challenges and directions across three integration aspects.