A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 2roles
background 1polarities
background 1representative citing papers
CodeQ aggregates token rationales into code categories to enable global interpretability of LLMs, claiming over 50% entropy reduction and revealing model preference for syntactic cues plus human misalignment in a 37-person study.
citing papers explorer
-
Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code
A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.
-
Enabling Global, Human-Centered Explanations for LLMs:From Tokens to Interpretable Code and Test Generation
CodeQ aggregates token rationales into code categories to enable global interpretability of LLMs, claiming over 50% entropy reduction and revealing model preference for syntactic cues plus human misalignment in a 37-person study.