Causal mediation analysis shows harmful LLM outputs arise in late layers from MLP failures and gating neurons, with early layers handling harm context detection and signal propagation.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
Fine-tuning and prompting reduce some CWEs in AI-generated code but frequently introduce new weaknesses, with no strategy working reliably across models or languages.
Empirical evaluation shows that code generated by all seven tested LLMs contains vulnerabilities, the majority of critical or high severity.
Empirical analysis of AI refactoring PRs shows quality attribute improvements in 22.5% of cases with new Pylint issues in 24.17% and Bandit findings in 4.7%, yet 73.5% developer acceptance.
citing papers explorer
-
Why Do Large Language Models Generate Harmful Content?
Causal mediation analysis shows harmful LLM outputs arise in late layers from MLP failures and gating neurons, with early layers handling harm context detection and signal propagation.
-
On Fixing Insecure AI-Generated Code through Model Fine-Tuning and Prompting Strategies
Fine-tuning and prompting reduce some CWEs in AI-generated code but frequently introduce new weaknesses, with no strategy working reliably across models or languages.
-
Security of LLM-generated Code: A Comparative Analysis
Empirical evaluation shows that code generated by all seven tested LLMs contains vulnerabilities, the majority of critical or high severity.
-
Quality and Security Signals in AI-Generated Python Refactoring Pull Requests
Empirical analysis of AI refactoring PRs shows quality attribute improvements in 22.5% of cases with new Pylint issues in 24.17% and Bandit findings in 4.7%, yet 73.5% developer acceptance.