In: Proceedings of the 38th International Conference on Software Engineering (ICSE 2016)
5 Pith papers cite this work.
Field: cs.SE (5 citing papers)
Citing papers
- Social Bias in LLM-Generated Code: Benchmark and Mitigation
  LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.
- ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents
  ContractSkill converts draft web agent skills into explicit executable contracts that enable deterministic verification, fault localization, and minimal local repair, improving stability on benchmarks like VisualWebArena.
- Ranking Plausible Patches by Historic Feature Frequencies
  PrevaRank ranks plausible patches produced by automated program repair (APR) tools by their similarity to features of historic fixes, placing correct fixes in the top ranks more often on Defects4J bugs.
- CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
  CodeXGLUE supplies a standardized collection of 10 code-related tasks, 14 datasets, an evaluation platform, and BERT-, GPT-, and encoder-decoder-style baselines.
- An Empirical Study of API Misuses of Data-Centric Libraries
  API misuses in data-centric libraries share key characteristics with deep learning misuses and occur regardless of whether documentation directives are present.
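The PrevaRank entry in the list above describes ranking candidate patches by how closely their features resemble those of historic fixes. The summary does not say which features PrevaRank extracts or how it scores similarity, so the following is only a minimal hypothetical sketch of the general idea. It makes several assumptions: each fix or patch is reduced to a list of change-feature labels (the names `add_null_check` and similar are invented for illustration), the historic feature frequencies are aggregated into a single profile, and patches are scored by cosine similarity to that profile. It is not the paper's implementation.

```python
from collections import Counter
from math import sqrt

def historic_frequencies(historic_fixes):
    """Aggregate feature counts over a corpus of past fixes (hypothetical feature labels)."""
    totals = Counter()
    for fix_features in historic_fixes:
        totals.update(fix_features)
    return totals

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_patches(patches, historic_fixes):
    """Return patch ids ordered by similarity to the historic feature profile, highest first."""
    profile = historic_frequencies(historic_fixes)
    scored = [(cosine(Counter(features), profile), pid) for pid, features in patches.items()]
    # Sort by descending score; ties are broken by patch id for a deterministic order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pid for _, pid in scored]

# Usage with made-up data: null checks dominate the historic fixes.
historic = [
    ["add_null_check", "change_condition"],
    ["add_null_check"],
    ["change_condition", "add_null_check"],
    ["delete_statement"],
]
patches = {
    "p1": ["delete_statement"],
    "p2": ["add_null_check"],
    "p3": ["add_null_check", "change_condition"],
}
print(rank_patches(patches, historic))  # ['p3', 'p2', 'p1']
```

Cosine similarity is used here only because it is a simple, scale-independent way to compare a patch's feature vector against an aggregate frequency profile; a real ranker could weight, normalize, or select features quite differently.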