//arxiv.org/abs/2304.07590, 2304.07590
3 Pith papers cite this work.
citing papers explorer
- Social Bias in LLM-Generated Code: Benchmark and Mitigation
  LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.
- How Many Tries Does It Take? Iterative Self-Repair in LLM Code Generation Across Model Scales and Benchmarks
  Iterative self-repair improves LLM code pass rates by 4.9-17.1 pp on HumanEval and 16-30 pp on MBPP across seven models, with gains concentrated early and syntax errors easier to fix than logical ones.
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation