Recursive generative retraining with heterogeneous rewards converges to a stable distribution satisfying a weighted Nash bargaining solution, preserving diversity under stated conditions.
arXiv preprint arXiv:2405.16455
3 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background 1
Verdicts: UNVERDICTED 3
Polarities: background 1
Representative citing papers:
A gamified system with multiple LLM agents of varied personalities gathers interaction data to produce more effective and interpretable Big Five personality assessments than single-context methods.
PAFO applies Pareto fairness optimization and group-specialized distillation to produce a single personalized reward model that improves accuracy for both majority and minority preference groups without requiring group labels at inference.