MOA applies multi-objective RL with fine-grained rubrics and thought-augmented rollouts to role-playing agents, enabling an 8B model to match closed-source performance on PersonaGym and RoleMRC benchmarks.
Given a group of G rollouts and D dimensions, the rollouts will be optimized for D iterations.
1 Pith paper cites this work.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
MOA: Multi-Objective Alignment for Role-Playing Agents