Augmenting LLM search judges with historical QRI cards improves Spearman correlation with user preferences by ~5% overall (91% relative on disagreements) and 15% in multilingual settings, with better alignment to live A/B test outcomes.
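Spearman correlation, the metric behind these numbers, measures how well the rank order of judge scores matches the rank order of user preference scores. Below is a minimal Python sketch, assuming each item carries one numeric judge score and one numeric user-preference score. Tied values receive the average of their ranks. The function names and the sample scores are hypothetical illustrations, not the paper's data or evaluation code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical scores for five query/result pairs (illustration only).
    user_pref = [4, 2, 5, 1, 3]
    judge_plain = [3, 3, 4, 1, 2]       # judge without extra context
    judge_with_cards = [4, 2, 5, 2, 3]  # judge given historical context
    print("plain judge:", spearman(user_pref, judge_plain))
    print("judge + cards:", spearman(user_pref, judge_with_cards))
```

Working on ranks rather than raw scores makes the metric insensitive to how a judge calibrates its scale. Only the ordering it assigns has to agree with users.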
3 Pith papers cite this work, all from 2026.
Representative citing papers:
- What Do People Actually Want From AI? Mapping Preference Plurality
  Open-ended preference data reveals substantial plurality in what people want from AI and divergent interpretations of shared values such as truthfulness.
- CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents
  CRPO modifies GRPO with three mechanisms (decoupling task and style rewards, adapting constraints to character complexity, and using generic responses as negative baselines) to improve character fidelity in role-playing agents.