Ivan Stelmakh, Yi Luan, Bhuwan Dhingra, and Ming-Wei Chang
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 2
citing papers explorer
- Reasoning about Intent for Ambiguous Requests
Reinforcement learning with a dual recall-precision reward trains models to enumerate valid interpretations and answers for ambiguous inputs using only multiple-answer supervision.
- Discriminatory Compliance: How LLMs Answer Queries from Protected Groups
State-of-the-art LLMs respond inconsistently to queries from protected-group personas, with some responses omitting key information that should be provided.