A distributional reward model p(r|x,y) yields the closed-form effective reward \tilde r(x,y) = \beta \log \mathbb{E}_p[e^{r/\beta}] (pessimistic branch) that unifies prior RLHF aggregation heuristics under Bayesian or KL-DRO views.
Cited by 1 Pith paper (field: cs.LG; year: 2026; verdict: UNVERDICTED). Representative citing paper:
A Unifying Lens on Reward Uncertainty in RLHF
A distributional reward model p(r|x,y) yields the closed-form effective reward \tilde r(x,y) = \beta \log \mathbb{E}_p[e^{r/\beta}] (pessimistic branch) that unifies prior RLHF aggregation heuristics under Bayesian or KL-DRO views.
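
The closed-form effective reward in the summary can be estimated from samples of p(r|x,y). The following is a minimal sketch, not the paper's implementation: the function name `effective_reward`, the parameter name `beta` for the temperature, and the use of a plain Monte Carlo average over samples are all assumptions made for illustration. By Jensen's inequality, a positive temperature gives a value at or above the mean reward and a negative temperature gives a value at or below it; which sign convention the paper attaches to its "pessimistic branch" is not stated in the summary, so the sketch leaves the sign to the caller.

```python
import math


def effective_reward(samples, beta):
    """Estimate beta * log E_p[exp(r / beta)] from reward samples.

    `samples` is a hypothetical list of draws r ~ p(r | x, y) for one
    (prompt, response) pair; `beta` is the nonzero temperature.
    """
    if beta == 0:
        raise ValueError("beta must be nonzero")
    if not samples:
        raise ValueError("need at least one reward sample")

    # Scale the rewards, then subtract the maximum before exponentiating
    # (log-sum-exp trick) so that large |r / beta| does not overflow.
    scaled = [r / beta for r in samples]
    shift = max(scaled)
    mean_exp = sum(math.exp(z - shift) for z in scaled) / len(scaled)
    return beta * (shift + math.log(mean_exp))


if __name__ == "__main__":
    rewards = [0.0, 1.0, 2.0]  # illustrative values, not data from the paper
    for beta in (-1.0, 1.0, 1e6):
        print(beta, effective_reward(rewards, beta))
```

As |beta| grows the estimate approaches the sample mean, and for a reward distribution with no spread it returns that single reward regardless of `beta`, so the temperature controls how strongly uncertainty in p(r|x,y) shifts the effective reward.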