arXiv preprint arXiv:2203.09168
3 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Variance-aware Reward Modeling with Anchor Guidance
  Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
- Probabilistic denoising for reliable signal extraction in spectroscopy
  A probabilistic denoising model recovers spectral features from Poisson-noisy 3D ARPES data at 0.02 electrons per voxel and propagates uncertainties into superconducting gap fits for cuprate superconductors.
- Monte Carlo PDE Solvers for Nonlinear Radiative Boundary Conditions
  A relaxed Picard iteration plus heteroscedastic boundary denoising lets Monte Carlo PDE solvers solve heat equations with nonlinear radiation boundary conditions more accurately than linearization.