Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Proposes a novel semi-supervised estimator for risk prediction under double censoring that combines limited gold-standard labels with large-scale surrogates, proves theoretical validity, and shows efficiency gains over supervised methods in simulations and a T2D EHR application.
citing papers explorer
- Variance-aware Reward Modeling with Anchor Guidance
Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
- Semi-supervised Method for Risk Prediction with Doubly Censored EHR Data
Proposes a novel semi-supervised estimator for risk prediction under double censoring that combines limited gold-standard labels with large-scale surrogates, proves theoretical validity, and shows efficiency gains over supervised methods in simulations and a T2D EHR application.