The Annals of Statistics
4 Pith papers cite this work.
years: 2026 (4)
verdicts: UNVERDICTED (4)
citing papers explorer
- Variance-aware Reward Modeling with Anchor Guidance
Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
- Multi-Fidelity Quantile Regression
A model-agnostic two-stage estimator links high-fidelity quantiles to low-fidelity ones via a covariate-dependent level function for faster convergence and better accuracy with limited high-fidelity data.
- Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation
Anchor-Centric Adaptation escapes the diversity trap by prioritizing repeated demonstrations at core anchors over broad coverage, yielding higher success rates under fixed data budgets in robotic manipulation.
- Empirical Bernstein Confidence Intervals for Kernel Smoothers: A Safe and Sharp Way to Exhaust Assumed Smoothness
Empirical Bernstein confidence intervals for kernel smoothers attain nominal coverage up to a remainder of order n to the minus 2S over 2S+1 while achieving minimax optimal widths under S-th order local smoothness.