SelectiveRM applies optimal transport with a joint consistency discrepancy and partial mass relaxation to produce reward models that optimize a tighter upper bound on clean risk while autonomously dropping noisy preference samples.
Advances in Neural Information Processing Systems
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
RepFlow combines representation learning and conditional flow matching to estimate both point and distributional causal effects while mitigating selection bias via entropically regularized Wasserstein distance on normalized latent representations.
citing papers explorer
- Optimal Transport for LLM Reward Modeling from Noisy Preference
  SelectiveRM applies optimal transport with a joint consistency discrepancy and partial mass relaxation to produce reward models that optimize a tighter upper bound on clean risk while autonomously dropping noisy preference samples.
- RepFlow: Representation Enhanced Flow Matching for Causal Effect Estimation
  RepFlow combines representation learning and conditional flow matching to estimate both point and distributional causal effects while mitigating selection bias via entropically regularized Wasserstein distance on normalized latent representations.