Reward-free alignment for conflicting objectives

Peter L Chen, Xiaopeng Li, Xi Chen, Tianyi Lin · 2026 · arXiv 2602.02495

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Chebyshev Center-Based Direction Selection for Multi-Objective Optimization and Training PINNs

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Update direction selection for PINN training is cast as a Chebyshev-center problem in the dual cone, yielding an efficient dual formulation with nonconvex convergence guarantees and automatic recovery of scale robustness and simultaneous descent.

RVPO: Risk-Sensitive Alignment via Variance Regularization

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

RVPO penalizes variance across multiple reward signals during RLHF advantage aggregation, using a LogSumExp operator as a smooth variance penalty to reduce constraint neglect in LLM alignment.

citing papers explorer

Showing 2 of 2 citing papers.

Chebyshev Center-Based Direction Selection for Multi-Objective Optimization and Training PINNs cs.LG · 2026-05-11 · unverdicted · none · ref 27
RVPO: Risk-Sensitive Alignment via Variance Regularization cs.LG · 2026-05-07 · unverdicted · none · ref 24
