Safety alignment reduces to density ratio matching. This enables Bregman Safety Optimization (BSO) losses that provably recover the optimal safe policy and include prior methods as special cases.
1 Pith paper cites this work.
Fields: cs.LG (1); Years: 2026 (1); Verdicts: unverdicted (1)
Representative citing paper:
BSO: Safety Alignment Is Density Ratio Matching