GW-DPO with bilateral weighting improves macro pairwise priority adherence on Llama-3.1-8B-Instruct over standard DPO while halving over-refusal rates.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization