Preference Ranking Optimization for Human Alignment
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers
- Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
  GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.
- Environment-Adaptive Preference Optimization for Wildfire Prediction
  EAPO adapts wildfire models to new environments via k-nearest neighbor data retrieval and hybrid fine-tuning that emphasizes rare extreme events, achieving ROC-AUC 0.7310 on real data.
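The GraphDPO summary above rests on a standard identity: the Plackett-Luce likelihood of a ranked list is a product of softmaxes over successive suffixes, and with only two items it collapses to the Bradley-Terry/DPO log-sigmoid term. A minimal sketch of that listwise log-likelihood (not the paper's code; the function name and scores are hypothetical):

```python
import math

def plackett_luce_loglik(scores):
    """Log-likelihood of a ranking under the Plackett-Luce model.

    `scores` are (implicit-reward) scores ordered best-first; the ranked
    list is what induces the preference DAG. Hypothetical helper, not
    GraphDPO's implementation.
    """
    total = 0.0
    for i in range(len(scores) - 1):
        # log-softmax of item i over the remaining suffix scores[i:]
        denom = math.log(sum(math.exp(s) for s in scores[i:]))
        total += scores[i] - denom
    return total

# With exactly two items this reduces to the pairwise DPO term
# log sigmoid(s_w - s_l):
pair = plackett_luce_loglik([2.0, 0.0])
dpo = math.log(1.0 / (1.0 + math.exp(-(2.0 - 0.0))))
# pair == dpo (up to floating point)
```

The suffix structure is also why a single ranking of k rollouts yields all O(k^2) pairwise preferences while the objective itself stays linear in k.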
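The retrieval step the EAPO summary mentions can be sketched generically: select the k source-domain samples closest to the target environment in feature space, then fine-tune on them. A minimal Euclidean k-nearest-neighbor sketch, not the paper's pipeline; all names and feature values are hypothetical:

```python
import math

def k_nearest(query, dataset, k=3):
    """Return the k samples in `dataset` closest to `query`.

    Plain Euclidean distance over environmental feature vectors
    (e.g. normalized temperature, humidity). Generic retrieval
    sketch, not EAPO's exact method.
    """
    def dist(row):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(row, query)))
    return sorted(dataset, key=dist)[:k]

# Retrieve the 2 source samples most similar to a new environment.
source = [(0.1, 0.9), (0.8, 0.2), (0.15, 0.85), (0.5, 0.5)]
nearest = k_nearest((0.12, 0.88), source, k=2)
```

In a real pipeline the retrieved subset would then be reweighted toward rare extreme events before fine-tuning, which is the "hybrid" part of the summary.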