GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.
arXiv preprint arXiv:2402.10958 (2024)
6 Pith papers cite this work.
citing papers explorer
- Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
  GraphDPO generalizes pairwise DPO to a graph-structured Plackett-Luce objective over DAGs induced by rollout rankings, enforcing transitivity with linear complexity and recovering DPO as a special case.
- RL-RIG: A Generative Spatial Reasoner via Intrinsic Reflection
  RL-RIG uses a generate-reflect-edit loop with reinforcement learning to improve spatial accuracy in image generation, reporting up to 11% gains over prior open-source models on scene-graph metrics.
- Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment
  SSAG bypasses logit suppression in five LLMs to produce harmful responses at 95% success rate and 86% lower latency; VulMine reaches 77% attack success against defenses.
- Generating Place-Based Compromises Between Two Points of View
  Empathic similarity feedback in prompts generates more acceptable compromises than chain-of-thought, and margin-based training on the resulting data lets smaller models produce them without ongoing empathy estimation.
- Reward-Free Code Alignment from Pretrained or Fine-Tuned LLM: Unpacking the Trade-offs for Code Generation
  Empirical study on five LLMs finds pretrained-to-aligned paths yield bigger gains over baseline than finetuned-to-aligned paths, though absolute accuracy remains lower for pretrained starts.
- Intelligent Agents with Emotional Intelligence: Current Trends, Challenges, and Future Prospects
  A holistic survey of affective computing for intelligent agents covering emotion understanding via multimodal data, affective cognition, emotional expression synthesis, key challenges, and future directions emphasizing generative technologies.