Empirical study shows reward model ensembles mitigate biases like brightness and composition in preference data for image inpainting, yielding better performance than prior methods without architecture changes.
Preference alignment on diffusion model: A comprehensive survey for image generation and editing
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.
citing papers explorer
-
Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting
Empirical study shows reward model ensembles mitigate biases like brightness and composition in preference data for image inpainting, yielding better performance than prior methods without architecture changes.
-
Reinforcement Learning for Scalable and Trustworthy Intelligent Systems
Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.