Rethinking cross-modal interaction in multimodal diffusion transformers
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Representative citing papers
- Premier: Personalized Preference Modulation with Learnable User Embedding in Text-to-Image Generation
Premier learns user-specific embeddings to modulate text-to-image generation, outperforming prior methods on preference alignment, text consistency, and expert ratings even with limited history.
- Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers