USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning. arXiv preprint.
6 Pith papers cite this work. Polarity classification is still indexing.
Citation-role and citation-polarity summary
Fields: cs.CV (6)
Roles: background (1)
Polarities: background (1)
Citing papers
- Scaling Multi-Reference Image Generation with Dynamic Reward Optimization
  Introduces the OmniRef-Bench benchmark and DyRef, a two-stage framework that uses Difficulty-aware Advantage Reweighting and Discriminative Reward Scaling to improve open-source models on complex multi-reference image generation.
- Lance: Unified Multimodal Modeling by Multi-Task Synergy
  Presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while retaining strong understanding performance.
- UniCustom: Unified Visual Conditioning for Multi-Reference Image Generation
  Proposes a unified visual conditioning approach that fuses semantic and appearance features before VLM processing, with two-stage training and slot-wise regularization, to improve consistency in multi-reference image generation.
- Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition
  Introduces the Fashion130K dataset and the UMC framework, which aligns text and visual prompts to generate more consistent fashion outfits than prior state-of-the-art methods.
- FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
  Proposes community LoRA mining combined with attention and frequency disentanglement to enable scalable style-content dual-reference generation with reduced leakage.