UnifiedReward is the first unified reward model that jointly assesses multimodal understanding and generation to provide better preference signals for aligning vision models via DPO.
Evalmuse-40k: A reliable and fine-grained benchmark with comprehensive human annotations for text-to-image generation model evaluation
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 4verdicts
UNVERDICTED 4roles
dataset 2polarities
use dataset 2representative citing papers
Seedream 2.0 is a native Chinese-English bilingual diffusion model that integrates a self-developed LLM text encoder, Glyph-Aligned ByT5, and Scaled ROPE to reach claimed state-of-the-art results in prompt following, aesthetics, text rendering, and human preference alignment via RLHF.
BEiTScore is a new efficient cross-encoder metric for reference-free image captioning evaluation that achieves state-of-the-art results on detailed caption benchmarks through VQA initialization and adversarial LLM augmentations.
Seedream 3.0 improves bilingual image generation through doubled defect-aware data, mixed-resolution training, cross-modality RoPE, representation alignment, aesthetic SFT, VLM reward modeling, and importance-aware timestep sampling for 4-8x faster inference at up to 2K resolution.
citing papers explorer
-
Unified Reward Model for Multimodal Understanding and Generation
UnifiedReward is the first unified reward model that jointly assesses multimodal understanding and generation to provide better preference signals for aligning vision models via DPO.
-
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
Seedream 2.0 is a native Chinese-English bilingual diffusion model that integrates a self-developed LLM text encoder, Glyph-Aligned ByT5, and Scaled ROPE to reach claimed state-of-the-art results in prompt following, aesthetics, text rendering, and human preference alignment via RLHF.
-
BEiTScore: Reference-free Image Captioning Evaluation with an Efficient Cross-Encoder Model
BEiTScore is a new efficient cross-encoder metric for reference-free image captioning evaluation that achieves state-of-the-art results on detailed caption benchmarks through VQA initialization and adversarial LLM augmentations.
-
Seedream 3.0 Technical Report
Seedream 3.0 improves bilingual image generation through doubled defect-aware data, mixed-resolution training, cross-modality RoPE, representation alignment, aesthetic SFT, VLM reward modeling, and importance-aware timestep sampling for 4-8x faster inference at up to 2K resolution.