UnifiedReward is the first unified reward model that jointly assesses multimodal understanding and generation to provide better preference signals for aligning vision models via DPO.
arXiv preprint arXiv:2412.18150 (2024)
4 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 4
verdicts
UNVERDICTED 4
roles
dataset 2
polarities
use dataset 2
representative citing papers
Seedream 2.0 is a native Chinese-English bilingual diffusion model that integrates a self-developed LLM text encoder, Glyph-Aligned ByT5, and Scaled ROPE to reach claimed state-of-the-art results in prompt following, aesthetics, text rendering, and human preference alignment via RLHF.
BEiTScore is a new efficient cross-encoder metric for reference-free image captioning evaluation that achieves state-of-the-art results on detailed caption benchmarks through VQA initialization and adversarial LLM augmentations.
Seedream 3.0 improves bilingual image generation through doubled defect-aware data, mixed-resolution training, cross-modality RoPE, representation alignment, aesthetic SFT, VLM reward modeling, and importance-aware timestep sampling for 4-8x faster inference at up to 2K resolution.
citing papers explorer
- BEiTScore: Reference-free Image Captioning Evaluation with an Efficient Cross-Encoder Model