EvalMuse-40K: A reliable and fine-grained benchmark with comprehensive human annotations for text-to-image generation model evaluation

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: dataset 2

fields: cs.CV 4

years: 2025 3, 2026 1

verdicts: unverdicted 4

polarities: use dataset 2

representative citing papers

Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

cs.CV · 2025-03-10 · unverdicted · novelty 6.0

Seedream 2.0 is a native Chinese-English bilingual diffusion model that integrates a self-developed LLM text encoder, Glyph-Aligned ByT5, and Scaled ROPE to reach claimed state-of-the-art results in prompt following, aesthetics, text rendering, and human preference alignment via RLHF.

Seedream 3.0 Technical Report

cs.CV · 2025-04-15 · unverdicted · novelty 4.0

Seedream 3.0 improves bilingual image generation through doubled defect-aware data, mixed-resolution training, cross-modality RoPE, representation alignment, aesthetic SFT, VLM reward modeling, and importance-aware timestep sampling for 4-8x faster inference at up to 2K resolution.

citing papers explorer

Showing 4 of 4 citing papers.

  • Unified Reward Model for Multimodal Understanding and Generation cs.CV · 2025-03-07 · unverdicted · none · ref 32

    UnifiedReward is the first unified reward model that jointly assesses multimodal understanding and generation to provide better preference signals for aligning vision models via DPO.

  • Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model cs.CV · 2025-03-10 · unverdicted · none · ref 8

    Seedream 2.0 is a native Chinese-English bilingual diffusion model that integrates a self-developed LLM text encoder, Glyph-Aligned ByT5, and Scaled ROPE to reach claimed state-of-the-art results in prompt following, aesthetics, text rendering, and human preference alignment via RLHF.

  • BEiTScore: Reference-free Image Captioning Evaluation with an Efficient Cross-Encoder Model cs.CV · 2026-05-20 · unverdicted · none · ref 8

    BEiTScore is a new efficient cross-encoder metric for reference-free image captioning evaluation that achieves state-of-the-art results on detailed caption benchmarks through VQA initialization and adversarial LLM augmentations.

  • Seedream 3.0 Technical Report cs.CV · 2025-04-15 · unverdicted · none · ref 7

    Seedream 3.0 improves bilingual image generation through doubled defect-aware data, mixed-resolution training, cross-modality RoPE, representation alignment, aesthetic SFT, VLM reward modeling, and importance-aware timestep sampling for 4-8x faster inference at up to 2K resolution.