Exploring the deep fusion of large language models and diffusion transformers for text-to-image synthesis

Bingda Tang, Boyang Zheng, Sayak Paul, Saining Xie · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

Using understanding tasks as direct supervision during post-training improves image generation and editing in unified multimodal models.

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision · cs.CV · 2026-05-07 · unverdicted · none · ref 45