Illume+: Illuminating unified mllm with dual visual tokenization and diffusion refinement

Runhui Huang, Chunwei Wang, Junwei Yang, Guansong Lu, Yunlong Yuan, Jianhua Han, Lu Hou, Wei Zhang, Lanqing Hong, Hengshuang Zhao, Hang Xu · 2025 · arXiv 2504.01934

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.

SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment

cs.CV · 2026-05-09 · unverdicted · novelty 5.0

SynerMedGen introduces generation-aligned understanding tasks and a two-stage training strategy that enables strong zero-shot medical image synthesis performance and outperforms specialized models when generation training is added.

Show-o2: Improved Native Unified Multimodal Models

cs.CV · 2025-06-18 · unverdicted · novelty 4.0

Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.

What Semantics Survive the Connector? Diagnosing VLM-to-DiT Alignment in Video Editing

cs.CV · 2026-05-20

citing papers explorer

Showing 4 of 4 citing papers.

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture cs.CV · 2026-05-12 · unverdicted · none · ref 54
SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.
SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment cs.CV · 2026-05-09 · unverdicted · none · ref 60
SynerMedGen introduces generation-aligned understanding tasks and a two-stage training strategy that enables strong zero-shot medical image synthesis performance and outperforms specialized models when generation training is added.
Show-o2: Improved Native Unified Multimodal Models cs.CV · 2025-06-18 · unverdicted · none · ref 46
Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.
What Semantics Survive the Connector? Diagnosing VLM-to-DiT Alignment in Video Editing cs.CV · 2026-05-20 · unreviewed · ref 7

Illume+: Illuminating unified mllm with dual visual tokenization and diffusion refinement

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer