
HunyuanImage 3.0 Technical Report

Siyu Cao, Hangting Chen, Peng Chen, Yiji Cheng, Yutao Cui, Xinchi Deng, Ying Dong, Kipper Gong, Tianpeng Gu, Xiusen Gu, et al. · 2025 · arXiv 2509.23951

17 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation

cs.CV · 2026-04-13 · unverdicted · novelty 8.0

The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

Banana100 dataset shows that none of 21 popular NR-IQA metrics consistently rate images degraded by 100 iterative edits lower than clean originals.

Qwen-Image-VAE-2.0 Technical Report

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Qwen-Image-VAE-2.0 achieves state-of-the-art high-compression image reconstruction and superior diffusability for diffusion models, with a new text-rich document benchmark.

D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

D-OPSD enables continuous supervised fine-tuning of few-step diffusion models via on-policy self-distillation where the model acts as both teacher (multimodal context) and student (text-only context) on its own roll-outs.

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

Refinement via Regeneration (RvR) reformulates image refinement in unified multimodal models as conditional regeneration using prompt and semantic tokens from the initial image, yielding higher alignment scores than editing-based methods.

Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

By requiring and using highly discriminative LLM text features, the work enables the first effective one-step text-conditioned image generation with MeanFlow.

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

SpatialEdit provides a benchmark, a large synthetic dataset, and a baseline model for precise object and camera spatial manipulations in images, with the model outperforming prior methods on spatial editing.

Gen-Searcher: Reinforcing Agentic Search for Image Generation

cs.CV · 2026-03-30 · unverdicted · novelty 6.0

Gen-Searcher is the first search-augmented image generation agent trained with SFT followed by agentic RL using dual text and image rewards on custom datasets and the KnowGen benchmark.

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.

InsHuman: Towards Natural and Identity-Preserving Human Insertion

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

InsHuman proposes Human-Background Adaptive Fusion, Face-to-Face ID-Preserving, and Bidirectional Data Pairing to enable natural human insertion in images without altering identity.

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

cs.CV · 2026-04-30 · unverdicted · novelty 5.0

Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

cs.CV · 2025-11-27 · unverdicted · novelty 5.0

Z-Image is an efficient 6B-parameter foundation model for image generation that rivals larger commercial systems in photorealism and bilingual text rendering through a new single-stream diffusion transformer and streamlined training.

Qwen-Image-2.0 Technical Report

cs.CV · 2026-05-11 · unverdicted · novelty 4.0

Qwen-Image-2.0 unifies high-fidelity image generation and precise editing by coupling Qwen3-VL with a Multimodal Diffusion Transformer, improving text rendering, photorealism, and complex prompt following over prior versions.

Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

cs.CV · 2026-04-21 · unverdicted · novelty 4.0 · 2 refs

Tstars-Tryon 1.0 is a deployed virtual try-on system claiming high robustness, photorealism, multi-reference flexibility, and near real-time speed for diverse fashion items.

Can Nano Banana 2 Replace Traditional Image Restoration Models? An Evaluation of Its Performance on Image Restoration Tasks

cs.CV · 2026-04-03 · unverdicted · novelty 4.0

Nano Banana 2 delivers competitive perceptual quality on image restoration but produces over-enhanced results that diverge from input fidelity in ways standard metrics miss.

Wan-Image: Pushing the Boundaries of Generative Visual Intelligence

cs.CV · 2026-04-21 · unverdicted · novelty 3.0

Wan-Image is a unified multi-modal system that integrates LLMs and diffusion transformers to deliver professional-grade image generation features including complex typography, multi-subject consistency, and precise editing, outperforming several prior models in human tests.

citing papers explorer

Showing 17 of 17 citing papers.

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation cs.CV · 2026-04-13 · unverdicted · none · ref 2
UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs cs.CV · 2026-04-17 · unverdicted · none · ref 7
Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro cs.CV · 2026-04-03 · unverdicted · none · ref 11
Qwen-Image-VAE-2.0 Technical Report cs.CV · 2026-05-13 · unverdicted · none · ref 3
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models cs.CV · 2026-05-06 · unverdicted · none · ref 6
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models cs.CV · 2026-04-28 · unverdicted · none · ref 5
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation cs.CV · 2026-04-20 · unverdicted · none · ref 49
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing cs.CV · 2026-04-06 · unverdicted · none · ref 9
Gen-Searcher: Reinforcing Agentic Search for Image Generation cs.CV · 2026-03-30 · unverdicted · none · ref 45
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture cs.CV · 2026-05-12 · unverdicted · none · ref 11
InsHuman: Towards Natural and Identity-Preserving Human Insertion cs.CV · 2026-05-08 · unverdicted · none · ref 7
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling cs.CV · 2026-04-30 · unverdicted · none · ref 8
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer cs.CV · 2025-11-27 · unverdicted · none · ref 8
Qwen-Image-2.0 Technical Report cs.CV · 2026-05-11 · unverdicted · none · ref 4
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items cs.CV · 2026-04-21 · unverdicted · none · ref 2 · 2 links
Can Nano Banana 2 Replace Traditional Image Restoration Models? An Evaluation of Its Performance on Image Restoration Tasks cs.CV · 2026-04-03 · unverdicted · none · ref 6
Wan-Image: Pushing the Boundaries of Generative Visual Intelligence cs.CV · 2026-04-21 · unverdicted · none · ref 4