Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Abstract
We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. The system comprises two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint. The shape generation model, built on a scalable flow-based diffusion transformer, creates geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution, vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio -- a versatile, user-friendly production platform that streamlines the re-creation of 3D assets, allowing both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, both open-source and closed-source, in geometry detail, condition alignment, and texture quality. Hunyuan3D 2.0 is publicly released to help fill the gap in the open-source 3D community for large-scale foundation generative models. The code and pre-trained weights of our models are available at: https://github.com/Tencent/Hunyuan3D-2
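Since the release is public, a minimal usage sketch of the two foundation components follows. It is based on the example in the repository README (https://github.com/Tencent/Hunyuan3D-2); the module paths, class names, and the demo image path are taken from that README and may change between releases, so treat them as assumptions rather than a stable API.

```python
# Minimal sketch of the two-stage Hunyuan3D 2.0 pipeline, following the
# repository README; names and paths below may differ across releases.
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
from hy3dgen.texgen import Hunyuan3DPaintPipeline

# Stage 1 -- Hunyuan3D-DiT: flow-matching shape generation conditioned on a
# single image; the pipeline returns a list of untextured trimesh meshes.
shape_pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(
    'tencent/Hunyuan3D-2')
mesh = shape_pipeline(image='assets/demo.png')[0]  # placeholder input image

# Stage 2 -- Hunyuan3D-Paint: texture synthesis for the generated (or any
# hand-crafted) mesh, conditioned on the same image.
paint_pipeline = Hunyuan3DPaintPipeline.from_pretrained('tencent/Hunyuan3D-2')
textured_mesh = paint_pipeline(mesh, image='assets/demo.png')

# Export the textured asset (trimesh supports .glb/.obj export).
textured_mesh.export('demo_textured.glb')
```

The sketch mirrors the abstract's division of labor: geometry first, then texture, which is why Hunyuan3D-Paint can also be applied on its own to hand-crafted meshes.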
Forward citations
Cited by 26 Pith papers
-
OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction
OmniFit uses a conditional transformer decoder to predict dense body landmarks from multi-modal inputs for scale-agnostic SMPL-X fitting, outperforming prior methods by 57-81% and reaching millimeter accuracy on CAPE ...
-
Geometrically Consistent Multi-View Scene Generation from Freehand Sketches
A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in re...
-
Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors
A video generation approach conditions a base model with multi-scale 3D latent features and a cross-attention adapter to produce geometrically realistic and consistent orbital videos from one image.
-
Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale
A 3D-grounded autoencoder and diffusion transformer allow direct generation of 3D scenes in an implicit latent space using a fixed 1K-token representation for arbitrary views and resolutions.
-
GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos
GenLCA enables scalable training of a 3D diffusion model for photorealistic, animatable full-body avatars by tokenizing large-scale real-world videos with a pretrained reconstructor and applying visibility-aware diffu...
-
InverseDraping: Recovering Sewing Patterns from 3D Garment Surfaces via BoxMesh Bridging
A two-stage autoregressive framework centered on BoxMesh recovers parametric sewing patterns from 3D garment surfaces, claiming state-of-the-art results on benchmarks and generalization to real scans and single-view images.
-
TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation
TOPOS creates high-fidelity 3D heads with fixed industry topology from single images via a specialized VAE with Perceiver Resampler and a rectified flow transformer.
-
Pixal3D: Pixel-Aligned 3D Generation from Images
Pixal3D performs pixel-aligned 3D generation from images via back-projected multi-scale feature volumes, achieving fidelity close to reconstruction while supporting multi-view and scene synthesis.
-
GenMed: A Pairwise Generative Reformulation of Medical Diagnostic Tasks
GenMed uses diffusion models to capture P(X,Y) for medical tasks and performs inference via gradient-based test-time optimization, supporting arbitrary observation combinations without retraining.
-
DVD: Discrete Voxel Diffusion for 3D Generation and Editing
DVD treats voxel occupancy as a discrete variable in a diffusion framework to generate, assess, and edit sparse 3D voxels without continuous thresholding.
-
Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
DiLAST optimizes 3D latents via guidance from a 2D diffusion model to enable generalizable style transfer for OOD styles in 3D asset generation.
-
CADFit: Precise Mesh-to-CAD Program Generation with Hybrid Optimization
CADFit recovers complex editable CAD construction sequences from meshes via IoU-driven hybrid optimization and outperforms prior mesh-to-CAD methods on volumetric IoU, Chamfer Distance, and invalid program ratio.
-
3D-ReGen: A Unified 3D Geometry Regeneration Framework
3D-ReGen is a conditioned 3D regenerator using VecSet that learns a regeneration prior from unlabeled 3D datasets via self-supervised tasks and achieves state-of-the-art results on controllable 3D geometry tasks.
-
Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers
Sculpt4D generates temporally coherent 4D shapes by integrating a block-sparse attention mechanism with a time-decaying mask into a pretrained 3D diffusion transformer, achieving SOTA results with 56% less computation.
-
MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling
MetaEarth3D is the first generative foundation model for spatially consistent, unbounded 3D scene generation at planetary scale using optical Earth observation data.
-
PostureObjectStitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture, and RGB components, modulating them temporally in a diffusion model, and applying conditional ...
-
Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data
BVE framework enables text-guided 3D editing beyond voxel limits by combining self-constructed data, lightweight semantic injection, and annotation-free masking to preserve local invariance.
-
Pair2Scene: Learning Local Object Relations for Procedural Scene Generation
Pair2Scene generates complex 3D scenes beyond training data by training a network on local object-pair placement rules and applying them recursively with collision-aware sampling.
-
AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly
AssemLM uses a specialized point cloud encoder inside a multimodal LLM to reach state-of-the-art 6D pose prediction for assembly tasks, backed by a new 900K-sample benchmark called AssemBench.
-
UniRecGen: Unifying Multi-View 3D Reconstruction and Generation
UniRecGen unifies reconstruction and generation via shared canonical space and disentangled cooperative learning to produce complete, consistent 3D models from sparse views.
-
From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation
The paper surveys 3D asset generation methods and organizes them around the full production pipeline to assess which outputs meet engine-level requirements for interactive applications.
-
Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation
Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples spa...
-
"From remembering to shaping": Narrating Shared Experiences by Co-Designing Cultural Heritage Artifacts in Collaborative VR
A collaborative VR workflow with GenAI lets users merge prompts and creatively repurpose outputs to co-create 3D artifacts that narrate shared cultural heritage experiences.
-
Controllable Video Object Insertion via Multiview Priors
A multi-view prior-based framework for video object insertion that uses dual-path conditioning and an integration-aware consistency module to improve appearance stability and occlusion handling.
-
AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation
AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantica...
-
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
OpenWorldLib offers a standardized codebase and definition for world models that combine perception, interaction, and memory to understand and predict the world.