PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing

Peng Li; Siyu Xia; Tao Yu; Wangguandong Zheng; Wei Xue; Wenhan Luo; Xiaowei Chi; Xingqun Qi; Yangguang Li; Yan-Pei Cao

arXiv: 2409.10141 · v3 · pith:KDYRSBWB · submitted 2024-09-16 · cs.CV

Peng Li, Wangguandong Zheng, Yuan Liu, Tao Yu, Yangguang Li, Xingqun Qi, Xiaowei Chi, Siyu Xia, Yan-Pei Cao, Wei Xue, Wenhan Luo, Yike Guo

keywords: human, diffusion, multiview, body, cross-scale, detailed, explicit, full-body

Abstract

Detailed and photorealistic 3D human modeling is essential for various applications and has seen tremendous progress. However, full-body reconstruction from a monocular RGB image remains challenging due to the ill-posed nature of the problem and sophisticated clothing topology with self-occlusions. In this paper, we propose PSHuman, a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model. We find that directly applying multiview diffusion to single-view human images leads to severe geometric distortions, especially on generated faces. To address this, we propose a cross-scale diffusion that models the joint probability distribution of global full-body shape and local facial characteristics, enabling detailed and identity-preserved novel-view generation without any geometric distortion. Moreover, to enhance cross-view body shape consistency across varied human poses, we condition the generative model on parametric models like SMPL-X, which provide body priors and prevent unnatural views inconsistent with human anatomy. Leveraging the generated multiview normal and color images, we present SMPLX-initialized explicit human carving to recover realistic textured human meshes efficiently. Extensive experimental results and quantitative evaluations on the CAPE and THuman2.1 datasets demonstrate PSHuman's superiority in geometric detail, texture fidelity, and generalization capability.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Human Interaction-Aware 3D Reconstruction from a Single Image
cs.CV · 2026-04 · unverdicted · novelty 5.0

HUG3D uses group-instance multi-view diffusion and physics-based optimization to create physically plausible 3D reconstructions of interacting people from a single image.

LUNA: Learning Universal 3D Human Animation Beyond Skinning
cs.CV · 2026-06 · unverdicted · novelty 4.0

LUNA is an LBS-free neural animation model that maps 2D controls to 3D Gaussian deformations via a transformer motion regressor and hybrid supervision for realistic motion and zero-shot generalization.