Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Ahmed A. A. Osman; Dimitrios Tzionas; Georgios Pavlakos; Michael J. Black; Nima Ghorbani; Timo Bolkart; Vasileios Choutas

arxiv: 1904.05866 · v1 · pith:ZZQJETAQnew · submitted 2019-04-11 · 💻 cs.CV

Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black

classification 💻 cs.CV

keywords body, images, smpl-x, human, model, expressive, face, features

To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.
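
The fitting idea mentioned in the abstract (estimate 2D features, then optimize model parameters until the projected model matches them) can be illustrated with a deliberately small sketch. This is not the SMPLify-X objective: there is no body model, pose prior, or interpenetration term. The weak-perspective camera, the joint coordinates, the confidence values, and the names `project` and `fit_camera` are all assumptions made for illustration only.

```python
# Toy sketch of optimization-based fitting (NOT the SMPLify-X objective).
# Assumption: a weak-perspective camera with scale s and image offset (tx, ty).

def project(joints_3d, s, tx, ty):
    """Weak-perspective projection: drop depth, then scale and shift."""
    return [(s * x + tx, s * y + ty) for x, y, _z in joints_3d]


def fit_camera(joints_3d, keypoints_2d, conf, steps=300, lr=0.05):
    """Minimize the confidence-weighted squared 2D error by gradient descent."""
    s, tx, ty = 1.0, 0.0, 0.0  # neutral initial guess
    for _ in range(steps):
        gs = gtx = gty = 0.0
        projected = project(joints_3d, s, tx, ty)
        for (x, y, _z), (u, v), (ku, kv), c in zip(
            joints_3d, projected, keypoints_2d, conf
        ):
            ru, rv = u - ku, v - kv  # 2D residuals for this joint
            gs += 2.0 * c * (ru * x + rv * y)  # d(loss)/d(s)
            gtx += 2.0 * c * ru                # d(loss)/d(tx)
            gty += 2.0 * c * rv                # d(loss)/d(ty)
        s, tx, ty = s - lr * gs, tx - lr * gtx, ty - lr * gty
    return s, tx, ty


# Hypothetical model joints and per-joint detection confidences.
JOINTS = [
    (0.0, 0.0, 0.0),
    (0.0, 1.0, 0.1),
    (1.0, 0.0, 0.2),
    (-1.0, 0.0, 0.2),
    (0.0, -1.0, 0.1),
]
CONF = [1.0, 0.9, 0.8, 0.8, 0.5]

# Synthetic, noise-free "detections" generated from known camera parameters.
KEYPOINTS = project(JOINTS, 2.0, 0.5, -0.3)

scale, off_x, off_y = fit_camera(JOINTS, KEYPOINTS, CONF)
print(round(scale, 4), round(off_x, 4), round(off_y, 4))
```

Because the synthetic keypoints are generated from known parameters without noise, the recovered values should match them closely. A fit of an actual body model would optimize pose, shape, and expression parameters as well, and would add regularizing terms such as the pose prior and interpenetration penalty listed in the abstract.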

This paper has not been read by Pith yet.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

3D Scene-Adaptive Trajectory-Controllable Human Image Animation with Camera Movement
cs.CV 2026-06 unverdicted novelty 6.0

Presents a scene-adaptive 3D human animation method using ground-adaptive motion retargeting and viewpoint-adaptive latent fusion to control human trajectories and camera views, reporting gains on two benchmarks.

TaskNPoint: How to Teach Your Humanoid to Hit a Backhand in Minutes
cs.RO 2026-06 unverdicted novelty 6.0

TaskNPoint lets humanoid robots learn dynamic skills such as tennis backhands from single short human video demonstrations plus under one hour of single-GPU simulation training, achieving zero-shot generalization to n...

Self-Learning Expression Deformations for Data-Efficient Gaussian Avatars
cs.CV 2026-06 unverdicted novelty 6.0

SAGE self-learns Gaussian expression deformations via joint surfel-SDF optimization and self-supervised consistency, enabling comparable avatar quality from single frames, monocular rotations, or one-shot inputs.

Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
cs.RO 2026-04 unverdicted novelty 6.0

Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.