MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction

Ayush Tewari; Christian Theobalt; Florian Bernard; Hyeongwoo Kim; Michael Zollhöfer; Pablo Garrido; Patrick Pérez

arxiv: 1703.10580 · v2 · pith:MLALTAOUnew · submitted 2017-03-30 · 💻 cs.CV

classification 💻 cs.CV

keywords: face, convolutional, decoder, encoder, generative, image, model, model-based

Abstract

In this work, we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as the decoder. The core innovation is our new differentiable parametric decoder, which encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance, and scene illumination. Because this approach combines CNN-based and model-based face reconstruction, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which makes training on very large (unlabeled) real-world data feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation.
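The idea of a code vector with fixed semantic slots, decoded by an analytic model with no learned weights, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and not taken from the abstract: the block sizes in `CODE_LAYOUT`, the axis-angle pose parameterization, the function names, and the random placeholder bases standing in for a real face model. Only the geometry part of such a decoder is shown; reflectance, illumination, and image rendering are omitted.

```python
import numpy as np

# Hypothetical sizes for each semantic block of the code vector (assumptions).
CODE_LAYOUT = {
    "pose": 6,           # 3 rotation (axis-angle) + 3 translation
    "shape": 80,
    "expression": 64,
    "reflectance": 80,
    "illumination": 27,  # e.g. 9 lighting coefficients per RGB channel
}
CODE_DIM = sum(CODE_LAYOUT.values())


def split_code(code):
    """Slice a flat code vector into named semantic blocks."""
    parts, start = {}, 0
    for name, size in CODE_LAYOUT.items():
        parts[name] = code[start:start + size]
        start += size
    return parts


def rodrigues(axis_angle):
    """Convert an axis-angle vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-12:
        return np.eye(3)
    k = axis_angle / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def decode_geometry(parts, mean_shape, shape_basis, expr_basis):
    """Linear face model followed by a rigid transform (no learned weights)."""
    flat = mean_shape + shape_basis @ parts["shape"] + expr_basis @ parts["expression"]
    verts = flat.reshape(-1, 3)
    R = rodrigues(parts["pose"][:3])
    t = parts["pose"][3:]
    return verts @ R.T + t


# Toy usage: random placeholder bases stand in for a real face model.
rng = np.random.default_rng(0)
n_verts = 5
mean_shape = rng.normal(size=3 * n_verts)
shape_basis = rng.normal(size=(3 * n_verts, CODE_LAYOUT["shape"]))
expr_basis = rng.normal(size=(3 * n_verts, CODE_LAYOUT["expression"]))

code = np.zeros(CODE_DIM)  # stand-in for the encoder's output
verts = decode_geometry(split_code(code), mean_shape, shape_basis, expr_basis)
```

Because every step in `decode_geometry` is a differentiable operation, gradients of an image-space loss could in principle flow back through it to the encoder, which is what makes end-to-end training without labels plausible in a setup like this.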

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Self-Learning Expression Deformations for Data-Efficient Gaussian Avatars
cs.CV · 2026-06 · unverdicted · novelty 6.0

SAGE self-learns Gaussian expression deformations via joint surfel-SDF optimization and self-supervised consistency, enabling comparable avatar quality from single frames, monocular rotations, or one-shot inputs.