pith. machine review for the scientific record.

arxiv: 2605.04035 · v2 · submitted 2026-05-05 · 💻 cs.CV · cs.LG

Recognition: no theorem link

Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

Alejandro Blumentals, Artem Sevastopolsky, Brian Amberg, Christian Zimmermann, Dmitry Kostiaev, Evangelos Ntavelis, Fabio Maninchedda, Jeronimo Bayer, Mathias Deschler, Matthias Vestner, Mehak Gupta, Mohamad Shahbazi, Peter Kaufmann, Reinhard Knothe, Sean Wu, Sebastian Martin, Shridhar Ravikumar, Simon Schaefer, Stefan Brugger, Thomas Etterlin, Tom Runia, Trevor Phillips, Vittorio Megaro

Pith reviewed 2026-05-11 02:08 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords 3D Gaussian splatting · head reconstruction · multi-view capture · feed-forward model · UV parameterization · 3D avatars · facial animation

The pith

HeadsUp reconstructs high-quality 3D Gaussian heads from multi-view images in a single feed-forward pass.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HeadsUp as a scalable method that turns sets of images captured from many cameras into detailed 3D head models represented as Gaussian points. An encoder compresses the views into a compact latent code, and a decoder expands it into 3D Gaussians whose positions are defined by a UV map on a standard neutral head shape. This design keeps the number of Gaussians independent of image count or resolution, allowing training on high-resolution data from more than 10,000 different people. The resulting model produces state-of-the-art reconstructions and works on entirely new faces without needing extra per-person optimization steps. The same latent space also supports creating new head identities and driving the models with facial expressions.
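The decoupling claim can be made concrete with a small sketch. Nothing below is the authors' code; the function name and default sizes (a 32×32 latent and a 256×256 Gaussian UV map, matching the ablation settings quoted in Figure 4) are illustrative assumptions.

```python
# Hypothetical shape contract for a HeadsUp-style pipeline: the Gaussian
# budget is set by the UV resolution of the template parameterization,
# not by how many input views there are or how large they are.

def headsup_shapes(num_views: int, image_res: int,
                   latent_res: int = 32, uv_res: int = 256):
    """Return (latent_tokens, num_gaussians) for a given capture setup."""
    latent_tokens = latent_res * latent_res   # encoder output is fixed-size
    num_gaussians = uv_res * uv_res           # one Gaussian per UV texel
    return latent_tokens, num_gaussians

# Four low-res views and sixteen high-res views yield the same budget:
assert headsup_shapes(4, 512) == headsup_shapes(16, 4096) == (1024, 65536)
```

The point of the contract is that memory and compute on the Gaussian side stay flat as capture rigs grow, which is what makes training on high-resolution data from many cameras tractable.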

Core claim

HeadsUp is a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. It employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation, which is decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This enables training on an internal dataset with more than 10,000 subjects and achieves state-of-the-art reconstruction quality while generalizing to novel identities without test-time optimization.

What carries the argument

UV-parameterized 3D Gaussians anchored to a neutral head template, which decouples the number of Gaussians from the number and resolution of input images.
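As a rough illustration of what "anchored to a neutral head template" means (the names and data layout here are assumptions for illustration, not the paper's implementation), each UV texel can be read as a fixed base point on the template surface plus a small decoder-predicted offset:

```python
# Hypothetical UV anchoring: the template contributes a fixed 3D position
# per texel; the network only predicts a residual offset per texel.

def anchor_gaussians(position_map, offsets):
    """Both arguments map UV texels (u, v) to 3D points (x, y, z)."""
    gaussians = {}
    for uv, base in position_map.items():
        dx, dy, dz = offsets.get(uv, (0.0, 0.0, 0.0))
        gaussians[uv] = (base[0] + dx, base[1] + dy, base[2] + dz)
    return gaussians

template = {(0, 0): (0.0, 1.0, 0.0), (0, 1): (0.5, 1.0, 0.0)}
predicted = {(0, 1): (0.0, 0.25, 0.0)}   # e.g. hair displaced off the scalp
centers = anchor_gaussians(template, predicted)
# centers[(0, 1)] → (0.5, 1.25, 0.0)
```

Because the texel grid is fixed by the template, the output set has the same size and ordering for every subject, which is what lets one decoder serve all identities.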

Load-bearing premise

The internal dataset of more than 10,000 subjects is diverse enough for the model to generalize accurately to unseen identities without test-time optimization.

What would settle it

Measuring reconstruction error of the feed-forward model versus per-identity optimized baselines on a public multi-view head dataset with held-out identities; substantially higher error on novel subjects would falsify the generalization claim.
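The proposed test reduces to a metric gap on held-out subjects. A minimal sketch of that comparison, with PSNR as the stand-in metric (the function names and the use of PSNR here are illustrative assumptions; the paper's actual evaluation protocol is not specified in this summary):

```python
import math

def psnr(rendered, ground_truth, max_val=1.0):
    """PSNR in dB between two equal-length lists of intensities in [0, 1]."""
    mse = sum((r - g) ** 2 for r, g in zip(rendered, ground_truth)) / len(rendered)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def generalization_gap(feedforward_render, optimized_render, ground_truth):
    """Positive gap: per-identity optimization still beats the feed-forward model."""
    return psnr(optimized_render, ground_truth) - psnr(feedforward_render, ground_truth)

# A substantially positive gap on held-out identities would falsify the claim.
```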

Figures

Figures reproduced from arXiv: 2605.04035.

Figure 1
Figure 1: We introduce HeadsUp, a novel feed-forward approach leveraging 3D Gaussians to predict high-quality avatars. By scaling to thousands of subjects and diverse expressions, our method achieves exceptional rendering quality on completely held-out subjects. Notice the accurate, high-resolution recovery of intricate fine details, such as eyelashes, complex earrings, teeth and tongue. The figure displays renders …
Figure 2
Figure 2: Overview of HeadsUp. Our method reconstructs high-fidelity 3D Gaussian heads from multi-view images. Given a set of input views, our model utilizes a transformer-based encoder and a 3D Gaussian decoder to predict UV-parameterized 3D Gaussians for both the foreground and background. The model is trained end-to-end using a combination of photometric and perceptual supervision. …
Figure 3
Figure 3: Visual Comparison on Ava-256. Our method produces sharper reconstructions with better identity preservation compared to prior work. Increasing the number of views permits reconstruction of details like earrings, hair and skin texture. Additionally, our background model successfully captures intricate head-boundary details that previous foreground-masking techniques typically discard; we only use the back…
Figure 4
Figure 4: Ablation study on a single-stage model trained for 500K steps with 10K subjects, 10 input views, 32×32 latent and 256×256 Gaussian UV resolution unless stated otherwise. (a) Training data scaling: log-linear improvement up to 2K subjects, diminishing returns beyond 4K. (b) Input view scaling: quality improves with more views, with diminishing returns after 8. (c) Model capacity: increasing latent resoluti…
Figure 5
Figure 5: Training data scaling. Models trained on fewer subjects fail to generalize to reconstruction of novel identities. At 250 subjects, facial features and hair color deviate significantly from ground truth. The reconstruction quality improves with more training data. On this validation set, the quality improves marginally after 4K subjects.
Figure 6
Figure 6: Impact of the number of input views. Reconstruction quality scales naturally with the number of input images. A single frontal view (N = 1) yields blurry results, identity drift, and fails to recover shirt text. However, adding more views progressively resolves these ambiguities, yielding clear improvements in fine details like the teeth and hair.
Figure 7
Figure 7: Downstream Applications. (a) Text-driven identity generation: …
read the original abstract

We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, enabling training with many high-resolution input views. We train and evaluate our model on an internal dataset with more than 10,000 subjects, which is an order of magnitude larger than existing multi-view human head datasets. HeadsUp achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization. We extensively analyze the scaling behavior of our model across identities, views, and model capacity, revealing practical insights for quality-compute trade-offs. Finally, we highlight the strength of our latent space by showcasing two downstream applications: generating novel 3D identities and animating the 3D heads with expression blendshapes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes HeadsUp, a scalable feed-forward encoder-decoder method for high-quality 3D Gaussian head reconstruction from multi-view captures. Input views are compressed into a compact latent code that is decoded into UV-parameterized 3D Gaussians anchored to a fixed neutral head template; this decouples Gaussian count from input count and resolution. Trained on an internal dataset of >10,000 subjects (an order of magnitude larger than prior multi-view head datasets), the model is claimed to achieve SOTA reconstruction quality with zero-shot generalization to novel identities; the paper also includes a scaling analysis across identities, views, and capacity, and demonstrates downstream uses in novel identity generation and blendshape animation.

Significance. If substantiated, the work would be significant for enabling efficient, optimization-free 3D head assets at scale, with direct value for AR/VR, animation, and graphics pipelines. The large-scale training regime, explicit scaling study, and UV-based decoupling of representation size from capture resolution are practical strengths; the feed-forward design and latent-space applications further differentiate it from per-subject optimization baselines.

major comments (2)
  1. [Abstract] The central claim that HeadsUp 'achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization' is unsupported by any quantitative metrics (PSNR, SSIM, LPIPS, etc.), baseline comparisons, ablation tables, or error analysis. This is load-bearing for the primary contribution and must be addressed with explicit results tables in the experiments section.
  2. [Dataset and Generalization] Dataset description and generalization analysis: the zero-shot generalization claim rests on the unverified assumption that the internal >10,000-subject dataset provides sufficient coverage of identity variation (cranial proportions, ethnicity, age, capture conditions) and that the fixed neutral-template UV parameterization preserves geometric/appearance detail without significant loss. No diversity statistics, cross-dataset evaluation, or ablation against deformable templates are referenced; if these assumptions fail, both SOTA quality and feed-forward generalization would not hold.
minor comments (2)
  1. [Method] The method description would benefit from explicit notation for the encoder-decoder architecture, latent dimensionality, and the precise UV-to-Gaussian mapping (e.g., how position/scale/rotation attributes compensate for template rigidity).
  2. [Scaling Analysis] Scaling analysis section should include concrete plots or tables showing quality vs. number of identities, views, and model parameters to make the 'practical insights for quality-compute trade-offs' actionable.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback, which helps strengthen the quantitative support for our claims and the analysis of generalization. We will revise the manuscript accordingly and address each major comment below.

read point-by-point responses
  1. Referee: [Abstract] The central claim that HeadsUp 'achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization' is unsupported by any quantitative metrics (PSNR, SSIM, LPIPS, etc.), baseline comparisons, ablation tables, or error analysis. This is load-bearing for the primary contribution and must be addressed with explicit results tables in the experiments section.

    Authors: We agree that the abstract claim requires explicit quantitative backing. We will add a dedicated summary table in the experiments section presenting PSNR, SSIM, LPIPS, and related metrics with direct baseline comparisons, plus error analysis to substantiate the state-of-the-art quality and zero-shot generalization without test-time optimization. revision: yes

  2. Referee: [Dataset and Generalization] Dataset description and generalization analysis: the zero-shot generalization claim rests on the unverified assumption that the internal >10,000-subject dataset provides sufficient coverage of identity variation (cranial proportions, ethnicity, age, capture conditions) and that the fixed neutral-template UV parameterization preserves geometric/appearance detail without significant loss. No diversity statistics, cross-dataset evaluation, or ablation against deformable templates are referenced; if these assumptions fail, both SOTA quality and feed-forward generalization would not hold.

    Authors: We acknowledge the need for stronger verification of these assumptions. We will expand the dataset section with available diversity statistics on demographics and capture conditions. We will also add an ablation comparing the fixed neutral UV template to a deformable alternative to demonstrate detail preservation. Cross-dataset evaluation is limited by the lack of other large-scale public multi-view head datasets of similar scope; we will discuss this constraint explicitly in the revision. revision: partial

standing simulated objections not resolved
  • Full cross-dataset quantitative evaluation on comparable large-scale public multi-view head datasets, as no such datasets currently exist.

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper presents an empirical machine learning method: a trained encoder-decoder neural architecture that maps multi-view images to UV-parameterized 3D Gaussians on a neutral template. No mathematical derivations, equations, or uniqueness theorems are described that reduce any claimed prediction or result to the inputs by construction. Generalization to novel identities is asserted as an empirical outcome of training on the internal >10k-subject dataset and evaluating on held-out subjects, without any self-definitional fitting or renaming of known patterns. No load-bearing self-citations, smuggled ansatzes, or fitted-input predictions appear in the provided description. The central claims rest on architectural design and data scale rather than circular reductions, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no explicit free parameters, axioms, or invented entities beyond standard neural network components and the known UV parameterization technique from graphics.

pith-pipeline@v0.9.0 · 5587 in / 1324 out tokens · 53461 ms · 2026-05-11T02:08:50.153345+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

95 extracted references · 95 canonical work pages · 4 internal anchors

  1. [1]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    An, S., Xu, H., Shi, Y., Song, G., Ogras, U.Y., Luo, L.: PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 20950–20959 (2023)

  2. [2]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Athar, S., Xu, Z., Sunkavalli, K., Shechtman, E., Shu, Z.: RigNeRF: Fully Con- trollable Neural 3D Portraits. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 20364–20373 (2022)

  3. [3]

    In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

    Bhattarai, A.R., Nießner, M., Sevastopolsky, A.: TriPlaneNet: An Encoder for EG3D Inversion. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 3533–3542 (2024)

  4. [4]

    In: SIGGRAPH Asia 2024 Conference Papers

    Bühler, M.C., Li, G., Wood, E., Helminger, L., Chen, X., Shah, T., Wang, D., Garbin, S., Orts-Escolano, S., Hilliges, O., Lagun, D., Riviere, J., Gotardo, P., Beeler, T., Meka, A., Sarkar, K.: Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Captures. In: SIGGRAPH Asia 2024 Conference Papers. ACM (2024).https://doi.org/10....

  5. [5]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Chan, E.R., Lin, C.Z., Chan, M.A., Nagano, K., Pan, B., De Mello, S., Gallo, O., Guibas, L.J., Tremblay, J., Khamis, S., et al.: Efficient Geometry-aware 3D Generative Adversarial Networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16123–16133 (2022)

  6. [6]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Charatan, D., Li, S., Tagliasacchi, A., Sitzmann, V.: pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 19457–19467 (2024)

  7. [7]

    arXiv preprint arXiv:2406.06050 (2024)

    Chen, J., Li, C., Zhang, J., Zhu, L., Huang, B., Chen, H., Lee, G.H.: Generalizable Human Gaussians from Single-View Image. arXiv preprint arXiv:2406.06050 (2024)

  8. [8]

    In: European Conference on Computer Vision (ECCV)

    Chen, Y., Xu, H., Zheng, C., Zhuang, B., Pollefeys, M., Geiger, A., Cham, T.J., Cai, J.: MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images. In: European Conference on Computer Vision (ECCV). pp. 370–386. Springer (2024) 16 E. Ntavelis et al

  9. [9]

    NeurIPS (2024)

    Chu, X., Harada, T.: Generalizable and Animatable Gaussian Head Avatar. NeurIPS (2024)

  10. [10]

    arXiv preprint arXiv:2401.10215 (2024)

    Chu, X., Li, Y., Zeng, A., Yang, T., Lin, L., Liu, Y., Harada, T.: GPAvatar: Gener- alizable and Precise Head Avatar from Image(s). arXiv preprint arXiv:2401.10215 (2024)

  11. [11]

    Journal of Machine Learning Research25(70), 1–53 (2024)

    Chung, H.W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., et al.: Scaling instruction-finetuned language models. Journal of Machine Learning Research25(70), 1–53 (2024)

  12. [12]

    In: SIGGRAPH

    Debevec, P., Hawkins, T., Tchou, C., Duiker, H.P., Sarokin, W.: Acquiring the Reflectance Field of a Human Face. In: SIGGRAPH. New Orleans, LA (Jul 2000), http://ict.usc.edu/pubs/Acquiring%20the%20Re%EF%AC%82ectance%20Field% 20of%20a%20Human%20Face.pdf

  13. [13]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: Additive Angular Margin Loss for Deep Face Recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4690–4699 (2019)

  14. [14]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

    Deng, Y., Wang, D., Ren, X., Chen, X., Wang, B.: Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

  15. [15]

    In: European Conference on Computer Vision (ECCV)

    Deng, Y., Wang, D., Wang, B.: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer. In: European Conference on Computer Vision (ECCV). pp. 303–321. Springer (2024)

  16. [16]

    ICLR (2021)

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR (2021)

  17. [17]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Gafni, G., Thies, J., Zollhöfer, M., Nießner, M.: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 8649–8658 (2021)

  18. [18]

    In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Giebenhain, S., Kirschstein, T., Georgopoulos, M., Rünz, M., Agapito, L., Nießner, M.: MonoNPHM: Dynamic Head Reconstruction from Monocular Videos. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 21051–21061 (2024)

  19. [19]

    In: Advances in neural information processing systems

    Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. In: Advances in neural information processing systems. pp. 2672–2680 (2014),http://papers.nips.cc/ paper/5423-generative-adversarial-nets.pdf

  20. [20]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

    Gu, Y., Xu, H., Xie, Y., Song, G., Shi, Y., Di, Y., Ye, P., et al.: DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

  21. [21]

    In: Proceedings of the IEEE conference on computer vision and pattern recognition

    He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

  22. [22]

    In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers

    He, Y., Gu, X., Ye, X., Xu, C., Zhao, Z., Dong, Y., Yuan, W., Dong, Z., Bo, L.: LAM: Large Avatar Model for One-shot Animatable Gaussian Head. In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers. pp. 1–13 (2025)

  23. [23]

    Ho, J., Salimans, T.: Classifier-free diffusion guidance. In: NeurIPS 2021 Work- shop on Deep Generative Models and Downstream Applications (2021),https: //openreview.net/forum?id=qw8AKxfYbI HeadsUp: Large-scale Gaussian Head Reconstruction 17

  24. [24]

    arXiv preprint arXiv:2311.04400 , year=

    Hong, Y., Zhang, K., Gu, J., Bi, S., Zhou, Y., Liu, D., Liu, F., Sunkavalli, K., Bui, T., Tan, H.: LRM: Large Reconstruction Model for Single Image to 3D. arXiv preprint arXiv:2311.04400 (2023)

  25. [25]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    Hoogeboom, E., Mensink, T., Heek, J., Lamerigts, K., Gao, R., Salimans, T.: Simpler diffusion: 1.5 fid on imagenet512 with pixel-space diffusion. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 18062–18071 (2025)

  26. [26]

    International Organization for Standardization: ISO 7250-1:2017 Basic human body measurements for technological design — Part 1: Body measurement definitions and landmarks (2017), https://www.iso.org/standard/65246.html , accessed: 2026-02-16

  27. [27]

    arXiv preprint arXiv:2601.13837 (2026)

    Ji, X., Weiss, S., Kansy, M., Naruniec, J., Cao, X., Solenthaler, B., Bradley, D.: FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation. arXiv preprint arXiv:2601.13837 (2026)

  28. [28]

    International Journal of Computer Vision129(12), 3174–3194 (2021)

    Jin, H., Liao, S., Shao, L.: Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild. International Journal of Computer Vision129(12), 3174–3194 (2021)

  29. [29]

    In: European Conference on Computer Vision (2016)

    Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In: European Conference on Computer Vision (2016)

  30. [30]

    In: CVPR

    Kant, Y., Weber, E., Kim, J.K., Khirodkar, R., Zhaoen, S., Martinez, J., Gilitschen- ski, I., Saito, S., Bagautdinov, T.: Pippo: High-Resolution Multi-View Humans from a Single Image. In: CVPR. pp. 16418–16429 (2025)

  31. [31]

    ACM TOG42(4) (July 2023),https: //repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

    Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM TOG42(4) (July 2023),https: //repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

  32. [32]

    arXiv preprint arXiv:2206.08343 (2022)

    Khakhulin, T., Sklyarova, V., Lempitsky, V., Zakharov, E.: Realistic One-shot Mesh-based Head Avatars. arXiv preprint arXiv:2206.08343 (2022)

  33. [33]

    In: ECCV

    Khirodkar, R., Bagautdinov, T., Martinez, J., Zhaoen, S., James, A., Selednik, P., Anderson, S., Saito, S.: Sapiens: Foundation for Human Vision Models. In: ECCV. pp. 206–228. Springer (2024)

  34. [34]

    Adam: A Method for Stochastic Optimization

    Kingma, D.P.: Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014)

  35. [35]

    Segment Anything

    Kirillov, A., Mintun, E., Ravi, N., et al.: Segment Anything. arXiv preprint arXiv:2304.02643 (2023)

  36. [36]

    In: SIGGRAPH Asia 2024 Conference Papers

    Kirschstein, T., Giebenhain, S., Tang, J., Georgopoulos, M., Nießner, M.: GGHead: Fast and Generalizable 3D Gaussian Heads. In: SIGGRAPH Asia 2024 Conference Papers. pp. 1–11 (2024)

  37. [37]

    ACM TOG42(4), 1–14 (2023)

    Kirschstein, T., Qian, S., Giebenhain, S., Walter, T., Nießner, M.: NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads. ACM TOG42(4), 1–14 (2023)

  38. [38]

    Avat3r: Large an- imatable gaussian reconstruction model for high-fidelity 3d head avatars.arXiv preprint arXiv:2502.20220, 2025

    Kirschstein, T., Romero, J., Sevastopolsky, A., Nießner, M., Saito, S.: Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars. arXiv preprint arXiv:2502.20220 (2025)

  39. [39]

    In: Proceedings of the European Conference on Computer Vision (ECCV) (2024)

    Kwon, Y., Fang, B., Lu, Y., Dong, H., Zhang, C., Vicente Carrasco, F., Mosella- Montoro, A., Xu, J., Takagi, S., Kim, D., Prakash, A., De la Torre, F.: Generalizable Human Gaussians for Sparse View Synthesis. In: Proceedings of the European Conference on Computer Vision (ECCV) (2024)

  40. [40]

    Ntavelis et al

    Lawrence, J., Goldman, D.B., Achar, S., Blascovich, G.M., Desloge, J.G., Fortes, T., Gomez, E.M., Häberling, S., Hoppe, H., Huibers, A., Knaus, C., Kuschak, B., Martin-Brualla, R., Nover, H., Russell, A.I., Seitz, S.M., Tong, K.: Project Starline: 18 E. Ntavelis et al. A High-Fidelity Telepresence System. ACM Transactions on Graphics (Proc. of SIGGRAPH As...

  41. [41]

    In: European Conference on Computer Vision (ECCV)

    Li, H., Chen, C., Shi, T., Qiu, Y., An, S., Chen, G., et al.: SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation. In: European Conference on Computer Vision (ECCV). Springer (2024)

  42. [42]

    In: SIGGRAPH Asia 2024 Conference Papers

    Li, J., Cao, C., Schwartz, G., Khirodkar, R., Saito, S., et al.: URAvatar: Universal Relightable Gaussian Codec Avatars. In: SIGGRAPH Asia 2024 Conference Papers. ACM (2024)

  43. [43]

    Panolam: Large avatar model for gaussian full- head synthesis from one-shot unposed image.arXiv preprint arXiv:2509.07552, 2025

    Li, P., He, Y., Hu, Y., Dong, Y., Yuan, W., Liu, Y., Zhu, S., Cheng, G., Dong, Z., Guo, Y.: PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image. arXiv preprint arXiv:2509.07552 (2025)

  44. [44]

    In: Proceedings of the computer vision and pattern recognition conference

    Li, P., Zheng, W., Liu, Y., Yu, T., Li, Y., Qi, X., Chi, X., Xia, S., Cao, Y.P., Xue, W., et al.: PSHuman: Photorealistic Single-Image 3D Human Reconstruction Using Cross-Scale Multiview Diffusion and Explicit Remeshing. In: Proceedings of the computer vision and pattern recognition conference. pp. 16008–16018 (2025)

  45. [45]

    ACM TOG36(6), 194–1 (2017)

    Li, T., Bolkart, T., Black, M.J., Li, H., Romero, J.: Learning a Model of Facial Shape and Expression from 4D Scans. ACM TOG36(6), 194–1 (2017)

  46. [46]

    arXiv preprint arXiv:2508.18389 (2025)

    Liang, H., Ge, Z., Tiwari, A., Majee, S., Godaliyadda, G.M.D., Veeraraghavan, A., Balakrishnan, G.: FastAvatar: Instant 3D Gaussian Splatting for Faces from Single Unconstrained Poses. arXiv preprint arXiv:2508.18389 (2025)

  47. [47]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Lin, S., Ryabtsev, A., Sengupta, S., Curless, B.L., Seitz, S.M., Kemelmacher- Shlizerman, I.: Real-Time High-Resolution Background Matting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8762–8771 (2021)

  48. [48]

    ACM Transactions on Graphics (TOG)37(4), 1–13 (2018)

    Lombardi, S., Saragih, J., Simon, T., Sheikh, Y.: Deep Appearance Models for Face Rendering. ACM Transactions on Graphics (TOG)37(4), 1–13 (2018)

  49. [49]

    In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A

    Lu, C., Zhou, Y., Bao, F., Chen, J., LI, C., Zhu, J.: Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems. vol. 35, pp. 5775–5787. Curran Associates, Inc. (2022), https://proceedings.neur...

  50. [50]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision

    Lu, Y., Dong, J., Kwon, Y., Zhao, Q., Dai, B., De la Torre, F.: GAS: Genera- tive Avatar Synthesis from a Single Image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12883–12893 (2025)

  51. [51]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision

    Lyu, W., Zhou, Y., Yang, M.H., Shu, Z.: FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12691–12701 (2025)

  52. [52]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Ma, S., Simon, T., Saragih, J., Wang, D., Li, Y., De La Torre, F., Sheikh, Y.: Pixel Codec Avatars. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 64–73 (2021)

  53. [53]

    NeurIPS (2024)

    Martinez, J., Kim, E., Romero, J., Bagautdinov, T., Saito, S., Yu, S.I., Anderson, S., Zollhöfer, M., Wang, T.L., Bai, S., Li, C., Wei, S.E., Joshi, R., Borsos, W., Simon, T., Saragih, J., Theodosis, P., Greene, A., Josyula, A., Maeta, S.M., Jewett, A.I., Venshtain, S., Heilman, C., Chen, Y.T., Fu, S., Elshaer, M.E.A., Du, T., Wu, L., Chen, S.C., Kang, K....

  54. [54]

    In: European Conference on Computer Vision (ECCV) (2020)

    Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In: European Conference on Computer Vision (ECCV) (2020)

  55. [55]

    DINOv2: Learning Robust Visual Features without Supervision

    Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: DINOv2: Learning Robust Visual Features without Supervision. arXiv preprint arXiv:2304.07193 (2023)

  56. [56]

    PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

    Oroz, A., Nießner, M., Kirschstein, T.: PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing. arXiv preprint arXiv:2511.02777 (2025)

  57. [57]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

    Park, K., Sinha, U., Barron, J.T., Bouaziz, S., Goldman, D.B., Seitz, S.M., Martin- Brualla, R.: Nerfies: Deformable Neural Radiance Fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 5865–5874 (2021)

  58. [58]

    Peebles, W., Xie, S.: Scalable Diffusion Models with Transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 4195–4205 (2023)

  59. [59]

    Qian, S., Kirschstein, T., Schoneveld, L., Davoli, D., Giebenhain, S., Nießner, M.: GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

  60. [60]

    Qiu, L., Gu, X., Li, P., Zuo, Q., Shen, W., Zhang, J., Qiu, K., Yuan, W., Chen, G., Dong, Z., Bo, L.: LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2025)

  61. [61]

    Qiu, L., Li, P., Zuo, Q., Gu, X., Dong, Y., Yuan, W., Zhu, S., Han, X., Chen, G., Dong, Z.: PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images. arXiv preprint arXiv:2506.13766 (2025)

  62. [62]

    Roich, D., Mokady, R., Bermano, A.H., Cohen-Or, D.: Pivotal Tuning for Latent-based Editing of Real Images. ACM Transactions on Graphics (TOG) 42(1), 1–13 (2022)

  63. [63]

    Saito, S., Schwartz, G., Simon, T., Li, J., Nam, G.: Relightable Gaussian Codec Avatars. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 130–141 (2024)

  64. [64]

    Shao, Z., Wang, Z., Li, Z., Wang, D., Lin, X., Zhang, Y., Fan, M., Wang, Z.: SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16922–16932 (2024)

  65. [65]

    Sitzmann, V., Rezchikov, S., Freeman, W.T., Tenenbaum, J.B., Durand, F.: Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering. In: Proc. NeurIPS (2021)

  66. [66]

    Skorokhodov, I., Siarohin, A., Xu, Y., Ren, J., Lee, H.Y., Wonka, P., Tulyakov, S.: 3D Generation on ImageNet. arXiv preprint arXiv:2303.01416 (2023)

  67. [67]

    Sungatullina, D., Zakharov, E., Ulyanov, D., Lempitsky, V.: Image Manipulation with Perceptual Discriminators. In: Proceedings of the European Conference on Computer Vision (ECCV) (September 2018)

  68. [68]

    Szymanowicz, S., Rupprecht, C., Vedaldi, A.: Splatter Image: Ultra-Fast Single-View 3D Reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 10208–10217 (2024)

  69. [69]

    Tang, J., Davoli, D., Kirschstein, T., Schoneveld, L., Nießner, M.: GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5546–5558 (2025)

  70. [70]

    Teotia, K., Kim, H., Garrido, P., Habermann, M., Elgharib, M., Theobalt, C.: GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations. ACM Transactions on Graphics (SIGGRAPH Asia) 43(6) (2024)

  71. [71]

    Teotia, K., Rhodin, H., Mendiratta, M., Kim, H., Habermann, M., Theobalt, C.: Audio Driven Universal Gaussian Head Avatars. In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers. pp. 1–12 (2025)

  72. [72]

    Tran, P., Zakharov, E., Ho, L.N., Tran, A.T., Hu, L., Li, H.: VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)

  73. [73]

    Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(4), 376–380 (1991). https://doi.org/10.1109/34.88573

  74. [74]

    Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention Is All You Need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems. vol. 30. Curran Associates, Inc. (2017), https://proceedings.neurips.c...

  75. [75]

    Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: VGGT: Visual Geometry Grounded Transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

  76. [76]

    Wang, S., Leroy, V., Cabon, Y., Chidlovskii, B., Revaud, J.: DUSt3R: Geometric 3D Vision Made Easy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 20697–20709 (2024)

  77. [77]

    Xiang, J., Gao, X., Guo, Y., Zhang, J.: FlashAvatar: High-Fidelity Head Avatar with Efficient Gaussian Embedding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1802–1812 (2024)

  78. [78]

    Xu, Y., Shi, Z., Yifan, W., Chen, H., Yang, C., Peng, S., Shen, Y., Wetzstein, G.: GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation. In: European Conference on Computer Vision (ECCV). pp. 1–20. Springer (2024)

  79. [79]

    Xue, Y., Xie, X., Marin, R., Pons-Moll, G.: Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models. Advances in Neural Information Processing Systems 37, 99601–99645 (2024)

  80. [80]

    Yang, J., Wu, T., Fogarty, K., Zhong, F., Oztireli, C.: PSHead: 3D Head Reconstruction from a Single Image with Diffusion Prior and Self-Enhancement. In: Computer Graphics Forum. p. e70279. Wiley Online Library (2025)
