Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures
Pith reviewed 2026-05-11 02:08 UTC · model grok-4.3
The pith
HeadsUp reconstructs high-quality 3D Gaussian heads from multi-view images in a single feed-forward pass.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HeadsUp is a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. It employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation, which is decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This enables training on an internal dataset with more than 10,000 subjects and achieves state-of-the-art reconstruction quality while generalizing to novel identities without test-time optimization.
What carries the argument
UV-parameterized 3D Gaussians anchored to a neutral head template, which decouples the number of Gaussians from the number and resolution of input images.
Load-bearing premise
The internal dataset of more than 10,000 subjects is diverse enough for the model to generalize accurately to unseen identities without test-time optimization.
What would settle it
Measuring reconstruction error of the feed-forward model versus per-identity optimized baselines on a public multi-view head dataset with held-out identities; substantially higher error on novel subjects would falsify the generalization claim.
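The proposed falsification test reduces to standard image-reconstruction metrics on held-out identities. A minimal PSNR sketch on synthetic arrays (not the paper's evaluation pipeline; SSIM and LPIPS would require their reference implementations):

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = float(np.mean((pred - gt) ** 2))
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val**2 / mse)

rng = np.random.default_rng(0)
gt = rng.random((64, 64, 3))                           # stand-in ground-truth render
noisy = np.clip(gt + rng.normal(0, 0.05, gt.shape), 0, 1)
assert psnr(gt, gt) == float("inf")                    # identical images: no error
assert psnr(noisy, gt) > 20.0                          # sigma=0.05 noise lands near 26 dB
```

On held-out identities, substantially lower PSNR (and higher LPIPS) for the feed-forward model than for per-identity optimized baselines would falsify the generalization claim.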
Original abstract
We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, enabling training with many high-resolution input views. We train and evaluate our model on an internal dataset with more than 10,000 subjects, which is an order of magnitude larger than existing multi-view human head datasets. HeadsUp achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization. We extensively analyze the scaling behavior of our model across identities, views, and model capacity, revealing practical insights for quality-compute trade-offs. Finally, we highlight the strength of our latent space by showcasing two downstream applications: generating novel 3D identities and animating the 3D heads with expression blendshapes.
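The UV decoupling described in the abstract can be illustrated with a minimal sketch: the Gaussian count is fixed by the UV resolution, and any number of input views compresses to the same latent size. All names, shapes, and the random-projection "networks" below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

UV_RES = 32        # texels per side of the UV map (illustrative resolution)
LATENT = 64        # latent code size (illustrative)
ATTRS = 14         # 3 pos offset + 3 scale + 4 quaternion + 1 opacity + 3 color

rng = np.random.default_rng(0)
E = rng.standard_normal((8 * 8 * 3, LATENT)) * 0.1                # stand-in "encoder"
W = rng.standard_normal((LATENT, UV_RES * UV_RES * ATTRS)) * 0.1  # stand-in "decoder"

def encode(views: np.ndarray) -> np.ndarray:
    """Pool any number of views into a fixed-size latent code."""
    return np.tanh(views.reshape(len(views), -1).mean(axis=0) @ E)

def decode(latent: np.ndarray, template_xyz: np.ndarray) -> dict:
    """Map the latent to per-texel Gaussian attributes anchored to the template."""
    a = (latent @ W).reshape(UV_RES, UV_RES, ATTRS)
    return {
        "xyz": template_xyz + a[..., 0:3],         # offsets from neutral template
        "scale": np.exp(a[..., 3:6]),              # positive scales
        "rot": a[..., 6:10],                       # unnormalized quaternions
        "opacity": 1 / (1 + np.exp(-a[..., 10])),  # sigmoid to (0, 1)
        "rgb": a[..., 11:14],
    }

template = np.zeros((UV_RES, UV_RES, 3))           # stand-in neutral head template
for n_views in (4, 16, 64):                        # more views, same representation
    g = decode(encode(rng.random((n_views, 8, 8, 3))), template)
    assert g["xyz"].shape == (UV_RES, UV_RES, 3)   # Gaussian count fixed at UV_RES**2
```

The point of the sketch is the shape arithmetic: the output always holds `UV_RES**2` Gaussians regardless of how many views enter the encoder, which is what enables training with many high-resolution input views.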
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes HeadsUp, a scalable feed-forward encoder-decoder method for high-quality 3D Gaussian head reconstruction from multi-view captures. Input views are compressed into a compact latent code that is decoded into UV-parameterized 3D Gaussians anchored to a fixed neutral head template; this decouples the Gaussian count from the number and resolution of input views. The model is trained on an internal dataset of >10,000 subjects (an order of magnitude larger than prior multi-view head datasets). The paper claims state-of-the-art reconstruction quality with zero-shot generalization to novel identities, presents a scaling analysis across identities, views, and model capacity, and demonstrates downstream applications in novel-identity generation and blendshape animation.
Significance. If substantiated, the work would be significant for enabling efficient, optimization-free 3D head assets at scale, with direct value for AR/VR, animation, and graphics pipelines. The large-scale training regime, explicit scaling study, and UV-based decoupling of representation size from capture resolution are practical strengths; the feed-forward design and latent-space applications further differentiate it from per-subject optimization baselines.
major comments (2)
- [Abstract] The central claim that HeadsUp 'achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization' is unsupported by quantitative metrics (PSNR, SSIM, LPIPS, etc.), baseline comparisons, ablation tables, or error analysis. This claim is load-bearing for the primary contribution and must be backed by explicit results tables in the experiments section.
- [Dataset and Generalization] The zero-shot generalization claim rests on two unverified assumptions: that the internal >10,000-subject dataset provides sufficient coverage of identity variation (cranial proportions, ethnicity, age, capture conditions), and that the fixed neutral-template UV parameterization preserves geometric and appearance detail without significant loss. No diversity statistics, cross-dataset evaluations, or ablations against deformable templates are referenced; if either assumption fails, neither the state-of-the-art quality claim nor the feed-forward generalization claim holds.
minor comments (2)
- [Method] The method description would benefit from explicit notation for the encoder-decoder architecture, latent dimensionality, and the precise UV-to-Gaussian mapping (e.g., how position/scale/rotation attributes compensate for template rigidity).
- [Scaling Analysis] Scaling analysis section should include concrete plots or tables showing quality vs. number of identities, views, and model parameters to make the 'practical insights for quality-compute trade-offs' actionable.
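The quality-compute analysis this comment asks for is typically summarized with a power-law fit in log-log space; a sketch on made-up numbers (synthetic placeholders, not measurements from the paper):

```python
import numpy as np

# Hypothetical scaling analysis: assume reconstruction error follows a power
# law err = a * N**(-b) in the number of training identities N. The values
# below are synthetic, generated with a = 0.5 and b = 0.25.
n_identities = np.array([100.0, 1000.0, 10000.0])
err = 0.5 * n_identities ** -0.25

# Fit a line in log-log space: log(err) = log(a) - b * log(N)
slope, intercept = np.polyfit(np.log(n_identities), np.log(err), 1)
a, b = np.exp(intercept), -slope
assert abs(a - 0.5) < 1e-6 and abs(b - 0.25) < 1e-6   # recovers the generating law
```

Reporting fitted exponents like `b` for identities, views, and parameters is one concrete way to make the 'practical insights for quality-compute trade-offs' actionable.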
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which helps strengthen the quantitative support for our claims and the analysis of generalization. We will revise the manuscript accordingly and address each major comment below.
Point-by-point responses
- Referee: [Abstract] As in major comment 1 above: the state-of-the-art-quality and zero-shot-generalization claim is unsupported by quantitative metrics, baseline comparisons, ablation tables, or error analysis.
Authors: We agree that the abstract claim requires explicit quantitative backing. We will add a dedicated summary table to the experiments section reporting PSNR, SSIM, LPIPS, and related metrics with direct baseline comparisons, together with an error analysis, to substantiate state-of-the-art quality and zero-shot generalization without test-time optimization. Revision: yes.
- Referee: [Dataset and Generalization] As in major comment 2 above: the generalization claim rests on unverified assumptions about dataset coverage and about detail preservation under the fixed neutral-template UV parameterization.
Authors: We acknowledge the need for stronger verification of these assumptions. We will expand the dataset section with available diversity statistics on demographics and capture conditions, and add an ablation comparing the fixed neutral UV template against a deformable alternative to demonstrate detail preservation. Cross-dataset evaluation is limited by the lack of other large-scale public multi-view head datasets of similar scope; we will discuss this constraint explicitly in the revision. Revision: partial.
- Not addressed: full cross-dataset quantitative evaluation on comparable large-scale public multi-view head datasets, as no such datasets currently exist.
Circularity Check
No significant circularity in the derivation chain
Full rationale
The paper presents an empirical machine learning method: a trained encoder-decoder neural architecture that maps multi-view images to UV-parameterized 3D Gaussians on a neutral template. No mathematical derivations, equations, or uniqueness theorems are described that reduce any claimed prediction or result to the inputs by construction. Generalization to novel identities is asserted as an empirical outcome of training on the internal >10k-subject dataset and evaluating on held-out subjects, without any self-definitional fitting or renaming of known patterns. No load-bearing self-citations, smuggled ansatzes, or fitted-input predictions appear in the provided description. The central claims rest on architectural design and data scale rather than circular reductions, making the derivation self-contained.