MeshLAM: Feed-Forward One-Shot Animatable Textured Mesh Avatar Reconstruction
Pith reviewed 2026-05-09 22:04 UTC · model grok-4.3
The pith
A single forward pass from one image yields a complete, animatable textured 3D head mesh.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MeshLAM produces complete, inherently animatable mesh representations from a single image in a single forward pass. A dual shape and texture map architecture processes mesh vertices and texture maps together with image features extracted by a shared transformer backbone; an iterative GRU-based decoding mechanism performs progressive geometry deformation and texture refinement; and a reprojection-based texture guidance mechanism anchors appearance to the input image.
What carries the argument
Dual shape and texture map architecture with shared transformer backbone and iterative GRU-based progressive deformation, which jointly carves geometry and refines appearance while preventing collapse.
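The iterative refinement idea can be sketched in isolation. The snippet below is a toy, self-contained illustration with random stand-in weights, not the paper's architecture: a GRU cell accumulates state over several decoding steps, and each step emits a small, bounded residual vertex offset, which is the property the review credits with keeping the deformation progressive rather than a single large jump. All names (`ToyGRUCell`, `progressive_deform`) and the bounded-offset head are assumptions of this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyGRUCell:
    """Minimal GRU cell; weights are random stand-ins for learned parameters."""
    def __init__(self, dim, rng):
        self.Wz = rng.standard_normal((dim, 2 * dim)) * 0.1
        self.Wr = rng.standard_normal((dim, 2 * dim)) * 0.1
        self.Wh = rng.standard_normal((dim, 2 * dim)) * 0.1

    def step(self, h, x):
        hx = np.concatenate([h, x], axis=-1)
        z = sigmoid(hx @ self.Wz.T)          # update gate
        r = sigmoid(hx @ self.Wr.T)          # reset gate
        rhx = np.concatenate([r * h, x], axis=-1)
        h_tilde = np.tanh(rhx @ self.Wh.T)   # candidate state
        return (1 - z) * h + z * h_tilde

def progressive_deform(verts, feat, steps=4, scale=0.05, seed=0):
    """Iteratively refine vertex positions with small, bounded residual offsets.

    Each step maps the GRU hidden state to a per-vertex xyz offset through
    tanh, so no single update can move a vertex by more than `scale`.
    """
    rng = np.random.default_rng(seed)
    dim = feat.shape[-1]
    cell = ToyGRUCell(dim, rng)
    head = rng.standard_normal((3, dim)) * 0.1   # hidden state -> xyz offset
    h = np.zeros_like(feat)
    for _ in range(steps):
        h = cell.step(h, feat)
        offset = np.tanh(h @ head.T) * scale     # bounded per-step update
        verts = verts + offset
    return verts
```

Because every step's displacement is capped at `scale`, total movement after `steps` iterations is bounded by `steps * scale`, which is why small iterated updates are easier to keep fold-free than one unconstrained regression.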
If this is right
- The output meshes are immediately usable for animation without additional rigging steps.
- Reconstruction quality, animation fidelity, and runtime speed all exceed those of optimization-based and multi-view baselines.
- Topological integrity is maintained through the progressive deformation schedule even in a feed-forward setting.
- Appearance remains anchored to the input image via the reprojection guidance term.
Where Pith is reading between the lines
- The same single-pass pipeline could support real-time avatar updates if applied frame-by-frame to video.
- If the deformation mechanism generalizes, the approach might extend to full-body or hand meshes with minimal changes.
- Deployment on mobile devices becomes feasible once the forward pass fits within typical GPU memory limits.
- Failure on extreme head poses would point to the need for stronger pose-invariant feature extraction.
Load-bearing premise
The shared transformer plus iterative GRU refinement can preserve mesh topology and produce coherent textures from single-image features alone.
What would settle it
Run the model on a held-out single image, apply standard facial animation rigs to the output mesh, and check whether self-intersections appear or the reprojected texture deviates from the input photo.
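One coarse, checkable proxy for the self-intersection part of that test can be sketched: with fixed connectivity, a triangle whose area vanishes after deformation signals local collapse. The helper below (`collapse_score` is a hypothetical name, not from the paper) only flags degenerate triangles and omits a true triangle-triangle intersection test, which a full evaluation would also need.

```python
import numpy as np

def triangle_areas(verts, faces):
    """Areas of mesh triangles (verts: (V, 3) floats, faces: (F, 3) vertex indices)."""
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=-1)

def collapse_score(template_verts, deformed_verts, faces, eps=1e-8):
    """Fraction of triangles that were non-degenerate on the template but
    collapse to (near-)zero area after deformation.

    A coarse proxy for mesh collapse under a fixed-topology deformation;
    0.0 means no triangle degenerated, 1.0 means all did.
    """
    ref = triangle_areas(template_verts, faces)
    out = triangle_areas(deformed_verts, faces)
    collapsed = (out < eps) & (ref >= eps)
    return collapsed.mean()
```

Running such a check on meshes driven through the full expression range of a standard rig would make the "topological integrity" claim directly falsifiable.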
Original abstract
We introduce MeshLAM, a feed-forward framework for one-shot animatable mesh head reconstruction that generates high-fidelity, animatable 3D head avatars from a single image. Unlike previous work that relies on time-consuming test-time optimization or extensive multi-view data, our method produces complete mesh representations with inherent animatability from a single image in a single forward pass. Our approach employs a dual shape and texture map architecture that simultaneously processes mesh vertices and texture map with extracted image features from a shared transformer backbone, allowing for coherent shape carving and appearance modeling. To prevent mesh collapse and ensure topological integrity during feed-forward deformation, we propose an iterative GRU-based decoding mechanism with progressive geometry deformation and texture refinement, coupled with a novel reprojection-based texture guidance mechanism that anchors appearance learning to the input image. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in reconstruction quality, animation capability, and computational efficiency. Project page at https://meshlam.github.io.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MeshLAM, a feed-forward framework for one-shot reconstruction of animatable textured mesh avatars from a single image. It combines a dual shape and texture map architecture built on a shared transformer backbone, an iterative GRU-based progressive deformation mechanism, and reprojection-based texture guidance to produce complete meshes that are inherently animatable without test-time optimization or multi-view data. The authors claim superior performance over state-of-the-art methods in reconstruction quality, animation capability, and efficiency.
Significance. Should the quantitative results and ablations support the claims, this would be a notable contribution to the field of 3D avatar reconstruction by enabling efficient, single-image feed-forward generation of animatable meshes, which could impact applications in AR, VR, and digital humans. The avoidance of optimization at test time is particularly promising for real-time use cases.
major comments (1)
- The abstract states that the GRU-based mechanism 'prevents mesh collapse' and ensures 'topological integrity' during feed-forward deformation, but provides no details on explicit constraints such as fixed connectivity, Laplacian regularization, or collision terms. This is load-bearing for the central 'inherent animatability' claim, as monocular depth ambiguity could otherwise allow folding or inconsistent vertex/UV coherence under large expression changes.
minor comments (2)
- The abstract references 'extensive experiments' demonstrating outperformance but includes no quantitative metrics, ablation results, or error analysis; a brief summary of key numbers (e.g., reconstruction error, animation fidelity) should be added.
- Notation for the dual shape-texture architecture and the reprojection-based guidance is introduced without equations or pseudocode in the provided text; adding these would improve clarity of the iterative decoding process.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comment below and will revise the manuscript to provide additional clarifications on the mechanisms for maintaining mesh stability.
Point-by-point responses
-
Referee: The abstract states that the GRU-based mechanism 'prevents mesh collapse' and ensures 'topological integrity' during feed-forward deformation, but provides no details on explicit constraints such as fixed connectivity, Laplacian regularization, or collision terms. This is load-bearing for the central 'inherent animatability' claim, as monocular depth ambiguity could otherwise allow folding or inconsistent vertex/UV coherence under large expression changes.
Authors: We thank the referee for highlighting this important point regarding clarity. In MeshLAM, we employ a fixed-topology template mesh (detailed in Section 3.1), with predefined vertex connectivity that remains constant throughout the deformation process; this design choice inherently preserves topology without requiring additional constraints. The iterative GRU-based decoding (Section 3.2) performs progressive, multi-step geometry refinement rather than a single large update, which reduces the likelihood of folding or collapse arising from monocular depth ambiguity. The reprojection-based texture guidance mechanism further promotes coherence by anchoring texture predictions to the input image via differentiable rendering, helping maintain vertex-UV consistency during animation. While our loss does not include explicit Laplacian regularization or collision terms, the combination of fixed connectivity, iterative refinement, and guidance has yielded stable results in our experiments and ablations. We agree the abstract and method section would benefit from more explicit discussion of these aspects and will revise accordingly, including expanded text and a supporting figure in the next version.
revision: yes
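The reprojection anchoring described in the rebuttal can be illustrated with a deliberately simplified sketch: project each vertex through a pinhole camera into the input image and penalize predicted vertex colors that deviate from the sampled pixels. Intrinsics, visibility handling, and differentiable bilinear sampling are all reduced to nearest-pixel lookup here; every name and parameter is an assumption of this sketch, not the paper's implementation.

```python
import numpy as np

def reproject_colors(verts, image, fx=500.0, fy=500.0):
    """Sample a per-vertex color by pinhole projection into the input image.

    Assumes the camera looks down +z with the principal point at the image
    center; out-of-frame projections are clamped to the border for simplicity.
    """
    h, w = image.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    z = np.clip(verts[:, 2], 1e-6, None)
    u = np.clip((fx * verts[:, 0] / z + cx).astype(int), 0, w - 1)
    v = np.clip((fy * verts[:, 1] / z + cy).astype(int), 0, h - 1)
    return image[v, u]

def reprojection_loss(pred_colors, verts, image):
    """Mean absolute deviation between predicted vertex colors and the
    colors sampled at each vertex's reprojected image location."""
    target = reproject_colors(verts, image)
    return np.abs(pred_colors - target).mean()
```

In the actual pipeline this lookup would need to be differentiable (e.g. bilinear sampling through a renderer) for the guidance term to train the texture branch, but the anchoring principle is the same: predicted appearance is pulled toward what the input photo shows at each vertex's projection.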
Circularity Check
No circularity in derivation chain
full rationale
The paper presents an architectural ML model (shared transformer + dual shape/texture maps + iterative GRU decoding with reprojection guidance) for single-image mesh avatar reconstruction. No equations, first-principles derivations, or 'predictions' are described that reduce to fitted inputs or self-citations by construction. The central claim of inherent animatability is an empirical outcome of the feed-forward network design rather than a mathematical identity or renamed fit. The provided abstract and context contain no load-bearing self-citations or ansatzes that collapse the result to its own inputs, making the derivation self-contained as a standard neural architecture proposal.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Neural networks trained on appropriate 3D head datasets can generalize to produce topologically valid meshes from single images.
invented entities (1)
- MeshLAM dual shape-texture architecture (no independent evidence)
Reference graph
Works this paper leans on
-
[1]
Rignerf: Fully controllable neural 3d portraits
ShahRukh Athar, Zexiang Xu, Kalyan Sunkavalli, Eli Shecht- man, and Zhixin Shu. Rignerf: Fully controllable neural 3d portraits. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 20332–20341. IEEE, 2022. 2
2022
-
[2]
High-fidelity facial avatar reconstruction from monocular video with generative priors
Yunpeng Bai, Yanbo Fan, Xuan Wang, Yong Zhang, Jingx- iang Sun, Chun Yuan, and Ying Shan. High-fidelity facial avatar reconstruction from monocular video with generative priors. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 4541–4551. IEEE, 2023. 2
2023
-
[3]
Learning per- sonalized high quality volumetric head avatars from monoc- ular RGB videos
Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Ming- song Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, and Yinda Zhang. Learning per- sonalized high quality volumetric head avatars from monoc- ular RGB videos. InIEEE/CVF Conference on Computer Vision and Pattern R...
2023
-
[4]
A morphable model for the synthesis of 3d faces
V olker Blanz and Thomas Vetter. A morphable model for the synthesis of 3d faces. InProceedings of the 26th Annual Con- ference on Computer Graphics and Interactive Techniques, SIGGRAPH 1999, Los Angeles, CA, USA, August 8-13, 1999, pages 187–194. ACM, 1999. 2
1999
-
[5]
How far are we from solving the 2d & 3d face alignment problem? (and a dataset of 230, 000 3d facial landmarks)
Adrian Bulat and Georgios Tzimiropoulos. How far are we from solving the 2d & 3d face alignment problem? (and a dataset of 230, 000 3d facial landmarks). InIEEE Interna- tional Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 1021–1030. IEEE Com- puter Society, 2017. 7
2017
-
[6]
Lempitsky
Egor Burkov, Igor Pasechnik, Artur Grigorev, and Victor S. Lempitsky. Neural head reenactment with latent pose descrip- tors. In2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13- 19, 2020, pages 13783–13792. Computer Vision Foundation / IEEE, 2020. 2
2020
-
[7]
Generalizable and ani- matable gaussian head avatar
Xuangeng Chu and Tatsuya Harada. Generalizable and ani- matable gaussian head avatar. InAdvances in Neural Infor- mation Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Van- couver, BC, Canada, December 10 - 15, 2024, 2024. 2, 5, 6, 9, 1
2024
-
[8]
Gpavatar: Generalizable and precise head avatar from image(s)
Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, and Tatsuya Harada. Gpavatar: Generalizable and precise head avatar from image(s). InThe Twelfth Interna- tional Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. 6
2024
-
[9]
Jiahao Cui, Hui Li, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, and Jingdong Wang. Hallo2: Long-duration and high-resolution audio-driven portrait im- age animation.CoRR, abs/2410.07718, 2024. 2
-
[10]
Arcface: Additive angular margin loss for deep face recognition.IEEE Trans
Jiankang Deng, Jia Guo, Jing Yang, Niannan Xue, Irene Kotsia, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition.IEEE Trans. Pattern Anal. Mach. Intell., 44(10):5962–5979, 2022. 6
2022
-
[11]
Portrait4d-v2: Pseudo multi-view data creates better 4d head synthesizer
Yu Deng, Duomin Wang, and Baoyuan Wang. Portrait4d-v2: Pseudo multi-view data creates better 4d head synthesizer. In Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part XVII, pages 316–333. Springer, 2024. 1
2024
-
[12]
Face parsing model
Jonathan Dinu. Face parsing model. https : / / huggingface . co / jonathandinu / face - parsing, 2022. Accessed: 2025-01-23. 5, 6, 7
2022
-
[14]
Mega- portraits: One-shot megapixel neural head avatars
Nikita Drobyshev, Jenya Chelishev, Taras Khakhulin, Aleksei Ivakhnenko, Victor Lempitsky, and Egor Zakharov. Mega- portraits: One-shot megapixel neural head avatars. InMM ’22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10 - 14, 2022, pages 2663–2671. ACM, 2022. 2
2022
-
[15]
Dynamic neural radiance fields for monocular 4d facial avatar reconstruction
Guy Gafni, Justus Thies, Michael Zollhöfer, and Matthias Nießner. Dynamic neural radiance fields for monocular 4d facial avatar reconstruction. InIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 8649–8658. Computer Vision Foundation / IEEE, 2021. 2
2021
-
[16]
Reconstructing personalized seman- tic facial nerf models from monocular video.ACM Trans
Xuan Gao, Chenglai Zhong, Jun Xiang, Yang Hong, Yudong Guo, and Juyong Zhang. Reconstructing personalized seman- tic facial nerf models from monocular video.ACM Trans. Graph., 41(6):200:1–200:12, 2022. 2
2022
-
[17]
Morphable face models - an open frame- work
Thomas Gerig, Andreas Morel-Forster, Clemens Blumer, Bernhard Egger, Marcel Lüthi, Sandro Schönborn, and Thomas Vetter. Morphable face models - an open frame- work. In13th IEEE International Conference on Automatic Face & Gesture Recognition, FG 2018, Xi’an, China, May 15-19, 2018, pages 75–82. IEEE Computer Society, 2018. 2
2018
-
[18]
Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial nets. InAdvances in Neural Information Processing Systems 27: Annual Confer- ence on Neural Information Processing Systems 2014, Decem- ber 8-13 2014, Montreal, Quebec, Canada, pages 2672–2680,
2014
-
[19]
arXiv preprint arXiv:2407.03168 , year =
Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, and Di Zhang. Liveportrait: Effi- 9 cient portrait animation with stitching and retargeting control. CoRR, abs/2407.03168, 2024. 2
-
[21]
Ad-nerf: Audio driven neural ra- diance fields for talking head synthesis
Yudong Guo, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, and Juyong Zhang. Ad-nerf: Audio driven neural ra- diance fields for talking head synthesis. In2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 5764–
2021
-
[22]
Efros, Aleksander Holynski, and Angjoo Kanazawa
Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, and Angjoo Kanazawa. Instruct-nerf2nerf: Edit- ing 3d scenes with instructions. InIEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pages 19683–19693. IEEE, 2023. 7
2023
-
[23]
Freditor: High-fidelity and transfer- able nerf editing by frequency decomposition
Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, and Qixing Huang. Freditor: High-fidelity and transfer- able nerf editing by frequency decomposition. InComputer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part XLI, pages 73–91. Springer, 2024. 7
2024
-
[24]
Lam: Large avatar model for one-shot animatable gaus- sian head
Yisheng He, Xiaodong Gu, Xiaodan Ye, Chao Xu, Zhengyi Zhao, Yuan Dong, Weihao Yuan, Zilong Dong, and Liefeng Bo. Lam: Large avatar model for one-shot animatable gaus- sian head. InProceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers, pages 1–13, 2025. 1, 2, 3, 9
2025
-
[25]
Gans trained by a two time-scale update rule converge to a local nash equilibrium
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bern- hard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017. 6
2017
-
[26]
Depth- aware generative adversarial network for talking head video generation
Fa-Ting Hong, Longhao Zhang, Li Shen, and Dan Xu. Depth- aware generative adversarial network for talking head video generation. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 3387–3396. IEEE, 2022. 2
2022
-
[27]
Headnerf: A real-time nerf-based parametric head model.CoRR, abs/2112.05637, 2021
Yang Hong, Bo Peng, Haiyao Xiao, Ligang Liu, and Juyong Zhang. Headnerf: A real-time nerf-based parametric head model.CoRR, abs/2112.05637, 2021. 2
-
[28]
Yingdong Hu, Yisheng He, Jinnan Chen, Weihao Yuan, Kejie Qiu, Zehong Lin, Siyu Zhu, Zilong Dong, and Jun Zhang. Forge4d: Feed-forward 4d human reconstruction and interpo- lation from uncalibrated sparse-view videos.arXiv preprint arXiv:2509.24209, 2025. 1
-
[29]
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial net- works. In2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21- 26, 2017, pages 5967–5976. IEEE Computer Society, 2017. 2
2017
-
[30]
A style-based generator architecture for generative adversarial networks
Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recogni- tion, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 4401–4410. Computer Vision Foundation / IEEE, 2019. 2
2019
-
[31]
3d gaussian splatting for real-time radiance field rendering.ACM Trans
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139:1–139:14,
-
[32]
Realistic one-shot mesh-based head avatars
Taras Khakhulin, Vanessa Sklyarova, Victor Lempitsky, and Egor Zakharov. Realistic one-shot mesh-based head avatars. InComputer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part II, pages 345–362. Springer, 2022. 1, 2
2022
-
[33]
Realistic one-shot mesh-based head avatars
Taras Khakhulin, Vanessa Sklyarova, Victor Lempitsky, and Egor Zakharov. Realistic one-shot mesh-based head avatars. InEuropean Conference on Computer Vision (ECCV), 2022. 2, 7
2022
-
[34]
Sapiens: Foundation for human vision mod- els.arXiv preprint arXiv:2408.12569, 2024
Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez, Su Zhaoen, Austin James, Peter Selednik, Stuart Anderson, and Shunsuke Saito. Sapiens: Foundation for human vision mod- els.arXiv preprint arXiv:2408.12569, 2024. 5
-
[35]
Learn- ing to generate conditional tri-plane for 3d-aware expression controllable portrait animation
Taekyung Ki, Dongchan Min, and Gyeongsu Chae. Learn- ing to generate conditional tri-plane for 3d-aware expression controllable portrait animation. InComputer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part I, pages 476–493. Springer, 2024. 2
2024
-
[36]
Tobias Kirschstein, Javier Romero, Artem Sevastopolsky, Matthias Nießner, and Shunsuke Saito. Avat3r: Large an- imatable gaussian reconstruction model for high-fidelity 3d head avatars.arXiv preprint arXiv:2502.20220, 2025. 2
-
[37]
Peng Li, Yisheng He, Yingdong Hu, Yuan Dong, Weihao Yuan, Yuan Liu, Siyu Zhu, Gang Cheng, Zilong Dong, and Yike Guo. Panolam: Large avatar model for gaussian full- head synthesis from one-shot unposed image.arXiv preprint arXiv:2509.07552, 2025. 1, 2
-
[38]
Black, Hao Li, and Javier Romero
Tianye Li, Timo Bolkart, Michael J. Black, Hao Li, and Javier Romero. Learning a model of facial shape and expression from 4d scans.ACM Trans. Graph., 36(6):194:1–194:17,
-
[39]
Tianye Li, Timo Bolkart, Michael. J. Black, Hao Li, and Javier Romero. Learning a model of facial shape and expres- sion from 4D scans.ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):194:1–194:17, 2017. 1, 3
2017
-
[40]
One-shot high-fidelity talking- head synthesis with deformable neural radiance field
Weichuang Li, Longhao Zhang, Dong Wang, Bin Zhao, Zhi- gang Wang, Mulin Chen, Bang Zhang, Zhongjian Wang, Liefeng Bo, and Xuelong Li. One-shot high-fidelity talking- head synthesis with deformable neural radiance field. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17- 24, 2023, pages 17969–17978...
2023
-
[41]
Soap: Style-omniscient animatable portraits
Tingting Liao, Yujian Zheng, Adilbek Karmanov, Liwen Hu, Leyang Jin, Yuliang Xiu, and Hao Li. Soap: Style-omniscient animatable portraits. InACM SIGGRAPH 2025 Conference Proceedings, 2025. 2, 5
2025
-
[42]
Step1X-Edit: A Practical Framework for General Image Editing
Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chun- 10 rui Han, et al. Step1x-edit: A practical framework for general image editing.arXiv preprint arXiv:2504.17761, 2025. 7, 8
work page internal anchor Pith review arXiv 2025
-
[43]
Otavatar: One-shot talking face avatar with con- trollable tri-plane rendering
Zhiyuan Ma, Xiangyu Zhu, Guojun Qi, Zhen Lei, and Lei Zhang. Otavatar: One-shot talking face avatar with con- trollable tri-plane rendering. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Van- couver, BC, Canada, June 17-24, 2023, pages 16901–16910. IEEE, 2023. 2
2023
-
[44]
Srinivasan, Matthew Tancik, Jonathan T
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. InComputer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part I, pages 405–421. Springer, 2020. 2
2020
-
[45]
Srinivasan, Matthew Tancik, Jonathan T
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. InComputer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part I, pages 405–421. Springer, 2020. 3
2020
-
[46]
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rab- bat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, Patrick...
2024
-
[47]
Continuous remeshing for inverse rendering
Werner Palfinger. Continuous remeshing for inverse rendering. Comput. Animat. Virtual Worlds, 33(5), 2022. 5
2022
-
[48]
Barron, Sofien Bouaziz, Dan B
Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B. Goldman, Steven M. Seitz, and Ricardo Martin-Brualla. Nerfies: Deformable neural radiance fields. In2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 5845–5854. IEEE, 2021. 2
2021
-
[49]
Barron, Sofien Bouaziz, Dan B
Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T. Barron, Sofien Bouaziz, Dan B. Goldman, Ricardo Martin- Brualla, and Steven M. Seitz. Hypernerf: a higher- dimensional representation for topologically varying neural radiance fields.ACM Trans. Graph., 40(6):238:1–238:12,
-
[50]
A 3d face model for pose and illumination invariant face recognition
Pascal Paysan, Reinhard Knothe, Brian Amberg, Sami Romd- hani, and Thomas Vetter. A 3d face model for pose and illumination invariant face recognition. InSixth IEEE Inter- national Conference on Advanced Video and Signal Based Surveillance, AVSS 2009, 2-4 September 2009, Genova, Italy, pages 296–301. IEEE Computer Society, 2009. 2
2009
-
[51]
Gaus- sianavatars: Photorealistic head avatars with rigged 3d gaus- sians
Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, and Matthias Nießner. Gaus- sianavatars: Photorealistic head avatars with rigged 3d gaus- sians. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 20299–20309. IEEE, 2024. 1, 2, 6
2024
-
[52]
Vi- sion transformers for dense prediction
René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vi- sion transformers for dense prediction. In2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 12159– 12168. IEEE, 2021. 3, 4
2021
-
[53]
Li, and Shan Liu
Yurui Ren, Ge Li, Yuanqi Chen, Thomas H. Li, and Shan Liu. Pirenderer: Controllable portrait image generation via semantic neural rendering. In2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 13739–13748. IEEE,
2021
-
[54]
First order motion model for image animation
Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. First order motion model for image animation. InAdvances in Neural Information Process- ing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 7135–7145, 2019. 2
2019
-
[55]
Next3d: Genera- tive neural texture rasterization for 3d-aware head avatars
Jingxiang Sun, Xuan Wang, Lizhen Wang, Xiaoyu Li, Yong Zhang, Hongwen Zhang, and Yebin Liu. Next3d: Genera- tive neural texture rasterization for 3d-aware head avatars. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17- 24, 2023, pages 20991–21002. IEEE, 2023. 2
2023
-
[56]
CGOF++: controllable 3d face synthesis with conditional generative occupancy fields
Keqiang Sun, Shangzhe Wu, Ning Zhang, Zhaoyang Huang, Quan Wang, and Hongsheng Li. CGOF++: controllable 3d face synthesis with conditional generative occupancy fields. IEEE Trans. Pattern Anal. Mach. Intell., 46(2):913–926, 2024. 2
2024
-
[57]
Jiaxiang Tang, Kaisiyuan Wang, Hang Zhou, Xiaokang Chen, Dongliang He, Tianshu Hu, Jingtuo Liu, Gang Zeng, and Jing- dong Wang. Real-time neural radiance talking portrait synthe- sis via audio-spatial decomposition.CoRR, abs/2211.12368,
-
[58]
3dfaceshop: Explicitly controllable 3d-aware portrait generation.IEEE Trans
Junshu Tang, Bo Zhang, Binxin Yang, Ting Zhang, Dong Chen, Lizhuang Ma, and Fang Wen. 3dfaceshop: Explicitly controllable 3d-aware portrait generation.IEEE Trans. Vis. Comput. Graph., 30(9):6020–6037, 2024. 2
2024
-
[59]
EMO: emote portrait alive generating expressive portrait videos with audio2video diffusion model under weak conditions
Linrui Tian, Qi Wang, Bang Zhang, and Liefeng Bo. EMO: emote portrait alive generating expressive portrait videos with audio2video diffusion model under weak conditions. InCom- puter Vision - ECCV 2024 - 18th European Conference, Mi- lan, Italy, September 29-October 4, 2024, Proceedings, Part LXXXIII, pages 244–260. Springer, 2024. 2
2024
-
[60]
Non- rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video
Edgar Tretschk, Ayush Tewari, Vladislav Golyanik, Michael Zollhöfer, Christoph Lassner, and Christian Theobalt. Non- rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 12939–12...
2021
-
[61]
Progressive disentangled representation learning for fine-grained controllable talking head synthesis
Duomin Wang, Yu Deng, Zixin Yin, Heung-Yeung Shum, and Baoyuan Wang. Progressive disentangled representation learning for fine-grained controllable talking head synthesis. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17- 24, 2023, pages 17979–17989. IEEE, 2023. 2
2023
-
[62]
Gaussianeditor: Editing 3d gaussians delicately with text instructions
Junjie Wang, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, and Qi Tian. Gaussianeditor: Editing 3d gaussians delicately with text instructions. InIEEE/CVF Conference on Computer 11 Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 20902–20911. IEEE, 2024. 7
2024
-
[63]
One-shot free-view neural talking-head synthesis for video conferenc- ing
Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu. One-shot free-view neural talking-head synthesis for video conferenc- ing. InIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 10039–10049. Computer Vision Foundation / IEEE, 2021. 1, 2
2021
-
[64]
Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng-ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, et al. Qwen-image technical report.arXiv preprint arXiv:2508.02324, 2025. 7, 8
work page internal anchor Pith review arXiv 2025
-
[65]
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, et al. Sana: Efficient high-resolution image synthesis with lin- ear diffusion transformers.arXiv preprint arXiv:2410.10629,
work page internal anchor Pith review arXiv
-
[66]
VFHQ: A high-quality dataset and benchmark for video face super-resolution
Liangbin Xie, Xintao Wang, Honglun Zhang, Chao Dong, and Ying Shan. VFHQ: A high-quality dataset and benchmark for video face super-resolution. InIEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2022, New Orleans, LA, USA, June 19-20, 2022, pages 656–665. IEEE, 2022. 6
2022
-
[67]
PV3D: A 3d generative model for portrait video generation
Eric Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wen- qing Zhang, Song Bai, Jiashi Feng, and Mike Zheng Shou. PV3D: A 3d generative model for portrait video generation. InThe Eleventh International Conference on Learning Rep- resentations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. 2
2023
-
[68]
Mingwang Xu, Hui Li, Qingkun Su, Hanlin Shang, Liwei Zhang, Ce Liu, Jingdong Wang, Yao Yao, and Siyu Zhu. Hallo: Hierarchical audio-driven visual synthesis for portrait image animation.CoRR, abs/2406.08801, 2024. 2
-
[69]
Deep 3d portrait from a single image
Sicheng Xu, Jiaolong Yang, Dong Chen, Fang Wen, Yu Deng, Yunde Jia, and Xin Tong. Deep 3d portrait from a single image. In2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 7707–7717. Computer Vision Foundation / IEEE, 2020. 1, 2
2020
-
[70]
Gaussian head avatar: Ultra high-fidelity head avatar via dynamic gaus- sians
Yuelang Xu, Bengwang Chen, Zhe Li, Hongwen Zhang, Lizhen Wang, Zerong Zheng, and Yebin Liu. Gaussian head avatar: Ultra high-fidelity head avatar via dynamic gaus- sians. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 1931–1941. IEEE, 2024. 2
2024
-
[71] Fei Yin, Yong Zhang, Xiaodong Cun, Mingdeng Cao, Yanbo Fan, Xuan Wang, Qingyan Bai, Baoyuan Wu, Jue Wang, and Yujiu Yang. Styleheat: One-shot high-resolution editable talking face generation via pre-trained StyleGAN. In Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XVII, pages 85–101. Springer, 2022.
[72] Wangbo Yu, Yanbo Fan, Yong Zhang, Xuan Wang, Fei Yin, Yunpeng Bai, Yan-Pei Cao, Ying Shan, Yang Wu, Zhongqian Sun, and Baoyuan Wu. NOFA: NeRF-based one-shot facial avatar reconstruction. In ACM SIGGRAPH 2023 Conference Proceedings, SIGGRAPH 2023, Los Angeles, CA, USA, August 6-10, 2023, pages 85:1–85:12. ACM, 2023.
[73] Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, and Victor S. Lempitsky. Few-shot adversarial learning of realistic neural talking head models. In 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019, pages 9458–9467. IEEE, 2019.
[74] Bowen Zhang, Chenyang Qi, Pan Zhang, Bo Zhang, Hsiang-Tao Wu, Dong Chen, Qifeng Chen, Yong Wang, and Fang Wen. Metaportrait: Identity-preserving talking head generation with fast personalized adaptation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 22096–22105. IEEE, 2023.
[75] Wenxuan Zhang, Xiaodong Cun, Xuan Wang, Yong Zhang, Xi Shen, Yu Guo, Ying Shan, and Fei Wang. Sadtalker: Learning realistic 3D motion coefficients for stylized audio-driven single image talking face animation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 8652–8661. IEEE, 2023.
[76] Wenxuan Zhang, Xiaodong Cun, Xuan Wang, Yong Zhang, Xi Shen, Yu Guo, Ying Shan, and Fei Wang. Sadtalker: Learning realistic 3D motion coefficients for stylized audio-driven single image talking face animation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 8652–8661. IEEE, 2023.
[77] Zicheng Zhang, Ruobing Zheng, Bonan Li, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Ziwen Liu, and Ming Yang. Learning dynamic tetrahedra for high-quality talking head synthesis. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 5209–5219. IEEE, 2024.
[78] Xiaochen Zhao, Lizhen Wang, Jingxiang Sun, Hongwen Zhang, Jinli Suo, and Yebin Liu. Havatar: High-fidelity head avatar via facial model conditioned neural radiance field. ACM Trans. Graph., 43(1):6:1–6:16, 2024.
[79] Yufeng Zheng, Wang Yifan, Gordon Wetzstein, Michael J. Black, and Otmar Hilliges. Pointavatar: Deformable point-based head avatars from videos. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 21057–21067. IEEE, 2023.
[80] Peiye Zhuang, Liqian Ma, Sanmi Koyejo, and Alexander G. Schwing. Controllable radiance fields for dynamic face synthesis. In International Conference on 3D Vision, 3DV 2022, Prague, Czech Republic, September 12-16, 2022, pages 1–11. IEEE, 2022.
[81] Wojciech Zielonka, Timo Bolkart, and Justus Thies. Instant volumetric head avatars. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 4574–4584. IEEE, 2023.
A. More Results

Visualization of Reconstructed Geometry and Texture Map. As shown in Fig. 7, our method produces high-quality geometry and texture outputs from a single input image. The reconstructed mesh faithfully captures the subject's facial structure, hair volume, and accessory shapes, demonstrating significant deformation from the FLAME template.
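The deformation described above can be pictured as adding predicted per-vertex offsets to a shared template mesh. The sketch below is our illustration of that idea, not the paper's code; `deform_template`, the toy vertex lists, and the offset values are all hypothetical placeholders (the real FLAME template has 5023 vertices).

```python
# Minimal sketch (illustrative assumption, not MeshLAM's implementation):
# reconstructed mesh = template vertices + predicted per-vertex offsets.

def deform_template(template_vertices, offsets):
    """Apply per-vertex offsets to a template mesh.

    Both arguments are equal-length lists of (x, y, z) tuples; the result
    is the deformed vertex list in the same order.
    """
    assert len(template_vertices) == len(offsets), "one offset per vertex"
    return [
        (vx + ox, vy + oy, vz + oz)
        for (vx, vy, vz), (ox, oy, oz) in zip(template_vertices, offsets)
    ]

# Tiny toy mesh standing in for the FLAME template.
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Hypothetical offsets a network might predict (e.g., carving hair volume).
offsets = [(0.05, -0.02, 0.10), (0.00, 0.03, 0.08), (-0.04, 0.01, 0.12)]

mesh = deform_template(template, offsets)
print(mesh[0])  # (0.05, -0.02, 0.1)
```

Keeping the template's vertex order and connectivity while only moving vertex positions is what preserves the topology (and hence the rigging) of the template, which is why a mesh produced this way remains animatable.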