Giving Faces Their Feelings Back: Explicit Emotion Control for Feedforward Single-Image 3D Head Avatars
Pith reviewed 2026-05-10 11:45 UTC · model grok-4.3
The pith
Emotion can be treated as an independent control signal in single-image 3D head avatars.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a framework for explicit emotion control in feed-forward, single-image 3D head avatar reconstruction. Unlike existing pipelines where emotion is implicitly entangled with geometry or appearance, we treat emotion as a first-class control signal that can be manipulated independently and consistently across identities. Our method injects emotion into existing feed-forward architectures via a dual-path modulation mechanism without modifying their core design. Geometry modulation performs emotion-conditioned normalization in the original parametric space, disentangling emotional state from speech-driven articulation, while appearance modulation captures identity-aware, emotion-dependent visual cues beyond geometry.
What carries the argument
Dual-path modulation mechanism that separates geometry modulation (emotion-conditioned normalization in parametric space) from appearance modulation (identity-aware emotion-dependent cues).
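The abstract gives no equations, but the geometry path it describes can be pictured as an AdaIN-style conditioned normalization over the parametric expression space. A minimal sketch under that assumption — the function name, dimensions, and learned matrices below are illustrative, not the paper's formulation:

```python
import numpy as np

def emotion_conditioned_norm(expr_params, emotion_embed, w_gamma, w_beta, eps=1e-5):
    """Normalize speech-driven expression parameters, then re-scale and
    re-shift them with statistics predicted from the emotion embedding
    (AdaIN-style conditioning; illustrative, not the paper's equations)."""
    # Normalization strips emotion-correlated offsets from articulation.
    mu, sigma = expr_params.mean(), expr_params.std()
    normed = (expr_params - mu) / (sigma + eps)
    # The emotion code injects its own scale (gamma) and shift (beta).
    gamma = w_gamma @ emotion_embed
    beta = w_beta @ emotion_embed
    return gamma * normed + beta

# Toy dimensions: 50 FLAME-like expression coefficients, an 8-d emotion code.
rng = np.random.default_rng(0)
expr = rng.normal(size=50)
emo = rng.normal(size=8)
out = emotion_conditioned_norm(expr, emo,
                               rng.normal(size=(50, 8)),
                               rng.normal(size=(50, 8)))
```

Because normalization removes the mean and scale of the articulation signal before the emotion code re-injects its own, swapping `emo` changes the emotional offset without touching the underlying speech motion — which is the disentanglement the abstract claims.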
If this is right
- Existing feed-forward 3D head avatar architectures gain emotion control without changes to their core design.
- Emotion transfer becomes controllable and consistent across different identities.
- Emotional state can be disentangled from speech-driven articulation for separate manipulation.
- Smooth interpolation between emotional states is supported while preserving reconstruction fidelity.
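The interpolation claim in the last bullet has a natural minimal realization: blend two emotion embeddings and decode the blend through the same modulation path as the endpoints. A sketch under that assumption (the paper may interpolate differently):

```python
import numpy as np

def interpolate_emotion(e_a, e_b, t):
    """Linearly blend two emotion embeddings; intermediate codes are fed
    through the same modulation path as the endpoints."""
    return (1.0 - t) * e_a + t * e_b

neutral = np.zeros(8)  # illustrative 8-d emotion codes
happy = np.ones(8)
ramp = [interpolate_emotion(neutral, happy, t) for t in np.linspace(0.0, 1.0, 5)]
```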
Where Pith is reading between the lines
- This separation could support real-time emotion editing in virtual meetings or games using single photos.
- The dataset construction technique might apply to other disentanglement tasks like age or lighting control.
- Extending the modulation to video inputs could enable dynamic emotion sequences beyond static images.
Load-bearing premise
That emotional dynamics can be transferred and aligned across different identities to create a time-synchronized dataset without artifacts or identity leakage.
What would settle it
Apply the same emotion sequence to multiple identities in the dataset and check whether the outputs show consistent timing, no visible artifacts, and no identity mixing.
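That check can be made concrete with off-the-shelf emotion and face-recognition features. The sketch below assumes per-frame emotion features and one identity embedding per rendered output; the function names, feature dimensions, and stand-in data are placeholders, not the paper's protocol:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def check_transfer(emotion_tracks, identity_embeds):
    """emotion_tracks: identity -> (T, d) per-frame emotion features of the
    rendered outputs; identity_embeds: identity -> face-recognition embedding.
    Returns (worst cross-identity emotion agreement, worst identity leakage)."""
    ids = list(emotion_tracks)
    ref = ids[0]
    # Timing/emotion consistency: frame-wise similarity to a reference track.
    agreement = min(
        np.mean([cosine(emotion_tracks[ref][t], emotion_tracks[i][t])
                 for t in range(len(emotion_tracks[ref]))])
        for i in ids[1:]
    )
    # Identity mixing: rendered identities should stay mutually distinct.
    leakage = max(cosine(identity_embeds[i], identity_embeds[j])
                  for i in ids for j in ids if i != j)
    return agreement, leakage

rng = np.random.default_rng(0)
track = rng.normal(size=(5, 8))                 # 5 frames, 8-d emotion features
tracks = {"alice": track, "bob": track.copy()}  # perfectly synchronized transfer
faces = {"alice": np.eye(8)[0], "bob": np.eye(8)[1]}  # orthogonal = no mixing
agreement, leakage = check_transfer(tracks, faces)
```

High agreement with low leakage would support the load-bearing premise; high leakage would indicate identity mixing in the transferred corpus.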
Figures
original abstract
We present a framework for explicit emotion control in feed-forward, single-image 3D head avatar reconstruction. Unlike existing pipelines where emotion is implicitly entangled with geometry or appearance, we treat emotion as a first-class control signal that can be manipulated independently and consistently across identities. Our method injects emotion into existing feed-forward architectures via a dual-path modulation mechanism without modifying their core design. Geometry modulation performs emotion-conditioned normalization in the original parametric space, disentangling emotional state from speech-driven articulation, while appearance modulation captures identity-aware, emotion-dependent visual cues beyond geometry. To enable learning under this setting, we construct a time-synchronized, emotion-consistent multi-identity dataset by transferring aligned emotional dynamics across identities. Integrated into multiple state-of-the-art backbones, our framework preserves reconstruction and reenactment fidelity while enabling controllable emotion transfer, disentangled manipulation, and smooth emotion interpolation, advancing expressive and scalable 3D head avatars.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a framework for explicit emotion control in feed-forward single-image 3D head avatar reconstruction. It treats emotion as a first-class independent control signal and injects it into existing architectures via a dual-path modulation mechanism (geometry modulation through emotion-conditioned normalization in parametric space to disentangle from speech-driven articulation, plus appearance modulation for identity-aware emotion cues) without altering the core backbone design. A key enabler is the construction of a time-synchronized, emotion-consistent multi-identity dataset via transfer of aligned emotional dynamics across identities. The method is claimed to preserve reconstruction/reenactment fidelity while supporting controllable emotion transfer, disentangled manipulation, and smooth interpolation.
Significance. If the central claims hold after validation, the work would be a meaningful contribution to 3D head avatar research by enabling practical, backbone-agnostic addition of explicit emotion control. The dual-path design and cross-identity dataset construction address a real entanglement issue in current pipelines. Credit is due for the emphasis on integration without core modifications and the focus on consistency across identities, which could improve scalability of expressive avatars.
major comments (2)
- [Abstract / Dataset Construction] Dataset construction (as described in the abstract): the central claim that emotion can be manipulated independently and consistently across identities rests on the transferred multi-identity corpus preserving disentanglement. No quantitative validation is supplied (e.g., identity classification accuracy on neutral frames, emotion consistency scores across transferred sequences, or metrics for misalignment/artifacts), which directly risks the dual-path modulation learning correlated rather than independent factors.
- [Abstract / Evaluation] Evaluation and results sections: the abstract asserts that the framework 'preserves reconstruction and reenactment fidelity' and enables 'controllable emotion transfer' when integrated into multiple state-of-the-art backbones, yet no quantitative results, ablation studies, baseline comparisons, or implementation details (losses, training procedure, modulation equations) are provided. This makes it impossible to assess whether the geometry normalization truly disentangles emotional state from articulation or introduces artifacts.
minor comments (1)
- [Abstract] The title's phrase 'Giving Faces Their Feelings Back' is never connected to prior implicit emotion handling in the literature; a brief positioning sentence in the abstract would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential contribution of the dual-path modulation approach and the cross-identity dataset construction. We address each major comment below, providing clarifications and committing to revisions where the manuscript can be strengthened without misrepresenting the presented work.
point-by-point responses
-
Referee: [Abstract / Dataset Construction] Dataset construction (as described in the abstract): the central claim that emotion can be manipulated independently and consistently across identities rests on the transferred multi-identity corpus preserving disentanglement. No quantitative validation is supplied (e.g., identity classification accuracy on neutral frames, emotion consistency scores across transferred sequences, or metrics for misalignment/artifacts), which directly risks the dual-path modulation learning correlated rather than independent factors.
Authors: We agree that explicit quantitative validation of the transferred dataset would better support the disentanglement claim. The manuscript describes the construction via transfer of aligned emotional dynamics to maintain time-synchronization and emotion consistency across identities, but does not report the suggested metrics. In the revised version we will add quantitative evaluations, including identity classification accuracy on neutral frames, emotion consistency scores across transferred sequences, and misalignment metrics, to demonstrate that the corpus preserves independent factors. revision: yes
-
Referee: [Abstract / Evaluation] Evaluation and results sections: the abstract asserts that the framework 'preserves reconstruction and reenactment fidelity' and enables 'controllable emotion transfer' when integrated into multiple state-of-the-art backbones, yet no quantitative results, ablation studies, baseline comparisons, or implementation details (losses, training procedure, modulation equations) are provided. This makes it impossible to assess whether the geometry normalization truly disentangles emotional state from articulation or introduces artifacts.
Authors: The abstract summarizes the claims at a high level. The full manuscript presents qualitative results across multiple backbones showing fidelity preservation and controllable transfer, along with the dual-path design rationale. However, we acknowledge that additional quantitative support would allow a more rigorous assessment of disentanglement. In the revision we will expand the evaluation section with quantitative metrics (e.g., reconstruction error, reenactment fidelity scores), ablation studies on each modulation path, baseline comparisons, and explicit details on losses, training procedure, and modulation equations. revision: yes
Circularity Check
No significant circularity; extends external backbones with additive modulation
full rationale
The paper's core derivation introduces a dual-path modulation (geometry normalization in parametric space plus appearance modulation) into existing feed-forward single-image 3D head avatar architectures without altering their core design. Dataset construction via cross-identity emotion transfer is presented as an enabling preprocessing step rather than a fitted or self-derived quantity. No equations, predictions, or uniqueness claims reduce to self-definition, fitted inputs renamed as outputs, or load-bearing self-citations. The framework is explicitly integrated into multiple state-of-the-art external backbones, preserving fidelity while adding controllability, rendering the chain self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Emotion can be treated as an independent control signal, separable from identity, speech articulation, and appearance in 3D head models.
Reference graph
Works this paper leans on
-
[1]
In: CVPR
Abdal, R., Lee, H.Y., Zhu, P., Chai, M., Siarohin, A., Wonka, P., Tulyakov, S.: 3davatargan: Bridging domains for personalized editable avatars. In: CVPR. pp. 4552--4562 (June 2023)
2023
-
[2]
In: CVPR
Abdal, R., Yifan, W., Shi, Z., Xu, Y., Po, R., Kuang, Z., Chen, Q., Yeung, D.Y., Wetzstein, G.: Gaussian shell maps for efficient 3d human generation. In: CVPR. pp. 9441--9451 (June 2024)
2024
-
[3]
In: ICCV (2025)
Aneja, S., Sevastopolsky, A., Kirschstein, T., Thies, J., Dai, A., Nießner, M.: Gaussianspeech: Audio-driven gaussian avatars. In: ICCV (2025)
2025
-
[4]
In: WACV
Bhattarai, A.R., Nießner, M., Sevastopolsky, A.: Triplanenet: An encoder for eg3d inversion. In: WACV. pp. 3055--3065 (2024)
2024
-
[5]
In: SIGGRAPH
Blanz, V., Vetter, T.: A morphable model for the synthesis of 3d faces. In: SIGGRAPH. pp. 187--194. ACM Press (1999)
1999
-
[6]
In: CVPR (2021)
Buehler, M.C., Meka, A., Li, G., Beeler, T., Hilliges, O.: Varitex: Variational neural face textures. In: CVPR (2021)
2021
-
[7]
In: ICCV
Bühler, M.C., Sarkar, K., Shah, T., Li, G., Wang, D., Helminger, L., Orts-Escolano, S., Lagun, D., Hilliges, O., Beeler, T., et al.: Preface: A data-driven volumetric prior for few-shot ultra high-resolution face synthesis. In: ICCV. pp. 3402--3413 (2023)
2023
-
[8]
ACM TOG 41(4) (Jul 2022)
Cao, C., Simon, T., Kim, J.K., Schwartz, G., Zollhoefer, M., Saito, S.S., Lombardi, S., Wei, S.E., Belko, D., Yu, S.I., Sheikh, Y., Saragih, J.: Authentic volumetric avatars from a phone scan. ACM TOG 41(4) (Jul 2022)
2022
-
[9]
IEEE TVCG 20(3), 413--425 (2014)
Cao, C., Weng, Y., Zhou, S., Tong, Y., Zhou, K.: Facewarehouse: A 3d facial expression database for visual computing. IEEE TVCG 20(3), 413--425 (2014)
2014
-
[10]
In: CVPR (2021)
Chan, E., Monteiro, M., Kellnhofer, P., Wu, J., Wetzstein, G.: pi-gan: Periodic implicit generative adversarial networks for 3d-aware image synthesis. In: CVPR (2021)
2021
-
[11]
In: CVPR (2022)
Chan, E.R., Lin, C.Z., Chan, M.A., Nagano, K., Pan, B., Mello, S.D., Gallo, O., Guibas, L., Tremblay, J., Khamis, S., Karras, T., Wetzstein, G.: Efficient geometry-aware 3D generative adversarial networks. In: CVPR (2022)
2022
-
[12]
In: SIGGRAPH Conference Papers
Chen, Y., Wang, L., Li, Q., Xiao, H., Zhang, S., Yao, H., Liu, Y.: Monogaussianavatar: Monocular gaussian point-based head avatar. In: SIGGRAPH Conference Papers. pp. 1--9 (2024)
2024
-
[13]
In: NeurIPS
Chu, X., Harada, T.: Generalizable and animatable gaussian head avatar. In: NeurIPS. vol. 37, pp. 57642--57670 (2024)
2024
-
[14]
In: ICLR (2024)
Chu, X., Li, Y., Zeng, A., Yang, T., Lin, L., Liu, Y., Harada, T.: Gpavatar: Generalizable and precise head avatar from image(s). In: ICLR (2024)
2024
-
[15]
In: SIGGRAPH Asia Conference Papers
Cui, J., Chen, Y., Xu, M., Shang, H., Chen, Y., Zhan, Y., Dong, Z., Yao, Y., Wang, J., Zhu, S.: Hallo4: High-fidelity dynamic portrait animation via direct preference optimization and temporal motion modulation. In: SIGGRAPH Asia Conference Papers. ACM (2025)
2025
-
[16]
In: ICLR (2025)
Cui, J., Li, H., Yao, Y., Zhu, H., Shang, H., Cheng, K., Zhou, H., Zhu, S., Wang, J.: Hallo2: Long-duration and high-resolution audio-driven portrait image animation. In: ICLR (2025)
2025
-
[17]
In: CVPR (2025)
Cui, J., Li, H., Zhan, Y., Shang, H., Cheng, K., Ma, Y., Mu, S., Zhou, H., Wang, J., Zhu, S.: Hallo3: Highly dynamic and realistic portrait image animation with video diffusion transformer. In: CVPR (2025)
2025
-
[18]
In: CVPR
Deng, Y., Wang, D., Ren, X., Chen, X., Wang, B.: Portrait4d: Learning one-shot 4d head avatar synthesis using synthetic data. In: CVPR. pp. 7119--7130 (2024)
2024
-
[19]
In: ECCV (2024)
Deng, Y., Wang, D., Wang, B.: Portrait4d-v2: Pseudo multi-view data creates better 4d head synthesizer. In: ECCV (2024)
2024
-
[20]
In: ECCV (2024)
Dhamo, H., Nie, Y., Moreau, A., Song, J., Shaw, R., Zhou, Y., Pérez-Pellitero, E.: Headgas: Real-time animatable head avatars via 3d gaussian splatting. In: ECCV (2024)
2024
-
[21]
In: CVPR (2022)
Fan, Y., Lin, Z., Saito, J., Wang, W., Komura, T.: Faceformer: Speech-driven 3d facial animation with transformers. In: CVPR (2022)
2022
-
[22]
In: SIGGRAPH Asia Conference Papers (2024)
Gao, X., Xiao, H., Zhong, C., Hu, S., Guo, Y., Zhang, J.: Portrait video editing empowered by multimodal generative priors. In: SIGGRAPH Asia Conference Papers (2024)
2024
-
[23]
ACM TOG 41(6) (2022)
Gao, X., Zhong, C., Xiang, J., Hong, Y., Guo, Y., Zhang, J.: Reconstructing personalized semantic facial nerf models from monocular video. ACM TOG 41(6) (2022)
2022
-
[24]
In: SIGGRAPH Asia Conference Papers (2025)
Gao, X., Zhou, J., Liu, D., Zhou, Y., Zhang, J.: Constructing diffusion avatar with learnable embeddings. In: SIGGRAPH Asia Conference Papers (2025)
2025
-
[25]
In: SIGGRAPH Asia Conference Papers (2024)
Giebenhain, S., Kirschstein, T., Rünz, M., Agapito, L., Nießner, M.: Npga: Neural parametric gaussian avatars. In: SIGGRAPH Asia Conference Papers (2024)
2024
-
[26]
In: CVPR (2025)
Gu, Y., Tran, P., Zheng, Y., Xu, H., Li, H., Karmanov, A., Li, H.: Diffportrait360: Consistent portrait diffusion for 360 view synthesis. In: CVPR (2025)
2025
-
[27]
In: ICMI (2023)
Haque, K.I., Yumak, Z.: Facexhubert: Text-less speech-driven e(x)pressive 3d facial animation synthesis using self-supervised speech representation learning. In: ICMI (2023)
2023
-
[28]
ACM TOG 44(4), 1--12 (2025)
He, C., Li, J., Kirschstein, T., Sevastopolsky, A., Saito, S., Tan, Q., Romero, J., Cao, C., Rushmeier, H., Nam, G.: 3dgh: 3d head generation with composable hair and face. ACM TOG 44(4), 1--12 (2025)
2025
-
[29]
In: ECCV
He, Q., Ji, X., Gong, Y., Lu, Y., Diao, Z., Huang, L., Yao, Y., Zhu, S., Ma, Z., Xu, S., et al.: Emotalk3d: High-fidelity free-view synthesis of emotional 3d talking head. In: ECCV. Springer (2024)
2024
-
[30]
In: SIGGRAPH Conference Papers
He, Y., Gu, X., Ye, X., Xu, C., Zhao, Z., Dong, Y., Yuan, W., Dong, Z., Bo, L.: Lam: Large avatar model for one-shot animatable gaussian head. In: SIGGRAPH Conference Papers. pp. 1--13 (2025)
2025
-
[31]
In: CVPR
Hong, F.T., Zhang, L., Shen, L., Xu, D.: Depth-aware generative adversarial network for talking head video generation. In: CVPR. pp. 3397--3406 (2022)
2022
-
[32]
In: CVPR
Hong, Y., Peng, B., Xiao, H., Liu, L., Zhang, J.: Headnerf: A real-time nerf-based parametric head model. In: CVPR. pp. 20374--20384 (2022)
2022
-
[33]
In: ICLR (2026)
Ji, X., Weiss, S., Kansy, M., Naruniec, J., Cao, X., Solenthaler, B., Bradley, D.: FastGHA: Generalized few-shot 3d gaussian head avatars with real-time animation. In: ICLR (2026)
2026
-
[34]
In: SIGGRAPH Conference Papers
Ji, X., Zhou, H., Wang, K., Wu, Q., Wu, W., Xu, F., Cao, X.: Eamm: One-shot emotional talking face via audio-based emotion-aware motion model. In: SIGGRAPH Conference Papers. SIGGRAPH '22 (2022)
2022
-
[35]
In: CVPR
Ji, X., Zhou, H., Wang, K., Wu, W., Loy, C.C., Cao, X., Xu, F.: Audio-driven emotional video portraits. In: CVPR. pp. 15480--15489 (2021)
2021
-
[36]
ACM TOG 42(4), 139--1 (2023)
Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM TOG 42(4), 139--1 (2023)
2023
-
[37]
In: CVPR (2026)
Kirschstein, T., Giebenhain, S., Nießner, M.: Flexavatar: Learning complete 3d head avatars with partial supervision. In: CVPR (2026)
2026
-
[38]
In: SIGGRAPH Asia Conference Papers
Kirschstein, T., Giebenhain, S., Tang, J., Georgopoulos, M., Nießner, M.: GGHead: Fast and Generalizable 3D Gaussian Heads. In: SIGGRAPH Asia Conference Papers. SA '24, Association for Computing Machinery, New York, NY, USA (2024)
2024
-
[39]
ACM TOG 42(4) (jul 2023)
Kirschstein, T., Qian, S., Giebenhain, S., Walter, T., Nießner, M.: Nersemble: Multi-view radiance field reconstruction of human heads. ACM TOG 42(4) (Jul 2023)
2023
-
[40]
In: ICCV (2025)
Kirschstein, T., Romero, J., Sevastopolsky, A., Nießner, M., Saito, S.: Avat3r: Large animatable gaussian reconstruction model for high-fidelity 3d head avatars. In: ICCV (2025)
2025
-
[41]
In: ECCV (2024)
Li, H., Chen, C., Shi, T., Qiu, Y., An, S., Chen, G., Han, X.: Spherehead: Stable 3d full-head synthesis with spherical tri-plane representation. In: ECCV (2024)
2024
-
[42]
In: NeurIPS (2025)
Li, H., Liu, K., Qiu, L., Zuo, Q., Zheng, K., Dong, Z., Han, X.: Hyplanehead: Rethinking tri-plane-like representations in full-head image synthesis. In: NeurIPS (2025)
2025
-
[43]
In: ICLR (2026)
Li, H., Zhang, H., Qiu, Y., Sun, Z., Zheng, K., Qiu, L., Li, P., Zuo, Q., Chen, C., Zheng, Y., et al.: Condition matters in full-head 3d gans. In: ICLR (2026)
2026
-
[44]
In: CVPR
Li, L., Li, Y., Weng, Y., Zheng, Y., Zhou, K.: Rgbavatar: Reduced gaussian blendshapes for online modeling of head avatars. In: CVPR. pp. 10747--10757 (June 2025)
2025
-
[45]
ACM TOG 36(6), 194--1 (2017)
Li, T., Bolkart, T., Black, M.J., Li, H., Romero, J.: Learning a model of facial shape and expression from 4d scans. ACM TOG 36(6), 194--1 (2017)
2017
-
[46]
In: CVPR
Li, W., Zhang, L., Wang, D., Zhao, B., Wang, Z., Chen, M., Zhang, B., Wang, Z., Bo, L., Li, X.: One-shot high-fidelity talking-head synthesis with deformable neural radiance field. In: CVPR. pp. 17969--17978 (2023)
2023
-
[47]
In: ECCV
Li, X., Cheng, Y., Ren, X., Jia, H., Xu, D., Zhu, W., Yan, Y.: Topo4d: Topology-preserving gaussian splatting for high-fidelity 4d head capture. In: ECCV. pp. 128--145. Springer (2024)
2024
-
[48]
In: CVPR
Li, X., Wang, J., Cheng, Y., Zeng, Y., Ren, X., Zhu, W., Zhao, W., Yan, Y.: Towards high-fidelity 3d talking avatar with personalized dynamic texture. In: CVPR. pp. 204--214 (2025)
2025
-
[49]
In: NeurIPS
Li, X., De Mello, S., Liu, S., Nagano, K., Iqbal, U., Kautz, J.: Generalizable one-shot 3d neural head avatar. In: NeurIPS. vol. 36 (2024)
2024
-
[50]
In: CVPR (2026)
Li, Z., Pun, C.M., Fang, C., Wang, J., Cun, X.: Personalive! expressive portrait image animation for live streaming. In: CVPR (2026)
2026
-
[51]
In: CVPR (2026)
Liu, C., Jing, T., Ma, C., Zhou, X., Lian, Z., Jin, Q., Yuan, H., Huang, S.S.: Emodifftalk: Emotion-aware diffusion for editable 3d gaussian talking head. In: CVPR (2026)
2026
-
[52]
In: CVPR (2025)
Liu, H., Wang, X., Wan, Z., Ma, Y., Chen, J., Fan, Y., Shen, Y., Song, Y., Chen, Q.: Avatarartist: Open-domain 4d avatarization. In: CVPR (2025)
2025
-
[53]
In: SIGGRAPH Conference Papers
Liu, H., Wang, X., Wan, Z., Shen, Y., Song, Y., Liao, J., Chen, Q.: Headartist: Text-conditioned 3d head generation with self score distillation. In: SIGGRAPH Conference Papers. SIGGRAPH '24, Association for Computing Machinery, New York, NY, USA (2024)
2024
-
[54]
TVCG (2026)
Ma, C., Tan, S., Pan, Y., Yang, J., Tong, X.: Esgaussianface: Emotional and stylized audio-driven facial animation via 3d gaussian splatting. TVCG pp. 1--12 (2026)
2026
-
[55]
In: CVPR
Ma, Z., Zhu, X., Qi, G.J., Lei, Z., Zhang, L.: Otavatar: One-shot talking face avatar with controllable tri-plane rendering. In: CVPR. pp. 16901--16910 (2023)
2023
-
[56]
In: NeurIPS Track on Datasets and Benchmarks (2024)
Martinez, J., Kim, E., Romero, J., Bagautdinov, T., Saito, S., Yu, S.I., Anderson, S., Zollhöfer, M., Team, C.A.: Codec Avatar Studio: Paired Human Captures for Complete, Driveable, and Generalizable Avatars. In: NeurIPS Track on Datasets and Benchmarks (2024)
2024
-
[57]
In: ECCV (2020)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. In: ECCV (2020)
2020
-
[58]
arXiv preprint arXiv:2312.06400 (2023)
Mir, A., Alonso, E., Mondragón, E.: Dit-head: High-resolution talking head synthesis using diffusion transformers. arXiv preprint arXiv:2312.06400 (2023)
2023
-
[59]
In: CVPR (2022)
Paraperas Papantoniou, F., Filntisis, P.P., Maragos, P., Roussos, A.: Neural emotion director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos. In: CVPR (2022)
2022
-
[60]
In: ACM MM
Prajwal, K.R., Mukhopadhyay, R., Namboodiri, V.P., Jawahar, C.: A lip sync expert is all you need for speech to lip generation in the wild. In: ACM MM. p. 484–492. MM '20, Association for Computing Machinery, New York, NY, USA (2020)
2020
-
[61]
In: CVPR
Qian, S., Kirschstein, T., Schoneveld, L., Davoli, D., Giebenhain, S., Nießner, M.: Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. In: CVPR. pp. 20299--20309 (2024)
2024
-
[62]
In: ICCV
Richard, A., Zollhöfer, M., Wen, Y., de la Torre, F., Sheikh, Y.: Meshtalk: 3d face animation from speech using cross-modality disentanglement. In: ICCV. pp. 1173--1182 (October 2021)
2021
-
[63]
In: ACCV
Shen, X., Khan, F.F., Elhoseiny, M.: Emotalker: Audio driven emotion aware talking head generation. In: ACCV. pp. 1900--1917 (December 2024)
2024
-
[64]
In: SIGGRAPH Conference Papers
Song, L., Zhou, Y., Xu, Z., Zhou, Y., Aneja, D., Xu, C.: Streamme: Simplify 3d gaussian avatar within live stream. In: SIGGRAPH Conference Papers. SIGGRAPH '25, Association for Computing Machinery, New York, NY, USA (2025)
2025
-
[65]
In: WACV
Stypułkowski, M., Vougioukas, K., He, S., Zięba, M., Petridis, S., Pantic, M.: Diffused heads: Diffusion models beat gans on talking-face generation. In: WACV. pp. 5091--5100 (2024)
2024
-
[66]
In: CVPR (2023)
Sun, J., Wang, X., Wang, L., Li, X., Zhang, Y., Zhang, H., Liu, Y.: Next3d: Generative neural texture rasterization for 3d-aware head avatars. In: CVPR (2023)
2023
-
[67]
ACM TOG 43(4) (2024)
Sun, Z., Lv, T., Ye, S., Lin, M., Sheng, J., Wen, Y.H., Yu, M., Liu, Y.J.: Diffposetalk: Speech-driven stylistic 3d facial animation and head pose generation via diffusion models. ACM TOG 43(4) (2024)
2024
-
[68]
In: ECCV
Tan, S., Ji, B., Bi, M., Pan, Y.: Edtalk: Efficient disentanglement for emotional talking head synthesis. In: ECCV. pp. 398--416. Springer (2025)
2025
-
[69]
In: ICCV
Tan, S., Ji, B., Pan, Y.: Emmn: Emotional motion memory network for audio-driven emotional talking face generation. In: ICCV. pp. 22146--22156 (2023)
2023
-
[70]
In: SIGGRAPH Asia Conference Papers
Taubner, F., Zhang, R., Tuli, M., Bahmani, S., Lindell, D.B.: MVP4D: Multi-view portrait video diffusion for animatable 4D avatars. In: SIGGRAPH Asia Conference Papers. ACM (2025)
2025
-
[71]
In: CVPR
Taubner, F., Zhang, R., Tuli, M., Lindell, D.B.: CAP4D: Creating animatable 4D portrait avatars with morphable multi-view diffusion models. In: CVPR. pp. 5318--5330 (June 2025)
2025
-
[72]
In: SIGGRAPH 2022 Conference Papers
Wang, D., Chandran, P., Zoss, G., Bradley, D., Gotardo, P.: Morf: Morphable radiance fields for multiview neural head modeling. In: SIGGRAPH 2022 Conference Papers. SIGGRAPH '22, Association for Computing Machinery, New York, NY, USA (2022)
2022
-
[73]
In: CVPR
Wang, D., Deng, Y., Yin, Z., Shum, H.Y., Wang, B.: Progressive disentangled representation learning for fine-grained controllable talking head synthesis. In: CVPR. pp. 17979--17989 (2023)
2023
-
[74]
TVCG (2025)
Wang, J., Xie, J.C., Li, X., Xu, F., Pun, C.M., Gao, H.: Gaussianhead: High-fidelity head avatars with learnable gaussian derivation. TVCG (2025)
2025
-
[75]
In: ECCV
Wang, K., Wu, Q., Song, L., Yang, Z., Wu, W., Qian, C., He, R., Qiao, Y., Loy, C.C.: Mead: A large-scale audio-visual dataset for emotional talking-face generation. In: ECCV. Springer (2020)
2020
-
[76]
In: CVPR (June 2022)
Wang, L., Chen, Z., Yu, T., Ma, C., Li, L., Liu, Y.: Faceverse: a fine-grained and detail-controllable 3d face morphable model from a hybrid dataset. In: CVPR (June 2022)
2022
-
[77]
In: CVPR
Wang, T.C., Mallya, A., Liu, M.Y.: One-shot free-view neural talking-head synthesis for video conferencing. In: CVPR. pp. 10039--10049 (2021)
2021
-
[78]
arXiv:2403.17694 (2024)
Wei, H., Yang, Z., Wang, Z.: Aniportrait: Audio-driven synthesis of photorealistic portrait animations. arXiv:2403.17694 (2024)
2024
-
[79]
In: ICLR (2025)
Wu, T.W., Yang, J., Guo, Z., Wan, J., Zhong, F., Oztireli, C.: Gaussian head & shoulders: High fidelity neural upper body avatars with anchor gaussian guided texture warping. In: ICLR (2025)
2025
-
[80]
In: CVPR (2026)
Wu, Z., Zhou, B., Hu, L., Liu, H., Sun, Y., Wang, X., Cao, X., Shen, Y., Zhu, H.: Uika: Fast universal head avatar from pose-free images. In: CVPR (2026)
2026
discussion (0)