Face Anything: 4D Face Reconstruction from Any Image Sequence
Pith reviewed 2026-05-10 02:35 UTC · model grok-4.3
The pith
Canonical facial point prediction unifies depth estimation, dense 3D geometry, and point tracking for 4D face reconstruction from single-view sequences.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The method formulates high-fidelity 4D facial reconstruction as canonical facial point prediction: each pixel receives a normalized facial coordinate in a shared canonical space. A transformer jointly predicts these coordinates and per-pixel depth after training on multi-view geometry data that has been non-rigidly warped into the canonical space. This single feed-forward architecture yields accurate depth, temporally stable dense 3D geometry, and robust facial point tracking on arbitrary image sequences.
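To make the claim concrete, below is a minimal PyTorch sketch of a single backbone with one dense head that jointly predicts a 3-channel canonical facial coordinate and a 1-channel depth per pixel. Everything here, including the module name JointFaceHead, the layer sizes, and the sigmoid/exp output mappings, is an illustrative assumption rather than the paper's actual architecture.

```python
# Hedged sketch: one backbone, one joint dense head for canonical coordinates
# plus depth. Sizes and names are assumptions, not the authors' design.
import torch
import torch.nn as nn

class JointFaceHead(nn.Module):  # hypothetical module name
    def __init__(self, dim=256, patch=16):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # 4 channels per pixel: (u, v, w) canonical coordinate + 1 depth
        self.head = nn.Conv2d(dim, 4 * patch * patch, kernel_size=1)

    def forward(self, img):                      # img: (B, 3, H, W)
        B, _, H, W = img.shape
        tok = self.embed(img)                    # (B, dim, H/p, W/p)
        h, w = tok.shape[-2:]
        tok = self.encoder(tok.flatten(2).transpose(1, 2))
        tok = tok.transpose(1, 2).reshape(B, -1, h, w)
        out = nn.functional.pixel_shuffle(self.head(tok), self.patch)
        canon = out[:, :3].sigmoid()             # normalized canonical coords in [0, 1]
        depth = out[:, 3:].exp()                 # positive per-pixel depth
        return canon, depth

canon, depth = JointFaceHead()(torch.randn(1, 3, 128, 128))
print(canon.shape, depth.shape)                  # (1, 3, 128, 128) (1, 1, 128, 128)
```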
What carries the argument
Canonical facial point prediction: a representation that assigns each pixel a normalized facial coordinate in a shared canonical space, converting dense tracking and dynamic reconstruction into a canonical reconstruction problem.
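The practical payoff of the representation is that tracking stops being a temporal problem: if the canonical coordinate of a physical facial point is stable across frames, a track is recovered by nearest-neighbor matching in canonical space. A hedged numpy sketch of that reduction, not the paper's actual matching procedure:

```python
# Tracking as canonical matching: the pixel in frame 1 whose canonical
# coordinate is closest to the query's coordinate in frame 0 is the track.
import numpy as np

def track_point(canon_t0, canon_t1, yx):
    """canon_t*: (H, W, 3) canonical coordinate maps; yx: query pixel in frame 0."""
    target = canon_t0[yx[0], yx[1]]                 # canonical coord of the query
    d = np.linalg.norm(canon_t1 - target, axis=-1)  # distance at every pixel of frame 1
    return np.unravel_index(np.argmin(d), d.shape)  # best-matching pixel in frame 1

H = W = 64
c0 = np.random.rand(H, W, 3).astype(np.float32)
c1 = np.roll(c0, shift=(2, 3), axis=(0, 1))         # fake motion: shift by (2, 3)
print(track_point(c0, c1, (10, 10)))                # -> (12, 13)
```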
If this is right
- Accurate depth estimation from single-view image sequences
- Temporally stable reconstruction of dynamic 3D facial geometry
- Dense 3D output together with robust facial point tracking
- Approximately 3 times lower correspondence error and 16 percent better depth accuracy than prior dynamic reconstruction methods
- Faster inference in a single feed-forward pass without post-processing
Where Pith is reading between the lines
- The feed-forward design could support real-time video pipelines where separate optimization stages are impractical.
- Enforcing consistency through a canonical space may reduce drift over long sequences compared with frame-by-frame methods.
- Similar coordinate-based representations might transfer to reconstruction of other non-rigid surfaces once appropriate canonical spaces are defined.
Load-bearing premise
Multi-view geometry data can be reliably non-rigidly warped into a shared canonical space so that a model trained on it will generalize to arbitrary single-view image sequences without extra constraints or post-processing.
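A minimal sketch of what this premise implies for label generation, under the assumption that non-rigid registration to a shared template is available (the paper's actual warping pipeline is not specified here): every scan point inherits the normalized coordinate of its registered template correspondence.

```python
# Hypothetical label-generation sketch: canonical coordinates come from
# normalizing a shared template, then transferring them through registration.
import numpy as np

def canonical_labels(template, scan, correspondences):
    """template: (M, 3) canonical-space vertices; scan: (N, 3) observed points;
    correspondences: (N,) index of each scan point's registered template vertex."""
    lo, hi = template.min(0), template.max(0)
    canon = (template - lo) / (hi - lo)          # normalize template to [0, 1]^3
    return canon[correspondences]                # (N, 3) labels for the scan points

template = np.random.rand(500, 3)
scan = np.random.rand(200, 3)
corr = np.random.randint(0, 500, size=200)       # stand-in for non-rigid registration
labels = canonical_labels(template, scan, corr)
print(labels.shape)                              # (200, 3)
```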
What would settle it
A test sequence containing rapid expression changes or large viewpoint shifts where the predicted canonical coordinates produce drifting tracks across frames or depth values that deviate measurably from ground-truth multi-view reconstructions.
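One way to operationalize this test is to check whether correspondence error grows with time, the signature of drift. An illustrative numpy metric, assuming predicted and ground-truth tracks of shape (T, N, 2); the values below are synthetic:

```python
# Drift test: per-frame mean endpoint error; a rising curve indicates drift.
import numpy as np

def drift_curve(pred, gt):
    """pred, gt: (T, N, 2) pixel tracks. Returns (T,) mean error per frame."""
    return np.linalg.norm(pred - gt, axis=-1).mean(axis=1)

T, N = 50, 32
gt = np.cumsum(np.random.randn(T, N, 2), axis=0)
pred = gt + 0.1 * np.cumsum(np.random.randn(T, N, 2), axis=0)  # simulated drift
err = drift_curve(pred, gt)
print(err[0], err[-1])   # error at first vs. last frame; growth signals drift
```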
Original abstract
Accurate reconstruction and tracking of dynamic human faces from image sequences is challenging because non-rigid deformations, expression changes, and viewpoint variations occur simultaneously, creating significant ambiguity in geometry and correspondence estimation. We present a unified method for high-fidelity 4D facial reconstruction based on canonical facial point prediction, a representation that assigns each pixel a normalized facial coordinate in a shared canonical space. This formulation transforms dense tracking and dynamic reconstruction into a canonical reconstruction problem, enabling temporally consistent geometry and reliable correspondences within a single feed-forward model. By jointly predicting depth and canonical coordinates, our method enables accurate depth estimation, temporally stable reconstruction, dense 3D geometry, and robust facial point tracking within a single architecture. We implement this formulation using a transformer-based model that jointly predicts depth and canonical facial coordinates, trained using multi-view geometry data that non-rigidly warps into the canonical space. Extensive experiments on image and video benchmarks demonstrate state-of-the-art performance across reconstruction and tracking tasks, achieving approximately 3$\times$ lower correspondence error and faster inference than prior dynamic reconstruction methods, while improving depth accuracy by 16%. These results highlight canonical facial point prediction as an effective foundation for unified feed-forward 4D facial reconstruction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a unified feed-forward method for 4D facial reconstruction from any image sequence using canonical facial point prediction. By assigning each pixel a normalized coordinate in a shared canonical space and jointly predicting depth, the approach converts dense tracking and dynamic reconstruction into a canonical problem. A transformer model is trained on multi-view geometry data non-rigidly warped into this canonical space, claiming state-of-the-art performance with approximately 3 times lower correspondence error, 16% improved depth accuracy, and faster inference compared to prior methods.
Significance. If validated, this work offers a significant advancement in dynamic face reconstruction by providing a single architecture for accurate depth estimation, temporally stable geometry, dense 3D output, and robust point tracking without post-processing. The canonical coordinate representation is a strength for handling non-rigid deformations and viewpoint variations. Credit is due for the joint prediction formulation and the emphasis on feed-forward efficiency.
major comments (2)
- [Method (training procedure)] The non-rigid warping of multi-view data into canonical space is central to generating training labels (described in the method section), yet no quantitative validation of the warping accuracy, residual alignment errors, or sensitivity to expression changes and occlusions is provided. Given that the model is strictly feed-forward at inference on monocular sequences, any supervision noise from imperfect warping directly impacts the claimed generalization and the reported 3× correspondence improvement.
- [Experiments] The abstract and results section report benchmark improvements (3× correspondence error reduction, 16% depth gain) but omit details on error bars, exact baseline implementations, data splits, ablation studies, or statistical significance tests. This absence undermines the ability to assess the robustness of the SOTA claims and the temporal stability assertions.
minor comments (2)
- [Abstract] The title phrase 'Face Anything' and the claim of handling 'any image sequence' should be qualified: what assumptions does the method make about input quality, resolution, and face visibility?
- [Notation] The definition of canonical facial coordinates should include an explicit equation or diagram showing how normalization is performed across different expressions and views.
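For illustration only, one plausible form of the requested definition, not the paper's confirmed formulation: normalize the registered neutral-template position matched to each pixel by the template's per-axis bounds, as in the LaTeX sketch below.

```latex
% Illustrative assumption, not the paper's confirmed definition.
% \bar{X}(p): neutral-template position matched to pixel p by non-rigid registration;
% X_{\min}, X_{\max}: per-axis bounds of the template.
c(p) = \frac{\bar{X}(p) - X_{\min}}{X_{\max} - X_{\min}} \in [0,1]^3
```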
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below with clarifications and commit to revisions that strengthen the manuscript without misrepresenting the original contributions.
Point-by-point responses
Referee: [Method (training procedure)] The non-rigid warping of multi-view data into canonical space is central to generating training labels (described in the method section), yet no quantitative validation of the warping accuracy, residual alignment errors, or sensitivity to expression changes and occlusions is provided. Given that the model is strictly feed-forward at inference on monocular sequences, any supervision noise from imperfect warping directly impacts the claimed generalization and the reported 3× correspondence improvement.
Authors: We agree that quantitative validation of the non-rigid warping procedure would provide stronger evidence for the quality of the generated training labels. In the revised manuscript, we will add a new subsection (or supplementary material) reporting metrics such as mean residual alignment error on held-out multi-view sequences, before/after warping comparisons, and sensitivity analyses to expression changes and partial occlusions. These additions will directly support the reliability of the supervision and the generalization claims. revision: yes
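A sketch of what such a validation could compute, using nearest-neighbor distance to the canonical template as a proxy for surface distance (a simplifying assumption; the authors may use point-to-plane or landmark residuals instead):

```python
# Residual alignment error after warping a held-out scan into canonical space.
import numpy as np
from scipy.spatial import cKDTree

def mean_residual_error(warped_scan, template):
    """warped_scan: (N, 3) points after non-rigid warping; template: (M, 3)."""
    d, _ = cKDTree(template).query(warped_scan)   # NN distance per warped point
    return d.mean()

template = np.random.rand(1000, 3)
warped = template[:200] + 0.01 * np.random.randn(200, 3)  # near-perfect warp
print(mean_residual_error(warped, template))              # residual on the 0.01 scale
```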
Referee: [Experiments] The abstract and results section report benchmark improvements (3× correspondence error reduction, 16% depth gain) but omit details on error bars, exact baseline implementations, data splits, ablation studies, or statistical significance tests. This absence undermines the ability to assess the robustness of the SOTA claims and the temporal stability assertions.
Authors: We acknowledge that additional experimental details are necessary for full reproducibility and to rigorously substantiate the reported improvements. In the revised version, we will expand the experiments section and supplementary material to include error bars (standard deviations across runs), precise specifications of baseline implementations and data splits, further ablation studies on the joint prediction and canonical representation, and statistical significance tests (e.g., paired t-tests) for the key metrics. These changes will also address the temporal stability claims with supporting quantitative evidence. revision: yes
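For the significance test the authors commit to, here is a miniature of the procedure with synthetic numbers; only the mechanics are illustrated, not real results:

```python
# Paired t-test over per-sequence errors, proposed method vs. baseline.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
baseline = rng.normal(3.0, 0.5, size=20)          # per-sequence correspondence error
ours = baseline / 3 + rng.normal(0, 0.05, 20)     # ~3x lower, mimicking the claim
t, p = ttest_rel(baseline, ours)
print(f"t={t:.2f}, p={p:.3g}")                    # small p supports the improvement
```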
Circularity Check
No circularity: canonical coordinate prediction is learned from external warped multi-view data
full rationale
The paper defines canonical facial points by non-rigidly warping multi-view geometry into a shared space and trains a transformer to regress depth plus these coordinates from monocular images. This is a standard supervised mapping with no equations that reduce the predicted outputs to the training inputs by construction, no self-citations invoked as uniqueness theorems, and no fitted parameters renamed as predictions. Evaluation occurs on separate benchmarks, so the claimed gains in correspondence and depth accuracy remain independent of the derivation inputs.
Axiom & Free-Parameter Ledger
invented entities (1)
- canonical facial coordinates: no independent evidence