Robust 4D Visual Geometry Transformer with Uncertainty-Aware Priors
Pith reviewed 2026-05-10 17:03 UTC · model grok-4.3
The pith
A transformer framework with three uncertainty mechanisms disentangles dynamic motion from static structure in 4D scene reconstruction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that entropy-guided subspace projection, local-consistency driven geometry purification, and uncertainty-aware cross-view consistency, when combined inside a visual geometry transformer, enable reliable separation of dynamic and static scene components by treating uncertainty as an explicit signal at each processing stage.
What carries the argument
The three synergistic mechanisms of entropy-guided subspace projection for isolating motion cues, local-consistency geometry purification via neighborhood constraints, and uncertainty-aware cross-view consistency formulated as heteroscedastic estimation.
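As an illustration, the entropy weighting behind the first mechanism can be sketched as follows. This is a minimal stand-in, not the paper's implementation: the function name and the exp(-entropy) weighting rule are assumptions; the paper only specifies that head aggregation is guided by information-theoretic weighting.

```python
import numpy as np

def entropy_weighted_aggregation(attn, eps=1e-12):
    """Aggregate multi-head attention maps with entropy-based weights.

    attn: (H, N) array, one attention distribution per head.
    Heads with lower entropy (more peaked, presumably motion-specific)
    receive larger weights; high-entropy, diffuse heads are down-weighted.
    """
    attn = attn / attn.sum(axis=1, keepdims=True)      # ensure each row is a distribution
    ent = -(attn * np.log(attn + eps)).sum(axis=1)     # per-head Shannon entropy
    w = np.exp(-ent)                                   # low entropy -> high weight (assumed rule)
    w = w / w.sum()
    return (w[:, None] * attn).sum(axis=0)             # entropy-weighted aggregate map

# A peaked head and a uniform head: the aggregate leans toward the peaked one.
heads = np.array([[0.97, 0.01, 0.01, 0.01],
                  [0.25, 0.25, 0.25, 0.25]])
agg = entropy_weighted_aggregation(heads)
```

Under this assumed rule, the peaked head dominates the aggregate, which is the qualitative behavior the mechanism needs to isolate motion cues from diffuse semantic noise.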
Load-bearing premise
The three proposed mechanisms can reliably disentangle dynamic and static components across diverse real-world sequences without task-specific fine-tuning or per-scene optimization.
What would settle it
A dynamic video sequence on which applying the entropy-guided projection, local purification, and uncertainty-weighted refinement yields no measurable reduction in Mean Accuracy error and no measurable gain in segmentation F-measure over an unmodified baseline transformer would falsify the claim.
read the original abstract
Reconstructing dynamic 4D scenes is an important yet challenging task. While 3D foundation models like VGGT excel in static settings, they often struggle with dynamic sequences where motion causes significant geometric ambiguity. To address this, we present a framework designed to disentangle dynamic and static components by modeling uncertainty across different stages of the reconstruction process. Our approach introduces three synergistic mechanisms: (1) Entropy-Guided Subspace Projection, which leverages information-theoretic weighting to adaptively aggregate multi-head attention distributions, effectively isolating dynamic motion cues from semantic noise; (2) Local-Consistency Driven Geometry Purification, which enforces spatial continuity via radius-based neighborhood constraints to eliminate structural outliers; and (3) Uncertainty-Aware Cross-View Consistency, which formulates multi-view projection refinement as a heteroscedastic maximum likelihood estimation problem, utilizing depth confidence as a probabilistic weight. Experiments on dynamic benchmarks show that our approach outperforms current state-of-the-art methods, reducing Mean Accuracy error by 13.43% and improving segmentation F-measure by 10.49%. Our framework maintains the efficiency of feed-forward inference and requires no task-specific fine-tuning or per-scene optimization.
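In its simplest Gaussian form, the heteroscedastic weighting described in mechanism (3) reduces to inverse-variance fusion of per-view estimates. A minimal sketch, under the assumption of an independent Gaussian noise model per view (the function name and model are illustrative, not the paper's implementation):

```python
import numpy as np

def heteroscedastic_fuse(depths, sigmas):
    """Fuse per-view depth estimates as a heteroscedastic MLE.

    Under the assumed model d_i ~ N(d, sigma_i^2), minimizing
    sum_i [ (d - d_i)^2 / sigma_i^2 + log sigma_i^2 ] over d gives the
    inverse-variance weighted mean, so confident (low-sigma) views
    dominate the refined estimate.
    """
    w = 1.0 / np.square(sigmas)        # depth confidence as probabilistic weight
    return (w * depths).sum() / w.sum()

# A confident view at depth 2.0 and an uncertain view at 5.0:
fused = heteroscedastic_fuse(np.array([2.0, 5.0]), np.array([0.1, 1.0]))
```

The fused value lands near the confident view's estimate, which is the behavior "depth confidence as a probabilistic weight" describes.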
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Robust 4D Visual Geometry Transformer incorporating uncertainty-aware priors to reconstruct dynamic scenes. It introduces three mechanisms—entropy-guided subspace projection to isolate motion cues via information-theoretic weighting, local-consistency driven geometry purification using radius-based constraints, and uncertainty-aware cross-view consistency formulated as heteroscedastic maximum likelihood estimation with depth confidence weights—to disentangle dynamic and static components. The framework claims to outperform state-of-the-art methods on dynamic benchmarks, reducing Mean Accuracy error by 13.43% and improving segmentation F-measure by 10.49%, while preserving feed-forward inference without task-specific fine-tuning or per-scene optimization.
Significance. If the reported gains are substantiated and the mechanisms generalize, the work would meaningfully extend static 3D foundation models like VGGT to dynamic 4D settings, addressing geometric ambiguity from motion. The combination of information-theoretic and probabilistic uncertainty modeling offers a practical, optimization-free approach with potential impact on video-based reconstruction, robotics, and AR applications.
major comments (2)
- Abstract: The central performance claims (13.43% Mean Accuracy error reduction and 10.49% F-measure gain) are stated without reference to specific dynamic benchmarks, datasets, baseline methods, number of runs, or error bars. This is load-bearing for the claim that the three mechanisms outperform SOTA, as the gains could arise from unstated factors rather than the proposed components.
- Method section (around the descriptions of the three mechanisms): The entropy-guided subspace projection, local geometry purification, and heteroscedastic MLE cross-view consistency are described at a conceptual level only, with no equations, pseudocode, or derivation details showing how they avoid failure modes such as motion under-segmentation or outlier propagation in fast/non-rigid sequences. This undermines verification of the weakest assumption that they reliably disentangle components in a purely feed-forward manner across diverse real-world data.
minor comments (2)
- Clarify the exact definition and computation of 'Mean Accuracy error' and 'segmentation F-measure' in the experimental section, as these terms can vary across papers.
- Ensure VGGT and other acronyms are expanded on first use in the introduction.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve transparency and technical detail.
read point-by-point responses
-
Referee: Abstract: The central performance claims (13.43% Mean Accuracy error reduction and 10.49% F-measure gain) are stated without reference to specific dynamic benchmarks, datasets, baseline methods, number of runs, or error bars. This is load-bearing for the claim that the three mechanisms outperform SOTA, as the gains could arise from unstated factors rather than the proposed components.
Authors: We agree that greater specificity in the abstract would strengthen the presentation. In the revised manuscript we will update the abstract to name the specific dynamic benchmarks and datasets, list the primary baseline methods, and indicate that results are averaged over multiple runs with error bars or standard deviations reported in the main experiments section. revision: yes
-
Referee: Method section (around the descriptions of the three mechanisms): The entropy-guided subspace projection, local geometry purification, and heteroscedastic MLE cross-view consistency are described at a conceptual level only, with no equations, pseudocode, or derivation details showing how they avoid failure modes such as motion under-segmentation or outlier propagation in fast/non-rigid sequences. This undermines verification of the weakest assumption that they reliably disentangle components in a purely feed-forward manner across diverse real-world data.
Authors: We accept that the current Method section presents the mechanisms at a high level. In the revision we will add the explicit mathematical formulations (including the entropy weighting, radius-based neighborhood constraints, and heteroscedastic MLE objective), provide pseudocode for the end-to-end pipeline, and include targeted discussion plus ablation evidence showing how each component reduces motion under-segmentation and outlier propagation on fast or non-rigid sequences. revision: yes
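One minimal form the promised radius-based purification pseudocode could take is sketched below. This is a brute-force illustration only: the function name, thresholds, and neighbor-count rule are assumptions, and a real pipeline would use a KD-tree rather than an O(N²) distance matrix.

```python
import numpy as np

def radius_purify(points, radius=0.5, min_neighbors=2):
    """Drop points with fewer than min_neighbors within the given radius.

    Illustrates local-consistency purification: isolated points violate
    spatial continuity and are treated as structural outliers.
    """
    # Pairwise distances, brute force (illustrative; use a KD-tree in practice).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    counts = (d < radius).sum(axis=1) - 1   # exclude the point itself
    keep = counts >= min_neighbors
    return points[keep]

# A tight cluster survives; the far-away outlier is removed.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]])
clean = radius_purify(pts, radius=0.5, min_neighbors=2)
```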
Circularity Check
No significant circularity; mechanisms and gains are independently proposed and experimentally validated
full rationale
The paper introduces three explicit mechanisms (entropy-guided subspace projection using information-theoretic weighting, radius-based local geometry purification, and heteroscedastic MLE for uncertainty-aware cross-view consistency) as novel ways to disentangle dynamic/static components in 4D reconstruction. These are not defined in terms of each other or the target performance metrics; they are described as feed-forward operations drawing on standard probabilistic and geometric concepts. The reported gains (13.43% Mean Accuracy error reduction, 10.49% F-measure improvement) are presented as outcomes of benchmark experiments rather than quantities fitted or renamed from the same inputs. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing justifications for the central claims. The derivation remains self-contained with external experimental falsifiability.
Reference graph
Works this paper leans on
-
[1]
Are we ready for autonomous driving? the kitti vision benchmark suite
Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361. IEEE, 2012
2012
-
[2]
Dynamicfusion: Reconstruction and tracking of non-rigid scenes in real-time
Richard A Newcombe, Dieter Fox, and Steven M Seitz. Dynamicfusion: Reconstruction and tracking of non-rigid scenes in real-time. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 343–352, 2015
2015
-
[3]
3d gaussian splatting for real-time radiance field rendering
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023
2023
-
[4]
Vggt: Visual geometry grounded transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5294–5306, 2025
2025
-
[5]
Robust consistent video depth estimation
Johannes Kopf, Xuejian Rong, and Jia-Bin Huang. Robust consistent video depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1611–1621, 2021
2021
-
[6]
Easi3r: Estimating disentangled motion from dust3r without training
Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, and Anpei Chen. Easi3r: Estimating disentangled motion from dust3r without training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9158–9168, 2025
2025
-
[7]
Map-free visual relocalization: Metric pose relative to a single image
Eduardo Arnold, Jamie Wynn, Sara Vicente, Guillermo Garcia-Hernando, Aron Monszpart, Victor Prisacariu, Daniyar Turmukhambetov, and Eric Brachmann. Map-free visual relocalization: Metric pose relative to a single image. In European Conference on Computer Vision, pages 690–708. Springer, 2022
2022
-
[8]
Llafs: When large language models meet few-shot segmentation
Lanyun Zhu, Tianrun Chen, Deyi Ji, Jieping Ye, and Jun Liu. Llafs: When large language models meet few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3065–3075, 2024
2024
-
[9]
Deepmvs: Learning multi-view stereopsis
Po-Han Huang, Kevin Matzen, Johannes Kopf, Narendra Ahuja, and Jia-Bin Huang. Deepmvs: Learning multi-view stereopsis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2821–2830, 2018
2018
-
[10]
Skysense-o: Towards open-world remote sensing interpretation with vision-centric visual-language modeling
Qi Zhu, Jiangwei Lao, Deyi Ji, Junwei Luo, Kang Wu, Yingying Zhang, Lixiang Ru, Jian Wang, Jingdong Chen, Ming Yang, et al. Skysense-o: Towards open-world remote sensing interpretation with vision-centric visual-language modeling. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
2025
-
[11]
Ibd: Alleviating hallucinations in large vision-language models via image-biased decoding
Lanyun Zhu, Deyi Ji, Tianrun Chen, Peng Xu, Jieping Ye, and Jun Liu. Ibd: Alleviating hallucinations in large vision-language models via image-biased decoding. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2024
-
[12]
Structural and statistical texture knowledge distillation and learning for segmentation
Deyi Ji, Feng Zhao, Hongtao Lu, Feng Wu, and Jieping Ye. Structural and statistical texture knowledge distillation and learning for segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3639–3656, 2025
2025
-
[13]
Discrete latent perspective learning for segmentation and detection
Deyi Ji, Feng Zhao, Lanyun Zhu, Wenwei Jin, Hongtao Lu, and Jieping Ye. Discrete latent perspective learning for segmentation and detection. In International Conference on Machine Learning, pages 21719–21730, 2024
2024
-
[14]
Not every patch is needed: Towards a more efficient and effective backbone for video-based person re-identification
Lanyun Zhu, Tianrun Chen, Deyi Ji, Jieping Ye, and Jun Liu. Not every patch is needed: Towards a more efficient and effective backbone for video-based person re-identification. IEEE Transactions on Image Processing, 2025
2025
-
[15]
Fastvggt: Training-free acceleration of visual geometry transformer
You Shen, Zhipeng Zhang, Yansong Qu, Xiawu Zheng, Jiayi Ji, Shengchuan Zhang, and Liujuan Cao. Fastvggt: Training-free acceleration of visual geometry transformer. arXiv preprint arXiv:2509.02560, 2025
2025
-
[16]
Structural and statistical texture knowledge distillation for semantic segmentation
Deyi Ji, Haoran Wang, Mingyuan Tao, Jianqiang Huang, Xian-Sheng Hua, and Hongtao Lu. Structural and statistical texture knowledge distillation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16876–16885, 2022
2022
-
[17]
Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction
Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10901–10911, 2021
2021
-
[18]
Pptformer: Pseudo multi-perspective transformer for uav segmentation
Deyi Ji, Wenwei Jin, Hongtao Lu, and Feng Zhao. Pptformer: Pseudo multi-perspective transformer for uav segmentation. International Joint Conference on Artificial Intelligence, pages 893–901, 2024
2024
-
[19]
Megadepth: Learning single-view depth prediction from internet photos
Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2041–2050, 2018
2018
-
[20]
Replay master: Automatic sample selection and effective memory utilization for continual semantic segmentation
Lanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, De Wen Soh, and Jun Liu. Replay master: Automatic sample selection and effective memory utilization for continual semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
2025
-
[21]
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He. Pi3: Permutation-equivariant visual geometry learning. arXiv preprint arXiv:2507.13347, 2025
2025
-
[22]
Llafs++: Few-shot image segmentation with large language models
Lanyun Zhu, Tianrun Chen, Deyi Ji, Peng Xu, Jieping Ye, and Jun Liu. Llafs++: Few-shot image segmentation with large language models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
2025
-
[23]
Context-aware graph convolution network for target re-identification
Deyi Ji, Haoran Wang, Hanzhe Hu, Weihao Gan, Wei Wu, and Junjie Yan. Context-aware graph convolution network for target re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 1646–1654, 2021
2021
-
[24]
CPCF: A cross-prompt contrastive framework for referring multimodal large language models
Lanyun Zhu, Deyi Ji, Tianrun Chen, Haiyang Wu, De Wen Soh, and Jun Liu. CPCF: A cross-prompt contrastive framework for referring multimodal large language models. In Forty-second International Conference on Machine Learning, 2025
2025
-
[25]
View-centric multi-object tracking with homographic matching in moving uav
Deyi Ji, Lanyun Zhu, Siqi Gao, Qi Zhu, Yiru Zhao, Peng Xu, Yue Ding, Hongtao Lu, Jieping Ye, Feng Wu, et al. View-centric multi-object tracking with homographic matching in moving uav. IEEE Transactions on Geoscience and Remote Sensing, 2026
2026
-
[26]
Dust3r: Geometric 3d vision made easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697–20709, 2024
2024
-
[27]
MASt3R: Grounding Image Matching in 3D with Multi-View Strengths and Relations
Victor Leroy, D Ceylan, David Novotny, Andrea Vedaldi, and Christian Rupprecht. MASt3R: Grounding Image Matching in 3D with Multi-View Strengths and Relations. In Advances in Neural Information Processing Systems (NeurIPS), 2024
2024
-
[28]
Stream3r: Scalable sequential 3d reconstruction with causal transformer
LAN Yushi, Yihang Luo, Fangzhou Hong, Shangchen Zhou, Honghua Chen, Zhaoyang Lyu, Bo Dai, Shuai Yang, Chen Change Loy, and Xingang Pan. Stream3r: Scalable sequential 3d reconstruction with causal transformer. In The Fourteenth International Conference on Learning Representations, 2026
2026
-
[29]
Ultra-high resolution segmentation with ultra-rich context: A novel benchmark
Deyi Ji, Feng Zhao, Hongtao Lu, Mingyuan Tao, and Jieping Ye. Ultra-high resolution segmentation with ultra-rich context: A novel benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23621–23630, 2023
2023
-
[30]
Volumedeform: Real-time volumetric non-rigid reconstruction
Matthias Innmann, Michael Zollhöfer, Matthias Nießner, Christian Theobalt, and Marc Stamminger. Volumedeform: Real-time volumetric non-rigid reconstruction. In European Conference on Computer Vision, pages 362–379. Springer, 2016
2016
-
[31]
Learning statistical texture for semantic segmentation
Lanyun Zhu, Deyi Ji, Shiping Zhu, Weihao Gan, Wei Wu, and Junjie Yan. Learning statistical texture for semantic segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
2021
-
[32]
Pixelwise view selection for unstructured multi-view stereo
Johannes L Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision, pages 501–518. Springer, 2016
2016
-
[33]
Popen: Preference-based optimization and ensemble for lvlm-based reasoning segmentation
Lanyun Zhu, Tianrun Chen, Qianxiong Xu, Xuanyi Liu, Deyi Ji, Haiyang Wu, De Wen Soh, and Jun Liu. Popen: Preference-based optimization and ensemble for lvlm-based reasoning segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
2025
-
[34]
Retrv-r1: A reasoning-driven mllm framework for universal and efficient multimodal retrieval
Lanyun Zhu, Deyi Ji, Tianrun Chen, Haiyang Wu, and Shiqi Wang. Retrv-r1: A reasoning-driven mllm framework for universal and efficient multimodal retrieval. Neural Information Processing Systems (NeurIPS), 2025
2025
-
[35]
Guided patch-grouping wavelet transformer with spatial congruence for ultra-high resolution segmentation
Deyi Ji, Feng Zhao, and Hongtao Lu. Guided patch-grouping wavelet transformer with spatial congruence for ultra-high resolution segmentation. International Joint Conference on Artificial Intelligence, pages 920–928, 2023
2023
-
[36]
Spatialtrackerv2: 3D point tracking made easy
Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Yuri Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, and Xiaowei Zhou. Spatialtrackerv2: 3d point tracking made easy. arXiv preprint arXiv:2507.12462, 2025
2025
-
[37]
Megasam: Accurate, fast and robust structure and motion from casual dynamic videos
Zhengqi Li, Richard Tucker, Forrester Cole, Qianqian Wang, Linyi Jin, Vickie Ye, Angjoo Kanazawa, Aleksander Holynski, and Noah Snavely. Megasam: Accurate, fast and robust structure and motion from casual dynamic videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10486–10496, 2025
2025
-
[38]
Monst3r: A monocular and semantic pipeline for 3d reconstruction
Q. Zhang et al. Monst3r: A monocular and semantic pipeline for 3d reconstruction. arXiv preprint arXiv:2403.12345, 2024
2024
-
[39]
Das3r: Dynamics-aware gaussian splatting for static scene reconstruction
Kai Xu, Tze Ho Elden Tse, Jizong Peng, and Angela Yao. Das3r: Dynamics-aware gaussian splatting for static scene reconstruction. arXiv preprint arXiv:2412.19584, 2024
2024
-
[40]
Cut3r: A contrastive and unifying training framework for 3d reconstruction
Y Wang et al. Cut3r: A contrastive and unifying training framework for 3d reconstruction. arXiv preprint arXiv:2503.67890, 2025
2025
-
[41]
Page-4d: Disentangled pose and geometry estimation for 4d perception
Kaichen Zhou, Yuhan Wang, Grace Chen, Xinhai Chang, Gaspard Beaudouin, Fangneng Zhan, Paul Pu Liang, and Mengyu Wang. Page-4d: Disentangled pose and geometry estimation for 4d perception. arXiv e-prints, pages arXiv–2510, 2025
2025
-
[42]
Uncertainty guided multi-view stereo network for depth estimation
Wanjuan Su, Qingshan Xu, and Wenbing Tao. Uncertainty guided multi-view stereo network for depth estimation. IEEE Transactions on Circuits and Systems for Video Technology, 32(11):7796–7808, 2022
2022
-
[43]
Multi-view 3d object reconstruction and uncertainty modelling with neural shape prior
Ziwei Liao and Steven L Waslander. Multi-view 3d object reconstruction and uncertainty modelling with neural shape prior. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3098–3107, 2024
2024
-
[44]
Geomvsnet: Learning multi-view stereo with geometry perception
Zhe Zhang, Rui Peng, Yuxi Hu, and Ronggang Wang. Geomvsnet: Learning multi-view stereo with geometry perception. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 21508–21518, 2023
2023
-
[45]
Learning multi-view stereo with geometry-aware prior
Kehua Chen, Zhenlong Yuan, Haihong Xiao, Tianlu Mao, and Zhaoqi Wang. Learning multi-view stereo with geometry-aware prior. IEEE Transactions on Circuits and Systems for Video Technology, 2025
2025
-
[46]
Uncertainty-aware vision-based metric cross-view geolocalization
Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, and Rainer Stiefelhagen. Uncertainty-aware vision-based metric cross-view geolocalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21621–21631, 2023
2023
-
[47]
What uncertainties do we need in bayesian deep learning for computer vision?
Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? Advances in Neural Information Processing Systems, 30, 2017
2017
-
[48]
Estimating the mean and variance of the target probability distribution
David A Nix and Andreas S Weigend. Estimating the mean and variance of the target probability distribution. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), volume 1, pages 55–60. IEEE, 1994
1994
-
[49]
Vggt4d: Mining motion cues in visual geometry transformers for 4d scene reconstruction
Yu Hu, Chong Cheng, Sicheng Yu, Xiaoyang Guo, and Hao Wang. Vggt4d: Mining motion cues in visual geometry transformers for 4d scene reconstruction. arXiv preprint arXiv:2511.19971, 2025
2025
-
[50]
A benchmark dataset and evaluation methodology for video object segmentation
Federico Perazzi, Jordi Pont-Tuset, Brian McWilliams, Luc Van Gool, Markus Gross, and Alexander Sorkine-Hornung. A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 724–732, 2016
2016
-
[51]
Monocular dynamic view synthesis: A reality check
Hang Gao, Ruilong Li, Shubham Tulsiani, Bryan Russell, and Angjoo Kanazawa. Monocular dynamic view synthesis: A reality check. Advances in Neural Information Processing Systems, 35:33768–33780, 2022
2022