pith. machine review for the scientific record.

arxiv: 2604.07882 · v1 · submitted 2026-04-09 · 💻 cs.CV

Recognition: no theorem link

ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video

Boyuan Wang, Xiaofeng Wang, Yongkang Li, Zheng Zhu, Yifan Chang, Angen Ye, Guosheng Zhao, Chaojun Ni, Guan Huang, Yijie Ren, Yueqi Duan, Xingang Wang

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · physical attributes · Gaussian Splatting · self-supervised learning · monocular video · future prediction · non-rigid objects · feedforward network

The pith

A feedforward neural network reconstructs 3D geometry, appearance, and physical attributes of non-rigid objects from a single monocular video.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ReconPhys as a method to recover both visual appearance and underlying physical properties of deformable objects directly from ordinary video input. It replaces slow per-scene optimization with a single trained network that processes the entire sequence at once and outputs a 3D Gaussian Splatting model together with estimates of physical parameters. Because the training uses only synthetic data and a self-supervised loss, the approach removes the need for ground-truth physics labels or manual tuning and produces results fast enough for practical use in simulation pipelines.

Core claim

ReconPhys is the first feedforward framework that jointly learns physical attribute estimation and 3D Gaussian Splatting reconstruction from a single monocular video. It employs a dual-branch architecture trained via a self-supervised strategy that eliminates the need for ground-truth physics labels. Given a video sequence, ReconPhys simultaneously infers geometry, appearance, and physical attributes, achieving higher accuracy in future prediction than optimization baselines while reducing inference time from hours to under one second.

What carries the argument

Dual-branch architecture combining a 3D Gaussian Splatting branch for geometry and appearance with a parallel physical attribute estimation branch, trained end-to-end in a self-supervised manner on synthetic video data.
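
To make the shape of that machinery concrete, here is a minimal sketch of a dual-branch feedforward model in PyTorch. Everything in it is an illustrative assumption rather than a detail taken from the paper: the encoder layout, the number of Gaussians, the 14-value Gaussian parameterization, and the choice of three physical attributes (e.g. Young's modulus, Poisson's ratio, density) are all placeholders.

```python
# Minimal sketch of a dual-branch design (illustrative; module sizes,
# the Gaussian parameterization, and the set of physical attributes
# are assumptions, not details taken from the paper).
import torch
import torch.nn as nn

class DualBranchReconstructor(nn.Module):
    def __init__(self, feat_dim: int = 256, num_gaussians: int = 4096):
        super().__init__()
        # Shared video encoder: one feature vector per input sequence.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Branch 1: 3D Gaussian Splatting parameters. Each Gaussian:
        # 3 (mean) + 3 (scale) + 4 (rotation quaternion)
        # + 1 (opacity) + 3 (RGB) = 14 values.
        self.gaussian_head = nn.Linear(feat_dim, num_gaussians * 14)
        # Branch 2: physical attributes, e.g. Young's modulus,
        # Poisson's ratio, density (the exact set is an assumption).
        self.physics_head = nn.Linear(feat_dim, 3)
        self.num_gaussians = num_gaussians

    def forward(self, video: torch.Tensor):
        # video: (B, 3, T, H, W) monocular RGB sequence.
        feat = self.encoder(video)
        gaussians = self.gaussian_head(feat).view(-1, self.num_gaussians, 14)
        physics = self.physics_head(feat)
        return gaussians, physics
```

The load-bearing design choice is the shared encoder: both branches read the same sequence-level features, so the physics estimate and the reconstruction are forced to agree on one latent description of the object.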

If this is right

  • Produces simulation-ready 3D assets directly from video without per-scene optimization or manual annotation.
  • Enables rapid inference that supports downstream tasks in robotics and graphics pipelines.
  • Outperforms existing optimization methods in both accuracy of future frame prediction and speed of reconstruction on synthetic test data.
  • Removes the requirement for ground-truth physical labels during training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method could be extended to handle multi-object interactions or partial occlusions if the synthetic training distribution is broadened accordingly.
  • If the inferred physical attributes transfer to real scenes, they could serve as initialization for more precise refinement in hybrid optimization-plus-learning pipelines.
  • The approach opens the possibility of building large-scale datasets of physically annotated 3D assets from consumer video without expensive capture setups.

Load-bearing premise

Self-supervised training on synthetic videos is enough to produce physical attribute estimates that remain accurate when applied to real videos and support reliable future physical predictions.
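
A minimal sketch of what a self-supervised objective of this shape could look like: render the predicted Gaussians against the observed frames, then roll the object forward under the predicted physics and score the held-out tail of the video. The `render` and `simulate` callables are hypothetical stand-ins for a differentiable 3DGS rasterizer and a differentiable simulator, and the loss weighting is arbitrary; none of this is the paper's actual formulation.

```python
# Sketch of a self-supervised objective built only from video
# consistency (no physics labels). `render` and `simulate` are
# hypothetical stand-ins for a differentiable 3DGS rasterizer and a
# differentiable simulator; the 0.5 weight is arbitrary.
import torch.nn.functional as F

def self_supervised_loss(model, video, render, simulate, k_future: int = 4):
    # video: (B, 3, T, H, W); split into observed frames and a held-out tail.
    observed, future = video[:, :, :-k_future], video[:, :, -k_future:]
    gaussians, physics = model(observed)

    # Reconstruction term: re-render the observed frames.
    recon = render(gaussians, num_frames=observed.shape[2])
    loss_recon = F.mse_loss(recon, observed)

    # Future-consistency term: roll the Gaussians forward under the
    # predicted physical attributes and compare to the unseen frames.
    rolled = simulate(gaussians, physics, steps=k_future)
    pred_future = render(rolled, num_frames=k_future)
    loss_future = F.mse_loss(pred_future, future)

    return loss_recon + 0.5 * loss_future
```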

What would settle it

Apply the trained model to real monocular videos of non-rigid objects whose physical behavior is known or can be measured independently; check whether the predicted future frames or simulated dynamics match the actual observed motion.
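
A sketch of that check, assuming the hypothetical components above: hold out the tail of a real clip, roll the model forward, and score the prediction with PSNR (the standard definition, assuming frames normalized to [0, 1]).

```python
# Sketch of the proposed check: roll the model forward on a real clip
# and score predicted future frames against what actually happened.
# `model`, `render`, and `simulate` are the hypothetical components
# from the sketches above.
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Standard PSNR, assuming frames normalized to [0, 1].
    mse = torch.mean((pred - target) ** 2)
    return -10.0 * torch.log10(mse + 1e-10)

@torch.no_grad()
def future_prediction_score(model, render, simulate, clip, k_future: int = 8):
    observed, future = clip[:, :, :-k_future], clip[:, :, -k_future:]
    gaussians, physics = model(observed)
    rolled = simulate(gaussians, physics, steps=k_future)
    pred = render(rolled, num_frames=k_future)
    return psnr(pred.clamp(0, 1), future)
```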

Figures

Figures reproduced from arXiv: 2604.07882 by Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Xiaofeng Wang, Xingang Wang, Yifan Chang, Yijie Ren, Yongkang Li, Yueqi Duan, Zheng Zhu.

Figure 1: ReconPhys predicts physical attributes using a feedforward neural network (FFN). This approach … [PITH_FULL_IMAGE:figures/full_fig_p001_1.png]
Figure 2: Overview of our framework. Given an input image, a 3DGS predictor reconstructs the 3D object and … [PITH_FULL_IMAGE:figures/full_fig_p006_2.png]
Figure 3: Training pipeline of ReconPhys with self-forcing. The physics predictor estimates physical parameters … [PITH_FULL_IMAGE:figures/full_fig_p006_3.png]
Figure 4: Visualization of different methods. Our method produces more stable dynamics and realistic future … [PITH_FULL_IMAGE:figures/full_fig_p009_4.png]
Figure 5: Visualization of real-world non-rigid assets. Two objects are shown dropping and deforming over … [PITH_FULL_IMAGE:figures/full_fig_p010_5.png]
Figure 6: For the same object with different physical attributes, our method produces more accurate … [PITH_FULL_IMAGE:figures/full_fig_p011_6.png]
Figure 7: Our method can be utilized in non-rigid manipulation. Four manipulation scenarios are … [PITH_FULL_IMAGE:figures/full_fig_p011_7.png]
Original abstract

Reconstructing non-rigid objects with physical plausibility remains a significant challenge. Existing approaches leverage differentiable rendering for per-scene optimization, recovering geometry and dynamics but requiring expensive tuning or manual annotation, which limits practicality and generalizability. To address this, we propose ReconPhys, the first feedforward framework that jointly learns physical attribute estimation and 3D Gaussian Splatting reconstruction from a single monocular video. Our method employs a dual-branch architecture trained via a self-supervised strategy, eliminating the need for ground-truth physics labels. Given a video sequence, ReconPhys simultaneously infers geometry, appearance, and physical attributes. Experiments on a large-scale synthetic dataset demonstrate superior performance: our method achieves 21.64 PSNR in future prediction compared to 13.27 by state-of-the-art optimization baselines, while reducing Chamfer Distance from 0.349 to 0.004. Crucially, ReconPhys enables fast inference (<1 second) versus hours required by existing methods, facilitating rapid generation of simulation-ready assets for robotics and graphics.
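
For reference, the Chamfer Distance quoted above is conventionally the bidirectional average of nearest-neighbor distances between two point clouds; a minimal (unsquared) version follows, as a standard definition rather than code from the paper. Conventions differ on squaring the distances and on averaging versus summing the two directions.

```python
# Standard bidirectional Chamfer Distance between two point clouds
# (conventional unsquared form; not code from the paper).
import torch

def chamfer_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a: (N, 3), b: (M, 3) point clouds.
    d = torch.cdist(a, b)  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```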

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes ReconPhys, the first feedforward dual-branch framework to jointly estimate physical attributes and reconstruct 3D Gaussian Splatting geometry plus appearance from a single monocular video. It uses a self-supervised training strategy that avoids ground-truth physics labels. On a large-scale synthetic dataset, it reports 21.64 PSNR for future-frame prediction (vs. 13.27 for optimization baselines) and Chamfer Distance reduced to 0.004 (vs. 0.349), with inference under 1 second versus hours for prior methods.

Significance. If the quantitative gains and self-supervised generalization hold beyond the synthetic setting, the work would be significant: it replaces expensive per-scene optimization with a fast learned model, directly enabling simulation-ready assets for robotics and graphics. The avoidance of physics-label supervision is a clear practical strength.

major comments (2)
  1. [Abstract] The central claim of practical utility, 'rapid generation of simulation-ready assets for robotics and graphics', rests on generalization from synthetic training to real videos; yet the abstract reports experiments exclusively on synthetic data and provides no real-world test results or domain-gap analysis. This gap is load-bearing for the asserted practicality.
  2. [Abstract] The reported PSNR and Chamfer Distance improvements are presented without any description of the dual-branch architecture, the self-supervised loss terms, or the precise re-implementation and hyper-parameter tuning of the 'state-of-the-art optimization baselines'; without these details the numerical gains cannot be independently verified or attributed to the proposed method.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We have revised the manuscript to improve precision in our claims and to enhance verifiability of the reported results while preserving the core contributions.

read point-by-point responses
  1. Referee: [Abstract] The central claim of practical utility, 'rapid generation of simulation-ready assets for robotics and graphics', rests on generalization from synthetic training to real videos; yet the abstract reports experiments exclusively on synthetic data and provides no real-world test results or domain-gap analysis. This gap is load-bearing for the asserted practicality.

    Authors: We agree that the abstract should more precisely reflect the experimental scope. In the revised manuscript we have updated the abstract to state explicitly that all quantitative results are obtained on a large-scale synthetic dataset. The language on practical utility has been moderated from a direct assertion to 'promising for enabling rapid generation of simulation-ready assets'. A dedicated paragraph discussing the synthetic-to-real domain gap, the challenges of acquiring real-world physics ground truth, and planned future validation has been added to the Discussion section. revision: yes

  2. Referee: [Abstract] The reported PSNR and Chamfer Distance improvements are presented without any description of the dual-branch architecture, the self-supervised loss terms, or the precise re-implementation and hyper-parameter tuning of the 'state-of-the-art optimization baselines'; without these details the numerical gains cannot be independently verified or attributed to the proposed method.

    Authors: The abstract is intentionally concise. Full details of the dual-branch architecture appear in Section 3.1, the self-supervised loss formulation and training strategy are given in Section 3.2, and the baseline re-implementations together with hyper-parameter choices are described in Section 4.2 and the supplementary material. To improve immediate readability we have added a short clause in the abstract mentioning the dual-branch feedforward design and self-supervised training. We maintain that the body of the paper supplies the information needed for independent verification. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents ReconPhys as a self-supervised feedforward network that jointly predicts 3D Gaussian Splatting parameters and physical attributes from monocular video input. The self-supervised loss is defined on reconstruction and future-frame consistency without access to ground-truth physics labels, and the quantitative claims (21.64 PSNR future prediction, 0.004 Chamfer distance) are measured on held-out synthetic sequences rather than being algebraically identical to any fitted parameter or training objective. No equation reduces a claimed prediction to an input by construction, no uniqueness theorem is imported from prior self-work to force the architecture, and no ansatz is smuggled via citation. The derivation therefore remains self-contained: the model architecture and training objective are stated independently of the reported test metrics, which serve as external empirical evidence rather than tautological restatements.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the effectiveness of a self-supervised dual-branch neural architecture trained on synthetic video data; the only ledger entries are the learned network weights and a single domain assumption, both standard for neural network training.

free parameters (1)
  • neural network weights
    Learned via self-supervised training on the synthetic dataset; specific values not reported in abstract.
axioms (1)
  • domain assumption: Self-supervised losses derived from video consistency are sufficient to recover accurate physical attributes.
    Invoked by the training strategy that eliminates the need for ground-truth physics labels.

pith-pipeline@v0.9.0 · 5515 in / 1315 out tokens · 43865 ms · 2026-05-10T16:52:51.977907+00:00 · methodology

