arxiv: 2605.30347 · v1 · pith:NUTN6N5Fnew · submitted 2026-05-28 · 💻 cs.CV · cs.GR

NeuROK: Generative 4D Neural Object Kinematics

Pith reviewed 2026-06-29 08:18 UTC · model grok-4.3

classification 💻 cs.CV cs.GR
keywords 4D dynamics · neural kinematics · latent space · Lagrangian mechanics · deformable objects · generative model · transformer encoder-decoder

The pith

A learned latent space for all object states lets dynamics be simulated using Lagrangian mechanics only in that low-dimensional space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that a data-driven parameterization called NeuROK makes it possible to generate realistic 4D temporal deformations without assuming a predefined physical model for each object category. NeuROK consists of a latent space capturing every possible state of an object, together with a decoder that turns any latent sample into a plausible deformed shape. The authors train a transformer encoder-decoder on a large curated 4D dataset and then run the dynamics simulation inside the latent space, following the Lagrangian view of classical mechanics. A sympathetic reader would care because the method removes the need for per-category system identification and so scales to diverse object types, where earlier approaches were restricted to small, narrow datasets.

Core claim

The central claim is that learning both a latent space representing all possible states of the object and a decoder that maps any sampled latent to a plausibly deformed shape of the object significantly simplifies the generation of simulative dynamics, since only the dynamics within this low-dimensional latent space need to be considered from the Lagrangian mechanics perspective in classical physics. The resulting transformer-based model demonstrates effectiveness and generality across diverse dynamic object types.

What carries the argument

Neural Object Kinematics (NeuROK): the learned latent space of object states together with its decoder to deformed shapes, which carries the argument by moving all physics simulation into that latent space.
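
The summary above does not say how the energies are defined in latent space, so what follows is only a sketch of the textbook reduced-coordinate construction. The notation is assumed for illustration and does not come from the paper: a decoder $D$ mapping a latent $z$ to stacked vertex positions $x$, a constant mass matrix $M_x$, a potential $V$, and generalized external forces $Q_{\text{ext}}$.

```latex
\begin{aligned}
x &= D(z), \qquad \dot{x} = J(z)\,\dot{z}, \qquad J(z) = \frac{\partial D}{\partial z}, \\
T(z,\dot{z}) &= \tfrac{1}{2}\,\dot{x}^{\top} M_x\,\dot{x}
             = \tfrac{1}{2}\,\dot{z}^{\top}\,\underbrace{J(z)^{\top} M_x\, J(z)}_{M(z)}\,\dot{z}, \\
L(z,\dot{z}) &= T(z,\dot{z}) - V\bigl(D(z)\bigr), \\
\frac{d}{dt}\frac{\partial L}{\partial \dot{z}} - \frac{\partial L}{\partial z} &= Q_{\text{ext}}.
\end{aligned}
```

Under these assumptions the unknowns are the few coordinates of $z$ rather than the $3N$ coordinates of $x$, which is the sense in which the simulation moves into the latent space.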

If this is right

  • Dynamics simulation reduces to operating only inside the low-dimensional latent space rather than the full 3D geometry.
  • The same trained model applies to many different dynamic object categories without requiring new physical models or parameter fitting.
  • Realistic temporal deformations arise under varied physical conditions directly from the learned kinematic parameterization.
  • The framework produces clear improvements over earlier methods that rely on predefined physical models and system identification.
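
The first point can be illustrated with a toy example. The sketch below is not the paper's model: `decode` is a hand-written stand-in for a learned decoder (a 1-D latent mapped to the position of a single point mass on a unit circle), and `reduced_mass`, `potential`, `latent_accel`, and `rollout` are hypothetical helper names. It integrates the Euler-Lagrange equation for the scalar latent and then decodes the latent trajectory into a shape sequence.

```python
import math

def decode(z, length=1.0):
    """Toy stand-in for a learned decoder: 1-D latent -> 2-D point position."""
    return (length * math.sin(z), -length * math.cos(z))

def reduced_mass(z, mass=1.0, h=1e-5):
    """M(z) = m * |dD/dz|^2, with the decoder Jacobian from central differences."""
    (xa, ya), (xb, yb) = decode(z - h), decode(z + h)
    jx, jy = (xb - xa) / (2 * h), (yb - ya) / (2 * h)
    return mass * (jx * jx + jy * jy)

def potential(z, mass=1.0, g=9.81):
    """Gravitational potential energy of the decoded point."""
    return mass * g * decode(z)[1]

def energy(z, v):
    """Total energy T + V expressed in latent coordinates."""
    return 0.5 * reduced_mass(z) * v * v + potential(z)

def latent_accel(z, v, h=1e-4):
    """Solve M(z) z'' + 0.5 M'(z) z'^2 + V'(z) = 0 for z''."""
    d_mass = (reduced_mass(z + h) - reduced_mass(z - h)) / (2 * h)
    d_pot = (potential(z + h) - potential(z - h)) / (2 * h)
    return -(0.5 * d_mass * v * v + d_pot) / reduced_mass(z)

def rollout(z0, v0, dt=1e-3, steps=5000):
    """Semi-implicit Euler: update the latent velocity first, then the latent."""
    z, v = z0, v0
    states = [(z, v)]
    for _ in range(steps):
        v += dt * latent_accel(z, v)
        z += dt * v
        states.append((z, v))
    return states

states = rollout(z0=0.5, v0=0.0)
e0 = energy(*states[0])
drift = max(abs(energy(z, v) - e0) for z, v in states)
frames = [decode(z) for z, _ in states]  # decoded shape sequence
print(f"max energy drift over {len(states) - 1} steps: {drift:.2e}")
```

Only the scalar `z` is integrated; the geometry appears solely through `decode`. Changing `z0` or `v0` yields a different sequence from the same decoder.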

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the latent space is complete, new dynamic sequences could be produced simply by choosing different initial latent states and integrating the latent equations forward.
  • The approach could be attached to existing static 3D generative models to supply the missing time dimension without retraining the geometry generator.
  • Efficiency gains would appear if the latent dynamics themselves can be learned or approximated faster than full 3D physics solvers.

Load-bearing premise

A data-driven latent space learned from a curated large-scale 4D dataset can represent every possible state of an object and its decoder can map any latent sample to a plausibly deformed shape.

What would settle it

Apply the trained latent dynamics to an object type absent from the training set and check whether the resulting 4D deformation sequences match independent physical observations or ground-truth simulations.
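
One simple way to score such a comparison, assuming the predicted and ground-truth sequences share vertex correspondence and frame count, is a mean per-vertex distance. The function name `mean_vertex_error` and the data layout (a list of frames, each a list of coordinate tuples) are illustrative choices, not the paper's evaluation protocol.

```python
import math

def mean_vertex_error(pred_seq, gt_seq):
    """Mean Euclidean distance between corresponding vertices over all frames."""
    if len(pred_seq) != len(gt_seq):
        raise ValueError("sequences must have the same number of frames")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred_seq, gt_seq):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("frames must have the same number of vertices")
        for p, q in zip(pred_frame, gt_frame):
            total += math.dist(p, q)
            count += 1
    if count == 0:
        raise ValueError("sequences contain no vertices")
    return total / count

# One frame, two vertices; only the first vertex is off, by one unit.
pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
gt = [[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]]
print(mean_vertex_error(pred, gt))  # 0.5
```

When correspondence is not available, for example against scanned observations, a set-based distance such as Chamfer distance would be the usual substitute.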

Figures

Figures reproduced from arXiv: 2605.30347 by Chen Geng, Guangzhao He, Jiajun Wu, Shangzhe Wu, Yue Gao, Yunzhi Zhang.

Figure 1. We present a versatile and scalable framework for generating simulative 4D dynamics of static 3D objects under physical …
Figure 2. Kinematic state parameterization. (a) Several kinematic state parameterizations can be used to describe a physical system. The symbolic parameterizations used in classical mechanics are concise yet not accessible in inverse problems. Traditional inverse simulation approaches use geometry-derived parameterizations, yet require dense physical constraints to solve the over-parameterized system. We instead lea…
Figure 3. Overview of our framework. Given a static 3D shape, NeuROK uses a transformer-based encoder to predict an instance-specific latent space to represent different kinematic states of this object. Each sampled latent on the learned manifold can be decoded to a corresponding state of the input object. Under different physical conditions (e.g., forces, actions, velocities), our method generates dynamic trajector…
Figure 4. Generative learning of NeuROK. During training, we randomly sample an instance mesh and one of its possible deformation fields from the training set, and supervise all three models with KL and reconstruction targets. During inference, we only use E_cond to obtain the prior distribution p_{M_0}(z) for the instance M_0 and sample from this distribution a latent, which is further decoded to a predicted deformatio…
Figure 5. Qualitative comparison on learning object kinematics. We evaluate different methods on learning compact and smooth kinematic spaces. Given an input object and the shape of a target pose, we perform inverse kinematics and find the best-matching kinematic state. We compare how well the reconstructed shape decoded from the obtained state vectors matches the target. Panels: Input “Box”, Input “Lamp”, Input “Cloth”, Input…
Figure 6. Qualitative comparison on physically-inspired 4D generation. We compare against baselines on the task of generating physically-plausible 4D motion given a single shape and conditioning actions.
    6.2. Generative 4D Simulation. We show that our pipeline generates 4D simulative dynamics for diverse objects, evaluated across eight objects. Baselines. We compare against representative methods for generating 4D …
Figure 8. Analysis of energy conservation. Our approach maintains physical consistency in the generated trajectories through Euler–Lagrangian modeling. Under this formulation, the total energy of the simulated motion remains approximately constant. Panel labels: Input, 4D Generation.
Figure 9. Generalization on unseen categories. Our model can generalize to novel object categories that are completely not present in the training data.
    Simulating Real Objects. Our pipeline can also simulate and manipulate real scenes. We scan a real scene and apply our approach to simulate the dynamics of the objects within it. As shown in …
Figure 7. Simulating real objects. Our model can be used to simulate real-captured objects. See the Supp. Mat. for more results.
    … models, Pixie [49] predicts simulation parameters using amortized-inference networks, and OmniPhysGS [69] represents each asset with material-aware Constitutive Gaussians for general physics-based dynamics. AnimateAnyMesh [102] is an end-to-end 4D generator trained on large-scale 4D dat…
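
The Figure 4 caption mentions supervision with KL and reconstruction targets and a prior obtained from a conditioning encoder. A minimal sketch of such an objective follows, assuming a conditional-VAE setup with diagonal Gaussian posterior and prior. The truncated caption does not confirm the exact loss, so `kl_diag_gaussians`, `reconstruction_error`, `cvae_loss`, and the weight `beta` are hypothetical names and choices.

```python
import math

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians given means and log-variances."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl

def reconstruction_error(pred, target):
    """Mean squared Euclidean distance between predicted and target offsets."""
    return sum(math.dist(p, t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cvae_loss(pred, target, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Reconstruction term plus beta-weighted KL from posterior to prior."""
    return reconstruction_error(pred, target) + beta * kl_diag_gaussians(
        mu_q, logvar_q, mu_p, logvar_p
    )

# Posterior N(1, 1) against prior N(0, 1); one vertex offset wrong by one unit.
loss = cvae_loss(
    pred=[(0.0, 0.0, 0.0)], target=[(1.0, 0.0, 0.0)],
    mu_q=[1.0], logvar_q=[0.0], mu_p=[0.0], logvar_p=[0.0],
)
print(loss)  # 1.5
```

In this reading the KL term pulls the posterior over deformations toward the instance-conditioned prior, which is what would allow sampling from the prior alone at inference time.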
Original abstract

Data-driven approaches have revolutionized 3D vision, enabling transformers to effectively reconstruct and generate static 3D objects. However, generating simulative 4D dynamics -- realistic temporal deformations of static objects under various physical conditions -- remains challenging and often ad hoc, despite its importance in building comprehensive 3D world models. Most existing methods assume a predefined physical model and use system identification to estimate parameters, restricting these methods to specific categories and small-scale datasets. We propose that these restrictions can be overcome by learning a data-driven kinematic state parameterization for object-centric physical systems. Specifically, we learn both a latent space representing all possible states of the object and a decoder that maps any sampled latent to a plausibly deformed shape of the object. We refer to this parameterization as Neural Object Kinematics (NeuROK), and learn a transformer-based encoder-decoder model on a curated large-scale 4D dataset. This formulation and the learned model significantly simplify the generation of simulative dynamics since we only need to consider the dynamics within a low-dimensional latent space from the Lagrangian mechanics' perspective in classical physics. We demonstrate the effectiveness and generality of this neural simulation framework across diverse dynamic object types, showing clear advantages over prior works. Project page: https://chen-geng.com/neurok

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper introduces NeuROK, a transformer-based encoder-decoder that learns a low-dimensional latent space representing all possible states of an object together with a decoder mapping latents to deformed shapes. It claims that simulating dynamics via Lagrangian mechanics directly in this latent space (without predefined physical models or system identification) simplifies 4D generation and yields clear advantages over prior category-specific methods, with demonstrations across diverse dynamic object types on a curated large-scale 4D dataset.

Significance. If the central claim holds—that a purely data-driven latent parameterization plus Lagrangian simulation produces realistic, physically consistent temporal deformations—the approach could materially advance category-agnostic 4D world models by removing the need for hand-specified physics per object class.

major comments (3)
  1. [Abstract] Abstract: the assertion that the formulation 'significantly simplify[s] the generation of simulative dynamics' and shows 'clear advantages over prior works' is unsupported by any quantitative metrics, ablation studies, or comparisons; the central claim therefore cannot be evaluated from the supplied text.
  2. [Abstract] Abstract / implied method: the paper states that dynamics are considered 'from the Lagrangian mechanics' perspective in classical physics' yet supplies no description of how the kinetic and potential energy terms are instantiated or learned inside the latent space, nor whether any conservation laws or boundary conditions are enforced during roll-out; this is load-bearing for the physical-consistency claim.
  3. [Abstract] Abstract: the weakest modeling assumption—that any trajectory sampled in the learned latent space corresponds to a physically valid shape sequence—is stated without empirical test or constraint, leaving open whether the decoder can produce implausible deformations under simulated dynamics.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate where the manuscript will be revised.

  1. Referee: [Abstract] Abstract: the assertion that the formulation 'significantly simplify[s] the generation of simulative dynamics' and shows 'clear advantages over prior works' is unsupported by any quantitative metrics, ablation studies, or comparisons; the central claim therefore cannot be evaluated from the supplied text.

    Authors: We agree the abstract overstates the claims without direct support. The full paper contains quantitative comparisons and ablations in the experiments section, but these are not referenced in the abstract. We will revise the abstract to remove the unsubstantiated assertions about simplification and advantages, replacing them with a neutral description of the approach and directing readers to the results. Revision: yes.

  2. Referee: [Abstract] Abstract / implied method: the paper states that dynamics are considered 'from the Lagrangian mechanics' perspective in classical physics' yet supplies no description of how the kinetic and potential energy terms are instantiated or learned inside the latent space, nor whether any conservation laws or boundary conditions are enforced during roll-out; this is load-bearing for the physical-consistency claim.

    Authors: The method section details the latent parameterization and Lagrangian simulation, but we acknowledge the abstract and high-level description lack explicit equations or implementation details for the energy terms. We will expand the method section with a dedicated subsection describing how kinetic and potential energies are defined and optimized in latent space, along with any conservation enforcement during rollout. Revision: yes.

  3. Referee: [Abstract] Abstract: the weakest modeling assumption—that any trajectory sampled in the learned latent space corresponds to a physically valid shape sequence—is stated without empirical test or constraint, leaving open whether the decoder can produce implausible deformations under simulated dynamics.

    Authors: This concern is valid; the current experiments rely on data-driven training to promote plausibility but do not include explicit tests of physical validity for arbitrary latent trajectories. We will add a new analysis subsection with qualitative and quantitative checks on decoder outputs under simulated dynamics, plus discussion of limitations. Revision: partial.
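
One quantitative check of the kind described in the third response could measure how far decoded mesh edges stretch or compress relative to the rest shape. The sketch below is a hypothetical proxy, not something the paper or rebuttal specifies: `max_edge_strain` is an illustrative name, and the metric only suits near-inextensible objects, since elastic ones legitimately stretch.

```python
import math

def max_edge_strain(rest_vertices, deformed_vertices, edges):
    """Largest relative change in edge length between rest and deformed shapes."""
    worst = 0.0
    for i, j in edges:
        rest_len = math.dist(rest_vertices[i], rest_vertices[j])
        if rest_len == 0.0:
            raise ValueError(f"edge ({i}, {j}) has zero rest length")
        new_len = math.dist(deformed_vertices[i], deformed_vertices[j])
        worst = max(worst, abs(new_len - rest_len) / rest_len)
    return worst

# A right triangle, a rigidly translated copy, and a copy stretched 2x along x.
rest = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
moved = [(3.0, 4.0), (4.0, 4.0), (3.0, 5.0)]
stretched = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
edges = [(0, 1), (1, 2), (0, 2)]

print(max_edge_strain(rest, moved, edges))      # rigid motion: no strain
print(max_edge_strain(rest, stretched, edges))  # edge (0, 1) doubles in length
```

Applied to every frame of a latent rollout, a spike in this value would flag frames where the decoder leaves the range of plausible deformations; any acceptance threshold would have to be chosen per material.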

Circularity Check

0 steps flagged

No significant circularity: data-driven latent parameterization trained on external 4D dataset

The paper learns a latent space and decoder from a curated large-scale 4D dataset to represent object states, then simulates dynamics in that latent space using the Lagrangian perspective. No equations, self-citations, or fitted parameters are shown reducing the central claim to its own inputs by construction. The approach is explicitly data-driven rather than self-referential, with the Lagrangian application serving as an interpretive framework on independently learned components. This matches the default expectation of non-circularity for data-driven methods without load-bearing self-references or definitional loops.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract; no explicit free parameters, invented entities, or detailed axioms are described. The key domain assumption is that classical Lagrangian mechanics can be applied inside the learned latent space to produce realistic dynamics.

axioms (1)
  • domain assumption: The Lagrangian mechanics perspective can be applied to dynamics within the learned low-dimensional latent space to generate realistic deformations.
    Invoked in the abstract as the mechanism that simplifies generation after learning the NeuROK parameterization.

pith-pipeline@v0.9.1-grok · 5771 in / 1172 out tokens · 30704 ms · 2026-06-29T08:18:20.316116+00:00 · methodology


Reference graph

Works this paper leans on

121 extracted references · 27 canonical work pages · 7 internal anchors

  1. [1]

    Stephen W Bailey, Dalton Omens, Paul Dilorenzo, and James F O’Brien. Fast and deep facial deformations. ACM Transactions on Graphics (TOG), 39(4):94–1, 2020.

  2. [2]

    Yohai Bar-Sinai, Stephan Hoyer, Jason Hickey, and Michael P Brenner. Learning data-driven discretizations for partial differential equations. Proceedings of the National Academy of Sciences, 116(31):15344–15349, 2019.

  3. [3]

    Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. Advances in Neural Information Processing Systems, 29, 2016.

  4. [4]

    Peter W Battaglia, Jessica B Hamrick, and Joshua B Tenenbaum. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110(45):18327–18332, 2013.

  5. [5]

    Peter Benner, Serkan Gugercin, and Karen Willcox. A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Review, 57(4):483–531,

  6. [6]

    Ravinder Bhattoo, Sayan Ranu, and NM Krishnan. Learning articulated rigid body dynamics with lagrangian graph neural network. Advances in Neural Information Processing Systems, 35:29789–29800, 2022.

  7. [7]

    David Bindel. Numerical methods for data science, 2018.

  8. [8]

    Volker Blanz and Thomas Vetter. Face recognition based on fitting a 3d morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1063–1074,

  9. [9]

    James Booth, Anastasios Roussos, Stefanos Zafeiriou, Allan Ponniah, and David Dunaway. A 3d morphable model learnt from 10,000 faces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5543–5552, 2016.

  10. [10]

    Sofien Bouaziz, Sebastian Martin, Tiantian Liu, Ladislav Kavan, and Mark Pauly. Projective dynamics: fusing constraint projections for fast simulation. ACM Transactions on Graphics (TOG), 33(4):1–11, 2014.

  11. [11]

    Aljaz Bozic, Pablo Palafox, Michael Zollhofer, Justus Thies, Angela Dai, and Matthias Nießner. Neural deformation graphs for globally-consistent non-rigid reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1450–1459,

  12. [12]

    Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, and Qifeng Chen. Gic: Gaussian-informed continuum for physical property identification and simulation. Advances in Neural Information Processing Systems, 37:75035–75063, 2024.

  13. [13]

    Michael B Chang, Tomer Ullman, Antonio Torralba, and Joshua B Tenenbaum. A compositional object-based approach to learning physical dynamics. arXiv preprint arXiv:1612.00341, 2016.

  14. [14]

    Yue Chang, Peter Yichen Chen, Zhecheng Wang, Maurizio M Chiaramonte, Kevin Carlberg, and Eitan Grinspun. Licrom: Linear-subspace continuous reduced order modeling with neural fields. In SIGGRAPH Asia 2023 Conference Papers, pages 1–12, 2023.

  15. [15]

    Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, and Shenlong Wang. Physgen3d: Crafting a miniature interactive world from a single image. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 6178–6189, 2025.

  16. [16]

    Chuhao Chen, Zhiyang Dou, Chen Wang, Yiming Huang, Anjun Chen, Qiao Feng, Jiatao Gu, and Lingjie Liu. Vid2sim: Generalizable, video-based reconstruction of appearance, geometry and physics for mesh-free simulation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26545–26555, 2025.

  17. [17]

    Chuhao Chen, Isabella Liu, Xinyue Wei, Hao Su, and Minghua Liu. Freeart3d: Training-free articulated object generation using 3d diffusion. In SIGGRAPH Asia 2025 Conference Papers, 2025.

  18. [18]

    Honglin Chen, Rundi Wu, Eitan Grinspun, Changxi Zheng, and Peter Yichen Chen. Implicit neural spatial representations for time-dependent pdes. In International Conference on Machine Learning, pages 5162–5177. PMLR, 2023.

  19. [19]

    Peter Yichen Chen, Jinxu Xiang, Dong Heon Cho, Yue Chang, GA Pershing, Henrique Teles Maia, Maurizio M Chiaramonte, Kevin Carlberg, and Eitan Grinspun. Crom: Continuous reduced-order modeling of pdes using implicit neural representations. arXiv preprint arXiv:2206.02607,

  20. [20]

    Peter Yichen Chen, Maurizio M Chiaramonte, Eitan Grinspun, and Kevin Carlberg. Model reduction for the material point method via an implicit neural representation of the deformation map. Journal of Computational Physics, 478:111908, 2023.

  21. [21]

    Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox, and Abhishek Gupta. URDFormer: A pipeline for constructing articulated simulation environments from real-world images. arXiv preprint arXiv:2405.11656, 2024.

  22. [22]

    Paul G Constantine, Eric Dow, and Qiqi Wang. Active subspace methods in theory and practice: applications to kriging surfaces. SIAM Journal on Scientific Computing, 36(4):A1500–A1524, 2014.

  23. [23]

    Miles Cranmer, Sam Greydanus, Stephan Hoyer, Peter Battaglia, David Spergel, and Shirley Ho. Lagrangian neural networks. arXiv preprint arXiv:2003.04630, 2020.

  24. [24]

    Rishit Dagli, Donglai Xiang, Vismay Modi, Charles Loop, Clement Fuji Tsang, Anka He Chen, Anita Hu, Gavriel State, David IW Levin, and Maria Shugrina. Vomp: Predicting volumetric mechanical property fields. arXiv preprint arXiv:2510.22975, 2025.

  25. [25]

    Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-xl: A universe of 10m+ 3d objects. Advances in Neural Information Processing Systems, 36:35799–35813,

  26. [26]

    Tao Du, Kui Wu, Pingchuan Ma, Sebastien Wah, Andrew Spielberg, Daniela Rus, and Wojciech Matusik. Diffpd: Differentiable projective dynamics. ACM Transactions on Graphics (TOG), 41(2):1–21, 2021.

  27. [27]

    Haoyi Duan, Hong-Xing Yu, Sirui Chen, Li Fei-Fei, and Jiajun Wu. Worldscore: A unified evaluation benchmark for world generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025.

  28. [28]

    Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 605–613,

  29. [29]

    Yutao Feng, Yintong Shang, Xuan Li, Tianjia Shao, Chenfanfu Jiang, and Yin Yang. Pie-nerf: Physics-based interactive elastodynamics with nerf. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4450–4461, 2024.

  30. [30]

    Marc Finzi, Ke Alexander Wang, and Andrew G Wilson. Simplifying hamiltonian and lagrangian neural networks via explicit constraints. Advances in Neural Information Processing Systems, 33:13880–13889, 2020.

  31. [31]

    Lawson Fulton, Vismay Modi, David Duvenaud, David IW Levin, and Alec Jacobson. Latent-space dynamics for reduced deformable simulation. In Computer Graphics Forum, pages 379–391. Wiley Online Library, 2019.

  32. [32]

    Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos, Martin Rünz, Lourdes Agapito, and Matthias Nießner. Learning neural parametric head models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21003–21012, 2023.

  33. [33]

    Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban, Nikolaos Sarafianos, Hsiao-yu Chen, Oshri Halimi, Aljaž Božič, Shunsuke Saito, Jiajun Wu, C Karen Liu, et al. Pgc: Physics-based gaussian cloth from a single pose. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21215–21225, 2025.

  34. [34]

    Guangzhao He, Chen Geng, Shangzhe Wu, and Jiajun Wu. Category-agnostic neural object rigging. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 22078–22088, 2025.

  35. [35]

    Qixing Huang, Xiangru Huang, Bo Sun, Zaiwei Zhang, Junfeng Jiang, and Chandrajit Bajaj. Arapreg: An as-rigid-as possible regularization loss for learning deformable shape generators. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5815–5825,

  36. [36]

    Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, et al. Vbench: Comprehensive benchmark suite for video generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21807–21818, 2024.

  37. [37]

    Andrew Jaegle, Felix Gimeno, Andy Brock, Oriol Vinyals, Andrew Zisserman, and Joao Carreira. Perceiver: General perception with iterative attention. In International Conference on Machine Learning, pages 4651–4664. PMLR, 2021.

  38. [38]

    Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, and Angjoo Kanazawa. Keypointdeformer: Unsupervised 3d keypoint discovery for shape control. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12783–12792,

  39. [39]

    Doug L James and Dinesh K Pai. Dyrt: Dynamic response textures for real time deformation simulation with graphics hardware. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pages 582–585, 2002.

  40. [40]

    Doug L James, Jernej Barbič, and Dinesh K Pai. Precomputed acoustic transfer: output-sensitive, accurate sound generation for geometrically complex vibration sources. ACM Transactions on Graphics (TOG), 25(3):987–995,

  41. [41]

    Chenfanfu Jiang, Craig Schroeder, Joseph Teran, Alexey Stomakhin, and Andrew Selle. The material point method for simulating continuum materials. In ACM SIGGRAPH 2016 Courses, pages 1–52, 2016.

  42. [42]

    Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, and Yunzhu Li. Phystwin: Physics-informed reconstruction and simulation of deformable objects from videos. ICCV, 2025.

  43. [43]

    Zhenyu Jiang, Cheng-Chun Hsu, and Yuke Zhu. Ditto: Building digital twins of articulated objects from interaction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5616–5626,

  44. [44]

    Takuhiro Kaneko. Improving physics-augmented continuum neural radiance field-based geometry-agnostic system identification with lagrangian particle optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5480, 2024.

  45. [45]

    Justin Kerr, Chung Min Kim, Mingxuan Wu, Brent Yi, Qianqian Wang, Ken Goldberg, and Angjoo Kanazawa. Robot see robot do: Imitating articulated object manipulation with monocular 4d reconstruction. arXiv preprint arXiv:2409.18121, 2024.

  46. [46]

    Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.

  47. [47]

    Lev Davidovich Landau and Evgeniĭ Mikhaĭlovich Lifshitz. Mechanics. CUP Archive, 1960.

  48. [48]

    Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, and Eric Eaton. Articulate-anything: Automatic modeling of articulated objects via a vision-language foundation model. arXiv preprint arXiv:2410.13882, 2024.

  49. [49]

    Pixie: Fast and generalizable supervised learning of 3d physics from pixels

    Long Le, Ryan Lucas, Chen Wang, Chuhao Chen, Dinesh Jayaraman, Eric Eaton, and Lingjie Liu. Pixie: Fast and generalizable supervised learning of 3d physics from pixels. arXiv preprint arXiv:2508.17437, 2025. 8

  50. [50]

    Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders

    Kookjin Lee and Kevin T Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics, 404:108973, 2020. 3

  51. [51]

    Nap: Neural 3d articulated object prior

    Jiahui Lei, Congyue Deng, William B Shen, Leonidas J Guibas, and Kostas Daniilidis. Nap: Neural 3d articulated object prior. Advances in Neural Information Processing Systems, 36:31878–31894, 2023. 2

  52. [52]

    Behavior-1k: A benchmark for embodied ai with 1,000 everyday activities and realistic simulation

    Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Michael Lingelbach, Jiankai Sun, et al. Behavior-1k: A benchmark for embodied ai with 1,000 everyday activities and realistic simulation. In Conference on Robot Learning, pages 80–93. PMLR, 2023. 2

  53. [53]

    Deformnet: Latent space modeling and dynamics prediction for deformable object manipulation

    Chenchang Li, Zihao Ai, Tong Wu, Xiaosa Li, Wenbo Ding, and Huazhe Xu. Deformnet: Latent space modeling and dynamics prediction for deformable object manipulation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 14770–14776. IEEE, 2024. 3

  54. [54]

    Controlling diverse robots by inferring jacobian fields with deep networks

    Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, and Vincent Sitzmann. Controlling diverse robots by inferring jacobian fields with deep networks. Nature, pages 1–7, 2025. 3

  55. [55]

    Plasticitynet: Learning to simulate metal, sand, and snow for optimization time integration

    Xuan Li, Yadi Cao, Minchen Li, Yin Yang, Craig Schroeder, and Chenfanfu Jiang. Plasticitynet: Learning to simulate metal, sand, and snow for optimization time integration. Advances in Neural Information Processing Systems, 35:27783–27796, 2022. 2

  56. [56]

    Pac-nerf: Physics augmented continuum neural radiance fields for geometry-agnostic system identification

    Xuan Li, Yi-Ling Qiao, Peter Yichen Chen, Krishna Murthy Jatavallabhula, Ming Lin, Chenfanfu Jiang, and Chuang Gan. Pac-nerf: Physics augmented continuum neural radiance fields for geometry-agnostic system identification. arXiv preprint arXiv:2303.05512, 2023. 2

  57. [57]

    Dress-1-to-3: Single image to simulation-ready 3d outfit with diffusion prior and differentiable physics

    Xuan Li, Chang Yu, Wenxin Du, Ying Jiang, Tianyi Xie, Yunuo Chen, Yin Yang, and Chenfanfu Jiang. Dress-1-to-3: Single image to simulation-ready 3d outfit with diffusion prior and differentiable physics. ACM Transactions on Graphics (TOG), 44(4):1–16, 2025. 2

  58. [58]

    Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids

    Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B Tenenbaum, and Antonio Torralba. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. arXiv preprint arXiv:1810.01566, 2018. 3

  59. [59]

    Diffcloth: Differentiable cloth simulation with dry frictional contact

    Yifei Li, Tao Du, Kui Wu, Jie Xu, and Wojciech Matusik. Diffcloth: Differentiable cloth simulation with dry frictional contact. ACM Transactions on Graphics (TOG), 42(1):1–20, 2022. 2

  60. [60]

    3d neural scene representations for visuomotor control

    Yunzhu Li, Shuang Li, Vincent Sitzmann, Pulkit Agrawal, and Antonio Torralba. 3d neural scene representations for visuomotor control. In Conference on Robot Learning, pages 112–123. PMLR, 2022. 3

  61. [61]

    Learning preconditioners for conjugate gradient pde solvers

    Yichen Li, Peter Yichen Chen, Tao Du, and Wojciech Matusik. Learning preconditioners for conjugate gradient pde solvers. In International Conference on Machine Learning, pages 19425–19439. PMLR, 2023. 3

  62. [62]

    Diffavatar: Simulation-ready garment optimization with differentiable simulation

    Yifei Li, Hsiao-yu Chen, Egor Larionov, Nikolaos Sarafianos, Wojciech Matusik, and Tuur Stuyck. Diffavatar: Simulation-ready garment optimization with differentiable simulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4368–4378, 2024. 2

  63. [63]

    Self-supervised learning of latent space dynamics

    Yue Li, Gene Wei-Chin Lin, Egor Larionov, Aljaz Bozic, Doug Roble, Ladislav Kavan, Stelian Coros, Bernhard Thomaszewski, Tuur Stuyck, and Hsiao-Yu Chen. Self-supervised learning of latent space dynamics. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 8(4):1–18, 2025. 3

  64. [64]

    Fourier Neural Operator for Parametric Partial Differential Equations

    Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020. 3

  65. [65]

    Wonderplay: Dynamic 3d scene generation from a single image and actions

    Zizhang Li, Hong-Xing Yu, Wei Liu, Yin Yang, Charles Herrmann, Gordon Wetzstein, and Jiajun Wu. Wonderplay: Dynamic 3d scene generation from a single image and actions. arXiv preprint arXiv:2505.18151, 2025. 2

  66. [66]

    Differentiable cloth simulation for inverse problems

    Junbang Liang, Ming Lin, and Vladlen Koltun. Differentiable cloth simulation for inverse problems. Advances in Neural Information Processing Systems, 32, 2019. 2

  67. [67]

    Phys4dgen: Physics-compliant 4d generation with multi-material composition perception

    Jiajing Lin, Zhenzhong Wang, Dejun Xu, Shu Jiang, Yun-Peng Gong, and Min Jiang. Phys4dgen: Physics-compliant 4d generation with multi-material composition perception. arXiv preprint arXiv:2411.16800, 2024. 2

  68. [68]

    VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization

    Jiajing Lin, Shu Jiang, Qingyuan Zeng, Zhenzhong Wang, and Min Jiang. Visionlaw: Inferring interpretable intrinsic dynamics from visual observations via bilevel optimization. arXiv preprint arXiv:2508.13792, 2025

  69. [69]

    Omniphysgs: 3d constitutive gaussians for general physics-based dynamics generation

    Yuchen Lin, Chenguo Lin, Jianjin Xu, and Yadong Mu. Omniphysgs: 3d constitutive gaussians for general physics-based dynamics generation. arXiv preprint arXiv:2501.18982, 2025. 2, 8

  70. [70]

    Paris: Part-level reconstruction and motion analysis for articulated objects

    Jiayi Liu, Ali Mahdavi-Amiri, and Manolis Savva. Paris: Part-level reconstruction and motion analysis for articulated objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 352–363, 2023. 2

  71. [71]

    Singapo: Single image controlled generation of articulated parts in objects

    Jiayi Liu, Denys Iliash, Angel X Chang, Manolis Savva, and Ali Mahdavi-Amiri. Singapo: Single image controlled generation of articulated parts in objects. arXiv preprint arXiv:2410.16499, 2024. 2, 6, 8

  72. [72]

    Differentiable robot rendering

    Ruoshi Liu, Alper Canberk, Shuran Song, and Carl Vondrick. Differentiable robot rendering. arXiv preprint arXiv:2410.13851, 2024. 3

  73. [73]

    Physgen: Rigid-body physics-grounded image-to-video generation

    Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, and Shenlong Wang. Physgen: Rigid-body physics-grounded image-to-video generation. In European Conference on Computer Vision, pages 360–378. Springer, 2024. 2

  74. [74]

    Smpl: a skinned multi-person linear model

    Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. Smpl: a skinned multi-person linear model. ACM Transactions on Graphics (TOG), 34(6):1–16, 2015. 3

  75. [75]

    Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning

    Michael Lutter, Christian Ritter, and Jan Peters. Deep lagrangian networks: Using physics as model prior for deep learning. arXiv preprint arXiv:1907.04490, 2019. 3

  76. [76]

    Learning neural constitutive laws from motion observations for generalizable pde dynamics

    Pingchuan Ma, Peter Yichen Chen, Bolei Deng, Joshua B Tenenbaum, Tao Du, Chuang Gan, and Wojciech Matusik. Learning neural constitutive laws from motion observations for generalizable pde dynamics. In International Conference on Machine Learning, pages 23279–23300. PMLR,

  77. [77]

    Xpbd: position-based simulation of compliant constrained dynamics

    Miles Macklin, Matthias Müller, and Nuttapong Chentanez. Xpbd: position-based simulation of compliant constrained dynamics. In Proceedings of the 9th International Conference on Motion in Games, pages 49–54, 2016. 2

  78. [78]

    Explorable mesh deformation subspaces from unstructured 3d generative models

    Arman Maesumi, Paul Guerrero, Noam Aigerman, Vladimir Kim, Matthew Fisher, Siddhartha Chaudhuri, and Daniel Ritchie. Explorable mesh deformation subspaces from unstructured 3d generative models. In SIGGRAPH Asia 2023 Conference Papers, pages 1–11, 2023. 3

  79. [79]

    Real2code: Reconstruct articulated objects via code generation

    Zhao Mandi, Yijia Weng, Dominik Bauer, and Shuran Song. Real2code: Reconstruct articulated objects via code generation. arXiv preprint arXiv:2406.08474, 2024. 2

  80. [80]

    Dexmachina: Functional retargeting for bimanual dexterous manipulation

    Zhao Mandi, Yifan Hou, Dieter Fox, Yashraj Narang, Ajay Mandlekar, and Shuran Song. Dexmachina: Functional retargeting for bimanual dexterous manipulation. arXiv preprint arXiv:2505.24853, 2025. 2

Showing first 80 references.