arxiv: 2604.16843 · v1 · submitted 2026-04-18 · 💻 cs.CE

Recognition: unknown

Watching Physics: the Generative Science of Matter and Motion

Ellen Kuhl, Hagen Holthusen, Kevin Linka


Pith reviewed 2026-05-10 07:28 UTC · model grok-4.3

classification 💻 cs.CE

keywords generative video models · deformation mechanics · physics simulation · kinematics · scientific inference · strain recovery · matter in motion


The pith

Generative models recover measurable physical quantities like surface strain from video when the underlying mechanics appear directly in the visible motion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether generative video models can extract reliable physics of matter in motion from images alone. It combines visual data with targeted experiments and high-fidelity simulations across three deformation systems that grow in complexity. Success occurs only when the relevant physics is encoded in observable kinematics; otherwise visual realism decouples from physical correctness. The work therefore frames a unified approach that turns image generation into a tool for inference and design rather than mere appearance matching.

Core claim

Using deformation mechanics as a testbed, we study rubber compression, can crushing, and cardiac motion to identify regimes in which visual learning succeeds, fails, and requires mechanistic supervision. When physics manifests in visible kinematics, generative models recover measurable quantities such as surface strain; when internal state variables dominate, visual plausibility no longer ensures physical admissibility. This convergence defines the Generative Sciences of Matter and Motion, which unifies Simulogenics, Physiogenics, and Materiogenics as physics-grounded foundation models for inference, prediction, and design.

What carries the argument

The regime distinction between visible kinematics (where strain and motion directly encode physics) and hidden internal state variables (where visual output can be plausible yet inadmissible), enforced by coupling video data to experiments and simulations.
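The abstract does not say how strain is read off the visible motion. A common kinematic route, used for example in digital image correlation, differentiates a tracked displacement field u(X) to get the deformation gradient F = I + grad u and then the Green-Lagrange strain E = 0.5 (F^T F - I). The NumPy sketch below assumes a 2D displacement field already sampled on a regular reference grid (from DIC or from frames of a generated video); the function name and grid layout are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def green_lagrange_strain(ux, uy, dx, dy):
    """Green-Lagrange strain E = 0.5 * (F^T F - I) of a 2D displacement field.

    ux, uy : arrays of shape (ny, nx) on a regular reference grid
             (axis 0 follows y, axis 1 follows x).
    dx, dy : grid spacing in x and y.
    Returns an array of shape (ny, nx, 2, 2).
    """
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x).
    dux_dy, dux_dx = np.gradient(ux, dy, dx)
    duy_dy, duy_dx = np.gradient(uy, dy, dx)

    # Deformation gradient F = I + grad(u), assembled point by point.
    F = np.empty(ux.shape + (2, 2))
    F[..., 0, 0] = 1.0 + dux_dx
    F[..., 0, 1] = dux_dy
    F[..., 1, 0] = duy_dx
    F[..., 1, 1] = 1.0 + duy_dy

    # Right Cauchy-Green tensor C = F^T F.
    C = np.einsum("...ki,...kj->...ij", F, F)
    return 0.5 * (C - np.eye(2))


# Sanity check: uniform 10% stretch along x, no displacement along y.
x = np.linspace(0.0, 10.0, 21)
y = np.linspace(0.0, 5.0, 11)
X, Y = np.meshgrid(x, y)  # default "xy" indexing: shape (len(y), len(x))
E = green_lagrange_strain(0.1 * X, np.zeros_like(X), x[1] - x[0], y[1] - y[0])
print(E[5, 10, 0, 0])  # approximately 0.105
```

For a uniform 10% stretch, E_xx = 0.5 (1.1^2 - 1) = 0.105 at every point while the transverse and shear components vanish; because np.gradient is exact for linear fields, this makes a convenient check before applying the function to measured or generated motion.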

If this is right

Generative models become usable for quantitative inference of strains and dynamics directly from video when the physics is kinematically visible.
In systems dominated by internal variables, additional mechanistic supervision is required to restore physical admissibility.
The three proposed subfields (Simulogenics, Physiogenics, Materiogenics) provide a common framework for turning visual generation into a scientific instrument.
Visual generation can support design loops once it is constrained to produce only admissible physical states.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same visibility criterion may apply to other time-evolving physical systems where video is abundant but internal fields are not directly observed.
A practical test would be to replace the current deformation examples with a new class of motion, such as fluid free-surface flow, and measure whether the success threshold still tracks kinematic visibility.
If the regime distinction holds, it supplies a diagnostic for deciding when purely data-driven video models can be trusted versus when hybrid physics-informed architectures are mandatory.

Load-bearing premise

That coupling visual data with experiments and high-fidelity simulations will make generative outputs physically admissible enough to support scientific inference, prediction, and design.

What would settle it

Training a generative model only on video of cardiac motion and then checking whether its predicted surface strains match independent experimental measurements or high-fidelity finite-element results within measurement error.
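The settling test can be phrased as a pointwise acceptance check. This is a minimal sketch, assuming predicted and reference strains have already been resampled to the same material points and that a measurement uncertainty sigma is known; the band width k, the required pass fraction, and the function name are illustrative choices, not criteria stated in the paper.

```python
import numpy as np


def agrees_within_error(predicted, reference, sigma, k=2.0, min_fraction=0.95):
    """Hypothetical acceptance test for predicted surface strains.

    A prediction "agrees" if at least `min_fraction` of the points lie inside
    a band of +/- k * sigma around the reference, where sigma is the
    measurement uncertainty (scalar or per-point array).
    Returns (passed, fraction_inside).
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), reference.shape)

    inside = np.abs(predicted - reference) <= k * sigma
    fraction = float(inside.mean())
    return fraction >= min_fraction, fraction


# Made-up numbers for illustration only (not data from the paper).
reference = np.array([0.010, 0.020, 0.030, 0.040])  # e.g. DIC or FE strains
predicted = np.array([0.011, 0.019, 0.032, 0.047])  # strains from the generative model
passed, fraction = agrees_within_error(predicted, reference, sigma=0.002)
print(passed, fraction)  # False 0.75
```

A pointwise band, rather than a single averaged error, keeps a localized mismatch (for example near a fold or a region of high curvature) from being averaged away.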

Original abstract

Can we learn the physics of matter in motion directly from images and video, and trust it? Answering this question requires integrating experiments, physics-based simulation, and data across traditionally separate disciplines. Much of this knowledge is visual and temporal rather than textual: images and videos encode structure, dynamics, and causality that equations alone cannot fully capture. Recent generative models produce compelling visual content, yet they rely on observational data and often lack physical validity. Here we show that generative video models gain scientific value when they couple visual data with experiments and high-fidelity simulations. Using deformation mechanics as a testbed, we study three systems of increasing complexity (rubber compression, can crushing, and cardiac motion) and identify regimes in which visual learning succeeds, fails, and requires mechanistic supervision. When physics manifests in visible kinematics, generative models recover measurable quantities such as surface strain; when internal state variables dominate, visual plausibility no longer ensures physical admissibility. We propose that this convergence defines a new frontier, the Generative Sciences of Matter and Motion, which unifies Simulogenics, Physiogenics, and Materiogenics. These physics-grounded foundation models can turn visual generation into a scientific instrument for inference, prediction, and design of matter in motion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Generative video models can recover visible surface quantities like strain in mechanics examples but lose physical admissibility when internal states dominate, and the paper uses three cases to flag where coupling to simulations and experiments helps.


This paper's main takeaway is that generative models pick up kinematics from video in deformation problems when the physics shows on the surface, recovering things like strain, but they produce plausible yet inadmissible results once internal variables take over. The three systems—rubber compression, can crushing, and cardiac motion—serve as concrete illustrations of that split and the need for mechanistic supervision to keep outputs usable for science.

Referee Report

3 major / 2 minor

Summary. The paper claims that generative video models acquire scientific utility for the physics of matter in motion when visual data are coupled to experiments and high-fidelity simulations. Using three deformation systems of increasing complexity (rubber compression, can crushing, cardiac motion) as testbeds, it identifies regimes in which visible kinematics permit recovery of measurable quantities such as surface strain, while internal-state-dominated regimes render visual plausibility insufficient to guarantee physical admissibility. The manuscript proposes that this convergence defines a new interdisciplinary frontier—the Generative Sciences of Matter and Motion—unifying Simulogenics, Physiogenics, and Materiogenics as physics-grounded foundation models for inference, prediction, and design.

Significance. If the case studies demonstrate reliable recovery of physical quantities and a reproducible distinction between admissible and merely plausible outputs, the work could establish a practical framework for trustworthy generative models in mechanics and biomechanics. The explicit grounding in both experimental data and high-fidelity simulations is a clear strength, as is the regime-based analysis that supplies falsifiable criteria for when visual learning suffices for scientific use. These elements could influence the development of foundation models that support design tasks in material science and cardiac mechanics.

major comments (3)

[Abstract] Abstract: the central claim that 'generative models recover measurable quantities such as surface strain' when physics manifests in visible kinematics is load-bearing for the regime distinction, yet the abstract supplies no quantitative metrics, error analysis, or comparison to ground-truth strain fields from the cited experiments or simulations.
[Abstract] Abstract / proposal section: the unification claim rests on the newly introduced terms Simulogenics, Physiogenics, and Materiogenics, but these are presented without explicit definitions, scope boundaries, or differentiation from existing physics-informed generative modeling approaches, rendering the 'new frontier' assertion difficult to evaluate.
[Cardiac motion case study] Cardiac motion case study: the assertion that internal state variables cause visual plausibility to fail to ensure admissibility is central to the failure-regime identification, yet no concrete failure examples, comparison against high-fidelity simulation outputs, or quantitative admissibility metric is referenced.

minor comments (2)

[Abstract] The long sentence beginning 'Here we show that generative video models gain scientific value...' could be split for readability.
[Throughout] Consider adding a short table or diagram that maps the three systems to the success/failure regimes and the required level of mechanistic supervision.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive review and for identifying specific areas where the manuscript can be clarified and strengthened. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'generative models recover measurable quantities such as surface strain' when physics manifests in visible kinematics is load-bearing for the regime distinction, yet the abstract supplies no quantitative metrics, error analysis, or comparison to ground-truth strain fields from the cited experiments or simulations.

Authors: We agree that the abstract should include quantitative support for this claim to make the regime distinction immediately evaluable. In the revised version we will incorporate specific metrics from the rubber compression and can crushing studies, including mean absolute error in surface strain recovery relative to digital image correlation measurements and finite-element ground truth, along with brief error analysis and direct comparisons. revision: yes
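The promised metric is simple to state. This is a minimal sketch, assuming the generated, DIC, and finite-element strain fields have been interpolated onto a common set of points; scoring against DIC and FE separately keeps their different error sources (image noise and decorrelation versus modeling assumptions) visible. The function name and the numbers are hypothetical.

```python
import numpy as np


def strain_mae(predicted, reference, mask=None):
    """Mean absolute error between two strain fields of the same shape.

    NaN entries in `reference` (e.g. DIC points lost to speckle
    decorrelation) are skipped; `mask` optionally restricts the comparison
    to a region of interest.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    valid = ~np.isnan(reference)
    if mask is not None:
        valid &= np.asarray(mask, dtype=bool)
    return float(np.mean(np.abs(predicted[valid] - reference[valid])))


# Illustrative values only: one generated field scored against two references.
predicted = np.array([0.10, 0.12, 0.15])
dic = np.array([0.11, np.nan, 0.14])  # DIC measurement with one missing point
fem = np.array([0.10, 0.13, 0.15])    # finite-element ground truth
print(strain_mae(predicted, dic), strain_mae(predicted, fem))
```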
Referee: [Abstract] Abstract / proposal section: the unification claim rests on the newly introduced terms Simulogenics, Physiogenics, and Materiogenics, but these are presented without explicit definitions, scope boundaries, or differentiation from existing physics-informed generative modeling approaches, rendering the 'new frontier' assertion difficult to evaluate.

Authors: We accept that the new terminology requires explicit grounding. We will add concise definitions and scope statements for each term in the proposal section and will differentiate them from prior physics-informed generative methods (e.g., those enforcing PDE residuals or conservation laws) to clarify the distinct contribution of coupling visual generation with experimental and simulation data. revision: yes
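For contrast, the "PDE residual" family mentioned in this response typically adds a penalty on the violation of a governing equation to a data-fitting loss. The sketch below only evaluates such a composite loss for a toy 1D elastic bar; the governing equation, the weight lam, and all names are assumptions chosen for illustration, and this is not the coupling scheme the paper proposes.

```python
import numpy as np


def physics_informed_loss(u, x, u_obs, obs_idx, E, b, lam=1.0):
    """Composite loss = data misfit + lam * mean squared PDE residual.

    Governing equation (assumed for illustration): static 1D bar with
    constant stiffness E and uniform body load b, i.e. E * u'' + b = 0.
    u is sampled on the uniform grid x; u_obs are observations at obs_idx.
    """
    h = x[1] - x[0]
    data_term = np.mean((u[obs_idx] - u_obs) ** 2)

    # Second-order central difference for u'' on interior nodes.
    u_xx = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / h**2
    residual = E * u_xx + b
    physics_term = np.mean(residual**2)
    return float(data_term + lam * physics_term)


x = np.linspace(0.0, 1.0, 51)
u_exact = x - x**2            # solves u'' = -2, i.e. E = 1, b = 2, u(0) = u(1) = 0
obs_idx = np.array([10, 25, 40])
u_obs = u_exact[obs_idx]      # noise-free synthetic observations

loss_exact = physics_informed_loss(u_exact, x, u_obs, obs_idx, E=1.0, b=2.0)
u_wrong = u_exact + 0.01 * np.sin(np.pi * x)  # plausible-looking, not in equilibrium
loss_wrong = physics_informed_loss(u_wrong, x, u_obs, obs_idx, E=1.0, b=2.0)
print(loss_exact, loss_wrong)
```

The perturbed field stays close to the observations, but its residual term dominates the loss; that equilibrium signal is exactly what a purely visual objective lacks.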
Referee: [Cardiac motion case study] Cardiac motion case study: the assertion that internal state variables cause visual plausibility to fail to ensure admissibility is central to the failure-regime identification, yet no concrete failure examples, comparison against high-fidelity simulation outputs, or quantitative admissibility metric is referenced.

Authors: This observation is correct and points to a needed strengthening of the cardiac case. We will revise the section to present concrete examples of visually plausible yet inadmissible outputs (e.g., violations of myocardial incompressibility), direct side-by-side comparisons with high-fidelity electromechanical simulation results, and quantitative admissibility metrics such as divergence error norms and strain-energy deviations to substantiate the failure regime. revision: yes
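The three admissibility metrics named in this response can be written down directly. This is a minimal NumPy sketch, assuming generated and simulated motions are available as deformation-gradient fields F of shape (..., 3, 3) or as velocity fields on a uniform grid. The response does not specify a constitutive law, so an isotropic incompressible neo-Hookean energy stands in here as an assumption (myocardium is usually modeled with anisotropic, fiber-reinforced laws), and all function names are hypothetical.

```python
import numpy as np


def incompressibility_error(F):
    """Mean |det F - 1| over deformation gradients of shape (..., 3, 3)."""
    J = np.linalg.det(F)
    return float(np.mean(np.abs(J - 1.0)))


def divergence_rms(vx, vy, vz, h):
    """RMS of div v for a 3D velocity field on a uniform grid with spacing h.

    Arrays are indexed (x, y, z); a volume-preserving motion has div v = 0.
    """
    div = (np.gradient(vx, h, axis=0)
           + np.gradient(vy, h, axis=1)
           + np.gradient(vz, h, axis=2))
    return float(np.sqrt(np.mean(div**2)))


def neo_hookean_energy(F, mu):
    """W = mu / 2 * (I1 - 3) with I1 = tr(F^T F); an assumed isotropic stand-in."""
    I1 = np.einsum("...ij,...ij->...", F, F)
    return 0.5 * mu * (I1 - 3.0)


def relative_energy_deviation(F_gen, F_ref, mu):
    """|W_gen - W_ref| / |W_ref|, with energies summed over all points."""
    W_gen = neo_hookean_energy(F_gen, mu).sum()
    W_ref = neo_hookean_energy(F_ref, mu).sum()
    return float(abs(W_gen - W_ref) / abs(W_ref))


# Toy check: isochoric uniaxial stretch (reference) versus a "generated"
# field that stretches along x but fails to contract laterally.
stretch = 1.2
F_ref = np.tile(np.diag([stretch, stretch**-0.5, stretch**-0.5]), (5, 1, 1))
F_gen = np.tile(np.diag([stretch, 1.0, 1.0]), (5, 1, 1))
print(incompressibility_error(F_ref), incompressibility_error(F_gen))
print(relative_energy_deviation(F_gen, F_ref, mu=1.0))

g = np.linspace(0.0, 1.0, 11)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
print(divergence_rms(X, -0.5 * Y, -0.5 * Z, h=g[1] - g[0]))  # isochoric flow
```

The "generated" field in the toy check looks like a correct stretch along x, yet it violates incompressibility by 20% and more than quadruples the stored energy relative to the isochoric reference.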

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper's derivation rests on empirical case studies of three deformation systems (rubber compression, can crushing, cardiac motion) that distinguish visible-kinematics regimes from internal-state-dominated regimes. These distinctions are drawn directly from coupling visual data to experiments and high-fidelity simulations as external ground truth, without any reduction of outputs to inputs by construction, fitted parameters renamed as predictions, or load-bearing self-citations. The proposal of unifying terms (Simulogenics, Physiogenics, Materiogenics) is a naming convention for the suggested frontier rather than a self-definitional loop in which a claimed result is presupposed by its own definition. No equations or uniqueness theorems are invoked that collapse the central claim into the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper introduces new unifying terminology and assumes that visible kinematics plus simulation coupling suffice for physical validity without providing the supporting derivation or data.

axioms (1)

domain assumption: Visual data from images and video encodes sufficient kinematic information to recover physical quantities when internal state variables do not dominate.
Invoked to distinguish success and failure regimes in the three test systems.
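The limiting clause of this assumption can be made concrete with a textbook elastoplastic bar, which is not one of the paper's case studies. In this sketch (the parameter values E = 100 and sigma_y = 1 and the function name are arbitrary assumptions), two loading histories end at the same total strain, so an observer of the final configuration sees identical kinematics, yet the hidden plastic strain makes the final stresses equal in magnitude and opposite in sign.

```python
import numpy as np


def final_stress(strain_path, E=100.0, sigma_y=1.0):
    """Strain-driven 1D elastic-perfectly-plastic bar (return mapping).

    The plastic strain eps_p is an internal state variable: it changes the
    stress but cannot be read off the final total strain.
    Returns (stress, plastic strain) after the last increment.
    """
    eps_p = 0.0
    sigma = 0.0
    for eps in strain_path:
        sigma_trial = E * (eps - eps_p)
        f = abs(sigma_trial) - sigma_y        # yield function
        if f > 0.0:                           # plastic step: project back to yield surface
            dgamma = f / E
            direction = np.sign(sigma_trial)
            eps_p += dgamma * direction
            sigma = sigma_trial - E * dgamma * direction
        else:                                 # elastic step
            sigma = sigma_trial
    return float(sigma), float(eps_p)


# Two loading histories that end at the same total strain of 0.005.
path_a = np.linspace(0.0, 0.005, 51)                      # monotonic, stays elastic
path_b = np.concatenate([np.linspace(0.0, 0.02, 201),     # load past yield (0.01)
                         np.linspace(0.02, 0.005, 151)])  # then unload
print(final_stress(path_a))
print(final_stress(path_b))
```

Path A ends at a tensile stress of about 0.5 with no plastic strain; path B ends at about -0.5 with a plastic strain near 0.01. A model that sees only the final displacement field cannot tell these states apart without extra information about the loading history.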

invented entities (1)

Generative Sciences of Matter and Motion (no independent evidence)
purpose: Unifying framework encompassing Simulogenics, Physiogenics, and Materiogenics
New umbrella term proposed to describe the convergence of generative models with physics.

pith-pipeline@v0.9.0 · 5521 in / 1282 out tokens · 39022 ms · 2026-05-10T07:28:04.197248+00:00 · methodology


Reference graph

Works this paper leans on

44 extracted references · 4 canonical work pages · 2 internal anchors

[1] Ashby, M. F., 1996. Materials Selection in Mechanical Design. Butterworth-Heinemann.
[2] Assran, M., Bardes, A., Fan, D., Garrido, Q., Howes, R., Komeili, M., Muckley, M., Rizvi, A., Roberts, C., Sinha, K., Zholus, A., Arnaud, S., Gejji, A., Martin, A., Hogan, F. R., Dugas, D., Bojanowski, P., Khalidov, V., Labatut, P., Massa, F., Szafraniec, M., Krishnakumar, K., Li, Y., Ma, X., Chandar, S., Meier, F., LeCun, Y., Rabbat, M., Ballas, N., 2025. V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. arXiv preprint.
[3] Baillargeon, B., Rebelo, N., Fox, D. D., Taylor, R. L., Kuhl, E., 2014. The Living Heart Project: A robust and integrative simulator for human heart function. European Journal of Mechanics A/Solids, 48, 38–47.
[4] Bendsøe, M. P., Sigmund, O., 2003. Topology Optimization: Theory, Methods, and Applications. Springer.
[5] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, ..., 2020.
[6] Coulomb, C. A., 1785. Théorie des machines simples. Mémoires de l'Académie Royale des Sciences.
[7] Feynman, R. P., 1948. Space-time approach to non-relativistic quantum mechanics. Reviews of Modern Physics, 20, 367–387.
[8] Gibson, J. J., 1979. The Ecological Approach to Visual Perception. Houghton Mifflin.
[9] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets. Advances in Neural Information Processing Systems (NeurIPS), 27.
[10] Grieves, M., 2014. Digital Twin: Manufacturing Excellence through Virtual Factory Replication. White paper, Florida Institute of Technology.
[11] Ha, D., Schmidhuber, J., 2018. Recurrent world models facilitate policy evolution. Advances in Neural Information Processing Systems (NeurIPS), 31.
[12] Hestenes, D., 1973. The Hamiltonian formulation of classical mechanics. American Journal of Physics, 41, 905–914.
[13] Hirst, J., Sibson, L., Tayeb, A., Poole, B., Sampson, M., Bielajewa, W., Atkinson, M., Marsh, A., Spencer, R., Hamill, R., Hamelin, C., Harte, A., Fletcher, L., 2026. PYVALE: A Fast, Scalable, Open-Source 2D Digital Image Correlation (DIC) Engine Capable of Handling Gigapixel Images. arXiv preprint arXiv:2601.12941.
[14] Ho, J., Jain, A., Abbeel, P., 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems (NeurIPS), 33.
[15] Ho, J., Salimans, T., Gritsenko, A., Chan, W., Dhariwal, P., Chen, M., Sutskever, I., 2022. Video diffusion models. arXiv preprint arXiv:2204.03458.
[16] Hooke, R., 1678. Lectures de Potentia Restitutiva. Royal Society, London.
[17] Kovachki, N. B., Li, Z., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A. M., Anandkumar, A., 2023. Neural operator: Learning maps between function spaces. Journal of Machine Learning Research, 24(89), 1–97.
[18] Kolmogorov, A. N., 1941. The local structure of turbulence in incompressible viscous fluid. Doklady Akademii Nauk SSSR, 30, 301–305.
[19] LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature, 521, 436–444.
[20] Liu, S., Ren, Z., Gupta, S., Wang, S., 2024. PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation. arXiv preprint arXiv:2409.18964.
[21] Marr, D., 1982. Vision. MIT Press.
[22] McCarthy, J., Minsky, M., Rochester, N., Shannon, C., 1955. A proposal for the Dartmouth summer research project on artificial intelligence. Dartmouth College.
[23–24] Moor, M., Banerjee, O., Abad, Z. S. H., Krumholz, H. M., Leskovec, J., Topol, E. J., Rajpurkar, P., 2023. Foundation models for generalist medical artificial intelligence. Nature, 616, 259–265.
[25] Moore, G. E., 1965. Cramming more components onto integrated circuits. Electronics, 38, 114–117.
[26] Navier, C. L. M. H., 1822. Mémoire sur les lois du mouvement des fluides. Mémoires de l'Académie Royale des Sciences de l'Institut de France.
[27] Newton, I., 1687. Philosophiæ Naturalis Principia Mathematica. Royal Society, London.
[28–29] Peirlinck, M., Sahli Costabal, F., Yao, J., Guccione, J. M., Tripathy, S., Wang, Y., Ozturk, D., Segars, P., Morrison, T. M., Levine, S., Kuhl, E., 2021. Precision medicine in human heart modeling. Perspectives, challenges and opportunities. Biomechanics and Modeling in Mechanobiology, 20, 803–831.
[30] Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I., 2021. Learning transferable visual models from natural language supervision. Proceedings of the 38th International Conference on Machine Learning (ICML), 139, 8748–8763.
[31–32] Saleh, M., Luzin, V., Toppler, K., Kabir, K., ... Response of thin-skinned sandwich panels to contact loading with flat-ended cylindrical punches: Experiments, numerical simulations and neutron diffraction measurements. Composites Part B: Engineering, 78, 415–430.
[33] Schrödinger, E., 1944. What is Life? Cambridge University Press.
[34] Shannon, C. E., 1948. A mathematical theory of communication. Bell System Technical Journal, 27, 379–423.
[35] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S., 2015. Deep unsupervised learning using nonequilibrium thermodynamics. Proceedings of the 32nd International Conference on Machine Learning (ICML).
[36] Steinmann, P., Hossain, M., Possart, G., 2012. Hyperelastic models for rubber-like materials: consistent tangent operators and suitability for Treloar's data. Archive of Applied Mechanics, 82, 1183–1217.
[37] Stokes, G. G., 1845. On the theories of the internal friction of fluids. Transactions of the Cambridge Philosophical Society.
[38] Treloar, L. R. G., 1944. Stress–strain data for vulcanised rubber under various types of deformation. Transactions of the Faraday Society, 40, 59–70.
[39] Turing, A. M., 1950. Computing machinery and intelligence. Mind, 59, 433–460.
[40–41] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., Polosukhin, I., 2017. Attention is all you need. Advances in Neural Information Processing Systems (NeurIPS), 30.
[42] von Neumann, J., 1945. First draft of a report on the EDVAC. University of Pennsylvania.
[43] Zienkiewicz, O. C., 1977. The Finite Element Method. McGraw-Hill.
9 Appendix: Figure 5 provides additional details about the region of interest for digital image correlation and the dimensions for the finite element simulation of the compression of a rubber block, and Table 1 summarizes details about the finite element simulation. Figure 6 defines the she...
Fig. 7 Exploratory Study II: Structural collapse of crushed can. The reaction force–time curves from both experiment (blue, "Lab") and simulation (orange, "FEM") exhibit the characteristic instability-driven force drops and oscillations during progressive buckling and fold formation.
Table 1 Exploratory Study I: Material model, loading and numerical par...