PhysInOne: Visual Physics Learning and Reasoning in One Suite
Pith reviewed 2026-05-10 16:36 UTC · model grok-4.3
The pith
PhysInOne supplies 2 million annotated videos of 153,810 scenes covering 71 physical phenomena to train AI world models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PhysInOne provides 2 million videos across 153,810 dynamic 3D scenes covering 71 basic physical phenomena in mechanics, optics, fluid dynamics, and magnetism, with comprehensive ground-truth annotations including 3D geometry, semantics, dynamic motion, physical properties, and text descriptions. Fine-tuning foundation models on PhysInOne significantly enhances physical plausibility in physics-aware video generation, long- and short-term future frame prediction, physical property estimation, and motion transfer, while exposing critical gaps in modeling complex physical dynamics and estimating intrinsic properties.
What carries the argument
The PhysInOne synthetic dataset, consisting of multi-object 3D scenes rendered as videos with dense physical ground-truth labels.
If this is right
- Fine-tuned models generate videos with greater adherence to physical laws than models trained on smaller datasets.
- Long- and short-term future frame prediction improves in accuracy for multi-object interactions.
- Physical property estimation tasks, such as inferring mass or elasticity, become more reliable.
- Motion transfer between objects succeeds more often while preserving physical constraints.
- The dataset serves as a new benchmark scale for evaluating physics-grounded generation and simulation models.
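The prediction bullets above can be made operational with a horizon-bucketed error metric. A minimal sketch, assuming frame tensors of shape (T, H, W, C) and an 8-frame short/long split; both the split and the toy data are illustrative assumptions, not values from the paper:

```python
import numpy as np

def horizon_mse(pred, target, split=8):
    """Per-horizon MSE for future-frame prediction.

    pred, target: arrays of shape (T, H, W, C) holding predicted and
    ground-truth future frames. `split` divides the horizon into a
    short-term and a long-term bucket (the 8-frame split is an
    assumption for illustration).
    """
    per_frame = ((pred - target) ** 2).mean(axis=(1, 2, 3))
    return {
        "short_term_mse": float(per_frame[:split].mean()),
        "long_term_mse": float(per_frame[split:].mean()),
    }

# Toy check: a predictor whose error grows over time shows the split.
rng = np.random.default_rng(0)
target = rng.random((16, 32, 32, 3))
noise = np.linspace(0.0, 0.5, 16)[:, None, None, None]
pred = target + noise * rng.standard_normal(target.shape)
scores = horizon_mse(pred, target)
```

On the toy data the long-term bucket scores worse, which is the qualitative pattern the bullet predicts fine-tuning should improve.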
Where Pith is reading between the lines
- The exposed gaps suggest that dataset scale alone may not suffice and could motivate hybrid training with real captured data or explicit physics modules.
- Success in simulation could speed development of embodied AI agents that plan actions using learned physical priors before real-world deployment.
- The annotation richness might support new self-supervised objectives that combine vision with language descriptions of physical rules.
- Extending the same generation pipeline to additional phenomena or higher-fidelity rendering could test whether current limits are data-size or representation issues.
Load-bearing premise
Training on these simulated scenes will produce AI improvements that generalize to real-world physical reasoning without major mismatches from unstated simulation artifacts.
What would settle it
Measuring whether models fine-tuned on PhysInOne achieve measurably higher physical plausibility scores than baselines when tested on real-world videos of the same 71 phenomena.
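The decisive measurement described here is a paired comparison on real footage. A minimal sketch, assuming a scalar plausibility score in [0, 1] per clip; the scorer itself (human rating or learned metric) is outside this sketch and the numbers are invented:

```python
from statistics import mean

def paired_plausibility_gain(scores_finetuned, scores_baseline):
    """Paired comparison of per-clip physical-plausibility scores for a
    PhysInOne-finetuned model vs. a baseline on the SAME real-world
    clips. Score source and scale are assumptions for illustration."""
    assert len(scores_finetuned) == len(scores_baseline)
    diffs = [f - b for f, b in zip(scores_finetuned, scores_baseline)]
    wins = sum(d > 0 for d in diffs)
    return {"mean_gain": mean(diffs), "win_rate": wins / len(diffs)}

# Illustrative numbers only: four real clips scored for both models.
result = paired_plausibility_gain([0.8, 0.7, 0.9, 0.6], [0.6, 0.7, 0.5, 0.7])
```

Scoring identical clips for both models is what would settle the question: pairing removes clip-difficulty variance from the gain estimate.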
Original abstract

We present PhysInOne, a large-scale synthetic dataset addressing the critical scarcity of physically-grounded training data for AI systems. Unlike existing datasets limited to merely hundreds or thousands of examples, PhysInOne provides 2 million videos across 153,810 dynamic 3D scenes, covering 71 basic physical phenomena in mechanics, optics, fluid dynamics, and magnetism. Distinct from previous works, our scenes feature multiobject interactions against complex backgrounds, with comprehensive ground-truth annotations including 3D geometry, semantics, dynamic motion, physical properties, and text descriptions. We demonstrate PhysInOne's efficacy across four emerging applications: physics-aware video generation, long-/short-term future frame prediction, physical property estimation, and motion transfer. Experiments show that fine-tuning foundation models on PhysInOne significantly enhances physical plausibility, while also exposing critical gaps in modeling complex physical dynamics and estimating intrinsic properties. As the largest dataset of its kind, orders of magnitude beyond prior works, PhysInOne establishes a new benchmark for advancing physics-grounded world models in generation, simulation, and embodied AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PhysInOne, a synthetic dataset comprising 2 million videos from 153,810 dynamic 3D scenes that cover 71 physical phenomena across mechanics, optics, fluid dynamics, and magnetism. Scenes include multi-object interactions with complex backgrounds and rich ground-truth annotations (3D geometry, semantics, motion, physical properties, text). The authors evaluate the dataset on four tasks—physics-aware video generation, short-/long-term future prediction, physical property estimation, and motion transfer—claiming that fine-tuning foundation models on PhysInOne yields significant gains in physical plausibility while revealing gaps in complex dynamics and intrinsic-property estimation.
Significance. If the simulator faithfully reproduces the targeted phenomena and the reported gains transfer beyond the synthetic distribution, PhysInOne would constitute a substantial resource: its scale (orders of magnitude larger than prior physics datasets) and breadth of annotated phenomena could accelerate development of physics-grounded world models for generation, simulation, and embodied AI. The comprehensive annotation suite and multi-application evaluation are strengths.
major comments (3)
- [Experiments] Experiments section: the abstract and results claim that fine-tuning on PhysInOne 'significantly enhances physical plausibility' and 'exposes critical gaps,' yet no quantitative metrics, baseline comparisons, error bars, or statistical tests are provided to support these statements. This absence makes it impossible to assess the magnitude or reliability of the claimed improvements.
- [Dataset] Dataset construction / simulator description: the central claim that the 153,810 scenes provide faithful ground truth for 71 phenomena rests on the unverified assumption that the underlying physics engine reproduces the targeted dynamics without systematic artifacts. No quantitative validation against analytical solutions, closed-form expressions, or real-world footage is reported for any subset of the phenomena.
- [Experiments] Evaluation protocol: all four application experiments appear to be conducted entirely within the synthetic distribution. The absence of any held-out real-world test set or cross-domain transfer experiment leaves the generalization claim—that improvements will benefit real-world physical reasoning—untested and therefore load-bearing for the paper's broader impact argument.
minor comments (2)
- [Dataset] Clarify the exact procedure used to generate the 2 million videos from the 153,810 scenes (e.g., number of trajectories per scene, camera sampling strategy) so that the dataset scale can be reproduced.
- [Introduction] The abstract states the dataset is 'orders of magnitude beyond prior works'; a concise table comparing scene count, video count, and phenomenon coverage against the most relevant existing datasets would strengthen this claim.
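The first minor comment's reproduction question starts with simple arithmetic: the stated totals imply roughly 13 videos per scene, which constrains any plausible per-scene sampling scheme (uniform sampling per scene is an assumption; the paper may vary trajectories and cameras unevenly):

```python
# Back-of-the-envelope check for the stated dataset scale: how many
# videos per scene would reproduce the totals in the abstract?
videos = 2_000_000
scenes = 153_810
per_scene = videos / scenes
print(f"{per_scene:.1f} videos per scene")  # ≈ 13.0
```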
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback. We address each major comment point by point below, with revisions to the manuscript where the concerns are valid.
Point-by-point responses
-
Referee: [Experiments] Experiments section: the abstract and results claim that fine-tuning on PhysInOne 'significantly enhances physical plausibility' and 'exposes critical gaps,' yet no quantitative metrics, baseline comparisons, error bars, or statistical tests are provided to support these statements. This absence makes it impossible to assess the magnitude or reliability of the claimed improvements.
Authors: We agree that the original submission insufficiently quantified the claimed improvements. The revised manuscript now includes a substantially expanded Experiments section with new tables reporting concrete metrics for each of the four tasks (e.g., FVD and physical-plausibility scores for video generation; MSE and long-term prediction accuracy; property-estimation error rates; motion-transfer success rates). All results include baseline comparisons (models trained without PhysInOne or on prior smaller physics datasets), error bars computed over five independent runs with different random seeds, and statistical significance tests (paired t-tests with p-values). These additions directly support the statements in the abstract and allow readers to evaluate the magnitude and reliability of the gains. revision: yes
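The statistical protocol the response describes (five seeded runs, paired t-test) can be sketched as below; the scores are invented for illustration, and in practice `scipy.stats.ttest_rel` would also return the p-value:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(run_scores_a, run_scores_b):
    """t statistic for a paired comparison across seeds, as in the
    rebuttal's protocol. The p-value comes from the t distribution with
    n-1 degrees of freedom (e.g. scipy.stats.ttest_rel); only the
    statistic is computed here to stay dependency-free."""
    diffs = [a - b for a, b in zip(run_scores_a, run_scores_b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n-1 denominator),
    # which is what the paired t-test requires.
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# Five seeds of a hypothetical plausibility score (illustrative numbers).
finetuned = [0.71, 0.69, 0.73, 0.70, 0.72]
baseline = [0.60, 0.62, 0.59, 0.61, 0.60]
t = paired_t_statistic(finetuned, baseline)
```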
-
Referee: [Dataset] Dataset construction / simulator description: the central claim that the 153,810 scenes provide faithful ground truth for 71 phenomena rests on the unverified assumption that the underlying physics engine reproduces the targeted dynamics without systematic artifacts. No quantitative validation against analytical solutions, closed-form expressions, or real-world footage is reported for any subset of the phenomena.
Authors: We acknowledge that explicit quantitative validation of the simulator was missing. In the revised Dataset section we have added a dedicated 'Simulator Fidelity Validation' subsection. For a representative subset of phenomena we now report: (i) trajectory and collision errors versus closed-form analytical solutions for rigid-body mechanics (mean position error <4% across 500 test cases); (ii) ray-tracing accuracy against Snell's law and reflection formulas for optics; and (iii) qualitative side-by-side comparisons with real-world footage for selected fluid and magnetic interactions, accompanied by per-frame annotation consistency checks. We also explicitly discuss known limitations of the engine for highly chaotic or multi-scale phenomena. revision: yes
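The closed-form validation the response describes can be sketched for the simplest mechanics case: integrate a projectile numerically and report mean relative position error against the ballistic solution. The explicit-Euler integrator and all parameters here are assumptions for illustration, not the paper's engine:

```python
import math

def projectile_error(v0=5.0, angle_deg=45.0, g=9.81, dt=1e-3, steps=500):
    """Mean relative position error of an explicit-Euler projectile
    integration versus the closed-form ballistic solution — the kind of
    check a 'Simulator Fidelity Validation' subsection would report.
    Integrator and parameters are illustrative assumptions."""
    theta = math.radians(angle_deg)
    vx, vy = v0 * math.cos(theta), v0 * math.sin(theta)
    x = y = 0.0
    errs = []
    for i in range(1, steps + 1):
        # Explicit Euler step for the "simulated" trajectory.
        x += vx * dt
        y += vy * dt
        vy -= g * dt
        # Closed-form reference position at the same time t.
        t = i * dt
        xr = v0 * math.cos(theta) * t
        yr = v0 * math.sin(theta) * t - 0.5 * g * t * t
        errs.append(math.hypot(x - xr, y - yr) / math.hypot(xr, yr))
    return sum(errs) / len(errs)

err = projectile_error()
```

With these settings the mean relative error is well under the 4% figure quoted in the response; tightening dt shrinks it further, which is how such a validation would demonstrate convergence.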
-
Referee: [Experiments] Evaluation protocol: all four application experiments appear to be conducted entirely within the synthetic distribution. The absence of any held-out real-world test set or cross-domain transfer experiment leaves the generalization claim—that improvements will benefit real-world physical reasoning—untested and therefore load-bearing for the paper's broader impact argument.
Authors: We agree that all reported experiments remain within the synthetic distribution. The revised manuscript now contains a new 'Limitations and Broader Impact' section that explicitly states the synthetic scope of the evaluations and tempers generalization claims. We discuss the sim-to-real gap (lighting, texture, sensor noise) and outline why full real-world transfer experiments lie beyond the present scope. While we cannot add comprehensive real-world test sets in this revision, we have included a small-scale qualitative transfer illustration on publicly available real physics videos to illustrate the direction of future work. revision: partial
Circularity Check
No circularity detected; the dataset's construction and the empirical evaluations do not depend on the conclusions they are used to support.
Full rationale
The paper introduces PhysInOne as a new synthetic dataset with specified scale, coverage of 71 phenomena, and annotations, then reports experimental outcomes from fine-tuning models on it for generation, prediction, estimation, and transfer tasks. No derivation chain, equations, or first-principles claims exist that could reduce to fitted parameters, self-definitions, or self-citations. All load-bearing statements concern the dataset's construction and measured performance deltas, which are presented as empirical observations rather than tautological outputs. Any self-citations serve only as background and do not underpin uniqueness theorems or ansatzes.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: synthetic 3D scenes generated under known physical rules can serve as effective proxies for real-world physical phenomena in AI training.
Reference graph
Works this paper leans on
- [1] Sand AI. MAGI-1: Autoregressive Video Generation at Scale. arXiv:2505.13211, 2025.
- [2] Tayfun Ates, M. Samil Atesoglu, Cagatay Yigit, Ilker Kesen, Mert Kobas, Erkut Erdem, Aykut Erdem, Tilbe Goksun, and Deniz Yuret. CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions. ACL Findings, 2022.
- [3]
- [4] Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, and Ross Girshick. PHYRE: A New Benchmark for Physical Reasoning. NeurIPS, 2019.
- [5] Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong, Amir Hosein Khasahmadi, and Rahul G. Krishnan. Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models. ICCV.
- [6] Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, and Aditya Grover. VideoPhy: Evaluating Physical Commonsense for Video Generation. ICLR, 2025.
- [7] Hritik Bansal, Clark Peng, Yonatan Bitton, Roman Goldenberg, Aditya Grover, and Kai-Wei Chang. VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation. arXiv:2503.06800, 2025.
- [8] Fabien Baradel, Natalia Neverova, Julien Mille, Greg Mori, and Christian Wolf. CoPhy: Counterfactual Learning of Physical Dynamics. ICLR, 2022.
- [9] Daniel M. Bear, Elias Wang, Damian Mrowca, Felix Binder, Hsiao Yu Fish Tung, R. T. Pramod, Cameron Holdaway, Sirui Tao, Kevin Smith, Fan Yun Sun, Li Fei-Fei, Nancy Kanwisher, Joshua B. Tenenbaum, Daniel L.K. Yamins, and Judith Fan. Physion: Evaluating Physical Prediction from Vision in Humans and Machines. NeurIPS, 2021.
- [10] Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, and Robin Rombach. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets. arXiv:2311.15127, 2023.
- [11] BlenderKit. https://www.blenderkit.com/.
- [12] Florian Bordes, Quentin Garrido, Justine T Kao, Adina Williams, Michael Rabbat, and Emmanuel Dupoux. IntPhys 2: Benchmarking Intuitive Physics Understanding in Complex Synthetic Environments. arXiv:2506.09849, 2025.
- [13] Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, Paul Debevec, and Ning Yu. Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise. CVPR.
- [14] Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, and Qifeng Chen. Gaussian-Informed Continuum for Physical Property Identification and Simulation. NeurIPS, 2024.
- [15] Junyi Cao and Evangelos Kalogerakis. SOPHY: Learning to Generate Simulation-Ready Objects with Physical Materials. arXiv:2504.12684, 2025.
- [16] Ziang Cao, Zhaoxi Chen, Liang Pan, and Ziwei Liu. PhysX-3D: Physical-Grounded 3D Asset Generation. NeurIPS.
- [17] Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, and Sergey Tulyakov. Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers. CVPR, 2024.
- [18] Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, and Chuang Gan. ComPhy: Compositional Physical Reasoning of Objects and Events from Videos. ICLR, 2022.
- [19] Anoop Cherian, Radu Corcodel, Siddarth Jain, and Diego Romeres. LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models. arXiv:2411.08027, 2024.
- [20] Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Guizilini, and Yue Wang. PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding. ICLR, 2025.
- [21] Common Crawl. https://commoncrawl.org/the-data/.
- [22] Rishit Dagli, Donglai Xiang, Vismay Modi, Charles Loop, Clement Fuji, Anka He, Chen Anita, Hu Gavriel, State David, and Maria Shugrina. VoMP: Predicting Volumetric Mechanical Property Fields. arXiv:2510.22975, 2025.
- [23] Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, and Ali Farhadi. Objaverse-XL: A Universe of 10M+ 3D Objects. NeurIPS, 2023.
- [24] Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A Universe of Annotated 3D Objects. CVPR, 2023.
- [25] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. CVPR, 2009.
- [26] Jingtao Ding, Yunke Zhang, Yu Shang, Yuheng Zhang, Zefang Zong, Jie Feng, Yuan Yuan, Hongyuan Su, Nian Li, Nicholas Sukiennik, Fengli Xu, and Yong Li. Understanding World or Predicting Future? A Comprehensive Survey of World Models. ACM Computing Surveys, 2025.
- [27] Doriflow. https://www.doriflow.com/.
- [28] Jiafei Duan, Samson Yu, Soujanya Poria, Bihan Wen, and Cheston Tan. PIP: Physical Interaction Prediction via Mental Simulation with Span Selection. ECCV, 2022.
- [29] Marie Lena Eckert, Kiwon Um, and Nils Thuerey. ScalarFlow: A Large-Scale Volumetric Data Set of Real-World Scalar Transport Flows for Computer Animation and Machine Learning. TOG, 2019.
- [30] FAB. https://www.fab.com/.
- [31] Jiemin Fang, Xinggang Wang, and Matthias Nießner. Fast Dynamic Radiance Fields with Time-Aware Neural Voxels. SIGGRAPH Asia, 2022.
- [32] R. Fatehi and M.T. Manzari. Error Estimation in Smoothed Particle Hydrodynamics and a New Scheme for Second Derivatives. Computers & Mathematics with Applications, 2011.
- [33] Hiroki Furuta, Heiga Zen, Dale Schuurmans, Aleksandra Faust, Yutaka Matsuo, Percy Liang, and Sherry Yang. Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback. arXiv:2412.02617, 2024.
- [34] Alejandro Castañeda Garcia, Jan Warchocki, Jan van Gemert, Daan Brinks, and Nergis Tomen. Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems. CVPR, 2025.
- [35] R.A. Gingold and J.J. Monaghan. Smoothed Particle Hydrodynamics: Theory and Application to Non-Spherical Stars. Monthly Notices of the Royal Astronomical Society, 1977.
- [36] Oliver Groth, Fabian B. Fuchs, Ingmar Posner, and Andrea Vedaldi. ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking. ECCV, 2018.
- [37] Jing Gu, Xian Liu, Yu Zeng, Ashwin Nagarajan, Fangrui Zhu, Daniel Hong, Yue Fan, Qianqi Yan, Kaiwen Zhou, Ming-Yu Liu, and Xin Eric Wang. PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models. arXiv:2507.13428, 2025.
- [38] Shanyan Guan, Huayu Deng, Yunbo Wang, and Xiaokang Yang. NeuroFluid: Fluid Dynamics Grounding with Particle-Driven Neural Radiance Fields. ICML, 2022.
- [39] David Halliday, Robert Resnick, and Jearl Walker. Fundamentals of Physics, Extended, 12th Edition. 2021.
- [40] Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Bill Yuchen Lin, and Wenhu Chen. VIDEOSCORE: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation. ..., 2024.
- [41] Jeremy Howard and Sebastian Ruder. Universal Language Model Fine-tuning for Text Classification. ACL, 2018.
- [42] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685, 2021.
- [43] Yuanming Hu, Yu Fang, Ziheng Ge, Ziyin Qu, Yixin Zhu, Andre Pradhana, and Chenfanfu Jiang. A Moving Least Squares Material Point Method with Displacement Discontinuity and Two-Way Rigid Body Coupling. SIGGRAPH.
- [44] Serwan Jassim, Mario Holubar, Annika Richter, Cornelius Wolff, Xenia Ohmer, and Elia Bruni. GRASP: A Novel Benchmark for Evaluating Language GRounding and Situated Physics Understanding in Multimodal Language Models. IJCAI, 2024.
- [45] Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, and Yunzhu Li. PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos. ICCV, 2025.
- [46] Takuhiro Kaneko. Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization. CVPR, 2024.
- [47] Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, and Jiashi Feng. How Far is Video Generation from World Model: A Physical Law Perspective. ICML, 2025.
- [48] Long Le, Ryan Lucas, Chen Wang, Chuhao Chen, Dinesh Jayaraman, Eric Eaton, and Lingjie Liu. Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels. arXiv:2508.17437, 2025.
- [49] Minh-Quan Le, Yuanzhi Zhu, Vicky Kalogeiton, and Dimitris Samaras. What About Gravity in Video Generation? Post-Training Newton's Laws with Verifiable Rewards. arXiv:2512.00425, 2025.
- [50] Chenyu Li, Oscar Michel, Xichen Pan, Sainan Liu, Mike Roberts, and Saining Xie. PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop. ICML, 2025.
- [51] Dacheng Li, Yunhao Fang, Yukang Chen, Shuo Yang, Shiyi Cao, Justin Wong, Michael Luo, Xiaolong Wang, Hongxu Yin, Joseph E Gonzalez, Ion Stoica, Song Han, and Yao Lu. WorldModelBench: Judging Video Generation Models As World Models. arXiv:2502.20694, 2025.
- [52] Jinxi Li, Ziyang Song, and Bo Yang. NVFi: Neural Velocity Fields for 3D Physics Learning from Dynamic Videos. NeurIPS, 2023.
- [53] Jinxi Li, Ziyang Song, and Bo Yang. TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos. ICCV, 2025.
- [54] Jinxi Li, Ziyang Song, Siyuan Zhou, and Bo Yang. FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity. CVPR, 2025.
- [55] Xuan Li, Yi-Ling Qiao, Peter Yichen Chen, Krishna Murthy Jatavallabhula, Ming Lin, Chenfanfu Jiang, and Chuang Gan. PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification. ICLR, 2023.
- [56] Yun Li, Yiming Zhang, Tao Lin, Xiangrui Liu, Wenxiao Cai, Zheng Liu, and Bo Zhao. STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding? ICCV.
- [57] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. ECCV, 2014.
- [58] Yuchen Lin, Chenguo Lin, Jianjin Xu, and Yadong Mu. OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation. ICLR, 2025.
- [59] Fangfu Liu, Hanyang Wang, Shunyu Yao, and Shengjun Zhang. Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion. arXiv:2406.04338, 2024.
- [60] Miles Macklin, Matthias Müller, and Nuttapong Chentanez. XPBD: Position-Based Simulation of Compliant Constrained Dynamics. MIG, 2016.
- [61] Fanqing Meng, Wenqi Shao, Lixin Luo, Yahong Wang, Yiran Chen, Quanfeng Lu, Yue Yang, Tianshuo Yang, Kaipeng Zhang, Yu Qiao, and Ping Luo. PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models. arXiv:2406.11802, 2024.
- [62] Fanqing Meng, Jiaqi Liao, Xinyu Tan, Wenqi Shao, Quanfeng Lu, Kaipeng Zhang, Yu Cheng, Dianqi Li, Yu Qiao, and Ping Luo. Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation. ICML, 2025.
- [63] Saman Motamed, Laura Culp, Kevin Swersky, Priyank Jaini, and Robert Geirhos. Do Generative Video Models Learn Physical Principles from Watching Videos? arXiv:2501.09038, 2025.
- [64] OpenAI. GPT-4 Technical Report. arXiv:2303.08774, 2023.
- [65] OpenAI. Video Generation Models as World Simulators. 2024.
- [66] Luis Piloto, Ari Weinstein, Dhruva TB, Arun Ahuja, Mehdi Mirza, Greg Wayne, David Amos, Chia-chun Hung, and Matt Botvinick. Probing Physics Knowledge Using Tools from Developmental Psychology. arXiv:1804.01128, 2018.
- [67] Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbeláez, Alex Sorkine-Hornung, and Luc Van Gool. The 2017 DAVIS Challenge on Video Object Segmentation. arXiv:1704.00675, 2017.
- [68] Ava Pun, Kangle Deng, Ruixuan Liu, Deva Ramanan, Changliu Liu, and Jun-Yan Zhu. Generating Physically Stable and Buildable Brick Structures from Text. ICCV.
- [69] Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Dam... Perception Test: A Diagnostic Benchmark for Multimodal Video Models. 2023.
- [70] Nazneen Fatema Rajani, Rui Zhang, Yi Chern Tan, Stephan Zheng, Jeremy Weiss, Aadit Vyas, Abhijit Gupta, Caiming Xiong, Richard Socher, and Dragomir Radev. ESPRIT: Explaining Solutions to Physical Reasoning Tasks. ACL, 2020.
- [71] Florian Richter, Ryan K. Orosco, and Michael C. Yip. Image Based Reconstruction of Liquids from 2D Surface Detections. CVPR, 2022.
- [72] Ronan Riochet, Mario Ynocente Castro, Mathieu Bernard, Adam Lerer, Rob Fergus, Véronique Izard, and Emmanuel Dupoux. IntPhys 2019: A Framework for Visual Intuitive Physics Understanding. TPAMI, 2022.
- [73] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. LAION-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models. 2022.
- [74] Alireza Shamsoshoara, Fatemeh Afghah, Abolfazl Razi, Liming Zheng, Peter Z. Fulé, and Erik Blasch. Aerial Imagery Pile Burn Detection Using Deep Learning: The FLAME Dataset. Computer Networks, 2021.
- [75] Prafull Sharma, Julien Philip, Michaël Gharbi, Bill Freeman, Fredo Durand, and Valentin Deschaintre. Materialistic: Selecting Similar Materials in Images. TOG, 2023.
- [76] Hui Shen, Taiqiang Wu, Qi Han, Yunta Hsieh, Jizhou Wang, Yuyue Zhang, Yuxin Cheng, Zijian Hao, Yuansheng Ni, Xin Wang, Zhongwei Wan, Kai Zhang, Wendong Xu, Jing Xiong, Ping Luo, Wenhu Chen, Chaofan Tao, Zhuoqing Mao, and Ngai Wong. PhyX: Does Your Model Have the "Wits" for Physical Reasoning? arXiv:2505.15929, 2025.
- [77] Sketchfab. https://sketchfab.com/.
- [78] Kevin A. Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Joshua B. Tenenbaum, and Tomer D. Ullman. Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations. NeurIPS, 2019.
- [79] Taichi Lang. https://www.taichi-lang.org/.
- [80] Hsiao-Yu Tung, Mingyu Ding, Zhenfang Chen, Daniel Bear, Chuang Gan, Joshua B. Tenenbaum, Daniel L.K. Yamins, Judith E. Fan, and Kevin A. Smith. Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties. NeurIPS, 2023.
discussion (0)