RigPI: Dynamic Parameter Identification of Rigid Body via VLM-Seeded Differentiable Simulation

Rongrong Zhang; Wei Jiang; Wenqiang Xu; Xincheng He

arxiv: 2606.25212 · v2 · pith:74WRTWK2new · submitted 2026-06-23 · 💻 cs.RO

RigPI: Dynamic Parameter Identification of Rigid Body via VLM-Seeded Differentiable Simulation

Xincheng He , Rongrong Zhang , Wei Jiang , Wenqiang Xu This is my paper

Pith reviewed 2026-06-26 05:20 UTC · model grok-4.3

classification 💻 cs.RO

keywords rigid body dynamicsparameter identificationdifferentiable simulationvision-language modelsrobotic manipulationsystem identificationforce-torque sensing

0 comments

The pith

RigPI uses a vision-language model to seed and constrain a differentiable simulator for identifying inertial and frictional parameters of rigid bodies during robot interaction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RigPI to determine the physical properties of objects that robots manipulate from real interactions. It combines semantic guidance from a vision-language model with force-torque and motion data inside a differentiable physics engine. The two-stage process first uses the model to set a plausible starting point and search limits, then refines the values with simulation gradients. If the method works, the resulting parameters allow forward prediction of how the robot and object will behave together. This supports building digital twins that remain faithful when the robot performs new actions.

Core claim

RigPI integrates vision-based semantic priors, force-torque measurements, and motion observations within a differentiable simulation pipeline. A vision-language model supplies informed initialization and a constrained search space, while gradient information from the simulator enables efficient parameter refinement. The two-stage optimization strategy reduces sensitivity to noise and avoids physically implausible solutions. Real-world experiments on objects with revolute and prismatic joints show that the method produces accurate and stable estimates that reproduce manipulation trajectories on a physical robot.

What carries the argument

The two-stage optimization that first applies VLM-derived initialization and search-space constraints, then performs gradient-based refinement inside the differentiable simulator.

If this is right

Parameter estimates remain stable across repeated trials despite sensing noise.
Identified values allow a robot to reproduce observed manipulation trajectories when used in prediction.
The framework applies to both single rigid bodies and multi-link assemblies with revolute or prismatic joints.
Two-stage refinement prevents convergence to non-physical parameter sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same seeding approach could reduce the amount of physical interaction data needed when identifying parameters for new object classes.
Parameter sets obtained this way might support longer-horizon planning in manipulation tasks where model mismatch would otherwise accumulate.
Extending the differentiable simulator to include contact-rich or deformable elements would test whether the VLM seeding strategy generalizes beyond rigid bodies.

Load-bearing premise

The vision-language model must provide initialization and search-space limits accurate and unbiased enough that gradient descent converges to physically consistent values instead of noise-driven local minima.

What would settle it

Forward-simulate the identified parameters on held-out real-robot interaction sequences and measure trajectory error; sustained large mismatches between predicted and observed motion would show the estimates lack predictive validity.

Figures

Figures reproduced from arXiv: 2606.25212 by Rongrong Zhang, Wei Jiang, Wenqiang Xu, Xincheng He.

**Figure 2.** Figure 2: RigPI takes recorded robot interaction data as input, including forces, torques, and poses, and is initialized with a [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Examples of physical parameter identification using our RigPI system. Each subfigure shows the unconstrained or [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative results of trajectory reproduction by [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Illustrative example of convergence curves for dif [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

read the original abstract

Accurate physical parameter identification of manipulated objects is fundamental to advanced robotic manipulation and the construction of faithful digital twins. However, acquiring physically consistent inertial and frictional properties from real-world interactions remains challenging due to sensing noise, modeling errors, and limited prior knowledge. This paper presents RigPI, a systematic framework for identifying dynamic parameters of both unconstrained rigid bodies and multi-link rigid bodies during robot-object interaction. RigPI integrates vision-based semantic priors, force-torque measurements, and motion observations within a differentiable simulation pipeline. A vision-language model (VLM) provides informed initialization and a constrained search space, while gradient information from a differentiable physics simulator enables efficient and stable parameter refinement. The proposed two-stage optimization strategy alleviates sensitivity to noise and avoids physically implausible solutions. Extensive real-world experiments on objects with revolute and prismatic joints demonstrate that RigPI achieves accurate and stable parameter estimates, and successfully reproduces manipulation trajectories on a real robot with parameter-aware predictive validity. These results highlight the effectiveness and robustness of RigPI for real-world robotic system identification tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RigPI gives a workable two-stage VLM-plus-differentiable-sim pipeline for inertial and friction identification on real robot hardware, but the abstract supplies no numbers or baselines so the performance claims stay hard to judge.

read the letter

The main point is that RigPI seeds a differentiable physics simulator with VLM-derived priors to identify inertial and frictional parameters from force-torque and motion data during robot-object contact. The two-stage setup first uses the VLM for initialization and search-space limits, then refines via gradients to stay inside physically plausible bounds.

What stands out is the explicit handling of both free rigid bodies and multi-link systems with revolute or prismatic joints, plus the claim that the method reproduces manipulation trajectories on hardware. That integration of semantic vision priors with gradient-based refinement is a reasonable practical step for digital-twin work.

The soft spot is the lack of any reported error metrics, standard deviations, or comparisons against existing identification routines in the abstract. Without those, it is difficult to tell whether the optimization actually improves on simpler least-squares or Bayesian baselines or whether the VLM seeds are reliably good enough to avoid bad local minima. The stress-test note finds no internal contradictions, which is fair, but the data-to-claim link still needs the full tables and protocols to evaluate.

This is for robotics groups doing manipulation planning or building simulation twins who already have access to a differentiable simulator. A reader who wants a concrete pipeline description will find it useful; someone needing proven gains over prior art will want the numbers first.

It deserves peer review so the experimental details and comparisons can be checked properly.

Referee Report

2 major / 2 minor

Summary. The paper introduces RigPI, a two-stage framework for dynamic parameter identification of rigid bodies (including multi-link systems with revolute or prismatic joints) that combines VLM-derived semantic priors for initialization and search-space constraints, force-torque sensing, motion observations, and gradient-based refinement inside a differentiable physics simulator. The central claim is that this pipeline yields physically consistent inertial and frictional parameters that enable accurate, stable estimates and parameter-aware predictive reproduction of manipulation trajectories on hardware.

Significance. If the experimental claims hold with quantitative support, the work would offer a practical advance in robotic system identification by reducing reliance on manual priors and mitigating noise sensitivity through constrained differentiable optimization. Strengths include the explicit integration of vision-language priors with physics-based gradients and the focus on real-world robot-object interaction rather than simulation-only validation.

major comments (2)

[Abstract / Experiments] Abstract and experimental validation sections: the claims of 'accurate and stable parameter estimates' and 'successful reproduction of manipulation trajectories with parameter-aware predictive validity' are asserted without any reported quantitative metrics (e.g., RMSE on parameters or trajectories, error bars, baseline comparisons to non-VLM or non-differentiable methods, or exclusion criteria for trials). This absence prevents evaluation of the data-to-claim link and is load-bearing for the central experimental contribution.
[Method / Experiments] Optimization pipeline description (likely §3 or §4): it is not shown whether any reported performance metric is independent of the fitted parameters themselves (e.g., whether trajectory reproduction error is computed on held-out data or is partly defined by the same parameters being optimized), raising a potential circularity concern that must be addressed with explicit equations and evaluation protocol.

minor comments (2)

[Method] Notation for the constrained search space and VLM-derived bounds should be defined explicitly with equations rather than prose descriptions to allow reproducibility.
[Figures] Figure captions for any trajectory plots or parameter convergence curves should include axis labels, units, and whether shaded regions represent standard deviation across trials.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on RigPI. We address each major comment below and will revise the manuscript to strengthen the experimental reporting and evaluation protocol.

read point-by-point responses

Referee: [Abstract / Experiments] Abstract and experimental validation sections: the claims of 'accurate and stable parameter estimates' and 'successful reproduction of manipulation trajectories with parameter-aware predictive validity' are asserted without any reported quantitative metrics (e.g., RMSE on parameters or trajectories, error bars, baseline comparisons to non-VLM or non-differentiable methods, or exclusion criteria for trials). This absence prevents evaluation of the data-to-claim link and is load-bearing for the central experimental contribution.

Authors: We agree that the provided manuscript text does not report specific quantitative metrics such as RMSE values, error bars, baseline comparisons, or trial exclusion criteria to support the claims. In the revised manuscript, we will incorporate these metrics from the real-world experiments, including parameter estimation errors, trajectory reproduction errors, comparisons to non-VLM and non-differentiable baselines, and details on data handling to establish a clear data-to-claim link. revision: yes
Referee: [Method / Experiments] Optimization pipeline description (likely §3 or §4): it is not shown whether any reported performance metric is independent of the fitted parameters themselves (e.g., whether trajectory reproduction error is computed on held-out data or is partly defined by the same parameters being optimized), raising a potential circularity concern that must be addressed with explicit equations and evaluation protocol.

Authors: We acknowledge the circularity concern. The revised manuscript will include an explicit description of the evaluation protocol with equations, clarifying that trajectory reproduction errors are computed on held-out real-robot interaction data separate from the optimization process, ensuring independence from the fitted parameters. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The provided abstract and description outline a standard two-stage pipeline: VLM supplies initialization and search-space constraints, followed by gradient-based refinement in a differentiable simulator using force-torque and motion data. Reported outcomes are accuracy of fitted parameters and forward prediction of held-out manipulation trajectories on hardware. No equations, self-citations, or steps are exhibited that reduce the claimed predictions or uniqueness results to the fitted inputs by construction, nor is any performance metric defined tautologically in terms of the parameters themselves. The central claim rests on empirical validation rather than self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated in sufficient detail to populate the ledger.

pith-pipeline@v0.9.1-grok · 5718 in / 1094 out tokens · 28586 ms · 2026-06-26T05:20:32.228856+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 12 canonical work pages · 5 internal anchors

[1]

Rigid body load identification for manipulators,

C. G. Atkeson, C. H. An, and J. M. Hollerbach, “Rigid body load identification for manipulators,” in1985 24th IEEE Conference on Decision and Control, 1985, pp. 996–1002

1985
[2]

On-line estimation of inertial parameters using a recursive total least-squares approach,

D. Kubus, T. Kroger, and F. M. Wahl, “On-line estimation of inertial parameters using a recursive total least-squares approach,” in2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008, pp. 3845–3852

2008
[3]

An accurate identification method based on double weighting for inertial parameters of robot payloads,

T. Xu, J. Fan, Q. Fang, Y . Zhu, and J. Zhao, “An accurate identification method based on double weighting for inertial parameters of robot payloads,”Robotica, vol. 40, no. 12, p. 4358–4374, 2022

2022
[4]

Newton: GPU-accelerated physics simulation for robotics, and simulation research

Newton Contributors, “Newton: GPU-accelerated physics simulation for robotics, and simulation research.” Newton a Series of LF Projects, LLC, 2025. [Online]. Available: https://github.com/newton-physics/ newton

2025
[5]

Difftaichi: Differentiable programming for physical simulation,

Y . Hu, L. Anderson, T.-M. Li, Q. Sun, N. Carr, J. Ragan-Kelley, and F. Durand, “Difftaichi: Differentiable programming for physical simulation,”International Conference on Learning Representations (ICLR), 2020

2020
[6]

Dojo: A differentiable physics engine for robotics,

T. Howell, S. Le Cleac’h, J. Bruedigam, Z. Kolter, M. Schwager, and Z. Manchester, “Dojo: A differentiable physics engine for robotics,”arXiv preprint arXiv:2203.00806, 2022. [Online]. Available: https://arxiv.org/abs/2203.00806

work page arXiv 2022
[7]

Differentiable simulation for physical system identification,

Q. Le Lidec, I. Kalevatykh, I. Laptev, C. Schmid, and J. Carpentier, “Differentiable simulation for physical system identification,”IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 3413–3420, 2021

2021
[8]

gradsim: Differentiable simulation for system identification and visuomotor control,

K. M. Jatavallabhula, M. Macklin, F. Golemo, V . V oleti, L. Petrini, M. Weiss, B. Considine, J. Parent-Levesque, K. Xie, K. Erleben, L. Paull, F. Shkurti, D. Nowrouzezahrai, and S. Fidler, “gradsim: Differentiable simulation for system identification and visuomotor control,” 2021. [Online]. Available: https://arxiv.org/abs/2104.02646

work page arXiv 2021
[9]

Differentiable physics and stable modes for tool-use and manipulation planning,

M. Toussaint, K. R. Allen, K. A. Smith, and J. B. Tenenbaum, “Differentiable physics and stable modes for tool-use and manipulation planning,” 2018. [Online]. Available: https://api.semanticscholar.org/ CorpusID:46980516

2018
[10]

Differentiable physics simulation of dynamics-augmented neural objects,

S. Le Cleac’h, H.-X. Yu, M. Guo, T. Howell, R. Gao, J. Wu, Z. Manchester, and M. Schwager, “Differentiable physics simulation of dynamics-augmented neural objects,”IEEE Robotics and Automation Letters, vol. 8, no. 5, pp. 2780–2787, 2023, publisher: IEEE

2023
[11]

A Differentiable Physics Engine for Deep Learning in Robotics

J. Degrave, M. Hermans, J. Dambre, and F. wyffels, “A differentiable physics engine for deep learning in robotics,” 2018. [Online]. Available: https://arxiv.org/abs/1611.01652

work page internal anchor Pith review Pith/arXiv arXiv 2018
[12]

Learning object properties using robot proprioception via differentiable robot-object interaction,

P. Y . Chen, C. Liu, P. Ma, J. Eastman, D. Rus, D. Randle, Y . Ivanov, and W. Matusik, “Learning object properties using robot proprioception via differentiable robot-object interaction,” 2025. [Online]. Available: https://arxiv.org/abs/2410.03920

work page arXiv 2025
[13]

Graph networks as learnable physics engines for inference and control

A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, and P. Battaglia, “Graph networks as learnable physics engines for inference and control,” 2018. [Online]. Available: https://arxiv.org/abs/1806.01242

work page internal anchor Pith review Pith/arXiv arXiv 2018
[14]

Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning

M. Lutter, C. Ritter, and J. Peters, “Deep lagrangian networks: Using physics as model prior for deep learning,” 2019. [Online]. Available: https://arxiv.org/abs/1907.04490

work page internal anchor Pith review Pith/arXiv arXiv 2019
[15]

Differentiable fluid physics parameter identification by stirring and for stirring,

W. Xu*, D. Zheng*, Y . Li, J. Ren, and C. Lu, “Differentiable fluid physics parameter identification by stirring and for stirring,” in IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2024

2024
[16]

Differentiable cloth parameter identification and state estimation in manipulation,

D. Zheng*, S. Yao*, W. Xu, and C. Lu, “Differentiable cloth parameter identification and state estimation in manipulation,”IEEE Robotics and Automation Letters, 2024

2024
[17]

Galileo: Perceiving physical object properties by integrating a physics engine with deep learning,

J. Wu, I. Yildirim, J. J. Lim, B. Freeman, and J. Tenenbaum, “Galileo: Perceiving physical object properties by integrating a physics engine with deep learning,” inAdvances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28. Curran Associates, Inc.,
[18]

Available: https://proceedings.neurips.cc/paper files/ paper/2015/file/d09bf41544a3365a46c9077ebb5e35c3-Paper.pdf

[Online]. Available: https://proceedings.neurips.cc/paper files/ paper/2015/file/d09bf41544a3365a46c9077ebb5e35c3-Paper.pdf

2015
[19]

image2mass: Estimating the mass of an object from its image,

T. Standley, O. Sener, D. Chen, and S. Savarese, “image2mass: Estimating the mass of an object from its image,” inProceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V . Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 13–15 Nov 2017, pp. 324–333. [Online]. Available: https://proceedings.ml...

2017
[20]

Physbench: Benchmarking and enhancing vision-language models for physical world understanding,

W. Chow, J. Mao, B. Li, D. Seita, V . Guizilini, and Y . Wang, “Physbench: Benchmarking and enhancing vision-language models for physical world understanding,”arXiv preprint arXiv:2501.16411, 2025

work page arXiv 2025
[21]

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

J. C.et al.. Anthony Brohan, Noah Brown, “Rt-2: Vision-language- action models transfer web knowledge to robotic control,” 2023. [Online]. Available: https://arxiv.org/abs/2307.15818

work page internal anchor Pith review Pith/arXiv arXiv 2023
[22]

Vision-language model-based physical reasoning for robot liquid perception,

W. Lai, Y . Gao, and T. L. Lam, “Vision-language model-based physical reasoning for robot liquid perception,” 2024. [Online]. Available: https://arxiv.org/abs/2404.06904

work page arXiv 2024
[23]

Physvlm: Enabling visual language models to understand robotic physical reachability,

W. Zhou, M. Tao, C. Zhao, H. Guo, H. Dong, M. Tang, and J. Wang, “Physvlm: Enabling visual language models to understand robotic physical reachability,” 2025. [Online]. Available: https://arxiv.org/abs/2503.08481

work page arXiv 2025
[24]

Robot navigation using physically grounded vision-language models in outdoor environments,

M. Elnoor, K. Weerakoon, G. Seneviratne, R. Xian, T. Guan, M. K. M. Jaffar, V . Rajagopal, and D. Manocha, “Robot navigation using physically grounded vision-language models in outdoor environments,” 2024. [Online]. Available: https://arxiv.org/abs/2409. 20445

2024
[25]

Intern-s1: A scientific multimodal foundation model,

Y . C.et al.. Lei Bai, Zhongrui Cai, “Intern-s1: A scientific multimodal foundation model,” 2025. [Online]. Available: https: //arxiv.org/abs/2508.15763

work page arXiv 2025
[26]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

M. S.et al.. Gheorghe Comanici, Eric Bieber, “Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities,” 2025. [Online]. Available: https://arxiv.org/abs/2507.06261

work page internal anchor Pith review Pith/arXiv arXiv 2025

[1] [1]

Rigid body load identification for manipulators,

C. G. Atkeson, C. H. An, and J. M. Hollerbach, “Rigid body load identification for manipulators,” in1985 24th IEEE Conference on Decision and Control, 1985, pp. 996–1002

1985

[2] [2]

On-line estimation of inertial parameters using a recursive total least-squares approach,

D. Kubus, T. Kroger, and F. M. Wahl, “On-line estimation of inertial parameters using a recursive total least-squares approach,” in2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008, pp. 3845–3852

2008

[3] [3]

An accurate identification method based on double weighting for inertial parameters of robot payloads,

T. Xu, J. Fan, Q. Fang, Y . Zhu, and J. Zhao, “An accurate identification method based on double weighting for inertial parameters of robot payloads,”Robotica, vol. 40, no. 12, p. 4358–4374, 2022

2022

[4] [4]

Newton: GPU-accelerated physics simulation for robotics, and simulation research

Newton Contributors, “Newton: GPU-accelerated physics simulation for robotics, and simulation research.” Newton a Series of LF Projects, LLC, 2025. [Online]. Available: https://github.com/newton-physics/ newton

2025

[5] [5]

Difftaichi: Differentiable programming for physical simulation,

Y . Hu, L. Anderson, T.-M. Li, Q. Sun, N. Carr, J. Ragan-Kelley, and F. Durand, “Difftaichi: Differentiable programming for physical simulation,”International Conference on Learning Representations (ICLR), 2020

2020

[6] [6]

Dojo: A differentiable physics engine for robotics,

T. Howell, S. Le Cleac’h, J. Bruedigam, Z. Kolter, M. Schwager, and Z. Manchester, “Dojo: A differentiable physics engine for robotics,”arXiv preprint arXiv:2203.00806, 2022. [Online]. Available: https://arxiv.org/abs/2203.00806

work page arXiv 2022

[7] [7]

Differentiable simulation for physical system identification,

Q. Le Lidec, I. Kalevatykh, I. Laptev, C. Schmid, and J. Carpentier, “Differentiable simulation for physical system identification,”IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 3413–3420, 2021

2021

[8] [8]

gradsim: Differentiable simulation for system identification and visuomotor control,

K. M. Jatavallabhula, M. Macklin, F. Golemo, V . V oleti, L. Petrini, M. Weiss, B. Considine, J. Parent-Levesque, K. Xie, K. Erleben, L. Paull, F. Shkurti, D. Nowrouzezahrai, and S. Fidler, “gradsim: Differentiable simulation for system identification and visuomotor control,” 2021. [Online]. Available: https://arxiv.org/abs/2104.02646

work page arXiv 2021

[9] [9]

Differentiable physics and stable modes for tool-use and manipulation planning,

M. Toussaint, K. R. Allen, K. A. Smith, and J. B. Tenenbaum, “Differentiable physics and stable modes for tool-use and manipulation planning,” 2018. [Online]. Available: https://api.semanticscholar.org/ CorpusID:46980516

2018

[10] [10]

Differentiable physics simulation of dynamics-augmented neural objects,

S. Le Cleac’h, H.-X. Yu, M. Guo, T. Howell, R. Gao, J. Wu, Z. Manchester, and M. Schwager, “Differentiable physics simulation of dynamics-augmented neural objects,”IEEE Robotics and Automation Letters, vol. 8, no. 5, pp. 2780–2787, 2023, publisher: IEEE

2023

[11] [11]

A Differentiable Physics Engine for Deep Learning in Robotics

J. Degrave, M. Hermans, J. Dambre, and F. wyffels, “A differentiable physics engine for deep learning in robotics,” 2018. [Online]. Available: https://arxiv.org/abs/1611.01652

work page internal anchor Pith review Pith/arXiv arXiv 2018

[12] [12]

Learning object properties using robot proprioception via differentiable robot-object interaction,

P. Y . Chen, C. Liu, P. Ma, J. Eastman, D. Rus, D. Randle, Y . Ivanov, and W. Matusik, “Learning object properties using robot proprioception via differentiable robot-object interaction,” 2025. [Online]. Available: https://arxiv.org/abs/2410.03920

work page arXiv 2025

[13] [13]

Graph networks as learnable physics engines for inference and control

A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, and P. Battaglia, “Graph networks as learnable physics engines for inference and control,” 2018. [Online]. Available: https://arxiv.org/abs/1806.01242

work page internal anchor Pith review Pith/arXiv arXiv 2018

[14] [14]

Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning

M. Lutter, C. Ritter, and J. Peters, “Deep lagrangian networks: Using physics as model prior for deep learning,” 2019. [Online]. Available: https://arxiv.org/abs/1907.04490

work page internal anchor Pith review Pith/arXiv arXiv 2019

[15] [15]

Differentiable fluid physics parameter identification by stirring and for stirring,

W. Xu*, D. Zheng*, Y . Li, J. Ren, and C. Lu, “Differentiable fluid physics parameter identification by stirring and for stirring,” in IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2024

2024

[16] [16]

Differentiable cloth parameter identification and state estimation in manipulation,

D. Zheng*, S. Yao*, W. Xu, and C. Lu, “Differentiable cloth parameter identification and state estimation in manipulation,”IEEE Robotics and Automation Letters, 2024

2024

[17] [17]

Galileo: Perceiving physical object properties by integrating a physics engine with deep learning,

J. Wu, I. Yildirim, J. J. Lim, B. Freeman, and J. Tenenbaum, “Galileo: Perceiving physical object properties by integrating a physics engine with deep learning,” inAdvances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28. Curran Associates, Inc.,

[18] [18]

Available: https://proceedings.neurips.cc/paper files/ paper/2015/file/d09bf41544a3365a46c9077ebb5e35c3-Paper.pdf

[Online]. Available: https://proceedings.neurips.cc/paper files/ paper/2015/file/d09bf41544a3365a46c9077ebb5e35c3-Paper.pdf

2015

[19] [19]

image2mass: Estimating the mass of an object from its image,

T. Standley, O. Sener, D. Chen, and S. Savarese, “image2mass: Estimating the mass of an object from its image,” inProceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V . Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 13–15 Nov 2017, pp. 324–333. [Online]. Available: https://proceedings.ml...

2017

[20] [20]

Physbench: Benchmarking and enhancing vision-language models for physical world understanding,

W. Chow, J. Mao, B. Li, D. Seita, V . Guizilini, and Y . Wang, “Physbench: Benchmarking and enhancing vision-language models for physical world understanding,”arXiv preprint arXiv:2501.16411, 2025

work page arXiv 2025

[21] [21]

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

J. C.et al.. Anthony Brohan, Noah Brown, “Rt-2: Vision-language- action models transfer web knowledge to robotic control,” 2023. [Online]. Available: https://arxiv.org/abs/2307.15818

work page internal anchor Pith review Pith/arXiv arXiv 2023

[22] [22]

Vision-language model-based physical reasoning for robot liquid perception,

W. Lai, Y . Gao, and T. L. Lam, “Vision-language model-based physical reasoning for robot liquid perception,” 2024. [Online]. Available: https://arxiv.org/abs/2404.06904

work page arXiv 2024

[23] [23]

Physvlm: Enabling visual language models to understand robotic physical reachability,

W. Zhou, M. Tao, C. Zhao, H. Guo, H. Dong, M. Tang, and J. Wang, “Physvlm: Enabling visual language models to understand robotic physical reachability,” 2025. [Online]. Available: https://arxiv.org/abs/2503.08481

work page arXiv 2025

[24] [24]

Robot navigation using physically grounded vision-language models in outdoor environments,

M. Elnoor, K. Weerakoon, G. Seneviratne, R. Xian, T. Guan, M. K. M. Jaffar, V . Rajagopal, and D. Manocha, “Robot navigation using physically grounded vision-language models in outdoor environments,” 2024. [Online]. Available: https://arxiv.org/abs/2409. 20445

2024

[25] [25]

Intern-s1: A scientific multimodal foundation model,

Y . C.et al.. Lei Bai, Zhongrui Cai, “Intern-s1: A scientific multimodal foundation model,” 2025. [Online]. Available: https: //arxiv.org/abs/2508.15763

work page arXiv 2025

[26] [26]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

M. S.et al.. Gheorghe Comanici, Eric Bieber, “Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities,” 2025. [Online]. Available: https://arxiv.org/abs/2507.06261

work page internal anchor Pith review Pith/arXiv arXiv 2025