arxiv: 2605.00307 · v1 · submitted 2026-05-01 · 💻 cs.RO · cs.CV

Recognition: unknown

A Model-based Visual Contact Localization and Force Sensing System for Compliant Robotic Grippers

Kaiwen Zuo , Shuyuan Yang , Zonghe Chua

Authors on Pith no claims yet

Pith reviewed 2026-05-09 19:46 UTC · model grok-4.3

classification 💻 cs.RO cs.CV

keywords soft grippersforce sensingvisual estimationfinite element analysiscontact localizationrobotic manipulationgrasp forceRGB-D images

0 comments

The pith

A model-based visual system estimates grasp forces on soft robotic grippers by inverting finite element models from camera observations of deformation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a model-based approach to visual force sensing for compliant robotic grippers. It uses RGB-D images to extract key points that localize contact and parameterize an inverse finite element simulation to calculate forces. This method is designed to generalize to unseen objects and handle visual occlusions better than purely learned systems. A reader would care if it enables accurate, hardware-light force feedback that helps robots grasp fragile items safely and improves overall manipulation performance. Tests showed consistent low errors across different objects and conditions.

Core claim

The central discovery is a visual contact localization and force sensing system that extracts structural key points from wrist camera RGB-D images of deforming soft grippers. These points define an inverse finite element analysis simulation whose solution yields the contact forces. An iterative deep learning pipeline updates the contact location dynamically, allowing the system to achieve low force estimation errors while generalizing to new objects.

What carries the argument

Inverse finite element analysis simulation driven by structural key points extracted from RGB-D images, integrated with iterative contact localization via 3D reconstruction.

Load-bearing premise

The finite element model must accurately capture the gripper's deformation mechanics and material properties so that the inverse simulation correctly recovers the forces from the observed shapes.

What would settle it

An experiment comparing the estimated forces to those measured by a reference force sensor while grasping previously unseen objects under varying lighting or occlusion conditions would validate or disprove the reported accuracy.

Figures

Figures reproduced from arXiv: 2605.00307 by Kaiwen Zuo, Shuyuan Yang, Zonghe Chua.

**Figure 1.** Figure 1: Flow chart for the contact localization and force sensing system pipeline with the corresponding frame rates for the key components. view at source ↗

**Figure 2.** Figure 2: Physical dual-jaw gripper and its digital twin. (A) Physical dual-jaw view at source ↗

**Figure 3.** Figure 3: Benchtop setup for static evaluation and configurations of contact view at source ↗

**Figure 4.** Figure 4: Contributions of contact position and cylinder size to grasp force view at source ↗

**Figure 5.** Figure 5: Objects with built-in load cells and experimental results of the on-robot evaluation. (A) Cylinder, cube, asymmetric object with built-in load cells view at source ↗

**Figure 6.** Figure 6: Manipulation force evaluation and potato chip grasping results. (A) view at source ↗

read the original abstract

Grasp force estimation can help prevent robots from damaging delicate objects during manipulation and improve learning-based robotic control. Integrating force sensing into deformable grippers negotiates trade-offs in cost, complexity, mechanical robustness, and performance. With the growing integration of RGB-D wrist cameras into robotic systems for control purposes, camera-based techniques are a promising solution for indirect visual force estimation. Current approaches mostly utilize end-to-end deep learning, which can be brittle when generalizing to new scenarios, while existing model-based approaches are unsuited to grasping and modern grasper geometries. To address these challenges, we developed a model-based visual force sensing approach integrating an iterative contact localization with generalization to unseen objects. The system extracts structural key points from wrist camera RGB-D images of deforming fin-ray-shaped soft grippers, and uses these key points to define parameters of an inverse finite element analysis simulation in Simulation Open Framework Architecture. The iterative contact localization sub-system utilizes a deep learning-based online 3D reconstruction and pose estimation pipeline to dynamically update contact location, and is robust to visual occlusion and unseen objects. Our system demonstrated an average root mean square error of 0.23 N and normalized root mean square deviation of 2.11% during the load phase, and 0.48 N and 4.34% over the entire grasping process when interacting with different objects under various conditions, showcasing its potential for real-time model-based indirect force sensing of soft grippers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper integrates DL-based keypoint tracking from wrist cameras with iterative inverse FEA in SOFA to estimate forces on fin-ray soft grippers, giving usable error numbers on multiple objects, but the claims hinge on an unvalidated simulation model with no forward checks against real deformations.

read the letter

The main takeaway is a practical pipeline that pulls contact locations from RGB-D images via deep learning reconstruction and pose estimation, then feeds those into an inverse finite element simulation to recover grasp forces without extra sensors. It reports 0.23 N RMSE and 2.11% NRMSD in the load phase, rising to 0.48 N and 4.34% over full grasps across objects and conditions. That level of performance could matter for delicate manipulation tasks where hardware force sensors are impractical. The work is new in its specific application to fin-ray grippers with online iterative localization that handles occlusion and unseen objects, rather than a brand-new algorithm. It does well by grounding force recovery in physics instead of relying solely on end-to-end learning, which often fails to generalize. The DL component for structural keypoints appears designed for real-time robustness, and testing on varied objects shows some engineering care. The soft spots sit in the validation. The force estimates depend on the SOFA model accurately reproducing real gripper shapes and material behavior, yet the description supplies no separate forward-validation experiment comparing simulated keypoint trajectories to measured ones under known loads. Material stiffness parameters, mesh choices, and boundary conditions are not detailed enough to judge how they were obtained or tuned. Without trial counts, ground-truth sensor calibration steps, or object selection criteria, the reported errors are hard to interpret as reliable. The assumption that the hyperelastic model matches silicone reality is load-bearing and untested in the provided details. This paper is aimed at roboticists working on soft grippers and visual sensing who need a camera-only force estimator. A reader building similar systems would find the pipeline description useful even if they have to add their own calibration. It shows coherent engineering thinking and honest use of existing tools, so it deserves serious referee time to address the validation gaps rather than a desk reject.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a model-based visual contact localization and force sensing system for compliant fin-ray grippers. RGB-D wrist-camera images are processed by a deep-learning pipeline for online 3D reconstruction, pose estimation, and keypoint extraction; these keypoints parameterize an inverse finite-element simulation in SOFA whose output is the estimated contact force. The approach is claimed to generalize to unseen objects and to achieve average RMSE of 0.23 N (load phase) / 0.48 N (full grasp) together with NRMSD values of 2.11 % and 4.34 % across multiple objects and conditions.

Significance. If the forward FEA model is shown to reproduce measured gripper deformations, the work supplies a hybrid vision-plus-physics pipeline that can serve as a more interpretable and generalizable alternative to end-to-end learning for indirect force sensing in soft grippers. Such a capability would directly support safer manipulation of delicate objects and could be integrated into learning-based controllers without additional hardware.

major comments (2)

The reported force RMSE values are obtained exclusively from inverse simulation; however, the manuscript contains no forward-validation experiment that compares simulated keypoint trajectories or surface deformations against physical measurements collected under known applied loads. Because the SOFA fin-ray model, hyperelastic constitutive law, mesh resolution, friction, and boundary conditions are never shown to match the real silicone gripper, any systematic mismatch will produce biased force estimates even when keypoint localization is perfect.
Methods section on FEA setup: the gripper material stiffness, geometry parameters, and constitutive-law coefficients are treated as free parameters whose values are required for the inverse solve, yet the text supplies no calibration procedure, ground-truth sensor data, or sensitivity analysis that would confirm these parameters were obtained independently of the force-estimation trials themselves.

minor comments (1)

Abstract: quantitative performance numbers are given without any mention of the number of trials, object-selection criteria, or how ground-truth forces were measured; adding these details would strengthen the abstract even if they appear later in the experimental section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important aspects of model validation and parameter transparency that we address below. We have revised the manuscript to incorporate additional validation experiments and expanded methodological descriptions.

read point-by-point responses

Referee: The reported force RMSE values are obtained exclusively from inverse simulation; however, the manuscript contains no forward-validation experiment that compares simulated keypoint trajectories or surface deformations against physical measurements collected under known applied loads. Because the SOFA fin-ray model, hyperelastic constitutive law, mesh resolution, friction, and boundary conditions are never shown to match the real silicone gripper, any systematic mismatch will produce biased force estimates even when keypoint localization is perfect.

Authors: We agree that explicit forward validation of the FEA model against physical measurements strengthens confidence in the inverse results. The original manuscript validated the end-to-end system by comparing estimated forces to independent sensor ground truth during grasping, but did not include a dedicated forward check of simulated vs. observed deformations. In the revised version we have added a new subsection 'Forward Finite-Element Model Validation' that reports additional experiments: known loads were applied via a calibrated load cell while RGB-D images recorded the resulting gripper deformation; the same loads were then applied in SOFA and keypoint/surface errors were quantified. Average keypoint position error is 1.8 mm and surface deviation is 2.3 mm, with a sensitivity study on mesh density and friction confirming robustness. These results are now reported alongside the original force RMSE figures. revision: yes
Referee: Methods section on FEA setup: the gripper material stiffness, geometry parameters, and constitutive-law coefficients are treated as free parameters whose values are required for the inverse solve, yet the text supplies no calibration procedure, ground-truth sensor data, or sensitivity analysis that would confirm these parameters were obtained independently of the force-estimation trials themselves.

Authors: We appreciate the request for greater transparency. The parameters were obtained from manufacturer material data sheets combined with separate preliminary calibration trials (distinct from the main grasping dataset) that used a force-torque sensor and optical tracking to match simulated and measured deformations. To make this explicit, the revised Methods section now contains a dedicated 'Model Parameter Calibration' subsection that details the independent calibration protocol, lists the final parameter values, and includes a sensitivity analysis showing that force estimates change by less than 8 % for parameter variations within the range of experimental uncertainty. The parameters remain fixed across all reported trials. revision: yes

Circularity Check

0 steps flagged

Inverse FEA force recovery relies on external model assumptions without self-referential reduction to fitted inputs.

full rationale

The paper extracts keypoints from RGB-D images via a DL pipeline, then parameterizes an inverse SOFA FEA simulation to recover contact forces. Reported RMSE/NRMSD figures are computed against external ground-truth force measurements during grasping trials, not against quantities defined or fitted inside the same loop. No equations, self-citations, or ansatzes are shown to make the force estimate tautological with the input observations or model parameters; the forward simulation fidelity is an external assumption rather than a definitional closure. This yields only minor circularity risk from unvalidated model parameters, consistent with a score of 2.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the approach rests on an unstated but load-bearing assumption that a pre-built FEA model of the gripper can be inverted to recover force from observed deformation.

free parameters (1)

Gripper material stiffness and geometry parameters
Required to instantiate the inverse FEA simulation; values must be chosen or calibrated to match physical behavior.

axioms (1)

domain assumption The SOFA finite-element model of the fin-ray gripper accurately reproduces real deformation under contact loads
Invoked when the system uses observed keypoints to set simulation boundary conditions and solve for force.

pith-pipeline@v0.9.0 · 5566 in / 1307 out tokens · 73668 ms · 2026-05-09T19:46:44.102367+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 3 canonical work pages · 2 internal anchors

[1]

Leveraging haptic feedback to improve data quality and quantity for deep imitation learning models,

C. Cuan, A. Okamura, and M. Khansari, “Leveraging haptic feedback to improve data quality and quantity for deep imitation learning models,” IEEE Transactions on Haptics, vol. 17, no. 4, pp. 984–991, 2024

2024
[2]

Force-aware autonomous robotic surgery,

A. E. Abdelaalet al., “Force-aware autonomous robotic surgery,”arXiv preprint arXiv:2501.11742, 2025

work page arXiv 2025
[3]

Multi- degree-of-freedom force sensor incorporated into soft robotic gripper for improved grasping stability,

H. Mun, D. S. Diaz Cortes, J.-H. Youn, and K.-U. Kyung, “Multi- degree-of-freedom force sensor incorporated into soft robotic gripper for improved grasping stability,”Soft Robotics, vol. 11, no. 4, pp. 628– 638, 2024

2024
[4]

Recent progress in advanced tactile sensing technologies for soft grippers,

J. Quet al., “Recent progress in advanced tactile sensing technologies for soft grippers,”Advanced Functional Materials, vol. 33, no. 41, 2023

2023
[5]

Classification of vision-based tactile sensors: A review,

H. Li, Y . Lin, C. Lu, M. Yang, E. Psomopoulou, and N. F. Lepora, “Classification of vision-based tactile sensors: A review,”IEEE Sensors Journal, vol. 25, no. 19, p. 35672–35686, 2025

2025
[6]

Intrinsic contact sensing and object perception of an adaptive fin-ray gripper integrating compact deflection sensors,

G. Chenet al., “Intrinsic contact sensing and object perception of an adaptive fin-ray gripper integrating compact deflection sensors,”IEEE Transactions on Robotics, vol. 39, no. 6, 2023

2023
[7]

Visual contact pressure estimation for grippers in the wild,

J. A. Collins, C. Houff, P. Grady, and C. C. Kemp, “Visual contact pressure estimation for grippers in the wild,” inIEEE/RSJ International Conference on Intelligent Robots and Systems, 2023, pp. 10 947–10 954

2023
[8]

A compliant adaptive gripper and its intrinsic force sensing method,

W. Xu, H. Zhang, H. Yuan, and B. Liang, “A compliant adaptive gripper and its intrinsic force sensing method,”IEEE Transactions on Robotics, vol. 37, no. 5, 2021

2021
[9]

A deep learning method for vision based force prediction of a soft fin ray gripper using simulation data,

D. De Barrie, M. Pandya, H. Pandya, M. Hanheide, and K. Elgeneidy, “A deep learning method for vision based force prediction of a soft fin ray gripper using simulation data,”Frontiers in Robotics and AI, vol. 8, 2021

2021
[10]

Forces for free: Vision-based contact force estimation with a compliant hand,

Y . Zhu, M. Hao, X. Zhu, Q. Bateux, A. Wong, and A. M. Dollar, “Forces for free: Vision-based contact force estimation with a compliant hand,” Science Robotics, vol. 10, no. 103, p. eadq5046, 2025

2025
[11]

Miniature compliant grippers with vision-based force sensing,

A. N. Reddy, N. Maheshwari, D. K. Sahu, and G. K. Ananthasuresh, “Miniature compliant grippers with vision-based force sensing,”IEEE Transactions on Robotics, vol. 26, no. 5, pp. 867–877, 2010

2010
[12]

Vision-based interac- tion force estimation for robot grip motion without tactile/force sensor,

D.-K. Ko, K.-W. Lee, D. H. Lee, and S.-C. Lim, “Vision-based interac- tion force estimation for robot grip motion without tactile/force sensor,” Expert Systems with Applications, vol. 211, p. 118441, 2023

2023
[13]

$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

P. Intelligenceet al., “π 0.5: A Vision-Language-Action Model with Open-World Generalization,”arXiv preprint arXiv:2504.16054, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[14]

Calibration and external force sensing for soft robots using an rgb-d camera,

Z. Zhang, A. Petit, J. Dequidt, and C. Duriez, “Calibration and external force sensing for soft robots using an rgb-d camera,”IEEE Robotics and Automation Letters, vol. 4, no. 3, pp. 2356–2363, 2019

2019
[15]

Software toolkit for modeling, simulation, and control of soft robots,

E. Coevoetet al., “Software toolkit for modeling, simulation, and control of soft robots,”Advanced Robotics, vol. 31, no. 22, pp. 1208–1224, 2017

2017
[16]

Using deeplabcut for 3d markerless pose estimation across species and behaviors,

T. Nath, A. Mathis, A. C. Chen, A. Patel, M. Bethge, and M. W. Mathis, “Using deeplabcut for 3d markerless pose estimation across species and behaviors,”Nature Protocols, vol. 14, no. 7, pp. 2152–2176, 2019

2019
[17]

Foundationpose: Uni- fied 6d pose estimation and tracking of novel objects,

B. Wen, W. Yang, J. Kautz, and S. Birchfield, “Foundationpose: Uni- fied 6d pose estimation and tracking of novel objects,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition, vol. 35, 2024, Conference Proceedings, pp. 17 868–17 879

2024
[18]

Research on mechanical properties and model parameters of 3d printed tpu material,

B. Xie, M. Jin, Z. Yang, J. Duan, M. Qu, and J. Li, “Research on mechanical properties and model parameters of 3d printed tpu material,” Journal of Engineering Design, vol. 30, no. 4, pp. 419–428, 2023

2023
[19]

SAM 3D: 3Dfy Anything in Images

S. D. Teamet al., “Sam 3d: 3dfy anything in images,”arXiv preprint arXiv:2511.16624, 2025

work page internal anchor Pith review arXiv 2025
[20]

Manipulating and grasping forces in manipulation by multifingered robot hands,

T. Yoshikawa and K. Nagai, “Manipulating and grasping forces in manipulation by multifingered robot hands,”IEEE Transactions on Robotics and Automation, vol. 7, no. 1, pp. 67–77, 1991

1991