pith.

arxiv: 2606.15516 · v2 · pith:AWCUWGXW · submitted 2026-06-14 · 💻 cs.RO

Transferring Contact, Not Just Motion: Compliant Grasping Across Dexterous Hands

Pith reviewed 2026-06-27 04:41 UTC · model grok-4.3

classification 💻 cs.RO
keywords dexterous grasping · cross-embodiment transfer · contact feedback · compliant manipulation · visuomotor policy · force calibration · hybrid control

The pith

Calibrated contact feedback lets grasping policies transfer across structurally different dexterous hands.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that stable dexterous grasping requires regulating contact forces as objects slip or become occluded, not just matching motion. It introduces a force-position interface that places motion intent in a shared hand-pose latent and calibrates each hand's effort signals through system identification into physical joint torques, which are then mapped to fingertip forces and per-finger load descriptors. A flow-matching visuomotor policy is trained on vision, proprioception, and these calibrated signals, with visual masking that promotes reliance on force under occlusion, and is executed via a hybrid force-position controller. If correct, the same learned primitives become reusable across hands in long-horizon tasks without hand-specific retraining.
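The flow-matching objective can be illustrated with a minimal NumPy sketch. It assumes the common linear (rectified-flow) probability path: a noisy action x_t interpolates between Gaussian noise and a demonstrated action chunk, and the regression target is the constant velocity between them. The names `velocity_fn`, `flow_matching_loss`, and `sample_actions` are hypothetical; the paper's policy is a DiT conditioned on vision, proprioception, and calibrated contact, which this sketch replaces with an arbitrary callable.

```python
import numpy as np

def flow_matching_target(a1, a0, t):
    """Linear-path interpolant x_t and its target velocity (a1 - a0)."""
    t = t[:, None]  # broadcast per-sample time over action dimensions
    x_t = (1.0 - t) * a0 + t * a1
    v_target = a1 - a0
    return x_t, v_target

def flow_matching_loss(velocity_fn, obs, a1, rng):
    """Mean squared error between predicted and target velocities.

    velocity_fn(x_t, t, obs) is a hypothetical stand-in for the policy network.
    a1: (batch, action_dim) demonstrated actions; obs: conditioning input.
    """
    a0 = rng.standard_normal(a1.shape)              # noise endpoint of the path
    t = rng.uniform(0.0, 1.0, size=a1.shape[0])     # one time per sample
    x_t, v_target = flow_matching_target(a1, a0, t)
    v_pred = velocity_fn(x_t, t, obs)
    return np.mean(np.sum((v_pred - v_target) ** 2, axis=-1))

def sample_actions(velocity_fn, obs, action_dim, n_steps, rng):
    """Generate actions by Euler-integrating the velocity field from t=0 to t=1."""
    x = rng.standard_normal((obs.shape[0], action_dim))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(obs.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t, obs)
    return x
```

A convenient sanity check: for a single target action c, the oracle field v(x, t) = (c - x) / (1 - t) gives zero loss, and Euler integration lands on c at the final step.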

Core claim

A cross-embodiment force-position interface represents motion in a shared latent while mapping each hand's effort to comparable physical torques in N·m, fingertip forces, and compact load descriptors; a flow-matching policy trained on vision plus these signals, with structured masking, produces transferable compliant grasping that reuses primitives in extended manipulation sequences.
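The mapping from calibrated joint torques to fingertip forces is not spelled out here. A standard choice, assumed in this sketch, is the quasi-static relation τ = Jᵀ f for each finger, solved in the least-squares sense. It assumes gravity and joint friction have already been removed from τ, and the "load descriptor" below (one force magnitude per finger) is a hypothetical stand-in for the paper's compact per-finger descriptors.

```python
import numpy as np

def fingertip_force_from_torque(jacobian, tau):
    """Least-squares fingertip force f (N) from joint torques tau (N·m), using tau = J^T f.

    jacobian: (3, n_joints) translational fingertip Jacobian (m/rad).
    tau:      (n_joints,) calibrated joint torques.
    lstsq returns the minimum-norm solution when J^T is rank-deficient.
    """
    f, *_ = np.linalg.lstsq(jacobian.T, tau, rcond=None)
    return f

def per_finger_load_descriptor(jacobians, torques):
    """Hypothetical compact descriptor: fingertip force magnitude for each finger."""
    return np.array([
        np.linalg.norm(fingertip_force_from_torque(J, tau))
        for J, tau in zip(jacobians, torques)
    ])
```

Because f is expressed in newtons, two hands with different joint counts or gear ratios yield observations on the same physical scale, which is the property the interface relies on.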

What carries the argument

The cross-embodiment force-position interface that converts raw effort signals into interchangeable fingertip forces and per-finger load descriptors via system identification.
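The system-identification step can be sketched under a simple assumption: each joint's raw effort signal (for example, motor current) maps affinely to torque, and reference torques come from the load-cell setup of Figure 2 via τ_ref = Jᵀ F. The affine model and the function names are assumptions for illustration; the paper's per-hand predictor G_h may be richer.

```python
import numpy as np

def reference_torques(jacobians, forces):
    """tau_ref[s] = J[s]^T F[s] for each sample s (quasi-static).

    jacobians: (S, 3, n_joints) fingertip Jacobians at each sample pose.
    forces:    (S, 3) load-cell forces (N) transmitted to the fingertip.
    """
    return np.einsum("sij,si->sj", jacobians, forces)

def fit_affine_calibration(effort, tau_ref):
    """Per-joint least-squares fit tau ≈ gain * effort + offset.

    effort, tau_ref: (S, n_joints). Returns (gains, offsets), each (n_joints,).
    """
    n_samples, n_joints = effort.shape
    gains = np.empty(n_joints)
    offsets = np.empty(n_joints)
    for j in range(n_joints):
        A = np.column_stack([effort[:, j], np.ones(n_samples)])
        (gains[j], offsets[j]), *_ = np.linalg.lstsq(A, tau_ref[:, j], rcond=None)
    return gains, offsets

def apply_calibration(effort, gains, offsets):
    """Map raw effort readings to calibrated joint torques (N·m)."""
    return effort * gains + offsets
```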

If this is right

  • Compliant grasping policies trained on one hand apply directly to structurally different hands.
  • Learned primitives integrate into longer manipulation sequences without additional hand-specific training.
  • Visual masking during training increases policy dependence on calibrated force feedback when vision is unreliable.
  • The hybrid controller keeps force targets consistent between demonstration collection and policy execution.
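Structured visual masking can be sketched as follows. Following the Figure 3 description, each training sample either drops an entire camera stream or masks visual patch tokens. Here the patch mask is assumed to be a contiguous rectangular block on the patch grid, to mimic an occluding object, and masked tokens are zeroed rather than replaced by a learned mask token. The block size and drop probability are hypothetical hyperparameters.

```python
import numpy as np

def structured_visual_mask(tokens, rng, block=(4, 4), stream_drop_prob=0.2):
    """Mask a whole camera stream or one contiguous block of patch tokens per camera.

    tokens: (n_cams, H, W, dim) visual patch tokens on an H x W grid.
    Returns (masked_tokens, keep) where keep is a boolean (n_cams, H, W) array.
    """
    n_cams, H, W, _ = tokens.shape
    keep = np.ones((n_cams, H, W), dtype=bool)
    bh, bw = block
    for cam in range(n_cams):
        if rng.random() < stream_drop_prob:
            keep[cam] = False  # drop the entire camera stream
        else:
            r0 = rng.integers(0, H - bh + 1)  # top-left corner of the occluding block
            c0 = rng.integers(0, W - bw + 1)
            keep[cam, r0:r0 + bh, c0:c0 + bw] = False
    masked = np.where(keep[..., None], tokens, 0.0)
    return masked, keep
```

Because proprioception and calibrated contact remain available on masked samples, fitting the training targets pushes the policy to read grasp state from force when the relevant image region is hidden.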

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The calibration step could be extended to other sensor types to broaden cross-embodiment transfer beyond force.
  • Testing the same interface on hands with larger kinematic differences would reveal the limits of the load-descriptor representation.
  • If the per-finger descriptors prove sufficient, similar compact representations might simplify transfer for non-grasping contact tasks such as in-hand rotation.

Load-bearing premise

System identification can produce effort-to-torque mappings that yield comparable fingertip forces and per-finger load descriptors across heterogeneous hands.

What would settle it

A direct comparison on two different hands where the same policy succeeds with calibration but fails to maintain stable grasps when the calibration step is removed and raw effort signals are used instead.

Figures

Figures reproduced from arXiv: 2606.15516 by Michael Yip, Soofiyan Atar, Yao-Ting Huang.

Figure 1: Overview. We propose a unified force–position encoding for dexterous hands that transfers across embodiments. A VLM grounds language instructions into task keypoints, and a state machine dispatches each phase to an optimization-based primitive, enabling stable long-horizon grasping without an end-to-end policy. The bottom row shows primitive actions across embodiments on compliant and rigid objects.

Figure 2: Calibration setup. Fingertip tethered through a string and spring to a six-axis load cell for per-hand torque calibration. The contact channel calibrates the per-hand torque predictor G_h against a physical reference (…

Figure 3: Model pipeline. MARC is a DiT flow-matching policy [9] conditioned on a unified force–position state. The environment states (unified tuple s_t) are as described in Sec. 3. Structured masking is applied to visual patch tokens or entire camera streams during training, forcing the policy to act on force and proprioception under occlusion. MANO latent decoders are frozen, the DiT is trained from scratch, and t…

Figure 4: Handover. Wrist forces experienced during object handover.

Figure 5: Steady grasping. Wrist force, fingertip force, and f_d during steady grasping for a compliant/rigid object.

We compose long-horizon manipulation from reusable contact-aware primitives rather than a single end-to-end policy. A vision-language model proposes task keypoints from the scene image and provides only high-level spatial grounding instructions; it does not condition πθ. A deterministic state machine…

Figure 6: Success rates for seen and unseen robotic hands across objects. Bars show per-object success over 10 trials on MARC; legend values indicate mean success.

Unseen-hand generalization. To test transfer beyond the training set (…
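Figures 1 and 5 describe long-horizon tasks composed by a deterministic state machine that dispatches each phase to a primitive, with a VLM supplying only keypoints. A minimal sketch of such a dispatcher follows; the phase names, the bounded-retry rule, and the primitive signature (keypoints in, success flag out) are hypothetical.

```python
from enum import Enum, auto

class Phase(Enum):
    APPROACH = auto()
    GRASP = auto()
    TRANSPORT = auto()
    RELEASE = auto()
    DONE = auto()

# Hypothetical fixed phase order; a real task would define its own transitions.
NEXT_PHASE = {
    Phase.APPROACH: Phase.GRASP,
    Phase.GRASP: Phase.TRANSPORT,
    Phase.TRANSPORT: Phase.RELEASE,
    Phase.RELEASE: Phase.DONE,
}

def run_task(keypoints, primitives, max_retries=2):
    """Dispatch each phase to its primitive; retry a failing phase up to max_retries times.

    primitives: dict mapping Phase -> callable(keypoints) -> bool (success).
    Returns (succeeded, log) where log lists (phase name, success) per attempt.
    """
    phase, retries, log = Phase.APPROACH, 0, []
    while phase is not Phase.DONE:
        ok = primitives[phase](keypoints)
        log.append((phase.name, ok))
        if ok:
            phase, retries = NEXT_PHASE[phase], 0
        elif retries < max_retries:
            retries += 1
        else:
            return False, log
    return True, log
```

Keeping the dispatcher deterministic means the VLM never conditions the low-level policy; it only fills in the `keypoints` argument.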
Original abstract

Dexterous grasping depends on contact regulation, not motion alone. Stable manipulation requires fingers to maintain appropriate object loading as contacts slip, deform, or become visually occluded. Existing cross-embodiment dexterous policies unify motion through retargeted hand poses or latent actions, but force feedback remains tied to each hand's sensing and actuation, limiting transfer. This work introduces a cross-embodiment force-position interface for contact-aware manipulation across heterogeneous dexterous hands. Motion intent is represented in a shared hand-pose latent, while each hand's effort signal is calibrated through system identification into physical joint torque in N·m. These torques are mapped to fingertip forces and compact per-finger load descriptors, giving the policy comparable observations of where the hand should move and how the object is loaded. Using this interface, a flow-matching visuomotor policy is trained on vision, proprioception, and calibrated contact, with structured visual masking that encourages reliance on force under grasp-relevant occlusion. The same calibrated signal drives a hybrid force-position controller for demonstration collection and execution, keeping force targets consistent across training and deployment. Experiments across structurally different hands show that calibrated contact feedback enables transferable compliant grasping, with learned primitives reusable in long-horizon manipulation pipelines.
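The abstract states that one calibrated signal drives a hybrid force-position controller during both demonstration collection and execution. The control law is not given here, so the sketch below uses the textbook hybrid position/force scheme (in the style of Raibert and Craig) for a single fingertip as an illustrative stand-in: a diagonal selection matrix S assigns each Cartesian axis to force or position control, and the calibrated fingertip force f closes the force loop. All gains and names are assumptions.

```python
import numpy as np

def hybrid_force_position_torque(J, x, x_d, xdot, f, f_d, S, Kp, Kd, Kf):
    """Textbook hybrid position/force law for one fingertip (illustrative stand-in).

    J: (3, n_joints) fingertip Jacobian; x, x_d, xdot: fingertip position, target, velocity.
    f: calibrated fingertip force (N); f_d: desired force.
    S: (3, 3) diagonal selection matrix, 1 on force-controlled axes, 0 on position axes.
    Kp, Kd, Kf: (3, 3) gain matrices.
    Returns joint torques tau = J^T [(I - S) F_pos + S F_force].
    """
    I = np.eye(3)
    F_pos = Kp @ (x_d - x) - Kd @ xdot       # PD position branch
    F_force = f_d + Kf @ (f_d - f)           # feedforward plus proportional force feedback
    return J.T @ ((I - S) @ F_pos + S @ F_force)
```

Feeding the same calibrated f into the force branch during teleoperated collection and during policy rollout keeps the commanded targets f_d on one physical scale across both settings.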

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper introduces a cross-embodiment force-position interface for compliant grasping across heterogeneous dexterous hands. Motion intent is encoded in a shared hand-pose latent while each hand's effort signal is calibrated via system identification to physical joint torques (N·m), then mapped to fingertip forces and compact per-finger load descriptors. A flow-matching visuomotor policy is trained on vision, proprioception, and these calibrated contact signals with structured visual masking; the same signals drive a hybrid force-position controller for both data collection and execution. Experiments on structurally different hands are claimed to show that calibrated contact enables transferable compliant grasping with primitives reusable in long-horizon pipelines.

Significance. If the system-identification calibration produces interchangeable fingertip-force and per-finger load observations across hardware, the work would meaningfully advance cross-embodiment transfer in contact-rich manipulation by decoupling force feedback from hand-specific sensing and actuation, going beyond motion-only retargeting. The hybrid controller and occlusion-aware masking are concrete design choices that could improve robustness.

minor comments (1)
  1. [Abstract] The abstract states that experiments demonstrate transferable grasping but supplies no quantitative metrics, error bars, dataset sizes, ablation results, or baseline comparisons, preventing assessment of whether the calibration actually supports the transfer claim.
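Figure 6 reports per-object success over 10 trials. With so few trials, interval estimates matter for judging transfer claims, and the Wilson score interval is one standard choice. The sketch below uses only the standard library and illustrates the referee's point; it is not an analysis the paper performs.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

For example, 8 successes out of 10 gives roughly (0.49, 0.94), so differences of one or two successes between hands are hard to distinguish at this sample size.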

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and for recognizing the potential of the cross-embodiment force-position interface to advance contact-rich transfer beyond motion-only retargeting. The recommendation of 'uncertain' is noted; we address the overall report below. No specific major comments were enumerated in the provided report, so we have no point-by-point rebuttals to offer at this stage.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's pipeline calibrates raw effort signals via system identification to physical joint torques (N·m), then maps those to fingertip forces and per-finger load descriptors. This step references external physical units rather than any fitted parameter or self-referential definition, allowing the shared pose latent and hybrid controller to treat signals as interchangeable. No equations or claims reduce a prediction to its own inputs by construction, no self-citation chains are load-bearing, and no ansatz is smuggled in. The derivation remains self-contained against external physical benchmarks and falsifiable measurements.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are detailed. The central claim rests on the unverified effectiveness of the system-identification calibration step.

pith-pipeline@v0.9.1-grok · 5756 in / 1018 out tokens · 29716 ms · 2026-06-27T04:41:15.335405+00:00 · methodology


Reference graph

Works this paper leans on

41 extracted references · 2 canonical work pages

  1. [1]

    H. Yuan, B. Zhou, Y. Fu, and Z. Lu. Cross-embodiment dexterous grasping with reinforcement learning, 2024. URL https://arxiv.org/abs/2410.02479

  2. [2]

    G. Jiang, Y. Liang, J. Ye, J.-Y. Huang, C. Jing, R. Duan, P. Abbeel, X. Wang, and X. Zou. Cross-hand latent representation for vision-language-action models, 2026. URL https://arxiv.org/abs/2603.10158

  3. [3]

    N. Hogan. Impedance control: An approach to manipulation: Part II—implementation. Journal of Dynamic Systems, Measurement, and Control, 107(1):8, 1985. doi:10.1115/1.3140713. URL https://doi.org/10.1115%2F1.3140713

  4. [4]

    Z. Wu, R. A. Potamias, X. Zhang, Z. Zhang, J. Deng, and S. Luo. Cedex: Cross-embodiment dexterous grasp generation at scale from human-like contact representations, 2025. URL https://arxiv.org/abs/2509.24661

  5. [5]

    E. Bauer, E. Nava, and R. K. Katzschmann. Latent action diffusion for cross-embodiment manipulation, 2026. URL https://arxiv.org/abs/2506.14608

  6. [6]

    R. Bhirangi, V. Pattabiraman, E. Erciyes, Y. Cao, T. Hellebrekers, and L. Pinto. Anyskin: Plug-and-play skin sensing for robotic touch, 2024. URL https://arxiv.org/abs/2409.08276

  7. [7]

    D. Zhang, C. Yuan, C. Wen, H. Zhang, J. Zhao, and Y. Gao. Kinedex: Learning tactile-informed visuomotor policies via kinesthetic teaching for dexterous manipulation, 2025. URL https://arxiv.org/abs/2505.01974

  8. [8]

    H. Seraji. Adaptive hybrid control of manipulators. In Proceedings of the Workshop on Space Telerobotics, Volume 3, 1987

  9. [9]

    D. McAllister, S. Ge, B. Yi, C. M. Kim, E. Weber, H. Choi, H. Feng, and A. Kanazawa. Flow matching policy gradients, 2025. URL https://arxiv.org/abs/2507.21053

  10. [10]

    X. Fei, Z. Xu, H. Fang, T. Zhang, and L. Shao. T(r,o) grasp: Efficient graph diffusion of robot-object spatial transformation for cross-embodiment dexterous grasping, 2025. URL https://arxiv.org/abs/2510.12724

  11. [11]

    H. Zhang, K. Y. Ma, M. Z. Shou, W. Lin, and Y. Wu. Machagrasp: Morphology-aware cross-embodiment dexterous hand articulation generation for grasping, 2026. URL https://arxiv.org/abs/2510.06068

  12. [12]

    J. Romero, D. Tzionas, and M. J. Black. Embodied hands: modeling and capturing hands and bodies together. ACM Transactions on Graphics, 36(6):1–17, Nov. 2017. ISSN 1557-7368. doi:10.1145/3130800.3130883. URL http://dx.doi.org/10.1145/3130800.3130883

  13. [13]

    MANUS Technology Group. High-precision hand tracking & mocap gloves — manus. https://www.manus-meta.com/, 2025. Accessed: 2025-09-13

  14. [14]

    Z. Jiang, Y. Xie, K. Lin, Z. Xu, W. Wan, A. Mandlekar, L. Fan, and Y. Zhu. Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning, 2025. URL https://arxiv.org/abs/2410.24185

  15. [15]

    T. Tao, M. K. Srirama, J. J. Liu, K. Shaw, and D. Pathak. Dexwild: Dexterous human interactions for in-the-wild robot policies, 2026. URL https://arxiv.org/abs/2505.07813

  16. [16]

    M. Xu, H. Zhang, Y. Hou, Z. Xu, L. Fan, M. Veloso, and S. Song. Dexumi: Using human hand as the universal manipulation interface for dexterous manipulation, 2025. URL https://arxiv.org/abs/2505.21864

  17. [17]

    H.-S. Fang, B. Romero, Y. Xie, A. Hu, B.-R. Huang, J. Alvarez, M. Kim, G. Margolis, K. Anbarasu, M. Tomizuka, E. Adelson, and P. Agrawal. Dexop: A device for robotic transfer of dexterous human manipulation, 2025. URL https://arxiv.org/abs/2509.04441

  18. [18]

    K. Zhu, F. Bai, Y. Xiang, Y. Cai, X. Chen, R. Li, X. Wang, H. Dong, Y. Yang, X. Fan, and Y. Chen. Dexflywheel: A scalable and self-improving data generation framework for dexterous manipulation, 2025. URL https://arxiv.org/abs/2509.23829

  19. [19]

    S. Atar, D. Huang, F. Richter, and M. Yip. In-hand manipulation of articulated tools with dexterous robot hands with sim-to-real transfer, 2026. URL https://arxiv.org/abs/2509.23075

  20. [20]

    H. Shi, S. Hu, Y. Hou, W. Wang, K. Liu, and S. Song. Minimalist compliance control, 2026. URL https://arxiv.org/abs/2603.00913

  21. [21]

    S. Chen, J. Bohg, and C. K. Liu. Springgrasp: An optimization pipeline for robust and compliant dexterous pre-grasp synthesis, 2024

  22. [22]

    R. Chen, M. Mukadam, M. Kaess, T. Wu, F. R. Hogan, J. Malik, and A. Sharma. Ptld: Sim-to-real privileged tactile latent distillation for dexterous manipulation, 2026. URL https://arxiv.org/abs/2603.04531

  23. [23]

    C. Higuera, A. Sharma, C. K. Bodduluri, T. Fan, P. Lancaster, M. Kalakrishnan, M. Kaess, B. Boots, M. Lambeta, T. Wu, and M. Mukadam. Sparsh: Self-supervised touch representations for vision-based tactile sensing, 2024. URL https://arxiv.org/abs/2410.24090

  24. [24]

    X. Chen, Y. Pan, M. Li, and X. Ding. Dexvitac: Collecting human visuo-tactile-kinematic demonstrations for contact-rich dexterous manipulation, 2026. URL https://arxiv.org/abs/2603.17851

  25. [25]

    W. Peebles and S. Xie. Scalable diffusion models with transformers, 2023. URL https://arxiv.org/abs/2212.09748

  26. [26]

    E. Collaboration, A. O’Neill, A. Rehman, A. Gupta, A. Maddukuri, A. Gupta, et al. Open x-embodiment: Robotic learning datasets and rt-x models, 2025. URL https://arxiv.org/abs/2310.08864

  27. [27]

    O. M. Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, L. Y. Chen, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine. Octo: An open-source generalist robot policy, 2024. URL https://arxiv.org/abs/2405.12213

  28. [28]

    M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn. Openvla: An open-source vision-language-action model, 2024. URL https://arxiv.org/abs/2406.09246

  29. [29]

    K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, et al. π0: A vision-language-action flow model for general robot control, 2026. URL https://arxiv.org/abs/2410.24164

  30. [30]

    S. Liu, L. Wu, B. Li, H. Tan, H. Chen, Z. Wang, K. Xu, H. Su, and J. Zhu. Rdt-1b: a diffusion foundation model for bimanual manipulation, 2025. URL https://arxiv.org/abs/2410.07864

  31. [31]

    J. Wen, Y. Zhu, J. Li, Z. Tang, C. Shen, and F. Feng. Dexvla: Vision-language model with plug-in diffusion expert for general robot control, 2025. URL https://arxiv.org/abs/2502.05855

  32. [32]

    H. Liu, S. Guo, P. Mai, J. Cao, H. Li, and J. Ma. Robodexvlm: Visual language model-enabled task planning and motion control for dexterous robot manipulation, 2025. URL https://arxiv.org/abs/2503.01616

  33. [33]

    Y. Zhong, X. Huang, R. Li, C. Zhang, Z. Chen, T. Guan, F. Zeng, K. N. Lui, Y. Ye, Y. Liang, Y. Yang, and Y. Chen. Dexgraspvla: A vision-language-action framework towards general dexterous grasping, 2025. URL https://arxiv.org/abs/2502.20900

  34. [34]

    V. de Bakker, J. Hejna, T. G. W. Lum, O. Celik, A. Taranovic, D. Blessing, G. Neumann, J. Bohg, and D. Sadigh. Scaffolding dexterous manipulation with vision-language models, 2026. URL https://arxiv.org/abs/2506.19212

  35. [35]

    J. Romero, D. Tzionas, and M. J. Black. Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 36(6), Nov. 2017

  36. [36]

    A. Handa, K. V. Wyk, W. Yang, J. Liang, Y.-W. Chao, Q. Wan, S. Birchfield, N. Ratliff, and D. Fox. Dexpilot: Vision based teleoperation of dexterous robotic hand-arm system, 2019. URL https://arxiv.org/abs/1910.03135

  37. [37]

    G. He and W. Zhang. Wujihand retargeting, 2026. URL https://github.com/wuji-technology/wuji-retargeting

  38. [38]

    K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition, 2015. URL https://arxiv.org/abs/1512.03385

  39. [39]

    C. Ott, R. Mukherjee, and Y. Nakamura. Unified impedance and admittance control. In 2010 IEEE International Conference on Robotics and Automation, pages 554–561. IEEE, 2010

  40. [40]

    G. Yan, J. Zhu, Y. Deng, S. Yang, R.-Z. Qiu, X. Cheng, M. Memmel, R. Krishna, A. Goyal, X. Wang, and D. Fox. Maniflow: A dexterous manipulation policy via flow matching. arXiv preprint arXiv:, 2025

  41. [41]

    T. L. Team, B. Burchfiel, H. Kress-Gazit, S. Feng, S. Ford, R. Tedrake, et al. A careful examination of large behavior models for multitask dexterous manipulation, 2025. URL https://arxiv.org/abs/2507.05331

Figure S1: Spatial torque descriptor

Appendix S1: System Identification Details

Mechanical setup. A Franka Emika Panda arm holds the dexterous hand a...