Transferring Contact, Not Just Motion: Compliant Grasping Across Dexterous Hands

Michael Yip; Soofiyan Atar; Yao-Ting Huang

arxiv: 2606.15516 · v2 · pith:AWCUWGXWnew · submitted 2026-06-14 · 💻 cs.RO

Soofiyan Atar, Yao-Ting Huang, Michael Yip

Pith reviewed 2026-06-27 04:41 UTC · model grok-4.3

keywords dexterous grasping · cross-embodiment transfer · contact feedback · compliant manipulation · visuomotor policy · force calibration · hybrid control

The pith

Calibrated contact feedback lets grasping policies transfer across structurally different dexterous hands.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that stable dexterous grasping requires regulating contact forces as objects slip or become occluded, not just matching motion. It introduces a force-position interface that places motion intent in a shared hand-pose latent while calibrating each hand's effort signals through system identification into joint torques, fingertip forces, and per-finger load descriptors. A flow-matching visuomotor policy is trained on vision, proprioception, and these calibrated signals, with visual masking to promote force reliance under occlusion, and executed via a hybrid force-position controller. If correct, the same learned primitives become reusable across hands in long-horizon tasks without hand-specific retraining.

Core claim

A cross-embodiment force-position interface represents motion in a shared latent while mapping each hand's effort to comparable physical joint torques in N·m, fingertip forces, and compact load descriptors. A flow-matching policy trained on vision plus these signals, with structured masking, produces transferable compliant grasping whose primitives can be reused in extended manipulation sequences.

What carries the argument

The cross-embodiment force-position interface that converts raw effort signals into interchangeable fingertip forces and per-finger load descriptors via system identification.
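As a rough illustration of the torque-to-force step, the sketch below assumes the standard static relation tau = J^T f between one finger's joint torques and its fingertip force, and solves it in the least-squares sense. The function names, the 3-by-n translational Jacobian, and the load-share descriptor are hypothetical stand-ins; the abstract does not define the paper's actual descriptors.

```python
import numpy as np

def fingertip_force(jacobian: np.ndarray, joint_torques: np.ndarray) -> np.ndarray:
    """Least-squares fingertip force f that satisfies J^T f = tau for one finger.

    jacobian: (3, n_joints) translational fingertip Jacobian (assumed known).
    joint_torques: (n_joints,) calibrated joint torques in N·m.
    """
    force, *_ = np.linalg.lstsq(jacobian.T, joint_torques, rcond=None)
    return force

def load_share(forces: np.ndarray) -> np.ndarray:
    """Hypothetical per-finger descriptor: each finger's share of total force magnitude.

    forces: (n_fingers, 3) fingertip forces in N.
    """
    magnitudes = np.linalg.norm(forces, axis=1)
    total = magnitudes.sum()
    return magnitudes / total if total > 0 else np.zeros_like(magnitudes)

# Toy finger with four joints; the Jacobian values are made up for illustration.
J = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
f_true = np.array([1.0, -2.0, 0.5])
tau = J.T @ f_true
f_est = fingertip_force(J, tau)
```

Because the descriptor is a normalized share, it is unit-free and bounded, which is one plausible way to make loading comparable across hands with different force ranges.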

If this is right

Compliant grasping policies trained on one hand apply directly to structurally different hands.
Learned primitives integrate into longer manipulation sequences without additional hand-specific training.
Visual masking during training increases policy dependence on calibrated force feedback when vision is unreliable.
The hybrid controller keeps force targets consistent between demonstration collection and policy execution.
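The masking idea in the list above can be sketched as follows. This is a minimal illustration that assumes visual input arrives as per-camera patch tokens and that masking zeroes either individual patches or an entire camera stream; the function name and both drop probabilities are hypothetical, not values from the paper.

```python
import numpy as np

def mask_visual_tokens(tokens, rng, patch_drop_prob=0.3, camera_drop_prob=0.1):
    """Zero out random patch tokens and, occasionally, whole camera streams.

    tokens: (n_cameras, n_patches, dim) array of visual tokens.
    Returns the masked copy and the boolean keep mask of shape (n_cameras, n_patches).
    """
    n_cameras, n_patches, _ = tokens.shape
    # Patch-level masking: keep a patch when its draw is at or above the drop probability.
    keep = rng.random((n_cameras, n_patches)) >= patch_drop_prob
    # Stream-level masking: drop every patch of a camera selected for removal.
    camera_dropped = rng.random(n_cameras) < camera_drop_prob
    keep[camera_dropped] = False
    masked = np.where(keep[..., None], tokens, 0.0)
    return masked, keep

rng = np.random.default_rng(0)
tokens = np.ones((2, 8, 4))
masked, keep = mask_visual_tokens(tokens, rng)
```

Force and proprioception inputs are left untouched in this sketch, so a policy trained on the masked batch has to lean on them whenever the visual tokens are zeroed.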

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The calibration step could be extended to other sensor types to broaden cross-embodiment transfer beyond force.
Testing the same interface on hands with larger kinematic differences would reveal the limits of the load-descriptor representation.
If the per-finger descriptors prove sufficient, similar compact representations might simplify transfer for non-grasping contact tasks such as in-hand rotation.

Load-bearing premise

System identification can produce effort-to-torque mappings that yield comparable fingertip forces and per-finger load descriptors across heterogeneous hands.
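To make the premise concrete, here is a minimal sketch of one way such a mapping could be fitted. It assumes an affine model, torque ≈ gain · effort + offset, fitted per joint by ordinary least squares against reference torques from an external load cell. The affine form and the function names are assumptions for illustration; the paper's per-hand predictor may be richer.

```python
import numpy as np

def fit_effort_to_torque(effort, torque_ref):
    """Fit torque ≈ gain * effort + offset for one joint by least squares."""
    effort = np.asarray(effort, dtype=float)
    torque_ref = np.asarray(torque_ref, dtype=float)
    design = np.column_stack([effort, np.ones_like(effort)])
    (gain, offset), *_ = np.linalg.lstsq(design, torque_ref, rcond=None)
    return float(gain), float(offset)

def calibration_rmse(effort, torque_ref, gain, offset):
    """Root-mean-square error of the fitted map, in the units of torque_ref (N·m)."""
    predicted = gain * np.asarray(effort, dtype=float) + offset
    residual = predicted - np.asarray(torque_ref, dtype=float)
    return float(np.sqrt(np.mean(residual ** 2)))

# Synthetic, noise-free data purely for illustration.
effort = np.array([0.0, 1.0, 2.0, 3.0])
torque_ref = 0.5 * effort + 0.1
gain, offset = fit_effort_to_torque(effort, torque_ref)
error = calibration_rmse(effort, torque_ref, gain, offset)
```

Reporting an error of this kind per joint and per hand would be one direct way to check whether the calibrated signals are comparable across hands.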

What would settle it

A direct comparison on two different hands where the same policy succeeds with calibration but fails to maintain stable grasps when the calibration step is removed and raw effort signals are used instead.

Figures

Figures reproduced from arXiv: 2606.15516 by Michael Yip, Soofiyan Atar, Yao-Ting Huang.

**Figure 1.** Overview. We propose a unified force–position encoding for dexterous hands that transfers across embodiments. A VLM grounds language instructions into task keypoints, and a state machine dispatches each phase to an optimization-based primitive, enabling stable long-horizon grasping without an end-to-end policy. The bottom row shows primitive actions across embodiments on compliant and rigid objects.

**Figure 2.** Calibration setup. Fingertip tethered through a string and spring to a six-axis load cell for per-hand torque calibration. The contact channel calibrates the per-hand torque predictor G_h against a physical reference (…

**Figure 3.** Model pipeline. MARC is a DiT flow-matching policy [9] conditioned on a unified force–position state. The environment states (unified tuple s_t) are as described in Sec. 3. Structured masking is applied to visual patch tokens or entire camera streams during training, forcing the policy to act on force and proprioception under occlusion. MANO latent decoders are frozen, the DiT is trained from scratch, and t…

**Figure 4.** Handover. Wrist forces experienced during object handover.

**Figure 5.** Steady grasping. Wrist force, fingertip force, and f_d during steady grasping for a compliant/rigid object. We compose long-horizon manipulation from reusable contact-aware primitives rather than a single end-to-end policy. A vision-language model proposes task keypoints from the scene image and provides only high-level spatial grounding instructions; it does not condition πθ. A deterministic state machine…

**Figure 6.** Success rates for seen and unseen robotic hands across objects. Bars show per-object success over 10 trials on MARC; legend values indicate mean success. Unseen-hand generalization. To test transfer beyond the training set (…

Original abstract

Dexterous grasping depends on contact regulation, not motion alone. Stable manipulation requires fingers to maintain appropriate object loading as contacts slip, deform, or become visually occluded. Existing cross-embodiment dexterous policies unify motion through retargeted hand poses or latent actions, but force feedback remains tied to each hand's sensing and actuation, limiting transfer. This work introduces a cross-embodiment force-position interface for contact-aware manipulation across heterogeneous dexterous hands. Motion intent is represented in a shared hand-pose latent, while each hand's effort signal is calibrated through system identification into physical joint torque in N·m. These torques are mapped to fingertip forces and compact per-finger load descriptors, giving the policy comparable observations of where the hand should move and how the object is loaded. Using this interface, a flow-matching visuomotor policy is trained on vision, proprioception, and calibrated contact, with structured visual masking that encourages reliance on force under grasp-relevant occlusion. The same calibrated signal drives a hybrid force-position controller for demonstration collection and execution, keeping force targets consistent across training and deployment. Experiments across structurally different hands show that calibrated contact feedback enables transferable compliant grasping, with learned primitives reusable in long-horizon manipulation pipelines.
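For readers unfamiliar with flow matching, the sketch below shows the common linear-path variant: a noise sample is interpolated toward a demonstrated action, the network regresses the constant velocity between them, and sampling integrates the learned velocity with Euler steps. This is a generic illustration under that assumption, not the paper's training code; `predict_velocity` and `obs` are hypothetical placeholders for the policy network and its conditioning.

```python
import numpy as np

def flow_matching_pair(action, rng):
    """Sample (x_t, t, target velocity) on the straight path from noise to action."""
    noise = rng.standard_normal(action.shape)
    t = rng.random()
    x_t = (1.0 - t) * noise + t * action
    target_velocity = action - noise
    return x_t, t, target_velocity

def flow_matching_loss(predict_velocity, action, obs, rng):
    """Mean squared error between predicted and target velocity for one sample."""
    x_t, t, target_velocity = flow_matching_pair(action, rng)
    predicted = predict_velocity(x_t, t, obs)
    return float(np.mean((predicted - target_velocity) ** 2))

def sample_action(predict_velocity, obs, shape, rng, steps=10):
    """Integrate the velocity field from noise (t=0) to an action (t=1) with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * predict_velocity(x, i * dt, obs)
    return x

rng = np.random.default_rng(0)
action = np.array([0.2, -0.1, 0.4])
x_t, t, target_velocity = flow_matching_pair(action, rng)
```

In the setting described by the abstract, `obs` would bundle vision, proprioception, and the calibrated contact signals, and `action` would live in the shared force-position space.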

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main contribution is a calibrated force-position interface that standardizes contact signals across different dexterous hands via system ID to physical torques and load descriptors.

The key point is that this work moves beyond motion retargeting by adding a shared way to observe and control contact forces. They calibrate each hand's effort to joint torques in N·m, map those to fingertip forces and per-finger load descriptors, pair it with a shared pose latent, and use structured visual masking so the policy leans on force when vision is blocked. The same signals feed a hybrid force-position controller for both data collection and execution.
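A hybrid force-position controller of the kind mentioned here is often written with a selection matrix that assigns each task-space axis to either force regulation or position tracking. The sketch below follows that textbook structure for a single fingertip; the gains, the axis selection, and the function name are illustrative assumptions, and the paper's controller may differ.

```python
import numpy as np

def hybrid_command(jacobian, x, x_des, f, f_des, force_axes, kp=50.0, kf=0.5):
    """Joint torques from a selection-matrix hybrid force-position law.

    jacobian: (3, n_joints) fingertip Jacobian.
    x, x_des: (3,) measured and desired fingertip positions in m.
    f, f_des: (3,) measured and desired fingertip forces in N.
    force_axes: (3,) booleans, True where force is regulated instead of position.
    """
    S = np.diag(np.asarray(force_axes, dtype=float))
    I = np.eye(3)
    # Force-controlled axes: feedforward target plus proportional force error.
    force_term = S @ (f_des + kf * (f_des - f))
    # Position-controlled axes: proportional position error as a virtual spring.
    position_term = (I - S) @ (kp * (x_des - x))
    return jacobian.T @ (force_term + position_term)

# Toy case: regulate force along z, track position along x and y.
tau = hybrid_command(
    jacobian=np.eye(3),
    x=np.zeros(3), x_des=np.array([0.01, 0.0, 0.0]),
    f=np.array([0.0, 0.0, 1.0]), f_des=np.array([0.0, 0.0, 2.0]),
    force_axes=[False, False, True],
)
```

Feeding the calibrated fingertip force in as `f` during both demonstration collection and execution is what would keep force targets in the same physical units across the two phases.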

This setup is a reasonable response to the real problem that force feedback stays hand-specific while poses can be retargeted. The logic holds together: calibration makes the observations comparable, the masking encourages proper reliance on contact, and reusing the controller keeps training and deployment consistent.

The soft spot is the lack of visible numbers on calibration accuracy or transfer performance. The abstract states that experiments across structurally different hands show transferable compliant grasping and reusable primitives, but without error metrics on the mappings, ablation results, or direct comparisons to motion-only baselines, it is hard to judge how well the assumption of interchangeable load descriptors actually holds under kinematic and sensing differences.

This is for researchers focused on dexterous manipulation and cross-embodiment transfer. A reader looking for concrete ways to include force in transferable policies would find the interface design useful. It deserves a serious referee because the approach targets a known limitation with a method that could be tested and extended if the results support the claims.

I would send it to review.

Referee Report

0 major / 1 minor

Summary. The paper introduces a cross-embodiment force-position interface for compliant grasping across heterogeneous dexterous hands. Motion intent is encoded in a shared hand-pose latent while each hand's effort signal is calibrated via system identification to physical joint torques (N·m), then mapped to fingertip forces and compact per-finger load descriptors. A flow-matching visuomotor policy is trained on vision, proprioception, and these calibrated contact signals with structured visual masking; the same signals drive a hybrid force-position controller for both data collection and execution. Experiments on structurally different hands are claimed to show that calibrated contact enables transferable compliant grasping with primitives reusable in long-horizon pipelines.

Significance. If the system-identification calibration produces interchangeable fingertip-force and per-finger load observations across hardware, the work would meaningfully advance cross-embodiment transfer in contact-rich manipulation by decoupling force feedback from hand-specific sensing and actuation, going beyond motion-only retargeting. The hybrid controller and occlusion-aware masking are concrete design choices that could improve robustness.

minor comments (1)

[Abstract] The abstract states that experiments demonstrate transferable grasping but supplies no quantitative metrics, error bars, dataset sizes, ablation results, or baseline comparisons, preventing assessment of whether the calibration actually supports the transfer claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and for recognizing the potential of the cross-embodiment force-position interface to advance contact-rich transfer beyond motion-only retargeting. The recommendation of 'uncertain' is noted; we address the overall report below. No specific major comments were enumerated in the provided report, so we have no point-by-point rebuttals to offer at this stage.

Circularity Check

0 steps flagged

No significant circularity identified

The paper's pipeline calibrates raw effort signals via system identification to physical joint torques (N·m), then maps those to fingertip forces and per-finger load descriptors. This step references external physical units rather than any fitted parameter or self-referential definition, allowing the shared pose latent and hybrid controller to treat signals as interchangeable. No equations or claims reduce a prediction to its own inputs by construction, no self-citation chains are load-bearing, and no ansatz is smuggled in. The derivation remains self-contained against external physical benchmarks and falsifiable measurements.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are detailed. The central claim rests on the unverified effectiveness of the system-identification calibration step.

pith-pipeline@v0.9.1-grok · 5756 in / 1018 out tokens · 29716 ms · 2026-06-27T04:41:15.335405+00:00 · methodology

Reference graph

Works this paper leans on

41 extracted references · 2 canonical work pages

[1] H. Yuan, B. Zhou, Y. Fu, and Z. Lu. Cross-embodiment dexterous grasping with reinforcement learning, 2024. URL https://arxiv.org/abs/2410.02479
[2] G. Jiang, Y. Liang, J. Ye, J.-Y. Huang, C. Jing, R. Duan, P. Abbeel, X. Wang, and X. Zou. Cross-hand latent representation for vision-language-action models, 2026. URL https://arxiv.org/abs/2603.10158
[3] N. Hogan. Impedance control: An approach to manipulation: Part II - implementation. Journal of Dynamic Systems, Measurement, and Control, 107(1):8, 1985. doi:10.1115/1.3140713. URL https://doi.org/10.1115%2F1.3140713
[4] Z. Wu, R. A. Potamias, X. Zhang, Z. Zhang, J. Deng, and S. Luo. Cedex: Cross-embodiment dexterous grasp generation at scale from human-like contact representations, 2025. URL https://arxiv.org/abs/2509.24661
[5] E. Bauer, E. Nava, and R. K. Katzschmann. Latent action diffusion for cross-embodiment manipulation, 2026. URL https://arxiv.org/abs/2506.14608
[6] R. Bhirangi, V. Pattabiraman, E. Erciyes, Y. Cao, T. Hellebrekers, and L. Pinto. Anyskin: Plug-and-play skin sensing for robotic touch, 2024. URL https://arxiv.org/abs/2409.08276
[7] D. Zhang, C. Yuan, C. Wen, H. Zhang, J. Zhao, and Y. Gao. Kinedex: Learning tactile-informed visuomotor policies via kinesthetic teaching for dexterous manipulation, 2025. URL https://arxiv.org/abs/2505.01974
[8] H. Seraji. Adaptive hybrid control of manipulators. In Proceedings of the Workshop on Space Telerobotics, Volume 3, 1987
[9] D. McAllister, S. Ge, B. Yi, C. M. Kim, E. Weber, H. Choi, H. Feng, and A. Kanazawa. Flow matching policy gradients, 2025. URL https://arxiv.org/abs/2507.21053
[10] X. Fei, Z. Xu, H. Fang, T. Zhang, and L. Shao. T(r,o) grasp: Efficient graph diffusion of robot-object spatial transformation for cross-embodiment dexterous grasping, 2025. URL https://arxiv.org/abs/2510.12724
[11] H. Zhang, K. Y. Ma, M. Z. Shou, W. Lin, and Y. Wu. Machagrasp: Morphology-aware cross-embodiment dexterous hand articulation generation for grasping, 2026. URL https://arxiv.org/abs/2510.06068
[12] J. Romero, D. Tzionas, and M. J. Black. Embodied hands: modeling and capturing hands and bodies together. ACM Transactions on Graphics, 36(6):1–17, Nov. 2017. ISSN 1557-7368. doi:10.1145/3130800.3130883. URL http://dx.doi.org/10.1145/3130800.3130883
[13] MANUS Technology Group. High-precision hand tracking & mocap gloves - manus. https://www.manus-meta.com/, 2025. Accessed: 2025-09-13
[14] Z. Jiang, Y. Xie, K. Lin, Z. Xu, W. Wan, A. Mandlekar, L. Fan, and Y. Zhu. Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning, 2025. URL https://arxiv.org/abs/2410.24185
[15] T. Tao, M. K. Srirama, J. J. Liu, K. Shaw, and D. Pathak. Dexwild: Dexterous human interactions for in-the-wild robot policies, 2026. URL https://arxiv.org/abs/2505.07813
[16] M. Xu, H. Zhang, Y. Hou, Z. Xu, L. Fan, M. Veloso, and S. Song. Dexumi: Using human hand as the universal manipulation interface for dexterous manipulation, 2025. URL https://arxiv.org/abs/2505.21864
[17] H.-S. Fang, B. Romero, Y. Xie, A. Hu, B.-R. Huang, J. Alvarez, M. Kim, G. Margolis, K. Anbarasu, M. Tomizuka, E. Adelson, and P. Agrawal. Dexop: A device for robotic transfer of dexterous human manipulation, 2025. URL https://arxiv.org/abs/2509.04441
[18] K. Zhu, F. Bai, Y. Xiang, Y. Cai, X. Chen, R. Li, X. Wang, H. Dong, Y. Yang, X. Fan, and Y. Chen. Dexflywheel: A scalable and self-improving data generation framework for dexterous manipulation, 2025. URL https://arxiv.org/abs/2509.23829
[19] S. Atar, D. Huang, F. Richter, and M. Yip. In-hand manipulation of articulated tools with dexterous robot hands with sim-to-real transfer, 2026. URL https://arxiv.org/abs/2509.23075
[20] H. Shi, S. Hu, Y. Hou, W. Wang, K. Liu, and S. Song. Minimalist compliance control, 2026. URL https://arxiv.org/abs/2603.00913
[21] S. Chen, J. Bohg, and C. K. Liu. Springgrasp: An optimization pipeline for robust and compliant dexterous pre-grasp synthesis, 2024
[22] R. Chen, M. Mukadam, M. Kaess, T. Wu, F. R. Hogan, J. Malik, and A. Sharma. Ptld: Sim-to-real privileged tactile latent distillation for dexterous manipulation, 2026. URL https://arxiv.org/abs/2603.04531
[23] C. Higuera, A. Sharma, C. K. Bodduluri, T. Fan, P. Lancaster, M. Kalakrishnan, M. Kaess, B. Boots, M. Lambeta, T. Wu, and M. Mukadam. Sparsh: Self-supervised touch representations for vision-based tactile sensing, 2024. URL https://arxiv.org/abs/2410.24090
[24] X. Chen, Y. Pan, M. Li, and X. Ding. Dexvitac: Collecting human visuo-tactile-kinematic demonstrations for contact-rich dexterous manipulation, 2026. URL https://arxiv.org/abs/2603.17851
[25] W. Peebles and S. Xie. Scalable diffusion models with transformers, 2023. URL https://arxiv.org/abs/2212.09748
[26] E. Collaboration, A. O’Neill, A. Rehman, A. Gupta, A. Maddukuri, A. Gupta, et al. Open x-embodiment: Robotic learning datasets and rt-x models, 2025. URL https://arxiv.org/abs/2310.08864
[27] O. M. Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, L. Y. Chen, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine. Octo: An open-source generalist robot policy, 2024. URL https://arxiv.org/abs/2405.12213
[28] M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn. Openvla: An open-source vision-language-action model, 2024. URL https://arxiv.org/abs/2406.09246
[29] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, et al. π0: A vision-language-action flow model for general robot control, 2026. URL https://arxiv.org/abs/2410.24164
[30] S. Liu, L. Wu, B. Li, H. Tan, H. Chen, Z. Wang, K. Xu, H. Su, and J. Zhu. Rdt-1b: a diffusion foundation model for bimanual manipulation, 2025. URL https://arxiv.org/abs/2410.07864
[31] J. Wen, Y. Zhu, J. Li, Z. Tang, C. Shen, and F. Feng. Dexvla: Vision-language model with plug-in diffusion expert for general robot control, 2025. URL https://arxiv.org/abs/2502.05855
[32] H. Liu, S. Guo, P. Mai, J. Cao, H. Li, and J. Ma. Robodexvlm: Visual language model-enabled task planning and motion control for dexterous robot manipulation, 2025. URL https://arxiv.org/abs/2503.01616
[33] Y. Zhong, X. Huang, R. Li, C. Zhang, Z. Chen, T. Guan, F. Zeng, K. N. Lui, Y. Ye, Y. Liang, Y. Yang, and Y. Chen. Dexgraspvla: A vision-language-action framework towards general dexterous grasping, 2025. URL https://arxiv.org/abs/2502.20900
[34] V. de Bakker, J. Hejna, T. G. W. Lum, O. Celik, A. Taranovic, D. Blessing, G. Neumann, J. Bohg, and D. Sadigh. Scaffolding dexterous manipulation with vision-language models, 2026. URL https://arxiv.org/abs/2506.19212
[35] J. Romero, D. Tzionas, and M. J. Black. Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6), Nov. 2017
[36] A. Handa, K. V. Wyk, W. Yang, J. Liang, Y.-W. Chao, Q. Wan, S. Birchfield, N. Ratliff, and D. Fox. Dexpilot: Vision based teleoperation of dexterous robotic hand-arm system, 2019. URL https://arxiv.org/abs/1910.03135
[37] G. He and W. Zhang. Wujihand retargeting, 2026. URL https://github.com/wuji-technology/wuji-retargeting
[38] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition, 2015. URL https://arxiv.org/abs/1512.03385
[39] C. Ott, R. Mukherjee, and Y. Nakamura. Unified impedance and admittance control. In 2010 IEEE international conference on robotics and automation, pages 554–561. IEEE, 2010
[40] G. Yan, J. Zhu, Y. Deng, S. Yang, R.-Z. Qiu, X. Cheng, M. Memmel, R. Krishna, A. Goyal, X. Wang, and D. Fox. Maniflow: A dexterous manipulation policy via flow matching. arXiv preprint arXiv:, 2025
[41] T. L. Team, B. Burchfiel, H. Kress-Gazit, S. Feng, S. Ford, R. Tedrake, et al. A careful examination of large behavior models for multitask dexterous manipulation, 2025. URL https://arxiv.org/abs/2507.05331

Figure S1: Spatial torque descriptor. Appendix S1, System Identification Details. Mechanical setup. A Franka Emika Panda arm holds the dexterous hand a...
