
arxiv: 2602.16712 · v2 · pith:EPO26OMQnew · submitted 2026-02-18 · 💻 cs.RO

One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation

Pith reviewed 2026-05-21 12:25 UTC · model grok-4.3

classification 💻 cs.RO
keywords: dexterous manipulation · canonical representation · cross-embodiment transfer · robotic hands · unified action space · zero-shot generalization · grasping policy · morphology parameterization

The pith

A parameterized canonical representation unifies structurally diverse dexterous hands for cross-embodiment policy learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a unified parameter space and canonical URDF format that together standardize both the description of hand morphology and the space of possible actions. This unification lets learning algorithms condition policies on a shared representation rather than retraining for each new hand design. A VAE trained over the space produces a latent manifold in which interpolations between different hands produce smooth, physically plausible changes in structure. Experiments show that a single grasping policy, conditioned on this representation, transfers zero-shot to unseen hand morphologies including a three-finger LEAP Hand in both simulation and real-world trials.
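
The latent interpolation described above can be illustrated with a minimal sketch. It assumes straight-line interpolation between two latent codes; the paper may use a different scheme. The 4-dimensional codes and the function name `interpolate_latents` are hypothetical, and the VAE decoder that would map each code back to canonical hand parameters is not shown.

```python
import numpy as np


def interpolate_latents(z_a, z_b, num_steps):
    """Return num_steps latent codes evenly spaced from z_a to z_b (inclusive)."""
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    # Interpolation weights run from 0 (pure z_a) to 1 (pure z_b).
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * z_a + a * z_b for a in alphas])


# Hypothetical latent codes for two hands (illustrative values only).
z_hand_a = np.array([0.0, 1.0, -1.0, 2.0])
z_hand_b = np.array([1.0, 0.0, 1.0, 0.0])
path = interpolate_latents(z_hand_a, z_hand_b, num_steps=5)
```

In a full pipeline, each row of `path` would be passed through the trained decoder to obtain an intermediate morphology.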

Core claim

The authors claim that a single parameterized canonical representation, consisting of a unified parameter space and a canonical URDF format, captures essential morphological and kinematic variations while standardizing the action space and preserving dynamic properties of the original URDFs. Conditioning policies on this representation enables effective cross-embodiment learning, as demonstrated by a grasping policy that achieves 81.9 percent zero-shot success on an unseen three-finger LEAP Hand, with evaluation spanning simulation and real-world tasks.

What carries the argument

Parameterized canonical representation comprising a unified parameter space for morphology and kinematics plus a canonical URDF that standardizes actions while preserving original dynamics.

If this is right

  • A single policy can be trained once and deployed on multiple hand designs by feeding the appropriate canonical parameters at inference time.
  • Interpolation in the learned latent manifold produces intermediate hand morphologies that remain kinematically valid and dynamically consistent.
  • Action spaces become directly comparable across embodiments, simplifying reward design and data sharing for cross-hand learning.
  • Zero-shot transfer becomes feasible for any hand whose parameters lie within the spanned space without additional fine-tuning.
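
As a rough illustration of the first point, the sketch below conditions one set of policy weights on different hand parameter vectors by concatenating them with the observation. Everything here is an assumption for illustration: the concatenation scheme, the dimensions, the function names, and the single-layer stand-in for a learned network are not taken from the paper.

```python
import numpy as np


def build_policy_input(observation, hand_params):
    """Concatenate the task observation with the canonical hand parameters."""
    obs = np.asarray(observation, dtype=float).ravel()
    params = np.asarray(hand_params, dtype=float).ravel()
    return np.concatenate([obs, params])


def toy_policy(policy_input, weights, bias):
    """Single-layer stand-in for a learned policy network (hypothetical)."""
    return np.tanh(weights @ policy_input + bias)


rng = np.random.default_rng(0)
obs_dim, param_dim, action_dim = 6, 3, 4  # illustrative sizes only

# One shared set of weights, reused for every hand.
weights = 0.1 * rng.normal(size=(action_dim, obs_dim + param_dim))
bias = np.zeros(action_dim)

observation = np.linspace(-1.0, 1.0, obs_dim)
params_hand_a = np.array([0.5, 0.2, 0.3])  # hypothetical canonical parameters
params_hand_b = np.array([0.1, 0.9, 0.4])  # hypothetical canonical parameters

action_a = toy_policy(build_policy_input(observation, params_hand_a), weights, bias)
action_b = toy_policy(build_policy_input(observation, params_hand_b), weights, bias)
```

Because the action vector has the same shape for both hands, swapping the parameter vector at inference time is the only change needed to target a different morphology in this toy setup.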

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future work could extend the same canonical form to full arm-plus-hand systems or to non-anthropomorphic grippers by adding a small number of additional parameters.
  • The latent manifold might support morphology optimization: searching for an ideal hand shape for a given task by moving within the learned space rather than enumerating discrete designs.
  • If the representation proves sufficient, simulation datasets collected on one canonical hand could be reused to train policies for many physical hands with minimal domain adaptation.

Load-bearing premise

The chosen parameters fully describe the morphological and kinematic features that matter for successful policy transfer across hand designs.

What would settle it

Train the same conditioned policy on a new hand whose kinematic structure falls outside the defined parameter space and measure whether zero-shot success drops sharply below the reported rates on covered morphologies.

Figures

Figures reproduced from arXiv: 2602.16712 by Mingyu Ding, Yunchao Yao, Zhenyu Wei.

Figure 1: We introduce a canonical hand representation that unifies diverse dexterous hands into a shared parameter space and …
Figure 2: Comparison of canonical and original URDFs across five dexterous hands with different finger numbers and handedness.
Figure 4: Coordinate frame inconsistencies in URDFs. (a) Global …
Figure 5: Visualization of latent-space interpolation between two dexterous hands. Canonical URDFs are shown at the ends, …
Figure 6: Two-stage cross-embodiment grasp generation pipeline.
Figure 7: Smoothed log of training reward over simulation steps
Figure 8: Real-world grasping objects and results.
Figure 9: Gradient magnitude visualization.
    We evaluate both models trained on the canonical dataset and zero-shot models that have never seen the target hand variants. As summarized in Table VI, the trained models achieve high grasp success rates, demonstrating that the canonical hand representation preserves the essential dynamics and physical fidelity of the original hands, and that sim-to-real transfer is relia…
Figure 10: Visualization of in-hand reorientation under the original and canonical URDFs. Top: LEAP Hand; bottom: Shadow Hand.
Table XIII: Observation states used for policy training (observation: description).
  • $\tilde{q} \in \mathbb{R}^{n_{\mathrm{dof}}}$: Normalized hand joint positions
  • $q^{\mathrm{tar}} \in \mathbb{R}^{n_{\mathrm{dof}}}$: Normalized target joint (action)
  • $p_{\mathrm{cube}} \in \mathbb{R}^{3}$: Cube position
  • $r_{\mathrm{cube}} \in \mathbb{R}^{3}$: Cube orientation (Euler angles)
  • $\dot{q} \in \mathbb{R}^{n_{\mathrm{dof}}}$: Hand joint velocities
  • $v_{\mathrm{cube}} \in \mathbb{R}^{3}$: Cube line…
Figure 11: Grasp visualizations of the canonical URDF for the Allegro, …
Figure 13: Visualization of canonical LEAP Hand variants.
Figure 14: Visualization of real-world experiment (I).
Figure 15: Visualization of real-world experiment (II).
Original abstract

Dexterous manipulation policies today largely assume fixed hand designs, severely restricting their generalization to new embodiments with varied kinematic and structural layouts. To overcome this limitation, we introduce a parameterized canonical representation that unifies a broad spectrum of dexterous hand architectures. It comprises a unified parameter space and a canonical URDF format, offering three key advantages. 1) The parameter space captures essential morphological and kinematic variations for effective conditioning in learning algorithms. 2) A structured latent manifold can be learned over our space, where interpolations between embodiments yield smooth and physically meaningful morphology transitions. 3) The canonical URDF standardizes the action space while preserving dynamic and functional properties of the original URDFs, enabling efficient and reliable cross-embodiment policy learning. We validate these advantages through extensive analysis and experiments, including grasp policy replay, VAE latent encoding, and cross-embodiment zero-shot transfer. Specifically, we train a VAE on the unified representation to obtain a compact, semantically rich latent embedding, and develop a grasping policy conditioned on the canonical representation that generalizes across dexterous hands. We demonstrate, through simulation and real-world tasks on unseen morphologies (e.g., 81.9% zero-shot success rate on 3-finger LEAP Hand), that our framework unifies both the representational and action spaces of structurally diverse hands, providing a scalable foundation for cross-hand learning toward universal dexterous manipulation. Project Page: https://zhenyuwei2003.github.io/OHRA/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces a parameterized canonical representation for unifying diverse dexterous hand architectures via a unified parameter space and canonical URDF format. It learns a VAE over this space to produce a structured latent manifold and conditions a grasping policy on the canonical representation, reporting zero-shot transfer to unseen morphologies including an 81.9% success rate on the 3-finger LEAP Hand in simulation and real-world tasks.

Significance. If the unification is shown to preserve essential dynamics, the framework could provide a practical foundation for cross-embodiment policy learning in dexterous manipulation, reducing the need for hand-specific retraining. The combination of morphological parameterization, latent interpolation, and empirical zero-shot results represents a concrete step toward scalable universal manipulation policies.

major comments (1)
  1. [Abstract] The central claim that the canonical URDF 'standardizes the action space while preserving dynamic and functional properties of the original URDFs' is load-bearing for the reported zero-shot transfer (e.g., 81.9% on LEAP Hand) but receives no quantitative validation such as forward-dynamics error, mass/inertia mismatch, joint-limit fidelity, or grasp-quality metrics between original and canonical URDFs. Without such checks, distortions in contact dynamics or actuator behavior could undermine the transfer assumption.
minor comments (2)
  1. [Abstract] The abstract states results from VAE encoding and policy conditioning but omits architecture details, training hyperparameters, baseline comparisons, and error bars or trial counts for the 81.9% figure.
  2. Notation for the unified parameter space should be introduced with an explicit table or equation listing all morphological and kinematic parameters and their ranges.
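
Minor comment 1 asks for error bars and trial counts. One standard way to attach an interval to a success rate is the Wilson score interval, sketched below. The counts used in the example are purely illustrative; the paper's actual trial counts are not stated in the abstract.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z ** 2 / (4 * trials ** 2))
    ) / denom
    return center - half_width, center + half_width


# Hypothetical counts, chosen only to illustrate the calculation.
low, high = wilson_interval(successes=82, trials=100)
```

Reporting the interval next to the point estimate makes it clear how much a headline success rate could move with a different set of trials.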

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below and outline the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the canonical URDF 'standardizes the action space while preserving dynamic and functional properties of the original URDFs' is load-bearing for the reported zero-shot transfer (e.g., 81.9% on LEAP Hand) but receives no quantitative validation such as forward-dynamics error, mass/inertia mismatch, joint-limit fidelity, or grasp-quality metrics between original and canonical URDFs. Without such checks, distortions in contact dynamics or actuator behavior could undermine the transfer assumption.

    Authors: We agree that the manuscript would be strengthened by explicit quantitative validation of the dynamic and functional preservation between original and canonical URDFs. The canonical URDF is constructed by retargeting joint axes, link geometries, and actuator parameters from each source URDF into a standardized kinematic template while retaining the original numerical values for masses, inertias, joint limits, and friction coefficients; this design choice is intended to minimize distortion. However, we did not report direct error metrics in the initial submission. In the revised version we will add a dedicated analysis (new subsection in Section 4 or Appendix) that computes: (i) forward-dynamics rollout error (position/velocity RMSE over 100 random torque sequences), (ii) mass/inertia mismatch (relative L2 error per link), (iii) joint-limit fidelity (percentage of limits preserved exactly), and (iv) grasp-quality metrics (epsilon and volume of the grasp wrench space) evaluated on a common set of 50 grasps for each hand. These results will be presented alongside the existing zero-shot transfer numbers to directly address the concern. The 81.9% success on the unseen LEAP Hand remains an empirical demonstration of practical transfer, but we concur that the additional metrics will make the preservation claim more rigorous.

    revision: yes
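
Metrics (i) to (iii) in the rebuttal are simple to state in code. The sketch below is one possible reading, not the authors' implementation: it assumes the original and canonical trajectories, inertial values, and joint limits have already been extracted into arrays with matching joint and link ordering. All function names and example values are hypothetical.

```python
import numpy as np


def rollout_rmse(traj_original, traj_canonical):
    """RMSE between two rollouts of shape (timesteps, num_joints)."""
    a = np.asarray(traj_original, dtype=float)
    b = np.asarray(traj_canonical, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


def relative_l2_error(reference, candidate):
    """Relative L2 error of candidate against reference (e.g. one link's inertia)."""
    ref = np.asarray(reference, dtype=float)
    cand = np.asarray(candidate, dtype=float)
    return float(np.linalg.norm(cand - ref) / np.linalg.norm(ref))


def joint_limit_fidelity(limits_original, limits_canonical, atol=0.0):
    """Percentage of joints whose (lower, upper) limits match within atol."""
    a = np.asarray(limits_original, dtype=float)
    b = np.asarray(limits_canonical, dtype=float)
    # A joint counts as preserved only if both its lower and upper limit match.
    matches = np.all(np.isclose(a, b, rtol=0.0, atol=atol), axis=1)
    return 100.0 * float(np.mean(matches))


# Tiny illustrative inputs (not data from the paper).
rmse = rollout_rmse([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 3.0]])
inertia_err = relative_l2_error([3.0, 4.0], [3.0, 4.0])
fidelity = joint_limit_fidelity([[-1.0, 1.0], [0.0, 2.0]], [[-1.0, 1.0], [0.0, 2.5]])
```

With `atol=0.0` the fidelity check requires exact equality, which matches the rebuttal's "preserved exactly" wording; a small tolerance may be more practical if limits pass through floating-point conversion.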

Circularity Check

0 steps flagged

No circularity: new canonical representation and empirical cross-embodiment results are independent of fitted inputs or self-referential definitions.

full rationale

The paper introduces a parameterized canonical representation and canonical URDF as design choices, then reports empirical results from VAE latent encoding and conditioned grasping policies that achieve zero-shot transfer (e.g., 81.9% on LEAP Hand). No derivation step reduces a claimed prediction or preservation property to a quantity defined by the same fitted parameters or by construction within the paper's equations. The unification and preservation claims are supported by separate validation experiments rather than tautological equivalence, making the central claims self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the existence and utility of a unified parameter space and canonical URDF that preserve key properties; these are introduced by the paper rather than derived from external benchmarks.

free parameters (1)
  • Morphological and kinematic parameters
    Define the unified space for capturing variations across hand architectures.
axioms (1)
  • [domain assumption] Interpolations in the latent manifold yield smooth and physically meaningful morphology transitions
    Invoked to support learning a structured latent space over the parameter space.
invented entities (1)
  • Canonical URDF format (no independent evidence)
    purpose: Standardizes action space while preserving dynamic and functional properties
    New standardized format introduced to unify structurally diverse hands.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. EgoKit: Towards Unified Low-Cost Egocentric Data Collection with Heterogeneous Devices

    cs.CV · 2026-05 · unverdicted · novelty 6.0

    EgoKit is a new toolkit and accessory set that unifies egocentric video collection with wrist views across heterogeneous consumer devices using a consistent interface and log format.
