One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation

Mingyu Ding; Yunchao Yao; Zhenyu Wei

arxiv: 2602.16712 · v2 · pith:EPO26OMQnew · submitted 2026-02-18 · 💻 cs.RO

Pith reviewed 2026-05-21 12:25 UTC · model grok-4.3

keywords: dexterous manipulation, canonical representation, cross-embodiment transfer, robotic hands, unified action space, zero-shot generalization, grasping policy, morphology parameterization

The pith

A parameterized canonical representation unifies structurally diverse dexterous hands for cross-embodiment policy learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a unified parameter space and canonical URDF format that together standardize both the description of hand morphology and the space of possible actions. This unification lets learning algorithms condition policies on a shared representation rather than retraining for each new hand design. A VAE trained over the space produces a latent manifold in which interpolations between different hands produce smooth, physically plausible changes in structure. Experiments show that a single grasping policy, conditioned on this representation, transfers zero-shot to unseen hand morphologies including a three-finger LEAP Hand in both simulation and real-world trials.
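As a rough illustration of the interpolation idea, the sketch below walks a straight line between two latent codes. It assumes plain linear interpolation and uses made-up three-dimensional codes; the function name `interpolate_latents` and all values are hypothetical, and the paper's VAE encoder and decoder are not reproduced here.

```python
import numpy as np


def interpolate_latents(z_a, z_b, num_steps=5):
    """Linearly interpolate between two latent codes, endpoints included.

    Hypothetical helper: the paper's actual interpolation scheme may differ.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    if z_a.shape != z_b.shape:
        raise ValueError("latent codes must have the same shape")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    # alpha = 0 gives z_a, alpha = 1 gives z_b
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * z_a + a * z_b for a in alphas])


# Made-up latent codes standing in for two encoded hands (illustrative only).
z_hand_a = np.array([0.0, 1.0, -2.0])
z_hand_b = np.array([4.0, -1.0, 2.0])
path = interpolate_latents(z_hand_a, z_hand_b, num_steps=5)
print(path)
```

In a full pipeline, each row of `path` would be passed through the decoder to recover canonical hand parameters, and the resulting hands could then be checked for kinematic validity.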

Core claim

The authors claim that a single parameterized canonical representation, consisting of a unified parameter space and a canonical URDF format, captures essential morphological and kinematic variations while standardizing the action space and preserving the dynamic properties of the original URDFs. Conditioning policies on this representation enables cross-embodiment learning: the abstract reports a grasping policy that reaches an 81.9% zero-shot success rate on an unseen three-finger LEAP Hand, with validation in both simulation and real-world tasks.

What carries the argument

Parameterized canonical representation comprising a unified parameter space for morphology and kinematics plus a canonical URDF that standardizes actions while preserving original dynamics.

If this is right

A single policy can be trained once and deployed on multiple hand designs by feeding the appropriate canonical parameters at inference time.
Interpolation in the learned latent manifold produces intermediate hand morphologies that remain kinematically valid and dynamically consistent.
Action spaces become directly comparable across embodiments, simplifying reward design and data sharing for cross-hand learning.
Zero-shot transfer becomes feasible for any hand whose parameters lie within the spanned space without additional fine-tuning.
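The first point above, one policy fed different canonical parameters at inference time, can be sketched as follows. This is a minimal sketch that assumes conditioning by simple concatenation of the observation with a morphology vector; the class `TinyConditionedPolicy`, the dimensions, and the parameter values are all hypothetical, and the weights are random rather than trained. The paper's actual policy architecture is not described on this page.

```python
import numpy as np


def build_policy_input(observation, hand_params):
    """Concatenate the observation with the hand's morphology vector."""
    obs = np.asarray(observation, dtype=float).ravel()
    params = np.asarray(hand_params, dtype=float).ravel()
    return np.concatenate([obs, params])


class TinyConditionedPolicy:
    """Two-layer MLP with random (untrained) weights, for illustration only."""

    def __init__(self, obs_dim, param_dim, action_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + param_dim
        self.w1 = rng.normal(0.0, 0.1, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden, action_dim))
        self.b2 = np.zeros(action_dim)

    def act(self, observation, hand_params):
        x = build_policy_input(observation, hand_params)
        h = np.tanh(x @ self.w1 + self.b1)
        # tanh keeps every action component in [-1, 1]
        return np.tanh(h @ self.w2 + self.b2)


# The same weights are reused for two hypothetical hands.
policy = TinyConditionedPolicy(obs_dim=6, param_dim=4, action_dim=3)
obs = np.ones(6)
hand_a = np.array([4.0, 0.10, 0.05, 1.0])  # made-up morphology vector
hand_b = np.array([3.0, 0.12, 0.06, 0.0])  # made-up morphology vector
action_a = policy.act(obs, hand_a)
action_b = policy.act(obs, hand_b)
print(action_a, action_b)
```

The point of the sketch is only that the morphology vector is an input rather than something baked into the weights, so switching hands changes the input, not the network.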

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future work could extend the same canonical form to full arm-plus-hand systems or to non-anthropomorphic grippers by adding a small number of additional parameters.
The latent manifold might support morphology optimization: searching for an ideal hand shape for a given task by moving within the learned space rather than enumerating discrete designs.
If the representation proves sufficient, simulation datasets collected on one canonical hand could be reused to train policies for many physical hands with minimal domain adaptation.

Load-bearing premise

The chosen parameters fully describe the morphological and kinematic features that matter for successful policy transfer across hand designs.

What would settle it

Evaluate the same conditioned policy, without fine-tuning, on a new hand whose kinematic structure falls outside the defined parameter space, and measure whether zero-shot success drops sharply below the rates reported for covered morphologies.

Figures

Figures reproduced from arXiv: 2602.16712 by Mingyu Ding, Yunchao Yao, Zhenyu Wei.

**Figure 1.** We introduce a canonical hand representation that unifies diverse dexterous hands into a shared parameter space and…

**Figure 2.** Comparison of canonical and original URDFs across five dexterous hands with different finger numbers and handedness.

**Figure 4.** Coordinate frame inconsistencies in URDFs. (a) Global…

**Figure 5.** Visualization of latent-space interpolation between two dexterous hands. Canonical URDFs are shown at the ends,…

**Figure 6.** Two-stage cross-embodiment grasp generation pipeline.

**Figure 7.** Smoothed log of training reward over simulation steps

**Figure 8.** Real-world grasping objects and results.

**Figure 9.** Gradient magnitude visualization.

We evaluate both models trained on the canonical dataset and zero-shot models that have never seen the target hand variants. As summarized in Table VI, the trained models achieve high grasp success rates, demonstrating that the canonical hand representation preserves the essential dynamics and physical fidelity of the original hands, and that sim-to-real transfer is relia…

**Figure 10.** Visualization of in-hand reorientation under the original and canonical URDFs. Top: LEAP Hand; bottom: Shadow Hand.

TABLE XIII: Observation states used for policy training.

| Observation | Description |
| --- | --- |
| q̃ ∈ ℝ^ndof | Normalized hand joint positions |
| q^tar ∈ ℝ^ndof | Normalized target joint (action) |
| p_cube ∈ ℝ³ | Cube position |
| r_cube ∈ ℝ³ | Cube orientation (Euler angles) |
| q̇ ∈ ℝ^ndof | Hand joint velocities |
| v_cube ∈ ℝ³ | Cube line… |

**Figure 11.** Grasp visualizations of the canonical URDF for the Allegro,…

**Figure 13.** Visualization of canonical LEAP Hand variants.

**Figure 14.** Visualization of real-world experiment (I).

**Figure 15.** Visualization of real-world experiment (II).

Original abstract

Dexterous manipulation policies today largely assume fixed hand designs, severely restricting their generalization to new embodiments with varied kinematic and structural layouts. To overcome this limitation, we introduce a parameterized canonical representation that unifies a broad spectrum of dexterous hand architectures. It comprises a unified parameter space and a canonical URDF format, offering three key advantages. 1) The parameter space captures essential morphological and kinematic variations for effective conditioning in learning algorithms. 2) A structured latent manifold can be learned over our space, where interpolations between embodiments yield smooth and physically meaningful morphology transitions. 3) The canonical URDF standardizes the action space while preserving dynamic and functional properties of the original URDFs, enabling efficient and reliable cross-embodiment policy learning. We validate these advantages through extensive analysis and experiments, including grasp policy replay, VAE latent encoding, and cross-embodiment zero-shot transfer. Specifically, we train a VAE on the unified representation to obtain a compact, semantically rich latent embedding, and develop a grasping policy conditioned on the canonical representation that generalizes across dexterous hands. We demonstrate, through simulation and real-world tasks on unseen morphologies (e.g., 81.9% zero-shot success rate on 3-finger LEAP Hand), that our framework unifies both the representational and action spaces of structurally diverse hands, providing a scalable foundation for cross-hand learning toward universal dexterous manipulation. Project Page: https://zhenyuwei2003.github.io/OHRA/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper offers a parameterized canonical representation to unify dexterous hands for cross-embodiment policy transfer, with reported 81.9% zero-shot success on an unseen hand, but the dynamic preservation claim in the canonical URDF lacks quantitative backing.

The main thing to know is that this work builds a unified parameter space and a canonical URDF so that grasping policies trained on one hand setup transfer to structurally different ones, with a reported 81.9% zero-shot success rate on the 3-finger LEAP Hand. The authors encode the space with a VAE to obtain a latent manifold and condition policies on the canonical form, which they say standardizes actions while keeping the original dynamics intact. That combination of a unified space, latent interpolation, and cross-embodiment results is the concrete step beyond fixed-hand baselines. The experiments cover grasp replay, latent analysis, and both simulation and hardware transfer, which gives the claims some empirical grounding.

The soft spot is the missing quantitative test that the canonical URDF actually preserves mass distribution, joint limits, actuator models, and contact behavior from the source URDFs. Without forward-dynamics error metrics or grasp-quality comparisons before and after the mapping, it is hard to rule out that some of the transfer success stems from a lossy unification rather than a faithful one. The abstract also omits full baselines, variance numbers, and data-processing details, so the 81.9% figure is plausible but not yet easy to stress-test.

This is for robotics groups working on dexterous manipulation who need policies that survive embodiment changes. Readers focused on representation learning or sim-to-real transfer would find the latent manifold and conditioning approach useful to examine. It deserves a serious referee because the problem is real and the reported transfer numbers are high enough to warrant checking the implementation and the preservation assumptions in detail.

Referee Report

1 major / 2 minor

Summary. The paper introduces a parameterized canonical representation for unifying diverse dexterous hand architectures via a unified parameter space and canonical URDF format. It learns a VAE over this space to produce a structured latent manifold and conditions a grasping policy on the canonical representation, reporting zero-shot transfer to unseen morphologies including an 81.9% success rate on the 3-finger LEAP Hand in simulation and real-world tasks.

Significance. If the unification is shown to preserve essential dynamics, the framework could provide a practical foundation for cross-embodiment policy learning in dexterous manipulation, reducing the need for hand-specific retraining. The combination of morphological parameterization, latent interpolation, and empirical zero-shot results represents a concrete step toward scalable universal manipulation policies.

major comments (1)

[Abstract] The central claim that the canonical URDF 'standardizes the action space while preserving dynamic and functional properties of the original URDFs' is load-bearing for the reported zero-shot transfer (e.g., 81.9% on LEAP Hand) but receives no quantitative validation such as forward-dynamics error, mass/inertia mismatch, joint-limit fidelity, or grasp-quality metrics between original and canonical URDFs. Without such checks, distortions in contact dynamics or actuator behavior could undermine the transfer assumption.

minor comments (2)

[Abstract] The abstract states results from VAE encoding and policy conditioning but omits architecture details, training hyperparameters, baseline comparisons, and error bars or trial counts for the 81.9% figure.
Notation for the unified parameter space should be introduced with an explicit table or equation listing all morphological and kinematic parameters and their ranges.
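To make the request for error bars and trial counts concrete, a success rate can be reported with a binomial confidence interval. The sketch below computes a Wilson score interval; the function name `wilson_interval` is hypothetical, and the counts in the example (82 successes out of 100 trials) are purely illustrative, since the paper's actual trial count is not stated on this page.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials**2))
        / denom
    )
    return center - half_width, center + half_width


# Illustrative counts only; NOT taken from the paper.
low, high = wilson_interval(successes=82, trials=100)
print(f"success rate 0.82, interval [{low:.3f}, {high:.3f}]")
```

The interval narrows as the number of trials grows, which is why the trial count matters as much as the headline percentage.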

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below and outline the revisions we will make.

Referee: [Abstract] The central claim that the canonical URDF 'standardizes the action space while preserving dynamic and functional properties of the original URDFs' is load-bearing for the reported zero-shot transfer (e.g., 81.9% on LEAP Hand) but receives no quantitative validation such as forward-dynamics error, mass/inertia mismatch, joint-limit fidelity, or grasp-quality metrics between original and canonical URDFs. Without such checks, distortions in contact dynamics or actuator behavior could undermine the transfer assumption.

Authors: We agree that the manuscript would be strengthened by explicit quantitative validation of the dynamic and functional preservation between original and canonical URDFs. The canonical URDF is constructed by retargeting joint axes, link geometries, and actuator parameters from each source URDF into a standardized kinematic template while retaining the original numerical values for masses, inertias, joint limits, and friction coefficients; this design choice is intended to minimize distortion. However, we did not report direct error metrics in the initial submission. In the revised version we will add a dedicated analysis (a new subsection in Section 4 or the Appendix) that computes:
(i) forward-dynamics rollout error (position/velocity RMSE over 100 random torque sequences);
(ii) mass/inertia mismatch (relative L2 error per link);
(iii) joint-limit fidelity (percentage of limits preserved exactly);
(iv) grasp-quality metrics (epsilon and volume of the grasp wrench space) evaluated on a common set of 50 grasps for each hand.
These results will be presented alongside the existing zero-shot transfer numbers to directly address the concern. The 81.9% success on the unseen LEAP Hand remains an empirical demonstration of practical transfer, but we concur that the additional metrics will make the preservation claim more rigorous.

Revision: yes
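Three of the metrics named in the rebuttal are simple enough to sketch directly. The code below is a minimal sketch under stated assumptions: trajectories and parameters are already available as arrays, joints and links are listed in the same order for the original and canonical URDFs, and the function names (`rollout_rmse`, `relative_l2_error`, `joint_limit_fidelity`) are hypothetical. Simulator rollouts and the grasp wrench space metrics are not covered, and all numbers are toy values.

```python
import numpy as np


def rollout_rmse(traj_original, traj_canonical):
    """RMSE between two rollouts of identical shape, e.g. (timesteps, n_dof)."""
    a = np.asarray(traj_original, dtype=float)
    b = np.asarray(traj_canonical, dtype=float)
    if a.shape != b.shape:
        raise ValueError("trajectories must have the same shape")
    return float(np.sqrt(np.mean((a - b) ** 2)))


def relative_l2_error(reference, candidate, eps=1e-12):
    """||candidate - reference|| / ||reference|| for one link's parameters."""
    ref = np.asarray(reference, dtype=float).ravel()
    cand = np.asarray(candidate, dtype=float).ravel()
    return float(np.linalg.norm(cand - ref) / max(np.linalg.norm(ref), eps))


def joint_limit_fidelity(limits_original, limits_canonical, tol=0.0):
    """Percentage of joints whose (lower, upper) limits match within tol."""
    a = np.asarray(limits_original, dtype=float)
    b = np.asarray(limits_canonical, dtype=float)
    if a.shape != b.shape:
        raise ValueError("limit arrays must have the same shape")
    # A joint counts as preserved only if both of its bounds match.
    preserved = np.all(np.abs(a - b) <= tol, axis=1)
    return 100.0 * float(np.mean(preserved))


# Toy data, for illustration only.
limits_src = [[-1.0, 1.0], [0.0, 2.0]]
limits_can = [[-1.0, 1.0], [0.0, 1.5]]
print(joint_limit_fidelity(limits_src, limits_can))
print(relative_l2_error([3.0, 4.0], [3.0, 4.0]))
print(rollout_rmse([[0.0, 0.0]], [[3.0, 4.0]]))
```

Reporting these per hand, alongside the transfer success rates, would let a reader see whether high transfer coincides with low dynamic mismatch.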

Circularity Check

0 steps flagged

No circularity: new canonical representation and empirical cross-embodiment results are independent of fitted inputs or self-referential definitions.

The paper introduces a parameterized canonical representation and canonical URDF as design choices, then reports empirical results from VAE latent encoding and conditioned grasping policies that achieve zero-shot transfer (e.g., 81.9% on LEAP Hand). No derivation step reduces a claimed prediction or preservation property to a quantity defined by the same fitted parameters or by construction within the paper's equations. The unification and preservation claims are supported by separate validation experiments rather than tautological equivalence, making the central claims self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the existence and utility of a unified parameter space and canonical URDF that preserve key properties; these are introduced by the paper rather than derived from external benchmarks.

free parameters (1)

Morphological and kinematic parameters
Define the unified space for capturing variations across hand architectures.

axioms (1)

Domain assumption: interpolations in the latent manifold yield smooth and physically meaningful morphology transitions.
Invoked to support learning a structured latent space over the parameter space.

invented entities (1)

Canonical URDF format (no independent evidence)
Purpose: standardizes the action space while preserving dynamic and functional properties.
New standardized format introduced to unify structurally diverse hands.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We introduce a parameterized canonical representation that unifies a broad spectrum of dexterous hand architectures. It comprises a unified parameter space and a canonical URDF format..."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "The canonical URDF standardizes the action space while preserving dynamic and functional properties of the original URDFs"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EgoKit: Towards Unified Low-Cost Egocentric Data Collection with Heterogeneous Devices
cs.CV · 2026-05 · unverdicted · novelty 6.0

EgoKit is a new toolkit and accessory set that unifies egocentric video collection with wrist views across heterogeneous consumer devices using a consistent interface and log format.


[1] Ananye Agarwal, Shagun Uppal, Kenneth Shaw, and Deepak Pathak. Dexterous functional grasping. arXiv preprint arXiv:2312.02975, 2023

[2] OpenAI: Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020

[3] Erik Bauer, Elvis Nava, and Robert K Katzschmann. Latent action diffusion for cross-embodiment manipulation. arXiv preprint arXiv:2506.14608, 2025

[4] A. Bicchi. Hands for dexterous manipulation and robust grasping: a difficult road toward simplicity. IEEE Transactions on Robotics and Automation, 16(6):652–662, 2000. doi: 10.1109/70.897777

[5] Tao Chen, Jie Xu, and Pulkit Agrawal. A system for general in-hand object re-orientation. In Conference on Robot Learning, pages 297–307. PMLR, 2022

[6] Tao Chen, Megha Tippur, Siyang Wu, Vikash Kumar, Edward Adelson, and Pulkit Agrawal. Visual dexterity: In-hand reorientation of novel and complex object shapes. Science Robotics, 8(84):eadc9244, 2023. doi: 10.1126/scirobotics.adc9244. URL https://www.science.org/doi/abs/10.1126/scirobotics.adc9244

[7] M.R. Cutkosky. On grasp choice, grasp models, and the design of hands for manufacturing tasks. IEEE Transactions on Robotics and Automation, 5(3):269–279. doi: 10.1109/70.34763

[9] Xin Fei, Zhixuan Xu, Huaicong Fang, Tianrui Zhang, and Lin Shao. T(r, o) grasp: Efficient graph diffusion of robot-object spatial transformation for cross-embodiment dexterous grasping. arXiv preprint arXiv:2510.12724, 2025

[10] Jingxiang Guo, Jiayu Luo, Zhenyu Wei, Yiwen Hou, Zhixuan Xu, Xiaoyi Lin, Chongkai Gao, and Lin Shao. Telepreview: A user-friendly teleoperation system with virtual arm assistance for enhanced effectiveness. arXiv preprint arXiv:2412.13548, 2024

[11] Zihao He, Bo Ai, Tongzhou Mu, Yulin Liu, Weikang Wan, Jiawei Fu, Yilun Du, Henrik I Christensen, and Hao Su. Scaling cross-embodiment world models for dexterous manipulation. arXiv preprint arXiv:2511.01177, 2025

[12] Binghao Huang, Yuanpei Chen, Tianyu Wang, Yuzhe Qin, Yaodong Yang, Nikolay Atanasov, and Xiaolong Wang. Dynamic handover: Throw and catch with bimanual hands. arXiv preprint arXiv:2309.05655, 2023

[13] Kun Lei, Huanyu Li, Dongjie Yu, Zhenyu Wei, Lingxiao Guo, Zhennan Jiang, Ziyu Wang, Shiyu Liang, and Huazhe Xu. Rl-100: Performant robotic manipulation with real-world reinforcement learning. arXiv preprint arXiv:2510.14830, 2025

[14] Puhao Li, Tengyu Liu, Yuyang Li, Yixin Zhu, Yaodong Yang, and Siyuan Huang. Gendexgrasp: Generalizable dexterous grasping. arXiv preprint arXiv:2210.00722, 2022

[15] Zhixuan Liang, Yao Mu, Yixiao Wang, Tianxing Chen, Wenqi Shao, Wei Zhan, Masayoshi Tomizuka, Ping Luo, and Mingyu Ding. Dexhanddiff: Interaction-aware diffusion planning for adaptive dexterous manipulation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1745–1755, 2025

[16] Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, et al. Isaac gym: High performance gpu-based physics simulation for robot learning. arXiv preprint arXiv:2108.10470, 2021

[17] Austin Patel and Shuran Song. Get-zero: Graph embodiment transformer for zero-shot embodiment generalization. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 14262–14269. IEEE, 2025

[18] Pallets Projects. Jinja documentation. https://jinja.palletsprojects.com/en/stable/, 2022

[19] Haozhi Qi, Ashish Kumar, Roberto Calandra, Yi Ma, and Jitendra Malik. In-hand object rotation via rapid motor adaptation. In Conference on Robot Learning, pages 1722–1732. PMLR, 2023

[20] Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, and Sergey Levine. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. arXiv preprint arXiv:1709.10087, 2017

[21] Shadow Robot. Dexterous hand series. https://shadowrobot.com/dexterous-hand-series/

[22] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

[23] Lin Shao, Fabio Ferreira, Mikael Jorda, Varun Nambiar, Jianlan Luo, Eugen Solowjow, Juan Aparicio Ojea, Oussama Khatib, and Jeannette Bohg. Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters, 5(2):2286–2293, 2020

[24] Kenneth Shaw, Ananye Agarwal, and Deepak Pathak. Leap hand: Low-cost, efficient, and anthropomorphic hand for robot learning. arXiv preprint arXiv:2309.06440, 2023

[25] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020

[26] Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, and He Wang. Unidexgrasp++: Improving dexterous grasping policy learning via geometry-aware curriculum and iterative generalist-specialist learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3891–3902, 2023

[27] Jun Wang, Yuzhe Qin, Kaiming Kuang, Yigit Korkmaz, Akhilan Gurumoorthy, Hao Su, and Xiaolong Wang. Cyberdemo: Augmenting simulated human demonstration for real-world dexterous manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17952–17963, 2024

[28] Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, and Xiaolong Wang. Lessons from learning to spin "pens". arXiv preprint arXiv:2407.18902, 2024

[29] Ruicheng Wang, Jialiang Zhang, Jiayi Chen, Yinzhen Xu, Puhao Li, Tengyu Liu, and He Wang. Dexgraspnet: A large-scale robotic dexterous grasp dataset for general objects based on simulation. arXiv preprint arXiv:2210.02697, 2022

[30] Zhenyu Wei, Zhixuan Xu, Jingxiang Guo, Yiwen Hou, Chongkai Gao, Zhehao Cai, Jiayu Luo, and Lin Shao. D(R,O)grasp: A unified representation of robot and object interaction for cross-embodiment dexterous grasping. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 4982–4988, 2025. doi: 10.1109/ICRA55743.2025.11127754

[31] Zhiyuan Wu, Rolandos Alexandros Potamias, Xuyang Zhang, Zhongqun Zhang, Jiankang Deng, and Shan Luo. Cedex: Cross-embodiment dexterous grasp generation at scale from human-like contact representations. arXiv preprint arXiv:2509.24661, 2025

[32] Lixin Xu, Zixuan Liu, Zhewei Gui, Jingxiang Guo, Zeyu Jiang, Zhixuan Xu, Chongkai Gao, and Lin Shao. Dexsingrasp: Learning a unified policy for dexterous object singulation and grasping in cluttered environments. arXiv preprint arXiv:2504.04516, 2025

[33] Mengda Xu, Han Zhang, Yifan Hou, Zhenjia Xu, Linxi Fan, Manuela Veloso, and Shuran Song. Dexumi: Using human hand as the universal manipulation interface for dexterous manipulation. arXiv preprint arXiv:2505.21864, 2025

[34] Yinzhen Xu, Weikang Wan, Jialiang Zhang, Haoran Liu, Zikang Shan, Hao Shen, Ruicheng Wang, Haoran Geng, Yijia Weng, Jiayi Chen, et al. Unidexgrasp: Universal robotic dexterous grasping via learning diverse proposal generation and goal-conditioned policy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4737–4746, 2023

[35] Zhixuan Xu, Chongkai Gao, Zixuan Liu, Gang Yang, Chenrui Tie, Haozhuo Zheng, Haoyu Zhou, Weikun Peng, Debang Wang, Tianrun Hu, et al. Manifoundation model for general-purpose robotic manipulation of contact synthesis with arbitrary objects and robots. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10905–10912. ...

[36] Brent Yi, Chung Min Kim, Justin Kerr, Gina Wu, Rebecca Feng, Anthony Zhang, Jonas Kulhanek, Hongsuk Choi, Yi Ma, Matthew Tancik, et al. Viser: Imperative, web-based 3d visualization in python. arXiv preprint arXiv:2507.22885, 2025

[37] Zhao-Heng Yin and Pieter Abbeel. Lightning grasp: High performance procedural grasp synthesis with contact fields. arXiv preprint arXiv:2511.07418, 2025

[38] Zhao-Heng Yin, Binghao Huang, Yuzhe Qin, Qifeng Chen, and Xiaolong Wang. Rotating without seeing: Towards in-hand dexterity through touch. arXiv preprint arXiv:2303.10880, 2023

[39] Ying Yuan, Haichuan Che, Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Kang-Won Lee, Yi Wu, Soo-Chul Lim, and Xiaolong Wang. Robot synesthesia: In-hand manipulation with visuotactile sensing. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 6558–6565. IEEE, 2024

[40] Zhecheng Yuan, Tianming Wei, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, and Huazhe Xu. Learning to manipulate anywhere: A visual generalizable framework for reinforcement learning. arXiv preprint arXiv:2407.15815, 2024

[41] Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, and Huazhe Xu. 3d diffusion policy: Generalizable visuomotor policy learning via simple 3d representations. arXiv preprint arXiv:2403.03954, 2024

[42] Jialiang Zhang, Haoran Liu, Danshi Li, XinQiang Yu, Haoran Geng, Yufei Ding, Jiayi Chen, and He Wang. Dexgraspnet 2.0: Learning generative dexterous grasping in large-scale synthetic cluttered scenes. In 8th Annual Conference on Robot Learning, 2024

[43] Yuanhang Zhang, Tianhai Liang, Zhenyang Chen, Yanjie Ze, and Huazhe Xu. Catch it! learning to catch in flight with mobile dexterous hands. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 14385–14391. IEEE, 2025