pith. sign in

arxiv: 2606.00998 · v1 · pith:SQGZHD7Lnew · submitted 2026-05-31 · 💻 cs.RO

GraspGen-X: Cross-Embodiment 6-DOF Diffusion-based Grasping

Pith reviewed 2026-06-28 17:24 UTC · model grok-4.3

classification 💻 cs.RO
keywords cross-embodiment grasping6-DOF graspingdiffusion modelsswept-volume representationzero-shot generalizationprocedural grippersrobot manipulation
0
0 comments X

The pith

Conditioning a diffusion-based 6-DOF grasping model on a swept-volume gripper representation enables zero-shot generalization to novel real-world grippers and objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that diffusion models for generating 6-DOF grasps can be made to work across different robot grippers by adding a representation of the gripper itself. Prior methods handled new objects but required separate training or failed when the gripper shape or closing motion changed. The authors encode each gripper via a swept-volume heuristic, train the conditioned model on procedurally generated grippers and two billion grasps, and report stronger zero-shot transfer in both simulation and real-robot tests than baselines that lack this conditioning.

Core claim

By extending diffusion-based 6-DOF grasp generators to accept an additional swept-volume encoding of the gripper and training on a large procedural-gripper dataset, the resulting model achieves the best zero-shot performance on novel real grippers and objects among compared methods, also serving as an effective starting point for fine-tuning.

What carries the argument

Swept-volume heuristic that encodes gripper morphology and the physical volume swept during closure, used as conditioning input to the diffusion model.

If this is right

  • The model outperforms prior 6-DOF grasp generators on unseen grippers and objects in simulation without retraining.
  • Fine-tuning the pre-trained model on a small amount of data for a new gripper yields faster adaptation than training from scratch.
  • The same architecture demonstrates successful zero-shot transfer when deployed on physical robots with previously unseen grippers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on manipulation skills beyond grasping, such as placement or tool use, by replacing the grasp label with an appropriate task label.
  • If the swept-volume representation proves robust, it may reduce the data-collection burden for each new end-effector in industrial or home robot deployments.
  • Extending the conditioning to include joint limits or compliance parameters could further broaden the set of grippers handled without retraining.

Load-bearing premise

The swept-volume heuristic captures enough of the differences in gripper shape and closing motion that a single model conditioned on it can generalize to unseen real grippers.

What would settle it

A controlled test in which a novel real gripper whose swept volume is supplied to the model still produces grasp success rates no higher than an unconditioned baseline on the same objects.

Figures

Figures reproduced from arXiv: 2606.00998 by Adithyavairavan Murali, Balakumar Sundaralingam, Beining Han, Clemens Eppner, Erwin Coumans, Jia Deng, Stan Birchfield, Yu-Wei Chao.

Figure 1
Figure 1. Figure 1: We introduce GraspGen-X, a cross-embodiment 6 DOF Grasping model trained with a large scale [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Architecture for our GraspGen-X model. The model is developed based on GraspGen [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of the Swept Volume, i.e., cube dimensions (xyz) and cube center translation relative [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Examples of our procedural grippers. Each gripper is a random instance from our generator (Sec [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The distribution of Swept Volume cube dimensions along three axes, for training datasets (Train [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of generated grasp poses with GraspGen-X on Galaxea-G1, Tesollo-Delto2F, Robotiq [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Learning curve of generator finetuning (Sec [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Ablation study on GraspGen-X gripper parameterization and gripper training sets (Sec [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗
read the original abstract

We study cross-embodiment 6-DOF robot grasping. Unlike prior works, we require the model not only to generalize to novel objects / scenes but also to novel gripper morphologies and physical grasping processes. Our method extends diffusion model based generative 6-DOF grasping models to condition on the additional gripper's representation. We propose a swept-volume heuristic for encoding the gripper. We train our cross-embodiment model with procedural grippers and a large-scale dataset of 2 Billion grasps. In simulation experiments, our model has the best zero-shot generalization to novel real-world grippers and objects over baseline methods. Our model also serves as a good initialization for fine-tuning to adapt to novel grippers. In ablations, we demonstrate the efficiency of our sweep-volume gripper representation and our procedural gripper training dataset. Last, we show zero-shot generalization to real-world novel grippers for 6-DOF grasping, surpassing baselines in cross-embodiment generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces GraspGen-X, a diffusion-based model for 6-DOF grasping that extends prior generative approaches by conditioning on a swept-volume gripper representation to enable cross-embodiment generalization to novel grippers and objects. Trained on 2 billion grasps generated via procedural grippers, the method claims best-in-class zero-shot performance on novel real-world grippers and objects in simulation, utility as a fine-tuning initialization, and successful zero-shot transfer to real-world novel grippers, outperforming baselines in cross-embodiment settings. Ablations are said to validate the swept-volume representation and procedural dataset.

Significance. If the zero-shot cross-embodiment claims hold with rigorous quantitative support, the work would advance robotic grasping by showing that a purely geometric conditioning signal can bridge embodiment gaps at scale, reducing reliance on gripper-specific data collection. The 2B-grasp dataset size and reported real-world validation would be notable strengths for reproducibility and practical impact in the field.

major comments (1)
  1. [Abstract, method description] Abstract and method description: The central claim of zero-shot generalization to novel real-world grippers rests on the assertion that the swept-volume heuristic encodes gripper morphology and physical grasping process differences. As described, this is a geometric construct (collision and reachability volumes from gripper motion) that does not explicitly incorporate material properties, friction coefficients, compliance, or contact dynamics distinguishing real grippers from the procedural simulation set. This assumption is load-bearing for the cross-embodiment transfer result and requires either additional validation experiments or explicit discussion of its limitations to support the generalization claims.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below and outline revisions to improve clarity on the scope of our claims.

read point-by-point responses
  1. Referee: [Abstract, method description] Abstract and method description: The central claim of zero-shot generalization to novel real-world grippers rests on the assertion that the swept-volume heuristic encodes gripper morphology and physical grasping process differences. As described, this is a geometric construct (collision and reachability volumes from gripper motion) that does not explicitly incorporate material properties, friction coefficients, compliance, or contact dynamics distinguishing real grippers from the procedural simulation set. This assumption is load-bearing for the cross-embodiment transfer result and requires either additional validation experiments or explicit discussion of its limitations to support the generalization claims.

    Authors: We agree that the swept-volume representation is a purely geometric construct based on collision and reachability volumes and does not explicitly model material properties, friction coefficients, compliance, or contact dynamics. The method's cross-embodiment results rely on the empirical effectiveness of this encoding when trained at scale on procedural grippers, as validated through zero-shot simulation and real-world experiments on novel grippers. To address the concern, we will revise the method section and add an explicit limitations paragraph discussing these modeling assumptions and noting that generalization claims are supported by empirical evidence rather than a complete physical simulation of gripper-object interactions. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on held-out evaluation and external baselines

full rationale

The paper presents an empirical ML method: a diffusion model conditioned on a fixed swept-volume gripper heuristic, trained on 2B procedurally generated grasps, then evaluated zero-shot on novel real grippers and objects. No equations, predictions, or uniqueness claims reduce to author-fitted quantities by construction. The central results are measured success rates on held-out simulation and real-world test sets against baselines, which are independent of any self-referential definition or self-citation chain. The swept-volume heuristic is presented as an input representation, not derived from the model's outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that a swept-volume encoding plus procedural gripper variety captures enough morphology variation for zero-shot transfer; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Swept-volume representation of a gripper is sufficient to condition the diffusion model for cross-embodiment generalization.
    Invoked in the method extension described in the abstract.

pith-pipeline@v0.9.1-grok · 5734 in / 1298 out tokens · 27295 ms · 2026-06-28T17:24:35.218662+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

84 extracted references · 17 canonical work pages · 4 internal anchors

  1. [1]

    Ac- cessed: 2024-03-07

    Isaac sim - robotics simulation and synthetic data nvidia developer.https://www.einscan.com/. Ac- cessed: 2024-03-07. 7

  2. [2]

    Geometry matching for multi-embodiment grasping

    Maria Attarian, Muhammad Adil Asif, Jingzhou Liu, Ruthrash Hari, Animesh Garg, Igor Gilitschenski, and Jonathan Tompson. Geometry matching for multi-embodiment grasping. InProceedings of the 7th Conference on Robot Learning (CoRL 2023), 2023. 4

  3. [3]

    Graspldm: Generative 6-dof grasp synthesis using latent diffusion models

    Kuldeep Barad, Andrej Orsula, ANntoine Richard, Jan Dentler, Miguel Olivares-Mendez, and Carol Mar- tinez. Graspldm: Generative 6-dof grasp synthesis using latent diffusion models. InIEEE Access, 2024. 3

  4. [4]

    $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

    KevinBlack, NoahBrown, DannyDriess, AdnanEsmail, MichaelEqui, ChelseaFinn, NiccolòFusai, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Sergey Levine, Adrian Li-Bell,MohithMothukuri,SurajNair,KarlPertsch,LucyXiaoyangShi,JamesTanner,QuanVuong,Anna Walling, Haohuan Wang, and Ury Zhilinsky.𝜋0: A vision-language-action flo...

  5. [5]

    Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batche- lor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Sco...

  6. [6]

    Le, Philipp Jahr, Qiao Sun, Julen Urain, Dorothea Koert, and Jan Peters

    Joao Carvalho, An T. Le, Philipp Jahr, Qiao Sun, Julen Urain, Dorothea Koert, and Jan Peters. Grasp diffusionnetwork: Learninggraspgeneratorsfrompartialpointcloudswithdiffusionmodelsinso(3)xr3,

  7. [7]

    Multigrippergrasp: A datasetforroboticgraspingfromparalleljawgripperstodexteroushands

    Luis Felipe Casas, Ninad Khargonkar, Balakrishnan Prabhakaran, and Yu Xiang. Multigrippergrasp: A datasetforroboticgraspingfromparalleljawgripperstodexteroushands. InProceedingsoftheIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), pages 2978–2984, 2024. 4, 21

  8. [8]

    Dexycb: A benchmark for capturing hand grasping of objects

    Yu-WeiChao, WeiYang, YuXiang, PavloMolchanov, AnkurHanda, JonathanTremblay, YashrajSNarang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, et al. Dexycb: A benchmark for capturing hand grasping of objects. InCVPR, 2021. 21

  9. [9]

    Spacetools: Tool-augmented spatial reasoning via dou- ble interactive rl, 2025

    Siyi Chen, Mikaela Angelina Uy, Chan Hee Song, Faisal Ladhak, Adithyavairavan Murali, Qing Qu, Stan Birchfield, Valts Blukis, and Jonathan Tremblay. Spacetools: Tool-augmented spatial reasoning via dou- ble interactive rl, 2025. 2

  10. [10]

    Hardware conditioned policies for multi-robot transfer learning

    Tao Chen, Adithyavairavan Murali, and Abhinav Gupta. Hardware conditioned policies for multi-robot transfer learning. InAdvances in Neural Information Processing Systems 31 (NeurIPS 2018), pages 9355– 9366, 2018. 3

  11. [11]

    Local policies enable zero-shot long-horizon manipulation

    Murtaza Dalal, Min Liu, Chen Chen, Deepak Pathak, Jian Zhang, and Ruslan Salakhutdinov. Local policies enable zero-shot long-horizon manipulation. InICRA, 2025. 2, 3

  12. [12]

    Graspmolmo: Generalizable task-oriented grasping via large-scale synthetic data generation

    Abhay Deshpande, Yuquan Deng, Arijit Ray, Jordi Salvador, Winson Han, Jiafei Duan, Kuo-Hao Zeng, Yuke Zhu, Ranjay Krishna, and Rose Hendrix. Graspmolmo: Generalizable task-oriented grasping via large-scale synthetic data generation. InCoRL, 2025. 2, 3 14 GraspGen-X: Cross-Embodiment 6-DOF Diffusion-based Grasping

  13. [13]

    Learningmodularneural network policies for multi-task and multi-robot transfer

    ColineDevin, AbhinavGupta, TrevorDarrell, PieterAbbeel, andSergeyLevine. Learningmodularneural network policies for multi-task and multi-robot transfer. InProceedings of the IEEE International Confer- ence on Robotics and Automation (ICRA 2017), pages 2169–2176, 2017. 3

  14. [14]

    Scalingcross-embodiedlearning: One policy for manipulation, navigation, locomotion and aviation

    RiaDoshi,HomerWalke,OierMees,SudeepDasari,andSergeyLevine. Scalingcross-embodiedlearning: One policy for manipulation, navigation, locomotion and aviation. InProceedings of the Conference on Robot Learning (CoRL 2024), 2024. 2, 3

  15. [15]

    ACRONYM: A large-scale grasp dataset based on simulation

    Clemens Eppner, Arsalan Mousavian, and Dieter Fox. ACRONYM: A large-scale grasp dataset based on simulation. InUnder Review at ICRA 2021, 2020. 2, 6, 20, 21

  16. [16]

    Anygrasp: Robust and efficient grasp perception in spatial and temporal domains

    Hao-Shu Fang, Chenxi Wang, Hongjie Fang, Minghao Gou, Jirong Liu, Hengxu Yan, Wenhai Liu, Yichen Xie, and Cewu Lu. Anygrasp: Robust and efficient grasp perception in spatial and temporal domains. Transactions on Robotics, 2023. 2, 3, 10, 11

  17. [17]

    T(R,O) grasp: Efficient graph diffusion of robot-object spatial transforma- tion for cross-embodiment dexterous grasping,

    Xin Fei, Zhenyu Wang, Jiayu Luo, Chongkai Gao, Zhehao Cai, and Lin Shao. T(r,o) grasp: Efficient graph diffusion of robot-object spatial transformation for cross-embodiment dexterous grasping. arXiv preprint arXiv:2510.12724, 2025. 4

  18. [18]

    Diffusion for multi- embodiment grasping

    Roman Freiberg, Alexander Qualmann, Ngo Anh Vien, and Gerhard Neumann. Diffusion for multi- embodiment grasping. InRobotics and Automation Letters, 2024. 3, 9, 23

  19. [19]

    Diffusion for multi- embodiment grasping.IEEE Robotics and Automation Letters, PP(99):1–8, 2025

    Roman Freiberg, Alexander Qualmann, Ngo Anh Vien, and Gerhard Neumann. Diffusion for multi- embodiment grasping.IEEE Robotics and Automation Letters, PP(99):1–8, 2025. 4

  20. [20]

    CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation

    MaxFu, JustinYu, KarimEl-Refai, EthanKou, HaoruXue, HuangHuang, WenliXiao, GuanzhiWang, Fei- FeiLi,GuanyaShi,JiajunWu,ShankarSastry,YukeZhu,KenGoldberg,andJimFan. CaP-X:Aframework forbenchmarkingandimprovingcodingagentsforrobotmanipulation.arXivpreprintarXiv:2603.22435,

  21. [21]

    Robot learning in homes: Improvinggeneralizationandreducingdatasetbias

    Abhinav Gupta, Adithyavairavan Murali, Dhiraj Gandhi, and Lerrel Pinto. Robot learning in homes: Improvinggeneralizationandreducingdatasetbias. InAdvancesinNeuralInformationProcessingSystems 31 (NeurIPS 2018), 2018. 3

  22. [22]

    Robot learning in homes: Improving generalization and reducing dataset bias.NeurIPS, 2018

    Abhinav Gupta, Adithyavairavan Murali, Dhiraj Gandhi, and Lerrel Pinto. Robot learning in homes: Improving generalization and reducing dataset bias.NeurIPS, 2018. 11

  23. [23]

    Honnotate: A method for 3d annotation of hand and object poses

    Shreyas Hampali, Mahdi Rad, Markus Oberweger, and Vincent Lepetit. Honnotate: A method for 3d annotation of hand and object poses. InCVPR, 2020. 21

  24. [24]

    Dexpilot: Vision-based teleoperation of dexterous robotic hand-arm system

    Ankur Handa, Karl Van Wyk, Wei Yang, Jacky Liang, Yu-Wei Chao, Qian Wan, Stan Birchfield, Nathan Ratliff, and Dieter Fox. Dexpilot: Vision-based teleoperation of dexterous robotic hand-arm system. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2020), pages 9164– 9170, 2020. 3

  25. [25]

    Hu, Kun Huang, Oleh Rybkin, and Dinesh Jayaraman

    Edward S. Hu, Kun Huang, Oleh Rybkin, and Dinesh Jayaraman. Know thyself: Transferable visual control policies through robot-awareness. InProceedings of the International Conference on Learning Representations (ICLR 2022), 2022. 3

  26. [26]

    One policy to control them all: Shared modular policies for agent-agnostic control

    Wenlong Huang, Igor Mordatch, and Deepak Pathak. One policy to control them all: Shared modular policies for agent-agnostic control. InProceedings of the International Conference on Machine Learning (ICML 2020), pages 4455–4464, 2020. 3

  27. [27]

    Rekep: Spatio-temporal reason- ing of relational keypoint constraints for robotic manipulation

    Wenlong Huang, Chen Wang, Yunzhu Li, Ruohan Zhang, and Li Fei-Fei. Rekep: Spatio-temporal reason- ing of relational keypoint constraints for robotic manipulation. InCoRL, 2024. 2, 3, 8 15 GraspGen-X: Cross-Embodiment 6-DOF Diffusion-based Grasping

  28. [28]

    Procedural generation of articulated simulation-ready assets, 2025

    Abhishek Joshi, Beining Han, Jack Nugent, Max Gonzalez Saez-Diez, Yiming Zuo, Jonathan Liu, Hongyu Wen, Stamatis Alexandropoulos, Karhan Kayan, Anna Calveri, Tao Sun, Gaowen Liu, Yi Shao, Alexander Raistrick, and Jia Deng. Procedural generation of articulated simulation-ready assets, 2025. 5, 20

  29. [29]

    Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016

    Sergey Levine, Peter Pastor, Alex Krizhevsky, and Deirdre Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016. 3

  30. [30]

    Novaflow: Zero-shot manipulation via actionable flow from generated videos.arXiv preprint arXiv:2510.08568,

    Hongyu Li, Lingfeng Sun, Yafei Hu, Duy Ta, Jennifer Barry, George Konidaris, and Jiahui Fu. Novaflow: Zero-shot manipulation via actionable flow from generated videos.arXiv preprint arXiv:2510.08568,

  31. [31]

    Coa-vla: Improving vision-language-action models via visual-textual chain-of-affordance, 2025.https://arxiv.org/abs/2412.20451

    Jinming Li, Yichen Zhu, Zhibin Tang, Junjie Wen, Minjie Zhu, Xiaoyu Liu, Chengmeng Li, Ran Cheng, Yaxin Peng, Yan Peng, and Feifei Feng. Coa-vla: Improving vision-language-action models via visual- textual chain-of-affordance. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2025), 2025. arXiv preprint arXiv:2412.20451. 2, 3

  32. [32]

    Gendexgrasp: Generalizable dexterous grasping

    PuhaoLi,TengyuLiu,YuyangLi,YiranGeng,YixinZhu,YaodongYang,andSiyuanHuang. Gendexgrasp: Generalizable dexterous grasping. InProceedings of the IEEE International Conference on Robotics and Automation (ICRA 2023), 2023. 4

  33. [33]

    Pointnetgpd: Detecting grasp configurations from point sets

    HongzhuoLiang,XiaojianMa,ShuangLi,MichaelGorner,SongTang,BinFang,FuchunSun,andJianwei Zhang. Pointnetgpd: Detecting grasp configurations from point sets. InICRA, 2019. 3

  34. [34]

    Equigraspflow: Se(3)- equivariant 6-dof grasp pose generative flows

    Byeongdo Lim, Jongmin Kim, Jihwan Kim, Yonghyeon Lee, and Frank C Park. Equigraspflow: Se(3)- equivariant 6-dof grasp pose generative flows. InCoRL, 2024. 3

  35. [35]

    Deepdifferentiablegraspplanner for high-dof grippers.arXiv preprint arXiv:2002.01530, 2020

    MinLiu,ZherongPan,KaiXu,KanishkaGanguly,andDineshManocha. Deepdifferentiablegraspplanner for high-dof grippers.arXiv preprint arXiv:2002.01530, 2020. 21

  36. [36]

    2401.12202 , archivePrefix =

    Peiqi Liu, Yaswanth Orru, Chris Paxton, Nur Muhammad Mahi Shafiullah, and Lerrel Pinto. Ok-robot: What really matters in integrating open-knowledge models for robotics.preprint arXiv:2401.12202,

  37. [37]

    Li, Preston Culbertson, Krishnan Srinivasan, Aaron Ames, Mac Schwager, and Jeannette Bohg

    Tyler Ga Wei Lum, Albert H. Li, Preston Culbertson, Krishnan Srinivasan, Aaron Ames, Mac Schwager, and Jeannette Bohg. Get a grip: Multi-finger grasp evaluation at scale enables robust sim-to-real transfer. InCoRL, 2024. 3

  38. [38]

    Unified particle physics for real-time applications.ACM Transactions on Graphics (TOG), 33(4):1–12, 2014

    Miles Macklin, Matthias Müller, Nuttapong Chentanez, and Tae-Yong Kim. Unified particle physics for real-time applications.ACM Transactions on Graphics (TOG), 33(4):1–12, 2014. 21

  39. [39]

    Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

    Jeffrey Mahler, Jacky Liang, Sherdil Niyaz, Michael Laskey, Richard Doan, Xinyu Liu, Juan Aparicio Ojea, and Ken Goldberg. Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics.arXiv preprint arXiv:1703.09312, 2017. 2, 3

  40. [40]

    nvblox: Gpu-acceleratedincrementalsigneddistancefieldmapping

    Alexander Millane, Helen Oleynikova, Emilie Wirbel, Remo Steiner, Vikram Ramasamy, David Tingdahl, andRolandSiegwart. nvblox: Gpu-acceleratedincrementalsigneddistancefieldmapping. InProceedings of the IEEE International Conference on Robotics and Automation, pages 2698–2705, 2024. 2

  41. [41]

    nvblox: Gpu-accelerated incremental signed distance field mapping, 2024

    Alexander Millane, Helen Oleynikova, Emilie Wirbel, Remo Steiner, Vikram Ramasamy, David Tingdahl, and Roland Siegwart. nvblox: Gpu-accelerated incremental signed distance field mapping, 2024. 23

  42. [42]

    Egad! an evolved grasping analysis dataset for diversity and reproducibility in robotic manipulation.IEEE Robotics and Automation Letters, 5(3):4368– 4375, 2020

    Douglas Morrison, Peter Corke, and Jürgen Leitner. Egad! an evolved grasping analysis dataset for diversity and reproducibility in robotic manipulation.IEEE Robotics and Automation Letters, 5(3):4368– 4375, 2020. 21 16 GraspGen-X: Cross-Embodiment 6-DOF Diffusion-based Grasping

  43. [43]

    6-dof graspnet: Variational grasp generation for object manipulation

    Arsalan Mousavian, Clemens Eppner, and Dieter Fox. 6-dof graspnet: Variational grasp generation for object manipulation. InICCV, 2019. 1, 2, 3

  44. [44]

    Same object, different grasps: Data and semantic knowledge for task-oriented grasping

    Adithyavairavan Murali, Weiyu Liu, Kenneth Marino, Sonia Chernova, and Abhinav Gupta. Same object, different grasps: Data and semantic knowledge for task-oriented grasping. InCoRL, 2020. 2, 8

  45. [45]

    6-dof grasp- ing for target-driven object manipulation in clutter

    Adithyavairavan Murali, Arsalan Mousavian, Clemens Eppner, Chris Paxton, and Dieter Fox. 6-dof grasp- ing for target-driven object manipulation in clutter. InICRA, 2020. 2, 3

  46. [46]

    Graspgen: A diffusion-based framework for 6-dof grasping with on- generator training,

    Adithyavairavan Murali, Balakumar Sundaralingam, Yu-Wei Chao, Jun Yamada, Wentao Yuan, Mark Carlson, Fabio Ramos, Stan Birchfield, Dieter Fox, and Clemens Eppner. Graspgen: A diffusion-based framework for 6-dof grasping with on-generator training.arXiv preprint arXiv:2507.13097, 2025. 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 21, 22, 23

  47. [47]

    Deep learning approaches to grasp synthesis: A review.IEEE Transactions on Robotics, 2023

    RhysNewbury, MorrisGu, LachlanChumbley, ArsalanMousavian, ClemensEppner, JürgenLeitner, Jean- nette Bohg, Antonio Morales, Tamim Asfour, Danica Kragic, et al. Deep learning approaches to grasp synthesis: A review.IEEE Transactions on Robotics, 2023. 2, 3

  48. [48]

    Nvidia isaac sim, 2023

    NVIDIA. Nvidia isaac sim, 2023. 21

  49. [49]

    Pointnet++: Deep hierarchical feature learning on point sets in a metric space.Advances in neural information processing systems, 30, 2017

    Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space.Advances in neural information processing systems, 30, 2017. 4, 9, 21

  50. [50]

    arXiv preprint arXiv:2307.04577 (2023)

    Yuzhe Qin, Wei Yang, Binghao Huang, Karl Van Wyk, Hao Su, Xiaolong Wang, Yu-Wei Chao, and Dieter Fox. Anyteleop: A general vision-based dexterous robot arm-hand teleoperation system. arXiv preprint arXiv:2307.04577v1, 2023. 3

  51. [51]

    Infinite photorealistic worlds using procedural generation

    Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang, et al. Infinite photorealistic worlds using procedural generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12630– 12641, 2023. 20

  52. [52]

    Infinite indoors: Photorealistic indoor scenes using procedural generation

    AlexanderRaistrick,MeiLingjie,KaanKayanKarhan,YanDavid,ZuoYiming,HanBeining,WenHongyu, Parakh Meenal, Stamatis Alexandropoulos, Lipson Lahav, Ma Zeyu, and Deng Jia. Infinite indoors: Photorealistic indoor scenes using procedural generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. 20

  53. [53]

    SAM 2: Segment Anything in Images and Videos

    Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, NicolasCarion,Chao-YuanWu,RossGirshick,PiotrDollár,andChristophFeichtenhofer. Sam2: Segment anything in images and videos.arXiv preprint arXiv:2408.00714, 2024. 2, 23

  54. [54]

    Sukhatme

    Gaurav Salhotra, I-Chun Arthur Liu, and Gaurav S. Sukhatme. Learning robot manipulation from cross- morphology demonstration. InProceedings of the 7th Annual Conference on Robot Learning (CoRL 2023),

  55. [55]

    Unigrasp: Learning a unified model to grasp with multifingered robotic hands.IEEE Robotics and Automation Letters, 5(2):2286–2293, 2020

    LinShao,FabioFerreira,MikaelJorda,VarunNambiar,JianlanLuo,EugenSolowjow,JuanAparicioOjea, Oussama Khatib, and Jeannette Bohg. Unigrasp: Learning a unified model to grasp with multifingered robotic hands.IEEE Robotics and Automation Letters, 5(2):2286–2293, 2020. 4, 9, 21

  56. [56]

    TiPToP: A modular open-vocabulary planning system for robotic manipulation.arXiv preprint arXiv:2603.09971, 2026

    William Shen, Nishanth Kumar, Sahit Chintalapudi, Jie Wang, Christopher Watson, Edward S. Hu, Jing Cao, Dinesh Jayaraman, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. TiPToP: A modular open- vocabulary planning system for robotic manipulation.arXiv preprint arXiv:2603.09971, 2026. 2 17 GraspGen-X: Cross-Embodiment 6-DOF Diffusion-based Grasping

  57. [57]

    Maestro: Orchestratingrobotics modules with vision-language models for zero-shot generalist robots.arXiv preprint arXiv:2511.00917,

    JunyaoShi, RujiaYang, KaitianChao, SelinaBingqingWan, YifeiShao, JiahuiLei, JianingQian, LongLe, PratikChaudhari,KostasDaniilidis,ChuanWen,andDineshJayaraman. Maestro: Orchestratingrobotics modules with vision-language models for zero-shot generalist robots.arXiv preprint arXiv:2511.00917,

  58. [58]

    Implicit grasp diffusion: Bridging the gap between dense prediction and sampling-based grasping

    Pinhao Song, Pengteng Li, and Renaud Detry. Implicit grasp diffusion: Bridging the gap between dense prediction and sampling-based grasping. InCoRL, 2024. 3

  59. [59]

    Curobo: Parallelized collision-freerobotmotiongeneration

    Balakumar Sundaralingam, Siva Kumar Sastry Hari, Adam Fishman, Caelan Garrett, Karl Van Wyk, Valts Blukis, Alexander Millane, Helen Oleynikova, Ankur Handa, Fabio Ramos, et al. Curobo: Parallelized collision-freerobotmotiongeneration. In2023IEEEInternationalConferenceonRoboticsandAutomation (ICRA), pages 8112–8119. IEEE, 2023. 2

  60. [60]

    curobo: Parallelized collision-free minimum-jerk robot motion generation

    Balakumar Sundaralingam, Siva Kumar Sastry Hari, Adam Fishman, Caelan Garrett, Karl Van Wyk, Valts Blukis, AlexanderMillane, HelenOleynikova, AnkurHanda, FabioRamos, NathanRatliff, andDieterFox. curobo: Parallelized collision-free minimum-jerk robot motion generation. InICRA, 2023. 11, 23

  61. [61]

    Contact-graspnet: Efficient 6-dof grasp generation in cluttered scenes

    Martin Sundermeyer, Arsalan Mousavian, Rudolph Triebel, and Dieter Fox. Contact-graspnet: Efficient 6-dof grasp generation in cluttered scenes. In2021 IEEE International Conference on Robotics and Au- tomation (ICRA), pages 13438–13444. IEEE, 2021. 2, 3, 6

  62. [62]

    Graspgpt: Leveraging semantic knowledge from a large language model for task-oriented grasping.RAL, 2023

    Chao Tang, Dehao Huang, Wenqi Ge, Weiyu Liu, and Hong Zhang. Graspgpt: Leveraging semantic knowledge from a large language model for task-oriented grasping.RAL, 2023. 3

  63. [63]

    Grasp pose detection in point clouds

    Andreas ten Pas, Marcus Gualtieri, Kate Saenko, and Robert Platt. Grasp pose detection in point clouds. The International Journal of Robotics Research, 36(13–14):1455–1473, 2017. 3

  64. [64]

    Domain ran- domization and generative models for robotic grasping

    Josh Tobin, Lukas Biewald, Rocky Duan, Marcin Andrychowicz, Ankur Handa, Vikash Kumar, Bob Mc- Grew, Alex Ray, Jonas Schneider, Peter Welinder, Wojciech Zaremba, and Pieter Abbeel. Domain ran- domization and generative models for robotic grasping. InIROS, 2018. 3

  65. [65]

    Fast-grasp’d: Dexterous multi-finger grasp generation through differentiable simulation.arXiv:2306.08132, 2023

    Dylan Turpin, Tao Zhong, Shutong Zhang, Guanglei Zhu, Jingzhou Liu, Ritvik Singh, Eric Heiden, Miles Macklin, Stavros Tsogkas, Sven Dickinson, et al. Fast-grasp’d: Dexterous multi-finger grasp generation through differentiable simulation.arXiv:2306.08132, 2023. 21

  66. [66]

    Se(3)-diffusionfields: Learning smooth cost functions for joint grasp and motion optimization through diffusion.ICRA, 2023

    Julen Urain, Niklas Funk, Jan Peters, and Georgia Chalvatzaki. Se(3)-diffusionfields: Learning smooth cost functions for joint grasp and motion optimization through diffusion.ICRA, 2023. 1, 3

  67. [67]

    Dexgrasp- net: A large-scale robotic dexterous grasp dataset for general objects based on simulation

    Ruicheng Wang, Jialiang Zhang, Jiayi Chen, Yinzhen Xu, Puhao Li, Tengyu Liu, and He Wang. Dexgrasp- net: A large-scale robotic dexterous grasp dataset for general objects based on simulation. InICRA, 2023. 21

  68. [68]

    Transferring grasping across grippers: Learning–optimization hybrid framework for generalized planar grasp generation.IEEE Transactions on Robotics, 2024

    Xianli Wang and Qingsong Xu. Transferring grasping across grippers: Learning–optimization hybrid framework for generalized planar grasp generation.IEEE Transactions on Robotics, 2024. 4

  69. [69]

    Learning compositional models of robot skills for task and motion planning.The International Journal of Robotics Research, 2020

    Zi Wang, Caelan Reed Garrett, Leslie Pack Kaelbling, and Tomas Lozano-Pérez. Learning compositional models of robot skills for task and motion planning.The International Journal of Robotics Research, 2020. 2

  70. [70]

    Approximate convex decomposition for 3d meshes with collision-aware concavity and tree search.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022

    Xinyue Wei, Minghua Liu, Zhan Ling, and Hao Su. Approximate convex decomposition for 3d meshes with collision-aware concavity and tree search.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022. 20 18 GraspGen-X: Cross-Embodiment 6-DOF Diffusion-based Grasping

  71. [71]

    D(r,o) grasp: A unified representation of robot and object interaction for cross-embodiment dex- terous grasping

    Zhenyu Wei, Zhixuan Xu, Jingxiang Guo, Yiwen Hou, Chongkai Gao, Zhehao Cai, Jiayu Luo, and Lin Shao. D(r,o) grasp: A unified representation of robot and object interaction for cross-embodiment dex- terous grasping. InProceedings of the IEEE International Conference on Robotics and Automation (ICRA 2025), 2025. 4

  72. [72]

    Foundationpose: Unified 6d pose estimation and tracking of novel objects.arXiv preprint arXiv:2312.08344, 2023

    Bowen Wen, Wei Yang, Jan Kautz, and Stan Birchfield. Foundationpose: Unified 6d pose estimation and tracking of novel objects.arXiv preprint arXiv:2312.08344, 2023. 2, 24

  73. [73]

    Founda- tionstereo: Zero-shot stereo matching

    Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, and Stan Birchfield. Founda- tionstereo: Zero-shot stereo matching. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5249–5260, 2025. 2, 23

  74. [74]

    Dexdiffuser: Generating dexterous grasps with diffusion models.Robotics and Automation Letters, 2024

    Zehang Weng, Haofei Lu, Danica Kragic, and Jens Lundell. Dexdiffuser: Generating dexterous grasps with diffusion models.Robotics and Automation Letters, 2024. 3

  75. [75]

    Point transformer v3: Simpler, faster, stronger

    Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. Point transformer v3: Simpler, faster, stronger. InCVPR, 2024. 4

  76. [76]

    Learning generalizable dexterous manipulation from human grasp affordance

    Yueh-Hua Wu, Jiashun Wang, and Xiaolong Wang. Learning generalizable dexterous manipulation from human grasp affordance. InCoRL, 2023. 3

  77. [77]

    Open x-embodiment: Robotic learning datasets and rt-x models

    Open X-Team. Open x-embodiment: Robotic learning datasets and rt-x models. InProceedings of the IEEE International Conference on Robotics and Automation (ICRA 2024), pages 6892–6903, 2024. 2, 3

  78. [78]

    Rethinking 6-dof grasp detection: A flexible framework for high-quality grasping.Pattern Recognition, 2025

    Pengwei Xie, Siang Chen, Wei Tang, Dingchang Hu, Wenming Yang, and Guijin Wang. Rethinking 6-dof grasp detection: A flexible framework for high-quality grasping.Pattern Recognition, 2025. 3

  79. [79]

    Adagrasp: Learning an adaptive gripper- aware grasping policy

    Zhenjia Xu, Beichun Qi, Shubham Agrawal, and Shuran Song. Adagrasp: Learning an adaptive gripper- aware grasping policy. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4620–4626. IEEE, 2021. 4, 9, 23

  80. [80]

    Grasp-mpc: Closed-loopvisualgraspingviavalue-guidedmodelpredictivecontrol

    Jun Yamada, Adithyavairavan Murali, Ajay Mandlekar, Clemens Eppner, Ingmar Posner, and Balakumar Sundaralingam. Grasp-mpc: Closed-loopvisualgraspingviavalue-guidedmodelpredictivecontrol. arXiv preprint arXiv:2509.06201v1, 2025. 8

Showing first 80 references.