AutoDex: An Automated Real-World System for Dexterous Grasping Data Collection

Gunhee Kim; Hanbyul Joo; Jisoo Kim; Jongbin Lim; Mingi Choi; Taeksoo Kim; Taeyun Ha

arxiv: 2606.23689 · v1 · pith:OT3MMZEXnew · submitted 2026-06-22 · 💻 cs.RO · cs.LG

Mingi Choi, Gunhee Kim, Jisoo Kim, Taeksoo Kim, Taeyun Ha, Jongbin Lim, Hanbyul Joo

Pith reviewed 2026-06-26 07:50 UTC · model grok-4.3

keywords dexterous grasping · automated data collection · real-world robotics · multi-view perception · grasp validation · Allegro hand · Inspire hand · robot reset mechanism

The pith

AutoDex automates real-world dexterous grasp data collection with 4.8 times higher throughput than teleoperation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an automated system called AutoDex that generates candidate grasps, localizes objects despite heavy occlusion using 20 cameras, executes the grasp on real robot hands, labels success or failure by lift-and-hold, and resets the object to new stable poses for the next trial. This closes the full loop without human intervention, producing a database of physically validated grasp outcomes on 100 objects across two hands. A sympathetic reader would care because current alternatives either lack physical validity (simulation) or scale too slowly (teleoperation), and the system demonstrates a concrete speed-up plus better downstream grasp success when the validated data is used for retrieval.

Core claim

AutoDex accepts candidates from a replaceable grasp generator and runs the full perception-execution-labeling-reset loop autonomously: dense multi-view localization under occlusion, collision-monitored motion execution on Allegro and Inspire hands, binary lift-and-hold outcome labeling, and active object resetting to expose new poses. The result is a reusable database of 3,593 real-world grasp trials with synchronized multi-view observations and robot-state logs. A matched 500-trial collection takes 10.3 hours versus 49.4 hours for teleoperation, and grasps retrieved from the AutoDex-validated database succeed 76 percent of the time versus 34 percent for simulation-only validation.
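The headline ratio can be checked directly from the two reported wall-clock totals. The sketch below is plain arithmetic on those numbers; the per-trial averages assume the time is spread evenly over the 500 trials, which is an assumption made here for illustration and not something the paper states.

```python
# Reported wall-clock totals for the matched 500-trial collection.
TRIALS = 500
AUTODEX_HOURS = 10.3
TELEOP_HOURS = 49.4


def speedup(baseline_hours: float, system_hours: float) -> float:
    """Ratio of baseline time to system time for the same number of trials."""
    return baseline_hours / system_hours


def seconds_per_trial(total_hours: float, trials: int) -> float:
    """Average wall-clock seconds per trial (assumes evenly spread time)."""
    return total_hours * 3600.0 / trials


ratio = speedup(TELEOP_HOURS, AUTODEX_HOURS)
autodex_s = seconds_per_trial(AUTODEX_HOURS, TRIALS)
teleop_s = seconds_per_trial(TELEOP_HOURS, TRIALS)

print(f"speed-up: {ratio:.1f}x")  # speed-up: 4.8x
print(f"AutoDex: {autodex_s:.0f} s/trial, teleoperation: {teleop_s:.0f} s/trial")
```

Under that even-spread assumption, one automated trial averages a little over a minute, against roughly six minutes per teleoperated trial.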

What carries the argument

AutoDex automated collection loop: the mechanism that takes a candidate grasp, performs 20-camera pose estimation under occlusion, executes and labels the physical outcome, then actively resets the object to generate additional stable poses without manual intervention.
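The control flow of such a loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Trial` record, the `run_collection` function, and the four callables are hypothetical names, and the choice to count an aborted execution as a failed grasp is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Trial:
    grasp_id: str
    object_pose: Optional[tuple]  # None when localization failed
    success: Optional[bool]       # None when the grasp was never executed


def run_collection(
    candidates: Iterable[str],
    localize: Callable[[], Optional[tuple]],
    execute: Callable[[str, tuple], bool],
    lift_and_hold: Callable[[], bool],
    reset: Callable[[], None],
) -> List[Trial]:
    """Run perception -> execution -> labeling -> reset once per candidate."""
    log: List[Trial] = []
    for grasp_id in candidates:
        pose = localize()
        if pose is None:
            # Perception failure: log it so failure rates can be audited later.
            log.append(Trial(grasp_id, None, None))
            reset()
            continue
        reached = execute(grasp_id, pose)
        # Assumption of this sketch: an aborted motion counts as a failed grasp.
        success = reached and lift_and_hold()
        log.append(Trial(grasp_id, pose, success))
        reset()
    return log


# Stub components standing in for the real perception, robot, and reset code.
poses = iter([(0.0, 0.0, 0.0), None, (0.1, 0.0, 0.0)])
log = run_collection(
    candidates=["g0", "g1", "g2"],
    localize=lambda: next(poses),
    execute=lambda grasp_id, pose: True,
    lift_and_hold=lambda: True,
    reset=lambda: None,
)
print([(t.grasp_id, t.success) for t in log])
```

Logging perception failures as their own outcome, separate from grasp failures, is what would make it possible to audit how often the loop actually ran unattended.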

If this is right

Real-world grasp data can be collected at scale without operator time or bias.
A database of physically labeled outcomes supports retrieval that outperforms simulation-only validation.
The same automated loop can be reused with different grasp generators or robot hands.
Synchronized multi-view observations and robot-state logs become available as a public resource for downstream training or analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could be extended to collect data for other contact-rich tasks such as in-hand manipulation or assembly if the reset and labeling steps are adapted.
Hybrid datasets that mix AutoDex-validated real trials with large simulation sets might further improve policy robustness.
If the perception pipeline generalizes across object categories, the same hardware setup could support data collection for entirely new object sets with minimal redesign.

Load-bearing premise

The 20-camera perception pipeline can reliably localize objects and estimate poses even when the hand heavily occludes them, and the active reset can repeatedly produce new stable object poses without systematic bias or human help.

What would settle it

Run AutoDex on the same 100 objects for 500 trials and measure whether pose-estimation failures or reset interventions exceed a small fraction of trials, or whether retrieved grasps from the resulting database fail to reach substantially higher real-world success than simulation-only baselines.
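One way to make "a small fraction of trials" testable is to report each failure count with a binomial confidence interval. The sketch below uses the Wilson score interval; the counts in the example are hypothetical and are not results from the paper, and the function name is chosen here for illustration.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts, not results from the paper.
low, high = wilson_interval(failures=10, trials=500)
print(f"observed failure rate 2.0%, 95% interval [{low:.1%}, {high:.1%}]")
```

The same function applies to pose-estimation failures and to reset interventions, as long as each is counted per trial.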

Figures

Figures reproduced from arXiv: 2606.23689 by Gunhee Kim, Hanbyul Joo, Jisoo Kim, Jongbin Lim, Mingi Choi, Taeksoo Kim, Taeyun Ha.

**Figure 1.** The AutoDex pipeline. AutoDex builds a database of physically labeled dexterous-grasp trials by executing generated candidates in a multi-camera workcell, labeling lift-and-hold success or failure, and resetting the object between trials. At deployment, downstream systems retrieve successful grasps, filter them for feasibility in the new scene, and execute the selected grasp.

**Figure 2.** AutoDex workcell and execution examples. Left: A multi-camera workcell with a 6-DoF xArm, a swappable multi-finger hand, and 20 synchronized RGB cameras. Middle and right: Each row pairs a candidate grasp generated under a wall, shelf, or box scene constraint with its corresponding real-world execution, shown with synchronized views and overlaid tracked 6D object poses. In this work, we use BODex [8] as t…

**Figure 3.** Left: Reset examples. The top row shows direct placement, where the robot carries the object to the target pose and releases it at the tabletop. The bottom row shows height-relaxed placement for a flat object, where virtual support pillars prevent finger intrusion into the object’s descent region after release. In each row, the first panel shows the generated reset grasp, and the next two panels show the c…

**Figure 4.** Left: Throughput comparison. AutoDex collects 500 trials in 10.3 h, compared with 49.4 h for teleoperation in the same workcell. Right: Effect of physical validation. Grasps retrieved from the AutoDex-validated database achieve 76% real-world success, compared with 34% for grasps retrieved from the model-screened database, across 20 objects and 515 trials. The improvement is consistent across material, s…

**Figure 5.** Left: Reset strategy comparison. Reset success versus passive transition probability P(Pj | Pi). Naive Drop follows y = x by construction, while Stable Reorient Placement maintains high success even for transitions rarely reached by passive settling. Right: Pose self-consistency relative to the 20-camera reference as a function of camera count. Mean ADD-S between the full 20-camera reference pose and k-cam…

**Figure 6.** End-to-end dataset alignment. We reproject the calibrated robot mesh and the object mesh rendered at the estimated 6D pose into all 20 synchronized camera views. The visual overlap with the RGB observations provides an end-to-end check that camera extrinsics, hand–eye calibration, robot-state timing, and object-pose estimates are consistently aligned.

**Figure 7.** Multi-view object perception. (a) Multi-view object pose estimation pipeline. (b) Runtime distribution across perception stages. (c) Visible object-surface coverage from the best k-camera subset of a larger 24-camera candidate rig, with and without robot occlusion. The final data-collection setup uses 20 cameras. Coverage saturates at around 8 cameras, while robot occlusion consistently reduces the visib…

**Figure 8.** Residual-torque contact detection examples. Two placement trials in which the grasped object contacts the tabletop during descent. The residual-torque monitor detects the unexpected contact and halts the motion before continued descent can load the arm–hand assembly. Training. We collect free-space (q, q̇, τ_motor) samples on the same arm–hand assembly used at deployment. All static and dynamic training t…

**Figure 9.** AutoDex object library and diversity. (a) The 100-object library spans diverse geometries, materials, and functional categories from everyday household items. (b, c) The objects cover seven dominant material categories and a wide weight range. The dataset spans 100 diverse everyday objects (Fig. 9a), with more than 80% sourced from IKEA household products for commercial availability and r…
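Figure 5 reports pose self-consistency as mean ADD-S between the 20-camera reference pose and poses estimated from fewer cameras. For readers unfamiliar with the metric, the sketch below computes ADD-S in its usual form: the mean, over model points under the reference pose, of the distance to the closest model point under the estimated pose. It assumes the paper uses this standard definition, uses a brute-force nearest-point search, and all function names are chosen here for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform x -> R x + t to one 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_s(points: List[Vec3], R_ref: Mat3, t_ref: Vec3, R_est: Mat3, t_est: Vec3) -> float:
    """Mean closest-point distance between the model under two poses."""
    ref = [transform(R_ref, t_ref, p) for p in points]
    est = [transform(R_est, t_est, p) for p in points]
    return sum(min(math.dist(a, b) for b in est) for a in ref) / len(ref)


# Toy model: four points on a square, symmetric under 90-degree turns about z.
square = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
zero = (0.0, 0.0, 0.0)

symmetric_error = add_s(square, identity, zero, rot_z_90, zero)
shift_error = add_s(square, identity, zero, identity, (0.0, 0.0, 0.01))
print(symmetric_error, shift_error)
```

The toy case shows why the metric suits symmetric objects: a 90-degree turn of the square maps the point set onto itself and scores zero, while a 1 cm offset scores 0.01 in the same units as the model.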

Original abstract

Learning robust dexterous grasping requires real-world data that records the physical outcomes of grasp attempts. Such data is hard to obtain at scale: teleoperation yields valid physical outcomes but is slow and operator-biased, while simulation-based generation is cheap and scalable but cannot certify contact validity. A natural solution is to generate candidate grasps and verify them on real hardware, but this scales only if the entire collection loop (perception, execution, labeling, and reset) runs without human intervention. We present AutoDex, an automated real-world data-collection system that closes this loop: for each candidate from a replaceable generator, it localizes the object under severe hand-object occlusion with dense 20-camera perception, executes collision-monitored robot motions, labels lift-and-hold success or failure, and actively resets the object between trials to expose additional candidates across stable poses. The result is a reusable database of physically labeled grasp trials that downstream systems can query by retrieval and feasibility filtering. Using AutoDex, we collect 3,593 grasp trials across Allegro and Inspire hands on 100 diverse objects, with synchronized multi-view observations and robot-state logs. For a matched 500-trajectory collection, AutoDex requires 10.3 h versus 49.4 h for teleoperation, yielding a 4.8x throughput improvement, and grasps retrieved from the AutoDex-validated database succeed 76% versus 34% for simulation-only validation. Code and data will be publicly released.
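The abstract describes downstream use as retrieval followed by feasibility filtering. A minimal sketch of that query pattern is shown below; the `GraspRecord` fields, the `retrieve` function, and the feasibility callback are hypothetical stand-ins, since the released data schema and the paper's retrieval mechanism are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GraspRecord:
    object_id: str
    hand: str
    grasp_id: str
    success: bool  # lift-and-hold label from the real trial


def retrieve(
    database: List[GraspRecord],
    object_id: str,
    hand: str,
    is_feasible: Callable[[GraspRecord], bool],
) -> List[GraspRecord]:
    """Return validated grasps for this object and hand that pass the scene check."""
    validated = [
        r for r in database
        if r.object_id == object_id and r.hand == hand and r.success
    ]
    return [r for r in validated if is_feasible(r)]


database = [
    GraspRecord("mug", "allegro", "g0", True),
    GraspRecord("mug", "allegro", "g1", False),
    GraspRecord("mug", "inspire", "g2", True),
    GraspRecord("mug", "allegro", "g3", True),
]
# Stand-in feasibility check: pretend g3 collides with the new scene.
selected = retrieve(database, "mug", "allegro", lambda r: r.grasp_id != "g3")
print([r.grasp_id for r in selected])  # ['g0']
```

Keeping the feasibility check as a callback separates the scene-independent physical label from the scene-dependent filter, so the same database can serve different scenes.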

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

AutoDex gives a working automated loop for real-world dexterous grasp data with clear 4.8x speed gains and better retrieval success, but the perception and reset steps lack the error numbers needed to confirm they run reliably at the claimed scale.

This paper delivers a practical automated system for collecting real-world dexterous grasp data at scale, with reported 4.8x faster collection than teleoperation and higher success rates for retrieved grasps. The main advance is closing the loop with multi-view perception, monitored execution, auto-labeling, and object reset.

The system uses 20 cameras to localize objects despite hand occlusion, executes collision-monitored motions on Allegro and Inspire hands, labels each trial by lift-and-hold, and resets objects to new stable poses. The authors collected 3,593 trials on 100 objects, with synchronized observations and robot-state logs. A matched 500-trajectory collection took 10.3 hours versus 49.4 hours for teleoperation. Grasps retrieved from the validated database succeed 76% of the time versus 34% for simulation-only validation.

This is solid engineering work that directly tackles the data scarcity issue. The concrete metrics on time and performance give it weight, and the plan to release code and data helps others build on it.

The main gap is in validating the assumptions behind full autonomy. The perception must handle severe occlusion reliably, and the reset must avoid bias without human help, but the abstract provides no quantitative failure rates or intervention stats. Without those, it's hard to know how robust the 4.8x claim is when scaled. The comparison to teleop is fair on time, but more on how the generator candidates were chosen would strengthen it.

Readers working on robot learning for manipulation will find this useful as a data collection tool. It shows honest engagement with the practical problems in the field. The paper merits a serious referee to check the implementation details and error analysis.

I would recommend sending this to peer review.

Referee Report

2 major / 1 minor

Summary. The paper presents AutoDex, an automated real-world system for dexterous grasping data collection that integrates 20-camera perception for object localization under occlusion, collision-monitored execution on Allegro and Inspire hands, lift-and-hold labeling, and active object reset. It reports collecting 3,593 grasp trials across 100 objects, with a matched 500-trajectory collection taking 10.3 hours versus 49.4 hours for teleoperation (4.8x throughput) and downstream retrieval success of 76% versus 34% for simulation-only validation. Code and data are to be released publicly.

Significance. If the autonomous operation claims hold, the work provides a concrete, scalable bridge between simulation-generated candidates and physically validated real-world data, with falsifiable metrics on wall-clock time and downstream grasp success that directly address the data bottleneck in dexterous grasping. The public release of the database strengthens reproducibility and enables follow-on retrieval-based methods.

major comments (2)

[Abstract / perception and reset sections] Abstract and methods description of the perception pipeline: the central 4.8x throughput and 76% success claims rest on reliable object localization and pose estimation under severe hand-object occlusion plus fully autonomous reset, yet no quantitative error rates, failure counts, intervention statistics, or ablation on perception accuracy are supplied; without these, the attribution of the 3,593 trials and time savings to automation cannot be verified.
[Results / downstream evaluation] Results on downstream evaluation: the 76% vs 34% retrieval success is reported for a matched collection, but the manuscript supplies no details on the size of the query set, the exact retrieval mechanism, or how many AutoDex-labeled trials were used in the comparison, leaving the magnitude of the improvement difficult to interpret or reproduce.

minor comments (1)

[Abstract] The abstract mentions synchronized multi-view observations and robot-state logs but does not specify the exact data formats or synchronization method; a table or appendix listing the released data schema would improve usability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly to improve verifiability and reproducibility.

Referee: [Abstract / perception and reset sections] Abstract and methods description of the perception pipeline: the central 4.8x throughput and 76% success claims rest on reliable object localization and pose estimation under severe hand-object occlusion plus fully autonomous reset, yet no quantitative error rates, failure counts, intervention statistics, or ablation on perception accuracy are supplied; without these, the attribution of the 3,593 trials and time savings to automation cannot be verified.

Authors: We acknowledge that the manuscript does not supply quantitative error rates, failure counts, intervention statistics, or perception ablations. The throughput comparison is presented as a matched autonomous collection, but without these metrics the attribution to full automation cannot be independently verified from the text. We will add the requested statistics and an ablation on perception accuracy in the revised methods and results sections. revision: yes
Referee: [Results / downstream evaluation] Results on downstream evaluation: the 76% vs 34% retrieval success is reported for a matched collection, but the manuscript supplies no details on the size of the query set, the exact retrieval mechanism, or how many AutoDex-labeled trials were used in the comparison, leaving the magnitude of the improvement difficult to interpret or reproduce.

Authors: We agree that the manuscript omits key details required to interpret and reproduce the 76% versus 34% comparison. We will expand the downstream evaluation section to specify the query set size, the retrieval mechanism, and the exact number of AutoDex-labeled trials used. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical system evaluation with direct measurements

The paper presents an automated hardware/software system for grasp data collection and reports wall-clock times (10.3 h vs 49.4 h) and success rates (76% vs 34%) from physical trials. These are direct empirical observations of the deployed system rather than outputs of any fitted model, mathematical derivation, or self-referential prediction. No equations, parameters, or uniqueness theorems appear in the provided text; the central claims rest on measured throughput and retrieval performance, which are externally falsifiable by replication and do not reduce to their own inputs by construction. Self-citations, if present, are not load-bearing for the reported results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on standard robotics hardware and perception assumptions rather than new fitted parameters or invented entities.

axioms (2)

domain assumption: A dense 20-camera multi-view system can localize objects under severe hand-object occlusion.
Central to the perception step described in the abstract.
domain assumption: The lift-and-hold test accurately labels grasp success or failure.
Used for automatic labeling of each trial.

Reference graph

Works this paper leans on

41 extracted references · 2 canonical work pages

[1] Y. Liu, Y. Yang, Y. Wang, X. Wu, J. Wang, Y. Yao, S. Schwertfeger, S. Yang, W. Wang, J. Yu, et al. Realdex: Towards human-like grasping for robotic dexterous hand. arXiv preprint arXiv:2402.13853, 2024.

[2] C. Wang, H. Shi, W. Wang, R. Zhang, L. Fei-Fei, and C. K. Liu. Dexcap: Scalable and portable mocap data collection system for dexterous manipulation. arXiv preprint arXiv:2403.07788, 2024.

[3] Z. Chen, K. Van Wyk, Y.-W. Chao, W. Yang, A. Mousavian, A. Gupta, and D. Fox. Dextransfer: Real world multi-fingered dexterous grasping with minimal human demonstrations. arXiv preprint arXiv:2209.14284, 2022.

[4] J. Zhang, H. Liu, D. Li, X. Yu, H. Geng, Y. Ding, J. Chen, and H. Wang. DexGraspNet 2.0: Learning generative dexterous grasping in large-scale synthetic cluttered scenes. In Conference on Robot Learning (CoRL), 2024.

[5] A. Bicchi and V. Kumar. Robotic grasping and contact: A review. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), volume 1, pages 348–353, 2000.

[6] Y. Qin, W. Yang, B. Huang, K. Van Wyk, H. Su, X. Wang, Y.-W. Chao, and D. Fox. Anyteleop: A general vision-based dexterous robot arm-hand teleoperation system. In Robotics: Science and Systems, 2023.

[7] R. Wang, J. Zhang, J. Chen, Y. Xu, P. Li, T. Liu, and H. Wang. Dexgraspnet: A large-scale robotic dexterous grasp dataset for general objects based on simulation. arXiv preprint arXiv:2210.02697, 2022.

[8] J. Chen, Y. Ke, and H. Wang. Bodex: Scalable and efficient robotic dexterous grasp synthesis using bilevel optimization. arXiv preprint arXiv:2412.16490, 2024.

[9] D. Turpin, T. Zhong, S. Zhang, G. Zhu, J. Liu, R. Singh, E. Heiden, M. Macklin, S. Tsogkas, S. Dickinson, et al. Fast-grasp’d: Dexterous multi-finger grasp generation through differentiable simulation. arXiv preprint arXiv:2306.08132, 2023.

[10] D. Huang, T. Zhang, Y. Li, L. Zhao, J. Li, Z. Fang, C. Xia, and X. He. Dexterous grasping with real-world robotic reinforcement learning. arXiv preprint arXiv:2503.04014, 2025.

[11] Y. Park, J. S. Bhatia, L. Ankile, and P. Agrawal. Dart: Dexterous augmented reality teleoperation platform for large-scale robot data collection in simulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 13883–13889. IEEE, 2025.

[12] T. Liu, Z. Liu, Z. Jiao, Y. Zhu, and S.-C. Zhu. Synthesizing diverse and physically stable grasps with arbitrary hand structures using differentiable force closure estimator. IEEE Robotics and Automation Letters, 7(1):470–477, Jan. 2022. ISSN 2377-3774. doi:10.1109/lra.2021.3129138. URL http://dx.doi.org/10.1109/LRA.2021.3129138.

[13] H. Zhang, S. Christen, Z. Fan, O. Hilliges, and J. Song. GraspXL: Generating grasping motions for diverse objects at scale. In European Conference on Computer Vision (ECCV), 2024.

[14] S. Chen, J. Bohg, and C. K. Liu. Springgrasp: Synthesizing compliant, dexterous grasps under shape uncertainty. arXiv preprint arXiv:2404.13532, 2024.

[15] A. H. Li, P. Culbertson, J. W. Burdick, and A. D. Ames. Frogger: Fast robust grasp generation via the min-weight metric. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6809–6816. IEEE, 2023.

[16] J. Lundell, F. Verdoja, and V. Kyrki. Ddgc: Generative deep dexterous grasping in clutter. arXiv preprint arXiv:2103.04783, 2021.

[17] Y. Xu, W. Wan, J. Zhang, H. Liu, Z. Shan, H. Shen, R. Wang, H. Geng, Y. Weng, J. Chen, et al. Unidexgrasp: Universal robotic dexterous grasping via learning diverse proposal generation and goal-conditioned policy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4737–4746, 2023.

[18] W. Wan, H. Geng, Y. Liu, Z. Shan, Y. Yang, L. Yi, and H. Wang. Unidexgrasp++: Improving dexterous grasping policy learning via geometry-aware curriculum and iterative generalist-specialist learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3891–3902, 2023.

[19] J. Ye, K. Wang, C. Yuan, R. Yang, Y. Li, J. Zhu, Y. Qin, X. Zou, and X. Wang. Dex1b: Learning with 1b demonstrations for dexterous manipulation. In Robotics: Science and Systems (RSS), 2025.

[20] Z. Q. Chen, K. Van Wyk, Y.-W. Chao, W. Yang, A. Mousavian, A. Gupta, and D. Fox. Learning robust real-world dexterous grasping policies via implicit shape augmentation. arXiv preprint arXiv:2210.13638, 2022.

[21] S. Christen, M. Kocabas, E. Aksan, J. Hwangbo, J. Song, and O. Hilliges. D-grasp: Physically plausible dynamic grasp synthesis for hand-object interactions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20577–20586, 2022.

[22] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.

[23] OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, et al. Solving rubik’s cube with a robot hand. arXiv preprint arXiv:1910.07113, 2019.

[24] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research (IJRR), 37(4-5):421–436, 2018.

[25] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke, and S. Levine. QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation. In Conference on Robot Learning (CoRL), 2018.

[26] D. Kalashnikov, J. Varley, Y. Chebotar, B. Swanson, R. Jonschkowski, C. Finn, S. Levine, and K. Hausman. Scaling up multi-task robotic reinforcement learning. In Conference on Robot Learning (CoRL), 2021.

[27] M. Ahn, D. Dwibedi, C. Finn, M. Arenas, K. Armstrong, V. Baruch, S. Belkhale, A. Brohan, N. Brown, K. Choromanski, et al. AutoRT: Embodied foundation models for large scale orchestration of robotic agents. arXiv preprint arXiv:2401.12963, 2024.

[28] H. Zhu, J. Yu, A. Gupta, D. Shah, K. Hartikainen, A. Singh, V. Kumar, and S. Levine. The ingredients of real-world robotic reinforcement learning. In International Conference on Learning Representations (ICLR), 2020.

[29] A. Sharma, A. M. Ahmed, R. Ahmad, and C. Finn. Self-improving robots: End-to-end autonomous visuomotor reinforcement learning. In Conference on Robot Learning (CoRL), 2023.

[30] H. Liu, S. Nasiriany, L. Zhang, Z. Bao, and Y. Zhu. Robot learning on the job: Human-in-the-loop autonomy and learning during deployment. In Robotics: Science and Systems (RSS), 2023.

[31] S. Mirchandani, S. Belkhale, J. Hejna, E. Choi, M. S. Islam, and D. Sadigh. So you think you can scale up autonomous robot data collection? In Conference on Robot Learning (CoRL), 2024.

[32] J. Yu, L. Fu, H. Huang, K. El-Refai, R. A. Ambrus, R. Cheng, M. Z. Irshad, and K. Goldberg. Real2Render2Real: Scaling robot data without dynamics simulation or robot hardware. arXiv preprint arXiv:2505.09601, 2025.

[33] N. Carion, L. Gustafson, Y.-T. Hu, S. Debnath, R. Hu, D. Suris, C. Ryali, K. V. Alwala, H. Khedr, A. Huang, et al. Sam 3: Segment anything with concepts. arXiv:2511.16719, 2025.

[34] B. Wen, W. Yang, J. Kautz, and S. Birchfield. Foundationpose: Unified 6d pose estimation and tracking of novel objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.

[35] H. Lin, S. Chen, J. Liew, D. Y. Chen, Z. Li, G. Shi, J. Feng, and B. Kang. Depth anything 3: Recovering the visual space from any views. arXiv:2511.10647, 2025.

[36] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab. Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In Asian conference on computer vision, 2012.

[37] E. Todorov, T. Erez, and Y. Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–. IEEE, 2012. doi:10.1109/IROS.2012.6386109.

[39] E. P. Örnek, Y. Labbé, B. Tekin, L. Ma, C. Keskin, C. Forster, and T. Hodaň. Foundpose: Unseen object pose estimation with foundation features. European Conference on Computer Vision (ECCV), 2024.

[40] V. N. Nguyen, C. Forster, B. Tekin, S. Shkodrani, V. Lepetit, C. Keskin, and T. Hodaň. Gotrack: Generic 6dof object pose refinement and tracking. Computer Vision and Pattern Recognition Workshops (CVPRW), 2025.

[41] B. Sundaralingam, S. K. S. Hari, A. Fishman, C. Garrett, K. Van Wyk, V. Blukis, A. Millane, H. Oleynikova, A. Handa, F. Ramos, et al. Curobo: Parallelized collision-free robot motion generation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 8112–8119. IEEE, 2023.

[1] [1]

Y . Liu, Y . Yang, Y . Wang, X. Wu, J. Wang, Y . Yao, S. Schwertfeger, S. Yang, W. Wang, J. Yu, et al. Realdex: Towards human-like grasping for robotic dexterous hand.arXiv preprint arXiv:2402.13853, 2024

arXiv 2024

[2] [2]

C. Wang, H. Shi, W. Wang, R. Zhang, L. Fei-Fei, and C. K. Liu. Dexcap: Scalable and portable mocap data collection system for dexterous manipulation.arXiv preprint arXiv:2403.07788, 2024

arXiv 2024

[3] [3]

Z. Chen, K. Van Wyk, Y .-W. Chao, W. Yang, A. Mousavian, A. Gupta, and D. Fox. Dextransfer: Real world multi-fingered dexterous grasping with minimal human demonstrations.arXiv preprint arXiv:2209.14284, 2022

arXiv 2022

[4] [4]

Zhang, H

J. Zhang, H. Liu, D. Li, X. Yu, H. Geng, Y . Ding, J. Chen, and H. Wang. DexGraspNet 2.0: Learning generative dexterous grasping in large-scale synthetic cluttered scenes. InConference on Robot Learning (CoRL), 2024

2024

[5] [5]

Bicchi and V

A. Bicchi and V . Kumar. Robotic grasping and contact: A review. InProceedings of the IEEE International Conference on Robotics and Automation (ICRA), volume 1, pages 348– 353, 2000

2000

[6] [6]

Y . Qin, W. Yang, B. Huang, K. Van Wyk, H. Su, X. Wang, Y .-W. Chao, and D. Fox. Anyteleop: A general vision-based dexterous robot arm-hand teleoperation system. InRobotics: Science and Systems, 2023

2023

[7] [7]

R. Wang, J. Zhang, J. Chen, Y . Xu, P. Li, T. Liu, and H. Wang. Dexgraspnet: A large- scale robotic dexterous grasp dataset for general objects based on simulation.arXiv preprint arXiv:2210.02697, 2022

arXiv 2022

[8] [8]

J. Chen, Y . Ke, and H. Wang. Bodex: Scalable and efficient robotic dexterous grasp synthesis using bilevel optimization.arXiv preprint arXiv:2412.16490, 2024

arXiv 2024

[9] [9]

Turpin, T

D. Turpin, T. Zhong, S. Zhang, G. Zhu, J. Liu, R. Singh, E. Heiden, M. Macklin, S. Tsogkas, S. Dickinson, et al. Fast-grasp’d: Dexterous multi-finger grasp generation through differen- tiable simulation.arXiv preprint arXiv:2306.08132, 2023

arXiv 2023

[10] [10]

Huang, T

D. Huang, T. Zhang, Y . Li, L. Zhao, J. Li, Z. Fang, C. Xia, and X. He. Dexterous grasping with real-world robotic reinforcement learning.arXiv preprint arXiv:2503.04014, 2025. 9

arXiv 2025

[11] [11]

Y . Park, J. S. Bhatia, L. Ankile, and P. Agrawal. Dart: Dexterous augmented reality teleoper- ation platform for large-scale robot data collection in simulation. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 13883–13889. IEEE, 2025

2025

[12] [12]

T. Liu, Z. Liu, Z. Jiao, Y . Zhu, and S.-C. Zhu. Synthesizing diverse and physically stable grasps with arbitrary hand structures using differentiable force closure estimator.IEEE Robotics and Automation Letters, 7(1):470–477, Jan. 2022. ISSN 2377-3774. doi:10.1109/lra.2021. 3129138. URLhttp://dx.doi.org/10.1109/LRA.2021.3129138

work page doi:10.1109/lra.2021 2022

[13] [13]

Zhang, S

H. Zhang, S. Christen, Z. Fan, O. Hilliges, and J. Song. GraspXL: Generating grasping motions for diverse objects at scale. InEuropean Conference on Computer Vision (ECCV), 2024

2024

[14] [14]

S. Chen, J. Bohg, and C. K. Liu. Springgrasp: Synthesizing compliant, dexterous grasps under shape uncertainty.arXiv preprint arXiv:2404.13532, 2024

arXiv 2024

[15] [15]

A. H. Li, P. Culbertson, J. W. Burdick, and A. D. Ames. Frogger: Fast robust grasp generation via the min-weight metric. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6809–6816. IEEE, 2023

2023

[16] [16]

Lundell, F

J. Lundell, F. Verdoja, and V . Kyrki. Ddgc: Generative deep dexterous grasping in clutter. arXiv preprint arXiv:2103.04783, 2021

arXiv 2021

[17] [17]

Y . Xu, W. Wan, J. Zhang, H. Liu, Z. Shan, H. Shen, R. Wang, H. Geng, Y . Weng, J. Chen, et al. Unidexgrasp: Universal robotic dexterous grasping via learning diverse proposal generation and goal-conditioned policy. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4737–4746, 2023

2023

[18] [18]

W. Wan, H. Geng, Y . Liu, Z. Shan, Y . Yang, L. Yi, and H. Wang. Unidexgrasp++: Improving dexterous grasping policy learning via geometry-aware curriculum and iterative generalist- specialist learning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 3891–3902, 2023

2023

[19] J. Ye, K. Wang, C. Yuan, R. Yang, Y. Li, J. Zhu, Y. Qin, X. Zou, and X. Wang. Dex1b: Learning with 1b demonstrations for dexterous manipulation. In Robotics: Science and Systems (RSS), 2025.

[20] Z. Q. Chen, K. Van Wyk, Y.-W. Chao, W. Yang, A. Mousavian, A. Gupta, and D. Fox. Learning robust real-world dexterous grasping policies via implicit shape augmentation. arXiv preprint arXiv:2210.13638, 2022.

[21] S. Christen, M. Kocabas, E. Aksan, J. Hwangbo, J. Song, and O. Hilliges. D-grasp: Physically plausible dynamic grasp synthesis for hand-object interactions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20577–20586, 2022.

[22] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.

[23] OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, et al. Solving rubik’s cube with a robot hand. arXiv preprint arXiv:1910.07113, 2019.

[24] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research (IJRR), 37(4-5):421–436, 2018.

[25] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke, and S. Levine. QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation. In Conference on Robot Learning (CoRL), 2018.

[26] D. Kalashnikov, J. Varley, Y. Chebotar, B. Swanson, R. Jonschkowski, C. Finn, S. Levine, and K. Hausman. Scaling up multi-task robotic reinforcement learning. In Conference on Robot Learning (CoRL), 2021.

[27] M. Ahn, D. Dwibedi, C. Finn, M. Arenas, K. Armstrong, V. Baruch, S. Belkhale, A. Brohan, N. Brown, K. Choromanski, et al. AutoRT: Embodied foundation models for large scale orchestration of robotic agents. arXiv preprint arXiv:2401.12963, 2024.

[28] H. Zhu, J. Yu, A. Gupta, D. Shah, K. Hartikainen, A. Singh, V. Kumar, and S. Levine. The ingredients of real-world robotic reinforcement learning. In International Conference on Learning Representations (ICLR), 2020.

[29] A. Sharma, A. M. Ahmed, R. Ahmad, and C. Finn. Self-improving robots: End-to-end autonomous visuomotor reinforcement learning. In Conference on Robot Learning (CoRL), 2023.

[30] H. Liu, S. Nasiriany, L. Zhang, Z. Bao, and Y. Zhu. Robot learning on the job: Human-in-the-loop autonomy and learning during deployment. In Robotics: Science and Systems (RSS), 2023.

[31] S. Mirchandani, S. Belkhale, J. Hejna, E. Choi, M. S. Islam, and D. Sadigh. So you think you can scale up autonomous robot data collection? In Conference on Robot Learning (CoRL), 2024.

[32] J. Yu, L. Fu, H. Huang, K. El-Refai, R. A. Ambrus, R. Cheng, M. Z. Irshad, and K. Goldberg. Real2Render2Real: Scaling robot data without dynamics simulation or robot hardware. arXiv preprint arXiv:2505.09601, 2025.

[33] N. Carion, L. Gustafson, Y.-T. Hu, S. Debnath, R. Hu, D. Suris, C. Ryali, K. V. Alwala, H. Khedr, A. Huang, et al. Sam 3: Segment anything with concepts. arXiv:2511.16719, 2025.

[34] B. Wen, W. Yang, J. Kautz, and S. Birchfield. Foundationpose: Unified 6d pose estimation and tracking of novel objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.

[35] H. Lin, S. Chen, J. Liew, D. Y. Chen, Z. Li, G. Shi, J. Feng, and B. Kang. Depth anything 3: Recovering the visual space from any views. arXiv:2511.10647, 2025.

[36] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab. Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In Asian conference on computer vision, 2012.

[37] E. Todorov, T. Erez, and Y. Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012. doi:10.1109/IROS.2012.6386109

[39] E. P. Örnek, Y. Labbé, B. Tekin, L. Ma, C. Keskin, C. Forster, and T. Hodaň. Foundpose: Unseen object pose estimation with foundation features. In European Conference on Computer Vision (ECCV), 2024.

[40] V. N. Nguyen, C. Forster, B. Tekin, S. Shkodrani, V. Lepetit, C. Keskin, and T. Hodaň. Gotrack: Generic 6dof object pose refinement and tracking. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2025.

[41] B. Sundaralingam, S. K. S. Hari, A. Fishman, C. Garrett, K. Van Wyk, V. Blukis, A. Millane, H. Oleynikova, A. Handa, F. Ramos, et al. Curobo: Parallelized collision-free robot motion generation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 8112–8119. IEEE, 2023.

Supplementary Material

A Candidate Generation and Executi...