Recognition: no theorem link
Contact-Grounded Policy: Dexterous Visuotactile Policy with Generative Contact Grounding
Pith reviewed 2026-05-15 15:35 UTC · model grok-4.3
The pith
Contact-Grounded Policy improves dexterous manipulation by predicting state-tactile trajectories and mapping them to compliant controller targets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CGP grounds multi-point contacts in two stages: a conditional diffusion model predicts coupled trajectories of actual robot state and tactile feedback in a compressed latent space, and a learned contact-consistency mapping converts the predicted state-tactile pairs into executable target robot states for a compliance controller, which then realizes the intended contacts.
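Read as a pipeline, the claim can be sketched in a few lines. Everything below is illustrative: the dimensions, the random linear decoders, and the `denoise_step` stand-in are assumptions for the sketch, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# All dimensions are illustrative placeholders, not taken from the paper.
LATENT_DIM, STATE_DIM, TACTILE_DIM = 16, 22, 8
HORIZON, DENOISE_STEPS = 12, 10

# Toy stand-ins for trained networks (random linear maps).
W_cond = rng.standard_normal((STATE_DIM + TACTILE_DIM, LATENT_DIM)) * 0.1
W_dec_state = rng.standard_normal((LATENT_DIM, STATE_DIM)) * 0.1
W_dec_tactile = rng.standard_normal((LATENT_DIM, TACTILE_DIM)) * 0.1
W_map = rng.standard_normal((STATE_DIM + TACTILE_DIM, STATE_DIM)) * 0.1

def denoise_step(z, cond):
    """Stand-in for one conditional denoising step; the real model is a
    trained noise-prediction network conditioned on observations."""
    return z - 0.1 * (z - cond)

def cgp_step(robot_state, tactile):
    obs = np.concatenate([robot_state, tactile])
    cond = obs @ W_cond  # compress conditioning into latent space
    # (i) conditional diffusion: start from noise, iteratively denoise.
    z = rng.standard_normal((HORIZON, LATENT_DIM))
    for _ in range(DENOISE_STEPS):
        z = denoise_step(z, cond)
    # Decode latents into a coupled state-tactile trajectory.
    pred_states = z @ W_dec_state
    pred_tactile = z @ W_dec_tactile
    # (ii) contact-consistency mapping: predicted (state, tactile) pairs
    # become executable targets for the compliance controller.
    pairs = np.concatenate([pred_states, pred_tactile], axis=-1)
    return pairs @ W_map

targets = cgp_step(np.zeros(STATE_DIM), np.zeros(TACTILE_DIM))
print(targets.shape)  # one target robot state per horizon step: (12, 22)
```

The point of the sketch is only the data flow: observations condition a latent forecast, and a separate mapping turns each forecast step into a controller target.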
What carries the argument
The learned contact-consistency mapping that converts predicted robot state-tactile pairs into executable targets for the compliance controller.
Load-bearing premise
The learned contact-consistency mapping will reliably convert predicted state-tactile pairs into executable targets that the compliance controller can realize without introducing new slip or instability.
What would settle it
Measure whether the compliance controller achieves the predicted contacts without added slip when executing the mapped targets, versus executing baseline predictions, on the physical Allegro hand in a delicate-grasping trial.
Original abstract
Contact-rich dexterous manipulation with multi-finger hands remains an open challenge in robotics because task success depends on multi-point contacts that continuously evolve and are highly sensitive to object geometry, frictional transitions, and slip. Recently, tactile-informed manipulation policies have shown promise. However, most use tactile signals as additional observations rather than modeling contact state or how their action outputs interact with low-level controller dynamics. We present Contact-Grounded Policy (CGP), a visuotactile policy that grounds multi-point contacts by predicting coupled trajectories of actual robot state and tactile feedback, and using a learned contact-consistency mapping to convert these predictions into executable target robot states for a compliance controller. CGP consists of two components: (i) a conditional diffusion model that forecasts future robot state and tactile feedback in a compressed latent space, and (ii) a learned contact-consistency mapping that converts the predicted robot state-tactile pair into executable targets for a compliance controller, enabling it to realize the intended contacts. We evaluate CGP using a physical four-finger Allegro V5 hand with Digit360 fingertip tactile sensors, and a simulated five-finger Tesollo DG-5F hand with dense whole-hand tactile arrays. Across a range of dexterous tasks including in-hand manipulation, delicate grasping, and tool use, CGP outperforms visuomotor and visuotactile diffusion-policy baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Contact-Grounded Policy (CGP), a visuotactile policy for dexterous manipulation. It employs a conditional diffusion model to forecast coupled trajectories of robot state and tactile feedback in latent space, paired with a learned contact-consistency mapping that translates these predictions into target states for a compliance controller. The approach is evaluated on physical and simulated multi-finger hands across in-hand manipulation, delicate grasping, and tool use tasks, claiming superior performance over visuomotor and visuotactile diffusion baselines.
Significance. If the empirical claims hold under rigorous verification, CGP could advance contact-rich dexterous manipulation by explicitly modeling evolving multi-point contacts and grounding predictions in controller dynamics. The dual physical-simulated evaluation and use of generative modeling for state-tactile forecasting are positive elements that target key sensitivities to geometry, friction, and slip.
major comments (2)
- [Evaluation] The abstract states that CGP outperforms baselines across tasks but supplies no quantitative metrics, error bars, ablation results, or training-data distribution details; this renders the central performance claim unverifiable from the provided text and weakens assessment of statistical reliability.
- [Method] The contact-consistency mapping is presented as converting diffusion-predicted state-tactile pairs into executable compliance-controller targets, yet no derivation, stability bound, or analysis is given showing preservation of contact geometry and friction constraints under controller dynamics (particularly for rapid frictional transitions or evolving multi-point contacts).
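For concreteness, one standard way to formalize the referee's second concern (assuming Coulomb friction, which the provided text does not state explicitly) is that every active contact $i$ must keep its force inside the friction cone over the executed horizon:

```latex
\|f_{t,i}(\tau)\| \;\le\; \mu_i \, f_{n,i}(\tau), \qquad f_{n,i}(\tau) \ge 0, \qquad \forall\, \tau \in [t,\, t+H],
```

where $f_{n,i}$ and $f_{t,i}$ are the normal and tangential contact forces and $\mu_i$ the friction coefficient. The report asks for evidence that the mapped controller targets keep these constraints satisfied, especially when $\mu_i$ or the active contact set changes rapidly.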
minor comments (2)
- [Method] Provide explicit architecture details for the diffusion model (noise schedule, latent dimensions) and contact-consistency network (training loss, input/output mappings) to support reproducibility.
- [Evaluation] Clarify the exact composition of the visuomotor and visuotactile diffusion-policy baselines, including whether they share the same diffusion backbone or controller.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on the evaluation and methodological aspects of our work. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
Referee: [Evaluation] The abstract states that CGP outperforms baselines across tasks but supplies no quantitative metrics, error bars, ablation results, or training-data distribution details; this renders the central performance claim unverifiable from the provided text and weakens assessment of statistical reliability.
Authors: We acknowledge that the abstract does not include specific quantitative metrics. The full paper provides detailed results with error bars from repeated trials, ablation studies, and information on the training data distribution in Sections 4 and 5. To strengthen the abstract's verifiability, we will add key performance metrics, such as average success rates with standard deviations and notes on ablations, to the revised abstract. revision: yes
Referee: [Method] The contact-consistency mapping is presented as converting diffusion-predicted state-tactile pairs into executable compliance-controller targets, yet no derivation, stability bound, or analysis is given showing preservation of contact geometry and friction constraints under controller dynamics (particularly for rapid frictional transitions or evolving multi-point contacts).
Authors: The contact-consistency mapping is a neural network trained end-to-end to ensure that the diffusion model's predictions correspond to achievable states under the compliance controller, thereby preserving the intended contact geometry and friction properties as demonstrated in our physical and simulated experiments. While we do not provide a formal mathematical derivation or stability bounds in the current version, we will include additional analysis in the revised manuscript discussing how the mapping maintains contact constraints, supported by empirical observations on frictional transitions and multi-point contacts. revision: partial
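As a reading aid only: one way the rebuttal's "trained end-to-end to ensure predictions correspond to achievable states" could be instantiated is a loss that rolls mapped targets through a differentiable controller model and penalizes deviation from the predicted states. The controller stand-in and linear mapping below are assumptions, not the authors' implementation.

```python
import numpy as np

def compliance_model(target, stiffness=0.8):
    """Toy differentiable stand-in for the compliance controller:
    the realized state lags the commanded target."""
    return stiffness * target

def contact_consistency_loss(W_map, pred_states, pred_tactile):
    """Hypothetical objective: executing the mapped targets under the
    controller model should reproduce the predicted robot states."""
    pairs = np.concatenate([pred_states, pred_tactile], axis=-1)
    realized = compliance_model(pairs @ W_map)
    return float(np.mean((realized - pred_states) ** 2))

# Any mismatch between realized and predicted states shows up as a
# positive penalty that training would reduce.
rng = np.random.default_rng(1)
pred_s = rng.standard_normal((12, 22))
pred_t = rng.standard_normal((12, 8))
W_bad = np.zeros((30, 22))  # a mapping that ignores its input
print(contact_consistency_loss(W_bad, pred_s, pred_t) > 0.0)  # True
```

Whether such a loss actually preserves contact geometry under fast frictional transitions is exactly what the referee asks the authors to analyze.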
Circularity Check
No significant circularity in derivation chain
full rationale
The paper presents an empirical pipeline: a conditional diffusion model is trained on observed state-tactile trajectories to forecast future pairs in latent space, after which a separate learned contact-consistency mapping converts those predictions into compliance-controller targets. Neither component is defined in terms of the other, nor is any fitted parameter relabeled as a prediction; both are trained on external data and evaluated against independent baselines on physical and simulated hardware. No self-citation chain, uniqueness theorem, or ansatz is invoked to force the central performance claims, so the derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- diffusion noise schedule parameters
- contact-consistency network weights
axioms (1)
- domain assumption: The compliance controller can realize any target pose within its workspace without instability when the target is within the learned mapping's output distribution.
invented entities (1)
- contact-consistency mapping (no independent evidence)
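The ledger's domain assumption is easiest to see against a generic joint-space impedance law (a textbook form, not necessarily the paper's controller): stability hinges on the gains and on the target staying near realizable poses, which is what the mapping's output distribution is assumed to guarantee.

```python
import numpy as np

def compliance_torque(q, qdot, q_target, kp=4.0, kd=0.4):
    """Generic joint-space impedance law: a spring-damper pull toward
    the target, so contact forces stay bounded by kp times the
    position error rather than tracking the target rigidly."""
    return kp * (q_target - q) - kd * qdot

# 16 joints (e.g. a four-finger hand), target offset of 0.1 rad per joint.
tau = compliance_torque(np.zeros(16), np.zeros(16), np.full(16, 0.1))
print(tau[0])  # 4.0 * 0.1 = 0.4
```

Under this reading, a target far outside the realizable set produces large commanded torques, which is the instability mode the axiom rules out by assumption.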
Reference graph
Works this paper leans on
- [1] Shan An, Ziyu Meng, Chao Tang, Yuning Zhou, Tengyu Liu, Fangqiang Ding, Shufang Zhang, Yao Mu, Ran Song, Wei Zhang, et al. Dexterous manipulation through imitation learning: A survey. arXiv preprint arXiv:2504.03515, 2025.
- [2] Yue Chang, Peter Yichen Chen, Zhecheng Wang, Maurizio M Chiaramonte, Kevin Carlberg, and Eitan Grinspun. LiCROM: Linear-subspace continuous reduced order modeling with neural fields. In SIGGRAPH Asia 2023 Conference Papers, pages 1–12, 2023.
- [3] Claire Chen, Zhongchun Yu, Hojung Choi, Mark Cutkosky, and Jeannette Bohg. DexForce: Extracting force-informed actions from kinesthetic demonstrations for dexterous manipulation. IEEE Robotics and Automation Letters, 10(6):6416–6423, 2025. doi: 10.1109/LRA.2025.3568318.
- [4] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Proceedings of Robotics: Science and Systems, 2023.
- [5] Hojung Choi, Yifan Hou, Chuer Pan, Seongheon Hong, Austin Patel, Xiaomeng Xu, Mark R Cutkosky, and Shuran Song. In-the-wild compliant manipulation with UMI-FT. arXiv preprint arXiv:2601.09988, 2026.
- [6] Hao-Shu Fang, Hengxu Yan, Zhenyu Tang, Hongjie Fang, Chenxi Wang, and Cewu Lu. AnyDexGrasp: General dexterous grasping for different hands with human-level learning efficiency. arXiv preprint arXiv:2502.16420, 2025.
- [7] Shangchen Han, Beibei Liu, Robert Wang, Yuting Ye, Christopher D Twigg, and Kenrick Kin. Online optical marker-based hand tracking with deep labels. ACM Transactions on Graphics (TOG), 37(4):1–10, 2018.
- [8] Shangchen Han, Po-chen Wu, Yubo Zhang, Beibei Liu, Linguang Zhang, Zheng Wang, Weiguang Si, Peizhao Zhang, Yujun Cai, Tomas Hodan, et al. UmeTrack: Unified multi-view end-to-end hand tracking for VR. In SIGGRAPH Asia 2022 Conference Papers, pages 1–9, 2022.
- [9] Liang Heng, Haoran Geng, Kaifeng Zhang, Pieter Abbeel, and Jitendra Malik. ViTacFormer: Learning cross-modal representation for visuo-tactile dexterous manipulation. arXiv preprint arXiv:2506.15953, 2025.
- [10] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 2020.
- [11] Yifan Hou, Zeyi Liu, Cheng Chi, Eric Cousineau, Naveen Kuppuswamy, Siyuan Feng, Benjamin Burchfiel, and Shuran Song. Adaptive compliance policy: Learning approximate compliance for diffusion guided control. In IEEE International Conference on Robotics and Automation (ICRA), pages 4829–4836, 2025.
- [12] Binghao Huang, Yixuan Wang, Xinyi Yang, Yiyue Luo, and Yunzhu Li. 3D-ViTac: Learning fine-grained manipulation with visuo-tactile sensing. In 8th Annual Conference on Robot Learning, 2024.
- [13] Zixuan Huang, Huaidian Hou, and Dmitry Berenson. Unified multimodal diffusion forcing for forceful manipulation. arXiv preprint arXiv:2511.04812, 2025.
- [14] Gagan Khandate, Siqi Shang, Eric T Chang, Tristan Luca Saidi, Yang Liu, Seth Matthew Dennis, Johnson Adams, and Matei Ciocarlie. Sampling-based exploration for reinforcement learning of dexterous manipulation. In Proceedings of Robotics: Science and Systems, 2023.
- [15] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- [16] Mike Lambeta, Tingfan Wu, Ali Sengul, Victoria Rose Most, Nolan Black, Kevin Sawyer, Romeo Mercado, Haozhi Qi, Alexander Sohn, Byron Taylor, et al. Digitizing touch with an artificial multimodal fingertip. arXiv preprint arXiv:2411.02479, 2024.
- [17] Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, and Jitendra Malik. Twisting lids off with two hands. In Conference on Robot Learning, 2024.
- [18] Jason Jingzhou Liu, Yulong Li, Kenneth Shaw, Tony Tao, Ruslan Salakhutdinov, and Deepak Pathak. FACTR: Force-attending curriculum training for contact-rich policy learning. In Proceedings of Robotics: Science and Systems, 2025.
- [19] Zhuoyang Liu, Jiaming Liu, Jiadong Xu, Nuowei Han, Chenyang Gu, Hao Chen, Kaichen Zhou, Renrui Zhang, Kai Chin Hsieh, Kun Wu, et al. MLA: A multisensory language-action model for multimodal understanding and forecasting in robotic manipulation. arXiv preprint arXiv:2509.26642, 2025.
- [20] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
- [21] Haozhi Qi, Brent Yi, Mike Lambeta, Yi Ma, Roberto Calandra, and Jitendra Malik. From simple to complex skills: The case of in-hand object reorientation. In IEEE International Conference on Robotics and Automation (ICRA), 2025.
- [22] Daniel Rakita, Bilge Mutlu, and Michael Gleicher. RelaxedIK: Real-time synthesis of accurate and feasible robot arm motion. In Robotics: Science and Systems, volume 14, pages 26–30, 2018.
- [23] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- [24] Cristian Romero, Dan Casas, Maurizio Chiaramonte, and Miguel A Otaduy. Learning contact deformations with general collider descriptors. In SIGGRAPH Asia 2023 Conference Papers, pages 1–10, 2023.
- [25] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. Proceedings of International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=St1giarCHLP.
- [26] Yutian Tao, Maurizio Chiaramonte, and Pablo Fernandez. Interpolated adaptive linear reduced order modeling for deformation dynamics. arXiv preprint arXiv:2509.25392, 2025.
- [27]
- [28] URL https://github.com/UT-Austin-RPL/deoxys_control.
- [29] Yeping Wang, Pragathi Praveena, Daniel Rakita, and Michael Gleicher. RangedIK: An optimization-based robot motion generation method for ranged-goal tasks. arXiv preprint arXiv:2302.13935, 2023.
- [30] Mengda Xu, Han Zhang, Yifan Hou, Zhenjia Xu, Linxi Fan, Manuela Veloso, and Shuran Song. DexUMI: Using human hand as the universal manipulation interface for dexterous manipulation. In 9th Annual Conference on Robot Learning, 2025. URL https://openreview.net/forum?id=XrgRvBklWu.
- [31] Xiaomeng Xu, Yifan Hou, Zeyi Liu, and Shuran Song. Compliant residual DAgger: Improving real-world contact-rich manipulation with human corrections. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=cjcm5LYVWm.
- [32] Yinzhen Xu, Weikang Wan, Jialiang Zhang, Haoran Liu, Zikang Shan, Hao Shen, Ruicheng Wang, Haoran Geng, Yijia Weng, Jiayi Chen, et al. UniDexGrasp: Universal robotic dexterous grasping via learning diverse proposal generation and goal-conditioned policy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4737–4746, 2023.
- [33] Zhengtong Xu, Raghava Uppuluri, Xinwei Zhang, Cael Fitch, Philip Glen Crandall, Wan Shou, Dongyi Wang, and Yu She. UniT: Data efficient tactile representation with generalization to unseen objects. IEEE Robotics and Automation Letters, 2025.
- [34] Han Xue, Jieji Ren, Wendi Chen, Gu Zhang, Yuan Fang, Guoying Gu, Huazhe Xu, and Cewu Lu. Reactive diffusion policy: Slow-fast visual-tactile policy learning for contact-rich manipulation. In Proceedings of Robotics: Science and Systems, 2025.
- [35] Jianglong Ye, Keyi Wang, Chengjing Yuan, Ruihan Yang, Yiquan Li, Jiyue Zhu, Yuzhe Qin, Xueyan Zou, and Xiaolong Wang. Dex1B: Learning with 1B demonstrations for dexterous manipulation. In Proceedings of Robotics: Science and Systems, 2025.
- [36] Zhao-Heng Yin, Binghao Huang, Yuzhe Qin, Qifeng Chen, and Xiaolong Wang. Rotating without seeing: Towards in-hand dexterity through touch. In Proceedings of Robotics: Science and Systems, 2023.
- [37] Di Zhang, Chengbo Yuan, Chuan Wen, Hai Zhang, Junqiao Zhao, and Yang Gao. KineDex: Learning tactile-informed visuomotor policies via kinesthetic teaching for dexterous manipulation. In 9th Annual Conference on Robot Learning, 2025. URL https://openreview.net/forum?id=GKueYvjqSS.
- [38] Jialiang Zhang, Haoran Liu, Danshi Li, XinQiang Yu, Haoran Geng, Yufei Ding, Jiayi Chen, and He Wang. DexGraspNet 2.0: Learning generative dexterous grasping in large-scale synthetic cluttered scenes. In 8th Annual Conference on Robot Learning, 2024.
- [39] Jialiang Zhao, Naveen Kuppuswamy, Siyuan Feng, Benjamin Burchfiel, and Edward Adelson. PolyTouch: A robust multi-modal tactile sensor for contact-rich manipulation using tactile-diffusion policies. In IEEE International Conference on Robotics and Automation (ICRA), pages 104–110, 2025. doi: 10.1109/ICRA55743.2025.11128816.
- [40] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.
- [41] Yifeng Zhu and Abhishek Joshi. VIOLA: Imitation learning for vision-based manipulation with object proposal priors. In Proceedings of Conference on Robot Learning, 2022.
- [42] Zeshun Zong, Xuan Li, Minchen Li, Maurizio M Chiaramonte, Wojciech Matusik, Eitan Grinspun, Kevin Carlberg, Chenfanfu Jiang, and Peter Yichen Chen. Neural stress fields for reduced-order elastoplasticity and fracture. In SIGGRAPH Asia 2023 Conference Papers, pages 1–11, 2023.
discussion (0)