pith. machine review for the scientific record.

arxiv: 2605.11048 · v1 · submitted 2026-05-11 · 💻 cs.RO · cs.AI

Recognition: no theorem link

ForceFlow: Learning to Feel and Act via Contact-Driven Flow Matching

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 00:44 UTC · model grok-4.3

classification: 💻 cs.RO · cs.AI
keywords: contact-rich manipulation · force feedback · flow matching · imitation learning · multimodal fusion · robotic policy · generalization · force regulation

The pith

ForceFlow couples force feedback with motion control using flow matching for contact-rich robot tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes ForceFlow as a reactive framework that integrates force sensing into robot policies for tasks involving physical contact. It combines flow matching with an asymmetric fusion design where force signals regulate the overall policy, joint prediction of force and action, and a staged handover from vision-based positioning to force-driven execution. This setup aims to create tighter links between sensing and movement so policies generalize better without extensive per-task adjustments. A reader would care because many everyday robot interactions fail when contact forces are not handled precisely, limiting reliable autonomy in unstructured settings.
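The policy backbone here is flow matching, the same generative recipe the paper cites (Lipman et al., 2022, reference [27] below). As a reading aid only, here is a minimal sketch of the standard conditional flow-matching training objective for an action-chunk policy; the network, dimensions, and observation encoding are illustrative assumptions, not ForceFlow's architecture:

```python
import torch
import torch.nn as nn

class FlowPolicy(nn.Module):
    """Illustrative velocity-field network: predicts the flow from noise to
    expert action chunks, conditioned on an observation embedding and time."""
    def __init__(self, obs_dim=64, act_dim=7, horizon=16, hidden=256):
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + horizon * act_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, horizon * act_dim),
        )

    def forward(self, obs_emb, noisy_actions, t):
        x = torch.cat([obs_emb, noisy_actions.flatten(1), t[:, None]], dim=-1)
        return self.net(x).view(-1, self.horizon, self.act_dim)

def flow_matching_loss(policy, obs_emb, expert_actions):
    """Linear-interpolant conditional flow matching (Lipman et al., 2022):
    x_t = (1-t)*x0 + t*x1, with target velocity x1 - x0."""
    x1 = expert_actions                        # expert action chunk
    x0 = torch.randn_like(x1)                  # noise sample
    t = torch.rand(x1.shape[0])                # per-sample time in [0, 1]
    xt = (1 - t[:, None, None]) * x0 + t[:, None, None] * x1
    v_target = x1 - x0
    v_pred = policy(obs_emb, xt, t)
    return ((v_pred - v_target) ** 2).mean()

# Usage on random stand-in data:
policy = FlowPolicy()
obs = torch.randn(8, 64)
acts = torch.randn(8, 16, 7)
loss = flow_matching_loss(policy, obs, acts)
loss.backward()
```

At inference time, actions come from integrating the learned velocity field from noise toward data; the contribution under review lies in how force enters the conditioning, not in this base objective.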

Core claim

ForceFlow is a force-aware reactive framework built upon flow matching. For contact-stage policy design, it adopts an asymmetric multimodal fusion architecture that treats force as a global regulatory signal, combined with a joint prediction paradigm that enhances the policy's understanding of instantaneous force and historical information, thereby achieving deep coupling between force and motion. For task-level hierarchical decomposition, it divides manipulation into a vision-dominant approach stage and a touch-dominant interaction stage, with a Vision-to-Force (V2F) handover mechanism that explicitly decouples spatial generalization from contact regulation. Experiments on six real-world contact-rich tasks demonstrate a 37% success-rate improvement over the strong baseline ForceVLA while maintaining significantly lower cost.
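The abstract does not spell out the fusion mechanics, so the following is one speculative reading of "force as a global regulatory signal": rather than concatenating force features symmetrically with vision, a force embedding gates every fused channel (FiLM-style), and a joint head predicts force alongside action so force supervision shapes the shared representation. All module names and dimensions here are ours, not the paper's:

```python
import torch
import torch.nn as nn

class AsymmetricFusion(nn.Module):
    """Speculative reading of 'force as a global regulatory signal':
    a force embedding produces per-channel gain/bias (FiLM-style) that
    modulates the visual features globally, instead of being concatenated
    as one more symmetric modality."""
    def __init__(self, vis_dim=256, force_dim=6, hidden=128):
        super().__init__()
        self.force_enc = nn.Sequential(nn.Linear(force_dim, hidden), nn.GELU())
        self.gain = nn.Linear(hidden, vis_dim)   # multiplicative regulation
        self.bias = nn.Linear(hidden, vis_dim)   # additive regulation

    def forward(self, vis_feat, force):
        f = self.force_enc(force)
        return vis_feat * (1 + self.gain(f)) + self.bias(f)

class JointHead(nn.Module):
    """Joint prediction: a shared trunk emits both the action chunk and the
    expected force trajectory, so force supervision shapes action features."""
    def __init__(self, in_dim=256, act_dim=7, force_dim=6, horizon=16):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 256), nn.GELU())
        self.action_head = nn.Linear(256, horizon * act_dim)
        self.force_head = nn.Linear(256, horizon * force_dim)

    def forward(self, fused):
        h = self.trunk(fused)
        return self.action_head(h), self.force_head(h)

# Usage with stand-in features: one 6-axis force/torque reading per sample.
fusion, head = AsymmetricFusion(), JointHead()
vis = torch.randn(4, 256)
ft = torch.randn(4, 6)
actions, forces = head(fusion(vis, ft))
```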

What carries the argument

Asymmetric multimodal fusion treating force as a global regulatory signal, combined with a joint-prediction paradigm and the V2F handover, inside a flow-matching policy.
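The V2F handover itself is described only as a staged switch from vision-dominant positioning to force-dominant execution. A minimal sketch of one plausible handover rule, triggered by target arrival or a contact-force threshold; the interface and the 1.0 N threshold are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    target_reached: bool    # from the vision/VLM pointing stage
    contact_force_n: float  # measured normal-force magnitude (N)

class V2FHandover:
    """Illustrative two-stage controller: vision-dominant approach until
    contact is detected, then hand control to a force-driven policy.
    The 1.0 N threshold is a made-up example value."""
    def __init__(self, approach_policy, contact_policy, contact_threshold=1.0):
        self.approach = approach_policy
        self.contact = contact_policy
        self.threshold = contact_threshold
        self.stage = "approach"

    def act(self, obs: Observation):
        if self.stage == "approach" and (
            obs.target_reached or obs.contact_force_n > self.threshold
        ):
            self.stage = "contact"  # handover is one-way within an episode
        policy = self.approach if self.stage == "approach" else self.contact
        return policy(obs)

# Usage with trivial stand-in policies:
ctrl = V2FHandover(lambda o: "move_to_target", lambda o: "regulate_force")
print(ctrl.act(Observation(target_reached=False, contact_force_n=0.2)))  # approach
print(ctrl.act(Observation(target_reached=True, contact_force_n=1.5)))   # contact
```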

Load-bearing premise

That asymmetric multimodal fusion treating force as a global regulatory signal, combined with joint prediction and the V2F handover, yields a stable, deep coupling between force and motion without extensive task-specific tuning.

What would settle it

Replication on the six contact-rich tasks would settle it: the claim fails if ForceFlow shows no meaningful success-rate gain over ForceVLA, or produces unstable force predictions during contact phases.
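Settling the headline number also requires trial counts and a significance test, which the abstract omits (see the referee's first comment below). A sketch of the minimal check, using Fisher's exact test with hypothetical success counts standing in for the unreported ones:

```python
from scipy.stats import fisher_exact

# Hypothetical trial counts -- the abstract reports none.
forceflow_success, forceflow_trials = 41, 60
forcevla_success, forcevla_trials = 25, 60

# 2x2 contingency table: successes vs failures per method.
table = [
    [forceflow_success, forceflow_trials - forceflow_success],
    [forcevla_success, forcevla_trials - forcevla_success],
]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"success rates: {forceflow_success/forceflow_trials:.2f} "
      f"vs {forcevla_success/forcevla_trials:.2f}, p = {p_value:.4f}")
```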

original abstract

Existing imitation learning methods enable robots to interact autonomously with the physical environment. However, contact-rich manipulation tasks remain a significant challenge due to complex contact dynamics that demand high-precision force feedback and control. Although recent efforts have attempted to integrate force/torque sensing into policies, how to build a simple yet effective framework that achieves robust generalization under multimodal observations remains an open question. In this paper, we propose ForceFlow, a force-aware reactive framework built upon flow matching. For contact-stage policy design, we investigate force signal fusion mechanisms and adopt an asymmetric multimodal fusion architecture that treats force as a global regulatory signal, combined with a joint prediction paradigm that enhances the policy's understanding of instantaneous force and historical information, thereby achieving deep coupling between force and motion. For task-level hierarchical decomposition, we divide manipulation into a vision-dominant approach stage (VLM-based pointing for target localization) and a touch-dominant interaction stage (force-driven contact execution), with a Vision-to-Force (V2F) handover mechanism that explicitly decouples spatial generalization from contact regulation. Experimental results across six real-world contact-rich tasks demonstrate that ForceFlow achieves a 37% success rate improvement over the strong baseline ForceVLA while maintaining significantly lower cost. Moreover, ForceFlow exhibits accurate force signal prediction and demonstrates superior performance in contact force self-regulation and zero-shot out-of-distribution (OOD) generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes ForceFlow, a flow-matching based framework for contact-rich robotic manipulation. It uses an asymmetric multimodal fusion architecture that treats force as a global regulatory signal, a joint prediction paradigm to couple instantaneous/historical force with motion, and a Vision-to-Force (V2F) handover for hierarchical decomposition separating vision-dominant approach from force-dominant contact execution. Experiments on six real-world tasks report a 37% success-rate improvement over ForceVLA, accurate force-signal prediction, superior contact-force self-regulation, lower cost, and better zero-shot OOD generalization.

Significance. If the performance gains are reproducible and causally attributable to the proposed fusion, joint-prediction, and handover mechanisms, the work would advance multimodal policy learning for precise force control in contact-rich tasks, addressing a recognized open challenge in robotic imitation learning.

major comments (3)
  1. [Experimental Results] The headline claim of a 37% success-rate improvement (and associated OOD gains) over ForceVLA rests on experimental results that, per the abstract, supply no trial counts, statistical significance tests, failure-mode breakdowns, or confirmation that the baseline was re-implemented identically. This prevents verification that the gains arise from the asymmetric fusion, joint prediction, or V2F handover rather than implementation differences or task selection.
  2. [Method (asymmetric multimodal fusion and joint prediction)] The central architectural claim—that treating force as a global regulatory signal in asymmetric fusion plus joint prediction produces deep force-motion coupling—requires component ablations. Without success rates or force-prediction error metrics for variants that disable each element, it is impossible to confirm these mechanisms are load-bearing for the reported self-regulation and generalization improvements.
  3. [Task-level hierarchical decomposition] The V2F handover is asserted to decouple spatial generalization from contact regulation, yet no analysis or ablation demonstrates that this explicit separation is necessary for the observed OOD performance or that an integrated end-to-end policy would not achieve comparable results.
minor comments (1)
  1. [Abstract] The abstract states 'significantly lower cost' without defining the cost metric (computational, energy, or task-time). Clarify this in the results section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below with honest clarifications and commitments to revisions that improve reproducibility and substantiate the claims without misrepresenting the current manuscript.

point-by-point responses
  1. Referee: [Experimental Results] The headline claim of a 37% success-rate improvement (and associated OOD gains) over ForceVLA rests on experimental results that, per the abstract, supply no trial counts, statistical significance tests, failure-mode breakdowns, or confirmation that the baseline was re-implemented identically. This prevents verification that the gains arise from the asymmetric fusion, joint prediction, or V2F handover rather than implementation differences or task selection.

    Authors: We agree that the abstract omits these details and that they are essential for verification. The full manuscript describes the experimental protocol and states that ForceVLA was re-implemented using the original authors' code and hyperparameters, but it does not include explicit trial counts, statistical tests, or failure-mode breakdowns. In the revision we will add these elements (trial counts, p-values from significance tests, and failure analysis) to the abstract, results section, and a new table, enabling readers to confirm that the reported gains are attributable to the proposed mechanisms rather than implementation variances. revision: yes

  2. Referee: [Method (asymmetric multimodal fusion and joint prediction)] The central architectural claim—that treating force as a global regulatory signal in asymmetric fusion plus joint prediction produces deep force-motion coupling—requires component ablations. Without success rates or force-prediction error metrics for variants that disable each element, it is impossible to confirm these mechanisms are load-bearing for the reported self-regulation and generalization improvements.

    Authors: We acknowledge that the manuscript lacks component ablations isolating asymmetric fusion and joint prediction. Current results compare the complete ForceFlow system against ForceVLA and other baselines but do not disable individual elements. We will add these ablations in the revision, reporting task success rates and force-prediction errors for the disabled variants. This will directly demonstrate whether each component is load-bearing for the observed contact-force self-regulation and zero-shot OOD gains. revision: yes

  3. Referee: [Task-level hierarchical decomposition] The V2F handover is asserted to decouple spatial generalization from contact regulation, yet no analysis or ablation demonstrates that this explicit separation is necessary for the observed OOD performance or that an integrated end-to-end policy would not achieve comparable results.

    Authors: We agree that an ablation is required to validate the necessity of the V2F handover. The manuscript presents the hierarchical design and its motivation but contains no direct comparison to an integrated end-to-end policy. In the revision we will add an ablation that removes the V2F handover (training a single policy across both stages) and evaluate both versions on the OOD tasks. We will report the resulting success rates and discuss whether the explicit separation improves generalization, modularity, or training stability. revision: yes
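The ablations promised in responses 2 and 3 amount to a small experiment grid over component on/off flags. A minimal sketch of how that grid might be enumerated; the component names, task list, and run_trial stub are placeholders for the authors' actual pipeline:

```python
from itertools import product

# Placeholder flags matching the promised ablations: asymmetric fusion,
# joint force/action prediction, and the V2F handover.
COMPONENTS = ["asym_fusion", "joint_pred", "v2f_handover"]
TASKS = ["task_1", "task_2", "task_3"]  # stand-ins for the six real tasks

def run_trial(config, task):
    """Stand-in for training + evaluation; returns a fake success rate."""
    return 0.5 + 0.1 * sum(config.values())  # replace with real evaluation

for flags in product([True, False], repeat=len(COMPONENTS)):
    config = dict(zip(COMPONENTS, flags))
    rates = [run_trial(config, t) for t in TASKS]
    enabled = [c for c, on in config.items() if on] or ["none"]
    print(f"{'+'.join(enabled):<35} mean success {sum(rates)/len(rates):.2f}")
```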

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external task benchmarks, not self-referential derivations

full rationale

The paper presents an empirical robotics framework (ForceFlow) built on flow matching with asymmetric fusion, joint prediction, and V2F handover. Its headline result is a 37% success-rate gain over ForceVLA measured on six real-world contact-rich tasks. No equations, first-principles derivations, or 'predictions' appear that reduce by construction to fitted parameters or self-defined quantities within the paper. The evaluation uses external, falsifiable metrics (task success, force prediction accuracy, OOD generalization) rather than internal self-consistency loops. Any self-citations are incidental and not load-bearing for the central performance claims, which remain independently testable against baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities. All modeling choices are described at the level of architectural decisions rather than mathematical postulates.

pith-pipeline@v0.9.0 · 5574 in / 1288 out tokens · 44054 ms · 2026-05-13T00:44:35.203453+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 3 internal anchors

  1. [1]

    Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making

    Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yi Ma, Pengyi Li, and Yan Zheng. Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making. In NeurIPS, 2024

  2. [2]

    Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

    Tony Z. Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023

  3. [3]

    Diffusion policy: Visuomotor policy learning via action diffusion

    Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Robotics: Science and Systems, 2023

  4. [4]

    Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Robert Equi, Chelsea Finn, Niccolo Fusai, Manuel Y. Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, brian ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, ...

  5. [5]

    OpenVLA: An open-source vision-language-action model

    Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan P Foster, Pannag R Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn. OpenVLA: An open-source vision-language-action model. In 8th Annual Conference on Robot Learning, 2024

  6. [6]

    Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation

    Han Xue, Jieji Ren, Wendi Chen, Gu Zhang, Fang Yuan, Guoying Gu, Huazhe Xu, and Cewu Lu. Reactive diffusion policy: Slow-fast visual-tactile policy learning for contact-rich manipulation. In ICRA 2025 Workshop: Beyond Pick and Place, 2025

  7. [7]

    Touch begins where vision ends: Generalizable policies for contact-rich manipulation

    Zifan Zhao, Siddhant Haldar, Jinda Cui, and Lerrel Pinto. Touch begins where vision ends: Generalizable policies for contact-rich manipulation. In Second Workshop on Out-of-Distribution Generalization in Robotics at RSS 2025, 2025

  8. [8]

    FoAR: Force-aware reactive policy for contact-rich robotic manipulation

    Zihao He, Hongjie Fang, Jingjing Chen, Hao-Shu Fang, and Cewu Lu. FoAR: Force-aware reactive policy for contact-rich robotic manipulation. In ICRA 2025 Workshop: Beyond Pick and Place, 2025

  9. [9]

    Forcemimic: Force-centric imitation learning with force-motion capture system for contact-rich manipulation

    Wenhai Liu, Junbo Wang, Yiming Wang, Weiming Wang, and Cewu Lu. Forcemimic: Force-centric imitation learning with force-motion capture system for contact-rich manipulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 1105--1112. IEEE, 2025

  10. [11]

    ForceVLA: Enhancing VLA models with a force-aware MoE for contact-rich manipulation

    Jiawen Yu, Hairuo Liu, Qiaojun Yu, Jieji Ren, Ce Hao, Haitong Ding, Guangyu Huang, Guofan Huang, Yan Song, Panpan Cai, Wenqiang Zhang, and Cewu Lu. ForceVLA: Enhancing VLA models with a force-aware MoE for contact-rich manipulation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  11. [13]

    Elucidating the Design Space of Torque-aware Vision-Language-Action Models

    Zongzheng Zhang, Haobo Xu, Zhuo Yang, Chenghao Yue, Zehao Lin, Huan ang Gao, Ziwei Wang, and Hao Zhao. Elucidating the design space of torque-aware vision-language-action models. In 9th Annual Conference on Robot Learning, 2025

  12. [15]

    What makes training multi-modal classification networks hard?

    Weiyao Wang, Du Tran, and Matt Feiszli. What makes training multi-modal classification networks hard? In CVPR, pages 12692--12702, 2020

  13. [16]

    Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

    Nan Wu, Stanislaw Jastrzebski, Kyunghyun Cho, and Krzysztof J. Geras. Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks. In ICML, pages 24043--24055, 2022

  14. [17]

    Eugenio Chisari, Nick Heppert, Max Argus, Tim Welschehold, Thomas Brox, and Abhinav Valada. 2024

  15. [18]

    Max Braun, Noémie Jaquier, Leonel Rozo, and Tamim Asfour. 2024

  16. [19]

    Xuanran Zhai and Ce Hao. CoRR, 2025

  17. [20]

    Qinglun Zhang, Zhen Liu, Haoqiang Fan, Guanghui Liu, Bing Zeng, and Shuaicheng Liu. 2025

  18. [21]

    Robot Manipulation with Flow Matching

    Robot manipulation with flow matching. In CoRL 2024 Workshop on Mastering Robot Manipulation in a World of Abundant Data, 2024

  19. [22]

    A reduction of imitation learning and structured prediction to no-regret online learning

    Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, pages 627--635, 2011

  20. [23]

    An algorithmic perspective on imitation learning

    Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, and Jan Peters. An algorithmic perspective on imitation learning. Found. Trends Robotics, 7(1-2):1--179, 2018

  21. [24]

    Deep reinforcement learning: A brief survey

    Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. Deep reinforcement learning: A brief survey. CoRR, 2017

  22. [25]

    Playing Atari with deep reinforcement learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, 2013

  23. [26]

    Conditioning Matters: Training Diffusion Policies is Faster Than You Think

    Zibin Dong, Yicheng Liu, Yinchuan Li, Hang Zhao, and Jianye Hao. Conditioning matters: Training diffusion policies is faster than you think. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  24. [27]

    Flow matching for generative modeling

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. CoRR, abs/2210.02747, 2022

  25. [30]

    Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation

    Yansong Wu, Zongxie Chen, Fan Wu, Lingyun Chen, Liding Zhang, Zhenshan Bing, Abdalla Swikir, Sami Haddadin, and Alois Knoll. Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 11831--11837. IEEE, 2025

  26. [34]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195--4205, 2023

  27. [35]

    Feel the force: Contact-driven learning from humans

    Ademi Adeniji, Zhuoran Chen, Vincent Liu, Venkatesh Pattabiraman, Raunaq Bhirangi, Siddhant Haldar, Pieter Abbeel, and Lerrel Pinto. Feel the force: Contact-driven learning from humans. arXiv preprint arXiv:2506.01944, 2025

  28. [36]

    GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

    Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, et al. GR00T N1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734, 2025

  29. [37]

    Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Robert Equi, Chelsea Finn, Niccolo Fusai, Manuel Y. Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, brian ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Allen Z. Ren, ...

  30. [38]

    Omnivtla: Vision-tactile-language-action model with semantic-aligned tactile sensing

    Zhengxue Cheng, Yiqian Zhang, Wenkang Zhang, Haoyu Li, Keyu Wang, Li Song, and Hengdi Zhang. Omnivtla: Vision-tactile-language-action model with semantic-aligned tactile sensing. arXiv preprint arXiv:2508.08706, 2025

  31. [39]

    Diffusion policy: Visuomotor policy learning via action diffusion

    Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Robotics: Science and Systems, 2023. URL https://doi.org/10.15607/RSS.2023.XIX.026

  32. [40]

    Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making

    Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yi Ma, Pengyi Li, and Yan Zheng. Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making. In NeurIPS, 2024. URL http://papers.nips.cc/paper_files/paper/2024/hash/9e08a1db869a9646418e3371b24c6ae6-Abstract-Datasets_and_Benchmarks_Track.html

  33. [41]

    Conditioning matters: Training diffusion policies is faster than you think

    Zibin Dong, Yicheng Liu, Yinchuan Li, Hang Zhao, and Jianye HAO. Conditioning matters: Training diffusion policies is faster than you think. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=pKQcmLHoGG

  34. [42]

    Tla: Tactile-language-action model for contact-rich manipulation

    Peng Hao, Chaofan Zhang, Dingzhe Li, Xiaoge Cao, Xiaoshuai Hao, Shaowei Cui, and Shuo Wang. Tla: Tactile-language-action model for contact-rich manipulation. arXiv preprint arXiv:2503.08548, 2025

  35. [43]

    FoAR: Force-aware reactive policy for contact-rich robotic manipulation

    Zihao He, Hongjie Fang, Jingjing Chen, Hao-Shu Fang, and Cewu Lu. FoAR: Force-aware reactive policy for contact-rich robotic manipulation. In ICRA 2025 Workshop: Beyond Pick and Place, 2025. URL https://openreview.net/forum?id=cbjluXVaJz

  36. [44]

    Tactile-vla: Unlocking vision-language-action model's physical knowledge for tactile generalization

    Jialei Huang, Shuo Wang, Fanqi Lin, Yihang Hu, Chuan Wen, and Yang Gao. Tactile-vla: unlocking vision-language-action model's physical knowledge for tactile generalization. arXiv preprint arXiv:2507.09160, 2025

  37. [45]

    OpenVLA: An open-source vision-language-action model

    Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan P Foster, Pannag R Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn. OpenVLA: An open-source vision-language-action model. In 8th Annual Conference on Robot Lear...

  38. [46]

    Forcevla2: Unleashing hybrid force-position control with force awareness for contact-rich manipulation

    Yang Li, Hongru Jiang, Junjie Xia, Hongquan Zhang, Jinda Du, Yunsong Zhou, Jia Zeng, Ce Hao, Jieji Ren, Qiaojun Yu, et al. Forcevla2: Unleashing hybrid force-position control with force awareness for contact-rich manipulation. arXiv preprint arXiv:2603.15169, 2026

  39. [47]

    Flow matching for generative modeling

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. CoRR, abs/2210.02747, 2022. URL https://doi.org/10.48550/arXiv.2210.02747

  40. [48]

    Forcemimic: Force-centric imitation learning with force-motion capture system for contact-rich manipulation

    Wenhai Liu, Junbo Wang, Yiming Wang, Weiming Wang, and Cewu Lu. Forcemimic: Force-centric imitation learning with force-motion capture system for contact-rich manipulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 1105--1112. IEEE, 2025

  41. [49]

    An algorithmic perspective on imitation learning

    Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, and Jan Peters. An algorithmic perspective on imitation learning. Found. Trends Robotics, 7(1-2):1--179, 2018. URL https://doi.org/10.1561/2300000053

  42. [50]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195--4205, 2023

  43. [51]

    A reduction of imitation learning and structured prediction to no-regret online learning

    Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, pages 627--635, 2011. URL http://proceedings.mlr.press/v15/ross11a/ross11a.pdf

  44. [52]

    What makes training multi-modal classification networks hard?

    Weiyao Wang, Du Tran, and Matt Feiszli. What makes training multi-modal classification networks hard? In CVPR, pages 12692--12702, 2020. URL https://openaccess.thecvf.com/content_CVPR_2020/html/Wang_What_Makes_Training_Multi-Modal_Classification_Networks_Hard_CVPR_2020_paper.html

  45. [53]

    Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

    Nan Wu, Stanislaw Jastrzebski, Kyunghyun Cho, and Krzysztof J. Geras. Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks. In ICML, pages 24043--24055, 2022. URL https://proceedings.mlr.press/v162/wu22d.html

  46. [54]

    Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation

    Yansong Wu, Zongxie Chen, Fan Wu, Lingyun Chen, Liding Zhang, Zhenshan Bing, Abdalla Swikir, Sami Haddadin, and Alois Knoll. Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 11831--11837. IEEE, 2025

  47. [55]

    Reactive diffusion policy: Slow-fast visual-tactile policy learning for contact-rich manipulation

    Han Xue, Jieji Ren, Wendi Chen, Gu Zhang, Fang Yuan, Guoying Gu, Huazhe Xu, and Cewu Lu. Reactive diffusion policy: Slow-fast visual-tactile policy learning for contact-rich manipulation. In ICRA 2025 Workshop: Beyond Pick and Place, 2025. URL https://openreview.net/forum?id=zRhjjLGUAp

  48. [56]

    ForceVLA: Enhancing VLA models with a force-aware MoE for contact-rich manipulation

    Jiawen Yu, Hairuo Liu, Qiaojun Yu, Jieji Ren, Ce Hao, Haitong Ding, Guangyu Huang, Guofan Huang, Yan Song, Panpan Cai, Wenqiang Zhang, and Cewu Lu. ForceVLA: Enhancing VLA models with a force-aware MoE for contact-rich manipulation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=...

  49. [57]

    Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

    Yifu Yuan, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, and Jianye Hao. Embodied-r1: Reinforced embodied reasoning for general robotic manipulation. arXiv preprint arXiv:2508.13998, 2025

  50. [58]

    Elucidating the design space of torque-aware vision-language-action models

    Zongzheng Zhang, Haobo Xu, Zhuo Yang, Chenghao Yue, Zehao Lin, Huan ang Gao, Ziwei Wang, and Hao Zhao. Elucidating the design space of torque-aware vision-language-action models. In 9th Annual Conference on Robot Learning, 2025. URL https://openreview.net/forum?id=HAmi1X11BO

  51. [59]

    Fd-vla: Force-distilled vision-language-action model for contact-rich manipulation

    Ruiteng Zhao, Wenshuo Wang, Yicheng Ma, Xiaocong Li, Francis EH Tay, Marcelo H Ang Jr, and Haiyue Zhu. Fd-vla: Force-distilled vision-language-action model for contact-rich manipulation. arXiv preprint arXiv:2602.02142, 2026

  52. [60]

    Learning fine-grained bimanual manipulation with low-cost hardware

    Tony Z. Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023. URL https://openreview.net/forum?id=e8Eu1lqLaf

  53. [61]

    Touch begins where vision ends: Generalizable policies for contact-rich manipulation

    Zifan Zhao, Siddhant Haldar, Jinda Cui, and Lerrel Pinto. Touch begins where vision ends: Generalizable policies for contact-rich manipulation. In Second Workshop on Out-of-Distribution Generalization in Robotics at RSS 2025, 2025. URL https://openreview.net/forum?id=vbW7BVKAeb