pith. machine review for the scientific record.

arxiv: 2605.11048 · v1 · submitted 2026-05-11 · 💻 cs.RO · cs.AI

Recognition: no theorem link

ForceFlow: Learning to Feel and Act via Contact-Driven Flow Matching

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 00:44 UTC · model grok-4.3

classification: 💻 cs.RO · cs.AI
keywords: contact-rich manipulation · force feedback · flow matching · imitation learning · multimodal fusion · robotic policy · generalization · force regulation

The pith

ForceFlow couples force feedback with motion control using flow matching for contact-rich robot tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes ForceFlow as a reactive framework that integrates force sensing into robot policies for tasks involving physical contact. It combines flow matching with an asymmetric fusion design where force signals regulate the overall policy, joint prediction of force and action, and a staged handover from vision-based positioning to force-driven execution. This setup aims to create tighter links between sensing and movement so policies generalize better without extensive per-task adjustments. A reader would care because many everyday robot interactions fail when contact forces are not handled precisely, limiting reliable autonomy in unstructured settings.
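The policy backbone here is flow matching, the same generative recipe the paper cites (Lipman et al., 2022, reference [27] below). As a reading aid only, here is a minimal sketch of the standard conditional flow-matching training objective for an action-chunk policy; the network, dimensions, and observation encoding are illustrative assumptions, not ForceFlow's architecture:

```python
import torch
import torch.nn as nn

class FlowPolicy(nn.Module):
    """Illustrative velocity-field network: predicts the flow from noise to
    expert action chunks, conditioned on an observation embedding and time."""
    def __init__(self, obs_dim=64, act_dim=7, horizon=16, hidden=256):
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + horizon * act_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, horizon * act_dim),
        )

    def forward(self, obs_emb, noisy_actions, t):
        x = torch.cat([obs_emb, noisy_actions.flatten(1), t[:, None]], dim=-1)
        return self.net(x).view(-1, self.horizon, self.act_dim)

def flow_matching_loss(policy, obs_emb, expert_actions):
    """Linear-interpolant conditional flow matching (Lipman et al., 2022):
    x_t = (1-t)*x0 + t*x1, with target velocity x1 - x0."""
    x1 = expert_actions                        # expert action chunk
    x0 = torch.randn_like(x1)                  # noise sample
    t = torch.rand(x1.shape[0])                # per-sample time in [0, 1]
    xt = (1 - t[:, None, None]) * x0 + t[:, None, None] * x1
    v_target = x1 - x0
    v_pred = policy(obs_emb, xt, t)
    return ((v_pred - v_target) ** 2).mean()

# Usage on random stand-in data:
policy = FlowPolicy()
obs = torch.randn(8, 64)
acts = torch.randn(8, 16, 7)
loss = flow_matching_loss(policy, obs, acts)
loss.backward()
```

At inference time, actions come from integrating the learned velocity field from noise toward data; the contribution under review lies in how force enters the conditioning, not in this base objective.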

Core claim

ForceFlow is a force-aware reactive framework built upon flow matching. For contact-stage policy design, it adopts an asymmetric multimodal fusion architecture that treats force as a global regulatory signal, combined with a joint prediction paradigm that enhances the policy's understanding of instantaneous force and historical information, thereby achieving deep coupling between force and motion. For task-level hierarchical decomposition, it divides manipulation into a vision-dominant approach stage and a touch-dominant interaction stage, with a Vision-to-Force (V2F) handover mechanism that explicitly decouples spatial generalization from contact regulation. Experiments on six real-world contact-rich tasks demonstrate a 37% success-rate improvement over the strong baseline ForceVLA while maintaining significantly lower cost.
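The abstract does not spell out the fusion mechanics, so the following is one speculative reading of "force as a global regulatory signal": rather than concatenating force features symmetrically with vision, a force embedding gates every fused channel (FiLM-style), and a joint head predicts force alongside action so force supervision shapes the shared representation. All module names and dimensions here are ours, not the paper's:

```python
import torch
import torch.nn as nn

class AsymmetricFusion(nn.Module):
    """Speculative reading of 'force as a global regulatory signal':
    a force embedding produces per-channel gain/bias (FiLM-style) that
    modulates the visual features globally, instead of being concatenated
    as one more symmetric modality."""
    def __init__(self, vis_dim=256, force_dim=6, hidden=128):
        super().__init__()
        self.force_enc = nn.Sequential(nn.Linear(force_dim, hidden), nn.GELU())
        self.gain = nn.Linear(hidden, vis_dim)   # multiplicative regulation
        self.bias = nn.Linear(hidden, vis_dim)   # additive regulation

    def forward(self, vis_feat, force):
        f = self.force_enc(force)
        return vis_feat * (1 + self.gain(f)) + self.bias(f)

class JointHead(nn.Module):
    """Joint prediction: a shared trunk emits both the action chunk and the
    expected force trajectory, so force supervision shapes action features."""
    def __init__(self, in_dim=256, act_dim=7, force_dim=6, horizon=16):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 256), nn.GELU())
        self.action_head = nn.Linear(256, horizon * act_dim)
        self.force_head = nn.Linear(256, horizon * force_dim)

    def forward(self, fused):
        h = self.trunk(fused)
        return self.action_head(h), self.force_head(h)

# Usage with stand-in features: one 6-axis force/torque reading per sample.
fusion, head = AsymmetricFusion(), JointHead()
vis = torch.randn(4, 256)
ft = torch.randn(4, 6)
actions, forces = head(fusion(vis, ft))
```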

What carries the argument

Asymmetric multimodal fusion treating force as a global regulatory signal, combined with a joint-prediction paradigm and the V2F handover, inside a flow-matching policy.
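The V2F handover itself is described only as a staged switch from vision-dominant positioning to force-dominant execution. A minimal sketch of one plausible handover rule, triggered by target arrival or a contact-force threshold; the interface and the 1.0 N threshold are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    target_reached: bool    # from the vision/VLM pointing stage
    contact_force_n: float  # measured normal-force magnitude (N)

class V2FHandover:
    """Illustrative two-stage controller: vision-dominant approach until
    contact is detected, then hand control to a force-driven policy.
    The 1.0 N threshold is a made-up example value."""
    def __init__(self, approach_policy, contact_policy, contact_threshold=1.0):
        self.approach = approach_policy
        self.contact = contact_policy
        self.threshold = contact_threshold
        self.stage = "approach"

    def act(self, obs: Observation):
        if self.stage == "approach" and (
            obs.target_reached or obs.contact_force_n > self.threshold
        ):
            self.stage = "contact"  # handover is one-way within an episode
        policy = self.approach if self.stage == "approach" else self.contact
        return policy(obs)

# Usage with trivial stand-in policies:
ctrl = V2FHandover(lambda o: "move_to_target", lambda o: "regulate_force")
print(ctrl.act(Observation(target_reached=False, contact_force_n=0.2)))  # approach
print(ctrl.act(Observation(target_reached=True, contact_force_n=1.5)))   # contact
```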

Load-bearing premise

That asymmetric multimodal fusion treating force as a global regulatory signal, combined with joint prediction and the V2F handover, yields a stable, deep coupling between force and motion without extensive task-specific tuning.

What would settle it

Replication on the six contact-rich tasks would settle it: the claim fails if ForceFlow shows no meaningful success-rate gain over ForceVLA, or produces unstable force predictions during contact phases.
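Settling the headline number also requires trial counts and a significance test, which the abstract omits (see the referee's first comment below). A sketch of the minimal check, using Fisher's exact test with hypothetical success counts standing in for the unreported ones:

```python
from scipy.stats import fisher_exact

# Hypothetical trial counts -- the abstract reports none.
forceflow_success, forceflow_trials = 41, 60
forcevla_success, forcevla_trials = 25, 60

# 2x2 contingency table: successes vs failures per method.
table = [
    [forceflow_success, forceflow_trials - forceflow_success],
    [forcevla_success, forcevla_trials - forcevla_success],
]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"success rates: {forceflow_success/forceflow_trials:.2f} "
      f"vs {forcevla_success/forcevla_trials:.2f}, p = {p_value:.4f}")
```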

original abstract

Existing imitation learning methods enable robots to interact autonomously with the physical environment. However, contact-rich manipulation tasks remain a significant challenge due to complex contact dynamics that demand high-precision force feedback and control. Although recent efforts have attempted to integrate force/torque sensing into policies, how to build a simple yet effective framework that achieves robust generalization under multimodal observations remains an open question. In this paper, we propose ForceFlow, a force-aware reactive framework built upon flow matching. For contact-stage policy design, we investigate force signal fusion mechanisms and adopt an asymmetric multimodal fusion architecture that treats force as a global regulatory signal, combined with a joint prediction paradigm that enhances the policy's understanding of instantaneous force and historical information, thereby achieving deep coupling between force and motion. For task-level hierarchical decomposition, we divide manipulation into a vision-dominant approach stage (VLM-based pointing for target localization) and a touch-dominant interaction stage (force-driven contact execution), with a Vision-to-Force (V2F) handover mechanism that explicitly decouples spatial generalization from contact regulation. Experimental results across six real-world contact-rich tasks demonstrate that ForceFlow achieves a 37% success rate improvement over the strong baseline ForceVLA while maintaining significantly lower cost. Moreover, ForceFlow exhibits accurate force signal prediction and demonstrates superior performance in contact force self-regulation and zero-shot out-of-distribution (OOD) generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes ForceFlow, a flow-matching based framework for contact-rich robotic manipulation. It uses an asymmetric multimodal fusion architecture that treats force as a global regulatory signal, a joint prediction paradigm to couple instantaneous/historical force with motion, and a Vision-to-Force (V2F) handover for hierarchical decomposition separating vision-dominant approach from force-dominant contact execution. Experiments on six real-world tasks report a 37% success-rate improvement over ForceVLA, accurate force-signal prediction, superior contact-force self-regulation, lower cost, and better zero-shot OOD generalization.

Significance. If the performance gains are reproducible and causally attributable to the proposed fusion, joint-prediction, and handover mechanisms, the work would advance multimodal policy learning for precise force control in contact-rich tasks, addressing a recognized open challenge in robotic imitation learning.

major comments (3)
  1. [Experimental Results] The headline claim of a 37% success-rate improvement (and associated OOD gains) over ForceVLA rests on experimental results that, per the abstract, supply no trial counts, statistical significance tests, failure-mode breakdowns, or confirmation that the baseline was re-implemented identically. This prevents verification that the gains arise from the asymmetric fusion, joint prediction, or V2F handover rather than implementation differences or task selection.
  2. [Method (asymmetric multimodal fusion and joint prediction)] The central architectural claim—that treating force as a global regulatory signal in asymmetric fusion plus joint prediction produces deep force-motion coupling—requires component ablations. Without success rates or force-prediction error metrics for variants that disable each element, it is impossible to confirm these mechanisms are load-bearing for the reported self-regulation and generalization improvements.
  3. [Task-level hierarchical decomposition] The V2F handover is asserted to decouple spatial generalization from contact regulation, yet no analysis or ablation demonstrates that this explicit separation is necessary for the observed OOD performance or that an integrated end-to-end policy would not achieve comparable results.
minor comments (1)
  1. [Abstract] The abstract states 'significantly lower cost' without defining the cost metric (computational, energy, or task-time). Clarify this in the results section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below with honest clarifications and commitments to revisions that improve reproducibility and substantiate the claims without misrepresenting the current manuscript.

point-by-point responses
  1. Referee: [Experimental Results] The headline claim of a 37% success-rate improvement (and associated OOD gains) over ForceVLA rests on experimental results that, per the abstract, supply no trial counts, statistical significance tests, failure-mode breakdowns, or confirmation that the baseline was re-implemented identically. This prevents verification that the gains arise from the asymmetric fusion, joint prediction, or V2F handover rather than implementation differences or task selection.

    Authors: We agree that the abstract omits these details and that they are essential for verification. The full manuscript describes the experimental protocol and states that ForceVLA was re-implemented using the original authors' code and hyperparameters, but it does not include explicit trial counts, statistical tests, or failure-mode breakdowns. In the revision we will add these elements (trial counts, p-values from significance tests, and failure analysis) to the abstract, results section, and a new table, enabling readers to confirm that the reported gains are attributable to the proposed mechanisms rather than implementation variances. revision: yes

  2. Referee: [Method (asymmetric multimodal fusion and joint prediction)] The central architectural claim—that treating force as a global regulatory signal in asymmetric fusion plus joint prediction produces deep force-motion coupling—requires component ablations. Without success rates or force-prediction error metrics for variants that disable each element, it is impossible to confirm these mechanisms are load-bearing for the reported self-regulation and generalization improvements.

    Authors: We acknowledge that the manuscript lacks component ablations isolating asymmetric fusion and joint prediction. Current results compare the complete ForceFlow system against ForceVLA and other baselines but do not disable individual elements. We will add these ablations in the revision, reporting task success rates and force-prediction errors for the disabled variants. This will directly demonstrate whether each component is load-bearing for the observed contact-force self-regulation and zero-shot OOD gains. revision: yes

  3. Referee: [Task-level hierarchical decomposition] The V2F handover is asserted to decouple spatial generalization from contact regulation, yet no analysis or ablation demonstrates that this explicit separation is necessary for the observed OOD performance or that an integrated end-to-end policy would not achieve comparable results.

    Authors: We agree that an ablation is required to validate the necessity of the V2F handover. The manuscript presents the hierarchical design and its motivation but contains no direct comparison to an integrated end-to-end policy. In the revision we will add an ablation that removes the V2F handover (training a single policy across both stages) and evaluate both versions on the OOD tasks. We will report the resulting success rates and discuss whether the explicit separation improves generalization, modularity, or training stability. revision: yes
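The ablations promised in responses 2 and 3 amount to a small experiment grid over component on/off flags. A minimal sketch of how that grid might be enumerated; the component names, task list, and run_trial stub are placeholders for the authors' actual pipeline:

```python
from itertools import product

# Placeholder flags matching the promised ablations: asymmetric fusion,
# joint force/action prediction, and the V2F handover.
COMPONENTS = ["asym_fusion", "joint_pred", "v2f_handover"]
TASKS = ["task_1", "task_2", "task_3"]  # stand-ins for the six real tasks

def run_trial(config, task):
    """Stand-in for training + evaluation; returns a fake success rate."""
    return 0.5 + 0.1 * sum(config.values())  # replace with real evaluation

for flags in product([True, False], repeat=len(COMPONENTS)):
    config = dict(zip(COMPONENTS, flags))
    rates = [run_trial(config, t) for t in TASKS]
    enabled = [c for c, on in config.items() if on] or ["none"]
    print(f"{'+'.join(enabled):<35} mean success {sum(rates)/len(rates):.2f}")
```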

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external task benchmarks, not self-referential derivations

full rationale

The paper presents an empirical robotics framework (ForceFlow) built on flow matching with asymmetric fusion, joint prediction, and V2F handover. Its headline result is a 37% success-rate gain over ForceVLA measured on six real-world contact-rich tasks. No equations, first-principles derivations, or 'predictions' appear that reduce by construction to fitted parameters or self-defined quantities within the paper. The evaluation uses external, falsifiable metrics (task success, force prediction accuracy, OOD generalization) rather than internal self-consistency loops. Any self-citations are incidental and not load-bearing for the central performance claims, which remain independently testable against baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities. All modeling choices are described at the level of architectural decisions rather than mathematical postulates.

pith-pipeline@v0.9.0 · 5574 in / 1288 out tokens · 44054 ms · 2026-05-13T00:44:35.203453+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 3 internal anchors

  1. [1]

    Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making

    Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yi Ma, Pengyi Li, and Yan Zheng. Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making. In NeurIPS, 2024

  2. [2]

    Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

    Tony Z. Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023

  3. [3]

    Diffusion policy: Visuomotor policy learning via action diffusion

    Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Robotics: Science and Systems, 2023

  4. [4]

    Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Robert Equi, Chelsea Finn, Niccolo Fusai, Manuel Y. Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, brian ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, ...

  5. [5]

    OpenVLA: An open-source vision-language-action model

    Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan P Foster, Pannag R Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn. OpenVLA: An open-source vision-language-action model. In 8th Annual Conference on Robot Learning, 2024

  6. [6]

    Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation

    Han Xue, Jieji Ren, Wendi Chen, Gu Zhang, Fang Yuan, Guoying Gu, Huazhe Xu, and Cewu Lu. Reactive diffusion policy: Slow-fast visual-tactile policy learning for contact-rich manipulation. In ICRA 2025 Workshop: Beyond Pick and Place, 2025

  7. [7]

    Touch begins where vision ends: Generalizable policies for contact-rich manipulation

    Zifan Zhao, Siddhant Haldar, Jinda Cui, and Lerrel Pinto. Touch begins where vision ends: Generalizable policies for contact-rich manipulation. In Second Workshop on Out-of-Distribution Generalization in Robotics at RSS 2025, 2025

  8. [8]

    FoAR: Force-aware reactive policy for contact-rich robotic manipulation

    Zihao He, Hongjie Fang, Jingjing Chen, Hao-Shu Fang, and Cewu Lu. FoAR: Force-aware reactive policy for contact-rich robotic manipulation. In ICRA 2025 Workshop: Beyond Pick and Place, 2025

  9. [9]

    Forcemimic: Force-centric imitation learning with force-motion capture system for contact-rich manipulation

    Wenhai Liu, Junbo Wang, Yiming Wang, Weiming Wang, and Cewu Lu. Forcemimic: Force-centric imitation learning with force-motion capture system for contact-rich manipulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 1105--1112. IEEE, 2025

  10. [11]

    ForceVLA: Enhancing VLA models with a force-aware MoE for contact-rich manipulation

    Jiawen Yu, Hairuo Liu, Qiaojun Yu, Jieji Ren, Ce Hao, Haitong Ding, Guangyu Huang, Guofan Huang, Yan Song, Panpan Cai, Wenqiang Zhang, and Cewu Lu. ForceVLA: Enhancing VLA models with a force-aware MoE for contact-rich manipulation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  11. [13]

    Elucidating the Design Space of Torque-aware Vision-Language-Action Models

    Zongzheng Zhang, Haobo Xu, Zhuo Yang, Chenghao Yue, Zehao Lin, Huan ang Gao, Ziwei Wang, and Hao Zhao. Elucidating the design space of torque-aware vision-language-action models. In 9th Annual Conference on Robot Learning, 2025

  12. [15]

    What makes training multi-modal classification networks hard?

    Weiyao Wang, Du Tran, and Matt Feiszli. What makes training multi-modal classification networks hard? In CVPR, pages 12692--12702, 2020

  13. [16]

    Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

    Nan Wu, Stanislaw Jastrzebski, Kyunghyun Cho, and Krzysztof J. Geras. Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks. In ICML, pages 24043--24055, 2022

  14. [17]

    Eugenio Chisari, Nick Heppert, Max Argus, Tim Welschehold, Thomas Brox, and Abhinav Valada. 2024

  15. [18]

    Max Braun, Noémie Jaquier, Leonel Rozo, and Tamim Asfour. 2024

  16. [19]

    Xuanran Zhai and Ce Hao. CoRR, 2025

  17. [20]

    Qinglun Zhang, Zhen Liu, Haoqiang Fan, Guanghui Liu, Bing Zeng, and Shuaicheng Liu. 2025

  18. [21]

    Robot Manipulation with Flow Matching

    Robot manipulation with flow matching. In CoRL 2024 Workshop on Mastering Robot Manipulation in a World of Abundant Data, 2024

  19. [22]

    A reduction of imitation learning and structured prediction to no-regret online learning

    Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, pages 627--635, 2011

  20. [23]

    An algorithmic perspective on imitation learning

    Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, and Jan Peters. An algorithmic perspective on imitation learning. Found. Trends Robotics, 7(1-2):1--179, 2018

  21. [24]

    Deep reinforcement learning: A brief survey

    Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. Deep reinforcement learning: A brief survey. CoRR, 2017

  22. [25]

    Playing Atari with deep reinforcement learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, 2013

  23. [26]

    Conditioning Matters: Training Diffusion Policies is Faster Than You Think

    Zibin Dong, Yicheng Liu, Yinchuan Li, Hang Zhao, and Jianye Hao. Conditioning matters: Training diffusion policies is faster than you think. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  24. [27]

    Flow matching for generative modeling

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. CoRR, abs/2210.02747, 2022

  25. [30]

    Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation

    Yansong Wu, Zongxie Chen, Fan Wu, Lingyun Chen, Liding Zhang, Zhenshan Bing, Abdalla Swikir, Sami Haddadin, and Alois Knoll. Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 11831--11837. IEEE, 2025

  26. [34]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195--4205, 2023

  27. [35]

    Feel the force: Contact-driven learning from humans

    Ademi Adeniji, Zhuoran Chen, Vincent Liu, Venkatesh Pattabiraman, Raunaq Bhirangi, Siddhant Haldar, Pieter Abbeel, and Lerrel Pinto. Feel the force: Contact-driven learning from humans. arXiv preprint arXiv:2506.01944, 2025

  28. [36]

    GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

    Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, et al. GR00T N1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734, 2025

  29. [37]

    Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Robert Equi, Chelsea Finn, Niccolo Fusai, Manuel Y. Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, brian ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Allen Z. Ren, ...

  30. [38]

    Omnivtla: Vision-tactile-language-action model with semantic-aligned tactile sensing

    Zhengxue Cheng, Yiqian Zhang, Wenkang Zhang, Haoyu Li, Keyu Wang, Li Song, and Hengdi Zhang. Omnivtla: Vision-tactile-language-action model with semantic-aligned tactile sensing. arXiv preprint arXiv:2508.08706, 2025

  31. [39]

    Diffusion policy: Visuomotor policy learning via action diffusion

    Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Robotics: Science and Systems, 2023. URL https://doi.org/10.15607/RSS.2023.XIX.026

  32. [40]

    Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making

    Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yi Ma, Pengyi Li, and Yan Zheng. Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making. In NeurIPS, 2024. URL http://papers.nips.cc/paper_files/paper/2024/hash/9e08a1db869a9646418e3371b24c6ae6-Abstract-Datasets_and_Benchmarks_Track.html

  33. [41]

    Conditioning matters: Training diffusion policies is faster than you think

    Zibin Dong, Yicheng Liu, Yinchuan Li, Hang Zhao, and Jianye HAO. Conditioning matters: Training diffusion policies is faster than you think. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=pKQcmLHoGG

  34. [42]

    Tla: Tactile-language-action model for contact-rich manipulation

    Peng Hao, Chaofan Zhang, Dingzhe Li, Xiaoge Cao, Xiaoshuai Hao, Shaowei Cui, and Shuo Wang. Tla: Tactile-language-action model for contact-rich manipulation. arXiv preprint arXiv:2503.08548, 2025

  35. [43]

    FoAR: Force-aware reactive policy for contact-rich robotic manipulation

    Zihao He, Hongjie Fang, Jingjing Chen, Hao-Shu Fang, and Cewu Lu. FoAR: Force-aware reactive policy for contact-rich robotic manipulation. In ICRA 2025 Workshop: Beyond Pick and Place, 2025. URL https://openreview.net/forum?id=cbjluXVaJz

  36. [44]

    Tactile-vla: Unlocking vision-language-action model's physical knowledge for tactile generalization

    Jialei Huang, Shuo Wang, Fanqi Lin, Yihang Hu, Chuan Wen, and Yang Gao. Tactile-vla: unlocking vision-language-action model's physical knowledge for tactile generalization. arXiv preprint arXiv:2507.09160, 2025

  37. [45]

    OpenVLA: An open-source vision-language-action model

    Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan P Foster, Pannag R Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn. OpenVLA: An open-source vision-language-action model. In 8th Annual Conference on Robot Lear...

  38. [46]

    Forcevla2: Unleashing hybrid force-position control with force awareness for contact-rich manipulation

    Yang Li, Hongru Jiang, Junjie Xia, Hongquan Zhang, Jinda Du, Yunsong Zhou, Jia Zeng, Ce Hao, Jieji Ren, Qiaojun Yu, et al. Forcevla2: Unleashing hybrid force-position control with force awareness for contact-rich manipulation. arXiv preprint arXiv:2603.15169, 2026

  39. [47]

    Flow matching for generative modeling

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. CoRR, abs/2210.02747, 2022. URL https://doi.org/10.48550/arXiv.2210.02747

  40. [48]

    Forcemimic: Force-centric imitation learning with force-motion capture system for contact-rich manipulation

    Wenhai Liu, Junbo Wang, Yiming Wang, Weiming Wang, and Cewu Lu. Forcemimic: Force-centric imitation learning with force-motion capture system for contact-rich manipulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 1105--1112. IEEE, 2025

  41. [49]

    An algorithmic perspective on imitation learning

    Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, and Jan Peters. An algorithmic perspective on imitation learning. Found. Trends Robotics, 7(1-2):1--179, 2018. URL https://doi.org/10.1561/2300000053

  42. [50]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195--4205, 2023

  43. [51]

    A reduction of imitation learning and structured prediction to no-regret online learning

    Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, pages 627--635, 2011. URL http://proceedings.mlr.press/v15/ross11a/ross11a.pdf

  44. [52]

    What makes training multi-modal classification networks hard?

    Weiyao Wang, Du Tran, and Matt Feiszli. What makes training multi-modal classification networks hard? In CVPR, pages 12692--12702, 2020. URL https://openaccess.thecvf.com/content_CVPR_2020/html/Wang_What_Makes_Training_Multi-Modal_Classification_Networks_Hard_CVPR_2020_paper.html

  45. [53]

    Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

    Nan Wu, Stanislaw Jastrzebski, Kyunghyun Cho, and Krzysztof J. Geras. Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks. In ICML, pages 24043--24055, 2022. URL https://proceedings.mlr.press/v162/wu22d.html

  46. [54]

    Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation

    Yansong Wu, Zongxie Chen, Fan Wu, Lingyun Chen, Liding Zhang, Zhenshan Bing, Abdalla Swikir, Sami Haddadin, and Alois Knoll. Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 11831--11837. IEEE, 2025

  47. [55]

    Reactive diffusion policy: Slow-fast visual-tactile policy learning for contact-rich manipulation

    Han Xue, Jieji Ren, Wendi Chen, Gu Zhang, Fang Yuan, Guoying Gu, Huazhe Xu, and Cewu Lu. Reactive diffusion policy: Slow-fast visual-tactile policy learning for contact-rich manipulation. In ICRA 2025 Workshop: Beyond Pick and Place, 2025. URL https://openreview.net/forum?id=zRhjjLGUAp

  48. [56]

    ForceVLA: Enhancing VLA models with a force-aware MoE for contact-rich manipulation

    Jiawen Yu, Hairuo Liu, Qiaojun Yu, Jieji Ren, Ce Hao, Haitong Ding, Guangyu Huang, Guofan Huang, Yan Song, Panpan Cai, Wenqiang Zhang, and Cewu Lu. ForceVLA: Enhancing VLA models with a force-aware MoE for contact-rich manipulation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=...

  49. [57]

    Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

    Yifu Yuan, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, and Jianye Hao. Embodied-r1: Reinforced embodied reasoning for general robotic manipulation. arXiv preprint arXiv:2508.13998, 2025

  50. [58]

    Elucidating the design space of torque-aware vision-language-action models

    Zongzheng Zhang, Haobo Xu, Zhuo Yang, Chenghao Yue, Zehao Lin, Huan ang Gao, Ziwei Wang, and Hao Zhao. Elucidating the design space of torque-aware vision-language-action models. In 9th Annual Conference on Robot Learning, 2025. URL https://openreview.net/forum?id=HAmi1X11BO

  51. [59]

    Fd-vla: Force-distilled vision-language-action model for contact-rich manipulation

    Ruiteng Zhao, Wenshuo Wang, Yicheng Ma, Xiaocong Li, Francis EH Tay, Marcelo H Ang Jr, and Haiyue Zhu. Fd-vla: Force-distilled vision-language-action model for contact-rich manipulation. arXiv preprint arXiv:2602.02142, 2026

  52. [60]

    Learning fine-grained bimanual manipulation with low-cost hardware

    Tony Z. Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023. URL https://openreview.net/forum?id=e8Eu1lqLaf

  53. [61]

    Touch begins where vision ends: Generalizable policies for contact-rich manipulation

    Zifan Zhao, Siddhant Haldar, Jinda Cui, and Lerrel Pinto. Touch begins where vision ends: Generalizable policies for contact-rich manipulation. In Second Workshop on Out-of-Distribution Generalization in Robotics at RSS 2025, 2025. URL https://openreview.net/forum?id=vbW7BVKAeb