pith. machine review for the scientific record.

arXiv: 2605.01427 · v1 · submitted 2026-05-02 · 💻 cs.RO

Recognition: unknown

SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids


Pith reviewed 2026-05-09 14:37 UTC · model grok-4.3

classification 💻 cs.RO
keywords humanoid robots · proprioception · wrench estimation · contact detection · conditional flow matching · whole-body control

The pith

SixthSense shows that whole-body contact wrenches on humanoids can be inferred solely from proprioception and IMU data using conditional flow matching.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes SixthSense, a task-agnostic method for estimating whole-body contact timing, contact locations, and external wrenches on humanoid robots. It relies exclusively on proprioceptive sensors and IMU measurements, avoiding the external sensors and simplifying assumptions common in analytical approaches. By tokenizing proprioceptive histories and applying conditional flow matching, the approach models the multi-modal, spatiotemporally sparse relationship between contacts and the resulting motion. The intended use is as a perception module for collision detection, physical human-robot interaction, and force-feedback teleoperation. The reported experiments cover standing, walking, and whole-body motion-tracking policies.

Core claim

We propose SixthSense, a task-agnostic approach that infers whole-body contact timing, location, and wrenches from proprioception and IMU data alone. To capture the multi-modal dynamics between unstructured contact inputs and the uncertain motion outputs, we employ conditional flow matching to tokenize proprioceptive histories and estimate a spatiotemporally sparse contact-event flow. This serves as a plug-and-play module for force-interaction tasks.

What carries the argument

Conditional flow matching applied to tokenized proprioceptive histories to model spatiotemporally sparse contact-event flows.
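
To make the mechanism concrete, here is a minimal NumPy sketch of one common form of conditional flow matching: the linear-path variant from the flow-matching literature, in which a noise sample is interpolated toward a data sample and the model regresses the constant velocity between them. The paper's exact formulation, tokenizer, and network are not given on this page, so every name below (`cfm_training_pair`, `cfm_loss`, `euler_sample`, `oracle_velocity`) and the 6-D target are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cfm_training_pair(x0, x1, t):
    """Return a point on the straight path from x0 (noise) to x1 (data)
    and the target velocity along that path."""
    t = t[:, None]                      # broadcast time over the feature axis
    x_t = (1.0 - t) * x0 + t * x1       # linear interpolation
    u_t = x1 - x0                       # constant velocity of the linear path
    return x_t, u_t

def cfm_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between the predicted and target velocities."""
    x_t, u_t = cfm_training_pair(x0, x1, t)
    pred = velocity_fn(x_t, t, cond)
    return float(np.mean(np.sum((pred - u_t) ** 2, axis=1)))

def euler_sample(velocity_fn, x0, cond, n_steps=10):
    """Integrate dx/dt = v(x, t, cond) from t = 0 to t = 1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x

def oracle_velocity(x, t, cond):
    """Sanity-check field only: it is handed the target as `cond`.
    A trained network would instead see an encoding of the sensor history."""
    return (cond - x) / (1.0 - t[:, None])

rng = np.random.default_rng(0)
batch, dim = 4, 6                            # hypothetical: one 6-D wrench per sample
x1 = rng.normal(size=(batch, dim))           # stand-in for ground-truth targets
x0 = rng.normal(size=(batch, dim))           # noise samples
t = rng.uniform(0.0, 0.9, size=batch)        # stay away from t = 1 for the oracle

loss = cfm_loss(oracle_velocity, x0, x1, t, cond=x1)
sample = euler_sample(oracle_velocity, x0, cond=x1)
```

In a real model the conditioning would be the tokenized proprioceptive history and `velocity_fn` a trained network; the oracle receives the target only so that the sketch can be checked end to end.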

If this is right

  • Enables plug-and-play perception for collision detection without extra hardware.
  • Supports physical human-robot interaction using only internal sensors.
  • Facilitates force-feedback teleoperation on floating-base systems.
  • Works across standing, walking, and whole-body motion-tracking policies rather than a single task-specific controller.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Humanoid robot designs could omit dedicated force-torque sensors, reducing hardware costs and complexity.
  • The tokenization and flow-matching approach may extend to other legged robots with similar proprioceptive setups.
  • Training data from diverse real-world interactions could improve robustness to unseen contact scenarios.

Load-bearing premise

That conditional flow matching on tokenized proprioceptive histories can reliably capture the multi-modal and spatiotemporally sparse mapping from unstructured contact inputs to uncertain motion outputs without additional external measurements or idealistic assumptions.

What would settle it

Compare estimated wrenches against ground truth from an external force-torque sensor during a controlled collision or push while the robot walks, and check whether the per-axis force and torque errors stay within a tolerance fixed before the experiment.
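
One way such a comparison could be scored is sketched below. It assumes, purely for illustration, that estimates and sensor readings are already time-aligned, expressed in the same frame, and stored as arrays of shape `(T, 6)` ordered `[fx, fy, fz, tx, ty, tz]`; the function name `wrench_errors` and the toy numbers are hypothetical and do not come from the paper.

```python
import numpy as np

def wrench_errors(estimated, ground_truth):
    """Per-axis MAE and RMSE between two (T, 6) wrench trajectories."""
    est = np.asarray(estimated, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    if est.shape != gt.shape or est.ndim != 2 or est.shape[1] != 6:
        raise ValueError("expected two arrays of identical shape (T, 6)")
    err = est - gt
    return {
        "mae": np.mean(np.abs(err), axis=0),          # mean absolute error per axis
        "rmse": np.sqrt(np.mean(err ** 2, axis=0)),   # root-mean-square error per axis
    }

# Toy data: a constant 10 N push along z, estimated with +/- 1 N error.
gt = np.zeros((4, 6))
gt[:, 2] = 10.0
est = gt.copy()
est[:, 2] += np.array([1.0, -1.0, 1.0, -1.0])

metrics = wrench_errors(est, gt)
```

Reporting the errors per axis keeps forces (newtons) and torques (newton-metres) from being averaged together.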

Figures

Figures reproduced from arXiv: 2605.01427 by Haodong Zhang, Jiahao Chen, Jiyu Yu, Ling Shi, Lingzhu Xiang, Siyi Qian, Xiayan Xu, Xingzhou Chen, Yan Ning, Yizheng Zhang, Yuquan Wang.

Figure 1: SixthSense: Inferring whole-body contact wrench fields via proprioception. This task-agnostic, plug-and-play module provides a robust perception foundation for diverse downstream control and decision-making applications. While external wrench estimation for fixed-base manipulators is largely a solved problem, extending these methods to whole-body humanoid interaction is fundamentally different. The presen…
Figure 2: Mapping whole-body surface contact force to wrench
Figure 3: Overview: Given a contact-resilient control policy, we use its rollouts to train a conditional flow-matching model that
Figure 4: Overview of information flow: Tokenized proprioceptive observations are streamed into iterative CFM refinement
Figure 5: Contact dataset collection across behaviors in MuJoCo
Figure 7: An example multi-point contact scenario. We then test a contact estimator trained on the single-contact dataset only, which has never observed any multi-contact sample during training
Figure 8: Zero-shot multi-contact inference. To verify that this zero-shot generalization stems from CFM’s distributional modeling rather than network capacity alone, we compare against an MLP baseline with hidden size [512, 512, 512] trained on the same single-contact locomotion data. The MLP achieves 99.69% detection on single-contact testing, comparable to CFM, but is evaluated on three simultaneous contacts that i…
Figure 9: Sensitivity to observation noise under single-contact
Figure 11: Contact data collection on real Unitree G1
Figure 12: Spatiotemporally sparse contact wrench field estima…
Original abstract

Humanoid robots are entering our physical world at scale, yet as oversized toys--good at singing and dancing, but short on force-interaction capabilities for practical tasks. Bridging this gap necessitates prioritizing reliable contact perception as a fundamental requirement. Estimating external wrenches in humanoids is complicated by floating-base dynamics and indeterminate contact locations. Existing analytical frameworks require idealistic assumptions and hard-to-obtain measurements, which are often unavailable in practice. To bridge this gap, we propose SixthSense, a task-agnostic approach that infers whole-body contact timing, location, and wrenches from proprioception and IMU data alone. To capture the multi-modal dynamics between unstructured contact inputs and the uncertain motion outputs, we employ conditional flow matching to tokenize proprioceptive histories and estimate a spatiotemporally sparse contact-event flow. SixthSense serves as a plug-and-play perception module for applications including collision detection, physical human-robot interaction, and force-feedback teleoperation. Experiments across standing, walking, and whole-body motion-tracking policies showcased unprecedented performance in diverse behaviors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SixthSense, a task-agnostic proprioception-only approach for estimating whole-body contact timing, location, and wrenches in humanoid robots. It utilizes conditional flow matching on tokenized proprioceptive histories to model the mapping from unstructured contact inputs to uncertain motion outputs. The method is claimed to serve as a plug-and-play module for applications such as collision detection, physical human-robot interaction, and force-feedback teleoperation, with experiments on standing, walking, and whole-body motion-tracking policies demonstrating unprecedented performance.

Significance. Should the results hold under rigorous validation, this work would be significant for the field of humanoid robotics. It addresses a critical gap in contact perception by eliminating the need for external sensors or idealistic assumptions in floating-base systems. The application of conditional flow matching to capture multi-modal and sparse contact events represents an innovative use of generative models in robotics perception, potentially enabling more robust force-interaction capabilities.

major comments (2)
  1. [Method] Method section: The description of conditional flow matching does not provide details on how the model resolves the underdetermined nature of the floating-base inverse dynamics problem. Different contact configurations can produce nearly identical joint torques and base accelerations, and without explicit physics constraints or regularization, it is unclear if the learned flow concentrates on true contact events rather than plausible alternatives.
  2. [Experiments] Experiments section: The claim of 'unprecedented performance' in diverse behaviors is not supported by any quantitative results, error bars, baseline comparisons, or statistical analysis in the manuscript. This makes it difficult to evaluate the effectiveness against existing analytical or learning-based methods.
minor comments (1)
  1. [Abstract] The abstract mentions 'unprecedented performance' without specifying the metrics used, which could be clarified for better context.
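
The ambiguity raised in major comment 1 can be stated with the standard floating-base equations of motion. The notation below is generic textbook notation chosen for illustration; it is an assumption and is not taken from the paper.

```latex
% Floating-base dynamics with k external contacts:
%   q: configuration, \nu \in \mathbb{R}^{n+6}: generalized velocity,
%   M: mass matrix, h: Coriolis, centrifugal and gravity terms,
%   S = [\,0_{n \times 6} \;\; I_n\,]: actuated-joint selection matrix,
%   \tau \in \mathbb{R}^{n}: joint torques,
%   J_i \in \mathbb{R}^{6 \times (n+6)}: Jacobian at contact i,
%   w_i \in \mathbb{R}^{6}: wrench at contact i.
\begin{align}
  M(q)\,\dot{\nu} + h(q,\nu) &= S^{\top}\tau + \sum_{i=1}^{k} J_i(q)^{\top} w_i, \\
  r \;:=\; M(q)\,\dot{\nu} + h(q,\nu) - S^{\top}\tau &= \sum_{i=1}^{k} J_i(q)^{\top} w_i .
\end{align}
```

Proprioception and IMU data constrain at most the residual r, which has n + 6 components. The number of contacts k, the contact locations that determine each J_i, and the wrenches w_i are all unknown, so any other set of locations and wrenches whose terms sum to the same r is equally consistent with the measurements. This is why the referee asks what steers the learned flow toward the true contacts.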

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment below and describe the corresponding revisions.

Point-by-point responses
  1. Referee: [Method] Method section: The description of conditional flow matching does not provide details on how the model resolves the underdetermined nature of the floating-base inverse dynamics problem. Different contact configurations can produce nearly identical joint torques and base accelerations, and without explicit physics constraints or regularization, it is unclear if the learned flow concentrates on true contact events rather than plausible alternatives.

    Authors: We thank the referee for this important observation. SixthSense resolves the underdetermined mapping in a purely data-driven manner: the conditional flow matching model is trained end-to-end on large-scale paired datasets of proprioceptive histories and ground-truth contact-event flows obtained from simulation and motion-capture. The learned conditional distribution implicitly encodes the physics of floating-base dynamics, and the sparsity-inducing formulation of the contact-event flow further regularizes the solution space toward physically consistent sparse events. In practice, the generated flows align with true contacts rather than arbitrary alternatives, as confirmed by our qualitative and quantitative validation. We will add a dedicated paragraph in the method section clarifying this data-driven disambiguation mechanism and the role of the learned prior.

    revision: partial

  2. Referee: [Experiments] Experiments section: The claim of 'unprecedented performance' in diverse behaviors is not supported by any quantitative results, error bars, baseline comparisons, or statistical analysis in the manuscript. This makes it difficult to evaluate the effectiveness against existing analytical or learning-based methods.

    Authors: We agree that the current manuscript relies primarily on qualitative demonstrations and policy-integration results across standing, walking, and whole-body tracking. While these results illustrate successful real-world deployment without external sensors, we acknowledge the absence of comprehensive quantitative metrics, baselines, and statistical analysis. In the revised manuscript we will expand the experiments section with numerical evaluations (contact timing precision/recall, wrench estimation MAE and RMSE), direct comparisons against momentum-based observers and prior learning baselines, error bars from repeated trials, and statistical significance tests.

    revision: yes
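
The response above names contact timing precision and recall but not how predicted and true contacts are matched. The sketch below assumes the simplest convention, a per-timestep comparison of binary contact flags for a single body part; the function name `contact_precision_recall` and the toy sequences are hypothetical.

```python
import numpy as np

def contact_precision_recall(predicted, actual):
    """Per-timestep precision and recall for binary contact flags."""
    pred = np.asarray(predicted, dtype=bool)
    act = np.asarray(actual, dtype=bool)
    tp = np.sum(pred & act)       # contact predicted and present
    fp = np.sum(pred & ~act)      # contact predicted but absent
    fn = np.sum(~pred & act)      # contact present but missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)

# Toy sequences: the true contact spans steps 2-4, the estimate spans steps 1-3.
actual = [0, 0, 1, 1, 1, 0, 0, 0]
predicted = [0, 1, 1, 1, 0, 0, 0, 0]

precision, recall = contact_precision_recall(predicted, actual)
```

An event-level variant that tolerates a few timesteps of onset delay would score the same sequences more leniently, so the matching convention should be stated alongside the numbers.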

Circularity Check

0 steps flagged

No circularity: data-driven learning method with no self-referential derivations

full rationale

The paper frames SixthSense as a learned generative model (conditional flow matching on tokenized proprioceptive histories) trained to map inputs to contact estimates. No equations, first-principles derivations, or analytical steps are shown that reduce outputs to inputs by construction. The approach is explicitly empirical and task-agnostic, relying on data rather than fitted parameters renamed as predictions or self-cited uniqueness theorems. Any incidental self-citations would not be load-bearing for the central claim, which rests on experimental validation across behaviors.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that proprioceptive histories contain sufficient information to disambiguate contact events and on the modeling choice that conditional flow matching can represent the required multi-modal distribution. The abstract does not explicitly name any free parameters or invented entities; the single free-parameter entry below covers the model hyperparameters, which are fitted but not enumerated.

free parameters (1)
  • flow-matching model hyperparameters
    Training-time parameters of the conditional flow matching network are necessarily fitted to data but not enumerated.
axioms (1)
  • domain assumption: Proprioception and IMU signals alone suffice to infer external wrenches without external measurements or idealistic contact assumptions.
    Stated as the core premise that existing analytical methods fail to meet but the new method satisfies.

pith-pipeline@v0.9.0 · 5519 in / 1216 out tokens · 30465 ms · 2026-05-09T14:37:37.017381+00:00 · methodology

