Recognition: unknown
SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids
Pith reviewed 2026-05-09 14:37 UTC · model grok-4.3
The pith
SixthSense shows that whole-body contact wrenches on humanoids can be inferred solely from proprioception and IMU data using conditional flow matching.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose SixthSense, a task-agnostic approach that infers whole-body contact timing, location, and wrenches from proprioception and IMU data alone. To capture the multi-modal dynamics between unstructured contact inputs and the uncertain motion outputs, we employ conditional flow matching to tokenize proprioceptive histories and estimate a spatiotemporally sparse contact-event flow. This serves as a plug-and-play module for force-interaction tasks.
What carries the argument
Conditional flow matching applied to tokenized proprioceptive histories to model spatiotemporally sparse contact-event flows.
If this is right
- Enables plug-and-play perception for collision detection without extra hardware.
- Supports physical human-robot interaction using only internal sensors.
- Facilitates force-feedback teleoperation on floating-base systems.
- Achieves performance across standing, walking, and whole-body motion policies.
Where Pith is reading between the lines
- Humanoid robot designs could omit dedicated force-torque sensors, reducing hardware costs and complexity.
- The tokenization and flow-matching approach may extend to other legged robots with similar proprioceptive setups.
- Training data from diverse real-world interactions could improve robustness to unseen contact scenarios.
Load-bearing premise
That conditional flow matching on tokenized proprioceptive histories can reliably capture the multi-modal and spatiotemporally sparse mapping from unstructured contact inputs to uncertain motion outputs without additional external measurements or idealistic assumptions.
What would settle it
Measure estimated wrenches against ground-truth data from an external force-torque sensor during a controlled collision or push while the robot walks, and verify if the estimates match within a small error margin.
Figures
read the original abstract
Humanoid robots are entering our physical world at scale, yet as oversized toys--good at singing and dancing, but short on force-interaction capabilities for practical tasks. Bridging this gap necessitates prioritizing reliable contact perception as a fundamental requirement. Estimating external wrenches in humanoids is complicated by floating-base dynamics and indeterminate contact locations. Existing analytical frameworks require idealistic assumptions and hard-to-obtain measurements, which are often unavailable in practice. To bridge this gap, we propose SixthSense, a task-agnostic approach that infers whole-body contact timing, location, and wrenches from proprioception and IMU data alone. To capture the multi-modal dynamics between unstructured contact inputs and the uncertain motion outputs, we employ conditional flow matching to tokenize proprioceptive histories and estimate a spatiotemporally sparse contact-event flow. SixthSense serves as a plug-and-play perception module for applications including collision detection, physical human-robot interaction, and force-feedback teleoperation. Experiments across standing, walking, and whole-body motion-tracking policies showcased unprecedented performance in diverse behaviors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SixthSense, a task-agnostic proprioception-only approach for estimating whole-body contact timing, location, and wrenches in humanoid robots. It utilizes conditional flow matching on tokenized proprioceptive histories to model the mapping from unstructured contact inputs to uncertain motion outputs. The method is claimed to serve as a plug-and-play module for applications such as collision detection, physical human-robot interaction, and force-feedback teleoperation, with experiments on standing, walking, and whole-body motion-tracking policies demonstrating unprecedented performance.
Significance. Should the results hold under rigorous validation, this work would be significant for the field of humanoid robotics. It addresses a critical gap in contact perception by eliminating the need for external sensors or idealistic assumptions in floating-base systems. The application of conditional flow matching to capture multi-modal and sparse contact events represents an innovative use of generative models in robotics perception, potentially enabling more robust force-interaction capabilities.
major comments (2)
- [Method] Method section: The description of conditional flow matching does not provide details on how the model resolves the underdetermined nature of the floating-base inverse dynamics problem. Different contact configurations can produce nearly identical joint torques and base accelerations, and without explicit physics constraints or regularization, it is unclear if the learned flow concentrates on true contact events rather than plausible alternatives.
- [Experiments] Experiments section: The claim of 'unprecedented performance' in diverse behaviors is not supported by any quantitative results, error bars, baseline comparisons, or statistical analysis in the manuscript. This makes it difficult to evaluate the effectiveness against existing analytical or learning-based methods.
minor comments (1)
- [Abstract] The abstract mentions 'unprecedented performance' without specifying the metrics used, which could be clarified for better context.
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments. We address each major comment below and describe the corresponding revisions.
read point-by-point responses
-
Referee: [Method] Method section: The description of conditional flow matching does not provide details on how the model resolves the underdetermined nature of the floating-base inverse dynamics problem. Different contact configurations can produce nearly identical joint torques and base accelerations, and without explicit physics constraints or regularization, it is unclear if the learned flow concentrates on true contact events rather than plausible alternatives.
Authors: We thank the referee for this important observation. SixthSense resolves the underdetermined mapping in a purely data-driven manner: the conditional flow matching model is trained end-to-end on large-scale paired datasets of proprioceptive histories and ground-truth contact-event flows obtained from simulation and motion-capture. The learned conditional distribution implicitly encodes the physics of floating-base dynamics, and the sparsity-inducing formulation of the contact-event flow further regularizes the solution space toward physically consistent sparse events. In practice, the generated flows align with true contacts rather than arbitrary alternatives, as confirmed by our qualitative and quantitative validation. We will add a dedicated paragraph in the method section clarifying this data-driven disambiguation mechanism and the role of the learned prior. revision: partial
-
Referee: [Experiments] Experiments section: The claim of 'unprecedented performance' in diverse behaviors is not supported by any quantitative results, error bars, baseline comparisons, or statistical analysis in the manuscript. This makes it difficult to evaluate the effectiveness against existing analytical or learning-based methods.
Authors: We agree that the current manuscript relies primarily on qualitative demonstrations and policy-integration results across standing, walking, and whole-body tracking. While these results illustrate successful real-world deployment without external sensors, we acknowledge the absence of comprehensive quantitative metrics, baselines, and statistical analysis. In the revised manuscript we will expand the experiments section with numerical evaluations (contact timing precision/recall, wrench estimation MAE and RMSE), direct comparisons against momentum-based observers and prior learning baselines, error bars from repeated trials, and statistical significance tests. revision: yes
Circularity Check
No circularity: data-driven learning method with no self-referential derivations
full rationale
The paper frames SixthSense as a learned generative model (conditional flow matching on tokenized proprioceptive histories) trained to map inputs to contact estimates. No equations, first-principles derivations, or analytical steps are shown that reduce outputs to inputs by construction. The approach is explicitly empirical and task-agnostic, relying on data rather than fitted parameters renamed as predictions or self-cited uniqueness theorems. Any incidental self-citations would not be load-bearing for the central claim, which rests on experimental validation across behaviors.
Axiom & Free-Parameter Ledger
free parameters (1)
- flow-matching model hyperparameters
axioms (1)
- domain assumption Proprioception and IMU signals alone suffice to infer external wrenches without external measurements or idealistic contact assumptions
Reference graph
Works this paper leans on
-
[1]
State estimation for legged robots—consistent fusion of leg kinematics and IMU
Michael Bloesch, Marco Hutter, Mark A Hoepflinger, Stefan Leutenegger, Christian Gehring, C David Remy, and Roland Siegwart. State estimation for legged robots—consistent fusion of leg kinematics and IMU. Robotics, 17:17–24, 2013
2013
-
[2]
More than a feeling: Learning to grasp and regrasp using vision and touch
Roberto Calandra, Andrew Owens, Dinesh Jayaraman, Justin Lin, Wenzhen Yuan, Jitendra Malik, Edward H Adelson, and Sergey Levine. More than a feeling: Learning to grasp and regrasp using vision and touch. IEEE Robotics and Automation Letters, 3(4):3300–3307, 2018
2018
-
[3]
Caldwell, and Claudio Semini
Marco Camurri, Maurice Fallon, St ´ephane Bazeille, An- dreea Radulescu, Victor Barasuol, Darwin G. Caldwell, and Claudio Semini. Probabilistic contact estimation and impact detection for state estimation of quadruped robots. IEEE Robotics and Automation Letters, 2(2):1023–1030, 2017
2017
-
[4]
Capturability-based pattern generation for walking with variable height.IEEE Transactions on Robotics, 36(2):517–536, 2019
St ´ephane Caron, Adrien Escande, Leonardo Lanari, and Bastien Mallein. Capturability-based pattern generation for walking with variable height.IEEE Transactions on Robotics, 36(2):517–536, 2019
2019
-
[5]
Sensorless robot collision detection and hybrid force/motion control
Alessandro De Luca and Raffaella Mattone. Sensorless robot collision detection and hybrid force/motion control. InProceedings of the IEEE International Conference on Robotics and Automation, pages 999–1004. IEEE, 2005
2005
-
[6]
Collision detection and safe reaction with the DLR-III lightweight manipulator arm
Alessandro De Luca, Alin Albu-Schaffer, Sami Had- dadin, and Gerd Hirzinger. Collision detection and safe reaction with the DLR-III lightweight manipulator arm. InIEEE/RSJ international conference on intelligent robots and systems, pages 1623–1630. IEEE, 2006
2006
-
[7]
Springer, 2008
Roy Featherstone.Rigid body dynamics algorithms. Springer, 2008
2008
-
[8]
Residual-based contacts estimation for hu- manoid robots
Fabrizio Flacco, Antonio Paolillo, and Abderrahmane Kheddar. Residual-based contacts estimation for hu- manoid robots. InIEEE-RAS International Conference on Humanoid Robots, pages 409–415, 2016
2016
-
[9]
Feed- back control of a Cassie bipedal robot: Walking, stand- ing, and riding a Segway
Yukai Gong, Ross Hartley, Xingye Da, Ayonga Hereid, Omar Harib, Jiunn-Kai Huang, and Jessy Grizzle. Feed- back control of a Cassie bipedal robot: Walking, stand- ing, and riding a Segway. In2019 American control conference, pages 4559–4566. IEEE, 2019
2019
-
[10]
Collision detection and reaction: A contribution to safe physical human-robot interaction
Sami Haddadin, Alin Albu-Schaffer, Alessandro De Luca, and Gerd Hirzinger. Collision detection and reaction: A contribution to safe physical human-robot interaction. InIEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3356–3363. IEEE, 2008
2008
-
[11]
Robot collisions: A survey on detection, isola- tion, and identification.IEEE Transactions on Robotics, 33(6):1292–1312, 2017
Sami Haddadin, Alessandro De Luca, and Alin Albu- Sch¨affer. Robot collisions: A survey on detection, isola- tion, and identification.IEEE Transactions on Robotics, 33(6):1292–1312, 2017
2017
-
[12]
Eustice, and Jessy W
Ross Hartley, Maani Ghaffari, Ryan M. Eustice, and Jessy W. Grizzle. Contact-aided invariant extended Kalman filtering for robot state estimation.The Inter- national Journal of Robotics Research, 39(4):402–430, 2020
2020
-
[13]
Learning human- to-humanoid real-time whole-body teleoperation
Tairan He, Zhengyi Luo, Wenli Xiao, Chong Zhang, Kris Kitani, Changliu Liu, and Guanya Shi. Learning human- to-humanoid real-time whole-body teleoperation. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 8944–8951. IEEE, 2024
2024
-
[14]
Rapid trajectory optimization using C-FROST with illustration on a Cassie-series dynamic walking biped
Ayonga Hereid, Omar Harib, Ross Hartley, Yukai Gong, and Jessy W Grizzle. Rapid trajectory optimization using C-FROST with illustration on a Cassie-series dynamic walking biped. InIEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4722–4729, 2019
2019
-
[15]
Probabilistic foot contact estimation by fusing information from dynamics and dif- ferentialforward kinematics
Jemin Hwangbo, Carmine Dario Bellicoso, P ´eter Fankhauser, and Marco Hutter. Probabilistic foot contact estimation by fusing information from dynamics and dif- ferentialforward kinematics. InIEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3872–3878. IEEE, 2016
2016
-
[16]
Learning agile and dynamic motor skills for legged robots.Science Robotics, 4(26):eaau5872, 2019
Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots.Science Robotics, 4(26):eaau5872, 2019
2019
-
[17]
Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion.IEEE Robotics and Automation Letters, 7 (2):4630–4637, 2022
Gwanghyeon Ji, Juhyeok Mun, Hyeongjun Kim, and Jemin Hwangbo. Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion.IEEE Robotics and Automation Letters, 7 (2):4630–4637, 2022
2022
-
[18]
Humanoid robots in aircraft manufacturing: The airbus use cases.IEEE Robotics & Automation Magazine, 26(4):30–45, 2019
Abderrahmane Kheddar, St ´ephane Caron, Pierre Ger- gondet, Andrew Comport, Arnaud Tanguy, Christian Ott, Bernd Henze, George Mesesan, Johannes Englsberger, M´aximo A Roa, et al. Humanoid robots in aircraft manufacturing: The airbus use cases.IEEE Robotics & Automation Magazine, 26(4):30–45, 2019
2019
-
[19]
Auto-Encoding Variational Bayes
Diederik P Kingma and Max Welling. Auto-encoding variational Bayes.arXiv preprint arXiv:1312.6114, 2013
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[20]
Rma: Rapid motor adaptation for legged robots
Ashish Kumar, Zipeng Fu, Deepak Pathak, and Jitendra Malik. RMA: Rapid motor adaptation for legged robots. arXiv preprint arXiv:2107.04034, 2021
-
[21]
DIGIT: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation.IEEE Robotics and Automation Letters, 5 (3):3838–3845, 2020
Mike Lambeta, Po-Wei Chou, Stephen Tian, Brian Yang, Benjamin Maloon, Victoria Rose Most, Dave Stroud, Raymond Santos, Ahmad Byagowi, Gregg Kammerer, et al. DIGIT: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation.IEEE Robotics and Automation Letters, 5 (3):3838–3845, 2020
2020
-
[22]
Learning quadrupedal locomotion over challenging terrain.Science Robotics, 5 (47):eabc5986, 2020
Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain.Science Robotics, 5 (47):eabc5986, 2020
2020
-
[23]
Making sense of vision and touch: Learning multimodal representations for contact-rich tasks.IEEE Transactions on Robotics, 36(3):582–596, 2020
Michelle A Lee, Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese, Li Fei-Fei, Animesh Garg, and Jeannette Bohg. Making sense of vision and touch: Learning multimodal representations for contact-rich tasks.IEEE Transactions on Robotics, 36(3):582–596, 2020
2020
-
[24]
Beyondmimic: From mo- tion tracking to versatile humanoid control via guided diffusion,
Qiayuan Liao, Takara E Truong, Xiaoyu Huang, Yu- man Gao, Guy Tevet, Koushil Sreenath, and C Karen Liu. BeyondMimic: From motion tracking to versatile humanoid control via guided diffusion.arXiv preprint arXiv:2508.08241, 2025
-
[25]
Flow Matching for Generative Modeling
Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maxim- ilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[26]
Localizing external contact using proprioceptive sensors: The contact particle filter
Lucas Manuelli and Russ Tedrake. Localizing external contact using proprioceptive sensors: The contact particle filter. InIEEE/RSJ International Conference on Intelli- gent Robots and Systems, pages 5062–5069. IEEE, 2016
2016
-
[27]
Dynamic walking on compliant and uneven terrain using dcm and passivity-based whole-body control
George Mesesan, Johannes Englsberger, Gianluca Garo- falo, Christian Ott, and Alin Albu-Sch ¨affer. Dynamic walking on compliant and uneven terrain using dcm and passivity-based whole-body control. In2019 IEEE- RAS 19th International Conference on Humanoid Robots, pages 25–32, 2019
2019
-
[28]
J. Joe Payne, Daniel A. Hagen, Denis Garagi ´c, and Aaron M. Johnson. Multi-momentum observer con- tact estimation for bipedal robots.arXiv preprint arXiv:2412.03462, 2024
-
[29]
DeepMimic: Example-guided deep re- inforcement learning of physics-based character skills
Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel Van de Panne. DeepMimic: Example-guided deep re- inforcement learning of physics-based character skills. ACM Transactions On Graphics, 37(4):1–14, 2018
2018
-
[30]
Ambiguous collision outcomes and sliding with infinite friction in models of legged systems
C David Remy. Ambiguous collision outcomes and sliding with infinite friction in models of legged systems. The International Journal of Robotics Research, 36(12): 1252–1267, 2017
2017
-
[31]
Springer handbook of robotics, volume 200
Bruno Siciliano, Oussama Khatib, and Torsten Kr ¨oger. Springer handbook of robotics, volume 200. Springer, 2008
2008
-
[32]
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score- based generative modeling through stochastic differential equations.arXiv preprint arXiv:2011.13456, 2020
work page internal anchor Pith review arXiv 2011
-
[33]
NeuralFeels with neural fields: Visuotactile perception for in-hand manipulation.Science Robotics, 9(96): eadl0628, 2024
Sudharshan Suresh, Haozhi Qi, Tingfan Wu, Taosha Fan, Luis Pineda, Mike Lambeta, Jitendra Malik, Mrinal Kalakrishnan, Roberto Calandra, Michael Kaess, et al. NeuralFeels with neural fields: Visuotactile perception for in-hand manipulation.Science Robotics, 9(96): eadl0628, 2024
2024
-
[34]
Improving and generalizing flow-based generative models with minibatch optimal transport
Alexander Tong, Kilian Fatras, Nikolay Malkin, Guil- laume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport.arXiv preprint arXiv:2302.00482, 2023
work page internal anchor Pith review arXiv 2023
-
[35]
Soft magnetic skin for super-resolution tactile sensing with force self- decoupling.Science Robotics, 6(51):eabc8801, 2021
Youcan Yan, Zhe Hu, Zhengbao Yang, Wenzhen Yuan, Chaoyang Song, Jia Pan, and Yajing Shen. Soft magnetic skin for super-resolution tactile sensing with force self- decoupling.Science Robotics, 6(51):eabc8801, 2021
2021
-
[36]
GelSight: High-resolution robot tactile sensors for esti- mating geometry and force.Sensors, 17(12):2762, 2017
Wenxuan Yuan, Siyuan Dong, and Edward H Adelson. GelSight: High-resolution robot tactile sensors for esti- mating geometry and force.Sensors, 17(12):2762, 2017
2017
-
[37]
TWIST: Teleoperated whole-body imitation system
Yanjie Ze, Zixuan Chen, Joao Pedro Ara ´ujo, Zi-ang Cao, Xue Bin Peng, Jiajun Wu, and C Karen Liu. TWIST: Teleoperated whole-body imitation system. arXiv preprint arXiv:2505.02833, 2025
-
[38]
Twist2: Scalable, portable, and holistic humanoid data collection system,
Yanjie Ze, Siheng Zhao, Weizhuo Wang, Angjoo Kanazawa, Rocky Duan, Pieter Abbeel, Guanya Shi, Ji- ajun Wu, and C Karen Liu. TWIST2: Scalable, portable, and holistic humanoid data collection system.arXiv preprint arXiv:2511.02832, 2025
-
[39]
Track any motions under any disturbances.arXiv preprint arXiv:2509.13833, 2025
Zhikai Zhang, Jun Guo, Chao Chen, Jilong Wang, Chenghuai Lin, Yunrui Lian, Han Xue, Zhenrong Wang, Maoqi Liu, Huaping Liu, et al. Track any motions un- der any disturbances.arXiv preprint arXiv:2509.13833, 2025
-
[40]
Elucidating the design space of torque-aware vision-language-action models
Zongzheng Zhang, Haobo Xu, Zhuo Yang, Chenghao Yue, Zehao Lin, Huan-ang Gao, Ziwei Wang, and Hao Zhao. Elucidating the design space of torque-aware vision-language-action models. In9th Annual Confer- ence on Robot Learning, 2025
2025
-
[41]
Learning unified force and position control for legged loco-manipulation,
Peiyuan Zhi, Peiyang Li, Jianqin Yin, Baoxiong Jia, and Siyuan Huang. Learning unified force and position control for legged loco-manipulation.arXiv preprint arXiv:2505.20829, 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.