arxiv: 2605.01427 · v1 · submitted 2026-05-02 · 💻 cs.RO

SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids

Xingzhou Chen, Xiayan Xu, Yan Ning, Jiyu Yu, Yizheng Zhang, Siyi Qian, Lingzhu Xiang, Jiahao Chen, Yuquan Wang, Haodong Zhang, Ling Shi

Pith reviewed 2026-05-09 14:37 UTC · model grok-4.3

classification 💻 cs.RO

keywords humanoid robots · proprioception · wrench estimation · contact detection · conditional flow matching · whole-body control

The pith

SixthSense shows that whole-body contact wrenches on humanoids can be inferred solely from proprioception and IMU data using conditional flow matching.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes SixthSense, a task-agnostic method for estimating whole-body contact timing, locations, and external wrenches on humanoid robots. It relies exclusively on proprioceptive sensors and IMU measurements, avoiding the external sensors and simplifying assumptions common in analytical approaches. By tokenizing proprioceptive histories and applying conditional flow matching, the approach models the complex, sparse dynamics of contacts. The intended payoff is reliable contact perception for applications such as collision detection and physical interaction. The experiments cover standing, walking, and whole-body motion-tracking policies.

Core claim

We propose SixthSense, a task-agnostic approach that infers whole-body contact timing, location, and wrenches from proprioception and IMU data alone. To capture the multi-modal dynamics between unstructured contact inputs and the uncertain motion outputs, we employ conditional flow matching to tokenize proprioceptive histories and estimate a spatiotemporally sparse contact-event flow. This serves as a plug-and-play module for force-interaction tasks.

What carries the argument

Conditional flow matching applied to tokenized proprioceptive histories to model spatiotemporally sparse contact-event flows.
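
As a rough illustration of what a conditional flow matching objective looks like, the sketch below builds the regression target for the common straight-line (noise-to-data) path. It is not the paper's implementation: the 6-D target, the 32-D conditioning vector, and the names `cfm_training_pair`, `cfm_loss`, and `zero_field` are placeholders chosen for this example.

```python
import numpy as np

def cfm_training_pair(x1, rng):
    """Build one flow-matching regression sample per row of x1.

    x1 : (batch, dim) array of target samples (hypothetical contact-event vectors).
    Returns (t, x_t, u_t): time, interpolated point, and target velocity.
    """
    x0 = rng.standard_normal(x1.shape)        # noise sample from the base distribution
    t = rng.uniform(size=(x1.shape[0], 1))    # one time in [0, 1) per row
    x_t = (1.0 - t) * x0 + t * x1             # straight-line path from noise to data
    u_t = x1 - x0                             # velocity of that path (constant in t)
    return t, x_t, u_t

def cfm_loss(velocity_fn, x1, cond, rng):
    """Mean squared error between a candidate velocity field and the target."""
    t, x_t, u_t = cfm_training_pair(x1, rng)
    pred = velocity_fn(x_t, t, cond)          # the model sees the point, the time, and the condition
    return float(np.mean((pred - u_t) ** 2))

rng = np.random.default_rng(0)
x1 = np.zeros((4, 6))       # placeholder targets: 4 samples of a 6-D wrench
cond = np.zeros((4, 32))    # placeholder conditioning tokens (assumed size)
zero_field = lambda x_t, t, cond: np.zeros_like(x_t)   # stand-in for a neural network
loss = cfm_loss(zero_field, x1, cond, rng)
```

In a real system `velocity_fn` would be a trained network conditioned on the tokenized proprioceptive history; here it is a stand-in so the example runs without a deep-learning framework.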

If this is right

Enables plug-and-play perception for collision detection without extra hardware.
Supports physical human-robot interaction using only internal sensors.
Facilitates force-feedback teleoperation on floating-base systems.
Works across standing, walking, and whole-body motion-tracking policies, according to the reported experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Humanoid robot designs could omit dedicated force-torque sensors, reducing hardware costs and complexity.
The tokenization and flow-matching approach may extend to other legged robots with similar proprioceptive setups.
Training data from diverse real-world interactions could improve robustness to unseen contact scenarios.

Load-bearing premise

That conditional flow matching on tokenized proprioceptive histories can reliably capture the multi-modal and spatiotemporally sparse mapping from unstructured contact inputs to uncertain motion outputs without additional external measurements or idealistic assumptions.

What would settle it

Compare estimated wrenches against ground truth from an external force-torque sensor during a controlled collision or push while the robot walks, and check whether the force and torque errors stay within a tolerance fixed before the test.
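
One way such a comparison could be scored is sketched below, assuming both signals are already time-aligned and expressed in the same frame as rows of `[fx, fy, fz, tx, ty, tz]`. The function name `wrench_errors`, the column ordering, and the sample numbers are illustrative assumptions, not values from the paper.

```python
import numpy as np

def wrench_errors(estimated, measured):
    """Return (force RMSE, torque RMSE) over a sequence of 6-D wrenches.

    Each row is assumed to be [fx, fy, fz, tx, ty, tz] in a shared frame.
    """
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(measured, dtype=float)
    diff = est - ref
    # Per-sample squared Euclidean error, then root of the mean over time.
    force_rmse = float(np.sqrt(np.mean(np.sum(diff[:, :3] ** 2, axis=1))))
    torque_rmse = float(np.sqrt(np.mean(np.sum(diff[:, 3:] ** 2, axis=1))))
    return force_rmse, torque_rmse

# Made-up readings: the estimate is off by (3, 4, 0) N in force at every sample.
measured = np.array([[0.0, 0.0, 10.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 20.0, 0.0, 1.0, 0.0]])
estimated = measured + np.array([3.0, 4.0, 0.0, 0.0, 0.0, 0.0])
force_rmse, torque_rmse = wrench_errors(estimated, measured)
print(force_rmse, torque_rmse)  # 5.0 0.0
```

Reporting force and torque separately avoids mixing newtons with newton-metres in a single number.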

Figures

Figures reproduced from arXiv: 2605.01427 by Haodong Zhang, Jiahao Chen, Jiyu Yu, Ling Shi, Lingzhu Xiang, Siyi Qian, Xiayan Xu, Xingzhou Chen, Yan Ning, Yizheng Zhang, Yuquan Wang.

**Figure 1.** SixthSense: Inferring whole-body contact wrench fields via proprioception. This task-agnostic, plug-and-play module provides a robust perception foundation for diverse downstream control and decision-making applications. While external wrench estimation for fixed-base manipulators is largely a solved problem, extending these methods to whole-body humanoid interaction is fundamentally different. The presen…

**Figure 2.** Mapping whole-body surface contact force to wrench

**Figure 3.** Overview: Given a contact-resilient control policy, we use its rollouts to train a conditional flow-matching model that…

**Figure 4.** Overview of information flow: Tokenized proprioceptive observations are streamed into iterative CFM refinement

**Figure 5.** Contact dataset collection across behaviors in MuJoCo

**Figure 7.** An example multi-point contact scenario. We then test a contact estimator trained on the single-contact dataset only, which has never observed any multi-contact sample during training

**Figure 8.** Zero-shot multi-contact inference. To verify that this zero-shot generalization stems from CFM's distributional modeling rather than network capacity alone, we compare against an MLP baseline with hidden size [512, 512, 512] trained on the same single-contact locomotion data. The MLP achieves 99.69% detection on single-contact testing, comparable to CFM, but is evaluated on three simultaneous contacts that i…

**Figure 9.** Sensitivity to observation noise under single-contact

**Figure 11.** Contact data collection on real Unitree G1

**Figure 12.** Spatiotemporally sparse contact wrench field estima…
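
The "iterative CFM refinement" named in the Figure 4 caption is not detailed on this page. A generic way to draw a sample from a flow-matching model is to integrate its velocity field from noise at t = 0 to t = 1, and the sketch below does this with fixed-step Euler integration. The step count, the names `euler_sample` and `toy_field`, and the closed-form toy field (which stands in for a trained network) are all assumptions made for illustration.

```python
import numpy as np

def euler_sample(velocity_fn, x0, cond, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t = 0 to t = 1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t, cond)   # one refinement step
    return x

# Toy stand-in for a trained model: the exact straight-line field toward one target.
target = np.array([0.0, 0.0, 5.0, 0.0, -1.0, 0.0])

def toy_field(x, t, cond):
    # On the path x_t = (1 - t) * x0 + t * target, the velocity equals (target - x_t) / (1 - t).
    return (target - x) / (1.0 - t)

sample = euler_sample(toy_field, np.ones(6), cond=None, num_steps=10)
```

With a learned network the number of steps trades latency against accuracy; which trade-off the authors chose is not stated here.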

Original abstract

Humanoid robots are entering our physical world at scale, yet as oversized toys--good at singing and dancing, but short on force-interaction capabilities for practical tasks. Bridging this gap necessitates prioritizing reliable contact perception as a fundamental requirement. Estimating external wrenches in humanoids is complicated by floating-base dynamics and indeterminate contact locations. Existing analytical frameworks require idealistic assumptions and hard-to-obtain measurements, which are often unavailable in practice. To bridge this gap, we propose SixthSense, a task-agnostic approach that infers whole-body contact timing, location, and wrenches from proprioception and IMU data alone. To capture the multi-modal dynamics between unstructured contact inputs and the uncertain motion outputs, we employ conditional flow matching to tokenize proprioceptive histories and estimate a spatiotemporally sparse contact-event flow. SixthSense serves as a plug-and-play perception module for applications including collision detection, physical human-robot interaction, and force-feedback teleoperation. Experiments across standing, walking, and whole-body motion-tracking policies showcased unprecedented performance in diverse behaviors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SixthSense frames conditional flow matching on tokenized proprioceptive histories as a task-agnostic way to recover whole-body contact timing, location, and wrenches, but the floating-base inverse problem stays fundamentally ill-posed without extra constraints.

The paper's main contribution is showing that conditional flow matching can be applied to proprioception and IMU histories to estimate contact events and wrenches across standing, walking, and whole-body tracking behaviors. It positions the method as a plug-and-play module that avoids external sensors and strong analytical assumptions, which addresses a real gap: current humanoid systems need better force interaction without extra hardware. The experiments across multiple policies give some evidence that the approach runs in diverse settings, and the tokenization step is a concrete implementation choice that lets the model handle sparse, multi-modal inputs. That part is new enough to stand on its own rather than rehashing prior learning-based contact estimators. The authors also keep the framing practical, tying it directly to collision detection, physical HRI, and teleoperation instead of abstract benchmarks.

On the soft side, the central mapping is still severely underdetermined: with floating-base dynamics and indeterminate contact locations, many different wrench configurations can produce nearly identical joint torques and base accelerations. The stress-test concern holds up here: the flow model could converge to plausible but incorrect contact flows if it lacks physics grounding or external validation signals. Without detailed baselines, error distributions, or ablations on the flow-matching hyperparameters, it is difficult to judge whether the reported performance actually resolves the ambiguity or just fits the training distribution.

The paper would be useful for robotics groups already running humanoid platforms and looking for learning-based perception add-ons. Readers focused on legged-robot state estimation or sim-to-real transfer would find the most direct value.

It deserves a serious referee because the problem is timely and the method is implemented end-to-end, even though the results section will need close scrutiny on generalization and failure modes. I would send it to review with a request for quantitative comparisons against analytical baselines and tests on out-of-distribution contacts.

Referee Report

2 major / 1 minor

Summary. The paper introduces SixthSense, a task-agnostic proprioception-only approach for estimating whole-body contact timing, location, and wrenches in humanoid robots. It utilizes conditional flow matching on tokenized proprioceptive histories to model the mapping from unstructured contact inputs to uncertain motion outputs. The method is claimed to serve as a plug-and-play module for applications such as collision detection, physical human-robot interaction, and force-feedback teleoperation, with experiments on standing, walking, and whole-body motion-tracking policies demonstrating unprecedented performance.

Significance. Should the results hold under rigorous validation, this work would be significant for the field of humanoid robotics. It addresses a critical gap in contact perception by eliminating the need for external sensors or idealistic assumptions in floating-base systems. The application of conditional flow matching to capture multi-modal and sparse contact events represents an innovative use of generative models in robotics perception, potentially enabling more robust force-interaction capabilities.

major comments (2)

[Method] Method section: The description of conditional flow matching does not provide details on how the model resolves the underdetermined nature of the floating-base inverse dynamics problem. Different contact configurations can produce nearly identical joint torques and base accelerations, and without explicit physics constraints or regularization, it is unclear if the learned flow concentrates on true contact events rather than plausible alternatives.
[Experiments] Experiments section: The claim of 'unprecedented performance' in diverse behaviors is not supported by any quantitative results, error bars, baseline comparisons, or statistical analysis in the manuscript. This makes it difficult to evaluate the effectiveness against existing analytical or learning-based methods.
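
The ambiguity raised in the first major comment can be seen on a single rigid link. Sliding a force along its line of action, or splitting it between two contact points, leaves the net wrench on the link unchanged. The sketch below checks this numerically; the geometry, the force values, and the helper `net_wrench` are invented for illustration and are not taken from the paper.

```python
import numpy as np

def net_wrench(points, forces):
    """Net [force, torque] about the origin from point forces on one rigid body."""
    points = np.asarray(points, dtype=float)
    forces = np.asarray(forces, dtype=float)
    total_force = forces.sum(axis=0)
    total_torque = np.cross(points, forces).sum(axis=0)   # sum of r x f
    return np.concatenate([total_force, total_torque])

# One 10 N push along +z, applied 0.1 m out along x.
single = net_wrench([[0.1, 0.0, 0.0]], [[0.0, 0.0, 10.0]])
# Same push, contact point moved 0.3 m along the force's line of action.
shifted = net_wrench([[0.1, 0.0, 0.3]], [[0.0, 0.0, 10.0]])
# Two 5 N pushes placed symmetrically around the original point.
pair = net_wrench([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0]],
                  [[0.0, 0.0, 5.0], [0.0, 0.0, 5.0]])

print(np.allclose(single, shifted), np.allclose(single, pair))  # True True
```

All three configurations yield the wrench `[0, 0, 10, 0, -1, 0]`, so rigid-body dynamics alone cannot tell them apart. Any estimator has to bring in extra information, such as surface geometry or a learned prior, to choose among them.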

minor comments (1)

[Abstract] The abstract mentions 'unprecedented performance' without specifying the metrics used, which could be clarified for better context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment below and describe the corresponding revisions.

Referee: [Method] Method section: The description of conditional flow matching does not provide details on how the model resolves the underdetermined nature of the floating-base inverse dynamics problem. Different contact configurations can produce nearly identical joint torques and base accelerations, and without explicit physics constraints or regularization, it is unclear if the learned flow concentrates on true contact events rather than plausible alternatives.

Authors: We thank the referee for this important observation. SixthSense resolves the underdetermined mapping in a purely data-driven manner: the conditional flow matching model is trained end-to-end on large-scale paired datasets of proprioceptive histories and ground-truth contact-event flows obtained from simulation and motion-capture. The learned conditional distribution implicitly encodes the physics of floating-base dynamics, and the sparsity-inducing formulation of the contact-event flow further regularizes the solution space toward physically consistent sparse events. In practice, the generated flows align with true contacts rather than arbitrary alternatives, as confirmed by our qualitative and quantitative validation. We will add a dedicated paragraph in the method section clarifying this data-driven disambiguation mechanism and the role of the learned prior. revision: partial
Referee: [Experiments] Experiments section: The claim of 'unprecedented performance' in diverse behaviors is not supported by any quantitative results, error bars, baseline comparisons, or statistical analysis in the manuscript. This makes it difficult to evaluate the effectiveness against existing analytical or learning-based methods.

Authors: We agree that the current manuscript relies primarily on qualitative demonstrations and policy-integration results across standing, walking, and whole-body tracking. While these results illustrate successful real-world deployment without external sensors, we acknowledge the absence of comprehensive quantitative metrics, baselines, and statistical analysis. In the revised manuscript we will expand the experiments section with numerical evaluations (contact timing precision/recall, wrench estimation MAE and RMSE), direct comparisons against momentum-based observers and prior learning baselines, error bars from repeated trials, and statistical significance tests. revision: yes
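
For reference, the contact-timing precision and recall promised in the response could be computed per timestep as in the sketch below. The binary contact labels and the function name `contact_precision_recall` are assumptions; the manuscript's actual evaluation protocol is not described here.

```python
import numpy as np

def contact_precision_recall(predicted, actual):
    """Per-timestep precision and recall for binary contact flags."""
    pred = np.asarray(predicted, dtype=bool)
    true = np.asarray(actual, dtype=bool)
    tp = int(np.sum(pred & true))     # contact predicted and present
    fp = int(np.sum(pred & ~true))    # contact predicted but absent
    fn = int(np.sum(~pred & true))    # contact present but missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Made-up sequence: the detector fires one step early and releases one step early.
actual    = [0, 0, 1, 1, 1, 0, 0, 0]
predicted = [0, 1, 1, 1, 0, 0, 0, 0]
precision, recall = contact_precision_recall(predicted, actual)
```

Here both values come out to 2/3. A per-timestep score penalizes small timing offsets heavily, so an event-level metric with a time tolerance may be a fairer complement.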

Circularity Check

0 steps flagged

No circularity: data-driven learning method with no self-referential derivations

The paper frames SixthSense as a learned generative model (conditional flow matching on tokenized proprioceptive histories) trained to map inputs to contact estimates. No equations, first-principles derivations, or analytical steps are shown that reduce outputs to inputs by construction. The approach is explicitly empirical and task-agnostic, relying on data rather than fitted parameters renamed as predictions or self-cited uniqueness theorems. Any incidental self-citations would not be load-bearing for the central claim, which rests on experimental validation across behaviors.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that proprioceptive histories contain sufficient information to disambiguate contact events and on the modeling choice that conditional flow matching can represent the required multi-modal distribution; no free parameters or invented entities are explicitly named in the abstract.

free parameters (1)

flow-matching model hyperparameters
Training-time parameters of the conditional flow matching network are necessarily fitted to data but not enumerated.

axioms (1)

domain assumption: Proprioception and IMU signals alone suffice to infer external wrenches without external measurements or idealistic contact assumptions
Stated as the core premise that existing analytical methods fail to meet but the new method satisfies.
