HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos

· 2026 · cs.RO · arXiv 2605.24934

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

Human egocentric video captures rich manipulation demonstrations without any robot hardware, yet transferring these skills to robots remains challenging due to the embodiment gap between human and robot in both visual appearance and kinematics. We present HumanEgo, a framework that bridges the embodiment gap by lifting each human demonstration to an entity-level representation of hand-object interaction, and training a flow matching policy with dense auxiliary objectives that amplify supervision from every trajectory. HumanEgo is robot-data-free, hardware-agnostic, data-efficient, and zero-shot human-to-robot transferable. With only 30 minutes of human videos per task, HumanEgo achieves 92.5% average success across four real-world tasks (75% with just 15 minutes), outperforms matched-time robot teleoperation by 41%, and robustly transfers zero-shot across novel robots, cameras, and environments. We release HumanEgo as an easy-to-use, open-source framework for learning robot policies directly from human data: https://github.com/TX-Leo/HumanEgo

representative citing papers

Human-as-Humanoid: Enabling Zero-Shot Humanoid Learning from Ego-Exo Human Videos with Human-Aligned Embodiments

cs.RO · 2026-06-30 · unverdicted · novelty 6.0

Human-as-Humanoid converts ego-exo human videos into executable 60-DoF humanoid actions through embodiment alignment and retargeting, enabling zero-shot real-robot policy deployment without target-task teleoperation data.

LUCID: Learning Embodiment-Agnostic Intent Models from Unstructured Human Videos for Scalable Dexterous Robot Skill Acquisition

cs.RO · 2026-06-10 · unverdicted · novelty 5.0

LUCID learns embodiment-agnostic intent models from unstructured human videos to train dexterous robot policies in simulation, enabling zero-shot transfer on real-world tasks like stirring and wiping.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Human-as-Humanoid: Enabling Zero-Shot Humanoid Learning from Ego-Exo Human Videos with Human-Aligned Embodiments cs.RO · 2026-06-30 · unverdicted · none · ref 9 · internal anchor
Human-as-Humanoid converts ego-exo human videos into executable 60-DoF humanoid actions through embodiment alignment and retargeting, enabling zero-shot real-robot policy deployment without target-task teleoperation data.
LUCID: Learning Embodiment-Agnostic Intent Models from Unstructured Human Videos for Scalable Dexterous Robot Skill Acquisition cs.RO · 2026-06-10 · unverdicted · none · ref 27 · internal anchor
LUCID learns embodiment-agnostic intent models from unstructured human videos to train dexterous robot policies in simulation, enabling zero-shot transfer on real-world tasks like stirring and wiping.

HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos

fields

years

verdicts

representative citing papers

citing papers explorer