Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
10 Pith papers cite this work.
Citing papers
- Being-H0.7: A Latent World-Action Model from Egocentric Videos
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching the benefits of explicit world models while retaining VLA-level inference efficiency.
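The claim is dense, so here is a minimal sketch of what "dual-branch alignment on latent queries" could look like: a policy branch reads only current observations, a training-only branch reads future frames, and an alignment loss pulls the policy latents toward the future-aware ones. Every module name, dimension, and loss choice below is an assumption, not Being-H0.7's actual code.

```python
# Hypothetical sketch of dual-branch alignment on latent queries.
# All names, dimensions, and losses are assumptions, not Being-H0.7's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchVLA(nn.Module):
    def __init__(self, dim=256, n_queries=8, action_dim=7):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))  # learned latent queries
        self.policy_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.future_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.action_head = nn.Linear(dim, action_dim)

    def forward(self, obs_tokens, future_tokens=None):
        q = self.queries.expand(obs_tokens.size(0), -1, -1)
        # Policy branch: queries attend to current observation only.
        z_policy, _ = self.policy_attn(q, obs_tokens, obs_tokens)
        action = self.action_head(z_policy.mean(dim=1))
        align_loss = None
        if future_tokens is not None:  # training-time branch, unused at inference
            # Future branch: the same queries attend to future observations.
            z_future, _ = self.future_attn(q, future_tokens, future_tokens)
            # Pull policy latents toward future-aware latents (stop-grad target).
            align_loss = 1 - F.cosine_similarity(z_policy, z_future.detach(), dim=-1).mean()
        return action, align_loss

model = DualBranchVLA()
obs = torch.randn(2, 16, 256)  # batch of current-frame tokens
fut = torch.randn(2, 16, 256)  # tokens from future frames (training only)
action, align = model(obs, fut)
print(action.shape, align.item())
```

Because the future branch and the alignment loss are dropped at inference, the deployed policy keeps plain-VLA cost, which is presumably what "world-model benefits at VLA efficiency" refers to.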
- HumanNet: Scaling Human-centric Video Learning to One Million Hours
HumanNet is a 1M-hour human-centric video dataset with interaction annotations that, in a controlled comparison, yields better vision-language-action model performance than an equivalent amount of robot data.
- Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation
MoT-HRA learns embodiment-agnostic human-intention priors from the HA-2.2M dataset of 2.2M human video episodes through a three-expert hierarchy to improve robotic motion plausibility and robustness under distribution shift.
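The summary names a three-expert hierarchy without detail; one plausible wiring is a cascade in which an embodiment-agnostic intention expert conditions a motion expert, which in turn conditions an embodiment-specific action expert. The roles and shapes below are assumptions, not MoT-HRA's architecture.

```python
# Hypothetical three-expert hierarchy for intention-conditioned motion.
# Expert roles and the cascade are assumptions; MoT-HRA's design may differ.
import torch
import torch.nn as nn

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 128), nn.ReLU(), nn.Linear(128, o))

class ThreeExpertHierarchy(nn.Module):
    def __init__(self, obs_dim=64, intent_dim=16, motion_dim=32, action_dim=7):
        super().__init__()
        self.intent_expert = mlp(obs_dim, intent_dim)                # what the human wants
        self.motion_expert = mlp(obs_dim + intent_dim, motion_dim)   # how to move
        self.action_expert = mlp(obs_dim + motion_dim, action_dim)   # embodiment-specific control

    def forward(self, obs):
        intent = self.intent_expert(obs)  # embodiment-agnostic prior
        motion = self.motion_expert(torch.cat([obs, intent], -1))
        return self.action_expert(torch.cat([obs, motion], -1))

policy = ThreeExpertHierarchy()
print(policy(torch.randn(4, 64)).shape)  # torch.Size([4, 7])
```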
- Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models
State-of-the-art vision-language-action models fail catastrophically at dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks, as shown by the new BeTTER benchmark with real-world validation.
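One of the named failure modes, lexical-kinematic shortcuts, suggests a simple diagnostic: measure whether actions respond more to instruction wording than to scene changes. The probe below is hypothetical, not BeTTER's protocol; the stand-in policy and random inputs are for illustration only.

```python
# Hypothetical probe for lexical-kinematic shortcuts: does the policy react
# more to instruction wording than to the scene? Not the BeTTER protocol.
import torch

def shortcut_score(policy, obs, obs_perturbed, instr, instr_paraphrase):
    a = policy(obs, instr)
    scene_sensitivity = (policy(obs_perturbed, instr) - a).norm()
    wording_sensitivity = (policy(obs, instr_paraphrase) - a).norm()
    # Values >> 1 suggest actions are keyed to words, not to the scene.
    return (wording_sensitivity / (scene_sensitivity + 1e-8)).item()

# Toy stand-in policy and random feature vectors, purely illustrative.
policy = lambda o, i: torch.tanh(torch.cat([o, i], -1).sum(-1, keepdim=True))
obs, obs2 = torch.randn(8), torch.randn(8)
instr, instr2 = torch.randn(8), torch.randn(8)
print(shortcut_score(policy, obs, obs2, instr, instr2))
```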
- A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies
Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
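The two mechanisms named above can be written as a single training objective: a behavior-cloning term with real-domain actions reweighted, plus a term that aligns sim and real features. The loss forms, weights, and batch layout below are assumptions, not the paper's analysis code.

```python
# Hypothetical co-training loss combining the two mechanisms named above:
# balanced cross-domain representation alignment + domain-dependent reweighting.
import torch
import torch.nn.functional as F

def cotrain_loss(encoder, policy, sim_batch, real_batch, w_real=2.0, w_align=0.1):
    zs, zr = encoder(sim_batch["obs"]), encoder(real_batch["obs"])
    # Behavior cloning, with real-domain actions upweighted.
    bc = (F.mse_loss(policy(zs), sim_batch["act"])
          + w_real * F.mse_loss(policy(zr), real_batch["act"]))
    # Balanced alignment: pull the two domains' mean features together.
    align = F.mse_loss(zs.mean(0), zr.mean(0))
    return bc + w_align * align

encoder = torch.nn.Linear(32, 16)
policy = torch.nn.Linear(16, 7)
sim = {"obs": torch.randn(64, 32), "act": torch.randn(64, 7)}
real = {"obs": torch.randn(8, 32), "act": torch.randn(8, 7)}
print(cotrain_loss(encoder, policy, sim, real).item())
```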
- EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World
EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.
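Scaling claims of this kind are typically summarized by fitting a power law, success ≈ a · hours^b, to performance measured at several data volumes. The fit below is a generic sketch with made-up numbers; it does not reproduce EgoVerse's actual results.

```python
# Generic power-law fit of policy success vs. human-data hours; the numbers
# below are invented for illustration, not EgoVerse's reported results.
import numpy as np

hours = np.array([10, 50, 200, 800, 1362])           # illustrative data volumes
success = np.array([0.12, 0.21, 0.33, 0.47, 0.53])   # illustrative success rates

# Fit log(success) = b * log(hours) + log(a)  =>  success ≈ a * hours**b
b, log_a = np.polyfit(np.log(hours), np.log(success), 1)
print(f"success ≈ {np.exp(log_a):.3f} * hours^{b:.2f}")
```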
- LIDEA: Human-to-Robot Imitation Learning via Implicit Feature Distillation and Explicit Geometry Alignment
LIDEA bridges the human-robot embodiment gap via implicit feature distillation in 2D and explicit geometry alignment in 3D, enabling human data to substitute for up to 80% of robot demonstrations with improved out-of-distribution robustness.
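The title names two losses; a minimal guess at their shapes is a 2D term that distills frozen human-video features into the robot encoder and a 3D term that penalizes residual keypoint misalignment. All shapes and loss forms below are assumptions about how such terms could be written, not LIDEA's implementation.

```python
# Hypothetical versions of the two LIDEA-style terms: distill 2D features from
# human video into the robot encoder, and align 3D keypoints after centering.
# All shapes and losses are assumptions, not LIDEA's implementation.
import torch
import torch.nn.functional as F

def implicit_distill(robot_feat, human_feat):
    # 2D branch: match robot visual features to a frozen human-video teacher.
    return 1 - F.cosine_similarity(robot_feat, human_feat.detach(), dim=-1).mean()

def explicit_geom_align(robot_pts, human_pts):
    # 3D branch: residual between centered hand/gripper keypoint clouds
    # (a full rigid registration would solve for rotation via SVD/Kabsch).
    r = robot_pts - robot_pts.mean(0)
    h = human_pts - human_pts.mean(0)
    return F.mse_loss(r, h)

robot_feat, human_feat = torch.randn(16, 128), torch.randn(16, 128)
robot_pts, human_pts = torch.randn(21, 3), torch.randn(21, 3)
loss = implicit_distill(robot_feat, human_feat) + explicit_geom_align(robot_pts, human_pts)
print(loss.item())
```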
- World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
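The paradigm's unifying idea, one backbone that both predicts the next world state and emits actions, reduces to a two-headed model trained on both objectives. The interface below is a hypothetical minimal instance, not a canonical World Action Model.

```python
# Hypothetical minimal World Action Model interface: one backbone, two heads,
# trained to both predict the next latent state and generate the action.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorldActionModel(nn.Module):
    def __init__(self, obs_dim=64, latent=128, action_dim=7):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, latent), nn.ReLU())
        self.dynamics_head = nn.Linear(latent, latent)    # predictive world modeling
        self.action_head = nn.Linear(latent, action_dim)  # action generation

    def forward(self, obs):
        z = self.backbone(obs)
        return self.dynamics_head(z), self.action_head(z)

def wam_loss(model, obs, next_obs, act):
    z_next_pred, a_pred = model(obs)
    with torch.no_grad():
        z_next = model.backbone(next_obs)  # target latent for the world-model head
    return F.mse_loss(z_next_pred, z_next) + F.mse_loss(a_pred, act)

m = WorldActionModel()
print(wam_loss(m, torch.randn(4, 64), torch.randn(4, 64), torch.randn(4, 7)).item())
```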
- EgoLive: A Large-Scale Egocentric Dataset from Real-World Human Tasks
EgoLive is presented as the largest open-source annotated egocentric dataset of real-world task-oriented human routines, captured exclusively in unconstrained environments with a custom head-mounted device and multi-modal annotations.
- World Model for Robot Learning: A Comprehensive Survey
This comprehensive survey organizes the literature on world models in robot learning, covering their roles in policy learning, planning, simulation, and video-based generation, and connecting them to navigation, driving, datasets, and benchmarks.