hub Canonical reference

Being-h0: vision-language-action pretraining from large-scale human videos

Hao Luo, Yicheng Feng, Wanpeng Zhang, Sipeng Zheng, Ye Wang, Haoqi Yuan, Jiazheng Liu, Chaoyi Xu, Qin Jin, Zongqing Lu · 2025 · arXiv 2507.15597

Canonical reference. 78% of citing Pith papers cite this work as background.

17 Pith papers citing it

Background 78% of classified citations

read on arXiv browse 17 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7 baseline 1 dataset 1

citation-polarity summary

background 7 baseline 1 unclear 1

representative citing papers

Dexora: Open-source VLA for High-DoF Bimanual Dexterity

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

Dexora is the first open-source VLA system for dual-arm dual-hand high-DoF manipulation, trained on 100K simulated and 10K real teleoperated trajectories with a discriminator-weighted diffusion policy, achieving 66.7% dexterous success versus 51.7% for baselines.

Being-H0.7: A Latent World-Action Model from Egocentric Videos

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.

Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation

cs.RO · 2026-02-18 · unverdicted · novelty 7.0

PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting

cs.CV · 2025-11-22 · unverdicted · novelty 7.0

SFHand presents the first streaming language-guided autoregressive framework for 3D hand forecasting, achieving up to 35.8% gains over prior methods and 13.4% better downstream embodied task performance.

Hand-in-the-Loop: Improving VLA Policies for Dexterous Manipulation via Seamless Hand-Arm Intervention

cs.RO · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

HandITL enables seamless human intervention in VLA policies for bimanual dexterous manipulation, cutting jitter by 99.8% and improving refined policies by 19% over standard teleoperation.

HumanNet: Scaling Human-centric Video Learning to One Million Hours

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

HumanNet is a 1M-hour human-centric video dataset with interaction annotations that enables better vision-language-action model performance than equivalent robot data in a controlled test.

Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

State-of-the-art vision-language-action models catastrophically fail dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks, as shown by the new BeTTER benchmark with real-world validation.

A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies

cs.RO · 2026-04-15 · unverdicted · novelty 6.0

Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.

EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.

Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation

cs.RO · 2026-04-27 · unverdicted · novelty 5.0 · 2 refs

MoT-HRA learns embodiment-agnostic human-intention priors from a curated 2.2M-episode human video dataset via a three-expert hierarchical vision-language-action model to improve robotic manipulation under distribution shift.

LIDEA: Human-to-Robot Imitation Learning via Implicit Feature Distillation and Explicit Geometry Alignment

cs.RO · 2026-04-12 · unverdicted · novelty 5.0

LIDEA bridges the human-robot embodiment gap via implicit feature distillation in 2D and explicit geometry alignment in 3D, enabling human data to substitute up to 80% of robot demonstrations with improved out-of-distribution robustness.

Towards Robotic Dexterous Hand Intelligence: A Survey

cs.RO · 2026-05-13 · unverdicted · novelty 4.0

A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

EgoLive: A Large-Scale Egocentric Dataset from Real-World Human Tasks

cs.RO · 2026-04-26 · unverdicted · novelty 4.0

EgoLive is presented as the largest open-source annotated egocentric dataset for real-world task-oriented human routines, captured with a custom head-mounted device and multi-modal annotations exclusively in unconstrained environments.

World Model for Robot Learning: A Comprehensive Survey

cs.RO · 2026-04-30 · unverdicted · novelty 3.0

A comprehensive survey that organizes the literature on world models in robot learning, their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datasets, and benchmarks.

AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models

cs.CV · 2026-02-11 · unverdicted · novelty 3.0

AugVLA-3D augments existing VLA models with depth-derived 3D features and action priors to improve generalization and action accuracy in 3D robotic tasks.

citing papers explorer

Showing 1 of 1 citing paper after filters.

World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 203
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Being-h0: vision-language-action pretraining from large-scale human videos

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer