EgoTouch is a new multi-view egocentric dataset with dense bimanual tactile supervision, and TouchAnything is a baseline framework showing that wrist views improve vision-based tactile prediction over egocentric input alone.
EgoDex: Learning dexterous manipulation from large-scale egocentric video
18 papers indexed on Pith cite this work; polarity classification is still indexing.
Citation roles: dataset (1), background (1).

Representative citing papers:
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
HumanNet is a 1M-hour human-centric video dataset with interaction annotations that enables better vision-language-action model performance than equivalent robot data in a controlled test.
A dual-contrastive disentanglement method factorizes videos into independent task and embodiment latents, then uses a parameter-efficient adapter on a frozen video diffusion model to synthesize robot executions from single human demonstrations without paired data.
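For intuition, here is a minimal PyTorch sketch of a two-branch contrastive objective over task and embodiment latents; the encoder split, the pairing scheme, and all names are illustrative assumptions, not the paper's actual design (in particular, the paper avoids paired human-robot data, which this toy does not reproduce).

import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    # Standard InfoNCE: anchor[i] should be most similar to positive[i].
    anchor, positive = F.normalize(anchor, dim=-1), F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature
    return F.cross_entropy(logits, torch.arange(anchor.size(0)))

# Assume a video encoder emits a task latent z_task and an embodiment latent
# z_emb for each clip; a/b index two clips that share the relevant factor.
B, D = 8, 128
z_task_a, z_task_b = torch.randn(B, D, requires_grad=True), torch.randn(B, D, requires_grad=True)
z_emb_a, z_emb_b = torch.randn(B, D, requires_grad=True), torch.randn(B, D, requires_grad=True)

# Task branch: clips showing the same task (different embodiments) must align.
loss_task = info_nce(z_task_a, z_task_b)
# Embodiment branch: clips from the same embodiment (different tasks) must align,
# so embodiment information stays out of the task latent and vice versa.
loss_emb = info_nce(z_emb_a, z_emb_b)
(loss_task + loss_emb).backward()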
GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.
FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.
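The following is a minimal sketch of fusing per-finger camera views into a conditioning vector for a diffusion-style action head; all module names, shapes, and the noise schedule are assumptions for illustration, not FingerViP's actual policy.

import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    """One shared encoder applied to every camera view, attention-pooled into a single token."""
    def __init__(self, num_views=6, dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 5, 2), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(32, dim))
        self.view_embed = nn.Parameter(torch.zeros(num_views, dim))  # tags which camera a token came from
        self.attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.query = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, views):                       # views: (B, V, 3, H, W)
        B, V = views.shape[:2]
        f = self.enc(views.flatten(0, 1)).view(B, V, -1) + self.view_embed
        fused, _ = self.attn(self.query.expand(B, -1, -1), f, f)
        return fused.squeeze(1)                     # (B, dim)

# Denoising head: predict the noise added to an action chunk given the fused views.
fusion = MultiViewFusion()
denoiser = nn.Sequential(nn.Linear(16 * 22 + 128 + 1, 256), nn.ReLU(), nn.Linear(256, 16 * 22))
views, actions = torch.randn(4, 6, 3, 96, 96), torch.randn(4, 16, 22)
t = torch.rand(4, 1)                                # diffusion timestep in [0, 1)
noise = torch.randn_like(actions)
noisy = (1 - t).sqrt().view(-1, 1, 1) * actions + t.sqrt().view(-1, 1, 1) * noise
pred = denoiser(torch.cat([noisy.flatten(1), fusion(views), t], dim=-1))
loss = ((pred.view_as(noise) - noise) ** 2).mean()
loss.backward()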
UniT creates a unified physical language via visual anchoring and tri-branch reconstruction to enable scalable human-to-humanoid transfer for policy learning and world modeling.
EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.
TAMEn supplies a cross-morphology wearable interface and pyramid-structured visuo-tactile data regime that raises bimanual manipulation success rates from 34% to 75% via closed-loop collection.
DIAL decouples intent from action in end-to-end VLAs using a latent visual foresight bottleneck and two-stage training, reaching SOTA on RoboCasa with 10x fewer demonstrations and zero-shot real-world transfer.
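A minimal sketch of a latent visual foresight bottleneck is given below; the module names, losses, and the two-stage split into foresight and action objectives are assumptions for illustration, not DIAL's actual implementation.

import torch
import torch.nn as nn

obs_dim, latent_dim, act_dim = 512, 64, 7

intent_encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
foresight_head = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, obs_dim))
action_decoder = nn.Sequential(nn.Linear(obs_dim + latent_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))

obs_now = torch.randn(32, obs_dim)      # current visual features
obs_future = torch.randn(32, obs_dim)   # features of a future frame taken from video
actions = torch.randn(32, act_dim)

# Stage 1: the intent latent must predict the future observation, so "what
# should happen next" is squeezed through a small bottleneck (visual foresight).
z = intent_encoder(obs_now)
loss_foresight = ((foresight_head(z) - obs_future) ** 2).mean()

# Stage 2: the action decoder reads the current observation plus the intent
# latent (detached here), decoupling intent from low-level action generation.
loss_action = ((action_decoder(torch.cat([obs_now, z.detach()], -1)) - actions) ** 2).mean()
(loss_foresight + loss_action).backward()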
DreamZero uses a 14B video diffusion model as a World Action Model to achieve over 2x better zero-shot generalization on real robots than state-of-the-art VLAs, real-time 7Hz closed-loop control, and cross-embodiment transfer with 10-30 minutes of data.
STARRY uses unified diffusion to align spatial-temporal world predictions with action generation plus GASAM for geometry-aware attention, reaching 93.82%/93.30% success on 50 bimanual tasks in simulation and raising real-world success from 42.5% to 70.8%.
StableIDM stabilizes inverse dynamics models under manipulator truncation by combining robot-centric masking, directional spatial feature aggregation, and temporal dynamics refinement, yielding 12.1% higher strict action accuracy on AgiBot and 9.7-17.6% gains in real-robot tasks.
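The toy model below illustrates the three ingredients named above (robot-centric masking, directional spatial aggregation, temporal refinement) in a single forward pass; all names, shapes, and pooling choices are assumptions for illustration, not StableIDM's actual design.

import torch
import torch.nn as nn

class TinyIDM(nn.Module):
    """Toy inverse dynamics model: masked features -> directional pooling -> GRU -> action."""
    def __init__(self, feat_dim=64, act_dim=7):
        super().__init__()
        self.row_conv = nn.Conv1d(feat_dim, feat_dim, 3, padding=1)
        self.col_conv = nn.Conv1d(feat_dim, feat_dim, 3, padding=1)
        self.temporal = nn.GRU(feat_dim * 2, 128, batch_first=True)
        self.head = nn.Linear(128, act_dim)

    def forward(self, feats, robot_mask):
        # feats: (B, T, C, H, W) visual features; robot_mask: (B, T, 1, H, W), 1 on the robot.
        x = (feats * robot_mask).flatten(0, 1)             # robot-centric masking
        rows = self.row_conv(x.mean(dim=-1)).amax(dim=-1)  # average over width, conv along rows
        cols = self.col_conv(x.mean(dim=-2)).amax(dim=-1)  # average over height, conv along columns
        tok = torch.cat([rows, cols], dim=-1).view(feats.shape[0], feats.shape[1], -1)
        out, _ = self.temporal(tok)                        # temporal dynamics refinement
        return self.head(out[:, -1])                       # action between the last two frames

model = TinyIDM()
feats = torch.randn(2, 4, 64, 24, 24)
mask = (torch.rand(2, 4, 1, 24, 24) > 0.5).float()
print(model(feats, mask).shape)                            # torch.Size([2, 7])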
Motus unifies understanding, video generation, and action in one latent world model via MoT experts and optical-flow latent actions, reporting gains over prior methods in simulation and real robots.
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
A position paper proposes decomposing affective robotic touch into multiple specialized deep learning models for better social human-robot interaction.
JoyAI-RA is a multi-source pretrained VLA model that claims to bridge human-to-robot embodiment gaps via data unification and outperforms prior methods on generalization-heavy robotic tasks.