The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation

Masashi Hatano, Zhifan Zhu, Hideo Saito, Dima Damen · 2025 · cs.CV · arXiv 2504.08654

5 Pith papers cite this work.

abstract

Forecasting hand motion and pose from an egocentric perspective is essential for understanding human intention. However, existing methods focus solely on predicting positions without considering articulation, and only when the hands are visible in the field of view. This limitation overlooks the fact that approximate hand positions can still be inferred even when the hands are outside the camera's view. In this paper, we propose a method to forecast the 3D trajectories and poses of both hands from an egocentric video, both in and out of the field of view. We propose EgoH4, a diffusion-based transformer architecture for Egocentric Hand Forecasting, which takes as input the observation sequence and camera poses and predicts future 3D motion and poses for both hands of the camera wearer. We leverage full-body pose information, allowing other joints to provide constraints on hand motion. We denoise the hand and body joints, complemented by a visibility predictor for hand joints and a 3D-to-2D reprojection loss that minimizes the error when the hands are in view. We evaluate EgoH4 on the Ego-Exo4D dataset, combining subsets with body and hand annotations. We train on 156K sequences and evaluate on 34K sequences. EgoH4 improves over the baseline by 3.4 cm in ADE for hand trajectory forecasting and by 5.1 cm in MPJPE for hand pose forecasting. Project page: https://masashi-hatano.github.io/EgoH4/
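The abstract reports two metrics, ADE for hand trajectories and MPJPE for hand poses, and mentions a 3D-to-2D reprojection loss that applies only when hands are in view. The sketch below illustrates these quantities under their common textbook definitions; it is not EgoH4's code. Assumptions: ADE averages the Euclidean error of one point per future step; MPJPE averages over all steps and joints with no root alignment; the reprojection term uses a hypothetical pinhole camera with intrinsics (fx, fy, cx, cy), camera-frame points with z > 0, and one boolean visibility flag per joint. The helper names (`ade`, `mpjpe`, `project`, `masked_reprojection_loss`) are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def ade(pred: Sequence[Point3], gt: Sequence[Point3]) -> float:
    """Average displacement error: mean Euclidean distance over future steps.

    pred, gt: one 3D point per future time step (e.g. a wrist trajectory).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def mpjpe(pred: Sequence[Sequence[Point3]], gt: Sequence[Sequence[Point3]]) -> float:
    """Mean per-joint position error over all time steps and joints.

    pred, gt: [T][J] nested sequences of 3D joints. No root/wrist alignment is
    applied here (an assumption; the paper's protocol may differ).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must cover the same number of steps")
    errors = [math.dist(p, g)
              for pred_t, gt_t in zip(pred, gt)
              for p, g in zip(pred_t, gt_t)]
    if not errors:
        raise ValueError("empty input")
    return sum(errors) / len(errors)


def project(point: Point3, fx: float, fy: float, cx: float, cy: float) -> Point2:
    """Hypothetical pinhole projection of a camera-frame 3D point to pixels."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def masked_reprojection_loss(pred_3d: Sequence[Point3],
                             gt_2d: Sequence[Point2],
                             visible: Sequence[bool],
                             intrinsics: Tuple[float, float, float, float]) -> float:
    """Mean 2D pixel error, counted only for joints flagged as in view."""
    fx, fy, cx, cy = intrinsics
    errs = [math.dist(project(p, fx, fy, cx, cy), q)
            for p, q, v in zip(pred_3d, gt_2d, visible) if v]
    # With no visible joints there is no 2D supervision, so the term is zero.
    return sum(errs) / len(errs) if errs else 0.0
```

For example, `ade([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 1, 0)])` averages per-step errors of 0 and 1 and returns 0.5. Errors come out in the units of the inputs, so metre-scale coordinates give metre-scale errors, whereas the abstract reports its gains in centimetres. In this sketch, out-of-view joints contribute nothing to the 2D term and are left to the 3D terms alone.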

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting

cs.CV · 2025-11-22 · unverdicted · novelty 7.0

SFHand presents the first streaming language-guided autoregressive framework for 3D hand forecasting, achieving up to 35.8% gains over prior methods and 13.4% better downstream embodied task performance.

Ego-Human Motion Prediction with 3D-Aware LLM

cs.CV · 2026-07-08 · conditional · novelty 6.0

Ego3DLM jointly predicts past and future 3D body pose and motion descriptions in a single autoregressive pass, conditioned on egocentric video, 3D scene features, and three-point tracking, achieving state-of-the-art on the Nymeria benchmark.

EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

EggHand unifies VLA action decoding with viewpoint-aware video-text encoding to forecast egocentric hand poses, achieving SOTA accuracy on EgoExo4D while remaining robust to ego-motion and controllable via language prompts.

Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views

cs.CV · 2025-11-17 · unverdicted · novelty 6.0

Uni-Hand forecasts 2D/3D hand waypoints, head motion, and contact states in egocentric views using vision-language fusion and dual-branch diffusion, with new benchmarks for downstream robotics and action tasks.

Prior-First, Condition-Second: Scalable and Controllable Hand Motion Completion

cs.GR · 2026-07-07 · conditional · novelty 5.5

Prior-first body-hand kinematic model with layered adapters for real-time, low-supervision hand motion completion conditioned on body and semantics.

citing papers explorer

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting cs.CV · 2025-11-22 · unverdicted · none · ref 21
Ego-Human Motion Prediction with 3D-Aware LLM cs.CV · 2026-07-08 · conditional · none · ref 30 · internal anchor
EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting cs.CV · 2026-05-08 · unverdicted · none · ref 21
Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views cs.CV · 2025-11-17 · unverdicted · none · ref 24
Prior-First, Condition-Second: Scalable and Controllable Hand Motion Completion cs.GR · 2026-07-07 · conditional · none · ref 9 · internal anchor