ψ0: An open foundation model towards universal humanoid loco-manipulation. arXiv preprint arXiv:2603.12263, 2026
11 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.RO 11
years
2026 11
verdicts
UNVERDICTED 11
citing papers explorer
- HumanoidArena: Benchmarking Egocentric Hierarchical Whole-body Learning
HumanoidArena is a new benchmark of 7 leg-critical HOI/HSI tasks that evaluates egocentric hierarchical whole-body policies in humanoids and finds performance is strongly conditioned on the low-level GMT used.
- SceneBot: Contact-Prompted General Humanoid Whole Body Tracking with Scene-Interaction
SceneBot conditions a humanoid tracking policy on motion references and contact labels, using reconstructed scene-interaction data to unify free-space locomotion with contact-rich manipulation and terrain tasks.
- OpenHLM: An Empirical Recipe for Whole-Body Humanoid Loco-Manipulation
OpenHLM is an empirical recipe yielding a whole-body humanoid VLA model that outperforms GR00T N1.6 and Ψ0 baselines on long-horizon tasks using less than half the demonstration time.
- MotionWAM: Towards Foundation World Action Models for Real-Time Humanoid Loco-Manipulation
MotionWAM conditions a policy on intermediate features from a video world model to predict unified whole-body motion tokens, enabling real-time humanoid loco-manipulation that outperforms VLA baselines by over 30% on nine Unitree G1 tasks.
- SIMPLE: Simulation-Based Policy Learning and Evaluation for Humanoid Loco-manipulation
SIMPLE is a new large-scale simulation benchmark for humanoid loco-manipulation that integrates accurate dynamics and photorealistic rendering and demonstrates policy transfer from simulation to physical robots.
- ActiveMimic: Egocentric Video Pretraining with Active Perception
ActiveMimic pretrains on egocentric human video by recovering and modeling active camera motion as viewpoint actions, matching robot-data pretraining performance on real-world tasks.
- LEGS: Fine-Tuning Teleop-Free VLAs for Humanoid Loco-manipulation in an Embodied Gaussian Splatting World
LEGS shows synthetic data from a 3DGS-mesh hybrid simulator trains VLA policies for humanoid pick-and-place that match or exceed human teleoperation performance across multiple backbones and tasks while enabling low-cost robustness to appearance shifts.
- Duet: Dual-Robot Understanding via Efficient Teaching
DUET pretrains collaborative policies on human-human VR demonstrations then fine-tunes on minimal robot teleoperation data, achieving equal or better performance than robot-only baselines with 5.4x faster collection across four tasks.
- GenHOI: Contact-Aware Humanoid-Object Interaction by Imitating Generated Videos without Task-Specific Training
GenHOI reconstructs robot-object scenes, generates task videos from language and first-frame images, extracts contact constraints, optimizes reference trajectories, and executes them via closed-loop control for zero-shot humanoid-object interaction.
- OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation
OASIS generates scalable simulation data for humanoid loco-manipulation via 3D generative asset reconstruction and domain randomization, yielding a policy with higher zero-shot real-world success than real-robot teleoperation data.
- HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers
HANDOFF is a distilled mixture-of-experts humanoid whole-body controller that follows a compact task-space interface, matches SOTA velocity tracking, provides large manipulation workspace on Unitree G1, and supports VLM-driven agentic planning with no task-specific data.