Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
21 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.
citing papers explorer
- OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction
A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.
- ANCHOR: A Physically Grounded Closed-Loop Framework for Robust Home-Service Mobile Manipulation
ANCHOR raises mobile manipulation success from 53.3% to 71.7% in unseen homes by binding plans to observable geometry, ensuring operable navigation endpoints, and using layered local recovery instead of global replans.
- Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment
VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.
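The pipeline parallelism this paper exploits can be illustrated with a generic two-stage producer-consumer sketch: a bounded queue lets the second stage start consuming while the first is still producing. The stage bodies and the `queue`-based handoff are placeholders, not the paper's DP-Cache or V-AEFusion mechanisms.

```python
import queue
import threading

def vlm_stage(frames, q):
    # Stage 1: stand-in for the compute-bound VLM phase.
    for f in frames:
        q.put(f * 2)   # pretend to "encode" a frame
    q.put(None)        # sentinel: no more work

def action_stage(q, out):
    # Stage 2: stand-in for the memory-bound action phase.
    while (x := q.get()) is not None:
        out.append(x + 1)   # pretend to "decode" an action

# A bounded queue lets the two stages overlap instead of running serially.
q, out = queue.Queue(maxsize=2), []
stages = [
    threading.Thread(target=vlm_stage, args=(range(5), q)),
    threading.Thread(target=action_stage, args=(q, out)),
]
for t in stages: t.start()
for t in stages: t.join()
# out == [1, 3, 5, 7, 9]
```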
- BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination
BiCoord is a new benchmark for long-horizon, tightly coordinated bimanual manipulation that includes quantitative metrics and shows that existing policies such as DP, RDT, Pi0, and OpenVLA-OFT struggle on such tasks.
- Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
UMI enables zero-shot deployment of robot manipulation policies trained solely on portable human demonstrations captured with custom handheld grippers, supporting dynamic bimanual tasks across novel environments and objects.
- CUBic: Coordinated Unified Bimanual Perception and Control Framework
CUBic learns a shared tokenized representation for bimanual robot perception and control via unidirectional aggregation, bidirectional codebook coordination, and a unified diffusion policy, yielding higher coordination accuracy and task success on the RoboTwin benchmark.
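The codebook-based tokenization mentioned above can be illustrated with a generic vector-quantization lookup: map a feature vector to the index of its nearest codebook entry. The distance metric and codebook values here are illustrative assumptions, not CUBic's actual scheme.

```python
def quantize(vec, codebook):
    """Nearest-codebook lookup: returns (token_id, code_vector).
    Squared Euclidean distance is an assumption, not CUBic's metric."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    token = min(range(len(codebook)), key=lambda i: dist2(vec, codebook[i]))
    return token, codebook[token]

codebook = [[0.0, 0.0], [1.0, 1.0], [1.0, -1.0]]
token, code = quantize([0.9, 1.2], codebook)
# token == 1: [0.9, 1.2] is closest to [1.0, 1.0]
```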
- Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
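The 0.92 Pearson correlation quoted above is a standard statistic; a minimal sketch of how such a sim-to-real correlation is computed follows. The per-policy success rates below are made up for illustration, not VISER's data.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up per-policy success rates in simulation vs. on the real robot.
sim  = [0.82, 0.55, 0.40, 0.71, 0.30]
real = [0.78, 0.50, 0.35, 0.69, 0.33]
r = pearson_r(sim, real)   # close to 1: sim ranks policies like reality
```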
- BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
- LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios
LeHome is a simulation platform offering high-fidelity dynamics for robotic manipulation of varied deformable objects in household settings, with support for multiple robot embodiments including low-cost hardware.
- WM-DAgger: Enabling Efficient Data Aggregation for Imitation Learning with World Models
WM-DAgger uses world models with corrective action synthesis and consistency-guided filtering to aggregate OOD recovery data for imitation learning, reporting 93.3% success in soft bag pushing with five demonstrations.
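WM-DAgger builds on the standard DAgger recipe: roll out the current learner, have an expert label the states the learner actually visited, aggregate, and refit. A toy sketch of that base loop follows; the 1-D environment, tabular learner, and scripted expert are illustrative, and the paper's world-model machinery is omitted entirely.

```python
import random

def expert(state):
    # Scripted expert for a toy 1-D task: step toward the origin.
    return -1 if state > 0 else 1

def rollout(policy, start=5, steps=10):
    state, visited = start, []
    for _ in range(steps):
        visited.append(state)
        state += policy(state)
    return visited

table = {}  # the "learner": a lookup table with a random fallback
def learner(state):
    return table.get(state, random.choice([-1, 1]))

# DAgger loop: roll out the current learner, have the expert label the
# states the learner actually visited, aggregate, and refit.
dataset = []
for _ in range(5):
    for s in rollout(learner):
        dataset.append((s, expert(s)))
    table = dict(dataset)   # "training" = memorizing aggregated labels
```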
- WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations
WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.
- ARM: Advantage Reward Modeling for Long-Horizon Manipulation
ARM trains reward models on Progressive/Regressive/Stagnant labels to enable adaptive reweighting in offline RL, reaching 99.4% success on towel-folding with minimal human intervention.
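The three-way Progressive/Regressive/Stagnant labeling can be sketched as a dead-banded threshold on a task-progress estimate; the scalar progress signal and the threshold value are assumptions here, not ARM's exact criterion.

```python
def label_transition(progress_before, progress_after, eps=1e-3):
    """Coarse advantage label for one transition. The scalar task-progress
    estimate and the eps dead-band are illustrative assumptions."""
    delta = progress_after - progress_before
    if delta > eps:
        return "Progressive"
    if delta < -eps:
        return "Regressive"
    return "Stagnant"
```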
- RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
RoboTwin 2.0 automates diverse synthetic data creation for dual-arm robots via MLLMs and five-axis domain randomization, leading to 228-367% gains in manipulation success.
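Multi-axis domain randomization amounts to sampling each axis independently per generated scene. A minimal sketch follows; the axis names and ranges are invented for illustration, since the summary does not list RoboTwin 2.0's actual five axes.

```python
import random

# Hypothetical axes and ranges -- not RoboTwin 2.0's actual five axes.
AXES = {
    "lighting_intensity":  lambda: random.uniform(0.2, 1.0),
    "table_texture_id":    lambda: random.randrange(100),
    "camera_jitter_rad":   lambda: random.uniform(-0.05, 0.05),
    "object_xy_offset":    lambda: (random.uniform(-0.3, 0.3),
                                    random.uniform(-0.3, 0.3)),
    "num_clutter_objects": lambda: random.randrange(6),
}

def sample_scene(seed=None):
    """Draw one randomized scene configuration, one value per axis."""
    if seed is not None:
        random.seed(seed)
    return {axis: draw() for axis, draw in AXES.items()}
```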
- RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
RDT-1B is a diffusion foundation model that unifies action spaces across robots and demonstrates superior bimanual manipulation with zero-shot generalization, language following, and few-shot learning on real robots.
- Octo: An Open-Source Generalist Robot Policy
Octo is an open-source transformer-based generalist robot policy pretrained on 800k trajectories that serves as an effective initialization for finetuning across diverse robotic platforms.
- Evaluating Real-World Robot Manipulation Policies in Simulation
SIMPLER simulated environments yield policy performance that correlates strongly with real-world robot manipulation results and captures similar sensitivity to distribution shifts.
- DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
- SASI: Leveraging Sub-Action Semantics for Robust Early Action Recognition in Human-Robot Interaction
SASI combines skeleton-based graph convolutions with sub-action semantics for improved early action recognition on the BABEL dataset.
- StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement
StableIDM stabilizes inverse dynamics models under manipulator truncation by combining robot-centric masking, directional spatial feature aggregation, and temporal dynamics refinement, yielding 12.1% higher strict action accuracy on AgiBot and 9.7-17.6% gains in real-robot tasks.
- Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.