arxiv: 2401.02117 · v1 · submitted 2024-01-04 · 💻 cs.RO · cs.AI· cs.CV· cs.LG· cs.SY· eess.SY

Recognition: 1 theorem link

Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Zipeng Fu , Tony Z. Zhao , Chelsea Finn

Authors on Pith no claims yet

Pith reviewed 2026-05-14 21:58 UTC · model grok-4.3

classification 💻 cs.RO cs.AIcs.CVcs.LGcs.SYeess.SY

keywords mobile manipulationbimanual roboticswhole-body teleoperationimitation learningbehavior cloningco-trainingdata collection

0 comments

The pith

Co-training static and mobile demonstration data allows a bimanual robot to reach up to 90 percent success on complex mobile manipulation tasks with only 50 demonstrations each.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Mobile ALOHA, a low-cost teleoperation system that adds a mobile base to the existing ALOHA setup so that human operators can collect whole-body bimanual demonstrations while moving through an environment. Supervised behavior cloning is then performed on the collected trajectories, and the key finding is that mixing these mobile trajectories with existing static ALOHA datasets during training markedly raises success rates. With fifty demonstrations per task the combined training produces autonomous execution of practical mobile tasks such as sauteing and plating shrimp, opening a wall cabinet to store heavy pots, calling and riding an elevator, and rinsing a pan at a sink. A sympathetic reader cares because the result indicates that modest amounts of mobile data, when paired with readily available static data, can unlock useful whole-body behaviors without requiring expensive hardware or thousands of demonstrations.

Core claim

We present Mobile ALOHA, a low-cost whole-body teleoperation system formed by augmenting the ALOHA hardware with a mobile base, and show that supervised behavior cloning on data collected with this system, when co-trained with existing static ALOHA datasets, enables high success rates on bimanual mobile manipulation tasks using only fifty demonstrations per task.

What carries the argument

Mobile ALOHA, the augmented teleoperation platform that supplies whole-body bimanual demonstrations, together with the co-training procedure that mixes these demonstrations with static ALOHA data inside a supervised behavior-cloning objective.

If this is right

Success rates on mobile tasks rise by as much as 90 percent when static ALOHA data are included in training.
A single robot platform can autonomously perform sequences that combine locomotion and precise bimanual actions such as sauteing shrimp or storing pots in a wall cabinet.
Only fifty demonstrations per task suffice once co-training is applied, lowering the data-collection burden for new mobile behaviors.
The same low-cost interface supports data collection for both static and mobile versions of a task, allowing reuse of prior datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same co-training pattern could be tested on tasks outside kitchens, such as household cleaning or warehouse pick-and-place, to check whether the performance lift generalizes.
Because the mobile base is added without redesigning the arms, existing ALOHA users could retrofit their hardware to collect mobile data at low additional cost.
If negative transfer appears on some tasks, selective data mixing or task-specific weighting might be needed to keep the benefit of co-training.

Load-bearing premise

Demonstrations gathered through the low-cost whole-body teleoperation interface are consistent and high-quality enough that behavior cloning on them, even after co-training, produces reliable policies for the tested tasks.

What would settle it

Train separate policies on the same fifty mobile demonstrations without any static co-training data and measure whether success rates on the four kitchen and elevator tasks remain below 20 percent or show no improvement over the co-trained version.

read the original abstract

Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. Project website: https://mobile-aloha.github.io

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Mobile ALOHA adds a workable low-cost mobile base to the ALOHA rig and shows co-training with static data lifts real-robot success rates, but the gains are not isolated from total data volume.

read the letter

The main point is a straightforward hardware extension: they mount a mobile base under the existing ALOHA arms, add a whole-body teleoperation interface, and collect 50 demonstrations per task. Training a behavior-cloning policy on that data plus existing static ALOHA datasets produces clear success-rate jumps on four mobile tasks—shrimp sautéing, cabinet opening, elevator use, and pan rinsing. The hardware is cheap enough that other labs could replicate it, and the tasks are closer to useful chores than most table-top benchmarks. That combination is the real addition over prior ALOHA work. The co-training result is also useful to see in practice, even if the mechanism is not dissected. The soft spots are exactly where the stress-test flagged them. The abstract and reported numbers do not include an ablation that keeps total training steps or data volume fixed while swapping static for extra mobile demonstrations, so it is hard to separate genuine transfer from simply having more gradient updates. There is also no mention of per-seed variance or trial-by-trial success rates, which leaves the 90 % figure difficult to interpret for reliability. Baselines are described at a high level but not broken down enough to judge how much the mobile base itself changes the difficulty. For a reading group this is worth a look if the group is focused on imitation learning for mobile manipulation; the hardware description and task videos give concrete material to discuss. I would not cite it in my own work yet because the controls are missing, but I would send it to referees. The practical system and real-robot results are solid enough to deserve a full review even if the analysis needs tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Mobile ALOHA, a low-cost whole-body teleoperation system that augments the original ALOHA setup with a mobile base for collecting bimanual mobile manipulation demonstrations. Using supervised behavior cloning, the authors train policies on 50 demonstrations per task and report that co-training with existing static ALOHA datasets raises success rates by up to 90% on four real-world tasks: sautéing and serving shrimp, opening a two-door wall cabinet to store heavy pots, calling and entering an elevator, and rinsing a used pan at a kitchen faucet.

Significance. If the performance attribution to co-training holds under controlled conditions, the result would be significant for mobile manipulation research: it shows that limited mobile-specific data can be effectively augmented by static tabletop datasets to enable whole-body tasks that combine navigation, bimanual dexterity, and force-sensitive actions. The low-cost teleoperation interface and concrete hardware demonstrations on practical kitchen and mobility tasks are practical contributions that could accelerate data collection in this domain.

major comments (2)

[Experiments] Experiments section: the central claim that co-training with static ALOHA data produces up to 90% success-rate gains lacks an ablation that holds total demonstration count fixed while replacing static data with additional mobile demonstrations. Without this control it is impossible to separate the effect of data content from the effect of increased data volume or training steps.
[Experiments] Experiments section: success rates are presented without reported variance (standard deviation across random seeds or evaluation trials), number of evaluation episodes per task, or breakdown of failure modes. This weakens confidence in the reliability of the reported improvements and in the claim that co-training reliably avoids negative transfer.

minor comments (2)

[Abstract] Abstract: the phrase 'up to 90%' is not tied to a specific task or baseline; a brief parenthetical listing the per-task numbers would improve clarity.
[System Overview] Figure captions and text occasionally use inconsistent terminology for the teleoperation interface (e.g., 'whole-body' vs. 'mobile base + arms'); a single defined term would reduce ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to strengthen the experimental presentation while preserving the core contributions on low-cost whole-body teleoperation and practical co-training benefits.

read point-by-point responses

Referee: Experiments section: the central claim that co-training with static ALOHA data produces up to 90% success-rate gains lacks an ablation that holds total demonstration count fixed while replacing static data with additional mobile demonstrations. Without this control it is impossible to separate the effect of data content from the effect of increased data volume or training steps.

Authors: We agree that an ablation holding total demonstration count fixed would more cleanly isolate the contribution of static data content. Our current setup uses exactly 50 mobile demonstrations per task and augments them with the existing static ALOHA corpus; the practical motivation is that static data requires no extra teleoperation effort or hardware. Collecting equivalent additional mobile demonstrations would demand substantial new data-collection time. In the revision we will add an explicit discussion of this limitation, note that the reported gains reflect a realistic low-effort augmentation scenario, and include a partial control by subsampling the static dataset to match mobile data volume where possible. revision: partial
Referee: Experiments section: success rates are presented without reported variance (standard deviation across random seeds or evaluation trials), number of evaluation episodes per task, or breakdown of failure modes. This weakens confidence in the reliability of the reported improvements and in the claim that co-training reliably avoids negative transfer.

Authors: We apologize for the omission. Each reported success rate was obtained from 10 evaluation episodes per task across 3 random training seeds. We will revise the Experiments section to report means and standard deviations, explicitly state the number of episodes, and add a failure-mode breakdown (navigation errors, grasping failures, force-control issues, etc.) to substantiate that co-training does not introduce negative transfer. revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical evaluation on real-robot tasks

full rationale

The paper introduces a teleoperation hardware system (Mobile ALOHA) and applies standard supervised behavior cloning to collected demonstrations. Performance claims rest on measured success rates for concrete mobile manipulation tasks rather than any mathematical derivation, prediction, or fitted quantity that reduces to the paper's own inputs by construction. Co-training with prior static ALOHA data is presented as an empirical finding validated by external task completion, with no self-definitional equations, fitted-input predictions, or load-bearing self-citations that collapse the central result. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard imitation-learning assumptions that teleoperated demonstrations are sufficiently expert and that mixing static and mobile data reduces distribution shift. No new physical entities or ad-hoc constants are introduced.

axioms (1)

domain assumption Behavior cloning from a modest number of teleoperated demonstrations can generalize to autonomous execution when augmented by co-training on related static tasks.
This assumption underpins the reported 90% success-rate gains with only 50 demos per task.

pith-pipeline@v0.9.0 · 5514 in / 1333 out tokens · 58871 ms · 2026-05-14T21:58:25.295897+00:00 · methodology

discussion (0)

Forward citations

Cited by 24 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction
cs.RO 2026-04 unverdicted novelty 7.0

A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.
ANCHOR: A Physically Grounded Closed-Loop Framework for Robust Home-Service Mobile Manipulation
cs.RO 2026-04 conditional novelty 7.0

ANCHOR raises mobile manipulation success from 53.3% to 71.7% in unseen homes by binding plans to observable geometry, ensuring operable navigation endpoints, and using layered local recovery instead of global replans.
Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment
cs.RO 2026-04 unverdicted novelty 7.0

VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with margin...
BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination
cs.RO 2026-04 conditional novelty 7.0

BiCoord is a new benchmark for long-horizon tightly coordinated bimanual manipulation that includes quantitative metrics and shows existing policies like DP, RDT, Pi0 and OpenVLA-OFT struggle on such tasks.
Robotic Control via Embodied Chain-of-Thought Reasoning
cs.RO 2024-07 conditional novelty 7.0

Training VLAs to perform embodied chain-of-thought reasoning about plans, sub-tasks, motions, and grounded visual features before acting raises OpenVLA success rates by 28% on challenging generalization tasks without ...
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
cs.RO 2024-02 conditional novelty 7.0

UMI enables zero-shot deployment of robot manipulation policies trained solely on portable human demonstrations captured with custom handheld grippers, supporting dynamic bimanual tasks across novel environments and objects.
CUBic: Coordinated Unified Bimanual Perception and Control Framework
cs.RO 2026-05 unverdicted novelty 6.0

CUBic learns a shared tokenized representation for bimanual robot perception and control via unidirectional aggregation, bidirectional codebook coordination, and a unified diffusion policy, yielding higher coordinatio...
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
cs.RO 2026-05 unverdicted novelty 6.0

VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios
cs.RO 2026-04 unverdicted novelty 6.0

LeHome is a simulation platform offering high-fidelity dynamics for robotic manipulation of varied deformable objects in household settings, with support for multiple robot embodiments including low-cost hardware.
WM-DAgger: Enabling Efficient Data Aggregation for Imitation Learning with World Models
cs.RO 2026-04 unverdicted novelty 6.0

WM-DAgger uses world models with corrective action synthesis and consistency-guided filtering to aggregate OOD recovery data for imitation learning, reporting 93.3% success in soft bag pushing with five demonstrations.
WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations
cs.RO 2026-04 unverdicted novelty 6.0

WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match tele...
From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning
cs.AI 2026-04 unverdicted novelty 6.0

EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.
ARM: Advantage Reward Modeling for Long-Horizon Manipulation
cs.RO 2026-04 unverdicted novelty 6.0

ARM trains reward models on Progressive/Regressive/Stagnant labels to enable adaptive reweighting in offline RL, reaching 99.4% success on towel-folding with minimal human intervention.
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
cs.RO 2025-06 unverdicted novelty 6.0

RoboTwin 2.0 automates diverse synthetic data creation for dual-arm robots via MLLMs and five-axis domain randomization, leading to 228-367% gains in manipulation success.
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
cs.RO 2024-10 conditional novelty 6.0

RDT-1B is a diffusion foundation model that unifies action spaces across robots and demonstrates superior bimanual manipulation with zero-shot generalization, language following, and few-shot learning on real robots.
Octo: An Open-Source Generalist Robot Policy
cs.RO 2024-05 unverdicted novelty 6.0

Octo is an open-source transformer-based generalist robot policy pretrained on 800k trajectories that serves as an effective initialization for finetuning across diverse robotic platforms.
Evaluating Real-World Robot Manipulation Policies in Simulation
cs.RO 2024-05 conditional novelty 6.0

SIMPLER simulated environments yield policy performance that correlates strongly with real-world robot manipulation results and captures similar sensitivity to distribution shifts.
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
cs.RO 2024-03 accept novelty 6.0

DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
SASI: Leveraging Sub-Action Semantics for Robust Early Action Recognition in Human-Robot Interaction
cs.RO 2026-04 unverdicted novelty 5.0

SASI combines skeleton-based graph convolutions with sub-action semantics for improved early action recognition on the BABEL dataset.
StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement
cs.RO 2026-04 unverdicted novelty 5.0

StableIDM stabilizes inverse dynamics models under manipulator truncation by combining robot-centric masking, directional spatial feature aggregation, and temporal dynamics refinement, yielding 12.1% higher strict act...
From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments
cs.AI 2026-03 unverdicted novelty 5.0

An empirical literature analysis reveals a bifurcation in RL environments into Semantic Prior (LLM-dominated) and Domain-Specific Generalization ecosystems with distinct cognitive fingerprints.
Low-Cost Teleoperation Extension for Mobile Manipulators
cs.RO 2026-03 unverdicted novelty 5.0

An open-source teleoperation framework enables intuitive whole-body control of mobile manipulators using commodity smartphone, leader arms, and foot pedals instead of costly VR equipment.
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
cs.RO 2026-04 accept novelty 4.0

A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.

Reference graph

Works this paper leans on

104 extracted references · 104 canonical work pages · cited by 24 Pith papers · 12 internal anchors

[1]

https://docs.fetchrobotics.com/ teleop.html

Fetch robot. https://docs.fetchrobotics.com/ teleop.html. 2

work page
[2]

https://github.com/ hello-robot/stretch_fisheye_web_interface

Hello robot stretch. https://github.com/ hello-robot/stretch_fisheye_web_interface. 2

work page
[3]

https://www.trossenrobotics

Viperx 300 6dof. https://www.trossenrobotics. com/viperx-300-robot-arm.aspx . 3

work page
[4]

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Her- zog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jau- regui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang,...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[5]

Human to robot whole-body motion transfer

Miguel Arduengo, Ana Arduengo, Adrià Colomé, Joan Lobo-Prat, and Carme Torras. Human to robot whole-body motion transfer. In 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids) , 2021. 2, 3

work page 2020
[6]

What happened at the darpa robotics challenge finals

Christopher G Atkeson, PW Babu Ben- zun, Nandan Banerjee, Dmitry Berenson, Christoper P Bove, Xiongyi Cui, Mathew De- Donato, Ruixiang Du, Siyuan Feng, Perry Franklin, et al. What happened at the darpa robotics challenge finals. The DARPA robotics challenge finals: Humanoid robots to the rescue . 3

work page
[7]

Hierarchical neural dynamic policies

Shikhar Bahl, Abhinav Gupta, and Deepak Pathak. Hierarchical neural dynamic policies. RSS, 2021. 3

work page 2021
[8]

Human-to-robot imitation in the wild

Shikhar Bahl, Abhinav Gupta, and Deepak Pathak. Human-to-robot imitation in the wild. arXiv preprint arXiv:2207.09450, 2022. 3

work page arXiv 2022
[9]

A mobile manipulation system for one-shot teaching of complex tasks in homes

Max Bajracharya, James Borders, Dan Helmick, Thomas Kollar, Michael Laskey, John Leichty, Jeremy Ma, Umashankar Nagarajan, Akiyoshi Ochiai, Josh Petersen, et al. A mobile manipulation system for one-shot teaching of complex tasks in homes. In 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020. 2

work page 2020
[10]

Roboagent: Towards sample efficient robot manipulation with se- mantic augmentations and action chunking,

H Bharadhwaj, J Vakil, M Sharma, A Gupta, S Tulsiani, and V Kumar. Roboagent: Towards sample efficient robot manipulation with se- mantic augmentations and action chunking,

work page
[11]

Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Lau- rens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Sco...

work page arXiv 2023
[12]

RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan, Noah Brown, Justice Carba- jal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Haus- man, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, 11 Mobile ALOHA: https://mobile-aloha.github.io Ku...

work page internal anchor Pith review Pith/arXiv arXiv
[13]

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Anthony Brohan, Noah Brown, Justice Car- bajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Flo- rence, Chuyuan Fu, Montse Gonzalez Are- nas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alex Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashniko...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[14]

Humanoid robot teleoperation with vibrotac- tile based balancing feedback

Anais Brygo, Ioannis Sarakoglou, Nadia Garcia-Hernandez, and Nikolaos Tsagarakis. Humanoid robot teleoperation with vibrotac- tile based balancing feedback. In Haptics: Neu- roscience, Devices, Modeling, and Applications: 9th International Conference, EuroHaptics 2014, Versailles, France, June 24-26, 2014, Proceedings, Part II 9, 2014. 3

work page 2014
[15]

Humanoid loco-manipulation of pushed carts utilizing virtual reality teleoperation

Jean Chagas Vaz, Dylan Wallace, and Paul Y Oh. Humanoid loco-manipulation of pushed carts utilizing virtual reality teleoperation. In ASME International Mechanical Engineering Congress and Exposition, 2021. 3

work page 2021
[16]

in-the-wild

Annie S Chen, Suraj Nair, and Chelsea Finn. Learning generalizable robotic reward func- tions from" in-the-wild" human videos. arXiv preprint arXiv:2103.16817, 2021. 3

work page arXiv 2021
[17]

Footstep planning for the honda asimo humanoid

Joel Chestnutt, Manfred Lau, German Cheung, James Kuffner, Jessica Hodgins, and Takeo Kanade. Footstep planning for the honda asimo humanoid. In ICRA, 2005. 2

work page 2005
[18]

Diffusion policy: Visuomotor policy learning via action diffusion

Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Pro- ceedings of Robotics: Science and Systems (RSS) ,

work page
[19]

Team janus hu- manoid avatar: A cybernetic avatar to embody human telepresence

R Cisneros, M Benallegue, K Kaneko, H Kam- inaga, G Caron, A Tanguy, R Singh, L Sun, A Dallard, C Fournier, et al. Team janus hu- manoid avatar: A cybernetic avatar to embody human telepresence. In Toward Robot A vatars: Perspectives on the ANA A vatar XPRIZE Com- petition, RSS Workshop, 2022. 3

work page 2022
[20]

Open X-Embodiment Collaboration, Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anikait Singh, Anthony Brohan, Antonin Raffin, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Brian Ichter, Cewu Lu, Charles Xu, Chelsea Finn, Chenfeng Xu, Cheng Chi, Chenguang Huang, ...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[21]

From play to policy: Conditional behavior generation from uncurated robot data.arXiv preprint arXiv:2210.10047, 2022

Zichen Jeff Cui, Yibin Wang, Nur Muham- mad Mahi Shafiullah, and Lerrel Pinto. From play to policy: Conditional behavior genera- tion from uncurated robot data. arXiv preprint arXiv:2210.10047, 2022. 3

work page arXiv 2022
[22]

icub3 avatar system

Stefano Dafarra, Kourosh Darvish, Riccardo Grieco, Gianluca Milani, Ugo Pattacini, Lorenzo Rapetti, Giulio Romualdi, Mattia Salvi, Alessandro Scalzo, Ines Sorrentino, et al. icub3 avatar system. arXiv preprint arXiv:2203.06972, 2022. 3

work page arXiv 2022
[23]

Whole-body geometric retargeting for humanoid robots

Kourosh Darvish, Yeshasvi Tirupachuri, Giulio Romualdi, Lorenzo Rapetti, Diego Ferigo, Francisco Javier Andrade Chavez, and Daniele Pucci. Whole-body geometric retargeting for humanoid robots. In 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), 2019. 3

work page 2019
[24]

Model-based inverse reinforcement learning from visual demonstrations

Neha Das, Sarah Bechtle, Todor Davchev, Di- nesh Jayaraman, Akshara Rai, and Franziska Meier. Model-based inverse reinforcement learning from visual demonstrations. In Con- ference on Robot Learning , pages 1930–1942. PMLR, 2021. 3

work page 1930
[25]

Transformers for one-shot visual imitation

Sudeep Dasari and Abhinav Kumar Gupta. Transformers for one-shot visual imitation. In Conference on Robot Learning , 2020. 3

work page 2020
[26]

Legibility and predictabil- ity of robot motion

Anca D Dragan, Kenton CT Lee, and Sid- dhartha S Srinivasa. Legibility and predictabil- ity of robot motion. In 2013 8th ACM/IEEE International Conference on Human-Robot In- teraction (HRI), 2013. 3

work page 2013
[27]

One-Shot Imitation Learning

Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, P. Abbeel, and Wojciech Zaremba. One-shot imitation learning. ArXiv, abs/1703.07326, 2017. 3

work page internal anchor Pith review Pith/arXiv arXiv 2017
[28]

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

Frederik Ebert, Yanlai Yang, Karl Schmeck- peper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, and Sergey Levine. Bridge data: Boosting generalization of robotic skills with cross-domain datasets. ArXiv, abs/2109.13396, 2021. 3

work page internal anchor Pith review Pith/arXiv arXiv 2021
[29]

Perceptual Values from Observation

Ashley D Edwards and Charles L Isbell. Per- ceptual values from observation. arXiv pre- print arXiv:1905.07861, 2019. 3

work page internal anchor Pith review Pith/arXiv arXiv 1905
[30]

Learning manipulation skills from a single demonstra- tion

Peter Englert and Marc Toussaint. Learning manipulation skills from a single demonstra- tion. The International Journal of Robotics Re- search, 37(1):137–154, 2018. 3

work page 2018
[31]

Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot

Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, Junbo Wang, Haoyi Zhu, and Cewu Lu. Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot. In Towards Generalist Robots: Learn- ing Paradigms for Scalable Skill Acquisition@ CoRL2023, 2023. 3, 5

work page 2023
[32]

Low-cost exoskeletons for learning whole-arm manipulation in the wild

Hongjie Fang, Hao-Shu Fang, Yiming Wang, Jieji Ren, Jingjing Chen, Ruo Zhang, Weiming Wang, and Cewu Lu. Low-cost exoskeletons for learning whole-arm manipulation in the wild. arXiv preprint arXiv:2309.14975, 2023. 3

work page arXiv 2023
[33]

Optimization based full body control for the atlas robot

Siyuan Feng, Eric Whitman, X Xinjilefu, and Christopher G Atkeson. Optimization based full body control for the atlas robot. In Inter- national Conference on Humanoid Robots, 2014. 2

work page 2014
[34]

One-shot visual imitation learning via meta-learning

Chelsea Finn, Tianhe Yu, Tianhao Zhang, Pieter Abbeel, and Sergey Levine. One-shot visual imitation learning via meta-learning. In Conference on robot learning , 2017. 3

work page 2017
[35]

Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian S

Peter R. Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian S. Wong, Johnny Lee, Igor Mordatch, and Jonathan Tompson. Implicit behavioral cloning. ArXiv, abs/2109.00137, 2021. 3

work page arXiv 2021
[36]

Deep whole-body control: learning a unified policy for manipulation and locomotion

Zipeng Fu, Xuxin Cheng, and Deepak Pathak. Deep whole-body control: learning a unified policy for manipulation and locomotion. In Conference on Robot Learning , 2022. 3

work page 2022
[37]

Bootstrap your own latent- a new approach to self-supervised learning

Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, 13 Mobile ALOHA: https://mobile-aloha.github.io Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Ghesh- laghi Azar, et al. Bootstrap your own latent- a new approach to self-supervised learning. Advances in neural information processing sys- te...

work page 2020
[38]

Multi-skill mobile manip- ulation for object rearrangement

Jiayuan Gu, Devendra Singh Chaplot, Hao Su, and Jitendra Malik. Multi-skill mobile manip- ulation for object rearrangement. ICLR, 2023. 3

work page 2023
[39]

Robot learning in homes: Improving general- ization and reducing dataset bias

Abhinav Gupta, Adithyavairavan Murali, Dhi- raj Prakashchand Gandhi, and Lerrel Pinto. Robot learning in homes: Improving general- ization and reducing dataset bias. Advances in neural information processing systems , 2018. 3

work page 2018
[40]

Zhang, Shaoqing Ren, and Jian Sun

Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR) , pages 770–778, 2015. 19

work page 2016
[41]

Vision-based ma- nipulators need to also see from their hands

Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jia- jun Wu, and Chelsea Finn. Vision-based ma- nipulators need to also see from their hands. ArXiv, abs/2203.12677, 2022. URL https://api. semanticscholar.org/CorpusID:247628166. 9

work page arXiv 2022
[42]

Causal policy gradient for whole- body mobile manipulation

Jiaheng Hu, Peter Stone, and Roberto Martín- Martín. Causal policy gradient for whole- body mobile manipulation. arXiv preprint arXiv:2305.04866, 2023. 3

work page arXiv 2023
[43]

Skill transformer: A monolithic policy for mobile manipulation

Xiaoyu Huang, Dhruv Batra, Akshara Rai, and Andrew Szot. Skill transformer: A monolithic policy for mobile manipulation. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2023. 3

work page 2023
[44]

Dynam- ical movement primitives: learning attractor models for motor behaviors

Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoff- mann, Peter Pastor, and Stefan Schaal. Dynam- ical movement primitives: learning attractor models for motor behaviors. Neural computa- tion, 2013. 3

work page 2013
[45]

Bilateral humanoid teleoper- ation system using whole-body exoskeleton cockpit tablis

Yasuhiro Ishiguro, Tasuku Makabe, Yuya Nagamatsu, Yuta Kojio, Kunio Kojima, Fumi- hito Sugai, Yohei Kakiuchi, Kei Okada, and Masayuki Inaba. Bilateral humanoid teleoper- ation system using whole-body exoskeleton cockpit tablis. IEEE Robotics and Automation Letters, 2020. 3

work page 2020
[46]

Stephen James, Michael Bloesch, and An- drew J. Davison. Task-embedded control net- works for few-shot imitation learning. ArXiv, abs/1810.03237, 2018. 3

work page internal anchor Pith review Pith/arXiv arXiv 2018
[47]

Bc-z: Zero-shot task generalization with robotic imitation learning

Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, and Chelsea Finn. Bc-z: Zero-shot task generalization with robotic imitation learning. In Conference on Robot Learning , 2022. 3

work page 2022
[48]

Robot learning of mobile manipula- tion with reachability behavior priors

Snehal Jauhri, Jan Peters, and Georgia Chal- vatzaki. Robot learning of mobile manipula- tion with reachability behavior priors. IEEE Robotics and Automation Letters , 2022. 3

work page 2022
[49]

Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration

Edward Johns. Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4613– 4619, 2021. 3

work page 2021
[50]

Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration

Edward Johns. Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration. In 2021 IEEE international conference on robotics and automation (ICRA), pages 4613–

work page 2021
[51]

Team ihmc’s lessons learned from the darpa robotics challenge trials

Matthew Johnson, Brandon Shrewsbury, Syl- vain Bertrand, Tingfan Wu, Daniel Du- ran, Marshall Floyd, Peter Abeles, Douglas Stephen, Nathan Mertins, Alex Lesman, et al. Team ihmc’s lessons learned from the darpa robotics challenge trials. Journal of Field Robotics, 2015. 3

work page 2015
[52]

Force strategies for cooperative tasks in multiple mobile manipulation systems

Oussama Khatib, K Yokoi, K Chang, D Ruspini, R Holmberg, A Casal, and A Baader. Force strategies for cooperative tasks in multiple mobile manipulation systems. In Robotics Re- search: The Seventh International Symposium ,

work page
[53]

Whole body motion control framework for ar- bitrarily and simultaneously assigned upper- body tasks and walking motion

Doik Kim, Bum-Jae You, and Sang-Rok Oh. Whole body motion control framework for ar- bitrarily and simultaneously assigned upper- body tasks and walking motion. Modeling, Simulation and Optimization of Bipedal Walk- ing, 2013. 3

work page 2013
[54]

Robot peels banana with goal- conditioned dual-action deep imitation learn- ing

Heecheol Kim, Yoshiyuki Ohmura, and Ya- suo Kuniyoshi. Robot peels banana with goal- conditioned dual-action deep imitation learn- ing. ArXiv, abs/2203.09749, 2022. 3

work page arXiv 2022
[55]

Learning motor primitives for robotics

Jens Kober and Jan Peters. Learning motor primitives for robotics. In 2009 IEEE Interna- tional Conference on Robotics and Automation ,

work page 2009
[56]

The darpa robotics challenge finals: Results and perspectives

Eric Krotkov, Douglas Hackett, Larry Jackel, Michael Perschbacher, James Pippine, Jesse Strauss, Gill Pratt, and Christopher Orlowski. The darpa robotics challenge finals: Results and perspectives. The DARPA Robotics Chal- lenge Finals: Humanoid Robots To The Rescue ,

work page
[57]

Learning latent 14 Mobile ALOHA: https://mobile-aloha.github.io plans from play

Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, and Pierre Sermanet. Learning latent 14 Mobile ALOHA: https://mobile-aloha.github.io plans from play. In Conference on robot learn- ing, pages 1113–1132. PMLR, 2020. 3

work page 2020
[58]

Combining learning-based locomotion policy with model- based manipulation for legged mobile manip- ulators

Yuntao Ma, Farbod Farshidian, Takahiro Miki, Joonho Lee, and Marco Hutter. Combining learning-based locomotion policy with model- based manipulation for legged mobile manip- ulators. IEEE Robotics and Automation Letters ,

work page
[59]

Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Mart’in-Mart’in

Ajay Mandlekar, Danfei Xu, J. Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Mart’in-Mart’in. What matters in learning from offline human demonstrations for robot manipulation. InConference on Robot Learning, 2021. 3

work page 2021
[60]

R3M: A Universal Visual Representation for Robot Manipulation

Suraj Nair, Aravind Rajeswaran, Vikash Ku- mar, Chelsea Finn, and Abhinav Gupta. R3m: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601,

work page internal anchor Pith review arXiv
[61]

Octo: An open-source generalist robot policy

Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Charles Xu, Jian- lan Luo, Tobias Kreiman, You Liang Tan, Dorsa Sadigh, Chelsea Finn, and Sergey Levine. Octo: An open-source generalist robot policy. https://octo-models.github.io, 2023. 3, 5

work page 2023
[62]

Using proba- bilistic movement primitives in robotics

Alexandros Paraschos, Christian Daniel, Jan Peters, and Gerhard Neumann. Using proba- bilistic movement primitives in robotics. Au- tonomous Robots, 42:529–551, 2018. 3

work page 2018
[63]

The surprising ef- fectiveness of representation learning for visual imitation

Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, and Lerrel Pinto. The surprising effectiveness of repre- sentation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021. 3, 5, 8, 9

work page arXiv 2021
[64]

Learning and generaliza- tion of motor skills by learning from demon- stration

Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal. Learning and generaliza- tion of motor skills by learning from demon- stration. 2009 IEEE International Conference on Robotics and Automation , pages 763–768,

work page 2009
[65]

A multimode teleoperation framework for humanoid loco-manipulation: An appli- cation for the icub robot

Luigi Penco, Nicola Scianca, Valerio Modugno, Leonardo Lanari, Giuseppe Oriolo, and Serena Ivaldi. A multimode teleoperation framework for humanoid loco-manipulation: An appli- cation for the icub robot. IEEE Robotics & Automation Magazine, 2019. 3

work page 2019
[66]

Learning of compliant human–robot interaction using full- body haptic interface

Luka Peternel and Jan Babič. Learning of compliant human–robot interaction using full- body haptic interface. Advanced Robotics, 2013. 3

work page 2013
[67]

Pomerleau

Dean A. Pomerleau. Alvinn: An autonomous land vehicle in a neural network. InNIPS, 1988. 1, 3

work page 1988
[68]

Dynamic mobile manipulation via whole-body bilateral teleoperation of a wheeled humanoid

Amartya Purushottam, Yeongtae Jung, Christopher Xu, and Joao Ramos. Dynamic mobile manipulation via whole-body bilateral teleoperation of a wheeled humanoid. arXiv preprint arXiv:2307.01350, 2023. 3

work page arXiv 2023
[69]

Real-world robot learning with masked visual pre-training

Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, and Trevor Dar- rell. Real-world robot learning with masked visual pre-training. CoRL, 2022. 3

work page 2022
[70]

Robot learning with sensorimotor pre- training,

Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, and Jitendra Ma- lik. Robot learning with sensorimotor pre- training. arXiv preprint arXiv:2306.10007, 2023. 9

work page arXiv 2023
[71]

Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration

Rouhollah Rahmatizadeh, Pooya Abol- ghasemi, Ladislau Bölöni, and Sergey Levine. Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration. 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3758–3765, 2017. 3

work page 2018
[72]

Humanoid dy- namic synchronization through whole-body bilateral feedback teleoperation

Joao Ramos and Sangbae Kim. Humanoid dy- namic synchronization through whole-body bilateral feedback teleoperation. IEEE Trans- actions on Robotics, 2018. 3

work page 2018
[73]

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional net- works for biomedical image segmentation. ArXiv, abs/1505.04597, 2015. URL https://api. semanticscholar.org/CorpusID:3719281. 19

work page internal anchor Pith review Pith/arXiv arXiv 2015
[74]

La- tent plans for task-agnostic offline reinforce- ment learning

Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, and Wolfram Burgard. La- tent plans for task-agnostic offline reinforce- ment learning. In Conference on Robot Learn- ing, pages 1838–1849. PMLR, 2023. 3

work page 2023
[75]

Nim- bro avatar: Interactive immersive telepresence with force-feedback telemanipulation

Max Schwarz, Christian Lenz, Andre Rochow, Michael Schreiber, and Sven Behnke. Nim- bro avatar: Interactive immersive telepresence with force-feedback telemanipulation. In 2021 IEEE/RSJ International Conference on Intelli- gent Robots and Systems (IROS) , pages 5312– 5319, 2021. 3

work page 2021
[76]

Deep imitation learning for humanoid loco-manipulation through human teleoperation

Mingyo Seo, Steve Han, Kyutae Sim, Se- ung Hyeon Bang, Carlos Gonzalez, Luis Sentis, and Yuke Zhu. Deep imitation learning for humanoid loco-manipulation through human teleoperation. Humanoids, 2023. 3

work page 2023
[77]

Behavior transformers: Cloning k modes with one stone

Nur Muhammad (Mahi) Shafiullah, Zichen Jeff Cui, Ariuntuya Altanzaya, and Lerrel Pinto. Behavior transformers: Cloning k modes with one stone. ArXiv, abs/2206.11251, 2022. 3 15 Mobile ALOHA: https://mobile-aloha.github.io

work page arXiv 2022
[78]

On bringing robots home

Nur Muhammad Mahi Shafiullah, Anant Rai, Haritheja Etukuru, Yiqian Liu, Ishan Misra, Soumith Chintala, and Lerrel Pinto. On bringing robots home. arXiv preprint arXiv:2311.16098, 2023. 3

work page arXiv 2023
[79]

Gnm: A general navigation model to drive any robot

Dhruv Shah, Ajay Sridhar, Arjun Bhorkar, Noriaki Hirose, and Sergey Levine. Gnm: A general navigation model to drive any robot. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 7226–

work page 2023
[80]

Con- cept2robot: Learning manipulation concepts from instructions and human demonstrations

Lin Shao, Toki Migimatsu, Qiang Zhang, Karen Yang, and Jeannette Bohg. Con- cept2robot: Learning manipulation concepts from instructions and human demonstrations. The International Journal of Robotics Research , 40(12-14):1419–1434, 2021. 3

work page 2021

Showing first 80 references.