pith. machine review for the scientific record. sign in

arxiv: 2401.02117 · v1 · submitted 2024-01-04 · 💻 cs.RO · cs.AI· cs.CV· cs.LG· cs.SY· eess.SY

Recognition: 1 theorem link

Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Authors on Pith no claims yet

Pith reviewed 2026-05-14 21:58 UTC · model grok-4.3

classification 💻 cs.RO cs.AIcs.CVcs.LGcs.SYeess.SY
keywords mobile manipulationbimanual roboticswhole-body teleoperationimitation learningbehavior cloningco-trainingdata collection
0
0 comments X

The pith

Co-training static and mobile demonstration data allows a bimanual robot to reach up to 90 percent success on complex mobile manipulation tasks with only 50 demonstrations each.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Mobile ALOHA, a low-cost teleoperation system that adds a mobile base to the existing ALOHA setup so that human operators can collect whole-body bimanual demonstrations while moving through an environment. Supervised behavior cloning is then performed on the collected trajectories, and the key finding is that mixing these mobile trajectories with existing static ALOHA datasets during training markedly raises success rates. With fifty demonstrations per task the combined training produces autonomous execution of practical mobile tasks such as sauteing and plating shrimp, opening a wall cabinet to store heavy pots, calling and riding an elevator, and rinsing a pan at a sink. A sympathetic reader cares because the result indicates that modest amounts of mobile data, when paired with readily available static data, can unlock useful whole-body behaviors without requiring expensive hardware or thousands of demonstrations.

Core claim

We present Mobile ALOHA, a low-cost whole-body teleoperation system formed by augmenting the ALOHA hardware with a mobile base, and show that supervised behavior cloning on data collected with this system, when co-trained with existing static ALOHA datasets, enables high success rates on bimanual mobile manipulation tasks using only fifty demonstrations per task.

What carries the argument

Mobile ALOHA, the augmented teleoperation platform that supplies whole-body bimanual demonstrations, together with the co-training procedure that mixes these demonstrations with static ALOHA data inside a supervised behavior-cloning objective.

If this is right

  • Success rates on mobile tasks rise by as much as 90 percent when static ALOHA data are included in training.
  • A single robot platform can autonomously perform sequences that combine locomotion and precise bimanual actions such as sauteing shrimp or storing pots in a wall cabinet.
  • Only fifty demonstrations per task suffice once co-training is applied, lowering the data-collection burden for new mobile behaviors.
  • The same low-cost interface supports data collection for both static and mobile versions of a task, allowing reuse of prior datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same co-training pattern could be tested on tasks outside kitchens, such as household cleaning or warehouse pick-and-place, to check whether the performance lift generalizes.
  • Because the mobile base is added without redesigning the arms, existing ALOHA users could retrofit their hardware to collect mobile data at low additional cost.
  • If negative transfer appears on some tasks, selective data mixing or task-specific weighting might be needed to keep the benefit of co-training.

Load-bearing premise

Demonstrations gathered through the low-cost whole-body teleoperation interface are consistent and high-quality enough that behavior cloning on them, even after co-training, produces reliable policies for the tested tasks.

What would settle it

Train separate policies on the same fifty mobile demonstrations without any static co-training data and measure whether success rates on the four kitchen and elevator tasks remain below 20 percent or show no improvement over the co-trained version.

read the original abstract

Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. Project website: https://mobile-aloha.github.io

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Mobile ALOHA, a low-cost whole-body teleoperation system that augments the original ALOHA setup with a mobile base for collecting bimanual mobile manipulation demonstrations. Using supervised behavior cloning, the authors train policies on 50 demonstrations per task and report that co-training with existing static ALOHA datasets raises success rates by up to 90% on four real-world tasks: sautéing and serving shrimp, opening a two-door wall cabinet to store heavy pots, calling and entering an elevator, and rinsing a used pan at a kitchen faucet.

Significance. If the performance attribution to co-training holds under controlled conditions, the result would be significant for mobile manipulation research: it shows that limited mobile-specific data can be effectively augmented by static tabletop datasets to enable whole-body tasks that combine navigation, bimanual dexterity, and force-sensitive actions. The low-cost teleoperation interface and concrete hardware demonstrations on practical kitchen and mobility tasks are practical contributions that could accelerate data collection in this domain.

major comments (2)
  1. [Experiments] Experiments section: the central claim that co-training with static ALOHA data produces up to 90% success-rate gains lacks an ablation that holds total demonstration count fixed while replacing static data with additional mobile demonstrations. Without this control it is impossible to separate the effect of data content from the effect of increased data volume or training steps.
  2. [Experiments] Experiments section: success rates are presented without reported variance (standard deviation across random seeds or evaluation trials), number of evaluation episodes per task, or breakdown of failure modes. This weakens confidence in the reliability of the reported improvements and in the claim that co-training reliably avoids negative transfer.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'up to 90%' is not tied to a specific task or baseline; a brief parenthetical listing the per-task numbers would improve clarity.
  2. [System Overview] Figure captions and text occasionally use inconsistent terminology for the teleoperation interface (e.g., 'whole-body' vs. 'mobile base + arms'); a single defined term would reduce ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to strengthen the experimental presentation while preserving the core contributions on low-cost whole-body teleoperation and practical co-training benefits.

read point-by-point responses
  1. Referee: Experiments section: the central claim that co-training with static ALOHA data produces up to 90% success-rate gains lacks an ablation that holds total demonstration count fixed while replacing static data with additional mobile demonstrations. Without this control it is impossible to separate the effect of data content from the effect of increased data volume or training steps.

    Authors: We agree that an ablation holding total demonstration count fixed would more cleanly isolate the contribution of static data content. Our current setup uses exactly 50 mobile demonstrations per task and augments them with the existing static ALOHA corpus; the practical motivation is that static data requires no extra teleoperation effort or hardware. Collecting equivalent additional mobile demonstrations would demand substantial new data-collection time. In the revision we will add an explicit discussion of this limitation, note that the reported gains reflect a realistic low-effort augmentation scenario, and include a partial control by subsampling the static dataset to match mobile data volume where possible. revision: partial

  2. Referee: Experiments section: success rates are presented without reported variance (standard deviation across random seeds or evaluation trials), number of evaluation episodes per task, or breakdown of failure modes. This weakens confidence in the reliability of the reported improvements and in the claim that co-training reliably avoids negative transfer.

    Authors: We apologize for the omission. Each reported success rate was obtained from 10 evaluation episodes per task across 3 random training seeds. We will revise the Experiments section to report means and standard deviations, explicitly state the number of episodes, and add a failure-mode breakdown (navigation errors, grasping failures, force-control issues, etc.) to substantiate that co-training does not introduce negative transfer. revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical evaluation on real-robot tasks

full rationale

The paper introduces a teleoperation hardware system (Mobile ALOHA) and applies standard supervised behavior cloning to collected demonstrations. Performance claims rest on measured success rates for concrete mobile manipulation tasks rather than any mathematical derivation, prediction, or fitted quantity that reduces to the paper's own inputs by construction. Co-training with prior static ALOHA data is presented as an empirical finding validated by external task completion, with no self-definitional equations, fitted-input predictions, or load-bearing self-citations that collapse the central result. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard imitation-learning assumptions that teleoperated demonstrations are sufficiently expert and that mixing static and mobile data reduces distribution shift. No new physical entities or ad-hoc constants are introduced.

axioms (1)
  • domain assumption Behavior cloning from a modest number of teleoperated demonstrations can generalize to autonomous execution when augmented by co-training on related static tasks.
    This assumption underpins the reported 90% success-rate gains with only 50 demos per task.

pith-pipeline@v0.9.0 · 5514 in / 1333 out tokens · 58871 ms · 2026-05-14T21:58:25.295897+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 24 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

    cs.RO 2026-04 unverdicted novelty 7.0

    A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.

  2. ANCHOR: A Physically Grounded Closed-Loop Framework for Robust Home-Service Mobile Manipulation

    cs.RO 2026-04 conditional novelty 7.0

    ANCHOR raises mobile manipulation success from 53.3% to 71.7% in unseen homes by binding plans to observable geometry, ensuring operable navigation endpoints, and using layered local recovery instead of global replans.

  3. Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment

    cs.RO 2026-04 unverdicted novelty 7.0

    VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with margin...

  4. BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination

    cs.RO 2026-04 conditional novelty 7.0

    BiCoord is a new benchmark for long-horizon tightly coordinated bimanual manipulation that includes quantitative metrics and shows existing policies like DP, RDT, Pi0 and OpenVLA-OFT struggle on such tasks.

  5. Robotic Control via Embodied Chain-of-Thought Reasoning

    cs.RO 2024-07 conditional novelty 7.0

    Training VLAs to perform embodied chain-of-thought reasoning about plans, sub-tasks, motions, and grounded visual features before acting raises OpenVLA success rates by 28% on challenging generalization tasks without ...

  6. Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

    cs.RO 2024-02 conditional novelty 7.0

    UMI enables zero-shot deployment of robot manipulation policies trained solely on portable human demonstrations captured with custom handheld grippers, supporting dynamic bimanual tasks across novel environments and objects.

  7. CUBic: Coordinated Unified Bimanual Perception and Control Framework

    cs.RO 2026-05 unverdicted novelty 6.0

    CUBic learns a shared tokenized representation for bimanual robot perception and control via unidirectional aggregation, bidirectional codebook coordination, and a unified diffusion policy, yielding higher coordinatio...

  8. Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation

    cs.RO 2026-05 unverdicted novelty 6.0

    VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.

  9. BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

    cs.RO 2026-05 unverdicted novelty 6.0

    BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.

  10. LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios

    cs.RO 2026-04 unverdicted novelty 6.0

    LeHome is a simulation platform offering high-fidelity dynamics for robotic manipulation of varied deformable objects in household settings, with support for multiple robot embodiments including low-cost hardware.

  11. WM-DAgger: Enabling Efficient Data Aggregation for Imitation Learning with World Models

    cs.RO 2026-04 unverdicted novelty 6.0

    WM-DAgger uses world models with corrective action synthesis and consistency-guided filtering to aggregate OOD recovery data for imitation learning, reporting 93.3% success in soft bag pushing with five demonstrations.

  12. WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations

    cs.RO 2026-04 unverdicted novelty 6.0

    WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match tele...

  13. From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning

    cs.AI 2026-04 unverdicted novelty 6.0

    EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.

  14. ARM: Advantage Reward Modeling for Long-Horizon Manipulation

    cs.RO 2026-04 unverdicted novelty 6.0

    ARM trains reward models on Progressive/Regressive/Stagnant labels to enable adaptive reweighting in offline RL, reaching 99.4% success on towel-folding with minimal human intervention.

  15. RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

    cs.RO 2025-06 unverdicted novelty 6.0

    RoboTwin 2.0 automates diverse synthetic data creation for dual-arm robots via MLLMs and five-axis domain randomization, leading to 228-367% gains in manipulation success.

  16. RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

    cs.RO 2024-10 conditional novelty 6.0

    RDT-1B is a diffusion foundation model that unifies action spaces across robots and demonstrates superior bimanual manipulation with zero-shot generalization, language following, and few-shot learning on real robots.

  17. Octo: An Open-Source Generalist Robot Policy

    cs.RO 2024-05 unverdicted novelty 6.0

    Octo is an open-source transformer-based generalist robot policy pretrained on 800k trajectories that serves as an effective initialization for finetuning across diverse robotic platforms.

  18. Evaluating Real-World Robot Manipulation Policies in Simulation

    cs.RO 2024-05 conditional novelty 6.0

    SIMPLER simulated environments yield policy performance that correlates strongly with real-world robot manipulation results and captures similar sensitivity to distribution shifts.

  19. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    cs.RO 2024-03 accept novelty 6.0

    DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.

  20. SASI: Leveraging Sub-Action Semantics for Robust Early Action Recognition in Human-Robot Interaction

    cs.RO 2026-04 unverdicted novelty 5.0

    SASI combines skeleton-based graph convolutions with sub-action semantics for improved early action recognition on the BABEL dataset.

  21. StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement

    cs.RO 2026-04 unverdicted novelty 5.0

    StableIDM stabilizes inverse dynamics models under manipulator truncation by combining robot-centric masking, directional spatial feature aggregation, and temporal dynamics refinement, yielding 12.1% higher strict act...

  22. From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments

    cs.AI 2026-03 unverdicted novelty 5.0

    An empirical literature analysis reveals a bifurcation in RL environments into Semantic Prior (LLM-dominated) and Domain-Specific Generalization ecosystems with distinct cognitive fingerprints.

  23. Low-Cost Teleoperation Extension for Mobile Manipulators

    cs.RO 2026-03 unverdicted novelty 5.0

    An open-source teleoperation framework enables intuitive whole-body control of mobile manipulators using commodity smartphone, leader arms, and foot pedals instead of costly VR equipment.

  24. Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

    cs.RO 2026-04 accept novelty 4.0

    A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.

Reference graph

Works this paper leans on

104 extracted references · 104 canonical work pages · cited by 24 Pith papers · 12 internal anchors

  1. [1]

    https://docs.fetchrobotics.com/ teleop.html

    Fetch robot. https://docs.fetchrobotics.com/ teleop.html. 2

  2. [2]

    https://github.com/ hello-robot/stretch_fisheye_web_interface

    Hello robot stretch. https://github.com/ hello-robot/stretch_fisheye_web_interface. 2

  3. [3]

    https://www.trossenrobotics

    Viperx 300 6dof. https://www.trossenrobotics. com/viperx-300-robot-arm.aspx . 3

  4. [4]

    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Her- zog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jau- regui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang,...

  5. [5]

    Human to robot whole-body motion transfer

    Miguel Arduengo, Ana Arduengo, Adrià Colomé, Joan Lobo-Prat, and Carme Torras. Human to robot whole-body motion transfer. In 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids) , 2021. 2, 3

  6. [6]

    What happened at the darpa robotics challenge finals

    Christopher G Atkeson, PW Babu Ben- zun, Nandan Banerjee, Dmitry Berenson, Christoper P Bove, Xiongyi Cui, Mathew De- Donato, Ruixiang Du, Siyuan Feng, Perry Franklin, et al. What happened at the darpa robotics challenge finals. The DARPA robotics challenge finals: Humanoid robots to the rescue . 3

  7. [7]

    Hierarchical neural dynamic policies

    Shikhar Bahl, Abhinav Gupta, and Deepak Pathak. Hierarchical neural dynamic policies. RSS, 2021. 3

  8. [8]

    Human-to-robot imitation in the wild

    Shikhar Bahl, Abhinav Gupta, and Deepak Pathak. Human-to-robot imitation in the wild. arXiv preprint arXiv:2207.09450, 2022. 3

  9. [9]

    A mobile manipulation system for one-shot teaching of complex tasks in homes

    Max Bajracharya, James Borders, Dan Helmick, Thomas Kollar, Michael Laskey, John Leichty, Jeremy Ma, Umashankar Nagarajan, Akiyoshi Ochiai, Josh Petersen, et al. A mobile manipulation system for one-shot teaching of complex tasks in homes. In 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020. 2

  10. [10]

    Roboagent: Towards sample efficient robot manipulation with se- mantic augmentations and action chunking,

    H Bharadhwaj, J Vakil, M Sharma, A Gupta, S Tulsiani, and V Kumar. Roboagent: Towards sample efficient robot manipulation with se- mantic augmentations and action chunking,

  11. [11]

    Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Lau- rens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Sco...

  12. [12]

    RT-1: Robotics Transformer for Real-World Control at Scale

    Anthony Brohan, Noah Brown, Justice Carba- jal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Haus- man, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, 11 Mobile ALOHA: https://mobile-aloha.github.io Ku...

  13. [13]

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Anthony Brohan, Noah Brown, Justice Car- bajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Flo- rence, Chuyuan Fu, Montse Gonzalez Are- nas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alex Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashniko...

  14. [14]

    Humanoid robot teleoperation with vibrotac- tile based balancing feedback

    Anais Brygo, Ioannis Sarakoglou, Nadia Garcia-Hernandez, and Nikolaos Tsagarakis. Humanoid robot teleoperation with vibrotac- tile based balancing feedback. In Haptics: Neu- roscience, Devices, Modeling, and Applications: 9th International Conference, EuroHaptics 2014, Versailles, France, June 24-26, 2014, Proceedings, Part II 9, 2014. 3

  15. [15]

    Humanoid loco-manipulation of pushed carts utilizing virtual reality teleoperation

    Jean Chagas Vaz, Dylan Wallace, and Paul Y Oh. Humanoid loco-manipulation of pushed carts utilizing virtual reality teleoperation. In ASME International Mechanical Engineering Congress and Exposition, 2021. 3

  16. [16]

    in-the-wild

    Annie S Chen, Suraj Nair, and Chelsea Finn. Learning generalizable robotic reward func- tions from" in-the-wild" human videos. arXiv preprint arXiv:2103.16817, 2021. 3

  17. [17]

    Footstep planning for the honda asimo humanoid

    Joel Chestnutt, Manfred Lau, German Cheung, James Kuffner, Jessica Hodgins, and Takeo Kanade. Footstep planning for the honda asimo humanoid. In ICRA, 2005. 2

  18. [18]

    Diffusion policy: Visuomotor policy learning via action diffusion

    Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Pro- ceedings of Robotics: Science and Systems (RSS) ,

  19. [19]

    Team janus hu- manoid avatar: A cybernetic avatar to embody human telepresence

    R Cisneros, M Benallegue, K Kaneko, H Kam- inaga, G Caron, A Tanguy, R Singh, L Sun, A Dallard, C Fournier, et al. Team janus hu- manoid avatar: A cybernetic avatar to embody human telepresence. In Toward Robot A vatars: Perspectives on the ANA A vatar XPRIZE Com- petition, RSS Workshop, 2022. 3

  20. [20]

    Open X-Embodiment Collaboration, Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anikait Singh, Anthony Brohan, Antonin Raffin, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Brian Ichter, Cewu Lu, Charles Xu, Chelsea Finn, Chenfeng Xu, Cheng Chi, Chenguang Huang, ...

  21. [21]

    From play to policy: Conditional behavior generation from uncurated robot data.arXiv preprint arXiv:2210.10047, 2022

    Zichen Jeff Cui, Yibin Wang, Nur Muham- mad Mahi Shafiullah, and Lerrel Pinto. From play to policy: Conditional behavior genera- tion from uncurated robot data. arXiv preprint arXiv:2210.10047, 2022. 3

  22. [22]

    icub3 avatar system

    Stefano Dafarra, Kourosh Darvish, Riccardo Grieco, Gianluca Milani, Ugo Pattacini, Lorenzo Rapetti, Giulio Romualdi, Mattia Salvi, Alessandro Scalzo, Ines Sorrentino, et al. icub3 avatar system. arXiv preprint arXiv:2203.06972, 2022. 3

  23. [23]

    Whole-body geometric retargeting for humanoid robots

    Kourosh Darvish, Yeshasvi Tirupachuri, Giulio Romualdi, Lorenzo Rapetti, Diego Ferigo, Francisco Javier Andrade Chavez, and Daniele Pucci. Whole-body geometric retargeting for humanoid robots. In 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), 2019. 3

  24. [24]

    Model-based inverse reinforcement learning from visual demonstrations

    Neha Das, Sarah Bechtle, Todor Davchev, Di- nesh Jayaraman, Akshara Rai, and Franziska Meier. Model-based inverse reinforcement learning from visual demonstrations. In Con- ference on Robot Learning , pages 1930–1942. PMLR, 2021. 3

  25. [25]

    Transformers for one-shot visual imitation

    Sudeep Dasari and Abhinav Kumar Gupta. Transformers for one-shot visual imitation. In Conference on Robot Learning , 2020. 3

  26. [26]

    Legibility and predictabil- ity of robot motion

    Anca D Dragan, Kenton CT Lee, and Sid- dhartha S Srinivasa. Legibility and predictabil- ity of robot motion. In 2013 8th ACM/IEEE International Conference on Human-Robot In- teraction (HRI), 2013. 3

  27. [27]

    One-Shot Imitation Learning

    Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, P. Abbeel, and Wojciech Zaremba. One-shot imitation learning. ArXiv, abs/1703.07326, 2017. 3

  28. [28]

    Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

    Frederik Ebert, Yanlai Yang, Karl Schmeck- peper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, and Sergey Levine. Bridge data: Boosting generalization of robotic skills with cross-domain datasets. ArXiv, abs/2109.13396, 2021. 3

  29. [29]

    Perceptual Values from Observation

    Ashley D Edwards and Charles L Isbell. Per- ceptual values from observation. arXiv pre- print arXiv:1905.07861, 2019. 3

  30. [30]

    Learning manipulation skills from a single demonstra- tion

    Peter Englert and Marc Toussaint. Learning manipulation skills from a single demonstra- tion. The International Journal of Robotics Re- search, 37(1):137–154, 2018. 3

  31. [31]

    Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot

    Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, Junbo Wang, Haoyi Zhu, and Cewu Lu. Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot. In Towards Generalist Robots: Learn- ing Paradigms for Scalable Skill Acquisition@ CoRL2023, 2023. 3, 5

  32. [32]

    Low-cost exoskeletons for learning whole-arm manipulation in the wild

    Hongjie Fang, Hao-Shu Fang, Yiming Wang, Jieji Ren, Jingjing Chen, Ruo Zhang, Weiming Wang, and Cewu Lu. Low-cost exoskeletons for learning whole-arm manipulation in the wild. arXiv preprint arXiv:2309.14975, 2023. 3

  33. [33]

    Optimization based full body control for the atlas robot

    Siyuan Feng, Eric Whitman, X Xinjilefu, and Christopher G Atkeson. Optimization based full body control for the atlas robot. In Inter- national Conference on Humanoid Robots, 2014. 2

  34. [34]

    One-shot visual imitation learning via meta-learning

    Chelsea Finn, Tianhe Yu, Tianhao Zhang, Pieter Abbeel, and Sergey Levine. One-shot visual imitation learning via meta-learning. In Conference on robot learning , 2017. 3

  35. [35]

    Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian S

    Peter R. Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian S. Wong, Johnny Lee, Igor Mordatch, and Jonathan Tompson. Implicit behavioral cloning. ArXiv, abs/2109.00137, 2021. 3

  36. [36]

    Deep whole-body control: learning a unified policy for manipulation and locomotion

    Zipeng Fu, Xuxin Cheng, and Deepak Pathak. Deep whole-body control: learning a unified policy for manipulation and locomotion. In Conference on Robot Learning , 2022. 3

  37. [37]

    Bootstrap your own latent- a new approach to self-supervised learning

    Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, 13 Mobile ALOHA: https://mobile-aloha.github.io Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Ghesh- laghi Azar, et al. Bootstrap your own latent- a new approach to self-supervised learning. Advances in neural information processing sys- te...

  38. [38]

    Multi-skill mobile manip- ulation for object rearrangement

    Jiayuan Gu, Devendra Singh Chaplot, Hao Su, and Jitendra Malik. Multi-skill mobile manip- ulation for object rearrangement. ICLR, 2023. 3

  39. [39]

    Robot learning in homes: Improving general- ization and reducing dataset bias

    Abhinav Gupta, Adithyavairavan Murali, Dhi- raj Prakashchand Gandhi, and Lerrel Pinto. Robot learning in homes: Improving general- ization and reducing dataset bias. Advances in neural information processing systems , 2018. 3

  40. [40]

    Zhang, Shaoqing Ren, and Jian Sun

    Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR) , pages 770–778, 2015. 19

  41. [41]

    Vision-based ma- nipulators need to also see from their hands

    Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jia- jun Wu, and Chelsea Finn. Vision-based ma- nipulators need to also see from their hands. ArXiv, abs/2203.12677, 2022. URL https://api. semanticscholar.org/CorpusID:247628166. 9

  42. [42]

    Causal policy gradient for whole- body mobile manipulation

    Jiaheng Hu, Peter Stone, and Roberto Martín- Martín. Causal policy gradient for whole- body mobile manipulation. arXiv preprint arXiv:2305.04866, 2023. 3

  43. [43]

    Skill transformer: A monolithic policy for mobile manipulation

    Xiaoyu Huang, Dhruv Batra, Akshara Rai, and Andrew Szot. Skill transformer: A monolithic policy for mobile manipulation. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2023. 3

  44. [44]

    Dynam- ical movement primitives: learning attractor models for motor behaviors

    Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoff- mann, Peter Pastor, and Stefan Schaal. Dynam- ical movement primitives: learning attractor models for motor behaviors. Neural computa- tion, 2013. 3

  45. [45]

    Bilateral humanoid teleoper- ation system using whole-body exoskeleton cockpit tablis

    Yasuhiro Ishiguro, Tasuku Makabe, Yuya Nagamatsu, Yuta Kojio, Kunio Kojima, Fumi- hito Sugai, Yohei Kakiuchi, Kei Okada, and Masayuki Inaba. Bilateral humanoid teleoper- ation system using whole-body exoskeleton cockpit tablis. IEEE Robotics and Automation Letters, 2020. 3

  46. [46]

    Stephen James, Michael Bloesch, and An- drew J. Davison. Task-embedded control net- works for few-shot imitation learning. ArXiv, abs/1810.03237, 2018. 3

  47. [47]

    Bc-z: Zero-shot task generalization with robotic imitation learning

    Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, and Chelsea Finn. Bc-z: Zero-shot task generalization with robotic imitation learning. In Conference on Robot Learning , 2022. 3

  48. [48]

    Robot learning of mobile manipula- tion with reachability behavior priors

    Snehal Jauhri, Jan Peters, and Georgia Chal- vatzaki. Robot learning of mobile manipula- tion with reachability behavior priors. IEEE Robotics and Automation Letters , 2022. 3

  49. [49]

    Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration

    Edward Johns. Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4613– 4619, 2021. 3

  50. [50]

    Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration

    Edward Johns. Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration. In 2021 IEEE international conference on robotics and automation (ICRA), pages 4613–

  51. [51]

    Team ihmc’s lessons learned from the darpa robotics challenge trials

    Matthew Johnson, Brandon Shrewsbury, Syl- vain Bertrand, Tingfan Wu, Daniel Du- ran, Marshall Floyd, Peter Abeles, Douglas Stephen, Nathan Mertins, Alex Lesman, et al. Team ihmc’s lessons learned from the darpa robotics challenge trials. Journal of Field Robotics, 2015. 3

  52. [52]

    Force strategies for cooperative tasks in multiple mobile manipulation systems

    Oussama Khatib, K Yokoi, K Chang, D Ruspini, R Holmberg, A Casal, and A Baader. Force strategies for cooperative tasks in multiple mobile manipulation systems. In Robotics Re- search: The Seventh International Symposium ,

  53. [53]

    Whole body motion control framework for ar- bitrarily and simultaneously assigned upper- body tasks and walking motion

    Doik Kim, Bum-Jae You, and Sang-Rok Oh. Whole body motion control framework for ar- bitrarily and simultaneously assigned upper- body tasks and walking motion. Modeling, Simulation and Optimization of Bipedal Walk- ing, 2013. 3

  54. [54]

    Robot peels banana with goal- conditioned dual-action deep imitation learn- ing

    Heecheol Kim, Yoshiyuki Ohmura, and Ya- suo Kuniyoshi. Robot peels banana with goal- conditioned dual-action deep imitation learn- ing. ArXiv, abs/2203.09749, 2022. 3

  55. [55]

    Learning motor primitives for robotics

    Jens Kober and Jan Peters. Learning motor primitives for robotics. In 2009 IEEE Interna- tional Conference on Robotics and Automation ,

  56. [56]

    The darpa robotics challenge finals: Results and perspectives

    Eric Krotkov, Douglas Hackett, Larry Jackel, Michael Perschbacher, James Pippine, Jesse Strauss, Gill Pratt, and Christopher Orlowski. The darpa robotics challenge finals: Results and perspectives. The DARPA Robotics Chal- lenge Finals: Humanoid Robots To The Rescue ,

  57. [57]

    Learning latent 14 Mobile ALOHA: https://mobile-aloha.github.io plans from play

    Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, and Pierre Sermanet. Learning latent 14 Mobile ALOHA: https://mobile-aloha.github.io plans from play. In Conference on robot learn- ing, pages 1113–1132. PMLR, 2020. 3

  58. [58]

    Combining learning-based locomotion policy with model- based manipulation for legged mobile manip- ulators

    Yuntao Ma, Farbod Farshidian, Takahiro Miki, Joonho Lee, and Marco Hutter. Combining learning-based locomotion policy with model- based manipulation for legged mobile manip- ulators. IEEE Robotics and Automation Letters ,

  59. [59]

    Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Mart’in-Mart’in

    Ajay Mandlekar, Danfei Xu, J. Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Mart’in-Mart’in. What matters in learning from offline human demonstrations for robot manipulation. InConference on Robot Learning, 2021. 3

  60. [60]

    R3M: A Universal Visual Representation for Robot Manipulation

    Suraj Nair, Aravind Rajeswaran, Vikash Ku- mar, Chelsea Finn, and Abhinav Gupta. R3m: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601,

  61. [61]

    Octo: An open-source generalist robot policy

    Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Charles Xu, Jian- lan Luo, Tobias Kreiman, You Liang Tan, Dorsa Sadigh, Chelsea Finn, and Sergey Levine. Octo: An open-source generalist robot policy. https://octo-models.github.io, 2023. 3, 5

  62. [62]

    Using proba- bilistic movement primitives in robotics

    Alexandros Paraschos, Christian Daniel, Jan Peters, and Gerhard Neumann. Using proba- bilistic movement primitives in robotics. Au- tonomous Robots, 42:529–551, 2018. 3

  63. [63]

    The surprising ef- fectiveness of representation learning for visual imitation

    Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, and Lerrel Pinto. The surprising effectiveness of repre- sentation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021. 3, 5, 8, 9

  64. [64]

    Learning and generaliza- tion of motor skills by learning from demon- stration

    Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal. Learning and generaliza- tion of motor skills by learning from demon- stration. 2009 IEEE International Conference on Robotics and Automation , pages 763–768,

  65. [65]

    A multimode teleoperation framework for humanoid loco-manipulation: An appli- cation for the icub robot

    Luigi Penco, Nicola Scianca, Valerio Modugno, Leonardo Lanari, Giuseppe Oriolo, and Serena Ivaldi. A multimode teleoperation framework for humanoid loco-manipulation: An appli- cation for the icub robot. IEEE Robotics & Automation Magazine, 2019. 3

  66. [66]

    Learning of compliant human–robot interaction using full- body haptic interface

    Luka Peternel and Jan Babič. Learning of compliant human–robot interaction using full- body haptic interface. Advanced Robotics, 2013. 3

  67. [67]

    Pomerleau

    Dean A. Pomerleau. Alvinn: An autonomous land vehicle in a neural network. InNIPS, 1988. 1, 3

  68. [68]

    Dynamic mobile manipulation via whole-body bilateral teleoperation of a wheeled humanoid

    Amartya Purushottam, Yeongtae Jung, Christopher Xu, and Joao Ramos. Dynamic mobile manipulation via whole-body bilateral teleoperation of a wheeled humanoid. arXiv preprint arXiv:2307.01350, 2023. 3

  69. [69]

    Real-world robot learning with masked visual pre-training

    Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, and Trevor Dar- rell. Real-world robot learning with masked visual pre-training. CoRL, 2022. 3

  70. [70]

    Robot learning with sensorimotor pre- training,

    Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, and Jitendra Ma- lik. Robot learning with sensorimotor pre- training. arXiv preprint arXiv:2306.10007, 2023. 9

  71. [71]

    Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration

    Rouhollah Rahmatizadeh, Pooya Abol- ghasemi, Ladislau Bölöni, and Sergey Levine. Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration. 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3758–3765, 2017. 3

  72. [72]

    Humanoid dy- namic synchronization through whole-body bilateral feedback teleoperation

    Joao Ramos and Sangbae Kim. Humanoid dy- namic synchronization through whole-body bilateral feedback teleoperation. IEEE Trans- actions on Robotics, 2018. 3

  73. [73]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional net- works for biomedical image segmentation. ArXiv, abs/1505.04597, 2015. URL https://api. semanticscholar.org/CorpusID:3719281. 19

  74. [74]

    La- tent plans for task-agnostic offline reinforce- ment learning

    Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, and Wolfram Burgard. La- tent plans for task-agnostic offline reinforce- ment learning. In Conference on Robot Learn- ing, pages 1838–1849. PMLR, 2023. 3

  75. [75]

    Nim- bro avatar: Interactive immersive telepresence with force-feedback telemanipulation

    Max Schwarz, Christian Lenz, Andre Rochow, Michael Schreiber, and Sven Behnke. Nim- bro avatar: Interactive immersive telepresence with force-feedback telemanipulation. In 2021 IEEE/RSJ International Conference on Intelli- gent Robots and Systems (IROS) , pages 5312– 5319, 2021. 3

  76. [76]

    Deep imitation learning for humanoid loco-manipulation through human teleoperation

    Mingyo Seo, Steve Han, Kyutae Sim, Se- ung Hyeon Bang, Carlos Gonzalez, Luis Sentis, and Yuke Zhu. Deep imitation learning for humanoid loco-manipulation through human teleoperation. Humanoids, 2023. 3

  77. [77]

    Behavior transformers: Cloning k modes with one stone

    Nur Muhammad (Mahi) Shafiullah, Zichen Jeff Cui, Ariuntuya Altanzaya, and Lerrel Pinto. Behavior transformers: Cloning k modes with one stone. ArXiv, abs/2206.11251, 2022. 3 15 Mobile ALOHA: https://mobile-aloha.github.io

  78. [78]

    On bringing robots home

    Nur Muhammad Mahi Shafiullah, Anant Rai, Haritheja Etukuru, Yiqian Liu, Ishan Misra, Soumith Chintala, and Lerrel Pinto. On bringing robots home. arXiv preprint arXiv:2311.16098, 2023. 3

  79. [79]

    Gnm: A general navigation model to drive any robot

    Dhruv Shah, Ajay Sridhar, Arjun Bhorkar, Noriaki Hirose, and Sergey Levine. Gnm: A general navigation model to drive any robot. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 7226–

  80. [80]

    Con- cept2robot: Learning manipulation concepts from instructions and human demonstrations

    Lin Shao, Toki Migimatsu, Qiang Zhang, Karen Yang, and Jeannette Bohg. Con- cept2robot: Learning manipulation concepts from instructions and human demonstrations. The International Journal of Robotics Research , 40(12-14):1419–1434, 2021. 3

Showing first 80 references.