Recognition: 2 Lean theorem links
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Pith reviewed 2026-05-12 23:41 UTC · model grok-4.3
The pith
Large-scale kitchen simulation enables scaling imitation learning for generalist robots using synthetic data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RoboCasa is a large-scale simulation framework for training generalist robots in everyday kitchen environments. It provides thousands of 3D assets across more than 150 object categories, dozens of interactable furniture items and appliances, generative-AI enrichment of assets and textures, and a set of 100 tasks including composite tasks guided by large language models. Combined with high-quality human demonstrations and automated trajectory generation to enlarge datasets, its experiments show clear scaling trends in imitation learning with synthetic data, along with promise for harnessing that data in real-world tasks.
What carries the argument
The RoboCasa simulation framework that provides realistic scenes, diverse assets, tasks, and methods to generate large synthetic robot datasets for imitation learning.
Load-bearing premise
The simulation's physical fidelity, asset diversity, and task coverage are sufficient for policies trained entirely in simulation to transfer meaningfully to real robots without extensive additional real-world data.
What would settle it
A real-world experiment where increasing the volume of synthetic training data produces no corresponding increase in task success rates on physical robots, or where real-world performance stays significantly below simulation performance.
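The scaling-trend half of that test can be made concrete. The sketch below is illustrative only: the `scaling_slope` function and all numbers are hypothetical, not taken from the paper. It fits real-robot success rate against the log of synthetic dataset size and bootstraps a confidence interval for the slope; an interval covering zero would be the null result described above.

```python
import numpy as np

def scaling_slope(dataset_sizes, success_rates, n_boot=2000, seed=0):
    """Fit success ~ a + b*log10(size) and bootstrap a 95% CI for slope b.

    A confidence interval bounded away from zero supports a genuine
    scaling trend; an interval covering zero would be the null result.
    """
    x = np.log10(np.asarray(dataset_sizes, dtype=float))
    y = np.asarray(success_rates, dtype=float)
    b, a = np.polyfit(x, y, 1)  # slope per decade of data, intercept
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample episodes with replacement
        if np.ptp(x[idx]) == 0:      # degenerate resample (one x value), skip
            continue
        slopes.append(np.polyfit(x[idx], y[idx], 1)[0])
    lo, hi = np.percentile(slopes, [2.5, 97.5])
    return b, (lo, hi)

# Illustrative numbers only (NOT from the paper): success rates at
# 100, 300, 1000, 3000 synthetic demonstrations per task.
slope, (lo, hi) = scaling_slope([100, 300, 1000, 3000],
                                [0.22, 0.35, 0.51, 0.63])
print(f"slope per decade of data: {slope:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The same check run on real-robot success rates, rather than simulation scores, is what would distinguish the optimistic from the null outcome.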
Original abstract
Recent advancements in Artificial Intelligence (AI) have largely been propelled by scaling. In Robotics, scaling is hindered by the lack of access to massive robot datasets. We advocate using realistic physical simulation as a means to scale environments, tasks, and datasets for robot learning methods. We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments. RoboCasa features realistic and diverse scenes focusing on kitchen environments. We provide thousands of 3D assets across over 150 object categories and dozens of interactable furniture and appliances. We enrich the realism and diversity of our simulation with generative AI tools, such as object assets from text-to-3D models and environment textures from text-to-image models. We design a set of 100 tasks for systematic evaluation, including composite tasks generated by the guidance of large language models. To facilitate learning, we provide high-quality human demonstrations and integrate automated trajectory generation methods to substantially enlarge our datasets with minimal human burden. Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning and show great promise in harnessing simulation data in real-world tasks. Videos and open-source code are available at https://robocasa.ai/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces RoboCasa, a large-scale physical simulation framework focused on kitchen environments for training generalist robots. It provides thousands of 3D assets across over 150 categories, dozens of interactable furniture and appliances, 100 tasks (including LLM-guided composite tasks), high-quality human demonstrations, and automated trajectory generation methods to scale datasets with minimal human effort. The central empirical claims are a clear scaling trend in large-scale imitation learning from synthetically generated robot data and great promise for harnessing such simulation data in real-world tasks.
Significance. If the reported scaling trends and sim-to-real transfer results hold, RoboCasa could provide a valuable open resource for addressing data scarcity in robotics by enabling scalable synthetic data generation. The integration of generative AI tools for assets and textures, combined with the release of code and videos, supports reproducibility and community use.
major comments (2)
- Abstract: the claim that 'experiments show a clear scaling trend' and 'great promise in harnessing simulation data in real-world tasks' is presented without any quantitative metrics, baselines, error bars, exact data volumes, or real-robot success rates, preventing verification of the central empirical assertions.
- Experiments section (implied by abstract claims): the sim-to-real transfer component is load-bearing for the 'great promise' statement yet lacks supporting details on physical fidelity (contact dynamics, friction, object properties), domain randomization, asset quality from text-to-3D models, or ablation results showing real-robot performance improving with synthetic data scale.
minor comments (3)
- Provide explicit comparisons to existing simulation frameworks (e.g., AI2-THOR, Habitat) in terms of asset diversity, task coverage, and data generation scale to better situate the contribution.
- Expand the description of the 100 tasks and LLM-guided composite task generation with concrete examples and statistics on task complexity.
- Ensure all experimental figures and tables include error bars, statistical tests, and clear axis labels for the scaling curves.
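On the last point, one standard way to put error bars on success rates is a binomial (Wilson score) interval per evaluation setting, treating each rollout as a Bernoulli trial. A minimal sketch, with illustrative counts that are not from the paper:

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score interval for a binomial success rate.

    Suitable for error bars on per-task success rates, where each
    evaluation episode either succeeds or fails.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials
                                   + z**2 / (4 * trials**2))
    return center - half, center + half

# e.g. 31 successes in 50 evaluation episodes (illustrative numbers):
lo, hi = wilson_interval(31, 50)
print(f"success rate 0.62, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The Wilson interval behaves better than the naive normal approximation near 0% and 100% success, which matters for hard tasks evaluated over a few dozen episodes.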
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each of the major comments below and have made revisions to the paper where necessary to improve the clarity and completeness of our empirical claims.
Point-by-point responses
-
Referee: Abstract: the claim that 'experiments show a clear scaling trend' and 'great promise in harnessing simulation data in real-world tasks' is presented without any quantitative metrics, baselines, error bars, exact data volumes, or real-robot success rates, preventing verification of the central empirical assertions.
Authors: We acknowledge the referee's concern regarding the lack of quantitative details in the abstract. To address this, we have revised the abstract to incorporate references to specific quantitative results from our experiments, such as the scaling behavior observed with varying dataset sizes and the success rates in real-world tasks. The full details, including baselines, error bars, exact data volumes, and real-robot performance metrics, are provided in the Experiments section, and the abstract now points to these for verification.
revision: yes
-
Referee: Experiments section (implied by abstract claims): the sim-to-real transfer component is load-bearing for the 'great promise' statement yet lacks supporting details on physical fidelity (contact dynamics, friction, object properties), domain randomization, asset quality from text-to-3D models, or ablation results showing real-robot performance improving with synthetic data scale.
Authors: We agree that additional details on the sim-to-real aspects would strengthen the manuscript. In the revised version, we have added descriptions of the physical fidelity aspects, including the modeling of contact dynamics, friction, and object properties in the simulator. We have also elaborated on the domain randomization strategies employed and the quality assurance for assets generated via text-to-3D models. Furthermore, we include ablation studies that demonstrate the improvement in real-robot performance with increasing scales of synthetic data. These revisions provide the necessary supporting information for the claims.
revision: yes
Circularity Check
No significant circularity in empirical scaling claims
Full rationale
The paper introduces the RoboCasa simulation framework and reports direct empirical results from training imitation learning policies on data generated inside it, including scaling trends with synthetic data volume and some real-robot transfer observations. These are observed outcomes of running the described pipelines rather than any derivation, equation, or fitted parameter that reduces to its own inputs by construction. No self-citations serve as load-bearing justifications for uniqueness or ansatzes, and no mathematical claims are present that would trigger self-definitional or renaming patterns. The work is self-contained as a new tool plus its experimental evaluation.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Physics simulation in the chosen engine produces trajectories sufficiently close to real-world dynamics for policy transfer.
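One way to probe this assumption empirically, as a hedged sketch (the metric and numbers below are illustrative, not the paper's method): replay the same action sequence in simulation and on hardware, then measure per-step state divergence between the two rollouts.

```python
import numpy as np

def trajectory_divergence(sim_states, real_states):
    """Mean per-step Euclidean gap between a simulated rollout and a
    real rollout driven by the same action sequence.

    Both inputs are (T, D) arrays of states; consistently small values
    across many action sequences would support the fidelity assumption.
    """
    sim = np.asarray(sim_states, dtype=float)
    real = np.asarray(real_states, dtype=float)
    if sim.shape != real.shape:
        raise ValueError("rollouts must share the same length and state dim")
    return float(np.linalg.norm(sim - real, axis=1).mean())

# Toy 2D example: the real rollout drifts 0.03 units per step from sim.
T = 10
sim = np.stack([np.linspace(0, 1, T), np.zeros(T)], axis=1)
real = sim + np.array([0.0, 0.03]) * np.arange(T)[:, None]
print(f"mean per-step divergence: {trajectory_divergence(sim, real):.3f}")
```

A divergence that grows quickly with horizon length would bound how far open-loop transfer can be trusted, independent of any policy.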
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relevance: unclear
Matched passage: "Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning and show great promise in harnessing simulation data in real-world tasks."
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relevance: unclear
Matched passage: "We employ generative AI tools to create environment textures and 3D objects... MimicGen data generation"
Forward citations
Cited by 30 Pith papers
-
SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation
SafeManip is a new benchmark that applies LTLf monitors to assess temporal safety properties across eight categories in robotic manipulation, demonstrating that task success frequently fails to ensure safe execution i...
-
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
-
Being-H0.7: A Latent World-Action Model from Egocentric Videos
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
-
DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation
DockAnywhere lifts single demonstrations to diverse docking points via structure-preserving augmentation and point-cloud spatial editing to improve viewpoint generalization in visuomotor policies for mobile manipulation.
-
AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation
AffordSim is the first simulation framework integrating open-vocabulary 3D affordance detection into scalable manipulation data generation, with a 50-task benchmark showing imitation learning succeeds on grasping but ...
-
Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving hi...
-
D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models
D-VLA introduces plane decoupling and a swimlane asynchronous pipeline to achieve high-concurrency RL training and linear scalability for billion- to trillion-parameter vision-language-action models.
-
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models
Pace-and-Path Correction is a closed-form inference-time operator that decomposes a quadratic cost minimization into orthogonal pace compression and path offset channels to correct dynamics-blindness in chunked-action...
-
RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark
RoboMemArena is a new large-scale robotic memory benchmark with real-world tasks, and PrediMem is a dual VLA system that outperforms baselines by managing memory buffers with predictive coding.
-
Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning
Adaptive Q-Chunking selects optimal action chunk sizes at each state via normalized advantage comparisons to outperform fixed chunk sizes in offline-to-online RL on robot benchmarks.
-
How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study
Vision-language models exhibit perceptual fragility and fail to consistently respect privacy constraints when operating in simulated physical environments, with performance declining in cluttered scenes and under conf...
-
How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study
VLMs show consistent deficits in identifying sensitive items in cluttered scenes, adapting to social contexts, and resolving conflicts between commands and privacy constraints in a new physical simulator benchmark.
-
LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios
LeHome is a simulation platform offering high-fidelity dynamics for robotic manipulation of varied deformable objects in household settings, with support for multiple robot embodiments including low-cost hardware.
-
Exploring High-Order Self-Similarity for Video Understanding
The MOSS module learns and combines multi-order space-time self-similarity features to enhance temporal dynamics modeling in videos across action recognition, VQA, and robotic tasks.
-
Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models
State-of-the-art vision-language-action models catastrophically fail dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks...
-
From Seeing to Simulating: Generative High-Fidelity Simulation with Digital Cousins for Generalizable Robot Learning and Evaluation
Digital Cousins is a generative real-to-sim method that creates diverse high-fidelity simulation scenes from real panoramas to improve generalization in robot learning and evaluation.
-
A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies
Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
-
Grounded World Model for Semantically Generalizable Planning
A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.
-
AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation
AffordSim integrates open-vocabulary 3D affordance prediction into simulation trajectory generation to create a 50-task benchmark that reaches 93% of manual annotation success rates and enables 24% average zero-shot s...
-
RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
-
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
Single-stage fine-tuning of a video model to generate actions as latent frames plus future states and values yields state-of-the-art robot policy performance on LIBERO, RoboCasa, and bimanual tasks.
-
Nautilus: From One Prompt to Plug-and-Play Robot Learning
NAUTILUS is a prompt-driven harness that automates plug-and-play adapters, typed contracts, and validation for policies, benchmarks, and robots in learning research.
-
What Will Happen Next: Large Models-Driven Deduction for Emergency Instances
WLDS applies large models with factual and logical calibration to produce diverse text-and-image deductions of emergency scenarios beyond what traditional fixed simulations can generate.
-
EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development
EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.
-
Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.
-
CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment
CoEnv introduces a compositional environment that integrates real and simulated spaces for multi-agent robotic collaboration, using real-to-sim reconstruction, VLM action synthesis, and validated sim-to-real transfer ...
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
-
JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
JoyAI-RA is a multi-source pretrained VLA model that claims to bridge human-to-robot embodiment gaps via data unification and outperforms prior methods on generalization-heavy robotic tasks.
-
World Simulation with Video Foundation Models for Physical AI
Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.
-
Cosmos World Foundation Model Platform for Physical AI
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
Reference graph
Works this paper leans on
-
[1]
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, et al. Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022
-
[2]
RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et al. RT-1: Robotics transformer for real-world control at scale. arXiv preprint arXiv:2212.06817, 2022
-
[3]
Q-transformer: Scalable offline reinforcement learning via autoregressive q-functions
Yevgen Chebotar, Quan Vuong, Karol Hausman, Fei Xia, Yao Lu, Alex Irpan, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, et al. Q-transformer: Scalable offline reinforcement learning via autoregressive q-functions. In Conference on Robot Learning, pages 3909–3928. PMLR, 2023
-
[4]
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. arXiv preprint arXiv:2303.04137, 2023
-
[5]
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration et al. Open X-Embodiment: Robotic learning datasets and RT-X models. https://arxiv.org/abs/2310.08864, 2023
-
[6]
Imitating task and motion planning with visuomotor transformers
Murtaza Dalal, Ajay Mandlekar, Caelan Garrett, Ankur Handa, Ruslan Salakhutdinov, and Dieter Fox. Imitating task and motion planning with visuomotor transformers. arXiv preprint arXiv:2305.16309 , 2023
-
[7]
Robonet: Large-scale multi-robot learning
Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, and Chelsea Finn. Robonet: Large-scale multi-robot learning. In Conference on Robot Learning , 2019
-
[8]
Objaverse: A universe of annotated 3d objects
Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. arXiv preprint arXiv:2212.08051, 2022
-
[9]
Bridge data: Boosting generalization of robotic skills with cross-domain datasets
Frederik Ebert, Yanlai Yang, Karl Schmeckpeper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, and Sergey Levine. Bridge data: Boosting generalization of robotic skills with cross-domain datasets. In Robotics: Science and Systems (RSS), 2022
-
[10]
Maniskill2: A unified benchmark for generalizable manipulation skills
Jiayuan Gu, Fanbo Xiang, Xuanlin Li, Zhan Ling, Xiqiang Liu, Tongzhou Mu, Yihe Tang, Stone Tao, Xinyue Wei, Yunchao Yao, et al. Maniskill2: A unified benchmark for generalizable manipulation skills. arXiv preprint arXiv:2302.04659, 2023
-
[11]
Benchmarking offline reinforcement learning on real-robot hardware
Nico Gürtler, Sebastian Blaes, Pavel Kolev, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Bernhard Schölkopf, and Georg Martius. Benchmarking offline reinforcement learning on real-robot hardware. arXiv preprint arXiv:2307.15690, 2023
-
[12]
Scaling up and distilling down: Language-guided robot skill acquisition
Huy Ha, Pete Florence, and Shuran Song. Scaling up and distilling down: Language-guided robot skill acquisition. In Conference on Robot Learning , pages 3766–3777. PMLR, 2023
-
[13]
A holistic approach to reactive mobile manipulation
Jesse Haviland, Niko Sünderhauf, and Peter Corke. A holistic approach to reactive mobile manipulation. IEEE Robotics and Automation Letters, 7(2):3122–3129, 2022
-
[14]
Denoising diffusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020
-
[15]
Rlbench: The robot learning benchmark & learning environment
Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J Davison. Rlbench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters, 5(2):3019–3026, 2020
-
[16]
Bc-z: Zero-shot task generalization with robotic imitation learning
Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, and Chelsea Finn. Bc-z: Zero-shot task generalization with robotic imitation learning. In Conference on Robot Learning , 2021
-
[17]
Vima: General robot manipulation with multimodal prompts
Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, and Linxi Fan. Vima: General robot manipulation with multimodal prompts. In International Conference on Machine Learning, 2023
-
[18]
Qt-opt: Scalable deep reinforcement learning for vision-based robotic manipulation
Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, et al. Qt-opt: Scalable deep reinforcement learning for vision-based robotic manipulation. arXiv preprint arXiv:1806.10293, 2018
-
[19]
Mt-opt: Continuous multi-task robotic reinforcement learning at scale
Dmitry Kalashnikov, Jacob Varley, Yevgen Chebotar, Benjamin Swanson, Rico Jonschkowski, Chelsea Finn, Sergey Levine, and Karol Hausman. Mt-opt: Continuous multi-task robotic reinforcement learning at scale. arXiv preprint arXiv:2104.08212, 2021
-
[20]
Droid: A large-scale in-the-wild robot manipulation dataset
Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, et al. Droid: A large-scale in-the-wild robot manipulation dataset, 2024
-
[21]
AI2-THOR: An Interactive 3D Environment for Visual AI
Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, et al. AI2-THOR: An interactive 3D environment for visual AI. arXiv preprint arXiv:1712.05474, 2017
-
[22]
A workflow for offline model-free robotic reinforcement learning
Aviral Kumar, Anikait Singh, Stephen Tian, Chelsea Finn, and Sergey Levine. A workflow for offline model-free robotic reinforcement learning. arXiv preprint arXiv:2109.10813, 2021
-
[23]
Pre-training for robots: Offline RL enables learning new tasks from a handful of trials
Aviral Kumar, Anikait Singh, Frederik Ebert, Mitsuhiko Nakamoto, Yanlai Yang, Chelsea Finn, and Sergey Levine. Pre-training for robots: Offline RL enables learning new tasks from a handful of trials. arXiv preprint arXiv:2210.05178, 2022
-
[24]
Learning hand-eye coordination for robotic grasping with large-scale data collection
Sergey Levine, Peter Pastor, Alex Krizhevsky, and Deirdre Quillen. Learning hand-eye coordination for robotic grasping with large-scale data collection. In ISER, pages 173–184, 2016
-
[25]
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643, 2020
-
[26]
igibson 2.0: Object-centric simulation for robot learning of everyday household tasks
Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, et al. igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. arXiv preprint arXiv:2108.03272, 2021
-
[27]
Behavior-1k: A benchmark for embodied ai with 1,000 everyday activities and realistic simulation
Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Michael Lingelbach, Jiankai Sun, et al. Behavior-1k: A benchmark for embodied AI with 1,000 everyday activities and realistic simulation. In Conference on Robot Learning, pages 80–93. PMLR, 2023
-
[28]
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
Bo Liu, Yifeng Zhu, Chongkai Gao, Yihao Feng, Qiang Liu, Yuke Zhu, and Peter Stone. Libero: Benchmarking knowledge transfer for lifelong robot learning. arXiv preprint arXiv:2306.03310, 2023
-
[29]
Roboturk: A crowdsourcing platform for robotic skill learning through imitation
Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, Silvio Savarese, and Li Fei-Fei. Roboturk: A crowdsourcing platform for robotic skill learning through imitation. In Conference on Robot Learning , 2018
-
[30]
Scaling robot supervision to hundreds of hours with roboturk: Robotic manipulation dataset through human reasoning and dexterity
Ajay Mandlekar, Jonathan Booher, Max Spero, Albert Tung, Anchit Gupta, Yuke Zhu, Animesh Garg, Silvio Savarese, and Li Fei-Fei. Scaling robot supervision to hundreds of hours with roboturk: Robotic manipulation dataset through human reasoning and dexterity. arXiv preprint arXiv:1911.04052, 2019
-
[31]
Learning to generalize across long-horizon tasks from human demonstrations
Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Silvio Savarese, and Li Fei-Fei. Learning to generalize across long-horizon tasks from human demonstrations. In Robotics: Science and Systems (RSS), 2020
-
[32]
Human-in-the-loop imitation learning using remote teleoperation
Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, and Silvio Savarese. Human-in-the-loop imitation learning using remote teleoperation,
- [33]
-
[34]
What matters in learning from offline human demonstrations for robot manipulation
Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In Conference on Robot Learning, 2021
-
[35]
Mimicgen: A data generation system for scalable robot learning using human demonstrations
Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan, Yuke Zhu, and Dieter Fox. Mimicgen: A data generation system for scalable robot learning using human demonstrations. arXiv preprint arXiv:2310.17596, 2023
- [36]
-
[37]
Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours
Lerrel Pinto and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In Robotics and Automation (ICRA), 2016 IEEE Int'l Conference on. IEEE, 2016
-
[38]
Alvinn: An autonomous land vehicle in a neural network
Dean A Pomerleau. Alvinn: An autonomous land vehicle in a neural network. In Advances in neural information processing systems, pages 305–313, 1989
-
[39]
High-resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2021
-
[40]
Denoising diffusion implicit models
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models, 2022
-
[41]
Habitat 2.0: Training home assistants to rearrange their habitat
Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, et al. Habitat 2.0: Training home assistants to rearrange their habitat. Advances in Neural Information Processing Systems, 34:251–266, 2021
-
[42]
Large language models as generalizable policies for embodied tasks
Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, and Alexander Toshev. Large language models as generalizable policies for embodied tasks. arXiv preprint arXiv:2310.17722, 2023
-
[43]
Gemini: A family of highly capable multimodal models
Gemini Team. Gemini: A family of highly capable multimodal models, 2024
-
[44]
Attention is all you need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017
-
[45]
Gensim: Generating robotic simulation tasks via large language models
Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, and Xiaolong Wang. Gensim: Generating robotic simulation tasks via large language models. arXiv preprint, 2023
-
[46]
Robogen: Towards unleashing infinite data for automated robot learning via generative simulation
Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, and Chuang Gan. Robogen: Towards unleashing infinite data for automated robot learning via generative simulation. arXiv preprint arXiv:2311.01455, 2023
-
[47]
More than a million ways to be pushed
Kuan-Ting Yu, Maria Bauza, Nima Fazeli, and Alberto Rodriguez. More than a million ways to be pushed: A high-fidelity experimental dataset of planar pushing. In Int'l Conference on Intelligent Robots and Systems, 2016
-
[48]
MuJoCo Menagerie: A collection of high-quality simulation models for MuJoCo
Kevin Zakka, Yuval Tassa, and MuJoCo Menagerie Contributors. MuJoCo Menagerie: A collection of high-quality simulation models for MuJoCo, 2022. URL http://github.com/google-deepmind/mujoco_menagerie
-
[49]
Transporter networks: Rearranging the visual world for robotic manipulation
Andy Zeng, Pete Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Vikas Sindhwani, and Johnny Lee. Transporter networks: Rearranging the visual world for robotic manipulation. In Conference on Robot Learning, 2020
-
[50]
Deep imitation learning for complex manipulation tasks from virtual reality teleoperation
Tianhao Zhang, Zoe McCarthy, Owen Jow, Dennis Lee, Xi Chen, Ken Goldberg, and Pieter Abbeel. Deep imitation learning for complex manipulation tasks from virtual reality teleoperation. In IEEE International Conference on Robotics and Automation (ICRA), 2018
-
[51]
Reinforcement and imitation learning for diverse visuomotor skills
Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas, et al. Reinforcement and imitation learning for diverse visuomotor skills. arXiv preprint arXiv:1802.09564, 2018
-
[52]
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
Yuke Zhu, Josiah Wong, Ajay Mandlekar, and Roberto Martín-Martín. robosuite: A modular simulation framework and benchmark for robot learning. arXiv preprint arXiv:2009.12293, 2020