Building a Scalable, Reproducible, Evaluatable, and Closed-Loop Simulation Environment Foundation for Embodied Intelligence
Pith reviewed 2026-07-02 21:20 UTC · model grok-4.3
The pith
Cloud-native simulation infrastructure unifies data generation, model training, standardized evaluation, and real-world deployment for embodied intelligence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors describe a four-layer cloud-native simulation infrastructure that unifies environment asset provision, automated task generation, trajectory collection, benchmark evaluation, and closed-loop data optimization. Cloud-native elements—elastic resource scheduling, containerized simulation, unified data management, and service-oriented design—enable efficient large-scale operation across multi-model and multi-task workloads. The system integrates representative embodied intelligence setups to demonstrate scalable simulation, dynamic scheduling, visual augmentation, and real-time data filtering, positioning the infrastructure as the core platform for data generation, training, evaluati
What carries the argument
Four-layer architecture built on elastic resource scheduling, containerized simulation, unified data management, and service-oriented design.
If this is right
- Large-scale training and standardized evaluation become feasible without relying on costly real-world robotic data collection.
- Closed-loop data optimization allows simulation outputs to directly improve models in an automated cycle.
- Reproducible benchmarks can be run across different models and tasks on the same platform.
- Integration with specific systems supports dynamic scheduling and real-time data filtering during simulation runs.
- The platform serves as a bridge from simulation-based development to real-world deployment of embodied intelligence.
Where Pith is reading between the lines
- Teams could run thousands of parallel experiments to explore model variations before committing resources to physical hardware.
- Standardized simulation assets might evolve into shared community resources that reduce duplicated effort across research groups.
- The closed-loop feature could be extended to automatically flag simulation-to-reality gaps and trigger targeted data collection in the physical world.
- Adoption might shift evaluation practices toward simulation-first protocols that later validate on hardware only for final confirmation.
Load-bearing premise
Elastic resource scheduling, containerized simulation, unified data management, and service-oriented design will enable efficient large-scale simulation for multi-model and multi-task workloads.
What would settle it
A deployment test showing that repeated identical simulation tasks produce inconsistent trajectories or that the system cannot maintain performance when scaling to hundreds of concurrent multi-task workloads.
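The first half of that test, checking whether repeated identical tasks yield identical trajectories, can be sketched in a few lines. This is a minimal illustration and not the paper's method: `rollout` is a hypothetical stand-in for a real simulator call (here a seeded random walk), and `fingerprint` and `is_reproducible` are names invented for this sketch.

```python
import hashlib
import random
import struct


def rollout(seed: int, steps: int = 50) -> list[tuple[float, float]]:
    """Hypothetical stand-in for a simulator: a seeded 2-D random walk."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    trajectory = []
    for _ in range(steps):
        x += rng.uniform(-1.0, 1.0)
        y += rng.uniform(-1.0, 1.0)
        trajectory.append((x, y))
    return trajectory


def fingerprint(trajectory: list[tuple[float, float]]) -> str:
    """Hash the exact float bit patterns so any divergence changes the digest."""
    digest = hashlib.sha256()
    for x, y in trajectory:
        digest.update(struct.pack("<dd", x, y))
    return digest.hexdigest()


def is_reproducible(seed: int, repeats: int = 3) -> bool:
    """True if every repeat of the same seeded task yields the same fingerprint."""
    fingerprints = {fingerprint(rollout(seed)) for _ in range(repeats)}
    return len(fingerprints) == 1
```

Hashing raw bit patterns is deliberately strict: a real physics engine running on different hardware may need a tolerance-based comparison instead, which is itself a reproducibility decision the platform would have to document.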
Original abstract
This paper presents a cloud-native simulation infrastructure framework for embodied intelligence that supports large-scale training, standardized evaluation, and simulation-based data collection. The framework unifies simulation environment generation, task execution, trajectory collection, model evaluation, data management, and cloud services into a scalable and reproducible platform. To address the high cost, limited scalability, and poor reproducibility of real-world robotic data collection, the framework adopts cloud-native technologies including elastic resource scheduling, containerized simulation, unified data management, and service-oriented system design, enabling efficient large-scale simulation for multi-model and multi-task workloads. Built on a four-layer architecture, the framework provides standardized environment assets, automated task generation, trajectory collection, benchmark evaluation, and closed-loop data optimization. It further integrates representative systems including D-VLA, RL-VLA3, Sword, and Pre-VLA to support scalable simulation, dynamic scheduling, visual augmentation, and real-time data filtering. We argue that cloud-native simulation infrastructure provides a unified foundation for data generation, model training, standardized evaluation, and real-world deployment, and will play a key role in the future development of embodied intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a cloud-native simulation infrastructure framework for embodied intelligence. Built on a four-layer architecture, the framework unifies simulation environment generation, task execution, trajectory collection, model evaluation, data management, and cloud services into a scalable and reproducible platform. It adopts cloud-native technologies such as elastic resource scheduling, containerized simulation, unified data management, and service-oriented design to enable efficient large-scale simulation for multi-model and multi-task workloads. The framework integrates representative systems including D-VLA, RL-VLA3, Sword, and Pre-VLA to support scalable simulation, dynamic scheduling, visual augmentation, and real-time data filtering. The authors argue that this provides a unified foundation for data generation, model training, standardized evaluation, and real-world deployment, and that it will play a key role in the development of embodied intelligence.
Significance. If the described framework delivers on its promises of scalability, reproducibility, and efficiency, it could serve as an important standardized platform for simulation-based research in embodied AI and robotics. This would facilitate larger-scale experiments, better reproducibility across studies, and closed-loop optimization of models. The comprehensive design, spanning environment assets through cloud services, is a strength, as is the integration with existing systems such as D-VLA. Without empirical validation, however, the significance remains prospective.
Major comments (3)
- [Abstract] The claim that the framework enables 'efficient large-scale simulation for multi-model and multi-task workloads' is load-bearing for the paper's contribution but is presented without any supporting metrics, such as simulation throughput, scaling behavior with number of tasks or models, resource utilization rates, or comparisons to non-cloud-native setups.
- [Abstract (four-layer architecture)] The four-layer architecture is central to the framework, but the manuscript provides only high-level descriptions of its layers, without sufficient technical detail on interfaces, data flows, or implementation choices to allow assessment of its claimed advantages in reproducibility and evaluatability.
- [Abstract (integrations)] The integrations with D-VLA, RL-VLA3, Sword, and Pre-VLA are used to illustrate the framework's capabilities, but no specific results or case studies are provided to show how they benefit from or demonstrate the closed-loop aspects or efficiency gains.
Minor comments (1)
- [Abstract] Consider shortening the abstract; it is lengthy and repeats several points about the framework's benefits.
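The throughput and scaling metrics requested in the first major comment could be reported along the following lines. This is a hedged sketch: `ScalingRun` and `scaling_efficiency` are hypothetical names, "workers" is assumed to mean parallel simulation containers, and nothing here reflects measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRun:
    """One benchmark run; all fields are assumptions made for this sketch."""
    workers: int          # parallel simulation containers used
    episodes: int         # episodes completed during the run
    wall_seconds: float   # wall-clock duration of the run

    @property
    def throughput(self) -> float:
        """Episodes completed per wall-clock second."""
        return self.episodes / self.wall_seconds


def scaling_efficiency(baseline: ScalingRun, run: ScalingRun) -> float:
    """Measured speedup divided by ideal linear speedup (1.0 = perfect scaling)."""
    speedup = run.throughput / baseline.throughput
    ideal = run.workers / baseline.workers
    return speedup / ideal
```

Reporting efficiency rather than raw throughput makes runs at different worker counts directly comparable, and a value well below 1.0 points to scheduling or data-management overhead.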
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight areas where the current manuscript could be strengthened with additional detail and evidence. We agree that the claims would benefit from more concrete support and plan revisions to address the points raised. Our responses to each major comment follow.
Point-by-point responses
- Referee: [Abstract] The claim that the framework enables 'efficient large-scale simulation for multi-model and multi-task workloads' is load-bearing for the paper's contribution but is presented without any supporting metrics, such as simulation throughput, scaling behavior with number of tasks or models, resource utilization rates, or comparisons to non-cloud-native setups.
  Authors: We acknowledge that the abstract asserts efficiency gains without accompanying quantitative evidence in the submitted manuscript. The current text describes the architectural mechanisms intended to deliver these gains but does not report throughput numbers, scaling curves, or baseline comparisons. In revision we will (1) moderate the abstract language to 'designed to support efficient large-scale simulation' and (2) add a dedicated evaluation section presenting preliminary scaling results obtained from the deployed system.
  Revision: yes
- Referee: [Abstract (four-layer architecture)] The four-layer architecture is central to the framework, but the manuscript provides only high-level descriptions of its layers, without sufficient technical detail on interfaces, data flows, or implementation choices to allow assessment of its claimed advantages in reproducibility and evaluatability.
  Authors: The manuscript indeed presents the four-layer structure at a conceptual level. To enable readers to evaluate the reproducibility and evaluatability claims, we will expand each layer description with explicit interface specifications (e.g., REST/gRPC endpoints and data schemas), data-flow diagrams, and concrete implementation choices such as the container orchestration platform, versioning strategy for environment assets, and logging mechanisms used for closed-loop evaluation.
  Revision: yes
- Referee: [Abstract (integrations)] The integrations with D-VLA, RL-VLA3, Sword, and Pre-VLA are used to illustrate the framework's capabilities, but no specific results or case studies are provided to show how they benefit from or demonstrate the closed-loop aspects or efficiency gains.
  Authors: We agree that the integrations are referenced illustratively without quantitative demonstration of benefit. In the revised manuscript we will include short case-study subsections for at least two of the integrated systems, reporting concrete metrics (e.g., task throughput before/after integration, data-filtering latency, and closed-loop iteration counts) drawn from our internal deployment logs.
  Revision: yes
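To make the promised data schemas and asset versioning concrete, a trajectory record might carry its own provenance, roughly as below. This is purely illustrative: the class `TrajectoryRecord` and every field name are assumptions invented for this sketch, not the authors' actual schema or endpoints.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TrajectoryRecord:
    """Hypothetical schema: one collected trajectory plus the provenance
    needed to re-run it (asset version, seed, policy)."""
    task_id: str
    env_asset_version: str   # version tag of the environment asset used
    seed: int                # simulator seed for the episode
    policy_name: str
    steps: list[dict] = field(default_factory=list)  # per-step observations/actions
    success: bool = False

    def to_json(self) -> str:
        """Serialize with sorted keys so identical records give identical text."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "TrajectoryRecord":
        """Rebuild a record from its JSON form."""
        return cls(**json.loads(payload))
```

Storing the asset version and seed alongside each trajectory is what would let an evaluator replay a logged episode and check it against the stored result.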
Circularity Check
No circularity; descriptive system-design paper with no derivations or fitted quantities
Full rationale
The paper describes a proposed cloud-native simulation framework, its four-layer architecture, and example integrations (D-VLA, RL-VLA3, Sword, Pre-VLA). It states design choices (elastic scheduling, containerization, unified data management) and argues they enable scalable simulation, but offers no equations, first-principles derivations, predictions of quantities, or fitted parameters. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify results. Claims remain at the level of architectural description rather than any reduction of outputs to inputs by construction. This is the expected non-finding for infrastructure papers.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Cloud-native technologies including elastic resource scheduling, containerized simulation, unified data management, and service-oriented system design enable efficient large-scale simulation for multi-model and multi-task workloads.
Invented entities (2)
- Four-layer architecture: no independent evidence
- D-VLA, RL-VLA3, Sword, Pre-VLA integrations: no independent evidence