MagicSim: A Unified Infrastructure for Executable Embodied Interaction

Chuye Hong; Guo Ye; Han Liu; Haoran Lu; Jianshu Zhang; Jiayi Wang; Jihai Zhao; Maojiang Su; Mutian Shen; Ruihai Wu

arxiv: 2606.17511 · v1 · pith:6V7W2EJKnew · submitted 2026-06-16 · 💻 cs.RO · cs.AI · cs.CV

Haoran Lu, Songling Liu, Yue Chen, Guo Ye, Mutian Shen, Shuyang Yu, Yu Xiao, Jihai Zhao, Shang Wu, Jianshu Zhang, Xiangtian Gui, Chuye Hong, Yuran Wang, Maojiang Su, Jiayi Wang, Ruihai Wu, Zhaoran Wang, Han Liu

Pith reviewed 2026-06-27 00:58 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.CV

keywords embodied simulation · robot learning · unified runtime · Markov decision process · automatic trajectory generation · planner-in-the-loop · YAML world specification · multimodal data collection

The pith

MagicSim unifies world construction, embodied execution, evaluation, rollout generation, and agent interaction in one deterministic runtime.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MagicSim to address how robot learning simulations are currently split across disconnected layers that use magic actions or forward-only renders. It builds everything around one deterministic batched runtime and a shared Markov decision process. YAML specifications define contents, placement, behavior, and agent exposure separately, then the system constructs executable worlds that support multiple task families and embodiments in a single reset-and-step loop. High-level commands are grounded through skills and planners into actual robot actions rather than simulator edits. The same task definition then enables benchmarking, automatic trajectory collection, and interactive interfaces while saving structured multimodal data from successful episodes.
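To make the four-way decoupling concrete, here is a minimal sketch of a world specification split into contents, placement, behavior, and agent exposure. The abstract does not show MagicSim's actual schema, so every name here (`WorldSpec`, `build_world`, the field keys, the example objects) is a hypothetical stand-in, and plain Python dictionaries take the place of parsed YAML.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WorldSpec:
    """Hypothetical spec with the four decoupled sections named in the paper."""
    contents: dict   # what exists: robots, objects
    placement: dict  # where each item goes
    behavior: dict   # task logic attached to the world
    exposure: dict   # what an agent may observe and command


def build_world(spec: WorldSpec, num_envs: int) -> list[dict]:
    """Expand one spec into num_envs independent environment records."""
    missing = [name for name in spec.contents if name not in spec.placement]
    if missing:
        raise ValueError(f"no placement for: {missing}")
    return [
        {
            "env_id": i,
            "objects": dict(spec.placement),
            "task": spec.behavior["task"],
            "observations": list(spec.exposure["observations"]),
        }
        for i in range(num_envs)
    ]


spec = WorldSpec(
    contents={"arm": "robot", "mug": "rigid_object"},
    placement={"arm": (0.0, 0.0, 0.0), "mug": (0.5, 0.1, 0.8)},
    behavior={"task": "pick_mug"},
    exposure={"observations": ["rgb", "joint_pos"], "commands": ["pick", "place"]},
)
worlds = build_world(spec, num_envs=4)
```

Because each section is a separate field, swapping the `placement` block changes the layout without touching the task or the agent-facing interface, which is the property the paper attributes to its YAML-first design.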

Core claim

MagicSim constructs diverse executable worlds from YAML-first specifications and realizes high-level commands as robot actions inside one deterministic batched runtime and shared MDP. A common execution interface routes commands through controllers, atomic skills, planner primitives, and asynchronous planning. One task definition supports benchmark and RL evaluation, an autocollect interface that turns commands into grounded trajectories, and agent or VLM-facing interaction. Commands advance through a Command->Skill->Planner->Robot->Record pipeline while per-environment states progress independently above the shared physics tick, and successful rollouts are recorded as structured multimodal trajectories.

What carries the argument

The deterministic batched runtime and shared MDP that executes a Command->Skill->Planner->Robot->Record pipeline, grounding high-level commands as robot actions rather than direct state edits.
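The difference between grounding a command as robot actions and editing simulator state directly can be illustrated with a toy one-dimensional robot. This is a sketch under stated assumptions: `Robot`, `skill_for`, `plan`, `execute`, and the command names are all hypothetical and are not taken from MagicSim.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Robot:
    position: float = 0.0

    def apply(self, action: float) -> None:
        # State changes only through actions, never by assigning a target pose.
        self.position += action


def skill_for(command: str) -> float:
    """Skill layer: map a command to a goal (hypothetical 1-D goal table)."""
    return {"reach_shelf": 1.0, "return_home": 0.0}[command]


def plan(start: float, goal: float, max_step: float = 0.25) -> list[float]:
    """Planner layer: split the motion into bounded per-tick actions."""
    actions, pos = [], start
    while abs(goal - pos) > 1e-9:
        step = max(-max_step, min(max_step, goal - pos))
        actions.append(step)
        pos += step
    return actions


def execute(command: str, robot: Robot, record: list[dict]) -> bool:
    """Run Command -> Skill -> Planner -> Robot -> Record for one command."""
    goal = skill_for(command)
    for action in plan(robot.position, goal):
        robot.apply(action)
        record.append(
            {"command": command, "action": action, "position": robot.position}
        )
    return abs(robot.position - goal) < 1e-6


robot, record = Robot(), []
ok = execute("reach_shelf", robot, record)
```

Every entry in `record` pairs the language-level command with the low-level action that was actually executed, which is the kind of alignment the paper describes for its saved trajectories.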

If this is right

One task definition supports three distinct uses: benchmark evaluation, automatic rollout collection, and interactive agent interfaces.
Commands are turned into grounded robot trajectories that align language supervision with action, visual, and task status representations.
Per-environment command, skill, planning, retry, annotation, and episode states advance independently above the shared physics tick.
Successful episodes are saved as structured multimodal trajectories for downstream training or analysis.
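The idea of per-environment states advancing independently above a shared tick can be sketched as a small state machine per environment, all driven by one loop. The abstract gives no implementation details, so `EnvState`, `advance`, `run`, and the tick counts below are hypothetical, and the shared physics step is represented only by a comment.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EnvState:
    commands: list                       # pending (name, ticks_needed) pairs
    active: Optional[str] = None         # command currently executing
    ticks_left: int = 0
    completed: list = field(default_factory=list)

    def advance(self) -> None:
        """Advance this environment's command bookkeeping by one tick."""
        if self.active is None and self.commands:
            self.active, self.ticks_left = self.commands.pop(0)
        if self.active is not None:
            self.ticks_left -= 1
            if self.ticks_left <= 0:
                self.completed.append(self.active)
                self.active = None


def run(envs: list[EnvState], num_ticks: int) -> None:
    for _ in range(num_ticks):
        # A real runtime would step physics once for the whole batch here.
        for env in envs:
            env.advance()


envs = [
    EnvState([("pick", 2), ("place", 3)]),
    EnvState([("open_drawer", 4)]),
]
run(envs, num_ticks=4)
```

After four shared ticks the first environment has finished `pick` and is partway through `place`, while the second has just finished `open_drawer`: both consumed the same number of ticks but sit at different points in their own command sequences.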

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The unified loop could simplify scaling of language-conditioned robot policies by removing the need to maintain separate collection and evaluation codebases.
It might enable tighter closed-loop testing of planner primitives directly inside the same environment used for data generation.
Future work could test whether adding new sensor models or physics variants requires changes only to the YAML layer or also to the core execution loop.

Load-bearing premise

A single deterministic batched runtime and shared MDP can support all diverse task families, interaction regimes, physics, sensors, and embodiments without significant trade-offs in performance or fidelity.

What would settle it

A head-to-head test on a complex multi-embodiment task where MagicSim produces measurably lower physics fidelity or slower per-step throughput than a specialized simulator built only for that task family.
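The throughput half of such a test could be measured with a harness along these lines. This is only a sketch: `step_batch` is a toy stand-in for a simulator step, `measure_throughput` is a hypothetical helper, and the batch sizes are arbitrary. A real comparison would call each simulator's own step function inside the timed loop.

```python
from __future__ import annotations

import time


def step_batch(states: list[float]) -> list[float]:
    """Toy batched step; a real runtime would advance physics here."""
    return [0.99 * s + 0.01 for s in states]


def measure_throughput(num_envs: int, num_steps: int = 200) -> float:
    """Return environment-steps per second for one batch size."""
    states = [0.0] * num_envs
    start = time.perf_counter()
    for _ in range(num_steps):
        states = step_batch(states)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return num_envs * num_steps / max(elapsed, 1e-9)


results = {n: measure_throughput(n) for n in (1, 16, 256)}
```

Reporting environment-steps per second across several batch sizes, rather than a single number, shows whether batching actually scales for the task family under test.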

Original abstract

Robot learning and embodied agents now require simulation to serve as a shared execution substrate linking control, skills, and planning, not only as a renderer, controller testbed, or fixed task environment. Existing pipelines split these layers with "magic" actions, disconnected training environments, or forward-only renders that cannot reproduce, evaluate, and annotate the same episode. We present MagicSim, an embodied interaction infrastructure built around one deterministic batched runtime and a shared Markov decision process (MDP). From YAML-first specifications that decouple contents, placement, behavior, and agent exposure, MagicSim constructs diverse executable worlds spanning task families, interaction regimes, physics, layouts, sensors, avatars, and robot embodiments in one reset-and-step loop. A common execution interface grounds high-level commands through controllers, atomic skills, planner primitives, and asynchronous planning, realizing them as robot actions rather than simulator-side state edits. One task definition supports three capabilities: benchmark and RL evaluation, an autocollect interface that automatically turns commands into grounded trajectories, and agent/VLM-facing interaction. For automatic execution, commands flow through a Command->Skill->Planner->Robot->Record pipeline, while per-environment command, skill, planning, retry, annotation, and episode states advance independently above the shared physics tick. Successful rollouts are saved as structured multimodal trajectories aligning language supervision, action representations, visual/geometric representations, and task-level status with the executed episode. MagicSim thus unifies diverse world construction, embodied execution, task evaluation, automatic rollout generation, and interactive agent interfaces in one planner-in-the-loop runtime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MagicSim describes a unified batched runtime and command pipeline for embodied AI but supplies no data on whether the shared substrate actually avoids fidelity or speed trade-offs.

Two things matter most. First, the paper presents MagicSim as a single deterministic batched runtime on a shared MDP that ties world construction, execution, evaluation, and data collection together through YAML specs and a Command->Skill->Planner->Robot->Record flow. Second, it offers no experimental results to test any of those claims.

What stands out as new is the explicit unification: one task definition drives benchmark evaluation, automatic rollout generation, and agent-facing interaction, with per-environment states handling asynchrony above the shared physics tick. The YAML decoupling of contents, placement, behavior, and exposure, plus the insistence on grounding commands in robot actions rather than simulator edits, gives a cleaner separation than many existing pipelines.

The paper does a reasonable job spelling out how this architecture could reduce the usual splits between training environments and renderers. The design choices around deterministic execution and structured multimodal trajectory output look practical for reproducibility in robot learning work.
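One way a reader could check the determinism that this reproducibility argument relies on is to replay an episode from the same seed and compare trajectory fingerprints. The sketch below uses a toy seeded rollout in place of a simulator; `rollout` and `fingerprint` are hypothetical names, and nothing here reflects how MagicSim itself verifies determinism.

```python
from __future__ import annotations

import hashlib
import random


def rollout(seed: int, steps: int = 50) -> list[float]:
    """Toy stand-in for an episode whose only randomness is a seeded RNG."""
    rng = random.Random(seed)
    state, trace = 0.0, []
    for _ in range(steps):
        state = 0.9 * state + rng.uniform(-1.0, 1.0)
        trace.append(state)
    return trace


def fingerprint(trace: list[float]) -> str:
    """Hash the exact bit pattern of every state, not a rounded value."""
    payload = ",".join(x.hex() for x in trace).encode()
    return hashlib.sha256(payload).hexdigest()


same = fingerprint(rollout(7)) == fingerprint(rollout(7))
different = fingerprint(rollout(7)) != fingerprint(rollout(8))
```

Hashing `float.hex()` output makes the comparison bit-exact, so even a last-digit divergence between two replays would change the fingerprint.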

The soft spot is the total lack of validation. There are no throughput figures, no fidelity comparisons to tools like Isaac or MuJoCo, no ablations on contact-rich physics or heterogeneous sensors, and nothing checking whether the shared MDP incurs costs across embodiments. The stress-test concern about unmeasured trade-offs therefore stands on the evidence given.

This is for groups already building integrated embodied pipelines who want a common substrate. Readers needing proven performance numbers or new algorithms will not find them.

I would send it to peer review. The integration problem is real and the proposed structure is coherent, but any serious review will require experiments before the central claims can be assessed.

Referee Report

1 major / 1 minor

Summary. The paper presents MagicSim, a unified infrastructure for embodied interaction in robotics. It is built around one deterministic batched runtime and a shared Markov decision process (MDP). From YAML-first specifications that decouple contents, placement, behavior, and agent exposure, the system constructs diverse executable worlds spanning task families, interaction regimes, physics, sensors, avatars, and robot embodiments. A common Command->Skill->Planner->Robot execution interface grounds high-level commands as robot actions. One task definition supports benchmark/RL evaluation, automatic rollout generation via autocollect, and interactive agent/VLM interfaces, with per-environment states advancing independently above the shared physics tick. Successful rollouts are saved as structured multimodal trajectories. The paper claims this unifies world construction, embodied execution, task evaluation, automatic rollout generation, and interactive interfaces in one planner-in-the-loop runtime.

Significance. If the system performs as described without the hypothesized fidelity or throughput trade-offs, MagicSim would offer a meaningful contribution to robot learning by replacing fragmented simulation pipelines with a single shared substrate that consistently links control, skills, planning, evaluation, and data collection across heterogeneous tasks and embodiments.

major comments (1)

[Abstract] Abstract and overall manuscript: the central claim that one deterministic batched runtime and shared MDP can instantiate and execute worlds spanning diverse task families, physics, sensors, and embodiments without significant performance or fidelity trade-offs is load-bearing for the contribution, yet the manuscript supplies no implementation details, throughput measurements, error rates, fidelity comparisons, or ablation studies to support it.

minor comments (1)

The description of independent per-environment states advancing above the shared tick would benefit from a diagram or pseudocode to clarify the separation between command/skill/planning layers and the physics tick.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for identifying the load-bearing nature of our central claim and the absence of supporting empirical evidence. We agree this requires strengthening and will revise the manuscript to include the requested details.

Referee: [Abstract] Abstract and overall manuscript: the central claim that one deterministic batched runtime and shared MDP can instantiate and execute worlds spanning diverse task families, physics, sensors, and embodiments without significant performance or fidelity trade-offs is load-bearing for the contribution, yet the manuscript supplies no implementation details, throughput measurements, error rates, fidelity comparisons, or ablation studies to support it.

Authors: We agree that the claim is central and that the current manuscript does not provide the requested quantitative support. The manuscript emphasizes the architectural unification via the YAML-first specifications, shared MDP, and Command->Skill->Planner->Robot pipeline but lacks implementation specifics on the batched runtime, performance metrics, or comparisons. In revision we will add: (1) detailed implementation of the deterministic batched runtime and per-environment state advancement; (2) throughput measurements (steps/sec across environment counts and task types); (3) error rates for rollout generation and task success; (4) fidelity comparisons against standard simulators for physics, sensors, and embodiments; and (5) ablations isolating the effects of batching and the shared MDP. These additions will directly address whether significant trade-offs exist.

Revision: yes

Circularity Check

0 steps flagged

No circularity: system-description paper with no derivations, predictions, or load-bearing equations

The manuscript is an infrastructure/system paper whose central claim is the existence and unification of a deterministic batched runtime + shared MDP that supports diverse embodied tasks. No equations, fitted parameters, predictions, or derivation chain appear in the abstract or described full text. The architecture (YAML decoupling, Command->Skill->Planner->Robot pipeline, per-env state above shared tick) is presented descriptively; success is not claimed via reduction to prior self-defined quantities or self-citations. The reader's assessment of score 1.0 is consistent with the absence of any of the enumerated circularity patterns. The paper is self-contained against external benchmarks in the sense that its claims are architectural assertions open to empirical validation outside any internal fit.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations or data-fitting steps are described; the contribution is a software infrastructure rather than a parameterized model.

pith-pipeline@v0.9.1-grok · 5878 in / 1200 out tokens · 56129 ms · 2026-06-27T00:58:09.721798+00:00 · methodology

Reference graph

Works this paper leans on

123 extracted references · 8 canonical work pages

[1]

Pi-0.7: A steerable generalist robotic foundation model with emergent capabilities

Physical Intelligence. Pi-0.7: A steerable generalist robotic foundation model with emergent capabilities. arXiv preprint, 2026. CorpusID: 287607456

2026
[2]

Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Manuel Y. Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch...

Pith/arXiv arXiv
[3]

URLhttps://api.semanticscholar.org/CorpusID:277993634
[4]

Gen-0: Embodied foundation models that scale with physical interaction

Generalist AI Team. Gen-0: Embodied foundation models that scale with physical interaction. Generalist AI Blog, 2025. November 4, 2025

2025
[5]

Gr00t n1: An open foundation model for generalist humanoid robots

NVIDIA. Gr00t n1: An open foundation model for generalist humanoid robots. arXiv:2503.14734, 2025

Pith/arXiv arXiv 2025
[6]

World action models are zero-shot policies

Seonghyeon Ye, Yunhao Ge, Kaiyuan Zheng, and Joel Jang. World action models are zero-shot policies. arXiv:2602.15922, 2026. 53

Pith/arXiv arXiv 2026
[7]

Fast-wam: Do world action models need test-time future imagination?, 2026

Tianyuan Yuan, Zibin Dong, Yicheng Liu, and Hang Zhao. Fast-wam: Do world action models need test-time future imagination?, 2026. URLhttps://arxiv.org/abs/2603.16666

Pith/arXiv arXiv 2026
[8]

Learning to feel the future: Dreamtacvla for contact-rich manipulation.ArXiv, abs/2512.23864, 2025

Guo Ye, Zexi Zhang, Xu Zhao, Shang Wu, Haoran Lu, Shihan Lu, and Han Liu. Learning to feel the future: Dreamtacvla for contact-rich manipulation.ArXiv, abs/2512.23864, 2025. URLhttps://api.semanticscholar. org/CorpusID:284350273

Pith/arXiv arXiv 2025
[9]

Vagen: Reinforcing world model reasoning for multi-turn vlm agents.ArXiv, abs/2510.16907, 2025

Kangrui Wang, Pingyue Zhang, Zihan Wang, Yaning Gao, Linjie Li, Qineng Wang, Hanyang Chen, Chi Wan, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Fei-Fei Li, Yejin Choi, and Manling Li. Vagen: Reinforcing world model reasoning for multi-turn vlm agents.ArXiv, abs/2510.16907, 2025. URL https://api.semanticscholar.org/CorpusID:282210682

arXiv 2025
[10]

Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Fei-Fei Li, Lijuan Wang, Yejin Choi, and Manling Li

Zihan Wang, Kangrui Wang, Qineng Wang, Pingyue Zhang, Linjie Li, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Monica S. Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Fei-Fei Li, Lijuan Wang, Yejin Choi, and Manling Li. Ragen: Understanding self-evolution in llm agents via multi-turn reinforcement learning.ArXiv, abs/2504.20073, 2025....

Pith/arXiv arXiv 2025
[11]

Embodied ai agents: Modeling the world.ArXiv, abs/2506.22355, 2025

Pascale Fung, Yoram Bachrach, Asli Celikyilmaz, Kamalika Chaudhuri, Delong Chen, Willy Chung, Emmanuel Dupoux, Hervé Jégou, Alessandro Lazaric, Arjun Majumdar, Andrea Madotto, Franziska Meier, Florian Metze, Théo Moutakanni, Juan Pino, Basile Terver, Joseph Tighe, and Jitendra Malik. Embodied ai agents: Modeling the world.ArXiv, abs/2506.22355, 2025. URLh...

arXiv 2025
[12]

MuJoCo: A physics engine for model-based control

Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. InIEEE/RSJ International Conference on Intelligent Robots and Systems, 2012

2012
[13]

Isaac Gym: High performance GPU-based physics simulation for robot learning.arXiv preprint arXiv:2108.10470, 2021

Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, and Gavriel State. Isaac Gym: High performance GPU-based physics simulation for robot learning.arXiv preprint arXiv:2108.10470, 2021

Pith/arXiv arXiv 2021
[14]

Chang, Leonidas J

Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel X. Chang, Leonidas J. Guibas, and Hao Su. SAPIEN: A simulated part-based interactive environment. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2020
[15]

Learning part-aware dense 3d feature field for generalizable articulated object manipulation, 2026

Yue Chen, Muqing Jiang, Kaifeng Zheng, Jiaqi Liang, Chenrui Tie, Haoran Lu, Ruihai Wu, and Hao Dong. Learning part-aware dense 3d feature field for generalizable articulated object manipulation, 2026. URL https://arxiv.org/abs/2602.14193

arXiv 2026
[17]

Broadcasting support relations recursively from local dynamics for object retrieval in clutters.ArXiv, abs/2406.02283, 2024

Yitong Li, Ruihai Wu, Haoran Lu, Chuanruo Ning, Yan Shen, Guanqi Zhan, and Hao Dong. Broadcasting support relations recursively from local dynamics for object retrieval in clutters.ArXiv, abs/2406.02283, 2024. URLhttps://api.semanticscholar.org/CorpusID:270226492

arXiv 2024
[18]

Neural dynamics augmented diffusion policy

Ruihai Wu, Haozhe Chen, Mingtong Zhang, Haoran Lu, Yitong Li, and Yunzhu Li. Neural dynamics augmented diffusion policy. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 13234–13241,
[19]

doi: 10.1109/ICRA55743.2025.11128651

work page doi:10.1109/icra55743.2025.11128651 2025
[20]

Garmentlab: A unified simulation and benchmark for garment manipula- tion

Haoran Lu, Ruihai Wu, Yitong Li, Sijie Li, Ziyu Zhu, Chuanruo Ning, Yan Shen, Longzan Luo, Yuan- pei Chen, and Hao Dong. Garmentlab: A unified simulation and benchmark for garment manipula- tion. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, page...

work page doi:10.52202/079017-0379 2024
[21]

Unigarment: A unified simulation and benchmark for garment manipulation, 2025

Haoran Lu, Yitong Li, Ruihai Wu, Chuanruo Ning, Yan Shen, and Hao Dong. Unigarment: A unified simulation and benchmark for garment manipulation, 2025. URLhttps://api.semanticscholar.org/CorpusID:275782214. Manuscript

2025
[22]

Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning.arXiv preprint arXiv:2504.18904, 2025

Haoran Geng, Feishi Wang, Songlin Wei, Yuyang Li, Bangjun Wang, Boshi An, Charlie Tianyue Cheng, Haozhe Lou, Peihao Li, Yen-Jen Wang, et al. Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning.arXiv preprint arXiv:2504.18904, 2025

arXiv 2025
[23]

Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning.arXiv preprint arXiv:2511.04831, 2025

Mayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, et al. Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning.arXiv preprint arXiv:2511.04831, 2025. 54

Pith/arXiv arXiv 2025
[24]

Maniskill2: A unified benchmark for generalizable manipulation skills.arXiv preprint arXiv:2302.04659, 2023

Jiayuan Gu, Fanbo Xiang, Xuanlin Li, Zhan Ling, Xiqiang Liu, Tongzhou Mu, Yihe Tang, Stone Tao, et al. Maniskill2: A unified benchmark for generalizable manipulation skills.arXiv preprint arXiv:2302.04659, 2023

arXiv 2023
[25]

Habitat: A platform for embodied ai research.arXiv preprint arXiv:1904.01201, 2019

Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, and Dhruv Batra. Habitat: A platform for embodied ai research.arXiv preprint arXiv:1904.01201, 2019

arXiv 1904
[26]

Tchapmi, Micael E

Bokui Shen, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi Fan, Guanzhi Wang, Claudia Pérez-D’Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, and Silvio Savarese. igibson 1.0: A simulation environment for interactive tasks in large realistic scenes.arXiv preprint arXiv:2012.02924, 2020

arXiv 2012
[27]

Karen Liu, Jiajun Wu, and Li Fei-Fei

Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Wensi Ai, Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari, Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R ...

Pith/arXiv arXiv 2024
[28]

Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J. Davison. RLBench: The robot learning benchmark & learning environment.arXiv preprint arXiv:1909.12271, 2019

arXiv 1909
[29]

CALVIN: A benchmark for language- conditioned policy learning for long-horizon robot manipulation tasks.arXiv preprint arXiv:2112.03227, 2021

Oier Mees, Lukas Hermann, Erick Rosete-Beas, and Wolfram Burgard. CALVIN: A benchmark for language- conditioned policy learning for long-horizon robot manipulation tasks.arXiv preprint arXiv:2112.03227, 2021

arXiv 2021
[30]

RoboCasa: Large-scale simulation of everyday tasks for generalist robots.arXiv preprint arXiv:2406.02523, 2024

Soroush Nasiriany, Abhiram Maddukuri, Lance Zhang, Adeet Parikh, Aaron Lo, Abhishek Joshi, Ajay Mandlekar, and Yuke Zhu. RoboCasa: Large-scale simulation of everyday tasks for generalist robots.arXiv preprint arXiv:2406.02523, 2024

Pith/arXiv arXiv 2024
[31]

Open x-embodiment: Robotic learning datasets and RT-X models.arXiv preprint arXiv:2310.08864, 2023

Open X-Embodiment Collaboration. Open x-embodiment: Robotic learning datasets and RT-X models.arXiv preprint arXiv:2310.08864, 2023

Pith/arXiv arXiv 2023
[32]

Bridgedata v2: A dataset for robot learning at scale.arXiv preprint arXiv:2308.12952, 2023

Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, and Sergey Levine. Bridgedata v2: A dataset for robot learning at scale.arXiv preprint arXiv:2308.12952, 2023

arXiv 2023
[33]

DROID: A large-scale in-the-wild robot manipulation dataset

Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, et al. DROID: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024

Pith/arXiv arXiv 2024
[34]

Vima: General robot manipulation with multimodal prompts.arXiv preprint arXiv:2210.03094, 2022

Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, and Linxi Fan. Vima: General robot manipulation with multimodal prompts.arXiv preprint arXiv:2210.03094, 2022

arXiv 2022
[35]

Mimicgen: A data generation system for scalable robot learning using human demonstrations

Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan, Yuke Zhu, and Dieter Fox. Mimicgen: A data generation system for scalable robot learning using human demonstrations. In Conference on Robot Learning (CoRL), 2023. arXiv:2310.17596

Pith/arXiv arXiv 2023
[36]

Sucan, Mark Moll, and Lydia E

Ioan A. Sucan, Mark Moll, and Lydia E. Kavraki. The open motion planning library.IEEE Robotics & Automation Magazine, 19(4):72–82, 2012

2012
[37]

Reducing the barrier to entry of complex robotic software: a MoveIt! case study.arXiv preprint arXiv:1404.3785, 2014

David Coleman, Ioan Sucan, Sachin Chitta, and Nikolaus Correll. Reducing the barrier to entry of complex robotic software: a MoveIt! case study.arXiv preprint arXiv:1404.3785, 2014

Pith/arXiv arXiv 2014
[38]

Hierarchical task and motion planning in the now

Leslie Pack Kaelbling and Tomás Lozano-Pérez. Hierarchical task and motion planning in the now. In2011 IEEE International Conference on Robotics and Automation, pages 1470–1477, 2011

2011
[39]

A survey of optimization- based task and motion planning: From classical to learning approaches.arXiv preprint arXiv:2404.02817, 2024

Zhigen Zhao, Shuo Cheng, Yan Ding, Ziyi Zhou, Shiqi Zhang, Danfei Xu, and Ye Zhao. A survey of optimization- based task and motion planning: From classical to learning approaches.arXiv preprint arXiv:2404.02817, 2024

arXiv 2024
[40]

curobo: Parallelized collision-free minimum-jerk robot motion generation.arXiv preprint arXiv:2310.17274, 2023

Balakumar Sundaralingam, Siva Kumar Sastry Hari, Adam Fishman, Caelan Garrett, Karl Van Wyk, Valts Blukis, Alexander Millane, Helen Oleynikova, Ankur Handa, Fabio Ramos, Nathan Ratliff, and Dieter Fox. curobo: Parallelized collision-free minimum-jerk robot motion generation.arXiv preprint arXiv:2310.17274, 2023. 55

arXiv 2023
[41]

RT-1: Robotics transformer for real-world control at scale.arXiv preprint arXiv:2212.06817, 2022

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et al. RT-1: Robotics transformer for real-world control at scale.arXiv preprint arXiv:2212.06817, 2022

Pith/arXiv arXiv 2022
[42]

RT-2: Vision-language-action models transfer web knowledge to robotic control.arXiv preprint arXiv:2307.15818, 2023

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control.arXiv preprint arXiv:2307.15818, 2023

Pith/arXiv arXiv 2023
[43]

Do as i can, not as i say: Grounding language in robotic affordances.arXiv preprint arXiv:2204.01691, 2022

Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, et al. Do as i can, not as i say: Grounding language in robotic affordances.arXiv preprint arXiv:2204.01691, 2022

Pith/arXiv arXiv 2022
[44]

Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et al. PaLM-E: An embodied multimodal language model.arXiv preprint arXiv:2303.03378, 2023

Pith/arXiv arXiv 2023
[45]

Orbit: A unified simulation framework for interactive robot learning environments.IEEE Robotics and Automation Letters, 8(6):3740–3747, 2023

Mayank Mittal, Calvin Yu, Qinxi Yu, Jingzhou Liu, Nikita Rudin, David Hoeller, Jia Lin Yuan, Ritvik Singh, Yunrong Guo, Hammad Mazhar, Ajay Mandlekar, Buck Babich, Gavriel State, Marco Hutter, and Animesh Garg. Orbit: A unified simulation framework for interactive robot learning environments.IEEE Robotics and Automation Letters, 8(6):3740–3747, 2023

2023
[46]

Domain random- ization for transferring deep neural networks from simulation to the real world

Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain random- ization for transferring deep neural networks from simulation to the real world. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–30, 2017

2017
[47]

Openai gym.arXiv preprint arXiv:1606.01540, 2016

Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym.arXiv preprint arXiv:1606.01540, 2016

Pith/arXiv arXiv 2016
[48]

Hybridflow: A flexible and efficient RLHF framework

Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and efficient RLHF framework. InProceedings of the Twentieth European Conference on Computer Systems (EuroSys), 2025. The verl library implements HybridFlow

2025
[49]

RLinf: Flexible and efficient large-scale reinforcement learning via macro-to-micro flow transformation.arXiv preprint arXiv:2509.15965, 2025

Chao Yu, Yuanqing Wang, Zhen Guo, Hao Lin, Si Xu, Hongzhi Zang, Quanlu Zhang, Yongji Wu, Chunyang Zhu, Junhao Hu, et al. RLinf: Flexible and efficient large-scale reinforcement learning via macro-to-micro flow transformation.arXiv preprint arXiv:2509.15965, 2025

arXiv 2025
[50]

Scenesmith: Agentic generation of simulation-ready indoor scenes, 2026

Nicholas Pfaff, Thomas Cohn, Sergey Zakharov, Rick Cory, and Russ Tedrake. Scenesmith: Agentic generation of simulation-ready indoor scenes, 2026. URLhttps://arxiv.org/abs/2602.09153

Pith/arXiv arXiv 2026
[51]

Holodeck: Language guided generation of 3d embodied ai environments

Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, and Christopher Clark. Holodeck: Language guided generation of 3d embodied ai environments. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog...

2024
[52]

Pink: Python inverse kinematics based on Pinocchio, 2026

Stéphane Caron, Yann De Mont-Marin, Rohan Budhiraja, Seung Hyeon Bang, Ivan Domrachev, Simeon Nedelchev, Peter Du, Adrien Escande, Joris Vaillant, Bruce Wingo, Santosh Patapati, Daniel San José Pro, and Nicolas Guillermo Marticorena Vidal. Pink: Python inverse kinematics based on Pinocchio, 2026. URL https://github.com/stephane-caron/pink

2026
[53]

HOMIE: Humanoid loco-manipulation with isomorphic exoskeleton cockpit.arXiv preprint arXiv:2502.13013, 2025

Qingwei Ben, Feiyu Jia, Jia Zeng, Junting Dong, Dahua Lin, and Jiangmiao Pang. HOMIE: Humanoid loco-manipulation with isomorphic exoskeleton cockpit.arXiv preprint arXiv:2502.13013, 2025

arXiv 2025
[54]

Agile: A comprehensive workflow for humanoid loco-manipulation learning, 2026

Huihua Zhao*, Rafael Cathomen*, Lionel Gulich, Wei Liu, Efe Arda Ongan, Michael Lin, Shalin Jain, Soha Pouya, and Yan Chang. Agile: A comprehensive workflow for humanoid loco-manipulation learning, 2026. URL https://arxiv.org/abs/2603.20147

arXiv 2026
[55]

Dieter Fox, Wolfram Burgard, and Sebastian Thrun. The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine, 4(1):23–33, 1997
[56]

Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, and Ying Sheng. SGLang: Efficient execution of structured language model programs. In Advances in Neural Information Processing Systems, 2024
[57]

Qineng Wang, Baiqiao Yin, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Jiajun Wu, Li Fei-Fei, and Manling Li. MindCube: Spatial mental modeling from limited views. arXiv preprint arXiv:2506.21458, 2025
[58]

Haoran Lu, Shang Wu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, and Han Liu. Phys4D: Fine-grained physics-consistent 4D modeling from video diffusion. arXiv preprint arXiv:2603.03485, 2026
[59]

Wenzhen Yuan, Siyuan Dong, and Edward H. Adelson. GelSight: High-resolution robot tactile sensors for estimating geometry and force. Sensors, 17(12):2762, 2017. doi: 10.3390/s17122762
[60]

Zilin Si and Wenzhen Yuan. Taxim: An example-based simulation model for GelSight tactile sensors. IEEE Robotics and Automation Letters, 7(2):2361–2368, 2022
[61]

Iretiayo Akinola, Jie Xu, Jan Carius, Dieter Fox, and Yashraj Narang. TacSL: A library for visuotactile sensor simulation and learning. arXiv preprint arXiv:2408.06506, 2024
[62]

Binghao Huang and Yunzhu Li. FlexiTac: A low-cost, open-source, scalable tactile sensing solution for robotic systems. arXiv preprint arXiv:2604.28156, 2026
[63]

Lei Su, Zhijie Peng, Renyuan Ren, Shengping Mao, Juan Du, Kaifeng Zhang, and Xuezhou Zhu. Tacmap: Bridging the tactile sim-to-real gap via geometry-consistent penetration depth map. arXiv preprint arXiv:2602.21625, 2026
[64]

AnnotateAnything Team. Annotateanything: Automatic annotation of 3D assets for robot manipulation, 2026. Companion paper, under review. Citation to be updated upon publication
[65]

Shuai Bai, Yuheng Cai, Ruisheng Chen, Kai Chen, Xi Chen, Zesen Cheng, Lianghao Deng, Wenyu Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631, 2025
[66]

Qwen Team. Qwen3.5: Towards native multimodal agents. Official release post, February 2026. URL https://www.alibabacloud.com/blog/602894. Accessed 2026-06-10
[67]

Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems (NeurIPS), 2017
[68]

Changfeng Ma, Yang Li, Xinhao Yan, Jiachen Xu, Yunhan Yang, Chunshi Wang, Zibo Zhao, Yanwen Guo, Zhuo Chen, and Chunchao Guo. P3-SAM: Native 3D part segmentation. arXiv preprint arXiv:2509.06784, 2025
[69]

Xinhao Yan, Jiachen Xu, Yang Li, Changfeng Ma, Yunhan Yang, Chunshi Wang, Zibo Zhao, Zeqiang Lai, Yunfei Zhao, Zhuo Chen, et al. X-Part: High fidelity and structure coherent shape decomposition. arXiv preprint arXiv:2509.08643, 2025
[70]

NVIDIA. NVIDIA Isaac Sim documentation. https://docs.isaacsim.omniverse.nvidia.com, 2025. Accessed 2026-06-10
[71]

Stone Tao, Fanbo Xiang, Arth Shukla, Yuzhe Qin, Xander Hinrichsen, Xiaodi Yuan, Chen Bao, Xinsong Lin, Yulin Liu, Tse-kai Chan, Yuan Gao, Xuanlin Li, Tongzhou Mu, Nan Xiao, Arnav Gurha, Viswesh Nagaswamy Rajesh, Yong Woo Choi, Yen-Ru Chen, Zhiao Huang, Roberto Calandra, Rui Chen, Shan Luo, and Hao Su. Maniskill3: Gpu parallelized robotics simulation and r... arXiv, 2025
[72]

Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, and Roozbeh Mottaghi. ... arXiv, 2023
[73]

Nikita Rudin, David Hoeller, Philipp Reist, and Marco Hutter. Learning to walk in minutes using massively parallel deep reinforcement learning. In Proceedings of the 5th Conference on Robot Learning (CoRL), volume 164 of Proceedings of Machine Learning Research, pages 91–100. PMLR, 2022
[74]

Tony Z. Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware. In Robotics: Science and Systems (RSS), 2023. arXiv:2304.13705
[75]

Zhenyu Jiang, Yuqi Xie, Kevin Lin, Zhenjia Xu, Weikang Wan, Ajay Mandlekar, Linxi Fan, and Yuke Zhu. Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning. In IEEE International Conference on Robotics and Automation (ICRA), 2025. arXiv:2410.24185
[76]

Caelan Garrett, Ajay Mandlekar, Bowen Wen, and Dieter Fox. Skillmimicgen: Automated demonstration generation for efficient skill learning and deployment. In Conference on Robot Learning (CoRL), 2024. arXiv:2410.18907
[77]

Masoud Moghani, Mahdi Azizian, Animesh Garg, Yuke Zhu, Sean Huver, and Ajay Mandlekar. Softmimicgen: A data generation system for scalable robot learning in deformable object manipulation. arXiv preprint arXiv:2603.25725, 2026
[78]

Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, and Yao Mu. Robotwin 2.0: A scalable data generator and benchmark with strong domain randomization for robust bimanual robotic manipulation. arXiv preprint arXiv:2506.18088, 2025
[79]

Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, and Xiaolong Wang. Gensim: Generating robotic simulation tasks via large language models. In International Conference on Learning Representations (ICLR), 2024. arXiv:2310.01361
[80]

Pu Hua, Minghuan Liu, Annabella Macaluso, Yunfeng Lin, Weinan Zhang, Huazhe Xu, and Lirui Wang. Gensim2: Scaling robot data generation with multi-modal and reasoning llms. In Conference on Robot Learning (CoRL),
[81]

Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, and Chuang Gan. Robogen: Towards unleashing infinite data for automated robot learning via generative simulation. arXiv preprint arXiv:2311.01455, 2023

[50] [51]

Holodeck: Language guided generation of 3d embodied ai environments

Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, and Christopher Clark. Holodeck: Language guided generation of 3d embodied ai environments. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog...

2024

[51] [52]

Pink: Python inverse kinematics based on Pinocchio, 2026

Stéphane Caron, Yann De Mont-Marin, Rohan Budhiraja, Seung Hyeon Bang, Ivan Domrachev, Simeon Nedelchev, Peter Du, Adrien Escande, Joris Vaillant, Bruce Wingo, Santosh Patapati, Daniel San José Pro, and Nicolas Guillermo Marticorena Vidal. Pink: Python inverse kinematics based on Pinocchio, 2026. URL https://github.com/stephane-caron/pink

2026

[52] [53]

HOMIE: Humanoid loco-manipulation with isomorphic exoskeleton cockpit.arXiv preprint arXiv:2502.13013, 2025

Qingwei Ben, Feiyu Jia, Jia Zeng, Junting Dong, Dahua Lin, and Jiangmiao Pang. HOMIE: Humanoid loco-manipulation with isomorphic exoskeleton cockpit.arXiv preprint arXiv:2502.13013, 2025

arXiv 2025

[53] [54]

Agile: A comprehensive workflow for humanoid loco-manipulation learning, 2026

Huihua Zhao*, Rafael Cathomen*, Lionel Gulich, Wei Liu, Efe Arda Ongan, Michael Lin, Shalin Jain, Soha Pouya, and Yan Chang. Agile: A comprehensive workflow for humanoid loco-manipulation learning, 2026. URL https://arxiv.org/abs/2603.20147

arXiv 2026

[54] [55]

The dynamic window approach to collision avoidance

Dieter Fox, Wolfram Burgard, and Sebastian Thrun. The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine, 4(1):23–33, 1997

1997

[55] [56]

Gonzalez, Clark Barrett, and Ying Sheng

Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, and Ying Sheng. SGLang: Efficient execution of structured language model programs. InAdvances in Neural Information Processing Systems, 2024

2024

[56] [57]

MindCube: Spatial mental modeling from limited views.arXiv preprint arXiv:2506.21458, 2025

Qineng Wang, Baiqiao Yin, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Jiajun Wu, Li Fei-Fei, and Manling Li. MindCube: Spatial mental modeling from limited views.arXiv preprint arXiv:2506.21458, 2025. 56

arXiv 2025

[57] [58]

Phys4D: Fine-grained physics-consistent 4D modeling from video diffusion.arXiv preprint arXiv:2603.03485, 2026

Haoran Lu, Shang Wu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, and Han Liu. Phys4D: Fine-grained physics-consistent 4D modeling from video diffusion.arXiv preprint arXiv:2603.03485, 2026

Pith/arXiv arXiv 2026

[58] [59]

Wenzhen Yuan, Siyuan Dong, and Edward H. Adelson. GelSight: High-resolution robot tactile sensors for estimating geometry and force.Sensors, 17(12):2762, 2017. doi: 10.3390/s17122762

work page doi:10.3390/s17122762 2017

[59] [60]

Taxim: An example-based simulation model for GelSight tactile sensors.IEEE Robotics and Automation Letters, 7(2):2361–2368, 2022

Zilin Si and Wenzhen Yuan. Taxim: An example-based simulation model for GelSight tactile sensors.IEEE Robotics and Automation Letters, 7(2):2361–2368, 2022

2022

[60] [61]

TacSL: A library for visuotactile sensor simulation and learning.arXiv preprint arXiv:2408.06506, 2024

Iretiayo Akinola, Jie Xu, Jan Carius, Dieter Fox, and Yashraj Narang. TacSL: A library for visuotactile sensor simulation and learning.arXiv preprint arXiv:2408.06506, 2024

arXiv 2024

[61] [62]

FlexiTac: A low-cost, open-source, scalable tactile sensing solution for robotic systems.arXiv preprint arXiv:2604.28156, 2026

Binghao Huang and Yunzhu Li. FlexiTac: A low-cost, open-source, scalable tactile sensing solution for robotic systems.arXiv preprint arXiv:2604.28156, 2026

Pith/arXiv arXiv 2026

[62] [63]

Tacmap: Bridging the tactile sim-to-real gap via geometry-consistent penetration depth map.arXiv preprint arXiv:2602.21625, 2026

Lei Su, Zhijie Peng, Renyuan Ren, Shengping Mao, Juan Du, Kaifeng Zhang, and Xuezhou Zhu. Tacmap: Bridging the tactile sim-to-real gap via geometry-consistent penetration depth map.arXiv preprint arXiv:2602.21625, 2026

Pith/arXiv arXiv 2026

[63] [64]

Annotateanything: Automatic annotation of 3D assets for robot manipulation, 2026

AnnotateAnything Team. Annotateanything: Automatic annotation of 3D assets for robot manipulation, 2026. Companion paper, under review. Citation to be updated upon publication

2026

[64] [65]

Qwen3-VL technical report.arXiv preprint arXiv:2511.21631, 2025

Shuai Bai, Yuheng Cai, Ruisheng Chen, Kai Chen, Xi Chen, Zesen Cheng, Lianghao Deng, Wenyu Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-VL technical report.arXiv preprint arXiv:2511.21631, 2025

Pith/arXiv arXiv 2025

[65] [66]

Qwen3.5: Towards native multimodal agents

Qwen Team. Qwen3.5: Towards native multimodal agents. Official release post, February 2026. URLhttps: //www.alibabacloud.com/blog/602894. Accessed 2026-06-10

2026

[66] [67]

Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. InAdvances in Neural Information Processing Systems (NeurIPS), 2017

2017

[67] [68]

P3-SAM: Native 3D part segmentation.arXiv preprint arXiv:2509.06784, 2025

Changfeng Ma, Yang Li, Xinhao Yan, Jiachen Xu, Yunhan Yang, Chunshi Wang, Zibo Zhao, Yanwen Guo, Zhuo Chen, and Chunchao Guo. P3-SAM: Native 3D part segmentation.arXiv preprint arXiv:2509.06784, 2025

arXiv 2025

[68] [69]

X-Part: High fidelity and structure coherent shape decomposition.arXiv preprint arXiv:2509.08643, 2025

Xinhao Yan, Jiachen Xu, Yang Li, Changfeng Ma, Yunhan Yang, Chunshi Wang, Zibo Zhao, Zeqiang Lai, Yunfei Zhao, Zhuo Chen, et al. X-Part: High fidelity and structure coherent shape decomposition.arXiv preprint arXiv:2509.08643, 2025

arXiv 2025

[69] [70]

NVIDIA Isaac Sim documentation

NVIDIA. NVIDIA Isaac Sim documentation. https://docs.isaacsim.omniverse.nvidia.com, 2025. Accessed 2026-06-10

2025

[70] [71]

Stone Tao, Fanbo Xiang, Arth Shukla, Yuzhe Qin, Xander Hinrichsen, Xiaodi Yuan, Chen Bao, Xinsong Lin, Yulin Liu, Tse-kai Chan, Yuan Gao, Xuanlin Li, Tongzhou Mu, Nan Xiao, Arnav Gurha, Viswesh Nagaswamy Rajesh, Yong Woo Choi, Yen-Ru Chen, Zhiao Huang, Roberto Calandra, Rui Chen, Shan Luo, and Hao Su. Maniskill3: Gpu parallelized robotics simulation and r...

[71] [72]

Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, ...

[72] [73]

Nikita Rudin, David Hoeller, Philipp Reist, and Marco Hutter. Learning to walk in minutes using massively parallel deep reinforcement learning. In Proceedings of the 5th Conference on Robot Learning (CoRL), volume 164 of Proceedings of Machine Learning Research, pages 91–100. PMLR, 2022

[73] [74]

Tony Z. Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware. In Robotics: Science and Systems (RSS), 2023. arXiv:2304.13705

[74] [75]

Zhenyu Jiang, Yuqi Xie, Kevin Lin, Zhenjia Xu, Weikang Wan, Ajay Mandlekar, Linxi Fan, and Yuke Zhu. Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning. In IEEE International Conference on Robotics and Automation (ICRA), 2025. arXiv:2410.24185

[75] [76]

Caelan Garrett, Ajay Mandlekar, Bowen Wen, and Dieter Fox. Skillmimicgen: Automated demonstration generation for efficient skill learning and deployment. In Conference on Robot Learning (CoRL), 2024. arXiv:2410.18907

[76] [77]

Masoud Moghani, Mahdi Azizian, Animesh Garg, Yuke Zhu, Sean Huver, and Ajay Mandlekar. Softmimicgen: A data generation system for scalable robot learning in deformable object manipulation. arXiv preprint arXiv:2603.25725, 2026

[77] [78]

Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, and Yao Mu. Robotwin 2.0: A scalable d...

[78] [79]

Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, and Xiaolong Wang. Gensim: Generating robotic simulation tasks via large language models. In International Conference on Learning Representations (ICLR), 2024. arXiv:2310.01361

[79] [80]

Pu Hua, Minghuan Liu, Annabella Macaluso, Yunfeng Lin, Weinan Zhang, Huazhe Xu, and Lirui Wang. Gensim2: Scaling robot data generation with multi-modal and reasoning llms. In Conference on Robot Learning (CoRL),

[80] [81]

Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, and Chuang Gan. Robogen: Towards unleashing infinite data for automated robot learning via generative simulation. arXiv preprint arXiv:2311.01455, 2023