MagicSim: A Unified Infrastructure for Executable Embodied Interaction
Pith reviewed 2026-06-27 00:58 UTC · model grok-4.3
The pith
MagicSim unifies world construction, embodied execution, evaluation, rollout generation, and agent interaction in one deterministic runtime.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MagicSim constructs diverse executable worlds from YAML-first specifications and realizes high-level commands as robot actions inside one deterministic batched runtime and shared MDP. A common execution interface routes commands through controllers, atomic skills, planner primitives, and asynchronous planning. One task definition supports benchmark and RL evaluation, an autocollect interface that turns commands into grounded trajectories, and agent or VLM-facing interaction. Commands advance through a Command->Skill->Planner->Robot->Record pipeline while per-environment states progress independently above the shared physics tick, and successful rollouts are recorded as structured multimodal trajectories.
What carries the argument
The deterministic batched runtime and shared MDP that executes a Command->Skill->Planner->Robot->Record pipeline, grounding high-level commands as robot actions rather than direct state edits.
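The pipeline named above can be pictured with a toy example. This is a minimal sketch, not MagicSim's API: every class and function name here (`Robot`, `Recorder`, `command_to_skills`, `plan`, `execute`) is hypothetical, and the one-dimensional robot is an assumption made only to keep the example short. It shows the one property the paper stresses: state changes only through actions applied to the robot, never through direct edits.

```python
from dataclasses import dataclass, field

# All names are hypothetical; the paper does not publish this interface.

@dataclass
class Robot:
    position: float = 0.0

    def apply(self, action: float) -> None:
        # State changes only through an action, never a direct state edit.
        self.position += action

@dataclass
class Recorder:
    steps: list = field(default_factory=list)

    def log(self, command: str, skill: str, action: float, position: float) -> None:
        self.steps.append(
            {"command": command, "skill": skill, "action": action, "position": position}
        )

def command_to_skills(command: str) -> list[tuple[str, float]]:
    # Command -> Skill: expand a high-level command into (skill, target) pairs.
    table = {"fetch": [("move_to", 3.0), ("move_to", 0.0)]}
    return table[command]

def plan(start: float, target: float, max_step: float = 1.0) -> list[float]:
    # Skill -> Planner: split the motion into bounded per-tick actions.
    actions = []
    remaining = target - start
    while abs(remaining) > 1e-9:
        step = max(-max_step, min(max_step, remaining))
        actions.append(step)
        remaining -= step
    return actions

def execute(command: str, robot: Robot, recorder: Recorder) -> None:
    for skill, target in command_to_skills(command):
        for action in plan(robot.position, target):
            robot.apply(action)                                   # Planner -> Robot
            recorder.log(command, skill, action, robot.position)  # Robot -> Record
```

Calling `execute("fetch", Robot(), Recorder())` leaves the recorder holding one entry per applied action, which is the sense in which the recorded trajectory is grounded in what the robot actually did.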
If this is right
- One task definition supports three distinct uses: benchmark evaluation, automatic rollout collection, and interactive agent interfaces.
- Commands are turned into grounded robot trajectories that align language supervision with action, visual, and task status representations.
- Per-environment command, skill, planning, retry, annotation, and episode states advance independently above the shared physics tick.
- Successful episodes are saved as structured multimodal trajectories for downstream training or analysis.
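To make the idea of a structured multimodal trajectory concrete, here is a rough sketch of what one record might contain. The field names (`instruction`, `rgb_path`, `object_poses`, `status`) and the JSON layout are assumptions chosen for illustration; the paper states only that language, action, visual/geometric, and task-status information are aligned with the executed episode.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical record layout; the actual MagicSim schema is not given in the source.

@dataclass
class Step:
    action: list[float]                    # action representation for this tick
    rgb_path: str                          # pointer to a visual observation
    object_poses: dict[str, list[float]]   # geometric state, keyed by object name
    status: str                            # task-level status at this tick

@dataclass
class Trajectory:
    instruction: str                       # language supervision for the episode
    steps: list[Step] = field(default_factory=list)
    success: bool = False

def save_if_successful(traj: Trajectory, store: list[str]) -> bool:
    # Only successful episodes are kept, mirroring the behavior described above.
    if not traj.success:
        return False
    store.append(json.dumps(asdict(traj), sort_keys=True))
    return True
```

Keeping every modality inside one per-step record is one simple way to guarantee alignment, since no separate index is needed to match an action to its frame.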
Where Pith is reading between the lines
- The unified loop could simplify scaling of language-conditioned robot policies by removing the need to maintain separate collection and evaluation codebases.
- It might enable tighter closed-loop testing of planner primitives directly inside the same environment used for data generation.
- Future work could test whether adding new sensor models or physics variants requires changes only to the YAML layer or also to the core execution loop.
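The last question, whether a new sensor needs only a specification change, can be framed with a small registry sketch. Everything here is hypothetical: the `SENSOR_REGISTRY` name, the `kind`/`params` keys, and the dictionary standing in for a parsed YAML file are assumptions, not MagicSim's format. The point is only that a spec-only change works when the sensor kind is already registered in the runtime, and fails otherwise.

```python
from typing import Callable

# Hypothetical registry; names and spec keys are illustrative assumptions.
SENSOR_REGISTRY: dict[str, Callable[[dict], dict]] = {}

def register_sensor(kind: str):
    def decorator(factory):
        SENSOR_REGISTRY[kind] = factory
        return factory
    return decorator

@register_sensor("rgb_camera")
def make_rgb(params: dict) -> dict:
    return {"kind": "rgb_camera", "resolution": tuple(params.get("resolution", (640, 480)))}

def build_sensors(spec: dict) -> list[dict]:
    # `spec` stands in for what a YAML file would parse to.
    sensors = []
    for entry in spec.get("sensors", []):
        kind = entry["kind"]
        if kind not in SENSOR_REGISTRY:
            raise KeyError(f"sensor kind {kind!r} needs new runtime code, not only a spec change")
        sensors.append(SENSOR_REGISTRY[kind](entry.get("params", {})))
    return sensors
```

Under this framing, the test proposed above amounts to counting how often a new sensor or physics variant hits the `KeyError` branch.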
Load-bearing premise
A single deterministic batched runtime and shared MDP can support all diverse task families, interaction regimes, physics, sensors, and embodiments without significant trade-offs in performance or fidelity.
What would settle it
A head-to-head test on a complex multi-embodiment task where MagicSim produces measurably lower physics fidelity or slower per-step throughput than a specialized simulator built only for that task family.
Original abstract
Robot learning and embodied agents now require simulation to serve as a shared execution substrate linking control, skills, and planning, not only as a renderer, controller testbed, or fixed task environment. Existing pipelines split these layers with "magic" actions, disconnected training environments, or forward-only renders that cannot reproduce, evaluate, and annotate the same episode. We present MagicSim, an embodied interaction infrastructure built around one deterministic batched runtime and a shared Markov decision process (MDP). From YAML-first specifications that decouple contents, placement, behavior, and agent exposure, MagicSim constructs diverse executable worlds spanning task families, interaction regimes, physics, layouts, sensors, avatars, and robot embodiments in one reset-and-step loop. A common execution interface grounds high-level commands through controllers, atomic skills, planner primitives, and asynchronous planning, realizing them as robot actions rather than simulator-side state edits. One task definition supports three capabilities: benchmark and RL evaluation, an autocollect interface that automatically turns commands into grounded trajectories, and agent/VLM-facing interaction. For automatic execution, commands flow through a Command->Skill->Planner->Robot->Record pipeline, while per-environment command, skill, planning, retry, annotation, and episode states advance independently above the shared physics tick. Successful rollouts are saved as structured multimodal trajectories aligning language supervision, action representations, visual/geometric representations, and task-level status with the executed episode. MagicSim thus unifies diverse world construction, embodied execution, task evaluation, automatic rollout generation, and interactive agent interfaces in one planner-in-the-loop runtime.
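The abstract's emphasis on reproducing the same episode suggests a simple replay check. The sketch below is an assumption-laden illustration, not a description of MagicSim: `rollout` is a stand-in for a seeded episode, and `fingerprint` is a hypothetical helper that hashes the trace so two runs can be compared cheaply.

```python
import hashlib
import random

# Hypothetical replay check; `rollout` is a toy stand-in for a simulator episode.

def rollout(seed: int, commands: list[str], ticks_per_command: int = 5) -> list[tuple[str, float]]:
    rng = random.Random(seed)  # all randomness flows from one seeded generator
    state = 0.0
    trace = []
    for command in commands:
        for _ in range(ticks_per_command):
            state += rng.uniform(-1.0, 1.0)
            trace.append((command, state))
    return trace

def fingerprint(trace: list[tuple[str, float]]) -> str:
    # Hash the full trace so two runs can be compared with one string equality.
    return hashlib.sha256(repr(trace).encode("utf-8")).hexdigest()
```

If the runtime is deterministic in the sense claimed, two rollouts with the same seed and commands should produce the same fingerprint, and annotation can then be attached to a replayed episode rather than a fresh one.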
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents MagicSim, a unified infrastructure for embodied interaction in robotics. It is built around one deterministic batched runtime and a shared Markov decision process (MDP). From YAML-first specifications that decouple contents, placement, behavior, and agent exposure, the system constructs diverse executable worlds spanning task families, interaction regimes, physics, sensors, avatars, and robot embodiments. A common Command->Skill->Planner->Robot execution interface grounds high-level commands as robot actions. One task definition supports benchmark/RL evaluation, automatic rollout generation via autocollect, and interactive agent/VLM interfaces, with per-environment states advancing independently above the shared physics tick. Successful rollouts are saved as structured multimodal trajectories. The paper claims this unifies world construction, embodied execution, task evaluation, automatic rollout generation, and interactive interfaces in one planner-in-the-loop runtime.
Significance. If the system performs as described without the hypothesized fidelity or throughput trade-offs, MagicSim would offer a meaningful contribution to robot learning by replacing fragmented simulation pipelines with a single shared substrate that consistently links control, skills, planning, evaluation, and data collection across heterogeneous tasks and embodiments.
major comments (1)
- [Abstract] Abstract and overall manuscript: the central claim that one deterministic batched runtime and shared MDP can instantiate and execute worlds spanning diverse task families, physics, sensors, and embodiments without significant performance or fidelity trade-offs is load-bearing for the contribution, yet the manuscript supplies no implementation details, throughput measurements, error rates, fidelity comparisons, or ablation studies to support it.
minor comments (1)
- The description of independent per-environment states advancing above the shared tick would benefit from a diagram or pseudocode to clarify the separation between command/skill/planning layers and the physics tick.
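One way to picture the separation this comment asks about is sketched below. It is a hypothetical illustration, not the authors' design: the `Phase` values, the `EnvState` fields, and the tick counts are all assumptions. Each environment owns its own small state machine, and a single outer loop plays the role of the shared physics tick.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical per-environment state machine; names and phases are assumptions.

class Phase(Enum):
    PLANNING = "planning"
    EXECUTING = "executing"
    DONE = "done"

@dataclass
class EnvState:
    plan_ticks_left: int   # ticks until this environment's plan is ready
    exec_ticks_left: int   # ticks needed to execute the plan
    phase: Phase = Phase.PLANNING

    def advance(self) -> None:
        if self.phase is Phase.PLANNING:
            self.plan_ticks_left -= 1
            if self.plan_ticks_left <= 0:
                self.phase = Phase.EXECUTING
        elif self.phase is Phase.EXECUTING:
            self.exec_ticks_left -= 1
            if self.exec_ticks_left <= 0:
                self.phase = Phase.DONE

def run(envs: list[EnvState], max_ticks: int) -> list[list[str]]:
    history = []
    for _ in range(max_ticks):
        # A shared batched physics step for all environments would go here (omitted).
        for env in envs:
            env.advance()  # each environment advances its own state
        history.append([e.phase.value for e in envs])
        if all(e.phase is Phase.DONE for e in envs):
            break
    return history
```

With `run([EnvState(1, 2), EnvState(3, 1)], 10)`, the first environment is already executing while the second is still planning, even though both see the same sequence of ticks.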
Simulated Author's Rebuttal
We thank the referee for identifying the load-bearing nature of our central claim and the absence of supporting empirical evidence. We agree this requires strengthening and will revise the manuscript to include the requested details.
Point-by-point responses
- Referee: [Abstract] Abstract and overall manuscript: the central claim that one deterministic batched runtime and shared MDP can instantiate and execute worlds spanning diverse task families, physics, sensors, and embodiments without significant performance or fidelity trade-offs is load-bearing for the contribution, yet the manuscript supplies no implementation details, throughput measurements, error rates, fidelity comparisons, or ablation studies to support it.
  Authors: We agree that the claim is central and that the current manuscript does not provide the requested quantitative support. The manuscript emphasizes the architectural unification via the YAML-first specifications, shared MDP, and Command->Skill->Planner->Robot pipeline but lacks implementation specifics on the batched runtime, performance metrics, or comparisons. In revision we will add: (1) detailed implementation of the deterministic batched runtime and per-environment state advancement; (2) throughput measurements (steps/sec across environment counts and task types); (3) error rates for rollout generation and task success; (4) fidelity comparisons against standard simulators for physics, sensors, and embodiments; and (5) ablations isolating the effects of batching and the shared MDP. These additions will directly address whether significant trade-offs exist.
  revision: yes
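The throughput measurement promised in item (2) could take roughly the following shape. This is a sketch under stated assumptions: `dummy_batched_step` is a placeholder for a real batched physics tick, and `measure_throughput` and `sweep` are hypothetical helper names. It reports environment-steps per second for each environment count.

```python
import time

# Hypothetical benchmark harness; `dummy_batched_step` stands in for a real simulator tick.

def dummy_batched_step(state: list[float]) -> list[float]:
    # One shared tick applied to every environment in the batch.
    return [x * 0.99 + 0.01 for x in state]

def measure_throughput(step_fn, num_envs: int, num_ticks: int = 200) -> float:
    state = [0.0] * num_envs
    start = time.perf_counter()
    for _ in range(num_ticks):
        state = step_fn(state)
    elapsed = time.perf_counter() - start
    # Environment-steps per second; guard against a zero-length interval.
    return num_envs * num_ticks / max(elapsed, 1e-9)

def sweep(step_fn, env_counts: list[int]) -> dict[int, float]:
    return {n: measure_throughput(step_fn, n) for n in env_counts}
```

Running the same `sweep` against a specialized simulator's step function would give the head-to-head numbers the review says are missing, though a real comparison would also need warm-up runs and repeated trials.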
Circularity Check
No circularity: system-description paper with no derivations, predictions, or load-bearing equations
full rationale
The manuscript is an infrastructure/system paper whose central claim is the existence and unification of a deterministic batched runtime + shared MDP that supports diverse embodied tasks. No equations, fitted parameters, predictions, or derivation chain appear in the abstract or described full text. The architecture (YAML decoupling, Command->Skill->Planner->Robot pipeline, per-env state above shared tick) is presented descriptively; success is not claimed via reduction to prior self-defined quantities or self-citations. The reader's assessment of score 1.0 is consistent with the absence of any of the enumerated circularity patterns. The paper is self-contained against external benchmarks in the sense that its claims are architectural assertions open to empirical validation outside any internal fit.