DeformGen: Dynamics-Based Topology Augmentation for Deformable Manipulation Policy Learning

Hanxin Zhu; Jiaolong Yang; Junyan Lin; Wenjun Zeng; Wenyao Zhang; Xiaokang Yang; Xin Jin; Yao Mu; Yuyang Zhang; Zekun Qi

arxiv: 2606.25939 · v1 · pith:JMHXXUZInew · submitted 2026-06-24 · 💻 cs.RO

DeformGen: Dynamics-Based Topology Augmentation for Deformable Manipulation Policy Learning

Zili Lin , Wenyao Zhang , Yuyang Zhang , Zekun Qi , Junyan Lin , Hanxin Zhu , Jiaolong Yang , Zhibo Chen

show 4 more authors

Yao Mu Xiaokang Yang Xin Jin Wenjun Zeng

This is my paper

Pith reviewed 2026-06-25 20:36 UTC · model grok-4.3

classification 💻 cs.RO

keywords deformable manipulationdemonstration augmentationpolicy learningdynamics simulationtrajectory transfertopology augmentationrobot learning

0 comments

The pith

DeformGen augments demonstration data for deformable manipulation by using localized physical disturbances, forward simulation, and deformation-field warping to expand valid states and transfer trajectories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the limits of standard demonstration augmentation when robots must handle soft, stretchy objects. Small pose changes cannot reach most valid deformed configurations because of physics constraints, and robot paths cannot simply be copied because material points move independently. DeformGen instead applies small localized pushes, runs the physics forward to produce new coherent states, and warps the original end-effector paths through a continuous deformation field so the behavior stays consistent with the new shape. Experiments on high-fidelity benchmarks show policies trained on the resulting data outperform those trained on the original demonstrations or on rigid-style augmentations.

Core claim

DeformGen achieves topological diversity for deformable objects by expanding the valid state distribution through localized physical disturbances and forward simulation, and by transferring trajectories via deformation-field warping, jointly augmenting states and behaviors.

What carries the argument

DeformGen framework that applies localized physical disturbances followed by forward dynamics simulation for states and deformation-field warping to adapt source trajectories to new geometries.

If this is right

Policies trained with the augmented data achieve higher success rates than those trained on original demonstrations.
The generated states respect physical constraints better than those produced by rigid pose perturbations.
Trajectory transfer maintains consistent end-effector behavior across deformed object geometries.
Joint state and behavior augmentation improves learning across multiple high-fidelity deformable manipulation tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same disturbance-plus-simulation idea might apply to other non-rigid robotic tasks such as pouring or folding where state validity is hard to sample directly.
If the forward simulator matches real material behavior closely enough, the method could lower the amount of real-world demonstration collection needed for new objects.
Extending the warping step to handle contact-rich interactions could reveal whether the current approach breaks when objects touch multiple surfaces or each other.

Load-bearing premise

Forward-simulating dynamics from localized physical disturbances produces topology-coherent and physically plausible states that improve policy learning, and deformation-field warping transfers trajectories while preserving essential manipulation behavior.

What would settle it

Running the reported benchmark experiments and finding that policies trained on DeformGen-augmented data achieve no higher success rates than policies trained on the original demonstrations alone.

read the original abstract

Demonstration augmentation is proposed for cost-efficient data acquisition, but existing methods are fundamentally limited in deformable manipulation due to two challenges: (1) the state space is high-dimensional with physics-induced constraints, making valid configurations impossible to reach via low-dimensional pose perturbations; and (2) trajectory transfer is non-equivariant, as material points no longer move rigidly together under deformation. We present DeformGen, a dynamics-based augmentation framework that achieves topological diversity for deformable objects. For the state challenge, DeformGen expands the valid state distribution by applying localized physical disturbances and forward-simulating the dynamics to obtain topology-coherent, physically plausible deformable states. For the trajectory challenge, DeformGen transfers source manipulation trajectories via deformation-field warping, which lifts per-particle displacements into a continuous spatial function to adapt the end-effector trajectory consistently with the deformed geometry. In this way, our method jointly augments the state distribution and its associated manipulation behavior. Experiments on high-fidelity deformable manipulation benchmarks show that DeformGen generally improves policy learning compared with training on the original demonstrations alone and with rigid-style augmentation baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DeformGen pairs localized disturbance simulation with deformation-field warping to augment states and trajectories for deformable manipulation, a targeted step past rigid baselines though the abstract leaves the size of gains unclear.

read the letter

Colleague,

The main point is that DeformGen generates new states by applying localized physical disturbances then forward-simulating dynamics, and transfers trajectories by lifting displacements into a continuous deformation field for warping. This combination directly targets the high-dimensional constraints and non-rigid motion issues that break standard augmentation in deformable manipulation.

The paper does a clean job framing the two challenges and matching each with a physics-grounded fix. The warping step, in particular, avoids treating the object as rigid and instead adapts the end-effector path consistently with the deformed geometry. That feels like a practical technical choice rather than a generic trick.

The soft spots sit mostly in the results section. The abstract says the method improves policy learning over original data and rigid baselines on high-fidelity benchmarks, but supplies no numbers, error bars, or ablation details. Without those, it is hard to judge whether the gains are large enough to matter in practice or whether the simulation steps introduce artifacts that hurt downstream performance. The dependence on accurate forward simulation is also a standing risk if the physics model drifts from reality.

This work is aimed at people doing learning-based control for soft or cloth-like objects. A reader already working on demonstration augmentation or deformable robotics would find the concrete method useful. It is coherent enough on its own terms to deserve referee time, even if the empirical claims will need tightening.

I would send it for peer review.

Referee Report

3 major / 0 minor

Summary. The manuscript presents DeformGen, a dynamics-based augmentation framework for deformable manipulation policy learning. It targets two challenges in demonstration augmentation: (1) high-dimensional physics-constrained state spaces unreachable by low-dimensional perturbations, addressed via localized physical disturbances followed by forward simulation to produce topology-coherent states; and (2) non-equivariant trajectory transfer under deformation, addressed via deformation-field warping that lifts per-particle displacements to a continuous spatial function for consistent end-effector adaptation. The method jointly augments states and behaviors, with experiments on high-fidelity benchmarks claiming general policy improvements over original demonstrations and rigid-style baselines.

Significance. If the empirical improvements hold under detailed scrutiny, the work could advance data-efficient learning for deformable robotics by providing a physics-grounded alternative to purely geometric augmentation, potentially reducing reliance on extensive real-world data collection while preserving physical plausibility.

major comments (3)

[Abstract / Experiments] Abstract and experimental claims: the central assertion of 'general improvement' in policy learning is presented without any quantitative metrics, error bars, statistical tests, or details on data exclusion, which directly limits evaluation of the magnitude and reliability of the reported gains over baselines.
[Method (state augmentation component)] Method description (state augmentation): while localized disturbances and forward simulation are proposed to expand the valid state distribution, the manuscript provides no explicit validation (e.g., via metrics on physical plausibility or topology coherence) that the generated states remain within the manifold of feasible deformable configurations without introducing artifacts.
[Method (trajectory warping component)] Method description (trajectory transfer): the deformation-field warping is claimed to preserve essential manipulation behavior, but no analysis or ablation is given on whether the lifted continuous function introduces harmful artifacts or alters task-relevant dynamics in the transferred trajectories.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, agreeing where revisions are needed to improve clarity and rigor, and outlining specific changes we will make.

read point-by-point responses

Referee: [Abstract / Experiments] Abstract and experimental claims: the central assertion of 'general improvement' in policy learning is presented without any quantitative metrics, error bars, statistical tests, or details on data exclusion, which directly limits evaluation of the magnitude and reliability of the reported gains over baselines.

Authors: We acknowledge that while the experimental section reports comparative success rates on high-fidelity benchmarks against original data and rigid baselines, the presentation lacks explicit error bars, statistical tests, and data exclusion details. In the revised manuscript, we will add these elements (including standard deviations from multiple seeds, t-tests for significance, and explicit data handling protocols) and revise the abstract to reference key quantitative gains. This will directly address the concern about evaluating reliability. revision: yes
Referee: [Method (state augmentation component)] Method description (state augmentation): while localized disturbances and forward simulation are proposed to expand the valid state distribution, the manuscript provides no explicit validation (e.g., via metrics on physical plausibility or topology coherence) that the generated states remain within the manifold of feasible deformable configurations without introducing artifacts.

Authors: The referee is correct that the current manuscript does not include explicit quantitative validation metrics for the generated states. We will add a dedicated validation subsection (or appendix) reporting metrics such as physical property preservation (e.g., mass conservation, collision-free checks post-simulation) and topology coherence (e.g., mesh connectivity analysis), along with qualitative examples. These will confirm the states remain feasible without artifacts. revision: yes
Referee: [Method (trajectory warping component)] Method description (trajectory transfer): the deformation-field warping is claimed to preserve essential manipulation behavior, but no analysis or ablation is given on whether the lifted continuous function introduces harmful artifacts or alters task-relevant dynamics in the transferred trajectories.

Authors: We agree that an explicit analysis or ablation on potential artifacts from the continuous deformation-field lifting is missing. In the revision, we will incorporate an ablation study comparing trajectory fidelity (e.g., end-effector deviation metrics) and downstream policy performance with/without the lifting step, plus checks for dynamics preservation. This will substantiate the claim that essential behavior is maintained. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents a method using external forward dynamics simulation from localized disturbances to generate new states and deformation-field warping to transfer trajectories. These steps rely on standard physics engines and spatial interpolation rather than any fitted parameters, self-definitions, or self-citation chains that reduce the claimed outputs to the inputs by construction. The central claims concern empirical policy improvement on benchmarks and are not derived tautologically from the method description itself. No load-bearing steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the method rests on standard domain assumptions about physics simulation fidelity but introduces no explicitly fitted free parameters or new invented entities; full text would be needed to audit any implementation-specific constants.

axioms (1)

domain assumption Forward simulation of localized physical disturbances produces topology-coherent, physically plausible deformable states.
Invoked directly in the state-augmentation component described in the abstract.

pith-pipeline@v0.9.1-grok · 5762 in / 1290 out tokens · 31562 ms · 2026-06-25T20:36:42.429121+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

65 extracted references · 1 canonical work pages

[1]

pi0: A vision-language-action flow model for general robot control

Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. pi0: A vision-language-action flow model for general robot control. arXiv preprint, 2024

2024
[2]

pi0.5: a vision-language-action model with open-world generalization.arXiv preprint, 2025

Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Es- mail, Michael Equi, Chelsea Finn, Niccolo Fusai, et al. pi0.5: a vision-language-action model with open-world generalization.arXiv preprint, 2025

2025
[3]

Openvla: An open-source vision-language-action model.arXiv preprint, 2024

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. Openvla: An open-source vision-language-action model.arXiv preprint, 2024

2024
[4]

Gr00t n1: An open foundation model for generalist humanoid robots

Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, et al. Gr00t n1: An open foundation model for generalist humanoid robots. arXiv preprint, 2025

2025
[5]

Internvla-m1: A spatially guided vision-language-action framework for generalist robot policy

Xinyi Chen, Yilun Chen, Yanwei Fu, Ning Gao, Jiaya Jia, Weiyang Jin, Hao Li, Yao Mu, Jiangmiao Pang, Yu Qiao, et al. Internvla-m1: A spatially guided vision-language-action framework for generalist robot policy. arXiv preprint arXiv:2510.13778, 2025

Pith/arXiv arXiv 2025
[6]

Dreamvla: A vision-language-action model dreamed with comprehensive world knowledge.arXiv preprint, 2025

Wenyao Zhang, Hongsi Liu, Zekun Qi, Yunnan Wang, Xinqiang Yu, Jiazhao Zhang, Runpei Dong, Jiawei He, He Wang, Zhizheng Zhang, et al. Dreamvla: A vision-language-action model dreamed with comprehensive world knowledge.arXiv preprint, 2025

2025
[7]

Disentangled robot learning via separate forward and inverse dynamics pretraining.arXiv preprint arXiv:2604.16391, 2026

Wenyao Zhang, Bozhou Zhang, Zekun Qi, Wenjun Zeng, Xin Jin, and Li Zhang. Disentangled robot learning via separate forward and inverse dynamics pretraining.arXiv preprint arXiv:2604.16391, 2026

Pith/arXiv arXiv 2026
[8]

Vla-jepa: Enhancing vision-language-action model with latent world model.arXiv preprint arXiv:2602.10098, 2026

Jingwen Sun, Wenyao Zhang, Zekun Qi, Shaojie Ren, Zezhi Liu, Hanxin Zhu, Guangzhong Sun, Xin Jin, and Zhibo Chen. Vla-jepa: Enhancing vision-language-action model with latent world model.arXiv preprint arXiv:2602.10098, 2026

arXiv 2026
[9]

Discrete diffusion vla: Bringing discrete diffusion to action decoding in vision- language-action policies.arXiv preprint, 2025

Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Liuao Pei, Xiaokang Yang, Jiangmiao Pang, Yao Mu, and Ping Luo. Discrete diffusion vla: Bringing discrete diffusion to action decoding in vision- language-action policies.arXiv preprint, 2025

2025
[10]

Robotwin: Dual-arm robot benchmark with generative digital twins (early version)

Yao Mu, Tianxing Chen, Shijia Peng, Zanxin Chen, Zeyu Gao, Yude Zou, Lunkai Lin, Zhiqiang Xie, and Ping Luo. Robotwin: Dual-arm robot benchmark with generative digital twins (early version). InECCV, 2025

2025
[11]

Mimicgen: A data generation system for scalable robot learning using human demonstrations

Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan, Yuke Zhu, and Dieter Fox. Mimicgen: A data generation system for scalable robot learning using human demonstrations. In Conference on Robot Learning, pages 1820–1864. PMLR, 2023

2023
[12]

Demogen: Syn- thetic demonstration generation for data-efficient visuomotor policy learning.arXiv preprint arXiv:2502.16932, 2025

Zhengrong Xue, Shuying Deng, Zhenyang Chen, Yixuan Wang, Zhecheng Yuan, and Huazhe Xu. Demogen: Syn- thetic demonstration generation for data-efficient visuomotor policy learning.arXiv preprint arXiv:2502.16932, 2025

arXiv 2025
[13]

Noveldemonstra- tion generation with gaussian splatting enables robust one-shot manipulation.arXiv preprint arXiv:2504.13175, 2025

SizheYang, WenyeYu, JiaZeng, JunLv, KeruiRen, CewuLu, DahuaLin, andJiangmiaoPang. Noveldemonstra- tion generation with gaussian splatting enables robust one-shot manipulation.arXiv preprint arXiv:2504.13175, 2025

arXiv 2025
[14]

Egodemogen: Novel egocentric demonstration generation enables viewpoint-robust manipulation.arXiv preprint arXiv:2509.22578, 2025

Yuan Xu, Jiabing Yang, Xiaofeng Wang, Yixiang Chen, Zheng Zhu, Bowen Fang, Guan Huang, Xinze Chen, Yun Ye, Qiang Zhang, et al. Egodemogen: Novel egocentric demonstration generation enables viewpoint-robust manipulation.arXiv preprint arXiv:2509.22578, 2025

arXiv 2025
[15]

Softmimic- gen: A data generation system for scalable robot learning in deformable object manipulation.arXiv preprint arXiv:2603.25725, 2026

Masoud Moghani, Mahdi Azizian, Animesh Garg, Yuke Zhu, Sean Huver, and Ajay Mandlekar. Softmimic- gen: A data generation system for scalable robot learning in deformable object manipulation.arXiv preprint arXiv:2603.25725, 2026

arXiv 2026
[16]

Sim1: Physics-aligned simulator as zero-shot data scaler in deformable worlds

Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou, Hui Wang, Baole Fang, Yang Tian, Mulin Yu, Qiaojun Yu, et al. Sim1: Physics-aligned simulator as zero-shot data scaler in deformable worlds. arXiv preprint arXiv:2604.08544, 2026

Pith/arXiv arXiv 2026
[17]

Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey.The International Journal of Robotics Research, 37(7):688–716, 2018

Jose Sanchez, Juan-Antonio Corrales, Belhassen-Chedli Bouzgarrou, and Youcef Mezouar. Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey.The International Journal of Robotics Research, 37(7):688–716, 2018

2018
[18]

Modeling, learning, perception, and control methods for deformable object manipulation.Science Robotics, 6(54):eabd8803, 2021

Hang Yin, Anastasia Varava, and Danica Kragic. Modeling, learning, perception, and control methods for deformable object manipulation.Science Robotics, 6(54):eabd8803, 2021

2021
[19]

Real-to-sim robot policy evaluation with gaussian splatting simulation of soft-body interactions.arXiv preprint arXiv:2511.04665, 2025

Kaifeng Zhang, Shuo Sha, Hanxiao Jiang, Matthew Loper, Hyunjong Song, Guangyan Cai, Zhuo Xu, Xiaochen Hu, Changxi Zheng, and Yunzhu Li. Real-to-sim robot policy evaluation with gaussian splatting simulation of soft-body interactions.arXiv preprint arXiv:2511.04665, 2025

arXiv 2025
[20]

High-fidelity simulated data generation for real-world zero-shot robotic manipulation learning with gaussian splatting.IEEE Robotics and Automation Letters, 11(5):5310–5317, 2026

Haoyu Zhao, Cheng Zeng, Linghao Zhuang, Yaxi Zhao, Shengke Xue, Hao Wang, Xingyue Zhao, Zhongyu Li, Kehan Li, Siteng Huang, Mingxiu Chen, Xin Li, Deli Zhao, and Hua Zou. High-fidelity simulated data generation for real-world zero-shot robotic manipulation learning with gaussian splatting.IEEE Robotics and Automation Letters, 11(5):5310–5317, 2026. doi: 10...

work page doi:10.1109/lra.2026.3671535 2026
[21]

Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J. Davison. RLBench: The Robot Learning Benchmark & Learning Environment.arXiv preprint arXiv:1909.12271, 2019

arXiv 1909
[22]

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, and Chuang Gan. RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation. InInternational Conference on Machine Learning, 2024

2024
[23]

Rl-driven data generation for robust vision-based dexterous grasping.arXiv preprint arXiv:2504.18084, 2025

Atsushi Kanehira, Naoki Wake, Kazuhiro Sasabuchi, Jun Takamatsu, and Katsushi Ikeuchi. Rl-driven data generation for robust vision-based dexterous grasping.arXiv preprint arXiv:2504.18084, 2025

arXiv 2025
[24]

Semantically controllable augmentations for generalizable robot learning.The International Journal of Robotics Research, 44(10-11):1705–1726, 2025

Zoey Chen, Zhao Mandi, Homanga Bharadhwaj, Mohit Sharma, Shuran Song, Abhishek Gupta, and Vikash Kumar. Semantically controllable augmentations for generalizable robot learning.The International Journal of Robotics Research, 44(10-11):1705–1726, 2025

2025
[25]

Gigabrain-0: A world model-powered vision-language-action model

GigaAI. Gigabrain-0: A world model-powered vision-language-action model. 2025. URLhttps://arxiv.org/ abs/2510.19430

arXiv 2025
[26]

Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning

Zhenyu Jiang, Yuqi Xie, Kevin Lin, Zhenjia Xu, Weikang Wan, Ajay Mandlekar, Linxi Jim Fan, and Yuke Zhu. Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 16923–16930. IEEE, 2025

2025
[27]

Dreamgen: Unlocking generalization in robot learning through neural trajectories.arXiv preprint, 2025

Joel Jang, Seonghyeon Ye, Zongyu Lin, Jiannan Xiang, Johan Bjorck, Yu Fang, Fengyuan Hu, Spencer Huang, Kaushil Kundalia, Yen-Chen Lin, et al. Dreamgen: Unlocking generalization in robot learning through neural trajectories.arXiv preprint, 2025

2025
[28]

Manipdreamer3d: Synthesizing plausible robotic manipulation video with occupancy-aware 3d trajectory

Ying Li, Xiaobao Wei, Xiaowei Chi, Yuming Li, Zhongyu Zhao, Hao Wang, Ningning Ma, Ming Lu, and Sirui Han. Manipdreamer3d: Synthesizing plausible robotic manipulation video with occupancy-aware 3d trajectory. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 6644–6652, 2026

2026
[29]

Oxe-auge: Alarge-scalerobotaugmentationofoxeforscalingcross-embodiment policy learning.arXiv preprint arXiv:2512.13100, 2025

Guanhua Ji, Harsha Polavaram, Lawrence Yunliang Chen, Sandeep Bajamahal, Zehan Ma, Simeon Adebola, ChenfengXu, andKenGoldberg. Oxe-auge: Alarge-scalerobotaugmentationofoxeforscalingcross-embodiment policy learning.arXiv preprint arXiv:2512.13100, 2025

arXiv 2025
[30]

Robovip: Multi-view video generation with visual identity prompting augments robot manipulation.arXiv preprint arXiv:2601.05241, 2026

Boyang Wang, Haoran Zhang, Shujie Zhang, Jinkun Hao, Mingda Jia, Qi Lv, Yucheng Mao, Zhaoyang Lyu, Jia Zeng, Xudong Xu, et al. Robovip: Multi-view video generation with visual identity prompting augments robot manipulation.arXiv preprint arXiv:2601.05241, 2026

arXiv 2026
[31]

One demo is worth a thousand trajectories: Action-view augmentation for visuomotor policies

Chuer Pan, Litian Liang, Dominik Bauer, Eric Cousineau, Benjamin Burchfiel, Siyuan Feng, and Shuran Song. One demo is worth a thousand trajectories: Action-view augmentation for visuomotor policies. In9th Annual Conference on Robot Learning, 2025

2025
[32]

Real2render2real: Scaling robot data without dynamics simulation or robot hardware,

Justin Yu, Letian Fu, Huang Huang, Karim El-Refai, Rares Andrei Ambrus, Richard Cheng, Muhammad Zubair Irshad, and Ken Goldberg. Real2render2real: Scaling robot data without dynamics simulation or robot hardware,
[33]

URLhttps://arxiv.org/abs/2505.09601

arXiv
[34]

Real2edit2real: Generating robotic demonstrations via a 3d control interface.arXiv preprint arXiv:2512.19402, 2025

Yujie Zhao, Hongwei Fan, Di Chen, Shengcong Chen, Liliang Chen, Xiaoqi Li, Guanghui Ren, and Hao Dong. Real2edit2real: Generating robotic demonstrations via a 3d control interface.arXiv preprint arXiv:2512.19402, 2025

arXiv 2025
[35]

Skillmimicgen: Automated demonstration gener- ation for efficient skill learning and deployment.arXiv preprint arXiv:2410.18907, 2024

Caelan Garrett, Ajay Mandlekar, Bowen Wen, and Dieter Fox. Skillmimicgen: Automated demonstration gener- ation for efficient skill learning and deployment.arXiv preprint arXiv:2410.18907, 2024

arXiv 2024
[36]

3d gaussian splatting for real-time radiance field rendering.ACM Trans

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1, 2023

2023
[37]

Deformation constraints in a mass-spring model to describe rigid cloth behaviour

Xavier Provot et al. Deformation constraints in a mass-spring model to describe rigid cloth behaviour. InGraphics interface, pages 147–147. Canadian Information Processing Society, 1995

1995
[38]

Real-time elastic deformations of soft tissues for surgery simulation.IEEE transactions on Visualization and Computer Graphics, 5(1):62–73, 2002

Stéphane Cotin, Hervé Delingette, and Nicholas Ayache. Real-time elastic deformations of soft tissues for surgery simulation.IEEE transactions on Visualization and Computer Graphics, 5(1):62–73, 2002

2002
[39]

A moving least squares material point method with displacement discontinuity and two-way rigid body coupling.ACM Transactions on Graphics (TOG), 37(4):1–14, 2018

Yuanming Hu, Yu Fang, Ziheng Ge, Ziyin Qu, Yixin Zhu, Andre Pradhana, and Chenfanfu Jiang. A moving least squares material point method with displacement discontinuity and two-way rigid body coupling.ACM Transactions on Graphics (TOG), 37(4):1–14, 2018

2018
[40]

Position based dynamics.Journal of Visual Communication and Image Representation, 18(2):109–118, 2007

Matthias Müller, Bruno Heidelberger, Marcus Hennix, and John Ratcliff. Position based dynamics.Journal of Visual Communication and Image Representation, 18(2):109–118, 2007

2007
[41]

May, Tushar Kusnur, George Konidaris, and Laura Herlant

Sergio Orozco, Brandon B. May, Tushar Kusnur, George Konidaris, and Laura Herlant. Learning equivariant neural-augmented object dynamics from few interactions. InBeyond Rigid Worlds: Representing and Interacting with Non-Rigid Objects, 2025. URLhttps://openreview.net/forum?id=JAiJpFozaD

2025
[42]

Tenenbaum, David Held, and Chuang Gan

Xingyu Lin, Zhiao Huang, Yunzhu Li, Joshua B. Tenenbaum, David Held, and Chuang Gan. Diffskill: Skill ab- straction from differentiable physics for deformable object manipulations with tools. InInternational Conference on Learning Representations (ICLR), 2022

2022
[43]

Robocook: Long-horizon elasto-plastic object manipulation with diverse tools

Haochen Shi, Huazhe Xu, Samuel Clarke, Yunzhu Li, and Jiajun Wu. Robocook: Long-horizon elasto-plastic object manipulation with diverse tools. InConference on Robot Learning (CoRL), 2023

2023
[44]

Predicting object interactions with behavior primitives: An application in stowing tasks

Haonan Chen, Yilong Niu, Kaiwen Hou, Shuijing Liu, Yixuan Wang, Yunzhu Li, and Katherine Driggs-Campbell. Predicting object interactions with behavior primitives: An application in stowing tasks. InConference on Robot Learning (CoRL), 2023

2023
[45]

Defgraspsim: Physics-based simulation of grasp outcomes for 3d deformable objects

Isabella Huang, Yashraj Narang, Ruzena Bajcsy, Fabio Ramos, Tucker Hermans, and Dieter Fox. Defgraspsim: Physics-based simulation of grasp outcomes for 3d deformable objects. InIEEE International Conference on Robotics and Automation (ICRA), 2022

2022
[46]

Robotic manipulation of deformable objects: a comprehensive review.Robotic Intelligence and Automation, pages 1–16, 2026

Lijun Han and Hesheng Wang. Robotic manipulation of deformable objects: a comprehensive review.Robotic Intelligence and Automation, pages 1–16, 2026

2026
[47]

A perspective on open challenges in deformable object manipulation

Ryan Paul McKennaa and John Oyekan. A perspective on open challenges in deformable object manipulation. arXiv preprint arXiv:2602.22998, 2026

arXiv 2026
[48]

Diffusion policy: Visuomotor policy learning via action diffusion

Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. InThe International Journal of Robotics Research, 2024

2024
[49]

Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning? InICLR, 2023

Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, and Kaisheng Ma. Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning? InICLR, 2023

2023
[50]

Tiebot: Learning to knot a tie from visual demonstration through a real-to-sim-to-real approach.arXiv preprint arXiv:2407.03245, 2024

Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jichen Sun, Cewu Lu, and Lin Shao. Tiebot: Learning to knot a tie from visual demonstration through a real-to-sim-to-real approach.arXiv preprint arXiv:2407.03245, 2024

arXiv 2024
[51]

Robotic assembly of deformable linear objects via curriculum reinforcement learning.IEEE Robotics and Automation Letters, 2025

Kai Wu, Rongkang Chen, Qi Chen, and Weihua Li. Robotic assembly of deformable linear objects via curriculum reinforcement learning.IEEE Robotics and Automation Letters, 2025

2025
[52]

Checheng Yu, Chonghao Sima, Gangcheng Jiang, Hai Zhang, Haoguang Mai, Hongyang Li, Huijie Wang, Jin Chen, Kaiyang Wu, Li Chen, Lirui Zhao, Modi Shi, Ping Luo, Qingwen Bu, Shijia Peng, Tianyu Li, and Yibo Yuan.χ 0: Resource-aware robust manipulation via taming distributional inconsistencies.arXiv preprint arXiv:2602.09021, 2026

arXiv 2026
[53]

Deep imitation learning of sequential fabric smoothing from an algorithmic supervisor

Daniel Seita, Aditya Ganapathi, Ryan Hoque, Minho Hwang, Edward Cen, Ajay Kumar Tanwani, Ashwin Balakrishna, Brijen Thananjeyan, Jeffrey Ichnowski, Nawid Jamali, et al. Deep imitation learning of sequential fabric smoothing from an algorithmic supervisor. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

2020
[54]

Fabricflownet: Bimanual cloth manipulation with a flow-based policy

Thomas Weng, Sujay Bajracharya, Yufei Wang, Khush Agrawal, and David Held. Fabricflownet: Bimanual cloth manipulation with a flow-based policy. InConference on Robot Learning (CoRL), 2022

2022
[55]

Dexgarmentlab: Dexterous garment manipulation environment with generalizable policy

Yuran Wang, Ruihai Wu, Yue Chen, Jiarui Wang, Jiaqi Liang, Ziyu Zhu, Haoran Geng, Jitendra Malik, Pieter Abbeel, and Hao Dong. Dexgarmentlab: Dexterous garment manipulation environment with generalizable policy. arXiv preprint arXiv:2505.11032, 2025

arXiv 2025
[56]

Flingbot: The unreasonable effectiveness of dynamic manipulation for cloth unfolding

Huy Ha and Shuran Song. Flingbot: The unreasonable effectiveness of dynamic manipulation for cloth unfolding. InConference on Robot Learning (CoRL), pages 24–33. PMLR, 2022

2022
[57]

Phystwin: Physics- informed reconstruction and simulation of deformable objects from videos.ICCV, 2025

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, and Yunzhu Li. Phystwin: Physics- informed reconstruction and simulation of deformable objects from videos.ICCV, 2025

2025
[58]

Softgym: Benchmarking deep reinforcement learning for deformable object manipulation

Xingyu Lin, Yufei Wang, Jake Olkin, and David Held. Softgym: Benchmarking deep reinforcement learning for deformable object manipulation. InConference on Robot Learning, 2020

2020
[59]

Taichi: a language for high-performance computation on spatially sparse data structures.ACM Transactions on Graphics (TOG), 38 (6):1–16, 2019

Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand. Taichi: a language for high-performance computation on spatially sparse data structures.ACM Transactions on Graphics (TOG), 38 (6):1–16, 2019

2019
[60]

Warp: A high-performance python framework for gpu simulation and graphics

Miles Macklin. Warp: A high-performance python framework for gpu simulation and graphics. InNVIDIA GPU Technology Conference (GTC), volume 3, 2022

2022
[61]

Interndata-a1: Pioneering high-fidelity synthetic data for pre-training generalist policy.arXiv preprint arXiv:2511.16651, 2025

Yang Tian, Yuyin Yang, Yiman Xie, Zetao Cai, Xu Shi, Ning Gao, Hangxu Liu, Xuekun Jiang, Zherui Qiu, Feng Yuan, et al. Interndata-a1: Pioneering high-fidelity synthetic data for pre-training generalist policy.arXiv preprint arXiv:2511.16651, 2025

arXiv 2025
[62]

Learning from demonstrations through the use of non-rigid registration

John Schulman, Jonathan Ho, Cameron Lee, and Pieter Abbeel. Learning from demonstrations through the use of non-rigid registration. InRobotics Research: The 16th International Symposium ISRR, pages 339–354. Springer, 2016

2016
[63]

Learning fine-grained bimanual manipulation with low-cost hardware.arXiv preprint, 2023

Tony Z Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware.arXiv preprint, 2023

2023
[64]

Diffusion policy: Visuomotor policy learning via action diffusion.The International Journal of Robotics Research, 2023

Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion.The International Journal of Robotics Research, 2023

2023
[65]

Smolvla: A vision-language-action model for affordable and efficient robotics.arXiv preprint, 2025

Mustafa Shukor, Dana Aubakirova, Francesco Capuano, Pepijn Kooijmans, Steven Palma, Adil Zouitine, Michel Aractingi, Caroline Pascal, Martino Russi, Andres Marafioti, et al. Smolvla: A vision-language-action model for affordable and efficient robotics.arXiv preprint, 2025. Appendix A State Augmentation Details A.1 Formal Assumption Our approach relies on ...

2025

[1] [1]

pi0: A vision-language-action flow model for general robot control

Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. pi0: A vision-language-action flow model for general robot control. arXiv preprint, 2024

2024

[2] [2]

pi0.5: a vision-language-action model with open-world generalization.arXiv preprint, 2025

Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Es- mail, Michael Equi, Chelsea Finn, Niccolo Fusai, et al. pi0.5: a vision-language-action model with open-world generalization.arXiv preprint, 2025

2025

[3] [3]

Openvla: An open-source vision-language-action model.arXiv preprint, 2024

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. Openvla: An open-source vision-language-action model.arXiv preprint, 2024

2024

[4] [4]

Gr00t n1: An open foundation model for generalist humanoid robots

Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, et al. Gr00t n1: An open foundation model for generalist humanoid robots. arXiv preprint, 2025

2025

[5] [5]

Internvla-m1: A spatially guided vision-language-action framework for generalist robot policy

Xinyi Chen, Yilun Chen, Yanwei Fu, Ning Gao, Jiaya Jia, Weiyang Jin, Hao Li, Yao Mu, Jiangmiao Pang, Yu Qiao, et al. Internvla-m1: A spatially guided vision-language-action framework for generalist robot policy. arXiv preprint arXiv:2510.13778, 2025

Pith/arXiv arXiv 2025

[6] [6]

Dreamvla: A vision-language-action model dreamed with comprehensive world knowledge.arXiv preprint, 2025

Wenyao Zhang, Hongsi Liu, Zekun Qi, Yunnan Wang, Xinqiang Yu, Jiazhao Zhang, Runpei Dong, Jiawei He, He Wang, Zhizheng Zhang, et al. Dreamvla: A vision-language-action model dreamed with comprehensive world knowledge.arXiv preprint, 2025

2025

[7] [7]

Disentangled robot learning via separate forward and inverse dynamics pretraining.arXiv preprint arXiv:2604.16391, 2026

Wenyao Zhang, Bozhou Zhang, Zekun Qi, Wenjun Zeng, Xin Jin, and Li Zhang. Disentangled robot learning via separate forward and inverse dynamics pretraining.arXiv preprint arXiv:2604.16391, 2026

Pith/arXiv arXiv 2026

[8] [8]

Vla-jepa: Enhancing vision-language-action model with latent world model.arXiv preprint arXiv:2602.10098, 2026

Jingwen Sun, Wenyao Zhang, Zekun Qi, Shaojie Ren, Zezhi Liu, Hanxin Zhu, Guangzhong Sun, Xin Jin, and Zhibo Chen. Vla-jepa: Enhancing vision-language-action model with latent world model.arXiv preprint arXiv:2602.10098, 2026

arXiv 2026

[9] [9]

Discrete diffusion vla: Bringing discrete diffusion to action decoding in vision- language-action policies.arXiv preprint, 2025

Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Liuao Pei, Xiaokang Yang, Jiangmiao Pang, Yao Mu, and Ping Luo. Discrete diffusion vla: Bringing discrete diffusion to action decoding in vision- language-action policies.arXiv preprint, 2025

2025

[10] [10]

Robotwin: Dual-arm robot benchmark with generative digital twins (early version)

Yao Mu, Tianxing Chen, Shijia Peng, Zanxin Chen, Zeyu Gao, Yude Zou, Lunkai Lin, Zhiqiang Xie, and Ping Luo. Robotwin: Dual-arm robot benchmark with generative digital twins (early version). InECCV, 2025

2025

[11] [11]

Mimicgen: A data generation system for scalable robot learning using human demonstrations

Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan, Yuke Zhu, and Dieter Fox. Mimicgen: A data generation system for scalable robot learning using human demonstrations. In Conference on Robot Learning, pages 1820–1864. PMLR, 2023

2023

[12] [12]

Demogen: Syn- thetic demonstration generation for data-efficient visuomotor policy learning.arXiv preprint arXiv:2502.16932, 2025

Zhengrong Xue, Shuying Deng, Zhenyang Chen, Yixuan Wang, Zhecheng Yuan, and Huazhe Xu. Demogen: Syn- thetic demonstration generation for data-efficient visuomotor policy learning.arXiv preprint arXiv:2502.16932, 2025

arXiv 2025

[13] [13]

Noveldemonstra- tion generation with gaussian splatting enables robust one-shot manipulation.arXiv preprint arXiv:2504.13175, 2025

SizheYang, WenyeYu, JiaZeng, JunLv, KeruiRen, CewuLu, DahuaLin, andJiangmiaoPang. Noveldemonstra- tion generation with gaussian splatting enables robust one-shot manipulation.arXiv preprint arXiv:2504.13175, 2025

arXiv 2025

[14] [14]

Egodemogen: Novel egocentric demonstration generation enables viewpoint-robust manipulation.arXiv preprint arXiv:2509.22578, 2025

Yuan Xu, Jiabing Yang, Xiaofeng Wang, Yixiang Chen, Zheng Zhu, Bowen Fang, Guan Huang, Xinze Chen, Yun Ye, Qiang Zhang, et al. Egodemogen: Novel egocentric demonstration generation enables viewpoint-robust manipulation.arXiv preprint arXiv:2509.22578, 2025

arXiv 2025

[15] [15]

Softmimic- gen: A data generation system for scalable robot learning in deformable object manipulation.arXiv preprint arXiv:2603.25725, 2026

Masoud Moghani, Mahdi Azizian, Animesh Garg, Yuke Zhu, Sean Huver, and Ajay Mandlekar. Softmimic- gen: A data generation system for scalable robot learning in deformable object manipulation.arXiv preprint arXiv:2603.25725, 2026

arXiv 2026

[16] [16]

Sim1: Physics-aligned simulator as zero-shot data scaler in deformable worlds

Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou, Hui Wang, Baole Fang, Yang Tian, Mulin Yu, Qiaojun Yu, et al. Sim1: Physics-aligned simulator as zero-shot data scaler in deformable worlds. arXiv preprint arXiv:2604.08544, 2026

Pith/arXiv arXiv 2026

[17] [17]

Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey.The International Journal of Robotics Research, 37(7):688–716, 2018

Jose Sanchez, Juan-Antonio Corrales, Belhassen-Chedli Bouzgarrou, and Youcef Mezouar. Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey.The International Journal of Robotics Research, 37(7):688–716, 2018

2018

[18] [18]

Modeling, learning, perception, and control methods for deformable object manipulation.Science Robotics, 6(54):eabd8803, 2021

Hang Yin, Anastasia Varava, and Danica Kragic. Modeling, learning, perception, and control methods for deformable object manipulation.Science Robotics, 6(54):eabd8803, 2021

2021

[19] [19]

Real-to-sim robot policy evaluation with gaussian splatting simulation of soft-body interactions.arXiv preprint arXiv:2511.04665, 2025

Kaifeng Zhang, Shuo Sha, Hanxiao Jiang, Matthew Loper, Hyunjong Song, Guangyan Cai, Zhuo Xu, Xiaochen Hu, Changxi Zheng, and Yunzhu Li. Real-to-sim robot policy evaluation with gaussian splatting simulation of soft-body interactions.arXiv preprint arXiv:2511.04665, 2025

arXiv 2025

[20] [20]

High-fidelity simulated data generation for real-world zero-shot robotic manipulation learning with gaussian splatting.IEEE Robotics and Automation Letters, 11(5):5310–5317, 2026

Haoyu Zhao, Cheng Zeng, Linghao Zhuang, Yaxi Zhao, Shengke Xue, Hao Wang, Xingyue Zhao, Zhongyu Li, Kehan Li, Siteng Huang, Mingxiu Chen, Xin Li, Deli Zhao, and Hua Zou. High-fidelity simulated data generation for real-world zero-shot robotic manipulation learning with gaussian splatting.IEEE Robotics and Automation Letters, 11(5):5310–5317, 2026. doi: 10...

work page doi:10.1109/lra.2026.3671535 2026

[21] [21]

Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J. Davison. RLBench: The Robot Learning Benchmark & Learning Environment.arXiv preprint arXiv:1909.12271, 2019

arXiv 1909

[22] [22]

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, and Chuang Gan. RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation. InInternational Conference on Machine Learning, 2024

2024

[23] [23]

Rl-driven data generation for robust vision-based dexterous grasping.arXiv preprint arXiv:2504.18084, 2025

Atsushi Kanehira, Naoki Wake, Kazuhiro Sasabuchi, Jun Takamatsu, and Katsushi Ikeuchi. Rl-driven data generation for robust vision-based dexterous grasping.arXiv preprint arXiv:2504.18084, 2025

arXiv 2025

[24] [24]

Semantically controllable augmentations for generalizable robot learning.The International Journal of Robotics Research, 44(10-11):1705–1726, 2025

Zoey Chen, Zhao Mandi, Homanga Bharadhwaj, Mohit Sharma, Shuran Song, Abhishek Gupta, and Vikash Kumar. Semantically controllable augmentations for generalizable robot learning.The International Journal of Robotics Research, 44(10-11):1705–1726, 2025

2025

[25] [25]

Gigabrain-0: A world model-powered vision-language-action model

GigaAI. Gigabrain-0: A world model-powered vision-language-action model. 2025. URLhttps://arxiv.org/ abs/2510.19430

arXiv 2025

[26] [26]

Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning

Zhenyu Jiang, Yuqi Xie, Kevin Lin, Zhenjia Xu, Weikang Wan, Ajay Mandlekar, Linxi Jim Fan, and Yuke Zhu. Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 16923–16930. IEEE, 2025

2025

[27] [27]

Dreamgen: Unlocking generalization in robot learning through neural trajectories.arXiv preprint, 2025

Joel Jang, Seonghyeon Ye, Zongyu Lin, Jiannan Xiang, Johan Bjorck, Yu Fang, Fengyuan Hu, Spencer Huang, Kaushil Kundalia, Yen-Chen Lin, et al. Dreamgen: Unlocking generalization in robot learning through neural trajectories.arXiv preprint, 2025

2025

[28] [28]

Manipdreamer3d: Synthesizing plausible robotic manipulation video with occupancy-aware 3d trajectory

Ying Li, Xiaobao Wei, Xiaowei Chi, Yuming Li, Zhongyu Zhao, Hao Wang, Ningning Ma, Ming Lu, and Sirui Han. Manipdreamer3d: Synthesizing plausible robotic manipulation video with occupancy-aware 3d trajectory. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 6644–6652, 2026

2026

[29] [29]

Oxe-auge: Alarge-scalerobotaugmentationofoxeforscalingcross-embodiment policy learning.arXiv preprint arXiv:2512.13100, 2025

Guanhua Ji, Harsha Polavaram, Lawrence Yunliang Chen, Sandeep Bajamahal, Zehan Ma, Simeon Adebola, ChenfengXu, andKenGoldberg. Oxe-auge: Alarge-scalerobotaugmentationofoxeforscalingcross-embodiment policy learning.arXiv preprint arXiv:2512.13100, 2025

arXiv 2025

[30] [30]

Robovip: Multi-view video generation with visual identity prompting augments robot manipulation.arXiv preprint arXiv:2601.05241, 2026

Boyang Wang, Haoran Zhang, Shujie Zhang, Jinkun Hao, Mingda Jia, Qi Lv, Yucheng Mao, Zhaoyang Lyu, Jia Zeng, Xudong Xu, et al. Robovip: Multi-view video generation with visual identity prompting augments robot manipulation.arXiv preprint arXiv:2601.05241, 2026

arXiv 2026

[31] [31]

One demo is worth a thousand trajectories: Action-view augmentation for visuomotor policies

Chuer Pan, Litian Liang, Dominik Bauer, Eric Cousineau, Benjamin Burchfiel, Siyuan Feng, and Shuran Song. One demo is worth a thousand trajectories: Action-view augmentation for visuomotor policies. In9th Annual Conference on Robot Learning, 2025

2025

[32] [32]

Real2render2real: Scaling robot data without dynamics simulation or robot hardware,

Justin Yu, Letian Fu, Huang Huang, Karim El-Refai, Rares Andrei Ambrus, Richard Cheng, Muhammad Zubair Irshad, and Ken Goldberg. Real2render2real: Scaling robot data without dynamics simulation or robot hardware,

[33] [33]

URLhttps://arxiv.org/abs/2505.09601

arXiv

[34] [34]

Real2edit2real: Generating robotic demonstrations via a 3d control interface.arXiv preprint arXiv:2512.19402, 2025

Yujie Zhao, Hongwei Fan, Di Chen, Shengcong Chen, Liliang Chen, Xiaoqi Li, Guanghui Ren, and Hao Dong. Real2edit2real: Generating robotic demonstrations via a 3d control interface.arXiv preprint arXiv:2512.19402, 2025

arXiv 2025

[35] [35]

Skillmimicgen: Automated demonstration gener- ation for efficient skill learning and deployment.arXiv preprint arXiv:2410.18907, 2024

Caelan Garrett, Ajay Mandlekar, Bowen Wen, and Dieter Fox. Skillmimicgen: Automated demonstration gener- ation for efficient skill learning and deployment.arXiv preprint arXiv:2410.18907, 2024

arXiv 2024

[36] [36]

3d gaussian splatting for real-time radiance field rendering.ACM Trans

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1, 2023

2023

[37] [37]

Deformation constraints in a mass-spring model to describe rigid cloth behaviour

Xavier Provot et al. Deformation constraints in a mass-spring model to describe rigid cloth behaviour. InGraphics interface, pages 147–147. Canadian Information Processing Society, 1995

1995

[38] [38]

Real-time elastic deformations of soft tissues for surgery simulation.IEEE transactions on Visualization and Computer Graphics, 5(1):62–73, 2002

Stéphane Cotin, Hervé Delingette, and Nicholas Ayache. Real-time elastic deformations of soft tissues for surgery simulation.IEEE transactions on Visualization and Computer Graphics, 5(1):62–73, 2002

2002

[39] [39]

A moving least squares material point method with displacement discontinuity and two-way rigid body coupling.ACM Transactions on Graphics (TOG), 37(4):1–14, 2018

Yuanming Hu, Yu Fang, Ziheng Ge, Ziyin Qu, Yixin Zhu, Andre Pradhana, and Chenfanfu Jiang. A moving least squares material point method with displacement discontinuity and two-way rigid body coupling.ACM Transactions on Graphics (TOG), 37(4):1–14, 2018

2018

[40] [40]

Position based dynamics.Journal of Visual Communication and Image Representation, 18(2):109–118, 2007

Matthias Müller, Bruno Heidelberger, Marcus Hennix, and John Ratcliff. Position based dynamics.Journal of Visual Communication and Image Representation, 18(2):109–118, 2007

2007

[41] [41]

May, Tushar Kusnur, George Konidaris, and Laura Herlant

Sergio Orozco, Brandon B. May, Tushar Kusnur, George Konidaris, and Laura Herlant. Learning equivariant neural-augmented object dynamics from few interactions. InBeyond Rigid Worlds: Representing and Interacting with Non-Rigid Objects, 2025. URLhttps://openreview.net/forum?id=JAiJpFozaD

2025

[42] [42]

Tenenbaum, David Held, and Chuang Gan

Xingyu Lin, Zhiao Huang, Yunzhu Li, Joshua B. Tenenbaum, David Held, and Chuang Gan. Diffskill: Skill ab- straction from differentiable physics for deformable object manipulations with tools. InInternational Conference on Learning Representations (ICLR), 2022

2022

[43] [43]

Robocook: Long-horizon elasto-plastic object manipulation with diverse tools

Haochen Shi, Huazhe Xu, Samuel Clarke, Yunzhu Li, and Jiajun Wu. Robocook: Long-horizon elasto-plastic object manipulation with diverse tools. InConference on Robot Learning (CoRL), 2023

2023

[44] [44]

Predicting object interactions with behavior primitives: An application in stowing tasks

Haonan Chen, Yilong Niu, Kaiwen Hou, Shuijing Liu, Yixuan Wang, Yunzhu Li, and Katherine Driggs-Campbell. Predicting object interactions with behavior primitives: An application in stowing tasks. InConference on Robot Learning (CoRL), 2023

2023

[45] [45]

Defgraspsim: Physics-based simulation of grasp outcomes for 3d deformable objects

Isabella Huang, Yashraj Narang, Ruzena Bajcsy, Fabio Ramos, Tucker Hermans, and Dieter Fox. Defgraspsim: Physics-based simulation of grasp outcomes for 3d deformable objects. InIEEE International Conference on Robotics and Automation (ICRA), 2022

2022

[46] [46]

Robotic manipulation of deformable objects: a comprehensive review.Robotic Intelligence and Automation, pages 1–16, 2026

Lijun Han and Hesheng Wang. Robotic manipulation of deformable objects: a comprehensive review.Robotic Intelligence and Automation, pages 1–16, 2026

2026

[47] [47]

A perspective on open challenges in deformable object manipulation

Ryan Paul McKennaa and John Oyekan. A perspective on open challenges in deformable object manipulation. arXiv preprint arXiv:2602.22998, 2026

arXiv 2026

[48] [48]

Diffusion policy: Visuomotor policy learning via action diffusion

Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. InThe International Journal of Robotics Research, 2024

2024

[49] [49]

Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning? InICLR, 2023

Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, and Kaisheng Ma. Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning? InICLR, 2023

2023

[50] [50]

Tiebot: Learning to knot a tie from visual demonstration through a real-to-sim-to-real approach.arXiv preprint arXiv:2407.03245, 2024

Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jichen Sun, Cewu Lu, and Lin Shao. Tiebot: Learning to knot a tie from visual demonstration through a real-to-sim-to-real approach.arXiv preprint arXiv:2407.03245, 2024

arXiv 2024

[51] [51]

Robotic assembly of deformable linear objects via curriculum reinforcement learning.IEEE Robotics and Automation Letters, 2025

Kai Wu, Rongkang Chen, Qi Chen, and Weihua Li. Robotic assembly of deformable linear objects via curriculum reinforcement learning.IEEE Robotics and Automation Letters, 2025

2025

[52] [52]

Checheng Yu, Chonghao Sima, Gangcheng Jiang, Hai Zhang, Haoguang Mai, Hongyang Li, Huijie Wang, Jin Chen, Kaiyang Wu, Li Chen, Lirui Zhao, Modi Shi, Ping Luo, Qingwen Bu, Shijia Peng, Tianyu Li, and Yibo Yuan.χ 0: Resource-aware robust manipulation via taming distributional inconsistencies.arXiv preprint arXiv:2602.09021, 2026

arXiv 2026

[53] [53]

Deep imitation learning of sequential fabric smoothing from an algorithmic supervisor

Daniel Seita, Aditya Ganapathi, Ryan Hoque, Minho Hwang, Edward Cen, Ajay Kumar Tanwani, Ashwin Balakrishna, Brijen Thananjeyan, Jeffrey Ichnowski, Nawid Jamali, et al. Deep imitation learning of sequential fabric smoothing from an algorithmic supervisor. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

2020

[54] [54]

Fabricflownet: Bimanual cloth manipulation with a flow-based policy

Thomas Weng, Sujay Bajracharya, Yufei Wang, Khush Agrawal, and David Held. Fabricflownet: Bimanual cloth manipulation with a flow-based policy. InConference on Robot Learning (CoRL), 2022

2022

[55] [55]

Dexgarmentlab: Dexterous garment manipulation environment with generalizable policy

Yuran Wang, Ruihai Wu, Yue Chen, Jiarui Wang, Jiaqi Liang, Ziyu Zhu, Haoran Geng, Jitendra Malik, Pieter Abbeel, and Hao Dong. Dexgarmentlab: Dexterous garment manipulation environment with generalizable policy. arXiv preprint arXiv:2505.11032, 2025

arXiv 2025

[56] [56]

Flingbot: The unreasonable effectiveness of dynamic manipulation for cloth unfolding

Huy Ha and Shuran Song. Flingbot: The unreasonable effectiveness of dynamic manipulation for cloth unfolding. InConference on Robot Learning (CoRL), pages 24–33. PMLR, 2022

2022

[57] [57]

Phystwin: Physics- informed reconstruction and simulation of deformable objects from videos.ICCV, 2025

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, and Yunzhu Li. Phystwin: Physics- informed reconstruction and simulation of deformable objects from videos.ICCV, 2025

2025

[58] [58]

Softgym: Benchmarking deep reinforcement learning for deformable object manipulation

Xingyu Lin, Yufei Wang, Jake Olkin, and David Held. Softgym: Benchmarking deep reinforcement learning for deformable object manipulation. InConference on Robot Learning, 2020

2020

[59] [59]

Taichi: a language for high-performance computation on spatially sparse data structures.ACM Transactions on Graphics (TOG), 38 (6):1–16, 2019

Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand. Taichi: a language for high-performance computation on spatially sparse data structures.ACM Transactions on Graphics (TOG), 38 (6):1–16, 2019

2019

[60] [60]

Warp: A high-performance python framework for gpu simulation and graphics

Miles Macklin. Warp: A high-performance python framework for gpu simulation and graphics. InNVIDIA GPU Technology Conference (GTC), volume 3, 2022

2022

[61] [61]

Interndata-a1: Pioneering high-fidelity synthetic data for pre-training generalist policy.arXiv preprint arXiv:2511.16651, 2025

Yang Tian, Yuyin Yang, Yiman Xie, Zetao Cai, Xu Shi, Ning Gao, Hangxu Liu, Xuekun Jiang, Zherui Qiu, Feng Yuan, et al. Interndata-a1: Pioneering high-fidelity synthetic data for pre-training generalist policy.arXiv preprint arXiv:2511.16651, 2025

arXiv 2025

[62] [62]

Learning from demonstrations through the use of non-rigid registration

John Schulman, Jonathan Ho, Cameron Lee, and Pieter Abbeel. Learning from demonstrations through the use of non-rigid registration. InRobotics Research: The 16th International Symposium ISRR, pages 339–354. Springer, 2016

2016

[63] [63]

Learning fine-grained bimanual manipulation with low-cost hardware.arXiv preprint, 2023

Tony Z Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware.arXiv preprint, 2023

2023

[64] [64]

Diffusion policy: Visuomotor policy learning via action diffusion.The International Journal of Robotics Research, 2023

Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion.The International Journal of Robotics Research, 2023

2023

[65] [65]

Smolvla: A vision-language-action model for affordable and efficient robotics.arXiv preprint, 2025

Mustafa Shukor, Dana Aubakirova, Francesco Capuano, Pepijn Kooijmans, Steven Palma, Adil Zouitine, Michel Aractingi, Caroline Pascal, Martino Russi, Andres Marafioti, et al. Smolvla: A vision-language-action model for affordable and efficient robotics.arXiv preprint, 2025. Appendix A State Augmentation Details A.1 Formal Assumption Our approach relies on ...

2025