MimicIK: Real-Time Generative Inverse Kinematics from Teleoperation with FK Consistency

Chengsi Yao; Fan Feng; Ge Wang; Jiahao Yang; Shenhao Yan; Yatong Han; Yiming Zhao; Zhixin Mai

arxiv: 2606.15148 · v2 · pith:VOAJQ3DAnew · submitted 2026-06-13 · 💻 cs.RO · cs.AI

Jiahao Yang, Shenhao Yan, Fan Feng, Chengsi Yao, Ge Wang, Zhixin Mai, Yiming Zhao, Yatong Han

Pith reviewed 2026-06-27 04:35 UTC · model grok-4.3

keywords: inverse kinematics, flow matching, generative models, teleoperation, robot manipulation, forward kinematics, real-time control, singularities

The pith

MimicIK uses conditional flow matching on teleoperation data plus an FK consistency loss to generate stable real-time joint commands for inverse kinematics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MimicIK as a generative method that learns joint-space motion priors directly from 8,848 human teleoperation demonstrations to solve inverse kinematics for a 6-DOF robot. It conditions a flow-matching model on the current joint state and target end-effector pose, then applies a two-step iterative refinement and a differentiable forward-kinematics loss during training to keep predicted motions physically consistent. This combination is intended to deliver both the spatial accuracy of classical solvers and the smoothness and speed required for closed-loop deployment, while avoiding the discontinuous jumps near singularities that affect numerical methods and the divergence seen in deterministic learned baselines. A reader would care because successful real-time IK from noisy demonstration data would let robots execute manipulation tasks at 20 Hz without custom singularity handling or hand-tuned solvers.

Core claim

MimicIK is a generative inverse-kinematics framework that predicts continuous delta-joint commands via conditional flow matching with a Minimal Iterative Policy backbone; an FK consistency loss is added during training on teleoperation data so that the resulting model produces accurate, smooth trajectories that remain stable near kinematic singularities and support 20 Hz closed-loop control.
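The two-step refinement named in the core claim can be pictured as a very short numerical integration of a learned velocity field. The sketch below is a generic two-step Euler sampler, not MIP's exact update rule, which this page does not specify. The names `sample_delta_q` and `oracle_field` are hypothetical, and the toy field stands in for a trained network by treating the condition as the desired delta-joint vector (in the paper the condition is the current joint state plus current and target poses).

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical signature: velocity_field(x, t, condition) -> dx/dt
VelocityField = Callable[[Sequence[float], float, Sequence[float]], Vector]


def sample_delta_q(velocity_field: VelocityField,
                   x_init: Sequence[float],
                   condition: Sequence[float],
                   num_steps: int = 2) -> Vector:
    """Integrate dx/dt = v(x, t | condition) from t=0 to t=1 with Euler steps."""
    x = list(x_init)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # evaluate the field at the start of each step
        v = velocity_field(x, t, condition)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_field(x: Sequence[float], t: float,
                 condition: Sequence[float]) -> Vector:
    """Toy stand-in for a trained network: points straight at the condition."""
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, condition)]


# Assumption for illustration: the condition is the desired 6-DOF delta-q.
target_delta = [0.01, -0.02, 0.03, 0.0, 0.005, -0.01]
delta_q = sample_delta_q(oracle_field, x_init=[0.0] * 6, condition=target_delta)
```

With a field whose trajectories are straight lines, two Euler steps already land on the endpoint, which is the usual argument for why few-step sampling can be cheap. A trained field is only approximately straight, so real two-step outputs carry some integration error.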

What carries the argument

Conditional flow matching model with two-step Minimal Iterative Policy refinement, regularized by a differentiable forward-kinematics consistency loss that penalizes task-space deviation from the target pose.
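To make the mechanism concrete, here is a minimal sketch of how a flow-matching regression target and an FK consistency penalty could be combined into one loss value. It is an illustration under stated assumptions, not the authors' implementation: the arm is a toy planar 2-link chain rather than the 6-DOF robot, the interpolant is the standard linear one, the final delta-q is estimated with a single extrapolation step, and `fk_weight`, `LINK_LENGTHS`, and all function names are hypothetical. Real training would compute the same quantities in an autodiff framework so the FK term can be backpropagated.

```python
import math
import random
from typing import List, Sequence, Tuple

LINK_LENGTHS = (0.3, 0.25)  # metres; toy planar arm, not the paper's robot


def forward_kinematics(q: Sequence[float]) -> Tuple[float, float]:
    """End-effector (x, y) of a planar 2-link arm."""
    l1, l2 = LINK_LENGTHS
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return x, y


def flow_matching_pair(noise: Sequence[float], delta_q: Sequence[float],
                       t: float) -> Tuple[List[float], List[float]]:
    """Linear interpolant x_t and its regression target (delta_q - noise)."""
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, delta_q)]
    v_target = [d - n for n, d in zip(noise, delta_q)]
    return x_t, v_target


def training_loss(v_pred, x_t, v_target, t, q_curr, x_tgt, fk_weight=1.0):
    """Flow-matching MSE plus an FK consistency penalty in task space."""
    fm = sum((p - g) ** 2 for p, g in zip(v_pred, v_target)) / len(v_pred)
    # One-step estimate of the final delta-q implied by the predicted velocity.
    delta_hat = [x + (1.0 - t) * p for x, p in zip(x_t, v_pred)]
    q_next = [q + d for q, d in zip(q_curr, delta_hat)]
    ee = forward_kinematics(q_next)
    fk = sum((e - g) ** 2 for e, g in zip(ee, x_tgt))
    return fm + fk_weight * fk, fm, fk


random.seed(0)
q_curr = [0.4, 0.9]
delta_q = [0.05, -0.03]  # demonstrated joint displacement (toy value)
x_tgt = forward_kinematics([q + d for q, d in zip(q_curr, delta_q)])
noise = [random.gauss(0.0, 1.0) for _ in delta_q]
x_t, v_target = flow_matching_pair(noise, delta_q, t=0.3)

# A perfect velocity prediction drives both terms to (numerically) zero.
total_ok, fm_ok, fk_ok = training_loss(v_target, x_t, v_target, 0.3, q_curr, x_tgt)
# A wrong prediction is penalised in joint space and in task space.
total_bad, fm_bad, fk_bad = training_loss([0.0, 0.0], x_t, v_target, 0.3, q_curr, x_tgt)
```

The point of the second term is that it scores the prediction by where the end-effector would end up, so two joint-space errors of equal size are not treated as equally bad.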

If this is right

The model reaches 4.65 mm mean position error and 92.01 percent success within 10 mm on held-out teleoperation trajectories.
Inference latency drops to 6.74 ms, enabling 20 Hz real-time control on deployment hardware.
Trajectory spike rate falls to 7.99 percent, producing smoother motion than a UNet diffusion baseline.
The generative approach remains stable near singular configurations where deterministic MLP baselines diverge.
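The three headline numbers above are simple aggregates, and a small sketch shows how they could be computed from logged predictions. The definitions here are assumptions: this page does not state whether "within 10 mm" is inclusive, nor how the paper defines a spike, so `spike_rate` uses a hypothetical per-step joint-jump threshold (`max_step_rad`). The sample values are synthetic and exist only to exercise the functions.

```python
import math
from typing import List, Sequence


def position_errors_mm(pred_xyz: Sequence[Sequence[float]],
                       target_xyz: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean end-effector error per sample, converted from metres to mm."""
    return [1000.0 * math.dist(p, g) for p, g in zip(pred_xyz, target_xyz)]


def mean_error_mm(errors: Sequence[float]) -> float:
    return sum(errors) / len(errors)


def success_rate(errors: Sequence[float], threshold_mm: float = 10.0) -> float:
    """Percent of samples whose error is at most threshold_mm."""
    return 100.0 * sum(e <= threshold_mm for e in errors) / len(errors)


def spike_rate(joint_traj: Sequence[Sequence[float]],
               max_step_rad: float = 0.2) -> float:
    """Percent of consecutive steps where any joint moves more than max_step_rad."""
    steps = list(zip(joint_traj, joint_traj[1:]))
    spikes = sum(
        any(abs(b - a) > max_step_rad for a, b in zip(q0, q1))
        for q0, q1 in steps
    )
    return 100.0 * spikes / len(steps)


# Synthetic inputs, not data from the paper.
pred = [(0.100, 0.0, 0.0), (0.0, 0.215, 0.0)]
target = [(0.104, 0.0, 0.0), (0.0, 0.200, 0.0)]
errors = position_errors_mm(pred, target)
traj = [[0.0, 0.0], [0.05, 0.0], [0.5, 0.0], [0.55, 0.0]]
```

Whatever threshold the paper uses for a spike, the choice matters: a stricter `max_step_rad` raises the reported rate for every method, so comparisons are only meaningful at a fixed definition.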

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Because the method is trained only on demonstration data, new robot geometries could be handled by collecting fresh teleoperation traces without changing the loss or architecture.
The observed stability near singularities could let higher-level task planners ignore explicit singularity avoidance, simplifying overall motion planning pipelines.
The two-step MIP refinement might transfer to other conditional generative robotics problems such as trajectory forecasting or contact-rich control.

Load-bearing premise

The FK consistency loss applied on the training demonstrations will continue to guarantee physically consistent, non-divergent joint commands when the two-step refinement is run in closed-loop control near kinematic singularities.

What would settle it

Run the trained MimicIK model in real-time closed-loop control on the 6-DOF robot while driving the end-effector toward a known kinematic singularity and check whether joint commands remain continuous and the end-effector converges without spikes or divergence.
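A log-based audit of such a trial could look like the sketch below. It assumes a toy planar 2-link arm, for which the manipulability measure reduces to |l1 · l2 · sin(q2)| and vanishes when the arm is fully stretched or folded; a 6-DOF robot would need the full Jacobian. The thresholds and the names `manipulability`, `max_joint_step`, and `audit` are hypothetical, and the trajectories are synthetic.

```python
import math
from typing import Dict, List, Sequence

L1, L2 = 0.3, 0.25  # metres; toy planar arm, not the paper's robot


def manipulability(q: Sequence[float]) -> float:
    """|det J| for a planar 2-link arm; zero at a kinematic singularity."""
    return abs(L1 * L2 * math.sin(q[1]))


def max_joint_step(traj: Sequence[Sequence[float]]) -> float:
    """Largest single-joint change between consecutive control steps."""
    return max(
        abs(b - a)
        for q0, q1 in zip(traj, traj[1:])
        for a, b in zip(q0, q1)
    )


def audit(traj: Sequence[Sequence[float]],
          step_limit_rad: float = 0.1,
          singular_threshold: float = 0.005) -> Dict[str, bool]:
    """Did the run actually reach a near-singular pose, and stay continuous?"""
    visited = any(manipulability(q) < singular_threshold for q in traj)
    continuous = max_joint_step(traj) <= step_limit_rad
    return {"visited_singularity": visited, "continuous": continuous}


# Synthetic log: the elbow straightens smoothly toward q2 = 0.
smooth: List[List[float]] = [[0.2, 0.5 - 0.05 * k] for k in range(11)]
# Same log with an abrupt branch switch appended.
flipped = smooth + [[0.2 + math.pi, smooth[-1][1]]]

smooth_report = audit(smooth)
flipped_report = audit(flipped)
```

Reporting both flags together guards against a vacuous pass: a trajectory that never approaches a singularity is trivially continuous and says nothing about the claim.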

Figures

Figures reproduced from arXiv: 2606.15148 by Chengsi Yao, Fan Feng, Ge Wang, Jiahao Yang, Shenhao Yan, Yatong Han, Yiming Zhao, Zhixin Mai.

**Figure 1.** Paradigm Shift in EEF-Level Robot Data Scaling. Left: Conventional robot-specific IK conversion fragments shared EEF demonstrations into isolated joint-space datasets. Right: MimicIK keeps VLA training in embodiment-agnostic EEF space and performs robot-specific real-time IK adaptation only at deployment.

**Figure 2.** Architecture of the MimicIK Framework. The model receives an observation history comprising the current joint configuration (q_curr), current end-effector pose (x_curr), and target end-effector pose (x_tgt). This condition is injected into a pose-conditioned delta-joint flow generator (parameterized by a SuDeepDiT Transformer), which decodes the continuous joint displacement (Δq) via a highly efficient two-s…

**Figure 3.** Hardware Setup and Dataset Distribution. (a) The AIRBOT dual-arm platform used for human teleoperation data collection. (b) The 3D heatmap visualizes the extensive spatial distribution of the 8,848 real-robot trajectories. Color intensity indicates orientation diversity within each voxel, highlighting the rich, multi-modal human motion priors captured in our dataset.

**Figure 4.** FK consistency loss improves training stability. Across three seeds, FK regularization prevents catastrophic drift and reduces cross-seed variance by 7.5×. The Crucial Role of FK Consistency: Ablation on the FK loss reveals that its primary contribution is acting as a powerful structural regularizer that guarantees training stability (…

**Figure 5.** Qualitative Comparison during Real-Robot OOD Deployment. (Top) Traditional numerical solvers (e.g., cuRobo) fall into singularity traps near singular configurations, executing erratic wrist flips (red circles) that risk hardware collision. (Bottom) MimicIK (Ours) leverages conditional flow matching to inject human-like "self-rescuing" priors, effortlessly navigating the same trajectory with strict temporal…

**Figure 6.** MIP Singularity Recovery. MimicIK gracefully navigates out of kinematic dead-ends with strict temporal smoothness, avoiding erratic wrist flips. Physical deployment imposes stringent safety and continuity constraints that are absent in offline Cartesian evaluation. The 6-DOF AIRBOT platform used in our experiments operates at a 20 Hz closed-loop control frequency (a 50 ms cycle budget). At the hardware l…

**Figure 7.** Joint-Space Error and Smoothness Comparison. Trajectory analysis comparing our generative MimicIK against a traditional numerical solver (Pinocchio). Around control step 38, the unconstrained numerical solver triggers a catastrophic branch switch (a "wrist flip" dropping by nearly 60 radians instantaneously) to minimize local Cartesian error, which would inevitably trigger hardware E-stops. In contrast, Mi…

Original abstract

Inverse kinematics (IK) remains a critical bottleneck for real-time robot manipulation. Classical numerical solvers achieve high geometric precision but often suffer from discontinuous branch switching and unstable behavior near kinematic singularities during closed-loop deployment. Meanwhile, learned IK approaches frequently struggle to balance spatial accuracy, motion smoothness, and real-time efficiency, particularly when trained on noisy human teleoperation data. We present MimicIK, a real-time generative inverse kinematics framework that learns smooth and robust joint-space motion priors from teleoperation demonstrations through conditional flow matching. Given the current joint configuration and a target end-effector pose, MimicIK predicts continuous delta-joint commands using an efficient two-step iterative refinement process based on a Minimal Iterative Policy (MIP) backbone. To enforce physical consistency, we further introduce an FK consistency loss, a differentiable forward-kinematics regularization that penalizes task-space deviations from the target pose during training. We evaluate MimicIK on a real-world 6-DOF robot dataset containing 8,848 teleoperation demonstrations. MimicIK achieves a mean position error of 4.65 mm, a 10 mm success rate of 92.01%, and a trajectory spike rate of only 7.99%. Compared with a UNet diffusion baseline, our method improves both spatial accuracy and motion smoothness while reducing inference latency from 21.66 ms to 6.74 ms. Furthermore, unlike deterministic MLP baselines that catastrophically diverge under out-of-distribution deployment, MimicIK remains stable near singular configurations and enables robust 20 Hz real-time control on deployment hardware.
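The latency figures in the abstract are easier to read against the control cycle they must fit into. The short calculation below uses only numbers stated on this page (20 Hz control, 6.74 ms and 21.66 ms inference latency); the variable names are illustrative, and the assumption that inference is the only cost in the cycle is a simplification, since sensing, communication, and safety checks also consume part of the budget.

```python
CONTROL_RATE_HZ = 20.0
cycle_budget_ms = 1000.0 / CONTROL_RATE_HZ  # one control period in milliseconds

# Inference latencies reported in the abstract.
latency_ms = {"MimicIK": 6.74, "UNet diffusion": 21.66}

# Fraction of each control period spent on inference alone.
budget_fraction = {name: ms / cycle_budget_ms for name, ms in latency_ms.items()}

# Relative speedup of MimicIK over the diffusion baseline.
speedup = latency_ms["UNet diffusion"] / latency_ms["MimicIK"]

for name, frac in budget_fraction.items():
    print(f"{name}: {100.0 * frac:.1f}% of a {cycle_budget_ms:.0f} ms cycle")
print(f"speedup: {speedup:.2f}x")
```

Both reported latencies are below the 50 ms period, so the practical gain from the roughly 3.2x speedup is headroom for everything else in the loop rather than the difference between feasible and infeasible.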

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MimicIK applies conditional flow matching with MIP refinement and an FK consistency loss to teleop IK, delivering lower latency than diffusion baselines and claiming stability near singularities, but the closed-loop evidence is thin.

MimicIK uses conditional flow matching on teleop data with an FK consistency loss and MIP refinement to produce real-time joint commands for IK, and it reports lower latency and better smoothness than a diffusion baseline while claiming stability near singularities.

The approach is new in applying flow matching this way to delta-joint prediction for teleoperation IK, and the FK loss is a simple but useful addition to enforce consistency. The results on the 8,848-demonstration dataset look solid for the metrics they report, with clear improvements in speed and success rate.

The soft spot is the deployment stability. The abstract gives no sign that the training set has many singular configurations, and there's no mention of extra checks for closed-loop use at 20 Hz. If the generative model can still produce divergent commands when the arm approaches a singularity, the reported numbers from test splits won't tell us if it works in practice. The two-step MIP might mitigate this, but it's not shown.

This is aimed at people building real-time robot manipulation systems from human data. A reader in that area would get some useful ideas from the method and the latency gains. It deserves peer review to see the full details on the experiments and whether the stability holds up under the conditions they claim.

Referee Report

2 major / 2 minor

Summary. The paper presents MimicIK, a generative inverse kinematics framework that uses conditional flow matching on 8,848 teleoperation demonstrations to predict delta-joint commands via a two-step Minimal Iterative Policy (MIP) refinement process. An FK consistency loss is introduced as differentiable regularization to enforce task-space consistency during training. On a 6-DOF robot dataset, it reports 4.65 mm mean position error, 92.01% 10 mm success rate, 7.99% trajectory spike rate, 6.74 ms inference latency (vs. 21.66 ms for UNet diffusion), and claims stability near kinematic singularities for 20 Hz closed-loop control, unlike diverging MLP baselines.

Significance. If the quantitative claims and stability results hold under rigorous evaluation, the work could provide a practical real-time IK solution for teleoperated manipulation by showing that flow-matching priors plus FK regularization can improve accuracy, smoothness, and latency over diffusion and deterministic baselines while avoiding catastrophic divergence. The emphasis on deployment hardware and teleoperation data is a strength if the closed-loop behavior is demonstrated.

major comments (2)

[Abstract] The central claim that MimicIK 'remains stable near singular configurations and enables robust 20 Hz real-time control' rests on the FK consistency loss applied to the 8,848 demonstrations, yet no experiment, metric, or test distribution is described that evaluates closed-loop MIP refinement near singularities (e.g., no singularity-aware sampling, velocity limits, or out-of-distribution closed-loop trials). The reported metrics appear confined to in-distribution test splits, so the deployment stability claim cannot be assessed.
[Abstract] The comparisons to the UNet diffusion baseline (latency reduction from 21.66 ms to 6.74 ms, improved accuracy and smoothness) and MLP baselines (catastrophic divergence) are presented without any information on baseline implementations, training procedures, data splits, or statistical testing. Because these comparisons are load-bearing for the paper's contribution, the missing details leave the quantitative superiority claims unevaluable.

minor comments (2)

[Abstract] The acronym 'MIP' (Minimal Iterative Policy) is introduced without an initial definition or citation to prior work.
[Abstract] No mention of how singularities were identified or sampled in the dataset, which would aid reproducibility of the stability claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting areas where the abstract claims require stronger substantiation. We address each major comment below and will incorporate revisions to improve clarity and completeness.

Referee: [Abstract] The central claim that MimicIK 'remains stable near singular configurations and enables robust 20 Hz real-time control' rests on the FK consistency loss applied to the 8,848 demonstrations, yet no experiment, metric, or test distribution is described that evaluates closed-loop MIP refinement near singularities (e.g., no singularity-aware sampling, velocity limits, or out-of-distribution closed-loop trials). The reported metrics appear confined to in-distribution test splits, so the deployment stability claim cannot be assessed.

Authors: We agree that the manuscript does not describe dedicated experiments, metrics, or test distributions for closed-loop MIP refinement specifically near singularities. The stability claim derives from the generative flow-matching prior combined with the FK consistency loss, which was observed to prevent divergence in real-world 20 Hz deployment trials (unlike the MLP baseline). To substantiate this, we will add a new subsection in the Experiments section with singularity-aware sampling, velocity-limited closed-loop trials, and out-of-distribution metrics. (Revision: yes)
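For readers unfamiliar with what a velocity-limited closed-loop trial involves, the sketch below shows one common way to bound joint commands before they reach the hardware. It is a generic rate limiter, not something described in the paper: the function name `limit_delta_q`, the 2 rad/s limit, and the choice to scale the whole vector uniformly are all assumptions. The 0.05 s step corresponds to the 20 Hz control rate stated on this page.

```python
from typing import List, Sequence


def limit_delta_q(delta_q: Sequence[float],
                  max_velocity_rad_s: float,
                  dt_s: float = 0.05) -> List[float]:
    """Scale a delta-joint command so no joint exceeds the velocity limit.

    Uniform scaling keeps the direction of motion in joint space, unlike
    per-joint clipping, which would bend the commanded path.
    """
    limit = max_velocity_rad_s * dt_s  # largest allowed change per control step
    largest = max(abs(d) for d in delta_q)
    if largest <= limit:
        return list(delta_q)
    scale = limit / largest
    return [d * scale for d in delta_q]


# Hypothetical command with one joint asking for 0.2 rad in a single step.
raw = [0.2, -0.05, 0.0]
safe = limit_delta_q(raw, max_velocity_rad_s=2.0)
small = limit_delta_q([0.01, -0.02, 0.0], max_velocity_rad_s=2.0)
```

A limiter of this kind also changes what is being measured: if it activates often near singularities, smooth joint traces may reflect the limiter rather than the model, so activation counts are worth logging alongside the stability metrics.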
Referee: [Abstract] The comparisons to the UNet diffusion baseline (latency reduction from 21.66 ms to 6.74 ms, improved accuracy and smoothness) and MLP baselines (catastrophic divergence) are presented without any information on baseline implementations, training procedures, data splits, or statistical testing. Because these comparisons are load-bearing for the paper's contribution, the missing details leave the quantitative superiority claims unevaluable.

Authors: Details on the UNet diffusion and MLP baseline architectures, training procedures, and shared data splits are provided in Section 4 (Experiments). However, we acknowledge that explicit hyperparameter listings, pseudocode, and statistical testing (e.g., means and standard deviations across seeds) are insufficiently detailed. We will expand this section with the requested information and add statistical analysis to make the comparisons fully evaluable. (Revision: yes)
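The cross-seed statistics promised here are straightforward to compute once per-seed results are logged. The sketch below uses Python's `statistics` module with the sample (n − 1) estimators; the helper names and the input values are synthetic placeholders chosen for easy arithmetic, not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(values), statistics.stdev(values)


def variance_ratio(a: Sequence[float], b: Sequence[float]) -> float:
    """How many times larger the cross-seed variance of a is than that of b."""
    return statistics.variance(a) / statistics.variance(b)


# Synthetic placeholders for a per-seed metric under two configurations.
config_a = [1.0, 2.0, 3.0]
config_b = [1.9, 2.0, 2.1]

mean_a, std_a = summarize(config_a)
mean_b, std_b = summarize(config_b)
ratio = variance_ratio(config_a, config_b)
```

With only three seeds, as in the Figure 4 caption, a variance estimate is itself very noisy, so a ratio of this kind is better read as an order-of-magnitude indication than as a precise factor.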

Circularity Check

0 steps flagged

No circularity: empirical method with no derivation chain or self-referential reductions

The paper describes an applied ML framework (conditional flow matching + MIP refinement + FK consistency loss) trained on 8,848 teleoperation trajectories and evaluated on held-out test splits for position error, success rate, and spike rate. No equations, first-principles derivations, or predictive claims that reduce to fitted inputs by construction appear in the provided text. The FK loss is presented as an independent regularization term during training; reported metrics are direct empirical outcomes on the dataset rather than outputs forced by the loss definition itself. No self-citations or uniqueness theorems are invoked as load-bearing. The work is therefore self-contained as an empirical engineering result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no explicit free parameters, axioms, or invented entities. The described components (conditional flow matching, FK consistency loss, MIP backbone) are referenced at a high level without equations or implementation details that would reveal fitted values or background assumptions.

pith-pipeline@v0.9.1-grok · 5840 in / 1478 out tokens · 60083 ms · 2026-06-27T04:35:52.970338+00:00 · methodology

Reference graph

Works this paper leans on

32 extracted references · 11 linked inside Pith

[1] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, et al. RT-1: Robotics transformer for real-world control at scale. arXiv preprint arXiv:2212.06817, 2022.
[2] B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. In Conference on Robot Learning, pages 2165–2183. PMLR, 2023.
[3] C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song. Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots. arXiv preprint arXiv:2402.10329, 2024.
[4] S. Buss. Introduction to inverse kinematics with Jacobian transpose, pseudoinverse and damped least squares methods. 2004.
[5] S. Chiaverini. Singularity-robust task-priority redundancy resolution for real-time kinematic control of robot manipulators. IEEE Transactions on Robotics and Automation, 13(3):398–410, 2002.
[6] T. Z. Zhao, V. Kumar, S. Levine, and C. Finn. Learning fine-grained bimanual manipulation with low-cost hardware. arXiv preprint arXiv:2304.13705, 2023.
[7] A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. arXiv preprint arXiv:2108.03298, 2021.
[8] P. Florence, C. Lynch, A. Zeng, O. A. Ramirez, A. Wahid, L. Downs, A. Wong, J. Lee, I. Mordatch, and J. Tompson. Implicit behavioral cloning. In Conference on Robot Learning, pages 158–168. PMLR, 2022.
[9] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 44(10-11):1684–1704, 2025.
[10] T. Pearce, T. Rashid, A. Kanervisto, D. Bignell, M. Sun, R. Georgescu, S. V. Macua, S. Z. Tan, I. Momennejad, K. Hofmann, et al. Imitating human behaviour with diffusion models. arXiv preprint arXiv:2301.10677, 2023.
[11] J. Song, C. Meng, and S. Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
[12] Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.
[13] X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022.
[14] S. Dasari, F. Ebert, S. Tian, S. Nair, B. Bucher, K. Schmeckpeper, S. Singh, S. Levine, and C. Finn. RoboNet: Large-scale multi-robot learning. arXiv preprint arXiv:1910.11215, 2019.
[15] H. R. Walke, K. Black, T. Z. Zhao, Q. Vuong, C. Zheng, P. Hansen-Estruch, A. W. He, V. Myers, M. J. Kim, M. Du, et al. BridgeData V2: A dataset for robot learning at scale. In Conference on Robot Learning, pages 1723–1736. PMLR, 2023.
[16] A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, et al. DROID: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024.
[17] A. O'Neill, A. Rehman, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, et al. Open X-Embodiment: Robotic learning datasets and RT-X models: Open X-Embodiment collaboration. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 6892–6903. IEEE, 2024.
[18] O. Mees, D. Ghosh, K. Pertsch, K. Black, H. R. Walke, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, et al. Octo: An open-source generalist robot policy. In First Workshop on Vision-Language Models for Navigation and Manipulation at ICRA 2024, 2024.
[19] M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, et al. OpenVLA: An open-source vision-language-action model. arXiv preprint arXiv:2406.09246, 2024.
[20] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al. π0: A Vision-Language-Action Flow Model for General Robot Control. arXiv preprint arXiv:2410.24164, 2024.
[21] J. Yu, L. Fu, H. Huang, K. El-Refai, R. A. Ambrus, R. Cheng, M. Z. Irshad, and K. Goldberg. Real2Render2Real: Scaling robot data without dynamics simulation or robot hardware. arXiv preprint arXiv:2505.09601, 2025.
[22] P. Beeson and B. Ames. TRAC-IK: An open-source library for improved solving of generic inverse kinematics. In 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), pages 928–935. IEEE, 2015.
[23] J. Carpentier, G. Saurel, G. Buondonno, J. Mirabel, F. Lamiraux, O. Stasse, and N. Mansard. The Pinocchio C++ library: A fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives. In 2019 IEEE/SICE International Symposium on System Integration (SII), pages 614–619. IEEE, 2019.
[24] B. Sundaralingam, S. K. S. Hari, A. Fishman, C. Garrett, K. Van Wyk, V. Blukis, A. Millane, H. Oleynikova, A. Handa, F. Ramos, et al. cuRobo: Parallelized collision-free minimum-jerk robot motion generation. arXiv preprint arXiv:2310.17274, 2023.
[25] E. Coumans and Y. Bai. PyBullet, a Python module for physics simulation for games, robotics and machine learning, 2016.
[26] B. Ames, J. Morgan, and G. Konidaris. IKFlow: Generating diverse inverse kinematics solutions. IEEE Robotics and Automation Letters, 7(3):7177–7184, 2022.
[27] Z. Zhang and Z. Jiao. IKDiffuser: Fast and diverse inverse kinematics solution generation for multi-arm robotic systems. arXiv e-prints, pages arXiv–2506, 2025.
[28] M. Braun, N. Jaquier, L. Rozo, and T. Asfour. Riemannian flow matching policy for robot motion learning. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5144–5151. IEEE, 2024.
[29] C. Pan, G. Anantharaman, N.-C. Huang, C. Jin, D. Pfrommer, C. Yuan, F. Permenter, G. Qu, N. Boffi, G. Shi, et al. Much ado about noising: Dispelling the myths of generative robotic control. arXiv preprint arXiv:2512.01809, 2025.
[30] L. Mölschl, J. J. Hollenstein, and J. Piater. Differentiable forward kinematics for TensorFlow 2. arXiv preprint arXiv:2301.09954, 2023.
[31] R. Cadene, S. Alibert, A. Soare, Q. Gallouedec, A. Zouitine, S. Palma, P. Kooijmans, M. Aractingi, M. Shukor, D. Aubakirova, M. Russi, F. Capuano, C. Pascal, J. Choghari, J. Moss, and T. Wolf. LeRobot: State-of-the-art machine learning for real-world robotics in PyTorch. https://github.com/huggingface/lerobot, 2024.
[32] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
