pith. machine review for the scientific record.

arxiv: 2604.15455 · v1 · submitted 2026-04-16 · 💻 cs.RO

One-Shot Cross-Geometry Skill Transfer through Part Decomposition

Pith reviewed 2026-05-10 10:26 UTC · model grok-4.3

classification 💻 cs.RO
keywords robot skill transfer · part decomposition · one-shot transfer · cross-geometry generalization · semantic parts · generative shape models · object manipulation · imitation learning

The pith

Decomposing objects into semantic parts lets robots transfer skills to unfamiliar shapes from one demonstration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Robots typically fail when a demonstrated skill must be applied to objects whose shapes differ from the training example. The paper argues that first splitting both the demonstrated and target objects into consistent semantic parts, then using generative shape models to move the key interaction points onto the new geometry, produces reliable transfer without extra demonstrations. Alignment of those points is turned into an objective that the robot optimizes autonomously for each skill. If the decomposition stays consistent across shapes, this removes the need for shape-specific retraining and lets one example cover many objects. The result is shown in both simulation and real-robot trials for several manipulation skills.

Core claim

The central claim is that decomposing objects into semantic parts, combined with data-efficient generative shape models, allows interaction points from a single demonstration to be transferred and aligned on novel geometries, yielding one-shot skill transfer that succeeds across a wider range of object shapes than prior methods in both simulated and physical environments.

What carries the argument

Semantic part decomposition paired with generative shape models that map interaction points, followed by autonomous construction of an alignment objective on skill-relevant parts.
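That carrier can be sketched in a few lines. The toy below matches parts by semantic label and moves each demonstrated interaction point to the nearest point on the corresponding part of the novel object; nearest-neighbor matching stands in for the paper's generative shape models, and `transfer_keypoints`, the part dictionaries, and the mug coordinates are all hypothetical.

```python
import numpy as np

def transfer_keypoints(demo_parts, novel_parts, demo_keypoints):
    """Toy transfer: for each demo interaction point, look up the part with
    the same semantic label on the novel object and snap the point to the
    nearest point in that part (a stand-in for a learned correspondence)."""
    transferred = []
    for label, kp in demo_keypoints:
        # load-bearing assumption: the same part label exists on both objects
        target_part = novel_parts[label]
        dists = np.linalg.norm(target_part - kp, axis=1)
        transferred.append((label, target_part[np.argmin(dists)]))
    return transferred

# two "mugs" with the same parts but different geometry (made-up coordinates)
demo_parts = {"handle": np.array([[1.0, 0.0, 0.5]]),
              "body":   np.array([[0.0, 0.0, 0.5]])}
novel_parts = {"handle": np.array([[1.5, 0.0, 0.8], [1.4, 0.0, 0.7]]),
               "body":   np.array([[0.0, 0.0, 0.8]])}
demo_keypoints = [("handle", np.array([1.0, 0.0, 0.5]))]

print(transfer_keypoints(demo_parts, novel_parts, demo_keypoints))
```

Because the lookup is keyed on part labels rather than whole-object shape, the same demonstration keypoint lands on the handle even when the novel handle sits elsewhere in space.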

If this is right

  • One demonstration suffices for multiple skills and objects of varying shapes.
  • Transfer succeeds without shape-specific retraining or additional data collection.
  • The approach applies in both simulation and on physical robots.
  • Alignment optimization on the transferred points produces executable trajectories for the new geometry.
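The last bullet can be made concrete: once keypoints have been transferred, a minimal alignment objective is the least-squares rigid fit between the demonstrated keypoints and their transferred counterparts. The sketch below uses the closed-form Kabsch/SVD solution; the paper constructs its objective autonomously per skill, so treat this as an illustration of the alignment step, not the authors' optimizer.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch): find R, t minimizing
    sum_i ||R @ src_i + t - dst_i||^2 over rotations R and translations t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# demo keypoints and the same points transferred onto the novel object
src = np.array([[0.0, 0, 0], [1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])
theta = np.pi / 4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1.0]])
dst = src @ R_true.T + np.array([0.1, -0.2, 0.3])

R, t = best_rigid_transform(src, dst)
print(np.allclose(src @ R.T + t, dst))  # → True
```

The recovered transform can then be applied to the demonstrated end-effector trajectory to obtain an executable trajectory for the new geometry.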

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If part labels remain stable across object categories, the same decomposition could support chaining multiple skills on composite objects.
  • The method might reduce the data needed for continual learning by reusing part-level mappings rather than whole-object policies.
  • Integration with online perception could let robots discover new parts during deployment without retraining the generative models.

Load-bearing premise

Objects can be broken into the same semantic parts across widely different overall shapes and the generative models can move interaction points accurately without further demonstrations or manual adjustment.
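This premise can be made operational as a precondition: attempt transfer only when every skill-relevant part label found on the demonstration object also appears on the novel object. The guard below is a hypothetical sketch, not something the paper specifies.

```python
def parts_are_consistent(demo_labels, novel_labels, required):
    """Hypothetical precondition: transfer proceeds only if every
    skill-relevant part label appears on both objects."""
    needed = set(required)
    return needed <= set(demo_labels) and needed <= set(novel_labels)

# a mug-to-teapot transfer that only needs the handle part
print(parts_are_consistent({"handle", "body", "lid"}, {"handle", "body"}, {"handle"}))  # → True
print(parts_are_consistent({"handle", "body"}, {"body"}, {"handle"}))                   # → False
```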

What would settle it

An experiment in which objects with the same functional parts but inconsistent geometry labels cause the transferred points to produce grasp or motion failures even after optimization.

Figures

Figures reproduced from arXiv: 2604.15455 by George Konidaris, Ondrej Biza, Skye Thompson.

Figure 1. Using a decomposed object representation, parts …
Figure 2. Conditioning a skill trajectory on keypoints trans…
Figure 3. The full pipeline for skill transfer through part decomposition.
Figure 4. The red points represent example keypoints…
Figure 5. We evaluated this method on objects of diverse geometry. The most common failure modes for whole object warping…
Figure 6. An example of relational descriptors on a teapot.
Original abstract

Given a demonstration, a robot should be able to generalize a skill to any object it encounters, but existing approaches to skill transfer often fail to adapt to objects with unfamiliar shapes. Motivated by examples of improved transfer from compositional modeling, we propose a method for improving transfer by decomposing objects into their constituent semantic parts. We leverage data-efficient generative shape models to accurately transfer interaction points from the parts of a demonstration object to a novel object. We autonomously construct an objective to optimize the alignment of those points on skill-relevant object parts. Our method generalizes to a wider range of object geometries than existing work, and achieves successful one-shot transfer for a range of skills and objects from a single demonstration, in both simulated and real environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes a method for one-shot cross-geometry skill transfer by decomposing objects into semantic parts, leveraging data-efficient generative shape models to transfer interaction points from a single demonstration, and autonomously optimizing an alignment objective on skill-relevant parts. It claims improved generalization over prior work to a wider range of object geometries, with successful transfer demonstrated for multiple skills and objects in both simulation and real-robot settings.

Significance. If the central claims hold, the work would meaningfully advance data-efficient robot learning by reducing reliance on multiple demonstrations or per-object tuning when transferring manipulation skills to novel shapes. The emphasis on compositional part decomposition and generative models for point transfer is a clear strength, as is the one-shot framing combined with both simulated and physical validation.

minor comments (2)
  1. [Abstract] Abstract: the phrase 'a range of skills and objects' is repeated without concrete examples or enumeration; adding one or two specific instances (e.g., 'pouring into mugs of varying handle curvature') would immediately clarify the scope of the generalization claim.
  2. [Abstract] Abstract: the assertion that the method 'generalizes to a wider range of object geometries than existing work' would be stronger if it briefly named the closest baselines or cited the quantitative metric (e.g., success rate delta) used to support the comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. We are encouraged that the significance of using part decomposition with generative shape models for one-shot cross-geometry skill transfer is recognized, along with the value of our simulated and real-robot validation.

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained

full rationale

The abstract and available description present a high-level method proposal for part-based skill transfer without any equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations. No self-definitional steps, uniqueness theorems, or ansatzes are visible that reduce outputs to inputs by construction. Claims rest on the empirical performance of the proposed decomposition and transfer procedure rather than tautological reductions, consistent with the reader's assessment of low circularity risk.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that semantic parts are consistent and transferable across object geometries, plus reliance on pre-existing generative shape models whose accuracy is taken as given.

axioms (1)
  • domain assumption: Objects have consistent semantic parts that can be identified and aligned across different geometries.
    Invoked to enable point transfer from the demonstration object to the novel object.

pith-pipeline@v0.9.0 · 5415 in / 1046 out tokens · 25433 ms · 2026-05-10T10:26:39.437334+00:00 · methodology


Reference graph

Works this paper leans on

22 extracted references · 15 canonical work pages · 3 internal anchors

  1. [1] A. Simeonov, Y. Du, A. Tagliasacchi, J. B. Tenenbaum, A. Rodriguez, P. Agrawal, and V. Sitzmann, "Neural descriptor fields: SE(3)-equivariant object representations for manipulation," CoRR, abs/2112.05124, 2021. https://arxiv.org/abs/2112.05124
  2. [2] O. Biza, S. Thompson, K. R. Pagidi, A. Kumar, E. van der Pol, R. Walters, T. Kipf, J. van de Meent, L. L. S. Wong, and R. Platt, "One-shot imitation learning via interaction warping," in Conference on Robot Learning (CoRL 2023), Atlanta, GA, USA, Nov. 2023.
  3. [3] W. Liu, J. Mao, J. Hsu, T. Hermans, A. Garg, and J. Wu, "Composable part-based manipulation," 2024. https://arxiv.org/abs/2405.05876
  4. [4] E. Chun, Y. Du, A. Simeonov, T. Lozano-Perez, and L. Kaelbling, "Local neural descriptor fields: Locally conditioned object representations for manipulation," 2023. https://arxiv.org/abs/2302.03573
  5. [6] http://arxiv.org/abs/1812.02713
  6. [7] J. Qian, Y. Li, B. Bucher, and D. Jayaraman, "Task-oriented hierarchical object decomposition for visuomotor control," arXiv preprint arXiv:2411.01284, 2024.
  7. [8] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick, "Segment Anything," 2023. https://arxiv.org/abs/2304.02643
  8. [9] A. X. Chang, T. A. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, "ShapeNet: An information-rich 3D model repository," CoRR, abs/1512.03012, 2015.
  9. [10] L. Manuelli, W. Gao, P. R. Florence, and R. Tedrake, "kPAM: Keypoint affordances for category-level robotic manipulation," CoRR, abs/1903.06684, 2019.
  10. [11] D. Rodriguez and S. Behnke, "Transferring category-based functional grasping skills by latent space non-rigid registration," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 2662–2669, Jul. 2018.
  11. [12] J. Schulman, J. Ho, C. Lee, and P. Abbeel, "Learning from demonstrations through the use of non-rigid registration," Springer International Publishing, 2016, pp. 339–354. https://doi.org/10.1007/978-3-319-28872-7_20
  12. [13] S. Thompson, L. P. Kaelbling, and T. Lozano-Perez, "Shape-based transfer of generic skills," in 2021 IEEE International Conference on Robotics and Automation (ICRA), May 2021, pp. 5996–6002.
  13. [14] M. Wei, X. Yue, W. Zhang, S. Kong, X. Liu, and J. Pang, "OV-PARTS: Towards open-vocabulary part segmentation," in NeurIPS Datasets and Benchmarks Track, 2023. https://openreview.net/forum?id=EFl8zjjXeX
  14. [15] P. Besl and N. D. McKay, "A method for registration of 3-D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.
  15. [16] E. Coumans and Y. Bai, "PyBullet, a Python module for physics simulation for games, robotics and machine learning," http://pybullet.org, 2016–2021.
  16. [17] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra, M. Rabbat, V. Sharma, G. Synnaeve, H. Xu, H. Jegou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski, "DINOv2: Learning robust visual features without supervision," 2023.
  17. [18] Y. Zhu, A. Lim, P. Stone, and Y. Zhu, "Vision-based manipulation from single human video with open-world object graphs," arXiv preprint arXiv:2405.20321, 2024.
  18. [19] Y. Liu, J. Mao, J. Tenenbaum, T. Lozano-Pérez, and L. P. Kaelbling, "One-shot manipulation strategy learning by making contact analogies," 2024. https://arxiv.org/abs/2411.09627
  19. [20] Z. Xue, S. Deng, Z. Chen, Y. Wang, Z. Yuan, and H. Xu, "DemoGen: Synthetic demonstration generation for data-efficient visuomotor policy learning," arXiv preprint arXiv:2502.16932, 2025.
  20. [21] Y. Qi, Y. Ju, T. Wei, C. Chu, L. L. S. Wong, and H. Xu, "Two by two: Learning multi-task pairwise objects assembly for generalizable robot manipulation," 2025. https://arxiv.org/abs/2504.06961
  21. [22] C. Gao, Z. Xue, S. Deng, T. Liang, S. Yang, L. Shao, and H. Xu, "RiEMann: Near real-time SE(3)-equivariant robot manipulation without point cloud segmentation," 2024. https://arxiv.org/abs/2403.19460
  22. [23] A. Myronenko and X. Song, "Point-set registration: Coherent Point Drift," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 12, pp. 2262–2275, Dec. 2010.