pith. machine review for the scientific record.

arxiv: 2604.02845 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: 2 theorem links


Deformation-based In-Context Learning for Point Cloud Understanding

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 19:53 UTC · model grok-4.3

classification 💻 cs.CV
keywords point cloud · in-context learning · deformation · Chamfer Distance · reconstruction · denoising · registration · geometric reasoning

The pith

DeformPIC learns to deform the query point cloud using task-specific prompt guidance instead of reconstructing masked targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recent point cloud in-context learning methods use masked point modeling to predict target clouds from masked inputs, but this skips geometric priors and creates a training-inference mismatch because target details are unavailable at test time. DeformPIC instead trains the model to deform the query point cloud itself under guidance from example prompts that encode the task. This change supplies explicit geometric reasoning through the deformation process and keeps the exact same objective at training and inference. Experiments report lower average Chamfer Distance on reconstruction, denoising, and registration, plus stronger results on a new out-of-domain benchmark.

Core claim

DeformPIC is a deformation-based framework for point cloud in-context learning. It learns to deform the query point cloud under task-specific guidance from prompts, enabling explicit geometric reasoning and consistent objectives between training and inference, in contrast to masked point modeling methods that predict the target directly from token correlations without geometric priors.

What carries the argument

DeformPIC's deformation operation, which transforms the query point cloud according to task-specific prompt guidance to perform in-context learning while preserving geometric structure.
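
As a mental model only (the review is abstract-level and the paper's actual networks are transformer-based), the deformation operation amounts to predicting a per-point offset for the query, conditioned on an embedding of the prompt pair. A minimal sketch, with all names, shapes, and the linear offset head hypothetical:

```python
import numpy as np

def deform_query(query: np.ndarray, task_embed: np.ndarray,
                 weights: np.ndarray) -> np.ndarray:
    """Toy prompt-guided deformation.

    query:      (N, 3) query point cloud
    task_embed: (D,)   task embedding extracted from the example prompt pair
    weights:    (3 + D, 3) parameters of a hypothetical linear offset head

    Returns the deformed cloud: the same N points, each shifted by an offset
    that depends on its own coordinates and on the task embedding.
    """
    n = query.shape[0]
    # Condition each point on the task: concatenate coords with the embedding.
    feats = np.concatenate([query, np.tile(task_embed, (n, 1))], axis=1)  # (N, 3+D)
    offsets = feats @ weights                                             # (N, 3)
    return query + offsets
```

Because the output is always a displacement of the input points rather than a freshly generated set, the query's geometric structure is carried through by construction, which is the sense in which deformation supplies a geometric prior that direct target prediction lacks.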

Load-bearing premise

Deforming the query point cloud under prompt guidance is sufficient to capture necessary geometric priors and task semantics without target-side information at inference time.

What would settle it

A controlled test where a masked-modeling baseline is given target-side information at inference and still matches or exceeds DeformPIC's Chamfer Distance, or a point-cloud configuration where deformation alone cannot resolve the required task geometry.

Figures

Figures reproduced from arXiv: 2604.02845 by Chengxing Lin, Jinhong Deng, Wen Li, Yinjie Lei.

Figure 1
Figure 1. Comparison between Masked Point Modeling (MPM)-based ICL and our Deformation-based ICL framework. (a) Existing MPM methods directly predict the target from masked tokens without leveraging geometric priors, and they also suffer from a training-inference gap. (b) Our deformation-based ICL employs a deformation strategy to enhance geometric coherence and maintains consistency between the training and inferen… view at source ↗
Figure 2
Figure 2. Overall framework of DeformPIC. Our model consists of two core components. The Deformation Extraction Network (DEN) extracts the geometric transformation from an example pair (prompt input→prompt target) into a task embedding. This embedding then modulates the Deformation Transfer Network (DTN), thereby guiding it to apply the same transformation to a new query input to produce the final prediction. The en… view at source ↗
Figure 3
Figure 3. Qualitative comparison on the ShapeNet In-Context dataset. From left to right: input point clouds, PIC-Cat [10] results, our DeformPIC results, and ground truth. Our method achieves better geometric detail preservation, improved structural coherence, and fewer visual artifacts across diverse tasks. view at source ↗
Figure 4
Figure 4. Task feature visualization of DeformPIC on the ShapeNet / ModelNet40 / ScanObjectNN In-Context datasets. We use t-SNE [29] to reduce the dimensionality of the task features to a 2D space and visualize their distributions. The model is trained on the ShapeNet In-Context dataset [10] and then directly tested on ModelNet In-Context and ScanObjectNN In-Context without any fine-tuning or additional adaptation strategy. The … view at source ↗
Figure 5
Figure 5. Visualization results on the registration task, comparing PIC [10] and our approach. view at source ↗
Figure 6
Figure 6. Visualization results on the reconstruction task, comparing PIC [10] and our approach. view at source ↗
Figure 7
Figure 7. Visualization results on the denoising task, comparing PIC [10] and our approach. view at source ↗
Figure 8
Figure 8. Visualization results on the part segmentation task, comparing PIC [10] and our approach. view at source ↗
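
The two-stage flow described in the Figure 2 caption can be put in code form: in-context prediction is a composition of the two networks, with `den` and `dtn` standing in for the Deformation Extraction and Deformation Transfer networks (signatures hypothetical):

```python
def icl_predict(den, dtn, prompt_input, prompt_target, query):
    """One in-context inference step, per the Figure 2 description.

    den: callable (prompt_input, prompt_target) -> task embedding, i.e.
         "what transformation turns the prompt input into the prompt target?"
    dtn: callable (query, task_embedding) -> deformed query, i.e.
         "apply that same transformation to the new input."
    """
    task_embedding = den(prompt_input, prompt_target)
    return dtn(query, task_embedding)

# Sketch with stand-in "networks" on scalars: the task is a fixed translation.
den = lambda p_in, p_tgt: p_tgt - p_in   # recover the offset from the example
dtn = lambda q, z: q + z                 # apply it to the query
print(icl_predict(den, dtn, 1.0, 4.0, 10.0))  # → 13.0
```

Note that the same objective is computed at training and inference: nothing in the prediction path touches the query's target, which is the consistency claim the review highlights.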
read the original abstract

Recent advances in point cloud In-Context Learning (ICL) have demonstrated strong multitask capabilities. Existing approaches typically adopt a Masked Point Modeling (MPM)-based paradigm for point cloud ICL. However, MPM-based methods directly predict the target point cloud from masked tokens without leveraging geometric priors, requiring the model to infer spatial structure and geometric details solely from token-level correlations via transformers. Additionally, these methods suffer from a training-inference objective mismatch, as the model learns to predict the target point cloud using target-side information that is unavailable at inference time. To address these challenges, we propose DeformPIC, a deformation-based framework for point cloud ICL. Unlike existing approaches that rely on masked reconstruction, DeformPIC learns to deform the query point cloud under task-specific guidance from prompts, enabling explicit geometric reasoning and consistent objectives. Extensive experiments demonstrate that DeformPIC consistently outperforms previous state-of-the-art methods, achieving reductions of 1.6, 1.8, and 4.7 points in average Chamfer Distance on reconstruction, denoising, and registration tasks, respectively. Furthermore, we introduce a new out-of-domain benchmark to evaluate generalization across unseen data distributions, where DeformPIC achieves state-of-the-art performance.
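
The abstract's headline numbers are reductions in average Chamfer Distance. For reference, one common (symmetric, squared) convention for the metric is sketched below; papers differ on squaring and scaling, so this may not match DeformPIC's exact definition:

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric squared Chamfer Distance between (N, 3) and (M, 3) point sets.

    For every point in each set, find its nearest neighbor in the other set,
    then average the squared distances in both directions and sum the two terms.
    """
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise sq. dists
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

a = np.array([[0.0, 0.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0]])
print(chamfer_distance(a, b))  # → 2.0 (squared distance 1.0 in each direction)
print(chamfer_distance(a, a))  # → 0.0
```

Lower is better, so the reported reductions of 1.6–4.7 points directly measure how much closer the predicted clouds sit to ground truth under this nearest-neighbor criterion.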

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes DeformPIC, a deformation-based in-context learning framework for point cloud understanding. It replaces Masked Point Modeling (MPM) with a prompt-guided deformation of the query point cloud, arguing that this enables explicit geometric reasoning, eliminates training-inference objective mismatch, and avoids target-side information leakage at inference. Experiments report average Chamfer Distance reductions of 1.6, 1.8, and 4.7 points on reconstruction, denoising, and registration tasks, respectively, along with state-of-the-art results on a newly introduced out-of-domain benchmark.

Significance. If the deformation module and prompt encoder are implemented as described without target leakage, the approach provides a principled alignment of train and test objectives in point cloud ICL, which could improve consistency and generalization over direct target prediction methods. The introduction of an out-of-domain benchmark strengthens the evaluation of robustness across distributions.

minor comments (3)
  1. Abstract: The reported Chamfer Distance reductions lack accompanying details on baselines, number of runs, or statistical significance; these should be summarized briefly even in the abstract to support the quantitative claims.
  2. §3 (Method): The integration of the deformation module with the transformer backbone would benefit from an explicit equation showing how the deformed query coordinates are computed from prompt embeddings, to clarify the geometric reasoning path.
  3. §4 (Experiments): The out-of-domain benchmark description should specify the exact distribution shifts (e.g., sensor type, density, or object categories) to allow readers to assess the generalization claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation of minor revision. The referee's summary accurately reflects the core motivation and contributions of DeformPIC, including the replacement of MPM with prompt-guided deformation to enable explicit geometric reasoning, remove objective mismatch, and prevent target-side leakage. We also appreciate the recognition that the out-of-domain benchmark strengthens evaluation of robustness.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces DeformPIC as a new framework that deforms the query point cloud under prompt guidance, replacing MPM-based masked reconstruction. No equations, derivations, or self-citations are shown that reduce the central claims (consistent objectives, explicit geometric reasoning) to fitted parameters or prior self-references by construction. Performance gains on Chamfer Distance are presented as direct consequences of the design, with no load-bearing steps that collapse to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the method is described at a high level without equations or fitting details.

pith-pipeline@v0.9.0 · 5523 in / 999 out tokens · 29889 ms · 2026-05-13T19:53:24.566354+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 12 internal anchors

  1. [1]

    Layer Normalization

    Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.

  2. [2]

    Visual prompting via image inpainting

    Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, and Alexei Efros. Visual prompting via image inpainting. NeurIPS, 35:25005–25017, 2022.

  3. [3]

    Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. π0: A Vision-Language-Action Flow Model for General Robot Control. arXiv preprint arXiv:2410.24164, 2024.

  4. [4]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. NeurIPS, 33:1877–1901, 2020.

  5. [5]

    ShapeNet: An Information-Rich 3D Model Repository

    Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015.

  6. [6]

    End-to-end autonomous driving: Challenges and frontiers

    Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, and Hongyang Li. End-to-end autonomous driving: Challenges and frontiers. IEEE TPAMI, 2024.

  7. [7]

    A Survey on In-context Learning

    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, et al. A survey on in-context learning. arXiv preprint arXiv:2301.00234, 2022.

  8. [8]

    Autoencoders as cross-modal teachers: Can pretrained 2D image transformers help 3D representation learning?

    Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, and Kaisheng Ma. Autoencoders as cross-modal teachers: Can pretrained 2D image transformers help 3D representation learning? arXiv preprint arXiv:2212.08320, 2022.

  9. [9]

    A point set generation network for 3D object reconstruction from a single image

    Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3D object reconstruction from a single image. In CVPR, pages 605–613, 2017.

  10. [10]

    Explore in-context learning for 3D point cloud understanding

    Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M Buhmann, Chen Change Loy, and Mengyuan Liu. Explore in-context learning for 3D point cloud understanding. NeurIPS, 36:42382–42395, 2023.

  11. [11]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

  12. [12]

    3D-CODED: 3D correspondences by deep deformation

    Thibault Groueix, Matthew Fisher, Vladimir G Kim, Bryan C Russell, and Mathieu Aubry. 3D-CODED: 3D correspondences by deep deformation. In ECCV, pages 230–246, 2018.

  13. [13]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.

  14. [14]

    PCT: Point cloud transformer

    Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R Martin, and Shi-Min Hu. PCT: Point cloud transformer. Computational Visual Media, 7(2):187–199, 2021.

  15. [15]

    Masked autoencoders are scalable vision learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In CVPR, pages 16000–16009, 2022.

  16. [16]

    In-context learning creates task vectors

    Roee Hendel, Mor Geva, and Amir Globerson. In-context learning creates task vectors. In Findings of EMNLP, pages 9318–9333, 2023.

  17. [17]

    Planning-oriented autonomous driving

    Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In CVPR, pages 17853–17862, 2023.

  18. [18]

    Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, et al. π0.5: A Vision-Language-Action Model with Open-World Generalization. arXiv preprint arXiv:2504.16054, 2025.

  19. [19]

    ShapeFlow: Learnable deformation flows among 3D shapes

    Chiyu Jiang, Jingwei Huang, Andrea Tagliasacchi, and Leonidas J Guibas. ShapeFlow: Learnable deformation flows among 3D shapes. NeurIPS, 33:9745–9757, 2020.

  20. [20]

    DG-PIC: Domain generalized point-in-context learning for point cloud understanding

    Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, and Jian Jun Zhang. DG-PIC: Domain generalized point-in-context learning for point cloud understanding. In ECCV, pages 455–474. Springer, 2024.

  21. [21]

    PCoTTA: Continual test-time adaptation for multi-task point cloud understanding

    Jincen Jiang, Qianyu Zhou, Yuhang Li, Xinkui Zhao, Meili Wang, Lizhuang Ma, Jian Chang, Jian Zhang, Xuequan Lu, et al. PCoTTA: Continual test-time adaptation for multi-task point cloud understanding. NeurIPS, 37:96229–96253, 2024.

  22. [22]

    BEVFormer: Learning bird’s-eye-view representation from LiDAR-camera via spatiotemporal transformers

    Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. BEVFormer: Learning bird’s-eye-view representation from LiDAR-camera via spatiotemporal transformers. IEEE TPAMI, 2024.

  23. [23]

    Point-in-context: Understanding point cloud via in-context learning

    Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M Buhmann, Xiangtai Li, and Chen Change Loy. Point-in-context: Understanding point cloud via in-context learning. arXiv preprint arXiv:2404.12352, 2024.

  24. [24]

    Human-in-context: Unified cross-domain 3D human motion modeling via in-context learning

    Mengyuan Liu, Xinshun Wang, Zhongbin Fang, Deheng Ye, Xia Li, Tao Tang, Songtao Wu, Xiangtai Li, and Ming-Hsuan Yang. Human-in-context: Unified cross-domain 3D human motion modeling via in-context learning. arXiv preprint arXiv:2508.10897, 2025.

  25. [25]

    In-context vectors: Making in context learning more effective and controllable through latent space steering

    Sheng Liu, Haotian Ye, Lei Xing, and James Y Zou. In-context vectors: Making in context learning more effective and controllable through latent space steering. In ICML, pages 32287–32307. PMLR, 2024.

  26. [26]

    FlowNet3D: Learning scene flow in 3D point clouds

    Xingyu Liu, Charles R Qi, and Leonidas J Guibas. FlowNet3D: Learning scene flow in 3D point clouds. In CVPR, pages 529–537, 2019.

  27. [27]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov, Frank Hutter, et al. Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101, 2017.

  28. [28]

    A Survey on Vision-Language-Action Models for Embodied AI

    Yueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, and Irwin King. A survey on vision-language-action models for embodied AI. arXiv preprint arXiv:2405.14093, 2024.

  29. [29]

    Visualizing data using t-SNE

    Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

  30. [30]

    Masked autoencoders for point cloud self-supervised learning

    Yatian Pang, Wenxiao Wang, Francis EH Tay, Wei Liu, Yonghong Tian, and Li Yuan. Masked autoencoders for point cloud self-supervised learning. In ECCV, pages 604–621. Springer, 2022.

  31. [31]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In CVPR, pages 4195–4205, 2023.

  32. [32]

    PointNet: Deep learning on point sets for 3D classification and segmentation

    Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In CVPR, pages 652–660, 2017.

  33. [33]

    Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining

    Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, and Li Yi. Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining. In ICML, pages 28223–28243. PMLR, 2023.

  34. [34]

    MICAS: Multi-grained in-context adaptive sampling for 3D point cloud processing

    Feifei Shao, Ping Liu, Zhao Wang, Yawei Luo, Hongwei Wang, and Jun Xiao. MICAS: Multi-grained in-context adaptive sampling for 3D point cloud processing. In CVPR, pages 6616–6626, 2025.

  35. [35]

    Neural shape deformation priors

    Jiapeng Tang, Lev Markhasin, Bi Wang, Justus Thies, and Matthias Nießner. Neural shape deformation priors. NeurIPS, 35:17117–17132, 2022.

  36. [36]

    What do single-view 3D reconstruction networks learn?

    Maxim Tatarchenko, Stephan R Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, and Thomas Brox. What do single-view 3D reconstruction networks learn? In CVPR, pages 3405–3414, 2019.

  37. [37]

    LLaMA: Open and Efficient Foundation Language Models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.

  38. [38]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.

  39. [39]

    Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data

    Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Thanh Nguyen, and Sai-Kit Yeung. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In CVPR, pages 1588–1597, 2019.

  40. [40]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. NeurIPS, 30, 2017.

  41. [41]

    Pixel2Mesh: Generating 3D mesh models from single RGB images

    Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2Mesh: Generating 3D mesh models from single RGB images. In ECCV, pages 52–67, 2018.

  42. [42]

    Images speak in images: A generalist painter for in-context visual learning

    Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, and Tiejun Huang. Images speak in images: A generalist painter for in-context visual learning. In CVPR, pages 6830–6839, 2023.

  43. [43]

    Skeleton-in-context: Unified skeleton sequence modeling with in-context learning

    Xinshun Wang, Zhongbin Fang, Xia Li, Xiangtai Li, Chen Chen, and Mengyuan Liu. Skeleton-in-context: Unified skeleton sequence modeling with in-context learning. In CVPR, pages 2436–2446, 2024.

  44. [44]

    Dynamic graph CNN for learning on point clouds

    Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph CNN for learning on point clouds. ACM TOG, 38(5):1–12, 2019.

  45. [45]

    In-context learning unlocked for diffusion models

    Zhendong Wang, Yifan Jiang, Yadong Lu, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou, et al. In-context learning unlocked for diffusion models. NeurIPS, 36:8542–8562, 2023.

  46. [46]

    UniPre3D: Unified pre-training of 3D point cloud models with cross-modal Gaussian splatting

    Ziyi Wang, Yanran Zhang, Jie Zhou, and Jiwen Lu. UniPre3D: Unified pre-training of 3D point cloud models with cross-modal Gaussian splatting. In CVPR, pages 1319–1329, 2025.

  47. [47]

    Pixel2Mesh++: Multi-view 3D mesh generation via deformation

    Chao Wen, Yinda Zhang, Zhuwen Li, and Yanwei Fu. Pixel2Mesh++: Multi-view 3D mesh generation via deformation. In CVPR, pages 1042–1051, 2019.

  48. [48]

    3D ShapeNets: A deep representation for volumetric shapes

    Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In CVPR, pages 1912–1920, 2015.

  49. [49]

    SimMIM: A simple framework for masked image modeling

    Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, and Han Hu. SimMIM: A simple framework for masked image modeling. In CVPR, pages 9653–9663, 2022.

  50. [50]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

  51. [51]

    A scalable active framework for region annotation in 3D shape collections

    Li Yi, Vladimir G Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas. A scalable active framework for region annotation in 3D shape collections. ACM TOG, 35(6):1–12, 2016.

  52. [52]

    Point-BERT: Pre-training 3D point cloud transformers with masked point modeling

    Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, and Jiwen Lu. Point-BERT: Pre-training 3D point cloud transformers with masked point modeling. In CVPR, pages 19313–19322, 2022.

  53. [53]

    Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders

    Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, and Hongsheng Li. Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders. In CVPR, pages 21769–21780, 2023.

  54. [54]

    PCP-MAE: Learning to predict centers for point masked autoencoders

    Xiangdong Zhang, Shaofeng Zhang, and Junchi Yan. PCP-MAE: Learning to predict centers for point masked autoencoders. NeurIPS, 37:80303–80327, 2024.