Deformation-based In-Context Learning for Point Cloud Understanding
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-13 19:53 UTC · model grok-4.3
The pith
DeformPIC learns to deform the query point cloud using task-specific prompt guidance instead of reconstructing masked targets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DeformPIC is a deformation-based framework for point cloud in-context learning. It learns to deform the query point cloud under task-specific guidance from prompts, enabling explicit geometric reasoning and consistent objectives between training and inference, in contrast to masked point modeling methods that predict the target directly from token correlations without geometric priors.
What carries the argument
DeformPIC's deformation operation, which transforms the query point cloud according to task-specific prompt guidance to perform in-context learning while preserving geometric structure.
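The paper does not specify the deformation network, but the operation described above can be sketched as predicting a per-point displacement for the query cloud, conditioned on a task embedding pooled from the prompt. Everything below is illustrative: `prompt_guided_deform`, the two-layer offset head, and its weights are stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def prompt_guided_deform(query, prompt_emb, W1, W2):
    """Hypothetical sketch of prompt-guided deformation.

    query:      (N, 3) query point cloud
    prompt_emb: (D,)   task embedding pooled from the prompt pair
    W1, W2:     weights of a tiny two-layer offset head (a stand-in for
                the paper's unspecified deformation network)
    """
    # broadcast the task embedding to every point and concatenate with coordinates
    feats = np.concatenate(
        [query, np.tile(prompt_emb, (query.shape[0], 1))], axis=1
    )
    hidden = np.maximum(feats @ W1, 0.0)   # ReLU
    offsets = hidden @ W2                  # (N, 3) predicted displacements
    return query + offsets                 # deformed query, same cardinality

N, D, H = 64, 8, 16
query = rng.standard_normal((N, 3))
prompt_emb = rng.standard_normal(D)
W1 = rng.standard_normal((3 + D, H)) * 0.1
W2 = rng.standard_normal((H, 3)) * 0.1
out = prompt_guided_deform(query, prompt_emb, W1, W2)
print(out.shape)  # (64, 3)
```

Note the key property the pith emphasizes: the output is an explicit geometric transformation of the query itself, so the same forward pass is used at training and inference, with no target-side input.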
Load-bearing premise
Deforming the query point cloud under prompt guidance is sufficient to capture necessary geometric priors and task semantics without target-side information at inference time.
What would settle it
A controlled test where a masked-modeling baseline is given target-side information at inference and still matches or exceeds DeformPIC's Chamfer Distance, or a point-cloud configuration where deformation alone cannot resolve the required task geometry.
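The metric in that test, Chamfer Distance, can be stated concretely. This is a minimal NumPy version using one common convention (mean squared nearest-neighbour distance in each direction, summed); the paper's exact normalisation is not given here.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3)."""
    # pairwise squared Euclidean distances, shape (N, M)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    # nearest neighbour in each direction, averaged, then summed
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

pts = np.random.rand(128, 3)
print(chamfer_distance(pts, pts))  # 0.0 for identical clouds
```

A single point at the origin against a single point at (1, 0, 0) gives 1.0 in each direction, so 2.0 total under this convention.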
Original abstract
Recent advances in point cloud In-Context Learning (ICL) have demonstrated strong multitask capabilities. Existing approaches typically adopt a Masked Point Modeling (MPM)-based paradigm for point cloud ICL. However, MPM-based methods directly predict the target point cloud from masked tokens without leveraging geometric priors, requiring the model to infer spatial structure and geometric details solely from token-level correlations via transformers. Additionally, these methods suffer from a training-inference objective mismatch, as the model learns to predict the target point cloud using target-side information that is unavailable at inference time. To address these challenges, we propose DeformPIC, a deformation-based framework for point cloud ICL. Unlike existing approaches that rely on masked reconstruction, DeformPIC learns to deform the query point cloud under task-specific guidance from prompts, enabling explicit geometric reasoning and consistent objectives. Extensive experiments demonstrate that DeformPIC consistently outperforms previous state-of-the-art methods, achieving reductions of 1.6, 1.8, and 4.7 points in average Chamfer Distance on reconstruction, denoising, and registration tasks, respectively. Furthermore, we introduce a new out-of-domain benchmark to evaluate generalization across unseen data distributions, where DeformPIC achieves state-of-the-art performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DeformPIC, a deformation-based in-context learning framework for point cloud understanding. It replaces Masked Point Modeling (MPM) with a prompt-guided deformation of the query point cloud, arguing that this enables explicit geometric reasoning, eliminates training-inference objective mismatch, and avoids target-side information leakage at inference. Experiments report average Chamfer Distance reductions of 1.6, 1.8, and 4.7 points on reconstruction, denoising, and registration tasks, respectively, along with state-of-the-art results on a newly introduced out-of-domain benchmark.
Significance. If the deformation module and prompt encoder are implemented as described without target leakage, the approach provides a principled alignment of train and test objectives in point cloud ICL, which could improve consistency and generalization over direct target prediction methods. The introduction of an out-of-domain benchmark strengthens the evaluation of robustness across distributions.
Minor comments (3)
- Abstract: The reported Chamfer Distance reductions lack accompanying details on baselines, number of runs, or statistical significance; these should be summarized briefly, even in the abstract, to support the quantitative claims.
- §3 (Method): The integration of the deformation module with the transformer backbone would benefit from an explicit equation showing how the deformed query coordinates are computed from prompt embeddings, to clarify the geometric reasoning path.
- §4 (Experiments): The out-of-domain benchmark description should specify the exact distribution shifts (e.g., sensor type, density, or object categories) so readers can assess the generalization claims.
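One plausible form of the equation requested for §3, written here as an illustration only (the symbols $f_\theta$, $g_\phi$ and the additive-offset structure are assumptions, not taken from the paper):

```latex
\hat{Q} \;=\; Q + f_\theta\bigl(Q,\; g_\phi(P_{\mathrm{in}}, P_{\mathrm{out}})\bigr),
\qquad
\mathcal{L} \;=\; d_{\mathrm{CD}}\bigl(\hat{Q},\; Q^{\star}\bigr),
```

where $Q$ is the query point cloud, $g_\phi$ encodes the prompt input-output pair into a task embedding, $f_\theta$ predicts per-point displacements, $Q^{\star}$ is the task target, and $d_{\mathrm{CD}}$ is Chamfer Distance. Any such formulation makes the claimed train-test consistency explicit: the same deformation map is applied in both phases.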
Simulated Author's Rebuttal
We thank the referee for the positive review and recommendation of minor revision. The referee's summary accurately reflects the core motivation and contributions of DeformPIC, including the replacement of MPM with prompt-guided deformation to enable explicit geometric reasoning, remove objective mismatch, and prevent target-side leakage. We also appreciate the recognition that the out-of-domain benchmark strengthens evaluation of robustness.
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper introduces DeformPIC as a new framework that deforms the query point cloud under prompt guidance, replacing MPM-based masked reconstruction. No equations, derivations, or self-citations are shown that reduce the central claims (consistent objectives, explicit geometric reasoning) to fitted parameters or prior self-references by construction. Performance gains on Chamfer Distance are presented as direct consequences of the design, with no load-bearing steps that collapse to inputs.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  "DeformPIC learns to deform the query point cloud under task-specific guidance from prompts... minimizing the Chamfer Distance"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  "Deformation Extraction Network (DEN) extracts the geometric transformation... task token"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.