pith. machine review for the scientific record.

arxiv: 2604.02845 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: 2 theorem links


Deformation-based In-Context Learning for Point Cloud Understanding

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 19:53 UTC · model grok-4.3

classification 💻 cs.CV
keywords point cloud · in-context learning · deformation · Chamfer Distance · reconstruction · denoising · registration · geometric reasoning

The pith

DeformPIC learns to deform the query point cloud using task-specific prompt guidance instead of reconstructing masked targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recent point cloud in-context learning methods use masked point modeling to predict target clouds from masked inputs, but this skips geometric priors and creates a training-inference mismatch because target details are unavailable at test time. DeformPIC instead trains the model to deform the query point cloud itself under guidance from example prompts that encode the task. This change supplies explicit geometric reasoning through the deformation process and keeps the exact same objective at training and inference. Experiments report lower average Chamfer Distance on reconstruction, denoising, and registration, plus stronger results on a new out-of-domain benchmark.

Core claim

DeformPIC is a deformation-based framework for point cloud in-context learning. It learns to deform the query point cloud under task-specific guidance from prompts, enabling explicit geometric reasoning and consistent objectives between training and inference, in contrast to masked point modeling methods that predict the target directly from token correlations without geometric priors.

What carries the argument

DeformPIC's deformation operation, which transforms the query point cloud according to task-specific prompt guidance to perform in-context learning while preserving geometric structure.
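
As a mental model only (the review is abstract-level and the paper's actual networks are transformer-based), the deformation operation amounts to predicting a per-point offset for the query, conditioned on an embedding of the prompt pair. A minimal sketch, with all names, shapes, and the linear offset head hypothetical:

```python
import numpy as np

def deform_query(query: np.ndarray, task_embed: np.ndarray,
                 weights: np.ndarray) -> np.ndarray:
    """Toy prompt-guided deformation.

    query:      (N, 3) query point cloud
    task_embed: (D,)   task embedding extracted from the example prompt pair
    weights:    (3 + D, 3) parameters of a hypothetical linear offset head

    Returns the deformed cloud: the same N points, each shifted by an offset
    that depends on its own coordinates and on the task embedding.
    """
    n = query.shape[0]
    # Condition each point on the task: concatenate coords with the embedding.
    feats = np.concatenate([query, np.tile(task_embed, (n, 1))], axis=1)  # (N, 3+D)
    offsets = feats @ weights                                             # (N, 3)
    return query + offsets
```

Because the output is always a displacement of the input points rather than a freshly generated set, the query's geometric structure is carried through by construction, which is the sense in which deformation supplies a geometric prior that direct target prediction lacks.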

Load-bearing premise

Deforming the query point cloud under prompt guidance is sufficient to capture necessary geometric priors and task semantics without target-side information at inference time.

What would settle it

A controlled test where a masked-modeling baseline is given target-side information at inference and still matches or exceeds DeformPIC's Chamfer Distance, or a point-cloud configuration where deformation alone cannot resolve the required task geometry.

Figures

Figures reproduced from arXiv: 2604.02845 by Chengxing Lin, Jinhong Deng, Wen Li, Yinjie Lei.

Figure 1
Figure 1. Comparison between Masked Point Modeling (MPM)-based ICL and our Deformation-based ICL framework. (a) Existing MPM methods directly predict the target from masked tokens without leveraging geometric priors, and they also suffer from a training-inference gap. (b) Our deformation-based ICL employs a deformation strategy to enhance geometric coherence and maintains consistency between the training and inferen… view at source ↗
Figure 2
Figure 2. Overall framework of DeformPIC. Our model consists of two core components. The Deformation Extraction Network (DEN) extracts the geometric transformation from an example pair (prompt input→prompt target) into a task embedding. This embedding then modulates the Deformation Transfer Network (DTN), thereby guiding it to apply the same transformation to a new query input to produce the final prediction. The en… view at source ↗
Figure 3
Figure 3. Qualitative comparison on the ShapeNet In-Context dataset. From left to right: input point clouds, PIC-Cat [10] results, our DeformPIC results, and ground truth. Our method achieves better geometric detail preservation, improved structural coherence, and fewer visual artifacts across diverse tasks. view at source ↗
Figure 4
Figure 4. Task feature visualization of DeformPIC on the ShapeNet / ModelNet40 / ScanObjectNN In-Context datasets. We use t-SNE [29] to reduce the dimensionality of the task features to a 2D space and visualize their distributions. The model is trained on the ShapeNet In-Context dataset [10] and then directly tested on ModelNet In-Context and ScanObjectNN In-Context without any fine-tuning or additional adaptation strategy. The … view at source ↗
Figure 5
Figure 5. Visualization results on the registration task, comparing PIC [10] and our approach. view at source ↗
Figure 6
Figure 6. Visualization results on the reconstruction task, comparing PIC [10] and our approach. view at source ↗
Figure 7
Figure 7. Visualization results on the denoising task, comparing PIC [10] and our approach. view at source ↗
Figure 8
Figure 8. Visualization results on the part segmentation task, comparing PIC [10] and our approach. view at source ↗
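
The two-stage flow described in the Figure 2 caption can be put in code form: in-context prediction is a composition of the two networks, with `den` and `dtn` standing in for the Deformation Extraction and Deformation Transfer networks (signatures hypothetical):

```python
def icl_predict(den, dtn, prompt_input, prompt_target, query):
    """One in-context inference step, per the Figure 2 description.

    den: callable (prompt_input, prompt_target) -> task embedding, i.e.
         "what transformation turns the prompt input into the prompt target?"
    dtn: callable (query, task_embedding) -> deformed query, i.e.
         "apply that same transformation to the new input."
    """
    task_embedding = den(prompt_input, prompt_target)
    return dtn(query, task_embedding)

# Sketch with stand-in "networks" on scalars: the task is a fixed translation.
den = lambda p_in, p_tgt: p_tgt - p_in   # recover the offset from the example
dtn = lambda q, z: q + z                 # apply it to the query
print(icl_predict(den, dtn, 1.0, 4.0, 10.0))  # → 13.0
```

Note that the same objective is computed at training and inference: nothing in the prediction path touches the query's target, which is the consistency claim the review highlights.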
read the original abstract

Recent advances in point cloud In-Context Learning (ICL) have demonstrated strong multitask capabilities. Existing approaches typically adopt a Masked Point Modeling (MPM)-based paradigm for point cloud ICL. However, MPM-based methods directly predict the target point cloud from masked tokens without leveraging geometric priors, requiring the model to infer spatial structure and geometric details solely from token-level correlations via transformers. Additionally, these methods suffer from a training-inference objective mismatch, as the model learns to predict the target point cloud using target-side information that is unavailable at inference time. To address these challenges, we propose DeformPIC, a deformation-based framework for point cloud ICL. Unlike existing approaches that rely on masked reconstruction, DeformPIC learns to deform the query point cloud under task-specific guidance from prompts, enabling explicit geometric reasoning and consistent objectives. Extensive experiments demonstrate that DeformPIC consistently outperforms previous state-of-the-art methods, achieving reductions of 1.6, 1.8, and 4.7 points in average Chamfer Distance on reconstruction, denoising, and registration tasks, respectively. Furthermore, we introduce a new out-of-domain benchmark to evaluate generalization across unseen data distributions, where DeformPIC achieves state-of-the-art performance.
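
The abstract's headline numbers are reductions in average Chamfer Distance. For reference, one common (symmetric, squared) convention for the metric is sketched below; papers differ on squaring and scaling, so this may not match DeformPIC's exact definition:

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric squared Chamfer Distance between (N, 3) and (M, 3) point sets.

    For every point in each set, find its nearest neighbor in the other set,
    then average the squared distances in both directions and sum the two terms.
    """
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise sq. dists
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

a = np.array([[0.0, 0.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0]])
print(chamfer_distance(a, b))  # → 2.0 (squared distance 1.0 in each direction)
print(chamfer_distance(a, a))  # → 0.0
```

Lower is better, so the reported reductions of 1.6–4.7 points directly measure how much closer the predicted clouds sit to ground truth under this nearest-neighbor criterion.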

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes DeformPIC, a deformation-based in-context learning framework for point cloud understanding. It replaces Masked Point Modeling (MPM) with a prompt-guided deformation of the query point cloud, arguing that this enables explicit geometric reasoning, eliminates training-inference objective mismatch, and avoids target-side information leakage at inference. Experiments report average Chamfer Distance reductions of 1.6, 1.8, and 4.7 points on reconstruction, denoising, and registration tasks, respectively, along with state-of-the-art results on a newly introduced out-of-domain benchmark.

Significance. If the deformation module and prompt encoder are implemented as described without target leakage, the approach provides a principled alignment of train and test objectives in point cloud ICL, which could improve consistency and generalization over direct target prediction methods. The introduction of an out-of-domain benchmark strengthens the evaluation of robustness across distributions.

minor comments (3)
  1. Abstract: The reported Chamfer Distance reductions lack accompanying details on baselines, number of runs, or statistical significance; these should be summarized briefly even in the abstract to support the quantitative claims.
  2. §3 (Method): The integration of the deformation module with the transformer backbone would benefit from an explicit equation showing how the deformed query coordinates are computed from prompt embeddings, to clarify the geometric reasoning path.
  3. §4 (Experiments): The out-of-domain benchmark description should specify the exact distribution shifts (e.g., sensor type, density, or object categories) to allow readers to assess the generalization claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation of minor revision. The referee's summary accurately reflects the core motivation and contributions of DeformPIC, including the replacement of MPM with prompt-guided deformation to enable explicit geometric reasoning, remove objective mismatch, and prevent target-side leakage. We also appreciate the recognition that the out-of-domain benchmark strengthens evaluation of robustness.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces DeformPIC as a new framework that deforms the query point cloud under prompt guidance, replacing MPM-based masked reconstruction. No equations, derivations, or self-citations are shown that reduce the central claims (consistent objectives, explicit geometric reasoning) to fitted parameters or prior self-references by construction. Performance gains on Chamfer Distance are presented as direct consequences of the design, with no load-bearing steps that collapse to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the method is described at a high level without equations or fitting details.

pith-pipeline@v0.9.0 · 5523 in / 999 out tokens · 29889 ms · 2026-05-13T19:53:24.566354+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 12 internal anchors

  1. [1]

    Layer Normalization

    Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.

  2. [2]

    Visual prompting via image inpainting

    Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, and Alexei Efros. Visual prompting via image inpainting. NeurIPS, 35:25005–25017, 2022.

  3. [3]

    Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. π0: A Vision-Language-Action Flow Model for General Robot Control. arXiv preprint arXiv:2410.24164, 2024.

  4. [4]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. NeurIPS, 33:1877–1901, 2020.

  5. [5]

    ShapeNet: An Information-Rich 3D Model Repository

    Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015.

  6. [6]

    End-to-end autonomous driving: Challenges and frontiers

    Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, and Hongyang Li. End-to-end autonomous driving: Challenges and frontiers. IEEE TPAMI, 2024.

  7. [7]

    A Survey on In-context Learning

    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, et al. A survey on in-context learning. arXiv preprint arXiv:2301.00234, 2022.

  8. [8]

    Autoencoders as cross-modal teachers: Can pretrained 2D image transformers help 3D representation learning?

    Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, and Kaisheng Ma. Autoencoders as cross-modal teachers: Can pretrained 2D image transformers help 3D representation learning? arXiv preprint arXiv:2212.08320, 2022.

  9. [9]

    A point set generation network for 3D object reconstruction from a single image

    Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3D object reconstruction from a single image. In CVPR, pages 605–613, 2017.

  10. [10]

    Explore in-context learning for 3D point cloud understanding

    Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M Buhmann, Chen Change Loy, and Mengyuan Liu. Explore in-context learning for 3D point cloud understanding. NeurIPS, 36:42382–42395, 2023.

  11. [11]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

  12. [12]

    3D-CODED: 3D correspondences by deep deformation

    Thibault Groueix, Matthew Fisher, Vladimir G Kim, Bryan C Russell, and Mathieu Aubry. 3D-CODED: 3D correspondences by deep deformation. In ECCV, pages 230–246, 2018.

  13. [13]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.

  14. [14]

    PCT: Point cloud transformer

    Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R Martin, and Shi-Min Hu. PCT: Point cloud transformer. Computational Visual Media, 7(2):187–199, 2021.

  15. [15]

    Masked autoencoders are scalable vision learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In CVPR, pages 16000–16009, 2022.

  16. [16]

    In-context learning creates task vectors

    Roee Hendel, Mor Geva, and Amir Globerson. In-context learning creates task vectors. In Findings of EMNLP, pages 9318–9333, 2023.

  17. [17]

    Planning-oriented autonomous driving

    Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In CVPR, pages 17853–17862, 2023.

  18. [18]

    Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, et al. π0.5: A Vision-Language-Action Model with Open-World Generalization. arXiv preprint arXiv:2504.16054, 2025.

  19. [19]

    ShapeFlow: Learnable deformation flows among 3D shapes

    Chiyu Jiang, Jingwei Huang, Andrea Tagliasacchi, and Leonidas J Guibas. ShapeFlow: Learnable deformation flows among 3D shapes. NeurIPS, 33:9745–9757, 2020.

  20. [20]

    DG-PIC: Domain generalized point-in-context learning for point cloud understanding

    Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, and Jian Jun Zhang. DG-PIC: Domain generalized point-in-context learning for point cloud understanding. In ECCV, pages 455–474. Springer, 2024.

  21. [21]

    PCoTTA: Continual test-time adaptation for multi-task point cloud understanding

    Jincen Jiang, Qianyu Zhou, Yuhang Li, Xinkui Zhao, Meili Wang, Lizhuang Ma, Jian Chang, Jian Zhang, Xuequan Lu, et al. PCoTTA: Continual test-time adaptation for multi-task point cloud understanding. NeurIPS, 37:96229–96253, 2024.

  22. [22]

    BEVFormer: Learning bird’s-eye-view representation from LiDAR-camera via spatiotemporal transformers

    Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. BEVFormer: Learning bird’s-eye-view representation from LiDAR-camera via spatiotemporal transformers. IEEE TPAMI, 2024.

  23. [23]

    Point-in-context: Understanding point cloud via in-context learning

    Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M Buhmann, Xiangtai Li, and Chen Change Loy. Point-in-context: Understanding point cloud via in-context learning. arXiv preprint arXiv:2404.12352, 2024.

  24. [24]

    Human-in-context: Unified cross-domain 3D human motion modeling via in-context learning

    Mengyuan Liu, Xinshun Wang, Zhongbin Fang, Deheng Ye, Xia Li, Tao Tang, Songtao Wu, Xiangtai Li, and Ming-Hsuan Yang. Human-in-context: Unified cross-domain 3D human motion modeling via in-context learning. arXiv preprint arXiv:2508.10897, 2025.

  25. [25]

    In-context vectors: Making in context learning more effective and controllable through latent space steering

    Sheng Liu, Haotian Ye, Lei Xing, and James Y Zou. In-context vectors: Making in context learning more effective and controllable through latent space steering. In ICML, pages 32287–32307. PMLR, 2024.

  26. [26]

    FlowNet3D: Learning scene flow in 3D point clouds

    Xingyu Liu, Charles R Qi, and Leonidas J Guibas. FlowNet3D: Learning scene flow in 3D point clouds. In CVPR, pages 529–537, 2019.

  27. [27]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov, Frank Hutter, et al. Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101, 2017.

  28. [28]

    A Survey on Vision-Language-Action Models for Embodied AI

    Yueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, and Irwin King. A survey on vision-language-action models for embodied AI. arXiv preprint arXiv:2405.14093, 2024.

  29. [29]

    Visualizing data using t-SNE

    Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

  30. [30]

    Masked autoencoders for point cloud self-supervised learning

    Yatian Pang, Wenxiao Wang, Francis EH Tay, Wei Liu, Yonghong Tian, and Li Yuan. Masked autoencoders for point cloud self-supervised learning. In ECCV, pages 604–621. Springer, 2022.

  31. [31]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In CVPR, pages 4195–4205, 2023.

  32. [32]

    PointNet: Deep learning on point sets for 3D classification and segmentation

    Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In CVPR, pages 652–660, 2017.

  33. [33]

    Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining

    Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, and Li Yi. Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining. In ICML, pages 28223–28243. PMLR, 2023.

  34. [34]

    MICAS: Multi-grained in-context adaptive sampling for 3D point cloud processing

    Feifei Shao, Ping Liu, Zhao Wang, Yawei Luo, Hongwei Wang, and Jun Xiao. MICAS: Multi-grained in-context adaptive sampling for 3D point cloud processing. In CVPR, pages 6616–6626, 2025.

  35. [35]

    Neural shape deformation priors

    Jiapeng Tang, Lev Markhasin, Bi Wang, Justus Thies, and Matthias Nießner. Neural shape deformation priors. NeurIPS, 35:17117–17132, 2022.

  36. [36]

    What do single-view 3D reconstruction networks learn?

    Maxim Tatarchenko, Stephan R Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, and Thomas Brox. What do single-view 3D reconstruction networks learn? In CVPR, pages 3405–3414, 2019.

  37. [37]

    LLaMA: Open and Efficient Foundation Language Models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.

  38. [38]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.

  39. [39]

    Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data

    Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Thanh Nguyen, and Sai-Kit Yeung. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In CVPR, pages 1588–1597, 2019.

  40. [40]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. NeurIPS, 30, 2017.

  41. [41]

    Pixel2Mesh: Generating 3D mesh models from single RGB images

    Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2Mesh: Generating 3D mesh models from single RGB images. In ECCV, pages 52–67, 2018.

  42. [42]

    Images speak in images: A generalist painter for in-context visual learning

    Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, and Tiejun Huang. Images speak in images: A generalist painter for in-context visual learning. In CVPR, pages 6830–6839, 2023.

  43. [43]

    Skeleton-in-context: Unified skeleton sequence modeling with in-context learning

    Xinshun Wang, Zhongbin Fang, Xia Li, Xiangtai Li, Chen Chen, and Mengyuan Liu. Skeleton-in-context: Unified skeleton sequence modeling with in-context learning. In CVPR, pages 2436–2446, 2024.

  44. [44]

    Dynamic graph CNN for learning on point clouds

    Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph CNN for learning on point clouds. ACM TOG, 38(5):1–12, 2019.

  45. [45]

    In-context learning unlocked for diffusion models

    Zhendong Wang, Yifan Jiang, Yadong Lu, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou, et al. In-context learning unlocked for diffusion models. NeurIPS, 36:8542–8562, 2023.

  46. [46]

    UniPre3D: Unified pre-training of 3D point cloud models with cross-modal Gaussian splatting

    Ziyi Wang, Yanran Zhang, Jie Zhou, and Jiwen Lu. UniPre3D: Unified pre-training of 3D point cloud models with cross-modal Gaussian splatting. In CVPR, pages 1319–1329, 2025.

  47. [47]

    Pixel2Mesh++: Multi-view 3D mesh generation via deformation

    Chao Wen, Yinda Zhang, Zhuwen Li, and Yanwei Fu. Pixel2Mesh++: Multi-view 3D mesh generation via deformation. In CVPR, pages 1042–1051, 2019.

  48. [48]

    3D ShapeNets: A deep representation for volumetric shapes

    Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In CVPR, pages 1912–1920, 2015.

  49. [49]

    SimMIM: A simple framework for masked image modeling

    Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, and Han Hu. SimMIM: A simple framework for masked image modeling. In CVPR, pages 9653–9663, 2022.

  50. [50]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

  51. [51]

    A scalable active framework for region annotation in 3D shape collections

    Li Yi, Vladimir G Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas. A scalable active framework for region annotation in 3D shape collections. ACM TOG, 35(6):1–12, 2016.

  52. [52]

    Point-BERT: Pre-training 3D point cloud transformers with masked point modeling

    Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, and Jiwen Lu. Point-BERT: Pre-training 3D point cloud transformers with masked point modeling. In CVPR, pages 19313–19322, 2022.

  53. [53]

    Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders

    Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, and Hongsheng Li. Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders. In CVPR, pages 21769–21780, 2023.

  54. [54]

    PCP-MAE: Learning to predict centers for point masked autoencoders

    Xiangdong Zhang, Shaofeng Zhang, and Junchi Yan. PCP-MAE: Learning to predict centers for point masked autoencoders. NeurIPS, 37:80303–80327, 2024.