Hierarchical Feature Learning for Medical Point Clouds via State Space Model
Pith reviewed 2026-05-22 18:53 UTC · model grok-4.3
The pith
State space models with coordinate-order and inside-out scanning hierarchically learn features from medical point clouds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed SSM-based hierarchical framework downsamples input point clouds into multiple levels via farthest point sampling and performs KNN queries at each level to aggregate multi-scale structural information. Coordinate-order and inside-out scanning strategies serialize the irregular points so that vanilla and group Point SSM blocks can progressively compute features from short neighbor sequences and long point sequences, capturing both local patterns and long-range dependencies. The method is reported to achieve superior performance across all tasks on the MedPointS dataset.
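The multi-level farthest point sampling step can be illustrated with a short pure-Python sketch. It is not the authors' implementation (that lives in the flemme repository); the level sizes, the choice of index 0 as the first sample, and the helper names `farthest_point_sampling` and `build_hierarchy` are illustrative assumptions.

```python
import math


def farthest_point_sampling(points, m, seed_index=0):
    """Greedy FPS: return indices of m points, each farthest from those already chosen."""
    n = len(points)
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and len(points)")
    chosen = [seed_index]
    # min_dist[i] = distance from point i to its nearest already-chosen point.
    min_dist = [math.dist(p, points[seed_index]) for p in points]
    for _ in range(m - 1):
        # The next sample is the point least covered by the current samples.
        nxt = max(range(n), key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Only the new sample can shrink a point's nearest-sample distance.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def build_hierarchy(points, level_sizes):
    """Level 0 is the input; each later level is an FPS subset of the level before it."""
    levels = [list(points)]
    for m in level_sizes:
        prev = levels[-1]
        levels.append([prev[i] for i in farthest_point_sampling(prev, m)])
    return levels
```

For example, `build_hierarchy(cloud, [512, 256, 128])` on a 1024-point cloud returns four levels of 1024, 512, 256, and 128 points. The brute-force loop costs O(N·m) distance evaluations per level, which is adequate for illustration only.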
What carries the argument
Coordinate-order and inside-out scanning strategies that serialize irregular medical point clouds into sequences suitable for state space model processing while preserving structural information, used within vanilla and group Point SSM blocks at multiple hierarchical levels.
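The summary does not define the two scanning strategies precisely, so the following sketch rests on assumed definitions: coordinate-order is taken to mean a lexicographic sort of points by their coordinates under a chosen axis priority, and inside-out a sort by increasing distance from the cloud's centroid. Both hypothetical functions return an index permutation; applying it to the points yields the serialized sequence an SSM would consume.

```python
import math


def coordinate_order(points, axes=(0, 1, 2)):
    """Assumed coordinate-order scan: sort indices lexicographically by the given axes."""
    return sorted(range(len(points)), key=lambda i: tuple(points[i][a] for a in axes))


def inside_out(points, center=None):
    """Assumed inside-out scan: sort indices by distance from a center (default: centroid)."""
    if center is None:
        n = len(points)
        center = tuple(sum(p[d] for p in points) / n for d in range(3))
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], center))


def serialize(points, order):
    """Reorder the points into the sequence fed to the sequence model."""
    return [points[i] for i in order]
```

Passing a different axis priority, such as `axes=(2, 1, 0)`, yields a different sequence; whether the paper combines several such orders is not stated here.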
If this is right
- The method outperforms prior approaches on anatomy classification, completion, and segmentation of medical structures, each measured on its own metric.
- Local patterns and long-range dependencies are modeled jointly at a cost that grows linearly with sequence length, rather than quadratically as with transformer self-attention.
- Multi-level downsampling combined with multi-scale KNN aggregation supports effective feature learning at varying resolutions.
- The released MedPointS dataset enables standardized evaluation of future medical point cloud methods.
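The cost argument in the second bullet rests on the recurrent form of a state space model: each step updates a fixed-size hidden state, h_t = a ⊙ h_{t-1} + b·x_t and y_t = c·h_t, so one pass over L tokens costs O(L), whereas self-attention scores all L×L pairs. The sketch below uses a simplified diagonal, time-invariant SSM; Mamba [4] additionally makes its parameters input-dependent (selective), which this example deliberately omits.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear state space model over a scalar input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)
    A single pass over xs, so the cost grows linearly with len(xs).
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b, c must share the state dimension")
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        # Decay the previous state and inject the current input.
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        # Read out a scalar output from the state.
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys
```

With a single state of decay 0.5, `ssm_scan([1.0, 0.0, 0.0, 0.0], [0.5], [1.0], [1.0])` returns `[1.0, 0.5, 0.25, 0.125]`: an impulse decays geometrically, which is how earlier tokens influence later outputs through the state.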
Where Pith is reading between the lines
- The serialization approach could extend to other irregular 3D data domains such as LiDAR or CAD models.
- Lower computational demands relative to transformers might support real-time clinical point cloud analysis.
- The hierarchical SSM structure could be tested on additional medical tasks such as registration or anomaly detection.
Load-bearing premise
The coordinate-order and inside-out scanning strategies successfully serialize irregular medical point clouds for SSM processing while preserving the structural information needed for accurate feature learning.
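One inexpensive way to probe this premise, offered here as a hypothetical proxy rather than a metric from the paper, is to measure how far apart consecutive points in a serialized sequence are in 3D space. A serialization that preserves locality keeps this mean step length small; a random order, like the random-scanning baseline proposed in the rebuttal, typically makes it large.

```python
import math


def mean_step_length(points, order):
    """Average 3D distance between consecutive points in a serialized sequence.

    Lower values mean that neighbors in the sequence tend to be neighbors in space.
    """
    if len(order) < 2:
        return 0.0
    steps = [
        math.dist(points[order[i]], points[order[i + 1]])
        for i in range(len(order) - 1)
    ]
    return sum(steps) / len(steps)
```

For ten collinear points at unit spacing, the sorted order gives a mean step of 1.0, while the alternating order [0, 9, 1, 8, ...] gives 5.0. A low score is necessary but not sufficient for preserving structure, since it ignores relations between points that are far apart in the sequence.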
What would settle it
A re-run of the experiments on MedPointS in which the proposed method fails to outperform prior point cloud models on classification, completion, or segmentation metrics would falsify the claim of superior performance.
Original abstract
Deep learning-based point cloud modeling has been widely investigated as an indispensable component of general shape analysis. Recently, transformers and state space models (SSMs) have shown promising capabilities in point cloud learning. However, limited research has been conducted on medical point clouds, which have great potential in disease diagnosis and treatment. This paper presents an SSM-based hierarchical feature learning framework for medical point cloud understanding. Specifically, we downsample the input into multiple levels through farthest point sampling. At each level, we perform a series of k-nearest neighbor (KNN) queries to aggregate multi-scale structural information. To help the SSM process point clouds, we introduce coordinate-order and inside-out scanning strategies for efficient serialization of irregular points. Point features are calculated progressively from short neighbor sequences and long point sequences through vanilla and group Point SSM blocks to capture both local patterns and long-range dependencies. To evaluate the proposed method, we build a large-scale medical point cloud dataset named MedPointS for anatomy classification, completion, and segmentation. Extensive experiments conducted on MedPointS demonstrate that our method achieves superior performance across all tasks. The dataset is available at https://flemme-docs.readthedocs.io/en/latest/medpoints.html. The code has been merged into a public medical imaging platform: https://github.com/wlsdzyzl/flemme.
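The multi-scale KNN step can be sketched as follows, assuming that each short neighbor sequence fed to a vanilla Point SSM block is a center's k nearest neighbors ordered from nearest to farthest. The scales `ks=(8, 16, 32)` and the function names are hypothetical, and the brute-force search is for illustration only.

```python
import math


def knn(points, query, k):
    """Indices of the k nearest points to query, nearest first (brute force)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]


def multi_scale_neighbor_sequences(points, centers, ks=(8, 16, 32)):
    """For each scale k and each center, build one short neighbor sequence.

    Returns {k: [[idx, ...] for each center]}; neighbors are sorted by distance,
    so each list is already an ordered sequence a sequence model can consume.
    """
    return {k: [knn(points, c, k) for c in centers] for k in ks}
```

Each value in the returned dict holds one short sequence per center, so the sequence length (k) stays small even when the cloud is large.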
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper proposes a hierarchical state space model (SSM) framework for medical point cloud understanding. It uses farthest point sampling (FPS) to create multi-level downsampled representations, applies k-nearest neighbor (KNN) queries for multi-scale structural aggregation at each level, and introduces coordinate-order and inside-out scanning strategies to serialize irregular point clouds. Features are then extracted progressively using vanilla Point SSM blocks on short neighbor sequences and group Point SSM blocks on longer point sequences to capture local patterns and long-range dependencies. The authors introduce the MedPointS dataset and report superior empirical performance on anatomy classification, completion, and segmentation tasks, with the dataset and code made publicly available.
Significance. If the experimental results hold under rigorous validation, this work could offer an efficient SSM-based alternative to transformer architectures for medical point cloud tasks, where long-range dependencies and irregular sampling are common. The release of the MedPointS dataset represents a concrete contribution that may enable standardized benchmarking in medical shape analysis. The hierarchical design combining local KNN aggregation with global SSM processing addresses a relevant gap in adapting sequence models to 3D medical data.
major comments (2)
- [§3.2] Scanning Strategies: The coordinate-order and inside-out scanning strategies are presented as enabling valid serialization for SSM blocks without critical loss of spatial relations, yet no ablation study, information-preservation metric, or sensitivity analysis is provided to support this assumption, which is load-bearing for the claim that the method successfully processes irregular medical point clouds.
- [Section 5] Experiments: The central claim of superior performance across classification, completion, and segmentation on MedPointS is stated without reported quantitative metrics, error bars, statistical significance tests, or detailed baseline implementations in the experimental summary, preventing verification of whether improvements are robust and reproducible.
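The statistical test the referee asks for can be as simple as a paired t-test over runs that share seeds and data splits. A minimal stdlib sketch of the t statistic follows; the function name is illustrative, the scores passed in are up to the experimenter, and SciPy's `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched runs (e.g., same seeds)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 runs")
    # Work on per-run differences so run-to-run variation cancels out.
    diffs = [x - y for x, y in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With five paired runs (4 degrees of freedom), |t| above about 2.776 indicates a difference at the two-sided 0.05 level.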
minor comments (3)
- [Abstract] The abstract asserts superior performance but supplies no numerical results, baseline names, or dataset statistics, which reduces immediate clarity for readers.
- [§3.3] Notation: The distinction between 'vanilla Point SSM' and 'group Point SSM' blocks could be clarified with a small diagram or explicit equations showing how grouping is performed.
- [Introduction] References: The reference list already includes point cloud SSM variants (PointMamba [9], Point Mamba [10], Point Cloud Mamba [24]); the Introduction could discuss them more explicitly to better situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects that will improve the rigor and clarity of the manuscript. We address each major comment below, indicating planned revisions where appropriate.
Point-by-point responses
- Referee: [§3.2] Scanning Strategies: The coordinate-order and inside-out scanning strategies are presented as enabling valid serialization for SSM blocks without critical loss of spatial relations, yet no ablation study, information-preservation metric, or sensitivity analysis is provided to support this assumption, which is load-bearing for the claim that the method successfully processes irregular medical point clouds.
Authors: We agree that empirical validation of the scanning strategies would strengthen the claims. In the revised manuscript, we will add a dedicated ablation study comparing coordinate-order, inside-out, and random scanning strategies on the anatomy classification task using MedPointS. We will report accuracy differences and include qualitative visualizations of serialized point sequences to illustrate preservation of local and global spatial relations. These additions would directly test the effectiveness of the proposed serialization for irregular medical point clouds. Revision: yes.
- Referee: [Section 5] Experiments: The central claim of superior performance across classification, completion, and segmentation on MedPointS is stated without reported quantitative metrics, error bars, statistical significance tests, or detailed baseline implementations in the experimental summary, preventing verification of whether improvements are robust and reproducible.
Authors: We appreciate the need for greater transparency in the experimental summary. The full manuscript contains quantitative tables reporting metrics such as accuracy, Chamfer distance, and mIoU across tasks with comparisons to baselines including PointNet++, DGCNN, and Point Transformer variants. To address the concern, we will revise the experimental summary to explicitly highlight key quantitative results. We will also add error bars from multiple independent runs, include statistical significance testing (e.g., paired t-tests against baselines), and expand the description of baseline implementations with hyperparameters and training details. These changes will improve verifiability and reproducibility. Revision: yes.
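For readers reproducing the reported metrics, the two less familiar ones can be sketched directly. Chamfer distance has several conventions (squared or unsquared distances, L1 or L2, summed or averaged directions); this sketch assumes squared Euclidean distances averaged per direction and then summed, which may differ from the paper's exact definition (see also the balanced variant in [20]). The mIoU here averages per-class IoU over classes present in either labeling; per-shape averaging is another common convention.

```python
import math


def chamfer_distance(p, q):
    """Symmetric Chamfer distance with squared Euclidean distances (one assumed convention)."""
    def one_way(a, b):
        # Mean over a of the squared distance to the nearest point in b.
        return sum(min(math.dist(x, y) ** 2 for y in b) for x in a) / len(a)
    return one_way(p, q) + one_way(q, p)


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both labelings
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `chamfer_distance([(0, 0, 0)], [(1, 0, 0)])` is 2.0 (1.0 in each direction), and `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2)` is 7/12, the mean of 1/2 and 2/3.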
Circularity Check
No significant circularity in derivation chain
The paper describes an empirical architecture for medical point cloud processing: FPS downsampling, KNN aggregation, coordinate-order and inside-out scanning to serialize points, followed by vanilla and group Point SSM blocks for local and long-range features. Central claims concern superior performance on the released MedPointS dataset across classification, completion, and segmentation. No equations, predictions, or first-principles results are shown that reduce by construction to fitted inputs, self-definitions, or self-citation chains; the method is presented as a practical pipeline whose validity rests on external experimental benchmarks rather than internal tautologies.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Farthest point sampling and k-nearest neighbor queries preserve sufficient structural information at multiple scales.
- Ad hoc to paper: Coordinate-order and inside-out scanning strategies produce valid sequences for SSM without critical loss of spatial relations.
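The sampling half of the first axiom can be partly checked by computing the covering radius: the largest distance from any input point to its nearest sampled point. Greedy farthest point sampling is the classic 2-approximation for minimizing this quantity (the k-center problem), so FPS subsets should score well. This is a sketch of a sanity check, not an analysis from the paper, and it says nothing about whether learned features retain the structure.

```python
import math


def covering_radius(points, sample):
    """Largest distance from any original point to its nearest sampled point."""
    return max(min(math.dist(p, s) for s in sample) for p in points)
```

On eleven collinear points at unit spacing, sampling the two endpoints gives a radius of 5.0, while sampling only the first point gives 10.0.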
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
Passage: "we introduce coordinate-order and inside-out scanning strategies for efficient serialization of irregular points... vanilla and group Point SSM blocks"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Atomistic Machine Learning with Irreducible Cartesian Natural Tensors
CarNet develops irreducible Cartesian natural tensors and an equivariant model that matches leading spherical-tensor performance for ML interatomic potentials and high-rank tensor predictions like elastic constants.
Reference graph
Works this paper leans on
- [1] Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., et al.: ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
- [2] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
- [3] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
- [4] Gu, A., Dao, T.: Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023)
- [5] Guo, M.H., Cai, J.X., Liu, Z.N., Mu, T.J., Martin, R.R., Hu, S.M.: PCT: Point cloud transformer. Computational Visual Media 7, 187–199 (2021)
- [6] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021)
- [7] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
- [8] Li, J., Zhou, Z., Yang, J., Pepe, A., Gsaxner, C., Luijten, G., Qu, C., Zhang, T., Chen, X., Li, W., et al.: MedShapeNet – A large-scale dataset of 3D medical shapes for computer vision. Biomedical Engineering/Biomedizinische Technik (0) (2024)
- [9] Liang, D., Zhou, X., Xu, W., Zhu, X., Zou, Z., Ye, X., Tan, X., Bai, X.: PointMamba: A simple state space model for point cloud analysis. In: Advances in Neural Information Processing Systems (2024)
- [10] Liu, J., Yu, R., Wang, Y., Zheng, Y., Deng, T., Ye, W., Wang, H.: Point Mamba: A novel point cloud backbone based on state space model with octree-based ordering strategy. arXiv preprint arXiv:2403.06467 (2024)
- [11] Liu, Y., Tian, Y., Zhao, Y., Yu, H., Xie, L., Wang, Y., Ye, Q., Liu, Y.: VMamba: Visual state space model (2024)
- [12] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10012–10022 (2021)
- [13] Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). pp. 565–571. IEEE (2016)
- [14] Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: Deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 652–660 (2017)
- [15] Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems 30 (2017)
- [16] Ruan, J., Li, J., Xiang, S.: VM-UNet: Vision Mamba UNet for medical image segmentation. arXiv preprint arXiv:2402.02491 (2024)
- [17] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)
- [18] Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG) 38(5), 1–12 (2019)
- [19] Wang, Z., Zheng, J.Q., Zhang, Y., Cui, G., Li, L.: Mamba-UNet: UNet-like pure visual Mamba for medical image segmentation. arXiv preprint arXiv:2402.05079 (2024)
- [20] Wu, T., Pan, L., Zhang, J., Wang, T., Liu, Z., Lin, D.: Balanced Chamfer distance as a comprehensive metric for point cloud completion. Advances in Neural Information Processing Systems 34, 29088–29100 (2021)
- [21] Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., Xiao, J.: 3D ShapeNets: A deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1912–1920 (2015)
- [22] Yang, Y., Feng, C., Shen, Y., Tian, D.: FoldingNet: Point cloud auto-encoder via deep grid deformation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 206–215 (2018)
- [23] Zhang, G., Yang, J., Li, Y.: Flemme: A flexible and modular learning platform for medical images. In: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 4018–4023. IEEE (2024)
- [24] Zhang, T., Li, X., Yuan, H., Ji, S., Yan, S.: Point cloud Mamba: Point cloud learning via state space model. arXiv preprint arXiv:2403.00762 (2024)
- [25] Zhao, H., Jiang, L., Jia, J., Torr, P.H., Koltun, V.: Point Transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 16259–16268 (2021)
- [26] Zhu, L., Liao, B., Zhang, Q., Wang, X., Liu, W., Wang, X.: Vision Mamba: Efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417 (2024)