Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors
Pith reviewed 2026-05-21 04:47 UTC · model grok-4.3
The pith
Language priors from models resolve long-tail ambiguity by guiding hierarchical clustering in unsupervised 3D point cloud segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LangTail constructs an entity-level semantic prior from language models that captures balanced and fine-grained world knowledge across categories. These priors are injected into a hierarchical clustering framework via contrastive alignment. This guides multi-granularity semantic structure formation and prevents minor classes from being absorbed by dominant clusters, yielding more discriminative representations for underrepresented categories.
What carries the argument
Entity-level semantic prior extracted from language models and injected via contrastive alignment into a hierarchical clustering framework to enforce multi-level associations with visually underrepresented classes.
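To make the alignment mechanism concrete, here is a minimal NumPy sketch of an InfoNCE-style loss. It pulls each visual feature toward its language-derived entity prototype and away from the other prototypes. It follows the shape of the entity-loss excerpt quoted in the Lean-theorems section of this page: a per-class weight w_{c_i}, a temperature τ, and a positive prototype b+.

Several details are assumptions, not confirmed by the paper:
- the function name;
- the use of cosine similarity;
- treating all other prototypes as negatives;
- the default τ = 0.07.

```python
import numpy as np

def entity_contrastive_loss(feats, prototypes, assign, class_weights, tau=0.07):
    """InfoNCE-style alignment of visual features to language prototypes.

    feats:         (M, D) visual features f_theta(P_i), one per point or segment
    prototypes:    (C, D) language-derived entity embeddings
    assign:        (M,) index of the positive prototype b+ for each feature
    class_weights: (C,) per-class weights w_c (how they are set is not specified)
    tau:           temperature; 0.07 is an assumed default
    """
    # L2-normalise both sides so the dot product is a cosine similarity
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    b = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = f @ b.T / tau                               # (M, C) similarities
    logits -= logits.max(axis=1, keepdims=True)          # stabilise the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = log_prob[np.arange(len(assign)), assign]       # log p(b+ | f_i)
    # weighted negative log-likelihood, averaged over the M features
    return float(-(class_weights[assign] * pos).mean())
```

Raising w_c for rare classes strengthens their pull toward their own prototypes. That is one way such a loss could counteract absorption by dominant clusters, but this page does not say whether LangTail uses its weights that way.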
If this is right
- Multi-level associations between language priors and visual features compensate for the biased attention of clustering toward dominant classes.
- Hierarchical clustering produces more discriminative representations for underrepresented categories instead of absorbing them.
- The approach yields measurable gains in mean intersection-over-union on real-world 3D datasets such as ScanNet-v2, S3DIS, and nuScenes.
- Unsupervised segmentation becomes viable for scenes that follow natural long-tail class distributions without requiring labels.
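The claimed gains are measured in mean intersection-over-union (mIoU). The sketch below shows a common evaluation protocol for unsupervised segmentation, in two steps:
1. Map predicted cluster IDs to ground-truth classes by Hungarian matching.
2. Average the per-class IoU.

Two points are assumptions: that LangTail's evaluation uses exactly this protocol, and that the number of clusters equals the number of classes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def confusion(gt, pred, n):
    """conf[g, p] = number of points with ground-truth class g and label p."""
    return np.bincount(gt * n + pred, minlength=n * n).reshape(n, n)

def unsupervised_miou(gt, clusters, n):
    """mIoU after Hungarian matching of cluster IDs to classes.

    Assumes n predicted clusters and n ground-truth classes, both labelled 0..n-1.
    Returns (mIoU over classes present in gt or prediction, per-class IoU).
    """
    gt = np.asarray(gt).ravel()
    clusters = np.asarray(clusters).ravel()
    # Choose the one-to-one cluster -> class map with maximum total overlap
    rows, cols = linear_sum_assignment(confusion(gt, clusters, n), maximize=True)
    mapping = np.empty(n, dtype=int)
    mapping[cols] = rows                 # cluster cols[j] becomes class rows[j]
    conf = confusion(gt, mapping[clusters], n)
    tp = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0
    iou = np.where(valid, tp / np.maximum(union, 1), 0.0)
    return float(iou[valid].mean()), iou
```

Every class counts equally in mIoU, so absorption hurts the score directly. For example, take ground truth [0, 0, 0, 1] with every point placed in one cluster. The function returns mIoU 0.375: IoU 0.75 for the dominant class and 0 for the absorbed minor class.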
Where Pith is reading between the lines
- The same language-prior mechanism could be tested on other unsupervised tasks such as 2D image clustering to isolate the contribution of the hierarchical structure.
- Controlled experiments that vary the granularity of the language priors would reveal the minimal level of semantic detail needed to protect minority classes.
- Deployment on streaming 3D data from moving sensors would show whether the alignment remains stable when new rare objects appear over time.
Load-bearing premise
Language models hold balanced world knowledge that can be transferred to 3D visual features through contrastive alignment without creating new semantic mismatches or biases.
What would settle it
Apply the method to a controlled 3D dataset where language model categories are deliberately shifted away from the visual object distributions and check whether minority-class separation still improves or reverts to the baseline absorption pattern.
Original abstract
Existing approaches for unsupervised 3D point cloud segmentation predominantly rely on a purely visual similarity-based learning-by-clustering paradigm, which suffers from a fundamental limitation: long-tail ambiguity. In such a paradigm, features of minor classes are consistently absorbed by dominant clusters, leading to severely imbalanced predictions. To address this issue, we propose LangTail, a language-guided hierarchical learning framework that leverages the balanced world knowledge encoded in language models to mitigate long-tail ambiguity in unsupervised 3D segmentation. The key idea is to establish multi-level associations between language-derived semantic priors and visually underrepresented minor classes, thereby compensating for the biased attention of purely visual clustering toward dominant classes. Specifically, LangTail first constructs an entity-level semantic prior from language models, capturing balanced and fine-grained world knowledge across categories. These priors are injected into a hierarchical clustering framework via contrastive alignment. This guides multi-granularity semantic structure formation and prevents minor classes from being absorbed by dominant clusters, yielding more discriminative representations for underrepresented categories. Extensive experiments on ScanNet-v2, S3DIS, and nuScenes demonstrate that LangTail consistently outperforms existing methods by significant margins, i.e., +13.5, +12.9, and +8.9 mIoU, respectively. These results demonstrate the effectiveness of language priors in improving the representation of minority classes in 3D point clouds. The code will be released at: https://github.com/Whisky0129/langtail_official.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes LangTail, a language-guided hierarchical learning framework for unsupervised 3D point cloud segmentation. It extracts entity-level semantic priors from language models and injects them via contrastive alignment into a hierarchical clustering pipeline to mitigate long-tail ambiguity, where features of minor classes are absorbed by dominant visual clusters. The work reports large mIoU gains of +13.5 on ScanNet-v2, +12.9 on S3DIS, and +8.9 on nuScenes, attributing the improvements to multi-level associations between language priors and visually underrepresented classes.
Significance. If the central claims hold after verification, the result would be significant for unsupervised 3D segmentation by showing that external language priors can compensate for visual long-tail biases in clustering. The planned code release supports reproducibility and allows direct inspection of the alignment mechanism. The approach is a clear attempt to move beyond purely visual similarity-based methods.
major comments (2)
- [§3] §3 (Method), contrastive alignment description: no equation or loss term is given for how the language-prior alignment objective is balanced against the hierarchical clustering objective, so it is impossible to verify whether the mechanism corrects visual dominance or simply adds an external bias source as feared in the stress-test note.
- [Experimental results] Experimental results section and Table 1: the headline mIoU gains are reported without any description of baseline re-implementations, ablation on the language-prior component, data-split details, or statistical significance across runs; this directly undermines attribution of the +13.5/+12.9/+8.9 improvements to the proposed multi-level association rather than implementation differences.
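The referee's request for statistical significance across runs needs only a little bookkeeping. The sketch below assumes one mIoU value per random seed for each method. It reports the mean and sample standard deviation, and compares two methods with Welch's t-test from SciPy. Choosing Welch's test is an assumption. With only a handful of seeds, the p-value is a rough guide at best.

```python
import numpy as np
from scipy import stats

def summarize_runs(miou_per_seed):
    """Mean and sample standard deviation (ddof=1) of per-seed mIoU values."""
    x = np.asarray(miou_per_seed, dtype=float)
    return float(x.mean()), float(x.std(ddof=1))

def welch_test(method_runs, baseline_runs):
    """Welch's two-sample t-test (unequal variances) between two sets of runs."""
    res = stats.ttest_ind(method_runs, baseline_runs, equal_var=False)
    return float(res.statistic), float(res.pvalue)
```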
minor comments (1)
- [Abstract] The abstract and method overview use the term 'multi-granularity semantic structure' without defining the specific granularity levels or how they map to the entity-level prior.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation of major revision. We address each major comment below and will update the manuscript to improve clarity and experimental rigor.
Point-by-point responses
- Referee: [§3] §3 (Method), contrastive alignment description: no equation or loss term is given for how the language-prior alignment objective is balanced against the hierarchical clustering objective, so it is impossible to verify whether the mechanism corrects visual dominance or simply adds an external bias source as feared in the stress-test note.
Authors: We agree that an explicit formulation is needed. The current manuscript describes the injection of language priors via contrastive alignment into the hierarchical clustering pipeline but does not provide the combined objective. In the revision we will add a formal loss equation in §3 that defines the total objective as a weighted sum of the hierarchical clustering loss and the language-prior contrastive alignment loss, including the balancing hyperparameter. This will allow readers to verify the relative influence of each term. revision: yes
- Referee: [Experimental results] Experimental results section and Table 1: the headline mIoU gains are reported without any description of baseline re-implementations, ablation on the language-prior component, data-split details, or statistical significance across runs; this directly undermines attribution of the +13.5/+12.9/+8.9 improvements to the proposed multi-level association rather than implementation differences.
Authors: We acknowledge the need for greater transparency. The revised Experimental results section will include: (i) details on baseline re-implementations and any adaptations made, (ii) a new ablation study that isolates the language-prior component, (iii) explicit train/val/test split information for ScanNet-v2, S3DIS, and nuScenes, and (iv) mean mIoU and standard deviation computed over multiple independent runs with different random seeds. These additions will support the attribution of gains to the proposed multi-level language-visual associations. revision: yes
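The rebuttal promises a total objective written as a weighted sum. One way to write it is shown below. It assumes the entity loss keeps the form of the excerpt quoted in the Lean-theorems section. The name λ for the balancing hyperparameter is hypothetical. $\mathcal{L}_{\text{cluster}}$ stands for whatever hierarchical clustering loss the method uses, which this page does not specify:

$$
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{cluster}} + \lambda\,\mathcal{L}_{\text{entity}}, \qquad \lambda \ge 0
$$

Setting λ = 0 recovers the purely visual clustering objective. Sweeping λ is therefore also a natural form of the language-prior ablation requested in the second major comment.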
Circularity Check
No significant circularity; method relies on external pre-trained LMs and visual extractors
Full rationale
The derivation chain begins with external language models providing entity-level semantic priors, which are then contrastively aligned into a hierarchical clustering pipeline on 3D features. Performance is evaluated via mIoU on held-out benchmarks (ScanNet-v2, S3DIS, nuScenes) rather than any quantity fitted from the same data and re-labeled as a prediction. No equations reduce the reported gains to self-referential fits, and no load-bearing uniqueness theorem or ansatz is imported from the authors' own prior work. The central claim therefore remains independent of the target datasets' labels or statistics.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Language models encode balanced and fine-grained world knowledge across categories that can compensate for visual clustering bias.
- Domain assumption: Contrastive alignment between language priors and visual features produces more discriminative representations for underrepresented classes.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear). The relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "LangTail first constructs an entity-level semantic prior from language models... injected into a hierarchical clustering framework via contrastive alignment..."
  $$\mathcal{L}_{\text{entity}} = -w_{c_i}\,\frac{1}{M}\sum \log \frac{\exp\big(f_\theta(P_i)\cdot b^{+}/\tau\big)}{\exp(\ldots) + \sum \exp(\ldots)}$$
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: unclear). The relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "Ward-based hierarchical clustering... $K = \{120, 80, K_{\text{prim}}\}$... dual-branch (Local/Global) with spectral graph Fourier basis"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
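The second Lean-theorem excerpt names two components without detail: Ward-based hierarchical clustering with K = {120, 80, K_prim}, and a dual-branch design "with spectral graph Fourier basis". The sketch below gives one plausible reading of each. All of it is assumed, not taken from the paper:
- Ward: build a single Ward dendrogram over superpoint features and cut it at each target cluster count.
- Fourier basis: take eigenvectors of the Laplacian of a k-nearest-neighbour graph, the standard graph Fourier basis.
- Names and defaults: the function names, k = 8, and the placeholder value 20 for K_prim are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial import cKDTree

def multi_granularity_ward(feats, levels=(120, 80, 20)):
    """Cut one Ward dendrogram at several cluster counts, finest first.

    feats:  (N, D) per-superpoint features.
    levels: target cluster counts; 20 is a hypothetical stand-in for K_prim.
    Returns {k: integer labels in [0, k)}.
    """
    Z = linkage(feats, method="ward")  # Ward linkage on Euclidean distances
    # fcluster numbers clusters from 1; shift to 0-based labels
    return {k: fcluster(Z, t=k, criterion="maxclust") - 1 for k in levels}

def graph_fourier_basis(points, k=8, num_modes=16):
    """Lowest-frequency eigenpairs of the Laplacian of a symmetric kNN graph.

    points: (N, 3) coordinates. Returns (eigvals, U): U[:, j] is the j-th
    graph Fourier mode and eigvals[j] its graph frequency (ascending).
    Uses a dense eigendecomposition, so it only suits small N.
    """
    n = len(points)
    _, idx = cKDTree(points).query(points, k=k + 1)  # column 0 is the point itself
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx[:, 1:].ravel()] = 1.0
    W = np.maximum(W, W.T)              # symmetrise the kNN adjacency
    L = np.diag(W.sum(axis=1)) - W      # combinatorial Laplacian L = D - W
    eigvals, U = np.linalg.eigh(L)      # eigh returns ascending eigenvalues
    return eigvals[:num_modes], U[:, :num_modes]

def gft(signal, U):
    """Graph Fourier coefficients of a per-point signal of shape (N,) or (N, C)."""
    return U.T @ signal
```

Every cut comes from the same dendrogram, so each fine cluster falls entirely inside one coarse cluster. This gives the nested multi-granularity structure the excerpt appears to describe. Running k-means separately at each K would not guarantee that nesting.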