Recognition: 2 theorem links
From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments
Pith reviewed 2026-05-08 19:09 UTC · model grok-4.3
The pith
Switching from spherical to Gaussian cropping improves 3D semantic segmentation on large outdoor point clouds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Replacing spherical cropping with Gaussian, exponential, or linear functions produces subclouds whose spatial extent is larger for any fixed point budget, and this change raises semantic segmentation accuracy on large-scale outdoor datasets while leaving indoor results largely unchanged.
What carries the argument
Cropping functions (Gaussian, exponential, linear) that map a target point count to a larger spatial radius than the conventional spherical crop.
If this is right
- Performance gains appear most clearly in large-scale outdoor scenes.
- New state-of-the-art segmentation scores are reached on the tested outdoor datasets.
- Both evaluated network architectures benefit from the change.
- The released code makes the cropping variants directly reusable.
Where Pith is reading between the lines
- The same cropping adjustment could be tested on related tasks such as object detection or instance segmentation in point clouds.
- Combining the larger-radius crops with existing downsampling or voxelization pipelines might reduce memory use further.
- In LiDAR-based mapping applications, wider context per subcloud could improve consistency across scene boundaries.
Load-bearing premise
Observed accuracy differences are caused by the shape of the crop rather than by unmeasured differences in point density, training schedule, or implementation details.
What would settle it
Re-run the exact same models and datasets with identical hyperparameters and random seeds while forcing every crop to contain the same number of points; if the accuracy gap between spherical and Gaussian crops disappears, the central claim is falsified.
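The protocol above hinges on forcing every crop to the same cardinality so that only crop shape varies. A minimal sketch of that control (function and variable names are assumptions for illustration, not the authors' code):

```python
import numpy as np

def equalize_crop(crop_points, n_points, rng):
    """Force a subcloud to exactly n_points: subsample without
    replacement when it is larger, resample with replacement when
    it is smaller. After this step, only the crop's spatial shape
    differs between strategies, not its point count."""
    k = len(crop_points)
    idx = rng.choice(k, size=n_points, replace=(k < n_points))
    return crop_points[idx]

# Same seed for every cropping strategy so the comparison is paired.
rng = np.random.default_rng(42)
cloud = rng.uniform(-50.0, 50.0, size=(100_000, 3))
spherical_crop = cloud[np.linalg.norm(cloud, axis=1) < 10.0]
fixed = equalize_crop(spherical_crop, 4096, rng)
```

Running each strategy through the same equalization with identical seeds yields paired subclouds; if the accuracy gap survives, it cannot be attributed to cardinality differences.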
Original abstract
Large-scale 3D point clouds can consist of billions of points. Even after downsampling, these point clouds are too large for modern 3D neural networks. In order to develop a semantic understanding of the scene, the point clouds are divided into smaller subclouds that can be processed. Typically, this division is done using spherical crops, resulting in a loss of surrounding geometric context. To address this issue, we propose alternative methods that produce subclouds with larger crop sizes while maintaining a similar number of points. Specifically, we compare exponential, Gaussian, and linear cropping methods with the spherical method. We evaluated two 3D deep learning model architectures using multiple indoor and outdoor environment datasets. Our results demonstrate that altering the cropping strategy can enhance model performance, especially for large-scale outdoor scenes, yielding new state-of-the-art results. Code is available at https://github.com/mvg-inatech/point_cloud_cropping
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes alternative point cloud cropping strategies (exponential, Gaussian, and linear) to the standard spherical cropping for processing large-scale 3D point clouds in semantic segmentation tasks. These methods aim to provide larger crop sizes while maintaining a similar number of points, thereby preserving more geometric context. The approaches are evaluated using two 3D deep learning model architectures on multiple indoor and outdoor datasets, with claims of enhanced performance, particularly in large-scale outdoor scenes, and achievement of new state-of-the-art results.
Significance. If the results hold and are properly controlled, this could offer a practical improvement for handling large 3D environments without modifying network architectures, potentially benefiting applications in robotics and autonomous systems. The provision of code supports reproducibility and further research.
major comments (2)
- Abstract: the claim of performance gains and new state-of-the-art results is asserted without any numerical scores, statistical significance tests, or details on how point counts were equalized across methods, leaving the central claim unsupported by visible evidence.
- §4 (Experimental Evaluation): the attribution of gains to crop geometry is insecure without explicit controls confirming identical point sampling, density, and training protocols across strategies. The abstract states alternatives maintain 'a similar number of points' but does not specify the exact sampling procedure inside each crop region; non-uniform outdoor densities could therefore produce systematically different local statistics even at matched cardinality.
minor comments (2)
- Abstract: consider including at least one key quantitative result (e.g., mIoU delta on an outdoor dataset) to make the summary of findings more informative.
- Methods section: provide explicit equations or pseudocode for the exponential, Gaussian, and linear cropping functions to ensure precise reproducibility beyond the high-level description.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work's potential impact and for the detailed comments. We address each major comment below and indicate the revisions we will make to the manuscript.
Point-by-point responses
Referee: Abstract: the claim of performance gains and new state-of-the-art results is asserted without any numerical scores, statistical significance tests, or details on how point counts were equalized across methods, leaving the central claim unsupported by visible evidence.
Authors: We agree that the abstract would be strengthened by including quantitative support for the claims. In the revised version we will incorporate specific performance metrics (such as mIoU improvements on the outdoor datasets) drawn from the experimental results already reported in §4, and we will explicitly state that point counts were equalized to the same cardinality across all cropping strategies. We will also add a brief reference to variance across runs to address statistical significance. Revision: yes.
Referee: §4 (Experimental Evaluation): the attribution of gains to crop geometry is insecure without explicit controls confirming identical point sampling, density, and training protocols across strategies. The abstract states alternatives maintain 'a similar number of points' but does not specify the exact sampling procedure inside each crop region; non-uniform outdoor densities could therefore produce systematically different local statistics even at matched cardinality.
Authors: We thank the referee for highlighting this point. The manuscript already uses identical training protocols (same architectures, hyperparameters, and optimization settings) for all cropping strategies. To make the controls fully explicit, we will expand §4 with a precise description of the point-sampling procedure applied inside each crop region and will add a short analysis confirming that the resulting local point-density distributions remain comparable across methods. These additions will better isolate the contribution of crop geometry. Revision: yes.
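One way to run the promised density check (a sketch of a standard statistic, not the authors' analysis; the choice of a k-NN distance is an assumption) is to compare the distribution of per-point distance to the k-th nearest neighbour across crops:

```python
import numpy as np

def knn_distance(subcloud, k=8):
    """Per-point distance to the k-th nearest neighbour. The
    distribution of this statistic summarizes local point density
    inside a crop. O(n^2) memory, so apply to subclouds, not to
    whole scenes."""
    diff = subcloud[:, None, :] - subcloud[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-distance
    return np.sort(d, axis=1)[:, k - 1]

# Two crops are density-comparable if these distributions
# (e.g. their medians) roughly agree at matched cardinality.
rng = np.random.default_rng(0)
crop_a = rng.normal(scale=3.0, size=(512, 3))
crop_b = rng.normal(scale=3.0, size=(512, 3))
gap = abs(np.median(knn_distance(crop_a)) - np.median(knn_distance(crop_b)))
```

A large gap between strategies at equal point count would mean matched cardinality alone does not control density, which is exactly the confound the referee raises.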
Circularity Check
No circularity: purely empirical comparison with no derivations or self-referential reductions
full rationale
The manuscript is an experimental study that evaluates four cropping geometries (spherical, exponential, Gaussian, linear) on public indoor/outdoor point-cloud datasets using two standard 3D network architectures. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the abstract or described content. Performance differences are reported as measured outcomes on fixed benchmarks; they do not reduce by construction to quantities defined inside the paper. Any self-citations are incidental and non-load-bearing for the central empirical claim.
Lean theorems connected to this paper
- Cost.FunctionalEquation.washburn_uniqueness_aczel (tagged unclear: the relation between the paper passage and the cited Recognition theorem is ambiguous). Paper passage: p_g = e^{-(d/σ_d)^2}; p_e = e^{-λd}; p_l = (d_m - d)/d_m; p_s = 1 if d < d_m, else 0.
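Read literally, these four functions assign each point an acceptance weight as a function of its distance d from the crop center, with σ_d, λ, and d_m as free parameters. A sketch of weighted crop sampling under that reading (the page does not specify the paper's exact sampling procedure, so the normalization and parameter names here are assumptions):

```python
import numpy as np

def crop_weights(d, method, d_max, sigma_d=None, lam=None):
    """Acceptance weight for a point at distance d from the crop
    center, following the four functions above (d_max = d_m,
    sigma_d = σ_d, lam = λ)."""
    if method == "spherical":
        return (d < d_max).astype(float)               # hard cutoff
    if method == "linear":
        return np.clip((d_max - d) / d_max, 0.0, 1.0)  # taper to zero
    if method == "gaussian":
        return np.exp(-(d / sigma_d) ** 2)             # soft, long tail
    if method == "exponential":
        return np.exp(-lam * d)
    raise ValueError(method)

def crop(points, center, method, n_points, rng, **params):
    """Draw a subcloud of exactly n_points, sampling each point in
    proportion to its weight, so softer functions reach farther out
    at the same point budget."""
    d = np.linalg.norm(points - center, axis=1)
    w = crop_weights(d, method, **params)
    idx = rng.choice(len(points), size=n_points, replace=False, p=w / w.sum())
    return points[idx]

rng = np.random.default_rng(0)
pts = rng.normal(scale=3.0, size=(5000, 3))
center = np.zeros(3)
sub_sph = crop(pts, center, "spherical", 256, rng, d_max=5.0)
sub_gau = crop(pts, center, "gaussian", 256, rng, d_max=5.0, sigma_d=5.0)
```

Under this reading, the spherical crop never sees points beyond d_m, while the Gaussian and exponential crops admit distant points with decaying probability, which is the mechanism behind the larger spatial extent claimed above.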
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.