An Enhanced Geometric-Spectral Feature Learning Framework for Airborne Multispectral Point Cloud Classification
Pith reviewed 2026-06-27 17:18 UTC · model grok-4.3
The pith
A two-stream attention fusion model with joint loss improves classification of airborne multispectral point clouds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The model extracts position-encoded global spectral features in one stream using fusion self-attention and spectral-guided geometric features in the second stream using multikernel point convolution and feature aggregation attention; a residual attention fusion block then integrates the informative parts from both streams, while a joint loss function strengthens performance on unbalanced and inter-class similar samples.
What carries the argument
Two-stream attention-based geometric-spectral feature fusion via residual attention block, paired with a joint loss function.
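The fusion step described above can be illustrated with a toy gate. This is a minimal sketch, not the paper's block: the summary does not give the equations of the residual attention fusion, so the per-channel sigmoid gate, the averaged residual path, and every name here (`residual_attention_fusion`, `gate_weights`, `gate_bias`) are assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def residual_attention_fusion(spectral, geometric, gate_weights, gate_bias=0.0):
    """Fuse two per-point feature vectors of equal length (hypothetical form).

    For each channel c, a gate a_c in (0, 1) is computed from both streams.
    The streams are mixed as a_c * spectral_c + (1 - a_c) * geometric_c, and
    the mean of the two inputs is added back as a residual path.
    """
    if not (len(spectral) == len(geometric) == len(gate_weights)):
        raise ValueError("spectral, geometric and gate_weights must share a length")
    fused = []
    for s, g, (w_s, w_g) in zip(spectral, geometric, gate_weights):
        a = sigmoid(w_s * s + w_g * g + gate_bias)  # assumed attention gate
        mixed = a * s + (1.0 - a) * g               # gate-weighted mix
        fused.append(mixed + 0.5 * (s + g))         # assumed residual path
    return fused
```

With all gate weights at zero the gate sits at 0.5 and both streams contribute equally; a large positive spectral weight pushes the mix toward the spectral stream while the residual path keeps the geometric signal from vanishing.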
If this is right
- Higher classification accuracy on airborne MPC data with inter-class spectral overlap.
- More effective use of both geometric structure and spectral signatures in a single model.
- Improved handling of unbalanced class distributions common in remote-sensing point clouds.
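To make the class-imbalance point concrete, here is one way a joint loss could be composed. The summary does not state which terms the paper combines, so the form below is purely an assumption: an inverse-frequency weighted cross-entropy (for unbalanced classes) plus a center-style compactness term (for inter-class similarity), with a hypothetical weighting coefficient `lam`.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """weight_c = n / (num_classes * count_c); classes absent from labels get 0."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted negative log-likelihood, normalised by the total weight."""
    total = sum(-weights[y] * math.log(max(p[y], 1e-12))
                for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


def center_term(features, labels, centers):
    """Half the mean squared distance between each feature and its class center."""
    total = 0.0
    for f, y in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(f, centers[y]))
    return total / (2.0 * len(features))


def joint_loss(probs, features, labels, centers, num_classes, lam=0.1):
    """Assumed joint loss: weighted cross-entropy + lam * center term."""
    weights = inverse_frequency_weights(labels, num_classes)
    return (weighted_cross_entropy(probs, labels, weights)
            + lam * center_term(features, labels, centers))
```

In this sketch rare classes receive larger weights, so their errors cost more, while the center term pulls same-class features together. The value of `lam` is exactly the kind of free parameter that would need to be reported for the results to be reproducible.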
Where Pith is reading between the lines
- The same two-stream fusion pattern could be tested on other multimodal 3D datasets that combine geometry with per-point attributes.
- The joint loss design might transfer to other remote-sensing tasks that suffer from class imbalance and feature similarity.
- Public release of the two datasets would allow direct comparison of future fusion methods against this baseline.
Load-bearing premise
Performance gains on the two datasets come from the attention fusion and joint loss rather than from dataset construction choices, preprocessing, or unstated tuning.
What would settle it
Training the same network architecture on the two datasets but removing the attention mechanisms and joint loss term, then checking whether accuracy falls to levels comparable with earlier methods.
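The check described above amounts to a leave-one-out ablation plan. The sketch below only enumerates the runs; it trains nothing. The component identifiers are hypothetical labels for the modules named in this summary, and the dataset names and seeds are placeholders, since the summary does not name the two datasets.

```python
from itertools import product

# Hypothetical switch names for the modules named in the summary.
COMPONENTS = (
    "spectral_self_attention",
    "multikernel_conv_attention",
    "residual_attention_fusion",
    "joint_loss",
)


def leave_one_out_configs(components=COMPONENTS):
    """Return the full model plus one variant per disabled component."""
    full = {name: True for name in components}
    configs = {"full": full}
    for name in components:
        ablated = dict(full)
        ablated[name] = False  # disable exactly one component
        configs[f"no_{name}"] = ablated
    return configs


def run_plan(datasets, seeds, components=COMPONENTS):
    """Cross every variant with every dataset and seed, all else held fixed."""
    configs = leave_one_out_configs(components)
    return [
        {"dataset": d, "seed": s, "variant": v, "flags": configs[v]}
        for d, s, v in product(datasets, seeds, configs)
    ]


# Placeholder dataset names and seeds, for illustration only.
plan = run_plan(["dataset_A", "dataset_B"], seeds=[0, 1, 2])
```

Keeping preprocessing and the training protocol identical across all entries of `plan` is what lets an accuracy drop be attributed to the removed component rather than to dataset construction or tuning.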
Original abstract
Multispectral point cloud (MPC) is composed of 3D spatial-spectral information, which holds tremendous potential for accurate land-cover classification. However, the representation power of classification models is limited by inherent high-dimensional and heterogeneous spatial-spectral information, unbalanced sample distribution, and inter-class spectral similarity of airborne MPCs. We build two MPC datasets and propose an enhanced geometric-spectral feature learning framework based on attentions for airborne MPC classification. A key component in our model is a two-stream feature fusion method with attention mechanisms, which enhances the representation capability of spatial-spectral features from high-dimensional heterogeneous MPCs. The first stream aims to extract position-encoded global spectral features with fusion self-attention, and the second stream comprises a multikernel point convolution and feature aggregation attention to extract spectral-guided geometric features. We then develop a residual attention fusion block to integrate the most informative geometric-spectral features from the two parallel streams. Another important contribution of this work is a joint loss function to improve the learning ability on unbalanced and interclass similar samples. Experimental results on two airborne MPC datasets demonstrate the effectiveness of the proposed method compared with the state-of-the-art methods. Furthermore, the codes and datasets used in this paper will be made available freely at https://github.com/HITlixian/TGRS_GSFF.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces two new airborne multispectral point cloud (MPC) datasets and proposes an enhanced geometric-spectral feature learning framework. Key components include a two-stream attention fusion method (position-encoded spectral self-attention in one stream and multikernel point convolution with feature aggregation attention in the other), a residual attention fusion block to integrate features, and a joint loss function to handle unbalanced samples and inter-class spectral similarity. The central claim is that this framework yields superior land-cover classification performance compared to state-of-the-art methods on the two introduced datasets, with code and data to be released.
Significance. If the performance improvements are robustly attributable to the proposed attention mechanisms and joint loss, the work could advance handling of high-dimensional heterogeneous spatial-spectral data in remote sensing. The commitment to release datasets and code is a clear strength that supports reproducibility and further research in the field.
major comments (2)
- [Experimental results] Experimental results section: The claim that the two-stream attention fusion and joint loss produce the reported gains is load-bearing but unsecured, as the manuscript simultaneously introduces both the method and the new datasets without ablation studies isolating the contribution of each proposed component (attention fusion, residual block, joint loss) from dataset-specific tuning, preprocessing, or implementation choices.
- [Abstract] Abstract and results: No quantitative metrics, error bars, ablation tables, or statistical tests are referenced to support the superiority claim over SOTA methods, leaving the magnitude and reliability of gains unverified in the provided summary.
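As an illustration of the kind of quantitative evidence this comment asks for, the sketch below computes overall accuracy and macro-averaged F1 from a confusion matrix. It assumes rows are true classes and columns are predicted classes; the function names are illustrative, and the paper may define its metrics differently.

```python
def overall_accuracy(cm):
    """Fraction of all points on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def per_class_f1(cm):
    """F1 per class, with rows = true class and columns = predicted class."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)
```

For an unbalanced two-class matrix such as `[[90, 10], [5, 5]]`, overall accuracy is about 0.86 while macro F1 is about 0.66, which is why accuracy alone can hide poor performance on minority classes.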
minor comments (1)
- Ensure that upon publication the promised code and dataset release at the GitHub link is completed and includes the exact preprocessing and training configurations used for the reported results.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the validation of our claims. We address each major point below and commit to revisions that provide the requested isolation of contributions and quantitative support.
- Referee: [Experimental results] Experimental results section: The claim that the two-stream attention fusion and joint loss produce the reported gains is load-bearing but unsecured, as the manuscript simultaneously introduces both the method and the new datasets without ablation studies isolating the contribution of each proposed component (attention fusion, residual block, joint loss) from dataset-specific tuning, preprocessing, or implementation choices.
  Authors: We agree that the absence of component-wise ablations leaves the attribution of gains partially unsecured, particularly given the introduction of new datasets. In the revised manuscript we will add a dedicated ablation section that systematically removes or replaces each element (position-encoded spectral self-attention, multikernel convolution with feature aggregation attention, residual attention fusion block, and joint loss) while keeping all other implementation and preprocessing choices fixed. Results will be reported on both datasets with the same training protocol, thereby isolating the contribution of the proposed modules from dataset-specific factors.
  Revision: yes
- Referee: [Abstract] Abstract and results: No quantitative metrics, error bars, ablation tables, or statistical tests are referenced to support the superiority claim over SOTA methods, leaving the magnitude and reliability of gains unverified in the provided summary.
  Authors: The current abstract states only qualitative superiority. We will revise the abstract to report concrete metrics (overall accuracy, mean F1-score, and per-class improvements) together with the magnitude of gains over the strongest baseline. The results section will be expanded to include error bars from repeated runs with different random seeds, the requested ablation tables, and paired statistical tests (e.g., McNemar or Wilcoxon) to establish significance of the observed differences.
  Revision: yes
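One of the paired tests mentioned in the responses, McNemar's test, can be sketched with the standard library alone. This is a minimal exact (binomial) two-sided version written for illustration; the function name and return shape are choices made here, not anything taken from the paper.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts samples that only model A got
    right and c counts samples that only model B got right. Under the null
    hypothesis the discordant pairs split as Binomial(b + c, 0.5).
    """
    b = c = 0
    for t, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (pa == t), (pb == t)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)
```

If model A is right on six samples that model B gets wrong, and never the reverse, the function returns a p-value of 0.03125. The test assumes independent samples, and neighbouring points in a point cloud are spatially correlated, so a per-point p-value is likely to overstate significance; testing on per-tile or per-scene aggregates is one possible mitigation.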
Circularity Check
No circularity: purely empirical claims with no derivations or self-referential fitting
The paper introduces a neural architecture (two-stream attention fusion, residual block, joint loss) and two new datasets, then reports classification accuracies against SOTA baselines. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. All claims rest on external experimental comparison rather than any reduction of outputs to inputs by construction. This matches the default expectation of a non-circular empirical ML paper.
Axiom & Free-Parameter Ledger
free parameters (3)
- attention head count and dimension
- joint loss weighting coefficients
- multikernel sizes in point convolution
axioms (2)
- domain assumption: Training and test splits are i.i.d. samples from the same underlying distribution
- ad hoc to paper: Attention mechanisms can meaningfully separate inter-class spectral similarity without additional regularization