SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification
Pith reviewed 2026-05-24 03:37 UTC · model grok-4.3
The pith
A three-branch complex-valued CNN fuses shallow and deep features to improve PolSAR land cover classification accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the Shallow to Deep Feature Fusion Network (SDF2Net), by integrating feature maps from three branches of a complex-valued CNN at progressively deeper stages, produces representations that yield higher overall classification accuracy on PolSAR land-cover tasks than the state-of-the-art methods tested on the AIRSAR Flevoland, AIRSAR San Francisco, and ESAR Oberpfaffenhofen datasets.
What carries the argument
The three-branch shallow-to-deep feature fusion architecture applied to complex-valued PolSAR inputs.
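As a rough illustration of what a three-branch shallow-to-deep fusion over complex-valued inputs might look like, the NumPy sketch below runs one, two, and three complex convolution layers in parallel and concatenates the outputs along the channel axis. The branch depths, the 3x3 kernels, the split-ReLU activation, channel-wise concatenation as the fusion step, and every name and shape in the code are assumptions chosen for illustration; the summary above does not specify the paper's actual layer configuration.

```python
import numpy as np

def complex_conv2d(x, w):
    """Zero-padded ('same') 2-D cross-correlation.
    x: (H, W, C_in) array, w: (k, k, C_in, C_out) array; either may be complex."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))  # zero padding keeps H and W unchanged
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[3]), dtype=np.result_type(x, w))
    for i in range(k):
        for j in range(k):
            # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out), accumulated per kernel tap
            out += xp[i:i + H, j:j + W, :] @ w[i, j]
    return out

def split_relu(z):
    """Assumed activation: ReLU applied separately to real and imaginary parts."""
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)

def random_kernels(rng, depth, c_in, c_out, k=3):
    """Hypothetical helper: `depth` random complex kernels chained c_in -> c_out -> c_out."""
    kernels = []
    for _ in range(depth):
        shape = (k, k, c_in, c_out)
        kernels.append(rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
        c_in = c_out
    return kernels

def branch(x, kernels):
    """One branch: a stack of complex conv + activation layers."""
    for w in kernels:
        x = split_relu(complex_conv2d(x, w))
    return x

def fuse(x, branches):
    """Assumed fusion: concatenate shallow-to-deep branch outputs along channels."""
    return np.concatenate([branch(x, ks) for ks in branches], axis=-1)

rng = np.random.default_rng(0)
# Hypothetical 12x12 complex patch with 6 channels.
patch = rng.standard_normal((12, 12, 6)) + 1j * rng.standard_normal((12, 12, 6))
branches = [random_kernels(rng, depth, 6, 8) for depth in (1, 2, 3)]
fused = fuse(patch, branches)
print(fused.shape, fused.dtype)
```

Zero padding keeps all three branch outputs at the input's spatial size, which is what makes the channel-wise concatenation possible in this sketch.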
If this is right
- Overall accuracy rises by 1.3 percent on AIRSAR Flevoland and 0.8 percent on AIRSAR San Francisco over the compared methods.
- Overall accuracy rises by 0.5 percent on the ESAR dataset.
- The model reaches 96.01 percent accuracy on Flevoland data with only a 1 percent sampling ratio.
Where Pith is reading between the lines
- The same shallow-to-deep fusion pattern could be tested on other complex-valued radar data, such as interferometric SAR.
- If the gains hold under stricter cross-validation, the method would lower the labeled-data requirement for operational PolSAR mapping.
- Explicit connections between early and late layers may help capture both fine texture and broader context that single-stream complex CNNs miss.
Load-bearing premise
The three-branch fusion architecture generates feature representations that are superior to those from standard complex-valued CNN baselines on the chosen test sets.
What would settle it
A side-by-side accuracy comparison of SDF2Net against the same baselines on a new PolSAR scene, using the identical one-percent sampling ratio, would show whether the reported gains persist.
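For such a comparison to be fair, every method would need to train on exactly the same one-percent subset of labeled pixels. The sketch below shows one way to draw a reproducible split, assuming per-class (stratified) sampling and a ground-truth map in which label 0 marks unlabeled pixels; both are assumptions, since the summary does not say how the paper samples its training pixels, and the function name is hypothetical.

```python
import numpy as np

def stratified_sample(labels, ratio=0.01, seed=0):
    """Split labeled pixels into train/test indices over the flattened label map.
    Assumes label 0 means 'unlabeled' and draws `ratio` of each class for training."""
    rng = np.random.default_rng(seed)
    flat = labels.ravel()
    train_parts = []
    for c in np.unique(flat):
        if c == 0:
            continue  # skip unlabeled pixels
        idx = np.flatnonzero(flat == c)
        n = max(1, int(round(ratio * idx.size)))  # keep at least one pixel per class
        train_parts.append(rng.choice(idx, size=n, replace=False))
    train_idx = np.sort(np.concatenate(train_parts))
    labeled_idx = np.flatnonzero(flat != 0)
    test_idx = np.setdiff1d(labeled_idx, train_idx)  # all remaining labeled pixels
    return train_idx, test_idx

# Synthetic 200x200 label map with classes 1..3 and unlabeled pixels (0).
labels = np.random.default_rng(1).integers(0, 4, size=(200, 200))
train_idx, test_idx = stratified_sample(labels, ratio=0.01, seed=0)
print(train_idx.size, test_idx.size)
```

Fixing `seed` and reusing the returned indices for every model keeps the split identical across methods.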
Original abstract
Polarimetric synthetic aperture radar (PolSAR) images encompass valuable information that can facilitate extensive land cover interpretation and generate diverse output products. Extracting meaningful features from PolSAR data poses challenges distinct from those encountered in optical imagery. Deep learning (DL) methods offer effective solutions for overcoming these challenges in PolSAR feature extraction. Convolutional neural networks (CNNs) play a crucial role in capturing PolSAR image characteristics by leveraging kernel capabilities to consider local information and the complex-valued nature of PolSAR data. In this study, a novel three-branch fusion of complex-valued CNN, named the Shallow to Deep Feature Fusion Network (SDF2Net), is proposed for PolSAR image classification. To validate the performance of the proposed method, classification results are compared against multiple state-of-the-art approaches using the airborne synthetic aperture radar (AIRSAR) datasets of Flevoland and San Francisco, as well as the ESAR Oberpfaffenhofen dataset. The results indicate that the proposed approach demonstrates improvements in overall accuracy, with a 1.3% and 0.8% enhancement for the AIRSAR datasets and a 0.5% improvement for the ESAR dataset. Analyses conducted on the Flevoland data underscore the effectiveness of the SDF2Net model, revealing a promising overall accuracy of 96.01% even with only a 1% sampling ratio.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SDF2Net, a three-branch shallow-to-deep feature fusion network using complex-valued CNNs for PolSAR image classification. It reports overall accuracy improvements of 1.3% and 0.8% on the two AIRSAR datasets and 0.5% on the ESAR Oberpfaffenhofen dataset relative to prior SOTA methods, including a 96.01% accuracy at 1% sampling ratio on Flevoland.
Significance. If the reported gains prove robust under repeated trials and proper statistical controls, the fusion architecture could offer a practical advance for handling complex-valued PolSAR inputs in remote-sensing classification tasks. The use of standard public benchmarks and direct SOTA comparisons is a positive aspect of the evaluation design.
major comments (2)
- [Abstract and §5] Abstract and §5 (Experiments): the central performance claim rests on single-run point estimates (e.g., the 1.3% gain on AIRSAR Flevoland and 96.01% at 1% sampling) without reported standard deviations, multiple random seeds, or any statistical significance test against the complex-valued CNN baselines. At 1% sampling, label noise and initialization sensitivity are known to be high, so the observed deltas cannot be distinguished from experimental fluctuation on the basis of the presented evidence.
- [§4 and §5] §4 (Proposed Method) and §5: no ablation isolating the shallow-to-deep fusion component is provided, so it is impossible to determine whether the modest accuracy deltas arise from the three-branch architecture itself or from other unstated differences in training protocol, data augmentation, or hyper-parameter choices.
minor comments (2)
- [§3] Figure captions and §3: the complex-valued convolution and fusion operations would benefit from explicit equations rather than prose descriptions alone.
- [§5] Table captions in §5: clarify whether the listed baselines are re-implemented with the same training protocol or taken from the original papers.
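For reference, the kind of explicit equation the first minor comment asks for could take the following standard form. This is the textbook decomposition of a complex-valued convolution into real-valued ones, not the paper's own notation; the symbols $W = A + iB$ for the kernel and $z = x + iy$ for the input are assumptions introduced here.

```latex
\[
W * z = (A * x - B * y) + i\,(B * x + A * y),
\qquad
\begin{bmatrix} \Re(W * z) \\ \Im(W * z) \end{bmatrix}
=
\begin{bmatrix} A & -B \\ B & A \end{bmatrix}
*
\begin{bmatrix} x \\ y \end{bmatrix}
\]
```

The matrix form makes visible that one complex convolution costs four real convolutions with two shared real kernels.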
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.
Point-by-point responses
- Referee: [Abstract and §5] Abstract and §5 (Experiments): the central performance claim rests on single-run point estimates (e.g., the 1.3% gain on AIRSAR Flevoland and 96.01% at 1% sampling) without reported standard deviations, multiple random seeds, or any statistical significance test against the complex-valued CNN baselines. At 1% sampling, label noise and initialization sensitivity are known to be high, so the observed deltas cannot be distinguished from experimental fluctuation on the basis of the presented evidence.
  Authors: We acknowledge that the presented results rely on single-run point estimates without standard deviations or statistical tests. While single-run reporting is prevalent in PolSAR classification literature, the concern about robustness at low sampling ratios is valid. In the revised manuscript we will rerun all experiments with multiple random seeds, report mean and standard deviation values, and add paired statistical significance tests against the complex-valued baselines.
  revision: yes
- Referee: [§4 and §5] §4 (Proposed Method) and §5: no ablation isolating the shallow-to-deep fusion component is provided, so it is impossible to determine whether the modest accuracy deltas arise from the three-branch architecture itself or from other unstated differences in training protocol, data augmentation, or hyper-parameter choices.
  Authors: We agree that an ablation isolating the shallow-to-deep fusion is necessary to attribute performance gains specifically to the proposed architecture. The revised manuscript will include an ablation study that compares the full three-branch SDF2Net against controlled variants (single-branch and late-fusion baselines) while holding training protocol, augmentation, and hyperparameters fixed.
  revision: yes
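The multi-seed reporting promised in the first response could be sketched as follows: per-method mean and sample standard deviation, plus a paired test on per-seed differences. The rebuttal does not say which paired test will be used; the exact sign-flip permutation test below is one option, picked here because it needs only the standard library. The accuracy values are placeholders for illustration, not results from the paper, and the function names are hypothetical.

```python
import itertools
import statistics

def summarize(accs):
    """Mean and sample standard deviation of per-seed accuracies."""
    return statistics.mean(accs), statistics.stdev(accs)

def paired_signflip_pvalue(a, b):
    """Exact two-sided sign-flip permutation test on paired differences a[i] - b[i].
    Under the null hypothesis each difference is equally likely to have either sign."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            count += 1
    return count / total

# Placeholder per-seed overall accuracies (same seeds and splits for both methods).
proposed = [0.90, 0.92, 0.91, 0.93, 0.92]
baseline = [0.89, 0.90, 0.90, 0.91, 0.91]

mean_p, std_p = summarize(proposed)
mean_b, std_b = summarize(baseline)
p_value = paired_signflip_pvalue(proposed, baseline)
print(f"proposed {mean_p:.3f} +/- {std_p:.3f}, baseline {mean_b:.3f} +/- {std_b:.3f}")
print(f"p = {p_value}")  # 0.0625 for these placeholder values
```

With n paired seeds, the smallest two-sided p-value this exact test can return is 2 / 2^n, so five seeds cannot reach the conventional 0.05 threshold even when the proposed method wins on every seed; at least six are needed.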
Circularity Check
No circularity: purely empirical architecture comparison on held-out test pixels
Full rationale
The paper introduces SDF2Net, a three-branch complex-valued CNN fusion architecture, and evaluates it via overall accuracy on standard PolSAR benchmark scenes (Flevoland, San Francisco, Oberpfaffenhofen) at fixed sampling ratios. No equations, first-principles derivations, or predictions are presented that reduce by construction to fitted hyperparameters, self-citations, or renamed inputs; reported deltas (0.5–1.3%) are direct empirical measurements against prior methods on the same held-out pixels. The central claim therefore rests on experimental comparison rather than any self-referential loop.