SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification

Hussain Al-Ahmad; Mina Al-Saad; Mohammed Q. Alkhatib; M. Sami Zitouni; Nour Aburaed

arxiv: 2402.17672 · v2 · pith:CERGS5BVnew · submitted 2024-02-27 · 💻 cs.CV · eess.IV

Mohammed Q. Alkhatib, M. Sami Zitouni, Mina Al-Saad, Nour Aburaed, Hussain Al-Ahmad

Pith reviewed 2026-05-24 03:37 UTC · model grok-4.3

keywords: PolSAR image classification, complex-valued CNN, feature fusion, deep learning, land cover classification, remote sensing, synthetic aperture radar

The pith

A three-branch complex-valued CNN fuses shallow and deep features to improve PolSAR land cover classification accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SDF2Net, a network built from three parallel branches of complex-valued convolutional layers that combine features extracted at shallow, middle, and deep stages for classifying polarimetric synthetic aperture radar images. Experiments on the Flevoland and San Francisco AIRSAR scenes plus the ESAR Oberpfaffenhofen scene show accuracy gains of 0.5 to 1.3 percent over prior methods, with 96.01 percent overall accuracy retained when only one percent of pixels are used for training. A sympathetic reader would care because PolSAR data carry phase and amplitude information that is costly to label at scale, so an architecture that extracts usable features from small labeled sets could support more frequent land-cover mapping from radar.

Core claim

The central claim is that the Shallow to Deep Feature Fusion Network (SDF2Net), by integrating feature maps from three branches of a complex-valued CNN at progressively deeper stages, produces representations that yield higher overall classification accuracy on PolSAR land-cover tasks than the state-of-the-art methods tested on the AIRSAR Flevoland, AIRSAR San Francisco, and ESAR Oberpfaffenhofen datasets.

What carries the argument

The three-branch shallow-to-deep feature fusion architecture applied to complex-valued PolSAR inputs.
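To make the fusion idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. The 1-D signals (instead of 3-D PolSAR patches), the branch depths of 1, 2 and 3 layers, the split ReLU activation, and fusion by concatenation are all assumptions chosen for illustration; the function names are hypothetical.

```python
# Toy sketch of shallow-to-deep feature fusion with complex-valued convolutions.
# Assumptions (not taken from the paper): 1-D inputs, branch depths 1/2/3,
# a split ReLU on real and imaginary parts, and fusion by concatenation.

def complex_conv1d(signal, kernel):
    """'Valid' 1-D cross-correlation over Python complex numbers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def split_relu(values):
    """ReLU applied independently to the real and imaginary parts."""
    return [complex(max(v.real, 0.0), max(v.imag, 0.0)) for v in values]


def branch(signal, kernels):
    """Pass the signal through a stack of conv + activation layers."""
    out = signal
    for kernel in kernels:
        out = split_relu(complex_conv1d(out, kernel))
    return out


def fuse(signal, shallow, middle, deep):
    """Concatenate the outputs of three branches of increasing depth."""
    return branch(signal, shallow) + branch(signal, middle) + branch(signal, deep)


# Hypothetical input and kernel, only to show the shapes involved.
signal = [complex(i, -i) for i in range(8)]
kernel = [0.5 + 0.5j, 1.0 + 0.0j, 0.5 - 0.5j]
fused = fuse(signal, [kernel], [kernel] * 2, [kernel] * 3)
print(len(fused))  # 6 + 4 + 2 = 12 fused complex features
```

Each extra layer shortens the output by two samples here, so the shallow branch keeps more positions and the deep branch sees a wider context. Concatenation is the simplest way to carry both forward to a classifier.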

If this is right

Overall accuracy rises by 1.3 percent and 0.8 percent on the two AIRSAR datasets.
Overall accuracy rises by 0.5 percent on the ESAR dataset.
The model reaches 96.01 percent accuracy on Flevoland data with only a 1 percent sampling ratio.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same shallow-to-deep fusion pattern could be tested on other complex-valued modalities such as full-polarimetric radar or interferometric data.
If the gains hold under stricter cross-validation, the method would lower the labeled-data requirement for operational PolSAR mapping.
Explicit connections between early and late layers may help capture both fine texture and broader context that single-stream complex CNNs miss.

Load-bearing premise

The three-branch fusion architecture generates feature representations that are superior to those from standard complex-valued CNN baselines on the chosen test sets.

What would settle it

A side-by-side accuracy comparison of SDF2Net against the same baselines on a new PolSAR scene, using the identical one-percent sampling ratio, would show whether the reported gains persist.
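The comparison described above depends on two small pieces of bookkeeping: drawing a fixed fraction of labeled pixels for training, and scoring overall accuracy on the rest. The sketch below shows one way to do both. The per-class (stratified) draw, the at-least-one-pixel rule, and all function names are assumptions for illustration, since the sampling procedure is not specified here.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratio, seed=0):
    """Select `ratio` of the labeled pixels of each class for training.

    `labels` maps a pixel index to its class id; unlabeled pixels are absent.
    Assumption: sampling is done per class, with at least one pixel per class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pixel, cls in labels.items():
        by_class[cls].append(pixel)
    train = set()
    for cls in sorted(by_class):
        pixels = sorted(by_class[cls])
        n_train = max(1, round(ratio * len(pixels)))
        train.update(rng.sample(pixels, n_train))
    test = set(labels) - train
    return train, test


def overall_accuracy(predicted, reference, pixels):
    """Fraction of `pixels` whose predicted class matches the reference."""
    correct = sum(predicted[p] == reference[p] for p in pixels)
    return correct / len(pixels)


# Hypothetical label map: 3000 labeled pixels spread evenly over 3 classes.
labels = {p: p % 3 for p in range(3000)}
train, test = stratified_split(labels, ratio=0.01, seed=42)
print(len(train), len(test))  # 30 2970
```

Fixing the seed and reusing the same `train`/`test` sets for every method is what makes the comparison "identical" in the sense used above.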

Figures

Figures reproduced from arXiv: 2402.17672 by Hussain Al-Ahmad, Mina Al-Saad, Mohammed Q. Alkhatib, M. Sami Zitouni, Nour Aburaed.

**Figure 1.** Illustration of different types of convolution on images with multiple channels. (a) 2D Convolution; (b) 3D Convolution; (c) Complex Valued 3D

**Figure 2.** Squeeze and Excitation Block.

suppressing irrelevant features. The excitation function can be expressed as

$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\mathrm{ReLU}(W_1 z)) \tag{4}$$

where $\sigma$ represents the Sigmoid activation function, and $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ denote the two fully connected layers. Here, $W_1$ functions as the dimensionality reduction layer with a reduction ratio of $r$, while $W_2$ serves as the proportionally identica…

**Figure 3.** Block diagram of the proposed SDF2Net.

III. METHODOLOGY. Within this section, a comprehensive description of the SDF2Net architecture is provided. Initially, the processing of polarimetric data from PolSAR images is showcased, followed by an exposition of the SDF2Net network architecture. A. PolSAR Data Preprocessing. The construction of a polarimetric feature vector serves as a fundamental step in PolSAR im…

**Figure 4.** Flevoland PolSAR data (left) Pauli RGB composite (right) Reference

**Figure 5.** San Francisco PolSAR data (left) Pauli RGB composite (right)

**Figure 6.** Oberpfaffenhofen PolSAR data (left) Pauli RGB composite (right)

**Figure 7.** The overall accuracy of the proposed model employing varying

**Figure 8.** Classification results of the Flevoland dataset. (a) PauliRGB; (b) Reference Class Map; (c) SVM; (d) 2D-CVNN; (e) Wavelet CNN; (f) CV-CNN-SE;

**Figure 9.** Classification results of the San Francisco dataset. (a) PauliRGB; (b) Reference Class Map; (c) SVM; (d) 2D-CVNN; (e) Wavelet CNN; (f) CV-CNN

**Figure 10.** Classification results of the Oberpfaffenhofen dataset. (a) PauliRGB; (b) Reference Class Map; (c) SVM; (d) 2D-CVNN; (e) Wavelet CNN; (f)

**Figure 11.** Urban Image: (a) classification map resulted from the proposed model; (b) classification map after median filtering; (c) reference data classification

**Figure 12.** Classification accuracy at different percentages of training data (a) Flevoland; (b) San Francisco; (c) Oberpfaffenhofen.
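The excitation function quoted with the Figure 2 caption, s = σ(W2 ReLU(W1 z)), can be sketched in a few lines of plain Python. This is a real-valued illustration of that formula only. The helper names and the example weights are hypothetical, and how the paper applies the block to complex-valued feature maps is not covered here.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def excitation(z, w1, w2):
    """Compute s = sigmoid(W2 . ReLU(W1 . z)) for a channel descriptor z.

    z has length C, w1 has shape (C/r) x C, and w2 has shape C x (C/r).
    """
    hidden = [max(h, 0.0) for h in matvec(w1, z)]   # reduce to C/r, then ReLU
    gates = matvec(w2, hidden)                       # expand back to C
    return [1.0 / (1.0 + math.exp(-g)) for g in gates]


def rescale(channels, s):
    """Scale every value of channel c by its gate s[c]."""
    return [[s_c * v for v in channel] for s_c, channel in zip(s, channels)]


# Hypothetical example with C = 4 channels and reduction ratio r = 2.
z = [0.2, 1.5, -0.3, 0.8]
w1 = [[0.5, -0.1, 0.3, 0.0], [0.1, 0.4, -0.2, 0.6]]
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.3, 0.3]]
s = excitation(z, w1, w2)
print(len(s))  # one gate in (0, 1) per channel
```

Because the sigmoid keeps every gate between 0 and 1, `rescale` can only attenuate channels, which matches the caption's description of suppressing irrelevant features.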

Original abstract

Polarimetric synthetic aperture radar (PolSAR) images encompass valuable information that can facilitate extensive land cover interpretation and generate diverse output products. Extracting meaningful features from PolSAR data poses challenges distinct from those encountered in optical imagery. Deep learning (DL) methods offer effective solutions for overcoming these challenges in PolSAR feature extraction. Convolutional neural networks (CNNs) play a crucial role in capturing PolSAR image characteristics by leveraging kernel capabilities to consider local information and the complex-valued nature of PolSAR data. In this study, a novel three-branch fusion of complex-valued CNN, named the Shallow to Deep Feature Fusion Network (SDF2Net), is proposed for PolSAR image classification. To validate the performance of the proposed method, classification results are compared against multiple state-of-the-art approaches using the airborne synthetic aperture radar (AIRSAR) datasets of Flevoland and San Francisco, as well as the ESAR Oberpfaffenhofen dataset. The results indicate that the proposed approach demonstrates improvements in overall accuracy, with a 1.3% and 0.8% enhancement for the AIRSAR datasets and a 0.5% improvement for the ESAR dataset. Analyses conducted on the Flevoland data underscore the effectiveness of the SDF2Net model, revealing a promising overall accuracy of 96.01% even with only a 1% sampling ratio.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SDF2Net is a three-branch complex-valued CNN fusion for PolSAR that reports 0.5-1.3% gains on standard benchmarks, but those numbers are single-run point estimates with no variance, ablations, or significance tests.

The paper introduces SDF2Net, a named three-branch architecture that fuses shallow-to-deep features inside a complex-valued CNN for PolSAR land-cover classification. That specific design and its reported numbers on the listed scenes are new relative to the cited prior work. It evaluates on three public datasets (AIRSAR Flevoland, AIRSAR San Francisco, ESAR Oberpfaffenhofen) and states concrete overall-accuracy improvements plus a 96.01% result at 1% sampling on Flevoland.

Working directly with complex-valued inputs is the right move for this data type, and the comparisons are against multiple existing methods rather than a single weak baseline. The empirical focus on low-sample regimes is also practical for remote-sensing settings where labels are expensive.

The central weakness is that the claimed deltas rest on single reported figures with no error bars, no repeated trials, and no statistical tests. At 1% sampling the label noise and initialization effects are known to be large, so a 1% gap can easily be within run-to-run fluctuation. The abstract gives no ablation that isolates the fusion component, and training protocol details are absent. Without those, it is difficult to judge whether the architecture itself is responsible for the lift.

The work is a straightforward incremental architecture paper aimed at the PolSAR classification community. Readers who need another data point on complex CNN variants for these scenes may find the numbers worth checking once the full methods and any stability experiments are available. It is coherent on its own terms and engages the existing literature, so it deserves a serious referee to verify the experimental protocol and see whether the gains survive proper statistical scrutiny.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes SDF2Net, a three-branch shallow-to-deep feature fusion network using complex-valued CNNs for PolSAR image classification. It reports overall accuracy improvements of 1.3% and 0.8% on the two AIRSAR datasets and 0.5% on the ESAR Oberpfaffenhofen dataset relative to prior SOTA methods, including a 96.01% accuracy at 1% sampling ratio on Flevoland.

Significance. If the reported gains prove robust under repeated trials and proper statistical controls, the fusion architecture could offer a practical advance for handling complex-valued PolSAR inputs in remote-sensing classification tasks. The use of standard public benchmarks and direct SOTA comparisons is a positive aspect of the evaluation design.

major comments (2)

[Abstract and §5] Abstract and §5 (Experiments): the central performance claim rests on single-run point estimates (e.g., the 1.3% gain on AIRSAR Flevoland and 96.01% at 1% sampling) without reported standard deviations, multiple random seeds, or any statistical significance test against the complex-valued CNN baselines. At 1% sampling, label noise and initialization sensitivity are known to be high, so the observed deltas cannot be distinguished from experimental fluctuation on the basis of the presented evidence.
[§4 and §5] §4 (Proposed Method) and §5: no ablation isolating the shallow-to-deep fusion component is provided, so it is impossible to determine whether the modest accuracy deltas arise from the three-branch architecture itself or from other unstated differences in training protocol, data augmentation, or hyper-parameter choices.
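The first major comment asks for repeated runs, standard deviations, and a significance test. A minimal sketch of that bookkeeping, using only the Python standard library, might look as follows. The function names are hypothetical, the choice of a paired t statistic is an assumption (a Wilcoxon signed-rank test would be another reasonable option), and no accuracy values from the paper are used.

```python
import math
import statistics


def summarize(accuracies):
    """Mean and sample standard deviation over repeated runs (one per seed)."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def paired_t_statistic(model_a, model_b):
    """Paired t statistic for per-seed accuracy differences (a - b).

    Both lists must hold one accuracy per seed, in the same seed order.
    Returns (t, degrees_of_freedom); compare |t| against a t-table, or pass
    the values to a statistics package, to obtain a p-value.
    """
    if len(model_a) != len(model_b):
        raise ValueError("both models need one accuracy per seed")
    diffs = [a - b for a, b in zip(model_a, model_b)]
    n = len(diffs)
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1
```

Pairing by seed matters at a 1% sampling ratio: when both models train on the same sampled pixels, the variation caused by which pixels were drawn cancels out of the per-seed differences.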

minor comments (2)

[§3] Figure captions and §3: the complex-valued convolution and fusion operations would benefit from explicit equations rather than prose descriptions alone.
[§5] Table captions in §5: clarify whether the listed baselines are re-implemented with the same training protocol or taken from the original papers.
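As an illustration of the kind of explicit equation the first minor comment asks for, the textbook expansion of a complex convolution into four real convolutions is given below. Here the input is $x = x_r + i\,x_i$ and the kernel is $w = w_r + i\,w_i$; this notation is an assumption and may not match the paper's own symbols.

```latex
\begin{aligned}
w * x &= (w_r + i\,w_i) * (x_r + i\,x_i) \\
      &= (w_r * x_r - w_i * x_i) + i\,(w_r * x_i + w_i * x_r)
\end{aligned}
```

The real part mixes the two real-valued products with a minus sign and the imaginary part mixes the cross terms, which is how phase information in the input influences both output components.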

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

Referee: [Abstract and §5] Abstract and §5 (Experiments): the central performance claim rests on single-run point estimates (e.g., the 1.3% gain on AIRSAR Flevoland and 96.01% at 1% sampling) without reported standard deviations, multiple random seeds, or any statistical significance test against the complex-valued CNN baselines. At 1% sampling, label noise and initialization sensitivity are known to be high, so the observed deltas cannot be distinguished from experimental fluctuation on the basis of the presented evidence.

Authors: We acknowledge that the presented results rely on single-run point estimates without standard deviations or statistical tests. While single-run reporting is prevalent in PolSAR classification literature, the concern about robustness at low sampling ratios is valid. In the revised manuscript we will rerun all experiments with multiple random seeds, report mean and standard deviation values, and add paired statistical significance tests against the complex-valued baselines. revision: yes
Referee: [§4 and §5] §4 (Proposed Method) and §5: no ablation isolating the shallow-to-deep fusion component is provided, so it is impossible to determine whether the modest accuracy deltas arise from the three-branch architecture itself or from other unstated differences in training protocol, data augmentation, or hyper-parameter choices.

Authors: We agree that an ablation isolating the shallow-to-deep fusion is necessary to attribute performance gains specifically to the proposed architecture. The revised manuscript will include an ablation study that compares the full three-branch SDF2Net against controlled variants (single-branch and late-fusion baselines) while holding training protocol, augmentation, and hyperparameters fixed. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical architecture comparison on held-out test pixels

The paper introduces SDF2Net, a three-branch complex-valued CNN fusion architecture, and evaluates it via overall accuracy on standard PolSAR benchmark scenes (Flevoland, San Francisco, Oberpfaffenhofen) at fixed sampling ratios. No equations, first-principles derivations, or predictions are presented that reduce by construction to fitted hyperparameters, self-citations, or renamed inputs; reported deltas (0.5–1.3 %) are direct empirical measurements against prior methods on the same held-out pixels. The central claim therefore rests on experimental comparison rather than any self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the network itself presumably contains standard CNN hyperparameters (learning rate, kernel sizes, branch depths) but none are enumerated.
