
arxiv: 2605.30380 · v2 · pith:M5JQHVQPnew · submitted 2026-05-27 · 💻 cs.CV

Lightweight SAR Ship Detection via Contrastive Distillation

Pith reviewed 2026-06-29 12:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords SAR ship detection · knowledge distillation · contrastive learning · lightweight detectors · relational geometry · InfoNCE objective · SSDD benchmark · HRSID benchmark

The pith

Contrastive distillation transfers relational geometry from teacher to student detectors for improved SAR ship detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SURGE, a framework that distills knowledge for SAR ship detection by matching geometric relationships among ship representations rather than only local features. It does this with a contrastive InfoNCE loss in a shared embedding space, creating a distillation interface that works for two-stage, one-stage, and transformer detectors without changing their architectures. The aim is to address the difficulty lightweight models have in capturing the complex backscatter structure of SAR imagery. Experiments on SSDD and HRSID show gains of up to 6.2 mAP and 8.0 AP75 over the baseline student for two-stage detectors, and in some cases the student exceeds the teacher. The authors present the method as the first transformer-based knowledge distillation setup for SAR ship detection.

Core claim

The SURGE framework transfers relational geometry from a powerful teacher detector to a compact student detector using a contrastive InfoNCE objective in a shared projection embedding space. It provides a common region-level distillation interface that works across detector architectures without modification and, for two-stage detectors, delivers gains of up to 6.2 mAP and 8.0 AP75 on the SSDD and HRSID benchmarks, sometimes surpassing the teacher.

What carries the argument

Contrastive InfoNCE objective in a shared projection embedding space that captures and transfers geometric relationships among object representations at the region level.
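A minimal NumPy sketch of what a region-level InfoNCE distillation loss can look like is given here. It is not the authors' implementation: it assumes teacher and student features have already been pooled for the same N candidate regions (row i of each matrix describes region r_i), that plain linear maps stand in for the projection heads, that the other regions in the batch serve as negatives, and that the temperature is 0.07. The names `info_nce_distill`, `W_s`, `W_t`, and `tau` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_distill(student_feats: np.ndarray, teacher_feats: np.ndarray,
                     W_s: np.ndarray, W_t: np.ndarray, tau: float = 0.07) -> float:
    """Region-level InfoNCE: student region i should match teacher region i.

    student_feats: (N, C_s) pooled student features for N candidate regions.
    teacher_feats: (N, C_t) pooled teacher features for the same N regions.
    W_s, W_t: hypothetical linear projections into a shared d-dim space.
    In real training the teacher is frozen, so only the student branch (and its
    projection) would receive gradients; NumPy here only computes the value.
    """
    z_s = l2_normalize(student_feats @ W_s)           # (N, d)
    z_t = l2_normalize(teacher_feats @ W_t)           # (N, d)
    logits = (z_s @ z_t.T) / tau                      # (N, N) cross-model similarities
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal; the other N - 1 teacher regions are negatives.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
N, C_s, C_t, d = 8, 64, 256, 32                       # illustrative sizes only
loss = info_nce_distill(rng.normal(size=(N, C_s)), rng.normal(size=(N, C_t)),
                        rng.normal(size=(C_s, d)), rng.normal(size=(C_t, d)))
print(f"InfoNCE distillation loss: {loss:.3f}")
```

Because each student region is scored against every teacher region in the batch, the loss rewards the student for reproducing which regions the teacher treats as distinct, not just for per-region agreement. When all regions share a single embedding, nothing can be told apart and the loss equals log N.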

If this is right

  • Two-stage detectors achieve up to 6.2 mAP and 8.0 AP75 gains over the baseline student on SSDD and HRSID.
  • Student models can exceed the performance of the teacher detector.
  • The same framework applies to one-stage and transformer-based detectors without any architecture changes.
  • It provides the first transformer-based knowledge distillation approach for SAR ship detection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The relational focus could apply to other remote-sensing detection tasks where spatial structure between objects matters more than local appearance.
  • Deployment on edge hardware for real-time SAR monitoring becomes more feasible if the gains hold across additional datasets.
  • Combining the contrastive interface with model compression techniques might yield further efficiency without losing relational accuracy.

Load-bearing premise

That geometric relationships among object representations are neglected by standard feature or logit matching and can be transferred successfully from teacher to student via contrastive loss across different detector architectures.
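A small numerical illustration of why the two kinds of objective can disagree, under the simplifying assumption that "relational geometry" means the matrix of pairwise cosine similarities among region embeddings. Rotating a set of embeddings by an orthogonal matrix preserves every pairwise relation, yet a pointwise mean-squared-error (feature-matching) loss against the unrotated set is clearly nonzero. The helper names are hypothetical and the data are random, not taken from the paper.

```python
import numpy as np


def cosine_matrix(z: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between the rows of z (one row per region)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z @ z.T


def feature_mse(a: np.ndarray, b: np.ndarray) -> float:
    """Pointwise feature matching: mean squared error between aligned embeddings."""
    return float(np.mean((a - b) ** 2))


def relational_gap(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared difference between the two pairwise-similarity matrices."""
    return float(np.mean((cosine_matrix(a) - cosine_matrix(b)) ** 2))


rng = np.random.default_rng(1)
teacher = rng.normal(size=(6, 16))                   # 6 region embeddings, 16-d
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))       # random orthogonal matrix
student = teacher @ Q                                # same geometry, rotated axes

print("feature MSE:   ", feature_mse(student, teacher))     # clearly nonzero
print("relational gap:", relational_gap(student, teacher))  # ~0, up to round-off
```

The rotation is a deliberately extreme case. It shows only that the two losses measure different quantities; whether matching relations yields better detectors is the empirical question raised in the referee report.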

What would settle it

If experiments on the SSDD or HRSID benchmarks show that adding the contrastive distillation produces no mAP or AP75 gains over the baseline student or fails to allow the student to surpass the teacher, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.30380 by Abhijit Mahalanobis, Banafsheh Saber Latibari, Surendar Devasundaram.

Figure 1. Overview of SURGE framework for SAR ship detection. Given an input SAR image x, a frozen teacher detector T(·) and a trainable student detector S(·) extract feature maps using a backbone and FPN. Teacher predictions are decoded into candidate regions R = {r_i} (region proposals for two-stage detectors, dense predictions for one-stage detectors or decoder-predicted bounding boxes in DETR), followed by top…
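The caption's central point is that both networks are read out at the same teacher-derived regions R, so every region yields a paired teacher/student descriptor. The sketch assumes (C, H, W) feature maps at a known stride, boxes in image-pixel (x1, y1, x2, y2) coordinates, and plain average pooling as a crude, dependency-free stand-in for RoIAlign; `pool_regions` is a hypothetical helper. The caption is truncated after the decoding step, so whatever selection follows it is not modeled here.

```python
import numpy as np


def pool_regions(feature_map: np.ndarray, boxes: np.ndarray, stride: int) -> np.ndarray:
    """Average-pool a (C, H, W) feature map inside each box.

    boxes: (N, 4) array of (x1, y1, x2, y2) in input-image pixels.
    stride: downsampling factor between the image and this feature map.
    Returns an (N, C) array with one descriptor per region.
    """
    C, H, W = feature_map.shape
    out = np.empty((len(boxes), C), dtype=feature_map.dtype)
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # Map pixel coordinates to feature-map cells, covering at least one cell.
        c1 = int(np.clip(np.floor(x1 / stride), 0, W - 1))
        r1 = int(np.clip(np.floor(y1 / stride), 0, H - 1))
        c2 = int(np.clip(np.ceil(x2 / stride), c1 + 1, W))
        r2 = int(np.clip(np.ceil(y2 / stride), r1 + 1, H))
        out[i] = feature_map[:, r1:r2, c1:c2].mean(axis=(1, 2))
    return out


rng = np.random.default_rng(0)
teacher_map = rng.normal(size=(256, 32, 32))   # wide teacher level, stride 8 on a 256x256 image
student_map = rng.normal(size=(64, 32, 32))    # narrower student level at the same stride
# Hypothetical teacher-decoded regions R = {r_i}, shared by both networks.
regions = np.array([[10, 12, 60, 40], [100, 80, 140, 150]], dtype=float)

t_feats = pool_regions(teacher_map, regions, stride=8)   # shape (2, 256)
s_feats = pool_regions(student_map, regions, stride=8)   # shape (2, 64)
```

Row i of `t_feats` and row i of `s_feats` describe the same region, so the outputs are row-aligned even though their channel widths differ; bringing them to a common width is the role of the shared projection space described in the abstract.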
Figure 2. Head network designs for different object detection paradigms. Top: One-stage detectors directly predict class scores and bounding boxes from dense backbone feature maps. Middle: Two-stage detectors generate region proposals via an RPN, followed by per-proposal classification and localization heads. Bottom: Detection transformers process CNN features with a transformer encoder–decoder, where object quer…
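The three head designs emit different raw outputs, but each can be reduced to boxes plus confidence scores, which is all a region-level distillation interface needs. A minimal sketch under assumed output formats: two-stage proposals as pixel (x1, y1, x2, y2) boxes with objectness scores, one-stage dense predictions as per-location class scores and boxes, and DETR-style outputs as per-query logits with a trailing "no object" class and normalized (cx, cy, w, h) boxes, as in the original DETR design. The function names and the 0.5 score threshold are hypothetical; real pipelines usually add steps such as NMS that are omitted here, and the paper's own decoding may differ.

```python
import numpy as np


def regions_from_two_stage(proposals, objectness, thresh=0.5):
    """Two-stage (RPN) output: (N, 4) xyxy proposals and (N,) objectness scores."""
    keep = objectness >= thresh
    return proposals[keep], objectness[keep]


def regions_from_one_stage(cls_scores, boxes, thresh=0.5):
    """One-stage dense output: (L, K) per-location class scores and (L, 4) xyxy boxes."""
    best = cls_scores.max(axis=1)                  # most confident class per location
    keep = best >= thresh
    return boxes[keep], best[keep]


def cxcywh_to_xyxy(boxes, img_w, img_h):
    """Convert normalized (cx, cy, w, h) boxes to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = boxes.T
    return np.stack([(cx - w / 2) * img_w, (cy - h / 2) * img_h,
                     (cx + w / 2) * img_w, (cy + h / 2) * img_h], axis=1)


def regions_from_detr(query_logits, query_boxes, img_w, img_h, thresh=0.5):
    """DETR-style output: (Q, K + 1) logits whose last column is 'no object',
    and (Q, 4) normalized (cx, cy, w, h) boxes, one per object query."""
    z = query_logits - query_logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    best = probs[:, :-1].max(axis=1)               # ignore the 'no object' column
    keep = best >= thresh
    return cxcywh_to_xyxy(query_boxes[keep], img_w, img_h), best[keep]
```

Every adapter returns the same (boxes, scores) pair in pixel coordinates, so the pooling and distillation loss downstream need no detector-specific code. This is one way to read the abstract's "common region-level distillation interface", not a description of the authors' code.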
Original abstract

Deep convolutional and transformer-based detectors achieve strong performance for SAR ship detection but are often computationally prohibitive for real-time or onboard deployment. Lightweight models offer improved efficiency yet struggle to capture the complex structural relationships inherent in SAR backscatter. Most existing SAR knowledge-distillation approaches rely on feature or logit matching, which enforces localized activation similarity while neglecting the geometric relationships among object representations. We propose a Structured Unified Relational knowledGE distillation framework for SAR Ship detection (SURGE) that transfers relational geometry from a powerful teacher detector to a compact student detector using a contrastive InfoNCE objective in a shared projection embedding space. To the best of our knowledge, this work presents the first transformer-based knowledge distillation framework for SAR ship detection. The framework is architecture-agnostic in the sense that it provides a common region-level distillation interface for two-stage, one-stage and transformer-based detectors without modifying their deployed architectures. Experiments on the SSDD and HRSID benchmarks demonstrate that the proposed method yields substantial improvements for two-stage detectors, achieving up to 6.2 mAP and 8.0 AP75 gains over the baseline student and even surpassing teacher performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SURGE, a Structured Unified Relational knowledGE distillation framework for SAR ship detection that employs a contrastive InfoNCE objective in a shared projection space to transfer relational geometry from a teacher detector to a lightweight student. It positions the method as architecture-agnostic (supporting two-stage, one-stage, and transformer detectors without architectural changes) and the first transformer-based KD approach in the SAR domain. Experiments on SSDD and HRSID are claimed to yield up to 6.2 mAP and 8.0 AP75 gains over the baseline student, sometimes surpassing the teacher.

Significance. If the reported gains prove robust and attributable to the contrastive relational transfer, the framework could provide a practical, general-purpose distillation interface for deploying efficient SAR detectors while preserving geometric object relationships that standard feature or logit matching overlooks.

major comments (2)
  1. [Experiments] The central performance claims (up to 6.2 mAP / 8.0 AP75 gains and student > teacher) rest on the premise that InfoNCE contrastive distillation transfers geometric relationships better than standard KD; however, no controlled ablation holding optimizer, schedule, augmentations, and training duration fixed while swapping only the distillation objective is reported, leaving open the possibility that gains arise from training differences rather than relational transfer.
  2. [Abstract and Experiments] The abstract and experimental description supply no dataset splits, baseline implementations, statistical tests, error bars, or variance across runs, which is load-bearing for assessing whether the reported improvements on SSDD/HRSID are reliable or reproducible.
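The ablation requested in the first major comment is a grid in which exactly one factor varies. A minimal sketch using a frozen dataclass, so shared settings cannot drift between arms, plus a check that each arm differs from the baseline student only in its distillation loss. Every field name and value is a hypothetical placeholder, not the paper's training recipe.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TrainConfig:
    # Shared settings: identical across every arm of the ablation (placeholder values).
    optimizer: str = "sgd"
    lr: float = 0.01
    schedule: str = "step"
    epochs: int = 24
    augmentations: tuple = ("hflip",)
    seed: int = 0
    # The only factor the ablation varies.
    distill_loss: str = "none"


BASE = TrainConfig()
ARMS = [replace(BASE, distill_loss=name)
        for name in ("none", "feature_mse", "logit_kd", "info_nce")]


def differing_fields(a: TrainConfig, b: TrainConfig) -> set:
    """Names of the config fields whose values differ between two runs."""
    return {f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)}


# Sanity check: every arm differs from the baseline student only in the distillation loss.
for arm in ARMS[1:]:
    assert differing_fields(ARMS[0], arm) == {"distill_loss"}
```

Crossing each arm with several seeds, e.g. `replace(arm, seed=s)`, would also supply the repeated runs the second major comment asks for.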
minor comments (2)
  1. [Abstract] The claim of being 'architecture-agnostic' and providing a 'common region-level distillation interface' for one-stage and transformer detectors is stated but not demonstrated with results beyond two-stage detectors.
  2. [Method] Notation for the shared projection embedding space and the precise form of the InfoNCE loss (including temperature and negative sampling strategy) should be formalized with equations for reproducibility.
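One standard way to write the requested formalization is the InfoNCE objective below, shown with assumed notation rather than the paper's: f_i^s and f_i^t are the student and teacher features pooled at region r_i, g_s and g_t are projection heads into the shared space, N is the number of regions in the batch, τ > 0 is the temperature, and the other N − 1 teacher embeddings act as negatives. The weighted sum with the detection loss via λ is likewise an assumption; the paper may use a different negative set, a symmetric variant, or a different combination.

```latex
z_i^{s} = \frac{g_s(f_i^{s})}{\lVert g_s(f_i^{s}) \rVert_2}, \qquad
z_i^{t} = \frac{g_t(f_i^{t})}{\lVert g_t(f_i^{t}) \rVert_2}

\mathcal{L}_{\mathrm{NCE}} = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{\exp\!\left( \langle z_i^{s}, z_i^{t} \rangle / \tau \right)}
            {\sum_{j=1}^{N} \exp\!\left( \langle z_i^{s}, z_j^{t} \rangle / \tau \right)}

\mathcal{L} = \mathcal{L}_{\mathrm{det}} + \lambda \, \mathcal{L}_{\mathrm{NCE}}
```

Smaller τ sharpens the softmax, so negatives whose teacher embeddings lie close to the positive contribute more of the gradient. Since a single SAR image may contain only a few ships, N can be small, which is one reason the negative-sampling strategy matters for reproducibility.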

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript accordingly to strengthen the experimental evidence.

  1. Referee: [Experiments] The central performance claims (up to 6.2 mAP / 8.0 AP75 gains and student > teacher) rest on the premise that InfoNCE contrastive distillation transfers geometric relationships better than standard KD; however, no controlled ablation holding optimizer, schedule, augmentations, and training duration fixed while swapping only the distillation objective is reported, leaving open the possibility that gains arise from training differences rather than relational transfer.

    Authors: We agree that a controlled ablation isolating the distillation objective is necessary to attribute gains specifically to the contrastive relational transfer rather than other training factors. In the revised manuscript we will add such an ablation: all training hyperparameters (optimizer, schedule, augmentations, epochs) will be held fixed while only the distillation loss is swapped (standard feature/logit KD versus the proposed InfoNCE relational objective). This will directly test whether the relational geometry transfer is the source of the observed improvements. revision: yes

  2. Referee: [Abstract and Experiments] The abstract and experimental description supply no dataset splits, baseline implementations, statistical tests, error bars, or variance across runs, which is load-bearing for assessing whether the reported improvements on SSDD/HRSID are reliable or reproducible.

    Authors: We acknowledge that the current manuscript lacks these details. The revised version will explicitly state the train/test splits for SSDD and HRSID, provide implementation details for all baselines (including any re-implementation choices), and report mean performance with standard deviation across at least three independent runs together with statistical significance tests (e.g., paired t-test) to quantify reproducibility. revision: yes
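The reporting promised in the second response can be sketched with the standard library alone, assuming every arm is trained with the same seeds so that run k of the baseline and run k of the distilled student form a pair. Rather than computing an exact p-value, the sketch compares the paired t statistic with tabulated two-sided 5% critical values, hard-coded for 2 to 5 degrees of freedom (3 to 6 runs); `scipy.stats.ttest_rel` returns the statistic and p-value directly. The function names are hypothetical.

```python
import math
import statistics

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_05 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def summarize(runs):
    """Mean and sample standard deviation across independent runs."""
    return statistics.mean(runs), statistics.stdev(runs)


def paired_t(baseline, distilled):
    """Paired t statistic on per-seed differences (distilled - baseline)."""
    if len(baseline) != len(distilled):
        raise ValueError("runs must be paired by seed")
    diffs = [d - b for b, d in zip(baseline, distilled)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("identical differences; t statistic is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


def significant_at_05(baseline, distilled):
    """True if the paired difference is significant at the 5% level (two-sided).

    Raises KeyError for run counts outside the hard-coded table (3 to 6 runs).
    """
    t, df = paired_t(baseline, distilled)
    return abs(t) > T_CRIT_05[df]
```

With three runs the test has little power, so a non-significant result would be weak evidence either way; reporting mean ± standard deviation alongside it keeps the effect size visible.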

Circularity Check

0 steps flagged

No circularity; empirical distillation framework with benchmark validation


The paper presents an empirical knowledge-distillation method (SURGE) using a contrastive InfoNCE loss for SAR ship detection and reports performance gains on SSDD/HRSID benchmarks. No derivation chain, uniqueness theorem, fitted-parameter prediction, or self-citation load-bearing step is present; the central claims rest on experimental results rather than any reduction of outputs to inputs by construction. The architecture-agnostic framing and 'first transformer-based' claim are statements of scope, not circular logic.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no free parameters, invented entities, or detailed axioms are extractable beyond the domain assumption that contrastive objectives capture relational geometry.

axioms (1)
  • domain assumption: a contrastive InfoNCE objective in a shared embedding space transfers relational geometry effectively.
    Core mechanism of the distillation framework, as stated in the abstract.

pith-pipeline@v0.9.1-grok · 5739 in / 1051 out tokens · 37157 ms · 2026-06-29T12:40:35.768300+00:00 · methodology

