Recognition: 2 theorem links
SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection
Pith reviewed 2026-05-13 17:00 UTC · model grok-4.3
The pith
A DETR-based detector with sparse expert routing filters SAR speckle noise while preserving small-ship details for higher accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SARES-DEIM grounds ship detection in the DETR paradigm and introduces SARESMoE, a sparsely activated module that routes input features through a gating network to frequency-domain and wavelet-domain expert networks, which filter coherent speckle and coastal clutter. The SDEP neck augments the feature pyramid with space-to-depth operations on shallow features, retaining the spatial cues otherwise lost in downsampling and enabling accurate localization of small ships.
What carries the argument
The SARESMoE module, whose sparse gating network activates only the frequency and wavelet expert sub-networks relevant to each input, paired with the SDEP neck, which applies space-to-depth transformations to shallow features to preserve high-resolution detail.
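The routing pattern described above can be sketched with a generic top-k softmax gate in the style of sparsely-gated MoE layers. The expert bodies, feature dimensions, and k below are placeholders, since the paper does not specify its gating function:

```python
import numpy as np

def topk_gate(x, W_g, k=1):
    """Generic sparse top-k gate: only the k highest-scoring experts
    receive nonzero weight, so only they need to run."""
    logits = x @ W_g                                   # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]          # indices of the k largest
    masked = np.full_like(logits, -np.inf)
    np.put_along_axis(masked, top, np.take_along_axis(logits, top, axis=-1), axis=-1)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)           # zeros for unselected experts

# Hypothetical stand-ins for the frequency and wavelet experts.
def freq_expert(x):
    return x * 0.5

def wavelet_expert(x):
    return x * 2.0

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # 4 feature tokens, 8 channels
W_g = rng.normal(size=(8, 2))          # gate weights (learned in practice, random here)
gates = topk_gate(x, W_g, k=1)
# Dense weighted sum shown for clarity; a real sparse layer would skip zero-gated experts.
y = sum(gates[:, [i]] * f(x) for i, f in enumerate([freq_expert, wavelet_expert]))
```

With k=1 each token activates exactly one expert, which is the source of the claimed compute savings over dense expert activation.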
If this is right
- Real-world SAR surveillance can run at higher accuracy without paying the full cost of dense expert activation because only a few experts fire per token.
- Small-target performance in cluttered radar scenes improves when shallow high-resolution maps are explicitly protected from downsampling losses.
- The same sparse routing pattern could be inserted into other transformer detectors that face coherent noise or scale imbalance.
- Training cost stays modest because the gating network learns to ignore irrelevant experts rather than training every expert on every sample.
Where Pith is reading between the lines
- The expert specialization might transfer to other coherent-noise domains such as ultrasound or synthetic aperture sonar if the frequency and wavelet banks are kept fixed.
- Edge deployment becomes more practical because sparsity keeps FLOPs low even when the total expert pool grows.
- Multi-temporal SAR stacks could be handled by adding a temporal expert that the gate learns to activate only on change-rich frames.
- If the learned experts turn out to be largely domain-specific, swapping in a new SAR sensor would require only retraining the gate rather than the entire backbone.
Load-bearing premise
The gating network can learn to send ship features to the right experts so that speckle and clutter are removed without discarding the ship signatures themselves, and the space-to-depth step in SDEP adds localization value beyond ordinary feature pyramids.
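The space-to-depth step this premise leans on is a lossless rearrangement: it trades spatial resolution for channel depth instead of discarding pixels the way strided convolution or pooling does. A minimal sketch, where the block size and tensor layout are assumptions rather than the paper's exact SDEP wiring:

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange (C, H, W) -> (C*block^2, H/block, W/block) without losing values."""
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)     # move the intra-block offsets next to C
    return x.reshape(c * block * block, h // block, w // block)

feat = np.arange(1 * 4 * 4, dtype=np.float32).reshape(1, 4, 4)
out = space_to_depth(feat)
print(out.shape)   # (4, 2, 2): resolution halved, every input value preserved
```

Because no value is discarded, a subsequent convolution can still see the fine-grained signatures that ordinary downsampling would have erased.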
What would settle it
An ablation on the HRSID dataset, with all other training details fixed, in which the MoE gating is replaced by ordinary convolutions or the SDEP neck is removed: if mAP50:95 still remains above 76.4 percent, the components carry no load; if it drops materially, the premise holds.
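For context, mAP50:95 is the COCO-style metric that averages average precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, so it is far stricter than mAP50 alone. A sketch of the averaging, using hypothetical per-threshold AP values rather than the paper's results:

```python
import numpy as np

# Ten IoU thresholds: 0.50, 0.55, ..., 0.95.
thresholds = np.round(np.arange(0.50, 1.00, 0.05), 2)

# Hypothetical per-threshold AP values for illustration only.
ap = np.array([0.94, 0.92, 0.90, 0.88, 0.85, 0.81, 0.75, 0.66, 0.52, 0.31])

map_50 = ap[0]          # AP at IoU 0.50 alone
map_50_95 = ap.mean()   # the stricter averaged metric
print(round(map_50_95, 3))   # 0.754
```

AP typically falls off sharply at high IoU thresholds for small targets, which is why mAP50:95 is the more demanding number to hold in this ablation.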
Original abstract
Ship detection in Synthetic Aperture Radar (SAR) imagery is fundamentally challenged by inherent coherent speckle noise, complex coastal clutter, and the prevalence of small-scale targets. Conventional detectors, primarily designed for optical imagery, often exhibit limited robustness against SAR-specific degradation and suffer from the loss of fine-grained ship signatures during spatial downsampling. To address these limitations, we propose SARES-DEIM, a domain-aware detection framework grounded in the DEtection TRansformer (DETR) paradigm. Central to our approach is SARESMoE (SAR-aware Expert Selection Mixture-of-Experts), a module leveraging a sparse gating mechanism to selectively route features toward specialized frequency and wavelet experts. This sparsely-activated architecture effectively filters speckle noise and semantic clutter while maintaining high computational efficiency. Furthermore, we introduce the Space-to-Depth Enhancement Pyramid (SDEP) neck to preserve high-resolution spatial cues from shallow stages, significantly improving the localization of small targets. Extensive experiments on two benchmark datasets demonstrate the superiority of SARES-DEIM. Notably, on the challenging HRSID dataset, our model achieves a mAP50:95 of 76.4% and a mAP50 of 93.8%, outperforming state-of-the-art YOLO-series and specialized SAR detectors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SARES-DEIM, a DETR-based detection framework for SAR ship detection. It introduces SARESMoE, a sparse mixture-of-experts module with a gating mechanism that routes features to specialized frequency and wavelet experts to suppress speckle noise and coastal clutter, and the SDEP neck, a space-to-depth enhancement pyramid that preserves high-resolution cues from shallow layers to improve small-target localization. On the HRSID dataset the model reports mAP50:95 of 76.4% and mAP50 of 93.8%, outperforming YOLO-series and other specialized SAR detectors.
Significance. If the empirical results are reproducible, the combination of sparse MoE routing for SAR-specific noise handling and resolution-preserving neck design could meaningfully advance transformer-based detectors in challenging remote-sensing domains. The approach is internally consistent with the stated architecture and loss formulation, and the reported HRSID numbers align with the described components; however, the absence of training protocols, ablation tables, and statistical measures limits immediate assessment of robustness and contribution.
Major comments (2)
- [Results] Results section: the headline performance numbers (76.4% mAP50:95, 93.8% mAP50 on HRSID) are presented without ablation studies isolating the contribution of the sparse gating in SARESMoE versus standard DETR attention, without error bars across multiple runs, and without explicit baseline re-implementations, which are required to substantiate the claim of outperforming SOTA YOLO and SAR detectors.
- [Methods] Methods section (SARESMoE description): the sparse gating mechanism is described at a high level but lacks the precise mathematical definition of the gating function, the frequency/wavelet expert architectures, or the routing loss term, making it impossible to verify that the experts reliably preserve ship signatures while suppressing speckle without additional assumptions.
Minor comments (2)
- [Figures] Figure captions and architecture diagram: the SDEP neck diagram would benefit from explicit annotation of the space-to-depth operations and how they feed into the DETR decoder to clarify the resolution-preservation path.
- [Introduction] Notation: the acronym SARES-DEIM is introduced in the title and abstract but the expansion is not restated in the introduction, which could confuse readers unfamiliar with the component names.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight opportunities to strengthen the empirical validation and technical clarity of SARES-DEIM. We address each major point below and will incorporate revisions to improve reproducibility and substantiation of the claims.
Point-by-point responses
- Referee: [Results] Results section: the headline performance numbers (76.4% mAP50:95, 93.8% mAP50 on HRSID) are presented without ablation studies isolating the contribution of the sparse gating in SARESMoE versus standard DETR attention, without error bars across multiple runs, and without explicit baseline re-implementations, which are required to substantiate the claim of outperforming SOTA YOLO and SAR detectors.
  Authors: We agree that the results section would be strengthened by additional analyses. In the revised manuscript we will add ablation tables that isolate the contribution of the sparse gating mechanism in SARESMoE relative to standard DETR attention, report standard deviations or error bars from multiple independent training runs, and provide explicit details on baseline re-implementations (including training protocols and hyper-parameters used for fair comparison). Revision: yes.
- Referee: [Methods] Methods section (SARESMoE description): the sparse gating mechanism is described at a high level but lacks the precise mathematical definition of the gating function, the frequency/wavelet expert architectures, or the routing loss term, making it impossible to verify that the experts reliably preserve ship signatures while suppressing speckle without additional assumptions.
  Authors: We acknowledge the need for greater mathematical precision. The revised methods section will include the exact equations for the sparse gating function, the detailed architectures of the frequency and wavelet experts, and the formulation of the routing loss term, enabling direct verification of how ship signatures are preserved while speckle and clutter are suppressed. Revision: yes.
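The definitions the referee requests could follow the standard sparsely-gated mixture-of-experts formulation (Shazeer et al., 2017). A plausible form, stated here as an assumption rather than as the paper's actual equations:

```latex
G(x) = \mathrm{softmax}\big(\mathrm{TopK}(x W_g,\, k)\big), \qquad
y = \sum_{i=1}^{N} G(x)_i \, E_i(x),
```

where the $E_i$ are the frequency and wavelet experts and $\mathrm{TopK}$ keeps the $k$ largest gate logits, setting the rest to $-\infty$. The usual companion routing term is a load-balancing loss such as $\mathcal{L}_{\text{aux}} = \lambda N \sum_{i=1}^{N} f_i P_i$, with $f_i$ the fraction of tokens routed to expert $i$ and $P_i$ its mean gate probability; whether SARES-DEIM uses this term is not stated.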
Circularity Check
No significant circularity detected
Full rationale
The paper proposes an empirical architecture (SARES-DEIM) extending DETR with a sparse MoE gating module (SARESMoE) and a Space-to-Depth Enhancement Pyramid (SDEP) neck. No derivation chain, equations, or fitted parameters are presented that reduce to inputs by construction. Central claims rest on reported benchmark metrics (mAP50:95 = 76.4% on HRSID) and standard comparisons to YOLO baselines; these are externally falsifiable and do not rely on self-definitional steps, self-citation load-bearing, or ansatz smuggling. The architecture description and loss formulation are internally consistent without circular reduction.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear) — matched passage: "SARESMoE ... sparse gating mechanism to selectively route features toward specialized frequency and wavelet experts ... SDEP neck to preserve high-resolution spatial cues from shallow stages"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear) — matched passage: "DETR-style decoder ... bipartite matching problem"