arxiv: 2604.13568 · v1 · submitted 2026-04-15 · 💻 cs.CV


ZoomSpec: A Physics-Guided Coarse-to-Fine Framework for Wideband Spectrum Sensing

Zhentao Yang , Yixiang Luomei , Zhuoyang Liu , Zhenyu Liu , Feng Xu


Pith reviewed 2026-05-10 13:44 UTC · model grok-4.3


keywords wideband spectrum sensing · physics-guided learning · spectrogram processing · coarse-to-fine detection · modulation classification · time-frequency analysis · signal purification · low-altitude monitoring


The pith

ZoomSpec integrates log-space spectrogram transforms and adaptive signal purification into a coarse-to-fine network to lift wideband spectrum sensing accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that treating spectrograms as ordinary images creates a domain mismatch that hides narrowband signals amid spectral leakage and poor resolution in wideband data. It counters this by embedding two signal-processing priors: a log-space short-time Fourier transform that keeps relative resolution constant while sharpening fine structures, and an adaptive heterodyne low-pass module that aligns centers, matches bandwidths, and decimates safely before fine recognition. These modules feed a dual-domain attention network that jointly refines temporal boundaries and modulation labels from both raw I/Q and magnitude features. If the approach holds, monitoring systems for low-altitude environments gain reliable detection across heterogeneous protocols and fluctuating signal-to-noise ratios. A reader cares because the method turns an ill-posed image-classification task back into a physics-aware signal problem without sacrificing learned flexibility.
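To make "constant relative resolution" concrete, here is a minimal NumPy sketch. It is not the authors' LS-STFT, whose exact definition is not given on this page. It assumes only that frequency is sampled on a geometric grid, so the gap between adjacent bins is a fixed fraction of the bin frequency, and it evaluates a windowed DFT of one frame directly at those frequencies. The function names and all parameter values are hypothetical.

```python
import numpy as np

def log_spaced_bins(f_min, f_max, num_bins):
    """Geometric frequency grid: adjacent bins differ by a constant ratio."""
    return np.geomspace(f_min, f_max, num_bins)

def relative_resolution(freqs):
    """Bin spacing divided by bin frequency (constant on a geometric grid)."""
    return np.diff(freqs) / freqs[:-1]

def log_frequency_frame(frame, fs, freqs):
    """Magnitude of a Hann-windowed DFT of one frame, evaluated at `freqs` (Hz).

    Assumption: a single fixed-length window is used for every bin. This only
    changes where the spectrum is sampled, not the window's resolution in Hz.
    """
    n = np.arange(frame.size)
    window = np.hanning(frame.size)
    # One complex exponential per requested frequency (rows) and sample (columns).
    kernel = np.exp(-2j * np.pi * np.outer(freqs, n) / fs)
    return np.abs(kernel @ (frame * window))

if __name__ == "__main__":
    freqs = log_spaced_bins(1e3, 1e6, 64)
    rel = relative_resolution(freqs)
    print("relative spacing min/max:", rel.min(), rel.max())
```

On this hypothetical grid the spacing is roughly 12% of the bin frequency everywhere, so bins sit close together at the low end and far apart at the high end, whereas 64 linearly spaced bins over the same range would be almost 16 kHz apart throughout. A true constant-Q analysis would also lengthen the window at low frequencies; the sketch leaves that out.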

Core claim

ZoomSpec is a physics-guided coarse-to-fine framework in which a Log-Space STFT overcomes the geometric limits of linear spectrograms, a Coarse Proposal Net rapidly screens the full band, an Adaptive Heterodyne Low-Pass module purifies the signal by center-frequency alignment, bandwidth-matched filtering and safe decimation, and a Fine Recognition Net fuses purified time-domain I/Q with spectral magnitude through dual-domain attention to refine boundaries and classify modulations.

What carries the argument

The Adaptive Heterodyne Low-Pass (AHLP) module, which executes center-frequency aligning, bandwidth-matched filtering, and safe decimation to remove out-of-band interference before fine recognition.
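The three steps named above map onto a textbook digital down-conversion chain, sketched here in NumPy under stated assumptions. This is not the paper's AHLP implementation: the windowed-sinc filter, the `guard` and `oversample` margins, the tap count, and the function name `ahlp_sketch` are all hypothetical choices made for illustration.

```python
import numpy as np

def ahlp_sketch(iq, fs, f_center, bandwidth, guard=1.25, oversample=2.0, num_taps=257):
    """Heterodyne to baseband, low-pass to the signal bandwidth, then decimate.

    iq        : complex samples at rate fs (Hz)
    f_center  : coarse center-frequency estimate (Hz)
    bandwidth : coarse two-sided bandwidth estimate (Hz)
    Returns (decimated samples, new sample rate).
    """
    n = np.arange(iq.size)
    # 1) Center-frequency alignment: mix the signal of interest down to 0 Hz.
    baseband = iq * np.exp(-2j * np.pi * f_center * n / fs)

    # 2) Bandwidth-matched low-pass: Hamming-windowed sinc FIR whose one-sided
    #    cutoff is half the estimated bandwidth plus a guard margin.
    cutoff = guard * bandwidth / 2.0
    k = np.arange(num_taps) - (num_taps - 1) / 2.0
    taps = (2.0 * cutoff / fs) * np.sinc(2.0 * cutoff / fs * k) * np.hamming(num_taps)
    taps /= taps.sum()  # unity gain at DC
    filtered = np.convolve(baseband, taps, mode="same")

    # 3) "Safe" decimation: keep the new Nyquist frequency above the cutoff
    #    by the chosen oversampling margin so the retained band does not alias.
    factor = max(1, int(fs // (2.0 * cutoff * oversample)))
    return filtered[::factor], fs / factor

if __name__ == "__main__":
    fs = 1e6
    n = np.arange(20000)
    wanted = np.exp(2j * np.pi * 205e3 * n / fs)      # 5 kHz above the center
    interferer = np.exp(2j * np.pi * 350e3 * n / fs)  # far out of band
    out, fs_out = ahlp_sketch(wanted + interferer, fs, f_center=200e3, bandwidth=20e3)
    print(out.size, fs_out)
```

Filtering before decimating is the point of the ordering: anything left outside the new Nyquist band when samples are dropped would fold back onto the signal of interest.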

If this is right

The framework reaches 78.1 mAP@0.5:0.95 on real-world SpaceNet recordings, exceeding prior systems.
Detection stability holds across diverse modulation bandwidths where earlier methods degrade.
Narrowband visibility improves while constant relative resolution is preserved over wide bands.
Out-of-band interference is suppressed before classification, reducing false boundaries.
Dual-domain attention jointly optimizes temporal localization and modulation labels.
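This page does not describe how the FRN's dual-domain attention is wired, so the following is only a generic illustration of two feature streams attending to each other, written in NumPy. Everything here is an assumption: the function names, the lack of learned query/key/value projections, the residual-plus-concatenate fusion, and the requirement that both streams have the same number of tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product attention of `queries` (Tq, d) over `context` (Tc, d).

    Returns the attended features (Tq, d) and the attention weights (Tq, Tc).
    A trained network would apply learned projections first; omitted here.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context, weights

def dual_domain_fuse(iq_tokens, mag_tokens):
    """Let each stream attend to the other, add residually, then concatenate.

    Assumes both streams have shape (T, d) with the same T and d.
    """
    iq_from_mag, _ = cross_attention(iq_tokens, mag_tokens)
    mag_from_iq, _ = cross_attention(mag_tokens, iq_tokens)
    return np.concatenate([iq_tokens + iq_from_mag, mag_tokens + mag_from_iq], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fused = dual_domain_fuse(rng.standard_normal((16, 8)), rng.standard_normal((16, 8)))
    print(fused.shape)
```

A boundary-regression head and a classification head could both read the fused tokens, which is one way a single representation can serve localization and labeling at once.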

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same coarse-to-fine purification pattern could be tested on radar or underwater acoustic spectrograms that share similar leakage and resolution issues.
Real-time deployment becomes feasible once the coarse proposal net is quantized, enabling dynamic spectrum access on resource-limited platforms.
The log-space transform may generalize to other logarithmic frequency representations used in vibration analysis or biomedical signal processing.

Load-bearing premise

The assumption that the physics priors (log-space STFT and adaptive heterodyne filtering) fully close the domain gap between natural-image training data and real spectrograms without introducing new biases that hurt narrowband detection.

What would settle it

A direct ablation on the SpaceNet dataset in which removing the AHLP module or reverting to linear spectrograms causes mAP@0.5:0.95 to fall below the current leaderboard systems on narrowband modulations.

Figures

Figures reproduced from arXiv: 2604.13568 by Feng Xu, Yixiang Luomei, Zhentao Yang, Zhenyu Liu, Zhuoyang Liu.

**Figure 2.** Visual comparison of spectral representations on simulated narrowband signals (Zigbee, LoRa, and NB-FM). While standard STFT suffers from sparse…

**Figure 3.** The AHLP processing chain. Guided by the coarse parameters (…

**Figure 4.** FRN architecture. After AHLP, two streams (time-domain I/Q and FFT magnitude) pass through a 1-D downsampling stem and a shallow conv encoder,…

**Figure 5.** Class distribution of SpaceNet dataset.

IV. Experiments, A. Dataset: The SpaceNet dataset is jointly curated by the Institute of Space Internet of Fudan University and the Shanghai Radio Monitoring Station, and serves as the official dataset of the 2025 “AI+Radio” Challenge [37]. All experiments are conducted on the SpaceNet public real-world benchmark, which covers the entire 2.4–2.4835 GHz ISM band. Measureme…

**Figure 6.** Visualization of the standard STFT spectrogram. Due to the linear frequency sampling, narrowband emissions (highlighted in the zoomed insets)…

**Figure 7.** Visualization of the proposed LS-STFT spectrogram under the same frequency budget…

**Figure 8.** mAP versus IoU threshold on SpaceNet dataset. Color indicates…

**Figure 9.** LS-STFT already produces diagonally dominant…

**Figure 9.** Per-class confusion matrices comparison on SpaceNet dataset. From left to right: RF-DETR…

**Figure 10.** Confusion matrices for the 4-way bandwidth grading task in the CPN under different spectral representations. Standard STFT fails to resolve…

Original abstract

Wideband spectrum sensing for low-altitude monitoring is critical yet challenging due to heterogeneous protocols,large bandwidths, and non-stationary SNR. Existing data-driven approaches treat spectrograms as natural images,suffering from domain mismatch: they neglect time-frequency resolution constraints and spectral leakage, leading topoor narrowband visibility. This paper proposes ZoomSpec, a physics-guided coarse-to-fine framework integrating signal processing priors with deep learning. We introduce a Log-Space STFT (LS-STFT) to overcome the geometric bottleneck of linear spectrograms, sharpening narrowband structures while maintaining constant relative resolution. A lightweight Coarse Proposal Net (CPN) rapidly screens the full band. To bridge coarse detection and fine recognition, we design an Adaptive Heterodyne Low-Pass (AHLP) module that executes center-frequency aligning, bandwidth-matched filtering, and safe decimation, purifying signals of out-of-band interference. A Fine Recognition Net (FRN) fuses purified time-domain I/Q with spectral magnitude via dual-domain attention to jointly refine temporal boundaries and modulation classification. Evaluations on the SpaceNet real-world dataset demonstrate state-of-the-art 78.1 mAP@0.5:0.95, surpassing existing leaderboard systems with superior stability across diverse modulation bandwidths.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ZoomSpec adds targeted physics priors to fix spectrogram domain mismatch in a coarse-to-fine pipeline and reports competitive real-data results, though the experiments need more detail to pin down the gains.


ZoomSpec is a practical step forward for wideband spectrum sensing. It embeds signal-processing knowledge into deep learning to stop treating spectrograms like ordinary photos, which often hides narrowband signals amid leakage and resolution problems. The Log-Space STFT keeps constant relative resolution while sharpening those narrow features, and the Adaptive Heterodyne Low-Pass module handles center alignment, matched filtering, and safe decimation before the fine stage. The dual-domain attention in the recognition net then pulls together time-domain I/Q and spectral magnitude. That combination looks new enough in this setting and directly attacks the stated shortcomings of prior image-based methods.

The 78.1 mAP on the SpaceNet dataset with claimed stability across bandwidths is the headline empirical result, and nothing in the description suggests circular claims or obvious contradictions in the architecture. The motivation and flow are straightforward.

The soft spots sit mostly in the evaluation. The abstract gives the top-line number but leaves ablations, full baseline tables, and error breakdowns for the full text; without those it is hard to measure how much each new module actually moves the needle versus dataset specifics or tuning. The assumption that the priors clean up domain mismatch without adding their own narrowband artifacts also needs checking against more varied cases.

This paper is for people working on spectrum sensing for low-altitude or drone applications who already know the signal-processing basics. A reader looking for concrete hybrid modules rather than pure end-to-end learning would find usable ideas here. It deserves peer review because the core approach is coherent, grounded in real constraints, and backed by external data; referees can verify the implementation and push for the missing controls.

Referee Report

2 major / 2 minor

Summary. The paper proposes ZoomSpec, a physics-guided coarse-to-fine framework for wideband spectrum sensing. It introduces a Log-Space STFT (LS-STFT) to achieve constant relative resolution and sharpen narrowband structures in spectrograms, a Coarse Proposal Net (CPN) for initial screening, an Adaptive Heterodyne Low-Pass (AHLP) module for center-frequency alignment, bandwidth-matched filtering and safe decimation to purify signals, and a Fine Recognition Net (FRN) that fuses time-domain I/Q signals with spectral magnitude via dual-domain attention for refined boundary detection and modulation classification. The central empirical claim is state-of-the-art performance of 78.1 mAP@0.5:0.95 on the SpaceNet real-world dataset, with improved stability across diverse modulation bandwidths.

Significance. If the results hold under detailed scrutiny, the work has moderate significance for integrating signal-processing priors (LS-STFT and AHLP) with deep networks to address domain mismatch between natural-image training and spectrogram data in spectrum sensing. This could benefit applications in low-altitude monitoring with heterogeneous protocols and non-stationary conditions. The coarse-to-fine design and dual-domain fusion are conceptually coherent, but the absence of ablations and baselines in the provided text limits assessment of whether the physics components deliver the claimed gains without new biases.

major comments (2)

Abstract: The central claim of 78.1 mAP@0.5:0.95 as state-of-the-art with superior stability is load-bearing for the paper's contribution, yet the text provides no baseline comparisons, ablation studies on LS-STFT/AHLP/FRN, implementation details, or error analysis, rendering the performance result unverifiable from the available manuscript.
Abstract (physics priors section): The claim that LS-STFT and AHLP fully resolve domain mismatch and improve narrowband visibility without introducing artifacts is central to the motivation, but no quantitative evidence or analysis of potential biases from these modules (e.g., effects on narrowband detection) is supplied to support it.

minor comments (2)

Abstract: Typo 'topoor' should be 'to poor'; missing space after 'protocols,'.
Abstract: The mAP@0.5:0.95 metric is standard but would benefit from explicit definition of the IoU thresholds used for the SpaceNet evaluation.
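For reference, the `@0.5:0.95` notation usually follows the COCO convention: average precision is computed at ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 and then averaged. Whether the SpaceNet scorer follows that convention exactly is an assumption here, as is the box layout `(t_start, f_low, t_end, f_high)` on the time-frequency plane. The sketch below, with hypothetical function names, shows the threshold set, the IoU of two such boxes, and the final averaging step; it does not implement the per-threshold AP computation itself.

```python
import numpy as np

# COCO-style threshold set: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = np.linspace(0.50, 0.95, 10)

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (t_start, f_low, t_end, f_high)."""
    t0, f0 = max(a[0], b[0]), max(a[1], b[1])
    t1, f1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, t1 - t0) * max(0.0, f1 - f0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def thresholds_passed(pred, truth):
    """Number of thresholds at which `pred` would count as matching `truth`."""
    return int(np.sum(box_iou(pred, truth) >= IOU_THRESHOLDS))

def mean_over_thresholds(ap_per_threshold):
    """Average ten per-threshold AP values into a single mAP@0.5:0.95 figure."""
    ap = np.asarray(ap_per_threshold, dtype=float)
    if ap.shape != IOU_THRESHOLDS.shape:
        raise ValueError("expected one AP value per IoU threshold")
    return float(ap.mean())

if __name__ == "__main__":
    print(thresholds_passed((0, 0, 100, 100), (0, 0, 100, 82)))
```

A prediction with IoU 0.82 matches at only seven of the ten thresholds, which is why this averaged metric is considerably stricter than mAP at IoU 0.5 alone and why boundary refinement matters for the headline number.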

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will make revisions to enhance the verifiability of our claims and the supporting evidence for the physics-guided components.


Referee: Abstract: The central claim of 78.1 mAP@0.5:0.95 as state-of-the-art with superior stability is load-bearing for the paper's contribution, yet the text provides no baseline comparisons, ablation studies on LS-STFT/AHLP/FRN, implementation details, or error analysis, rendering the performance result unverifiable from the available manuscript.

Authors: We acknowledge that the abstract's central performance claim requires clearer supporting evidence within the manuscript to ensure verifiability. The current text focuses on the high-level result without embedding or cross-referencing the necessary comparisons and analyses. In the revised manuscript, we will expand the Experiments section to include explicit baseline comparisons against existing leaderboard systems on the SpaceNet dataset, ablation studies isolating LS-STFT, CPN, AHLP, and FRN, detailed implementation parameters, and error analysis across modulation bandwidths. We will also revise the abstract to briefly reference the key baselines and add a compact results summary table in the introduction for immediate context.
revision: yes
Referee: Abstract (physics priors section): The claim that LS-STFT and AHLP fully resolve domain mismatch and improve narrowband visibility without introducing artifacts is central to the motivation, but no quantitative evidence or analysis of potential biases from these modules (e.g., effects on narrowband detection) is supplied to support it.

Authors: We agree that the claims regarding LS-STFT and AHLP require quantitative backing to demonstrate resolution of domain mismatch, improved narrowband visibility, and absence of introduced artifacts or biases. The current manuscript motivates these modules but lacks dedicated metrics or bias analysis. In the revision, we will add ablation experiments quantifying narrowband detection performance (e.g., precision on narrowband signals) with and without LS-STFT/AHLP, along with visual and numerical analysis of potential artifacts or biases in spectrograms and detection outcomes. This will directly support the physics priors section.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The abstract and available claims describe an empirical architecture (LS-STFT for spectrograms, AHLP for decimation, dual-domain FRN) whose central result is an mAP score measured on the external SpaceNet dataset. No equations, fitted parameters renamed as predictions, self-citations, or uniqueness theorems are shown that would reduce the reported performance to the inputs by construction. The physics priors are presented as design choices whose value is validated externally rather than defined circularly.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on the effectiveness of newly introduced modules and the representativeness of the SpaceNet dataset; no explicit free parameters or invented physical entities are stated, but the framework assumes standard deep-learning training succeeds once domain mismatch is mitigated by the physics modules.

axioms (1)

domain assumption: Existing data-driven spectrogram methods suffer from domain mismatch due to neglected time-frequency resolution and spectral leakage.
Directly stated in the abstract as the motivation for the physics-guided approach.

invented entities (2)

Log-Space STFT (LS-STFT) · no independent evidence
purpose: Overcome geometric bottleneck of linear spectrograms to sharpen narrowband structures with constant relative resolution
New transform introduced to address limitations of standard STFT in the framework.
Adaptive Heterodyne Low-Pass (AHLP) module · no independent evidence
purpose: Perform center-frequency aligning, bandwidth-matched filtering, and safe decimation to purify signals from out-of-band interference
New module designed to bridge coarse detection and fine recognition stages.

pith-pipeline@v0.9.0 · 5536 in / 1497 out tokens · 28331 ms · 2026-05-10T13:44:54.159797+00:00 · methodology


Reference graph

Works this paper leans on

40 extracted references · 4 canonical work pages · 2 internal anchors

[1] R. Struzak, T. Tjelta, and J. P. Borrego, “On radio-frequency spectrum management,” URSI Radio Science Bulletin, vol. 2015, no. 354, pp. 11–35, 2015.
[2] J. Wan, H. Ren, C. Pan, Z. Zhang, S. Gao, Y. Yu, and C. Wang, “Sensing capacity for integrated sensing and communication systems in low-altitude economy,” IEEE Communications Letters, 2025.
[3] Z. Quan, S. Cui, A. H. Sayed, and H. V. Poor, “Wideband spectrum sensing in cognitive radio networks,” in Proc. IEEE International Conference on Communications (ICC). IEEE, 2008, pp. 901–906.
[4] R. K. Dubey and G. Verma, “Improved spectrum sensing for cognitive radio based on adaptive threshold,” in 2015 Second International Conference on Advances in Computing and Communication Engineering. IEEE, 2015, pp. 253–256.
[5] W. A. Gardner, “Exploitation of spectral redundancy in cyclostationary signals,” IEEE Signal Processing Magazine, vol. 8, no. 2, pp. 14–36, 2002.
[6] J. Lundén, V. Koivunen, A. Huttunen, and H. V. Poor, “Collaborative cyclostationary spectrum sensing for cognitive radio systems,” IEEE Transactions on Signal Processing, vol. 57, no. 11, pp. 4182–4195, 2009.
[7] E. Axell, G. Leus, E. G. Larsson, and H. V. Poor, “Spectrum sensing for cognitive radio: State-of-the-art and recent advances,” IEEE Signal Processing Magazine, vol. 29, no. 3, pp. 101–116, 2012.
[8] Y. Zeng and Y.-C. Liang, “Covariance based signal detections for cognitive radio,” in 2007 2nd IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks. IEEE, 2007, pp. 202–207.
[9] D. D. Ariananda, M. K. Lakshmanan, and H. Nikookar, “A survey on spectrum sensing techniques for cognitive radio,” in 2009 Second International Workshop on Cognitive Radio and Advanced Spectrum Management, 2009, pp. 74–79.
[10] T. Huynh-The, C.-H. Hua, Q.-V. Pham, and D.-S. Kim, “MCNet: An efficient CNN architecture for robust automatic modulation classification,” IEEE Communications Letters, vol. 24, no. 4, pp. 811–815, Apr. 2020.
[11] A. P. Hermawan, R. R. Ginanjar, D.-S. Kim, and J.-M. Lee, “CNN-based automatic modulation classification for beyond 5G communications,” IEEE Communications Letters, vol. 24, no. 5, pp. 1038–1041, May 2020.
[12] Z. Chen et al., “SigNet: A novel deep learning framework for radio signal classification,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 2, pp. 529–541, Jun. 2022.
[13] K. Li et al., “UniFormer: Unifying convolution and self-attention for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 10, pp. 12581–12600, Oct. 2023.
[14] S. Rajendran, W. Meert, D. Giustiniano, V. Lenders, and S. Pollin, “Deep learning models for wireless signal classification with distributed low-cost spectrum sensors,” IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 3, pp. 433–445, Sep. 2018.
[15] Z. Ke and H. Vikalo, “Real-time radio technology and modulation classification via an LSTM auto-encoder,” IEEE Transactions on Wireless Communications, vol. 21, no. 1, pp. 370–382, Jan. 2022.
[16] N. E. West and T. O’Shea, “Deep architectures for modulation recognition,” in Proc. IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2017, pp. 1–6.
[17] S. Wei, Q. Qu, H. Su, M. Wang, J. Shi, and X. Hao, “Intra-pulse modulation radar signal recognition based on CLDN network,” IET Radar, Sonar & Navigation, vol. 14, no. 6, pp. 803–810, 2020.
[18] J. Xu, C. Luo, G. Parr, and Y. Luo, “A spatiotemporal multi-channel learning framework for automatic modulation recognition,” IEEE Wireless Communications Letters, vol. 9, no. 10, pp. 1629–1632, Oct. 2020.
[19] A. Selim, F. Paisana, J. A. Arokkiam, Y. Zhang, L. Doyle, and L. A. DaSilva, “Spectrum monitoring for radar bands using deep convolutional neural networks,” in GLOBECOM 2017: 2017 IEEE Global Communications Conference. IEEE, 2017, pp. 1–6.
[20] D. Uvaydov, S. D’Oro, F. Restuccia, and T. Melodia, “Deepsense: Fast wideband spectrum sensing through real-time in-the-loop deep learning,” in IEEE INFOCOM 2021: IEEE Conference on Computer Communications. IEEE, 2021, pp. 1–10.
[21] W. Zhang, Y. Wang, X. Chen, Z. Cai, and Z. Tian, “Spectrum transformer: An attention-based wideband spectrum detector,” IEEE Transactions on Wireless Communications, vol. 23, no. 9, pp. 12343–12353, 2024.
[22] K. Tekbıyık, Ö. Akbunar, A. R. Ekti, A. Görçin, G. K. Kurt, and K. A. Qaraqe, “Spectrum sensing and signal identification with deep learning based on spectral correlation function,” IEEE Transactions on Vehicular Technology, vol. 70, no. 10, pp. 10514–10527, 2021.
[23] P. Jiang, D. Ergu, F. Liu, Y. Cai, and B. Ma, “A review of YOLO algorithm developments,” Procedia Computer Science, vol. 199, pp. 1066–1073, 2022.
[24] R. Khanam and M. Hussain, “YOLOv11: An overview of the key architectural enhancements,” arXiv preprint arXiv:2410.17725, 2024.
[25] A. Vagollari, V. Schram, W. Wicke, M. Hirschbeck, and W. Gerstacker, “Joint detection and classification of RF signals using deep learning,” in 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring). IEEE, 2021, pp. 1–7.
[26] A. Vagollari, M. Hirschbeck, and W. Gerstacker, “An end-to-end deep learning framework for wideband signal recognition,” IEEE Access, vol. 11, pp. 52899–52922, 2023.
[27] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European Conference on Computer Vision (ECCV). Springer, 2020, pp. 213–229.
[28] H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum, “DINO: DETR with improved denoising anchor boxes for end-to-end object detection,” arXiv preprint arXiv:2203.03605, 2022.
[29] M. Cao, P. Chu, P. Ma, and B. Fang, “RT-DETR-based wideband signal detection and modulation classification,” Frontiers in Computing and Intelligent Systems, 2025.
[30] P. Busch, T. Heinonen, and P. Lahti, “Heisenberg’s uncertainty principle,” Physics Reports, vol. 452, no. 6, pp. 155–176, 2007.
[31] Fudan University Space Internet Research Institute and Shanghai Radio Monitoring Station, “SpaceNet: A large-scale real-measurement benchmark dataset for low-altitude spectrum sensing,” [Online]. Available: https://www.chaspark.com/#/s/SpaceNet, 2025, accessed: 2026-02-01.
[32] M. Shao, D. Li, S. Hong, J. Qi, and H. Sun, “IQFormer: A novel transformer-based model with multi-modality fusion for automatic modulation recognition,” IEEE Transactions on Cognitive Communications and Networking, 2024.
[33] A. V. Oppenheim and R. W. Schafer, Signals and Systems, 2nd ed. Prentice Hall, 2010.
[34] Ultralytics, “YOLOv11: Next-generation object detection models,” [Online]. Available: https://github.com/ultralytics/ultralytics, 2024, YOLOv11-nano variant, accessed: 2026-02-01.
[35] A. Shaker, M. Maaz, H. Rasheed, S. Khan, M.-H. Yang, and F. S. Khan, “SwiftFormer: Efficient additive attention for transformer-based real-time mobile vision applications,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 17425–17436.
[36] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[37] 2025 World “AI+Radio” Challenge (WARC) Organizing Committee, “2025 Global ‘AI+Radio’ Challenge (WARC): Official website,” [Online]. Available: http://www.airadio2025.com/, 2025, accessed: 2026-02-01.
[38] Y. Peng, H. Li, P. Wu, Y. Zhang, X. Sun, and F. Wu, “D-FINE: Redefine regression task in DETRs as fine-grained distribution refinement,” arXiv preprint arXiv:2410.13842, 2024.
[39] I. Robinson, P. Robicheaux, M. Popov, D. Ramanan, and N. Peri, “RF-DETR: Neural architecture search for real-time detection transformers,” arXiv preprint arXiv:2511.09554, 2025.
[40] Airadio2025, “Airadio2025 leaderboard,” [Online]. Available: http://www.airadio2025.com/front/news?id=1988409955859873794, 2025, accessed: 2026-02-01.