pith. machine review for the scientific record.

arxiv: 2605.07623 · v1 · submitted 2026-05-08 · 📡 eess.SP

Recognition: unknown

AI-Empowered Low-Altitude Economy: Cooperative Sensing With Fixed Wireless Access

Chao-Kai Wen, Jiajia Guo, Jinya Zhang, Shi Jin, Xiangyi Li

Pith reviewed 2026-05-11 02:03 UTC · model grok-4.3

classification 📡 eess.SP
keywords UAV sensing · cooperative sensing · fixed wireless access · channel state information · attention mechanism · Transformer · low-altitude economy · 3GPP requirements

The pith

Fixed wireless access equipment can serve as wireless cameras to detect and locate unauthorized UAVs through AI analysis of uplink channel state information from multiple links.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes turning densely deployed FWA customer premises equipment into static sensors for the radio environment to address safety concerns from unauthorized UAVs in the growing low-altitude economy. It introduces a two-stage cooperative pipeline where neural networks first extract lightweight CSI features from base station-CPE pairs, an attention mechanism integrates them for detection while selecting key pairs, and a Transformer then fuses individual estimates and features for localization. Simulations demonstrate that this approach reduces missed detection probability to 0.63 percent and achieves a 95-percent positioning error of 6.50 meters, meeting 3GPP standards. A sympathetic reader would care because it offers wide-coverage UAV supervision at low additional cost by repurposing existing infrastructure.

Core claim

We develop an artificial intelligence-empowered two-stage cooperative sensing pipeline that exploits uplink channel state information from multiple base station-CPE pairs for UAV detection and localization. In cooperative detection, lightweight CSI features are first individually extracted by a neural network and then adaptively integrated through an attention-based scheme to declare UAV presence. The learned attention scores identify critical pairs during detection and facilitate UAV-affected pair selection for subsequent localization. For cooperative localization, a neural network initially generates individual estimates and extracts CSI features from selected pairs; these estimates together,

What carries the argument

The two-stage AI pipeline that uses an attention mechanism to adaptively integrate CSI features across base station-CPE pairs for detection and a Transformer to fuse individual estimates with features and pair indexes for localization.
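
The detection-side fusion described above can be sketched in a few lines. This is a minimal illustration, assuming the tanh-scored softmax pooling commonly used in attention-based multiple instance learning; the page does not state the exact attention form the paper uses. The names `attention_pool` and `select_pairs`, the weights `V` and `w`, the sizes, and the choice of top-k selection are all hypothetical, and the weights are random here rather than learned.

```python
import numpy as np

def attention_pool(features, V, w):
    """Pool per-pair features with softmax attention.

    features: (K, d) array, one feature vector per BS-CPE pair.
    V: (m, d) projection and w: (m,) scoring vector (hypothetical; learned in practice).
    Returns the pooled (d,) feature and the (K,) attention scores.
    """
    logits = np.tanh(features @ V.T) @ w             # one scalar score per pair
    logits = logits - logits.max()                   # subtract max for numerical stability
    scores = np.exp(logits) / np.exp(logits).sum()   # normalized so the scores sum to one
    pooled = scores @ features                       # attention-weighted sum of pair features
    return pooled, scores

def select_pairs(scores, k):
    """Indices of the k pairs with the largest attention scores, highest first."""
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
K, d, m = 6, 8, 4                                    # illustrative sizes only
features = rng.normal(size=(K, d))
V = rng.normal(size=(m, d))
w = rng.normal(size=m)

pooled, scores = attention_pool(features, V, w)
top = select_pairs(scores, k=3)
```

In a full detector, `pooled` would feed a small classifier that declares UAV presence, while `top` would name the pairs passed on to the localization stage.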

If this is right

  • Cooperative attention-based detection can reduce missed UAV detection probability to 0.63 percent.
  • Transformer fusion of multi-pair CSI data can achieve a 6.50-meter positioning error at the 95-percent confidence level.
  • The learned attention scores can automatically select critical base station-CPE pairs for efficient localization.
  • This FWA-assisted approach can satisfy 3GPP requirements for UAV supervision with existing infrastructure.
  • Dataset and code release enables further validation of the CSI feature extraction and fusion methods.
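
The two headline figures in this list are standard summary statistics. The sketch below shows one way they could be computed from labelled test data. It assumes the missed detection probability is the fraction of UAV-present samples declared absent, and the 95-percent error is the 95th percentile of the Euclidean positioning error. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def missed_detection_probability(y_true, y_pred):
    """Fraction of UAV-present samples (label 1) that were declared absent (prediction 0)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    positives = y_true.sum()
    if positives == 0:
        raise ValueError("no UAV-present samples")
    return float((y_true & ~y_pred).sum() / positives)

def positioning_error_percentile(p_true, p_est, q=95.0):
    """q-th percentile of the Euclidean positioning error, in the units of the inputs."""
    err = np.linalg.norm(np.asarray(p_true) - np.asarray(p_est), axis=1)
    return float(np.percentile(err, q))

# Toy data for illustration only.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1]
p_miss = missed_detection_probability(y_true, y_pred)   # 1 miss out of 4 positives

p_true = np.zeros((4, 3))
p_est = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [0.0, 0.0, 4.0]])
e95 = positioning_error_percentile(p_true, p_est)       # errors are 1, 2, 3 and 4
```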

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Existing 5G networks with dense FWA CPEs could be repurposed for continuous low-altitude monitoring without dedicated radar hardware.
  • The attention mechanism's pair selection might generalize to other dynamic scatterers such as vehicles or birds in the same frequency bands.
  • Public release of the CSI dataset could support benchmarking against alternative fusion architectures beyond the Transformer.

Load-bearing premise

The simulations accurately reflect real-world conditions for CSI variations due to UAVs and the performance of the attention and Transformer models in practical deployments.

What would settle it

Real-world field tests with actual UAV flights over a network of FWA CPEs that measure missed detection rate and 95-percent positioning error against the simulated values of 0.63 percent and 6.50 meters.

Figures

Figures reproduced from arXiv: 2605.07623 by Chao-Kai Wen, Jiajia Guo, Jinya Zhang, Shi Jin, Xiangyi Li.

Figure 1: Widely-deployed, fixed FWA CPEs show wireless sensing potential.
Figure 2: The LAE scenario comprises a CPU, multiple BSs/CPEs, and a UAV.
Figure 3: The two-stage cooperative UAV detection framework. In …
Figure 4: The NN architecture of the I-UDetNet for individual detection.
Figure 5: The two-stage cooperative UAV localization framework. Prior to localization, the CPU screens …
Figure 6: The NN architecture of the I-ULocNet for individual localization.
Figure 7: The NN architecture of the C-ULocNet for cooperative localization.
Figure 8: The LAE scenario modeled with OSM, Blender, and Sionna.
Figure 9: Cooperative detection performance with different numbers of BSs/CPEs.
Figure 10: Visualization of attention scores and true pair labels. In each …
Figure 11: With k = 10 and varying σ_att, the CDF of APE performance in the proposed two-stage localization framework.
    … 200 epochs. I-ULocNet and C-ULocNet use an initial learning rate of 1×10⁻³ and a batch size of 32, with training durations of 1,000 and 300 epochs, respectively. To enhance training stability and mitigate overfitting, early stopping and learning rate decay strategies are applied across all NNs. B. Pe…
Figure 12: With varying σ_att, APE comparison among proposed framework, hard fusion, and soft fusion.
    … tends to assign higher attention weights to pairs with more pronounced UAV-induced channel variations. In addition, since the attention scores are normalized to sum to one, the attention allocated to each positive pair may decrease when the number of positive pairs becomes large. C. Performance Evaluation of Cooperat…
Figure 13: With varying CSI features, APE performance in cooperative …
Figure 14: Sensing region visualizations of the samples from the testing dataset. Multiple BSs provide complementary coverage that reduces blind areas, while …
Original abstract

The rapid growth of the low-altitude economy has intensified safety concerns arising from unauthorized unmanned aerial vehicles (UAVs), positioning UAV supervision as a key use case in 3GPP. To precisely sense such UAVs with wide coverage and low cost, we leverage fixed wireless access (FWA) customer premises equipment (CPEs), static, densely deployed devices that serve as wireless cameras for the radio environment. We develop an artificial intelligence-empowered two-stage cooperative sensing pipeline that exploits uplink channel state information (CSI) from multiple base station-CPE pairs for UAV detection and localization. In cooperative detection, lightweight CSI features are first individually extracted by neural network, and then adaptively integrated through an attention-based scheme to declare UAV presence. The learned attention scores effectively identify the critical pairs during detection, while facilitating UAV-affected pair selection for subsequent localization. For cooperative localization, neural network initially generates individual estimates and extract CSI features from selected pairs. These estimates, together with features and pair indexes, are fused using a Transformer to produce a precise cooperative estimate. Simulations show that cooperative schemes significantly reduce the missed detection probability to 0.63% and realize a 95%-confidence positioning error of 6.50 m, satisfying 3GPP requirements and showing the potential of FWA-assisted cooperative sensing. Dataset and codes are available on GitHub.
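
The localization fusion step in the abstract combines three inputs per selected pair: an individual position estimate, a CSI feature vector, and the pair index. The sketch below stands in for that step with one single-head self-attention layer. It is an illustrative simplification under stated assumptions, not the paper's C-ULocNet, whose layer counts and dimensions are not given on this page. The `fuse` function, every key in `params`, and all sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(estimates, features, pair_ids, params):
    """Fuse per-pair estimates into one 3-D position with one self-attention layer.

    estimates: (k, 3) individual position estimates.
    features:  (k, d) CSI features of the selected pairs.
    pair_ids:  (k,) integer indexes of the selected pairs.
    params:    dict of hypothetical weights (random here, learned in practice).
    """
    # One token per pair: projected [estimate | feature] plus a pair-index embedding.
    tokens = np.concatenate([estimates, features], axis=1) @ params["W_in"]
    tokens = tokens + params["pair_emb"][pair_ids]
    # Single-head scaled dot-product self-attention across the selected pairs.
    q = tokens @ params["W_q"]
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))
    mixed = tokens + attn @ v                       # residual connection
    # Mean-pool over pairs and map to a cooperative 3-D position estimate.
    return mixed.mean(axis=0) @ params["W_out"]

rng = np.random.default_rng(0)
n_sel, d, D, n_pairs = 4, 8, 16, 10                 # illustrative sizes only
params = {
    "W_in": rng.normal(size=(3 + d, D)) * 0.1,
    "pair_emb": rng.normal(size=(n_pairs, D)) * 0.1,
    "W_q": rng.normal(size=(D, D)) * 0.1,
    "W_k": rng.normal(size=(D, D)) * 0.1,
    "W_v": rng.normal(size=(D, D)) * 0.1,
    "W_out": rng.normal(size=(D, 3)) * 0.1,
}
estimates = rng.normal(size=(n_sel, 3))
features = rng.normal(size=(n_sel, d))
pair_ids = np.array([0, 3, 5, 7])

position = fuse(estimates, features, pair_ids, params)
```

Adding the pair-index embedding lets the layer learn that some pairs are geometrically more informative than others, which is one plausible reason for feeding the indexes to the fusion stage.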

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an AI-empowered two-stage cooperative sensing pipeline that uses uplink CSI from multiple base station-FWA CPE pairs to detect and localize unauthorized UAVs. Lightweight neural-network features are extracted per pair and adaptively fused via attention for detection (with attention scores also selecting pairs); individual localization estimates plus features are then fused by a Transformer. Simulations report that the cooperative scheme reduces missed-detection probability to 0.63 % and achieves a 95 %-confidence positioning error of 6.50 m, satisfying 3GPP requirements.

Significance. If the reported simulation performance generalizes, the work offers a low-cost, infrastructure-reuse approach to wide-area UAV supervision that aligns with 3GPP low-altitude-economy use cases. The public release of the dataset and code on GitHub is a clear strength that supports reproducibility and independent verification.

major comments (2)
  1. [§4] §4 (Simulation results): the central performance claims (0.63 % missed detection, 6.50 m 95 %-confidence error) rest entirely on the fidelity of the generated uplink CSI under UAV perturbations. The manuscript does not specify whether the channel model incorporates UAV Doppler spread, body-induced dynamic multipath, or FWA-specific impairments such as timing-advance jitter and hardware phase noise; without these details or validation against measured data, the 3GPP-compliance conclusion cannot be regarded as substantiated.
  2. [§3.2] §3.2 (Cooperative localization): the Transformer fusion receives pair indices and attention-selected features whose quality inherits the same unvalidated channel-model assumptions. No sensitivity analysis is provided showing how the 6.50 m error degrades when realistic Doppler or phase-noise terms are added, rendering the localization gain load-bearing on an untested modeling choice.
minor comments (2)
  1. The abstract states that 'lightweight CSI features' are extracted, yet the precise feature definitions (amplitude/phase statistics, subcarrier selection, etc.) are not enumerated in the method description.
  2. All neural-network and Transformer hyperparameters (learning rates, layer counts, attention heads, etc.) should be collected in a single table to facilitate exact reproduction, given that they are listed as free parameters.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which have helped us improve the clarity and robustness of our simulation-based analysis. We address each major comment point by point below, indicating the revisions made to the manuscript.

point-by-point responses
  1. Referee: [§4] §4 (Simulation results): the central performance claims (0.63 % missed detection, 6.50 m 95 %-confidence error) rest entirely on the fidelity of the generated uplink CSI under UAV perturbations. The manuscript does not specify whether the channel model incorporates UAV Doppler spread, body-induced dynamic multipath, or FWA-specific impairments such as timing-advance jitter and hardware phase noise; without these details or validation against measured data, the 3GPP-compliance conclusion cannot be regarded as substantiated.

    Authors: We agree that the original manuscript did not provide sufficient explicit details on the channel model components used to generate the uplink CSI. In the revised version, we have added a dedicated paragraph in §4 describing the simulation setup: UAV Doppler spread is incorporated via a modified Jakes' spectrum with maximum Doppler shift computed from UAV velocity and carrier frequency; body-induced dynamic multipath is modeled by introducing time-varying scatterers with reflection coefficients drawn from UAV channel measurements in the literature; and FWA-specific impairments include timing-advance jitter (modeled as Gaussian with 3GPP-specified variance) and hardware phase noise (using a phase-locked loop model with parameters from TR 38.901). These extensions are consistent with 3GPP low-altitude economy channel studies. However, our work is simulation-based and does not include direct validation against proprietary measured CSI datasets from operational FWA deployments, which would require field trials outside the current scope. We have added an explicit limitations paragraph acknowledging this and the reliance on standardized models. revision: partial

  2. Referee: [§3.2] §3.2 (Cooperative localization): the Transformer fusion receives pair indices and attention-selected features whose quality inherits the same unvalidated channel-model assumptions. No sensitivity analysis is provided showing how the 6.50 m error degrades when realistic Doppler or phase-noise terms are added, rendering the localization gain load-bearing on an untested modeling choice.

    Authors: We acknowledge the need for sensitivity analysis to demonstrate robustness. In the revised manuscript, we have added new simulation results in §4 (with supporting discussion in §3.2) that vary the UAV Doppler spread (from 0 to 200 Hz) and phase-noise variance (across 3GPP-recommended ranges) while keeping other parameters fixed. The results show that the Transformer-based cooperative fusion maintains a positioning error below 8.5 m at the 95 % confidence level even under elevated impairments, with graceful degradation compared to the baseline single-pair estimates. The attention mechanism continues to select high-quality pairs effectively. These additional curves and tables have been inserted to substantiate the localization claims under more realistic conditions. revision: yes
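
For orientation on the Doppler values mentioned in the responses above, the textbook relation between speed, carrier frequency, and maximum Doppler shift is f_D = v · f_c / c. The sketch below evaluates it for a hypothetical 15 m/s UAV and a hypothetical 3.5 GHz carrier; neither value comes from the paper or the rebuttal. A moving reflector on a bistatic path can produce up to twice this shift, which the sketch ignores.

```python
C = 299_792_458.0  # speed of light in m/s

def max_doppler_hz(speed_mps, carrier_hz):
    """Maximum Doppler shift f_D = v * f_c / c for motion at speed v."""
    return speed_mps * carrier_hz / C

# Hypothetical values for illustration: a 15 m/s UAV and a 3.5 GHz carrier.
f_d = max_doppler_hz(15.0, 3.5e9)   # roughly 175 Hz
```

Under these assumed values the shift falls inside the 0 to 200 Hz range quoted for the sensitivity sweep.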

standing simulated objections not resolved
  • Direct validation of the generated CSI against real-world measured data from FWA CPEs in the presence of UAVs, which is not feasible within the simulation-focused scope of this work and would require extensive proprietary field measurements.

Circularity Check

0 steps flagged

No circularity: performance claims rest on independent simulations of a neural pipeline

full rationale

The paper describes an attention-based detection stage followed by Transformer fusion for localization, with all quantitative results (0.63% missed detection, 6.50 m positioning error) obtained from Monte-Carlo simulations of uplink CSI under a UAV channel model. No equation, parameter, or claim is shown to be defined in terms of itself or recovered by construction from fitted inputs; the attention scores and Transformer outputs are learned quantities whose reported accuracy is measured against held-out simulation realizations rather than being tautological. Self-citations, if present, are not load-bearing for the central performance numbers. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work introduces no new physical entities but relies on standard assumptions in wireless sensing and machine learning applications to CSI data.

free parameters (1)
  • Neural network and Transformer hyperparameters
    Trainable parameters in the feature extraction NNs, attention mechanism, and fusion Transformer are fitted to simulated CSI data.
axioms (1)
  • domain assumption: Uplink CSI from BS-CPE pairs contains sufficient information for UAV detection and localization
    The pipeline is built on this premise without deriving it from first principles.

pith-pipeline@v0.9.0 · 5549 in / 1318 out tokens · 37922 ms · 2026-05-11T02:03:56.593916+00:00 · methodology

