pith. machine review for the scientific record.

arxiv: 2605.12566 · v1 · submitted 2026-05-12 · 📡 eess.IV · cs.LG

Recognition: no theorem link

On Privacy-Preserving Image Transmission in Low-Altitude Networks: A Swin Transformer-Based Framework with Federated Learning

Dongwei Zhao, Kexin Zhang, Lixin Li, Rui Li, Wensheng Lin, Xin Zhang, Yuna Yan, Zhu Han


Pith reviewed 2026-05-14 20:45 UTC · model grok-4.3

classification 📡 eess.IV cs.LG
keywords semantic communication · federated learning · Swin Transformer · UAV image transmission · privacy preservation · low-altitude networks · PSNR improvement · bandwidth-constrained transmission

The pith

A Swin Transformer semantic communication system with federated learning improves UAV image transmission quality by at least 5.7 dB PSNR while keeping raw data private.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a semantic communication framework called STSC for transmitting images from UAVs to ground stations under tight bandwidth constraints. It uses a Swin Transformer to extract multi-scale semantic features from images and combines this with federated learning so that models train across devices without any raw image data leaving the UAVs. Simulations on the CIFAR-10 dataset show the approach delivers higher reconstructed image quality than standard DeepJSCC methods and converges more reliably. The work targets practical low-altitude applications such as logistics and inspection where both bandwidth and privacy rules are strict.

Core claim

The STSC architecture extracts multi-scale semantic features via a Swin Transformer under bandwidth limits, pairs it with federated learning for distributed training without raw data exchange, and achieves at least 5.7 dB higher PSNR on CIFAR-10 reconstructions than DeepJSCC baselines while improving convergence and generalization.
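To put the headline number in concrete terms, the following is a minimal sketch of how PSNR is computed and what a 5.7 dB gain means for mean squared error. It uses synthetic arrays of CIFAR-10 size rather than the paper's data or models; the helper names `psnr` and `mse_ratio_for_gain` and the noise level are illustrative assumptions, not anything taken from the paper.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0.0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)

# Synthetic stand-in for one 32x32 RGB image with values in [0, 1] (assumption).
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
baseline = image + rng.normal(0.0, 0.05, image.shape)  # hypothetical baseline output

# Shrinking the reconstruction error by sqrt(ratio) lowers the MSE by `ratio`,
# which raises PSNR by exactly the requested number of dB.
ratio = mse_ratio_for_gain(5.7)
improved = image + (baseline - image) / np.sqrt(ratio)

baseline_psnr = psnr(image, baseline)
improved_psnr = psnr(image, improved)
print(f"MSE ratio for 5.7 dB: {ratio:.2f}")
print(f"PSNR gain: {improved_psnr - baseline_psnr:.2f} dB")
```

Because PSNR is logarithmic in the MSE, a gain of 5.7 dB corresponds to a reconstruction MSE that is smaller by a factor of 10^0.57, roughly 3.7, at a fixed peak value.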

What carries the argument

The Swin Transformer-based Semantic Communication (STSC) architecture, which extracts multi-scale semantic features from images for bandwidth-efficient transmission and integrates federated learning to train models across UAVs without sharing raw data.
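The federated part of this machinery can be illustrated with a small sketch of FedAvg-style aggregation, in the spirit of reference [33]. This page does not state the paper's exact aggregation rule, so weighting by local dataset size is an assumption here, and the parameter names (`encoder.w`, `decoder.b`), client count, and sizes are hypothetical placeholders rather than the STSC model's real layers.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average per-client parameter dicts, weighted by local dataset size."""
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {
        key: sum(w[key] * (n / total) for w, n in zip(client_weights, client_sizes))
        for key in keys
    }

# Three hypothetical UAV clients, each holding locally trained parameters.
clients = [
    {"encoder.w": np.full((2, 2), 1.0), "decoder.b": np.array([0.0, 2.0])},
    {"encoder.w": np.full((2, 2), 3.0), "decoder.b": np.array([2.0, 2.0])},
    {"encoder.w": np.full((2, 2), 5.0), "decoder.b": np.array([4.0, 2.0])},
]
sizes = [100, 100, 200]  # number of local training images per client (assumed)

# Only parameters are exchanged; the images used to fit them stay on each client.
global_model = fedavg(clients, sizes)
print(global_model["encoder.w"])
print(global_model["decoder.b"])
```

The point of the sketch is the data flow: the server sees model parameters and dataset sizes, never raw images. Whether parameter updates themselves leak image content is a separate question, which the paper appears to probe with gradient inversion attacks.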

If this is right

  • UAV image transmissions maintain higher visual quality despite severe bandwidth limits.
  • Raw images never leave the UAV, satisfying strict privacy rules for distributed operations.
  • Dedicated on-board nodes allow flexible real-time coverage without central data aggregation.
  • The model shows faster convergence and better generalization across different transmission scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same architecture could be adapted to transmit other sensor streams such as video or LiDAR from UAVs.
  • If the privacy mechanism scales, it might support multi-UAV swarms sharing semantic updates without a central server.
  • Real-world channel fading, which simulations on CIFAR-10 may not fully capture, remains an open question for deployment.

Load-bearing premise

That performance gains measured on CIFAR-10 under simulated conditions will hold for real UAV deployments facing actual bandwidth limits, channel noise, and privacy regulations.

What would settle it

A field experiment with actual UAVs sending real images over live wireless links that shows no PSNR gain or leaks raw data would disprove the central performance and privacy claims.

Figures

Figures reproduced from arXiv: 2605.12566 by Dongwei Zhao, Kexin Zhang, Lixin Li, Rui Li, Wensheng Lin, Xin Zhang, Yuna Yan, Zhu Han.

Figure 1: The overall architecture of the multi-user federated learning semantic communication system.
Excerpt: channel, and a joint source-channel decoder. Each of these three components will be described in the following subsections, respectively. A. Joint Source-Channel Encoder. As shown in Fig. 1, when the transmitter sends an image, the joint source-channel encoder performs feature extraction. The aim is to focus on infor…
Figure 2: An illustration of the proposed semantic communication system structure.
Excerpt: diverse user data without requiring centralized data collection, thereby ensuring robust communication performance across varying channel conditions. A. Swin Transformer-based Semantic Communication Model. Semantic Model. Fig. 2 shows the semantic communication network structure based on Swin Transformer. In this paper, joint source-c…
Figure 3: PSNR performance comparison under different channel conditions.
Excerpt: the proposed method against DeepJSCC, WITT, and conventional separation-based schemes (JPEG 2000 with capacity-achieving codes and JPEG 2000 + LDPC) under three distinct channel conditions. As shown in …
Figure 4: PSNR versus Es/N0 comparison between the global model and local models. The global model trained via federated learning achieves 2-3 dB PSNR gain over local models, demonstrating improved noise resilience and generalization through model aggregation.
[Plot: MSE loss versus epoch for FL, client0, client1, and client2; panel (a) AWGN channel; remaining panels truncated.]
Figure 5: Convergence results of the training MSE loss versus communication rounds (epochs), with the global and local models based on STSC at the left and DeepJSCC at the right. Compared to the global model of DeepJSCC, the federated convergence of the STSC model achieves a significantly faster processing speed and lower loss accuracy.
Excerpt: only blocks of pixels, compared to the randomly selected original image in Fig. 6…
Figure 6: Comparison of reconstructed images obtained by different methods. The STSC model demonstrates superiority over traditional communication algorithms by visually reconstructing images with higher fidelity.
[Plot: training loss (MSE) versus communication rounds (epochs) for IID, Non-IID-Mild (1.0), and Non-IID-Severe (0.1) settings.]
Figure 8: Convergence comparison under full participation (K = 3/3) and partial participation (K = 2/3) on the Rician channel (KR = 10) at SNR = 3 dB.
Excerpt: from Gradients (DLG) method [35] as a baseline gradient inversion attack, which optimizes a randomly initialized dummy image over 300 L-BFGS iterations to match the gradient computed from the true input. We test both single-image gradients (batch size = 1, the most fa…
Figure 9: DLG gradient inversion results under the Rician channel (KR = 10, SNR = 3 dB). Top: original images. Middle: batch size = 1. Bottom: batch size = 32. All reconstructions are indistinguishable from random noise.
Table II: Inference-phase privacy evaluation under the Rician channel (K = 10, SNR = 3 dB).
  Method                            PSNR (dB)   SSIM
  Legitimate decoder (upper bound)  28.4        0.882
  Trained inversion network         22.2        0.629
  Opti…
Original abstract

The rapid development of the low-altitude economy has driven the proliferation of Unmanned Aerial Vehicle (UAV) applications, including logistics, inspection, and emergency response. However, transmitting high-volume image data from UAVs to ground stations faces significant challenges due to limited bandwidth and stringent privacy requirements. To address these issues, a Semantic Communication (SC) framework based on Federated Learning (FL) is proposed for efficient and privacy-preserving image transmission. A Swin Transformer-based Semantic Communication (STSC) architecture is designed to extract multi-scale semantic features under constrained bandwidth conditions. Dedicated communication and computing nodes are deployed on UAVs to enhance real-time coverage and flexibility. Meanwhile, an FL mechanism enables global model training across distributed devices without sharing raw data, thus preserving user privacy. Simulation experiments conducted on the CIFAR-10 dataset demonstrate that the proposed STSC framework achieves at least 5.7 dB improvement in Peak Signal-to-Noise Ratio (PSNR) compared to DeepJSCC baselines, while also showing superior convergence and generalization performance. The framework effectively integrates UAV-assisted deployment with SC and privacy protection, offering a practical solution for bandwidth-constrained image transmission in low-altitude networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Swin Transformer-based Semantic Communication (STSC) framework integrated with Federated Learning (FL) for privacy-preserving image transmission in low-altitude UAV networks. It designs multi-scale semantic feature extraction under bandwidth constraints, deploys dedicated nodes on UAVs, and uses FL to train without sharing raw data. Simulations on CIFAR-10 are reported to yield at least 5.7 dB PSNR improvement over DeepJSCC baselines together with better convergence and generalization.

Significance. If the performance claims are robustly supported, the work could advance semantic communications for bandwidth-limited, privacy-sensitive UAV applications by combining transformer-based feature extraction with distributed training. The integration of SC and FL addresses a timely problem in low-altitude networks, though the simulation-only evidence on a standard image dataset restricts immediate claims about real-world UAV viability.

major comments (2)
  1. [Simulation Experiments] Simulation Experiments section: the headline claim of ≥5.7 dB PSNR gain over DeepJSCC is presented without an experimental protocol, baseline implementation details, error bars, or statistical tests, leaving the central performance result weakly supported.
  2. [Methods] Methods / Channel Model subsection: the simulations appear to rely on static or simplified channel models (e.g., AWGN) without explicit incorporation of UAV-specific impairments such as variable path loss, Doppler shifts, or interference; this undermines applicability to the stated low-altitude network setting.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'at least 5.7 dB' should be accompanied by the precise SNR, bandwidth, and model-size conditions under which the gain is measured.
  2. [STSC Architecture] Notation: the definition of semantic feature maps and the FL aggregation rule should be stated explicitly with equation numbers to avoid ambiguity when comparing to DeepJSCC.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment point by point below, indicating the revisions we will make to improve the manuscript.

Point-by-point responses
  1. Referee: [Simulation Experiments] Simulation Experiments section: the headline claim of ≥5.7 dB PSNR gain over DeepJSCC is presented without an experimental protocol, baseline implementation details, error bars, or statistical tests, leaving the central performance result weakly supported.

    Authors: We agree that additional details are required to robustly support the central performance claim. In the revised manuscript, we will expand the Simulation Experiments section with a complete experimental protocol (including hyperparameters, training schedules, and data splits), explicit implementation details for the DeepJSCC baselines, results reported with error bars from multiple independent runs, and statistical significance tests (e.g., paired t-tests) to validate the reported PSNR gains. revision: yes

  2. Referee: [Methods] Methods / Channel Model subsection: the simulations appear to rely on static or simplified channel models (e.g., AWGN) without explicit incorporation of UAV-specific impairments such as variable path loss, Doppler shifts, or interference; this undermines applicability to the stated low-altitude network setting.

    Authors: The current work uses an AWGN model as a controlled baseline to isolate the contributions of the Swin Transformer semantic extractor and federated learning under bandwidth limits. We acknowledge that this simplification limits direct applicability to real low-altitude UAV channels. In the revision, we will augment the Channel Model subsection with a discussion of UAV-specific impairments and include additional simulation results that incorporate standard models for path loss and Doppler shift. Full modeling of dynamic interference and hardware effects is noted as future work, as it would require specialized UAV channel datasets beyond the scope of this study. revision: partial
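The channel models named in this exchange and in the figure captions (AWGN, and a Rician channel with K = 10 at SNR = 3 dB) can be sketched in a few lines. This is an illustrative simulation under stated assumptions, not the paper's implementation: it treats fading as flat, interprets the K-factor as a linear ratio rather than dB, assumes perfect channel knowledge at the receiver, and uses QPSK-like symbols in place of the encoder's output. The function names are hypothetical.

```python
import numpy as np

def awgn(symbols, snr_db, rng):
    """Add complex Gaussian noise so that signal power / noise power equals the SNR."""
    signal_power = np.mean(np.abs(symbols) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.sqrt(noise_power / 2.0) * (
        rng.standard_normal(symbols.shape) + 1j * rng.standard_normal(symbols.shape)
    )
    return symbols + noise

def rician_gain(shape, k_factor, rng):
    """Unit-average-power Rician coefficients; k_factor is a linear ratio (assumption)."""
    line_of_sight = np.sqrt(k_factor / (k_factor + 1.0))
    scatter = np.sqrt(1.0 / (2.0 * (k_factor + 1.0))) * (
        rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    )
    return line_of_sight + scatter

def rician_channel(symbols, snr_db, k_factor, rng):
    """Flat Rician fading plus AWGN, equalized with perfect channel knowledge (assumption)."""
    h = rician_gain(symbols.shape, k_factor, rng)
    return awgn(h * symbols, snr_db, rng) / h

rng = np.random.default_rng(0)
n = 200_000
# Unit-power QPSK-like symbols standing in for the transmitted semantic features.
x = (rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n)) / np.sqrt(2.0)

y_awgn = awgn(x, 3.0, rng)
measured_snr_db = 10.0 * np.log10(
    np.mean(np.abs(x) ** 2) / np.mean(np.abs(y_awgn - x) ** 2)
)
h = rician_gain(x.shape, 10.0, rng)
y_rician = rician_channel(x, 3.0, 10.0, rng)
print(f"measured AWGN SNR: {measured_snr_db:.2f} dB")
print(f"mean Rician power gain: {np.mean(np.abs(h) ** 2):.3f}")
```

Neither model captures the impairments the referee lists, such as time-varying path loss, Doppler shift, or interference; those would need additional terms or measured UAV channel traces.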

Circularity Check

0 steps flagged

No circularity: empirical simulation results are independent of framework definition

full rationale

The paper defines an STSC architecture and FL mechanism, then reports separate simulation outcomes (PSNR gains on CIFAR-10) as measured performance. No equation reduces to its own inputs by construction, no fitted parameter is relabeled as a prediction, and no self-citation chain supplies the central claim. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the STSC architecture is described as a designed system whose internal hyperparameters and convergence assumptions are not detailed.

pith-pipeline@v0.9.0 · 5537 in / 1250 out tokens · 61853 ms · 2026-05-14T20:45:25.535252+00:00 · methodology


Reference graph

Works this paper leans on

41 extracted references · 3 canonical work pages · 2 internal anchors

  [1] Lu K, Zhou Q, Li R, Zhao Z, Chen X, Wu J, et al. Rethinking modern communication from semantic coding to semantic communication. IEEE Wirel Commun 2023;30(1):158-64
  [2] Gündüz D, Qin Z, Aguerri IE, Dhillon HS, Yang Z, Yener A, et al. Beyond transmitting bits: Context, semantics, and task-oriented communications. IEEE J Sel Areas Commun 2023;41(1):5-41
  [3] Yan Y, Zhang X, Li L, Lin W, Li R, Cheng W, et al. FSSC: Federated learning of transformer neural networks for semantic image communication. GLOBECOM 2024 - 2024 IEEE Global Communications Conference; 2024 Dec 8-12; Cape Town, South Africa. 2024. p. 1659-64
  [4] Xu S, Qi Y, Qi F, et al. FLSC-CI: Federated learning and semantic communication empowered multimodal terminal collaborative inferencing framework for IoT businesses. IEEE Trans Netw Sci Eng 2026. Forthcoming
  [5] Tong H, Yang Z, Wang S, Hu Y, Saad W, Yin C. Federated learning based audio semantic communication over wireless networks. GLOBECOM 2021 - 2021 IEEE Global Communications Conference; 2021 Dec 7-11; Madrid, Spain. 2021
  [6] Bourtsoulatze E, Burth Kurka D, Gündüz D. Deep joint source-channel coding for wireless image transmission. IEEE Trans Cogn Commun Netw 2019;5(3):567-79
  [7] Zeng Y, Zhang R, Lim TJ. Wireless communications with unmanned aerial vehicles: Opportunities and challenges. IEEE Commun Mag 2016;54(5):36-42
  [8] Li M, Li L, Lin W, Han Z, Basar T. Beyond Gaussian assumptions: A general fractional HJB control framework for Lévy-driven heavy-tailed channels in 6G. IEEE Trans Wirel Commun 2026;25:7535-50
  [9] Chen Y, Cheng W, Zhang W. Reconfigurable intelligent surface equipped UAV in emergency wireless communications: A new fading-shadowing model and performance analysis. IEEE Trans Commun 2024;72(3):1821-34
  [10] Lin W, Li L, Liu Y, He Y, Liu Y. Timeliness optimization of unmanned aerial vehicle lossy communications for Internet-of-Things. Chin J Aeronaut 2023;36(6):249-55
  [11] Li L, Sun Y, Cheng Q, Wang D, Lin W, Chen W. Optimal trajectory and downlink power control for multi-type UAV aerial base stations. Chin J Aeronaut 2021;34(9):11-23
  [12] Thomas CK, Saad W. Neuro-symbolic causal reasoning meets signaling game for emergent semantic communications. IEEE Trans Wirel Commun 2024;23(5):4546-63
  [13] Liang W, Wen S, Li L, Cui J, Fang F. Distributed user pairing and effective computation offloading in aerial edge networks. Chin J Aeronaut 2024;37(4):378-90
  [14] Sarkar RR, Chakrabarty A, Rahman MZ. Low-end hand held communication devices in a post-disaster scenario. 2022 14th International Conference on Computational Intelligence and Communication Networks (CICN); 2022 Dec 16-17; Al-Khobar, Saudi Arabia. 2022. p. 595-9
  [15] Fu Y, Cheng W, Wang J, Yin L, Zhang W. Digital-analog transmission based emergency semantic communications. arXiv:2501.01616. 2025
  [16] Zheng Y, Li L, Lin W, Liang W, Du Q, Han Z. Optimal transport framework for ISAC in low-altitude networks: Joint resource allocation for cooperative communication and non-cooperative localization. IEEE Trans Commun 2026;74:1984-2000
  [17] Kountouris M, Pappas N. Semantics-empowered communication for networked intelligent systems. IEEE Commun Mag 2021;59(6):96-102
  [18] Yang W, Du H, Liew ZQ, Lim WYB, Xiong Z, Niyato D. Semantic communications for future Internet: Fundamentals, applications, and challenges. IEEE Commun Surv Tutor 2023;25(1):213-50
  [19] Yang M, Kim HS. Deep joint source-channel coding for wireless image transmission with adaptive rate control. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing; 2022 May 23-27; Singapore. 2022. p. 5193-7
  [20] Xie H, Qin Z, Tao X, Letaief KB. Task-oriented multi-user semantic communications. IEEE J Sel Areas Commun 2022;40(9):2584-97
  [21] Kurka DB, Gündüz D. DeepJSCC-f: Deep joint source-channel coding of images with feedback. IEEE J Sel Areas Inf Theory 2020;1(1):178-93
  [22] Yang K, Wang S, Dai J, Tan K, Niu K, Zhang P. WITT: A wireless image transmission transformer for semantic communications. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing; 2023 Jun 4-10; Rhodes Island, Greece. 2023. p. 1-5
  [23] Yoo H, Jung T, Dai L, Kim S, Chae CB. Demo: Real-time semantic communications with a vision transformer. ICC 2022 - IEEE International Conference on Communications Workshops; 2022 May 16-20; Seoul, South Korea. 2022. p. 1-2
  [24] Zhang K, Li L, Lin W, Yan Y, Li R, Cheng W, et al. Semantic successive refinement: A generative AI-aided semantic communication framework. IEEE Trans Cogn Commun Netw 2025;11(2):687-99
  [25] Yan Y, Li L, Zhang X, Lin W, Cheng W, Han Z. Adaptive semantic generation and NOMA-based interference-aware conveying for 6G networks. IEEE Trans Wirel Commun 2025;24(3):2404-16
  [26] Lim WYB, et al. Federated learning in mobile edge networks: A comprehensive survey. IEEE Commun Surv Tutor 2020;22(3):2031-63
  [27] Yin T, Li L, Lin W, Ni T, Liu Y, Xu H, et al. Joint client scheduling and wireless resource allocation for heterogeneous federated edge learning with non-IID data. IEEE Trans Veh Technol 2024;73(4):5742-54
  [28] Li L, Ma D, Ren H, Wang P, Lin W, Han Z. Toward energy-efficient multiple IRSs: Federated learning-based configuration optimization. IEEE Trans Green Commun Netw 2022;6(2):755-65
  [29] Shi G, Xiao Y, Li Y, Xie X. From semantic communication to semantic-aware networking: Model, architecture, and open problems. IEEE Commun Mag 2021;59(8):44-50
  [30] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems (NeurIPS); 2017 Dec 4-9; Long Beach, CA. 2017
  [31] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. 2020
  [32] Liu Z, et al. Swin Transformer: Hierarchical vision transformer using shifted windows. IEEE/CVF International Conference on Computer Vision (ICCV); 2021 Oct 11-17; Montreal, QC, Canada. 2021
  [33] McMahan HB, Moore E, Ramage D, Hampson S, Arcas BAy. Communication-efficient learning of deep networks from decentralized data. International Conference on Artificial Intelligence and Statistics (AISTATS); 2017 Apr 20-22; Fort Lauderdale, FL. 2017
  [34] Mehta S, Rastegari M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. International Conference on Learning Representations (ICLR); 2022 Apr 25-
  [35] Zhu L, Liu Z, Han S. Deep leakage from gradients. Advances in Neural Information Processing Systems (NeurIPS); 2019 Dec 8-14; Vancouver, Canada. 2019
  [36] Hatamizadeh A, Yin H, Roth H, Li W, Kautz J, Xu D, et al. GradViT: Gradient inversion of vision transformers. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18-24; New Orleans, LA. 2022. p. 10011-20
  [37] Chen Y, Guo Z, Liang Y. The model inversion eavesdropping attack in semantic communication systems. GLOBECOM 2023 - 2023 IEEE Global Communications Conference; 2023 Dec 4-8; Kuala Lumpur, Malaysia. 2023. p. 1-6
  [38] Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, Talwar K, et al. Deep learning with differential privacy. Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS); 2016 Oct 24-28; Vienna, Austria. 2016. p. 308-18
  [39] Bonawitz K, Ivanov V, Kreuter B, Marcedone A, McMahan HB, Patel S, et al. Practical secure aggregation for privacy-preserving machine learning. Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS); 2017 Oct 30 - Nov 3; Dallas, TX. 2017. p. 1175-91
  [40] Krizhevsky A. Learning multiple layers of features from tiny images. Toronto: University of Toronto; 2009
  [41] Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980. 2014