arxiv: 2602.14191 · v2 · pith:7JEYI2EXnew · submitted 2026-02-15 · 📡 eess.SP

Robust SAC-Enabled UAV-RIS Assisted Secure MISO Systems With Untrusted EH Receivers

Pith reviewed 2026-05-21 12:47 UTC · model grok-4.3

classification 📡 eess.SP
keywords UAV-RIS · secure communications · soft actor-critic · secrecy energy efficiency · imperfect CSI · untrusted receivers · MISO systems · reinforcement learning

The pith

Soft actor-critic optimization maximizes worst-case secrecy energy efficiency in UAV-RIS secure MISO systems with imperfect CSI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses secure downlink transmission in UAV-assisted RIS-enabled multiuser MISO systems with untrusted energy-harvesting receivers under imperfect channel state information. The objective is to maximize the worst-case secrecy energy efficiency by jointly optimizing UAV location, transmit power allocation, and discrete RIS phase shifts. Two approaches are proposed: a block coordinate descent with successive convex approximation as a benchmark, and a tailored soft actor-critic framework for the general problem. Simulation results demonstrate that the SAC method outperforms conventional optimization and other DRL benchmarks such as DDPG and TD3 while showing robustness to CSI uncertainty.

Core claim

The authors propose a soft actor-critic reinforcement learning method for the highly non-convex worst-case secrecy energy efficiency maximization problem in a secure UAV-RIS assisted multiuser MISO system with untrusted energy-harvesting receivers. The UAV location, power allocation, and discrete RIS phase shifts are jointly designed under imperfect CSI. In simulations, the method consistently outperforms BCD-SCA and DRL alternatives and its performance remains stable.

What carries the argument

A tailored soft actor-critic (SAC) framework that learns a stochastic policy for joint continuous and discrete action selection to maximize the objective while handling CSI uncertainty.
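To make the idea of joint continuous and discrete action selection concrete, the following is a minimal sketch of one common way to decode a squashed actor output into physical variables. It is not the authors' implementation: the function name `decode_action`, the action layout, the square deployment region, and the uniform phase quantizer are all assumptions made for illustration.

```python
import numpy as np

def decode_action(raw, n_users, n_elements, bits, p_max, area):
    """Map an actor output in [-1, 1] to (UAV position, powers, RIS phases).

    Hypothetical layout: 2 position entries, n_users power entries,
    n_elements phase entries. All names here are illustrative assumptions.
    """
    raw = np.asarray(raw, dtype=float)
    if raw.shape != (2 + n_users + n_elements,):
        raise ValueError("unexpected action dimension")
    unit = (np.clip(raw, -1.0, 1.0) + 1.0) / 2.0  # rescale to [0, 1]

    # Continuous part 1: horizontal UAV position in a square of side `area`.
    position = unit[:2] * area

    # Continuous part 2: per-user powers, scaled down only if the requested
    # shares exceed the budget, so the total never exceeds p_max.
    shares = unit[2:2 + n_users]
    power = p_max * shares / max(1.0, shares.sum())

    # Discrete part: quantize each entry to one of 2**bits uniform phases.
    levels = 2 ** bits
    index = np.minimum((unit[2 + n_users:] * levels).astype(int), levels - 1)
    phases = 2.0 * np.pi * index / levels
    return position, power, phases
```

Quantizing inside the environment keeps the policy itself continuous, which is one way a standard SAC actor can be reused when some design variables are discrete; other designs, such as a separate categorical head, are equally plausible.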

If this is right

  • The SAC policy enables faster decision making for UAV-RIS configuration compared to iterative optimization methods.
  • Higher secrecy energy efficiency can be achieved in practical scenarios with CSI errors.
  • The system maintains performance stability when varying the number of users or RIS elements.
  • Robustness allows deployment in environments with uncertain channel conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This RL-based design could be extended to scenarios with mobile UAVs rather than fixed hovering positions.
  • The framework could potentially be integrated with other emerging technologies, such as terahertz communications, for enhanced security.
  • Future work might explore transfer learning to adapt the policy to new system configurations quickly.

Load-bearing premise

The premise is that the simulation scenarios and training procedures used for the SAC agent are representative of practical performance, and that the learned policy generalizes beyond the specific training distributions without overfitting to the chosen channel models or uncertainty bounds.

What would settle it

Deployment of the trained SAC policy in a physical testbed with real UAV and RIS hardware under actual wireless channel conditions with varying CSI uncertainty; superior performance over BCD-SCA in terms of measured secrecy energy efficiency would support the claim, while inferior performance would falsify it.

Figures

Figures reproduced from arXiv: 2602.14191 by Duy H. N. Nguyen, Hamid Reza Hashempour, Hien Quoc Ngo, Le-Nam Tran.

Figure 1: System model of the UAV-assisted RIS-aided secure downlink
Figure 2: SAC framework for joint power allocation, RIS phase optimization, …
Figure 3: Convergence behavior under the ideal setting at …
Figure 4: SEE versus transmit power for SCA, DDPG, and the proposed SAC
Figure 6: Convergence comparison under ICSI with discrete RIS phases (…
Original abstract

Secure downlink transmission in UAV-assisted reconfigurable intelligent surface (RIS)-enabled multiuser MISO systems is challenging due to imperfect channel state information (CSI), untrusted energy-harvesting receivers (UEHRs), and the strong coupling among UAV deployment, transmit power control, and RIS configuration. In this paper, we study a secure UAV-assisted RIS-enabled multiuser MISO system with UEHRs, where a hovering UAV-mounted RIS is jointly optimized in terms of its location, transmit power allocation, and discrete RIS phase shifts. The objective is to maximize the worst-case secrecy energy efficiency (WCSEE) under imperfect CSI and practical discrete phase-shift constraints. The resulting problem is highly nonconvex due to the fractional objective, coupled design variables, discrete phase shifts, and CSI uncertainty. To address these challenges, we propose two complementary approaches. First, a block coordinate descent (BCD) framework combined with successive convex approximation (SCA) is developed to solve a secrecy energy efficiency (SEE) formulation, serving as a structured model-based benchmark. Second, for the more general WCSEE problem, we propose a tailored soft actor-critic (SAC) framework that captures the coupling among variables and avoids repeated iterative optimization. Simulation results show that the proposed SAC method consistently outperforms conventional optimization and deep reinforcement learning (DRL)-based benchmarks, including deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3), while maintaining robustness to CSI uncertainty and stable performance across system configurations.
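For orientation, a generic form of the objective described in the abstract can be written as follows. This is a sketch under assumed notation, not the paper's own formulation: $\mathbf{q}$ (UAV location), $\mathbf{p}$ (power allocation), $\boldsymbol{\theta}$ (RIS phases), $\Delta$ (CSI error), $\mathcal{E}$ (bounded uncertainty set), and $B$ (phase resolution in bits) are all symbols introduced here for illustration.

```latex
\begin{align}
\mathrm{SEE}(\mathbf{q},\mathbf{p},\boldsymbol{\theta})
  &= \frac{R_{\mathrm{sec}}(\mathbf{q},\mathbf{p},\boldsymbol{\theta})}
          {P_{\mathrm{tot}}(\mathbf{p})}, \\
\mathrm{WCSEE}(\mathbf{q},\mathbf{p},\boldsymbol{\theta})
  &= \min_{\Delta \in \mathcal{E}}
     \frac{R_{\mathrm{sec}}(\mathbf{q},\mathbf{p},\boldsymbol{\theta};\Delta)}
          {P_{\mathrm{tot}}(\mathbf{p})}, \\
\max_{\mathbf{q},\,\mathbf{p},\,\boldsymbol{\theta}} \;
  \mathrm{WCSEE}(\mathbf{q},\mathbf{p},\boldsymbol{\theta})
  \quad &\text{s.t.} \quad
  \theta_n \in \Bigl\{0, \tfrac{2\pi}{2^{B}}, \dots, \tfrac{2\pi(2^{B}-1)}{2^{B}}\Bigr\}
  \;\; \forall n .
\end{align}
```

The ratio makes the objective fractional, the inner minimization carries the CSI uncertainty, and the finite phase set makes the feasible region discrete, which together match the sources of nonconvexity the abstract lists.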

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript studies secure downlink transmission in a UAV-mounted RIS assisted multiuser MISO system with untrusted energy-harvesting receivers under imperfect CSI. It formulates a worst-case secrecy energy efficiency (WCSEE) maximization problem jointly optimizing UAV location, power allocation, and discrete RIS phase shifts. Two methods are proposed: a BCD-SCA benchmark for a related SEE problem and a tailored soft actor-critic (SAC) reinforcement learning framework for the full WCSEE problem. Simulations claim that SAC consistently outperforms DDPG, TD3, and conventional optimization baselines while remaining robust to CSI uncertainty.

Significance. If the performance claims hold under proper statistical validation, the work offers a scalable RL alternative to iterative convex optimization for non-convex, mixed discrete-continuous problems with uncertainty in UAV-RIS secure communications, which is relevant for practical 6G physical-layer security designs.

major comments (3)
  1. [§V] §V (Simulation Results): The reported outperformance of SAC over BCD+SCA, DDPG, and TD3 is presented without standard deviations, confidence intervals, or results aggregated over multiple independent training seeds. This omission prevents assessment of whether the gains are statistically reliable or sensitive to random initialization.
  2. [§V] §V (Simulation Results): All evaluated channel realizations and uncertainty bounds match the training distribution (Rician fading with fixed error variance and discrete phases from the same set). No distribution-shift experiments or out-of-sample CSI tests are included, weakening the robustness-to-CSI-uncertainty claim.
  3. [§IV] §IV (SAC Formulation): The state-action-reward design for the WCSEE objective is described at a high level; it is unclear how the worst-case secrecy rate under bounded CSI errors is encoded in the reward without introducing excessive conservatism or requiring additional inner optimization loops.
minor comments (2)
  1. [§II] Notation for the uncertainty set and the discrete phase-shift constraint should be introduced earlier and used consistently in the problem formulation.
  2. [§V] Figure captions for the convergence and performance plots would benefit from explicit mention of the number of Monte-Carlo trials and the exact parameter settings used.
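
The statistical reporting requested in major comment 1 can be sketched as below. This is an illustrative helper, not code from the paper: `summarize_seeds` and its `critical` parameter are hypothetical names, and the default of 1.96 is the normal-approximation quantile. For a small number of seeds, a Student-t quantile (about 2.262 for 10 seeds) gives a wider and more appropriate interval.

```python
import numpy as np

def summarize_seeds(final_scores, critical=1.96):
    """Aggregate one final metric per training seed.

    Returns the sample mean, the sample standard deviation (ddof=1),
    and a two-sided confidence interval for the mean.
    `critical` is the quantile multiplier (assumed 1.96 by default).
    """
    scores = np.asarray(final_scores, dtype=float)
    if scores.ndim != 1 or scores.size < 2:
        raise ValueError("need a 1-D array with at least two seeds")
    mean = scores.mean()
    std = scores.std(ddof=1)  # unbiased sample standard deviation
    half_width = critical * std / np.sqrt(scores.size)
    return {
        "mean": mean,
        "std": std,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }
```

Applied per method and per operating point, the returned interval bounds are what would be drawn as error bars on the performance curves.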

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate to strengthen the presentation and claims.

  1. Referee: [§V] §V (Simulation Results): The reported outperformance of SAC over BCD+SCA, DDPG, and TD3 is presented without standard deviations, confidence intervals, or results aggregated over multiple independent training seeds. This omission prevents assessment of whether the gains are statistically reliable or sensitive to random initialization.

    Authors: We agree that the current results would benefit from explicit statistical validation. In the revised manuscript, we will report performance metrics averaged over at least 10 independent training seeds with different random initializations, including standard deviations and 95% confidence intervals in the figures and tables of Section V. This will allow readers to evaluate the reliability of the observed gains. revision: yes

  2. Referee: [§V] §V (Simulation Results): All evaluated channel realizations and uncertainty bounds match the training distribution (Rician fading with fixed error variance and discrete phases from the same set). No distribution-shift experiments or out-of-sample CSI tests are included, weakening the robustness-to-CSI-uncertainty claim.

    Authors: We acknowledge that the primary simulation setup uses channel realizations drawn from the same distribution employed during training. To better substantiate the robustness claim, the revised version will incorporate additional experiments with distribution shifts, such as varying Rician K-factors, increased CSI error bounds beyond the training range, and altered phase-shift quantization levels. These will be presented in an extended subsection of Section V. revision: yes

  3. Referee: [§IV] §IV (SAC Formulation): The state-action-reward design for the WCSEE objective is described at a high level; it is unclear how the worst-case secrecy rate under bounded CSI errors is encoded in the reward without introducing excessive conservatism or requiring additional inner optimization loops.

    Authors: We appreciate this request for clarification. In the revised manuscript, we will expand Section IV with a detailed description of the reward function. The worst-case secrecy rate is incorporated by evaluating a conservative lower bound on the secrecy rate using the bounded CSI error model (worst-case legitimate channel gain and best-case eavesdropper gains within the uncertainty set) directly in the reward computation. This is achieved via closed-form approximations derived from the uncertainty bounds, avoiding nested optimization loops while controlling conservatism through a tunable robustness parameter. The explicit reward expression and design rationale will be added. revision: yes
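
The kind of closed-form conservative bound described in the third response can be illustrated with a deliberately simplified sketch. It is not the authors' reward: it assumes a single legitimate user and a single eavesdropper, no multiuser interference, a norm-bounded error model, and hypothetical names (`worst_case_gain_bounds`, `robust_see_reward`, `rho`, `p_circuit`). The bounds follow from the triangle and Cauchy-Schwarz inequalities: if the error satisfies ‖Δ‖ ≤ ε, then |(ĥ + Δ)ᴴw| lies between |ĥᴴw| − ε‖w‖ and |ĥᴴw| + ε‖w‖.

```python
import numpy as np

def worst_case_gain_bounds(h_hat, w, eps):
    """Lower and upper bounds on |(h_hat + d)^H w|^2 for all ||d|| <= eps."""
    amp = abs(np.vdot(h_hat, w))      # |h_hat^H w|; vdot conjugates h_hat
    slack = eps * np.linalg.norm(w)   # largest possible shift in amplitude
    return max(amp - slack, 0.0) ** 2, (amp + slack) ** 2

def robust_see_reward(h_hat, g_hat, w, eps, noise, p_circuit, rho=1.0):
    """Conservative secrecy-energy-efficiency reward (illustrative only).

    h_hat: estimated legitimate channel, g_hat: estimated eavesdropper
    channel, rho in [0, 1]: assumed knob that shrinks the error radius.
    """
    radius = rho * eps
    legit_low, _ = worst_case_gain_bounds(h_hat, w, radius)
    _, eve_high = worst_case_gain_bounds(g_hat, w, radius)
    rate_gap = np.log2(1.0 + legit_low / noise) - np.log2(1.0 + eve_high / noise)
    secrecy = max(rate_gap, 0.0)      # secrecy rate cannot be negative
    total_power = np.linalg.norm(w) ** 2 + p_circuit
    return secrecy / total_power
```

Because both bounds are closed-form, the reward needs no inner optimization loop per step; setting `rho` below 1 trades guaranteed robustness for a less pessimistic reward signal.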

Circularity Check

0 steps flagged

No circularity: claims rest on simulation comparisons without reduction to fitted inputs or self-citations

The paper presents a BCD+SCA benchmark and a SAC RL framework for WCSEE maximization under CSI uncertainty and discrete phases. The central results are empirical outperformance in simulations against DDPG, TD3, and conventional methods. No equations or sections reduce a claimed prediction to a fitted parameter by construction, nor do they rely on load-bearing self-citations or imported uniqueness theorems. The derivation chain for the optimization problem and the RL policy is self-contained against external benchmarks, with performance evaluated on standard Rician fading and uncertainty models.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; all modeling assumptions remain implicit.

pith-pipeline@v0.9.0 · 5816 in / 1003 out tokens · 28579 ms · 2026-05-21T12:47:06.105136+00:00 · methodology
