pith. sign in

arxiv: 2605.20037 · v1 · pith:HZQ6CSRUnew · submitted 2026-05-19 · 💻 cs.LG · cs.AI

When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System

Pith reviewed 2026-05-20 06:41 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords reward poisoningsoft actor-criticreconfigurable intelligent surfacescognitive radio networkdeep reinforcement learningadaptive attackswireless control systems
0
0 comments X

The pith

An adaptive reward poisoning attack targeting critic disagreement in SAC substantially diminishes RIS performance gains in wireless control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Disagreement-Guided Reward Poisoning as an adaptive attack on a Soft Actor-Critic agent in a cognitive radio network assisted by reconfigurable intelligent surfaces. The agent optimizes long-term secondary user rates by jointly tuning transmission power and RIS phase shifts. DGRP corrupts rewards specifically when the dual critics disagree strongly, which occurs in high-uncertainty states, thereby distorting value estimates and steering the policy to suboptimal actions. If the claim holds, disagreement between critics creates a practical lever for undermining learning-based wireless optimization that relies on RIS hardware. A sympathetic reader would care because it reveals how internal uncertainty signals in actor-critic methods can be exploited to degrade transmission quality despite advanced physical-layer assistance.

Core claim

DGRP corrupts rewards particularly when the SAC dual critics exhibit substantial disagreement, especially in high-leverage high-uncertainty states. This results in distorted value estimations and guides the policy towards suboptimal actions for optimizing SU transmitter power and RIS phase shifts, diminishing the performance improvements from RIS and degrading transmission quality.

What carries the argument

Disagreement-Guided Reward Poisoning (DGRP) that selects states based on high critic disagreement for targeted reward corruption.

If this is right

  • DGRP substantially diminishes the performance improvements typically provided by RIS.
  • The attack degrades transmission quality by corrupting rewards in uncertain states.
  • DGRP consistently causes greater damage than periodic-timing and exploration-triggered baselines.
  • Key attack parameters affect the learning process and overall system performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Monitoring critic disagreement levels could serve as a detection signal for such attacks in DRL-based wireless systems.
  • The disagreement-targeting approach may extend to other actor-critic algorithms used in communication optimization.
  • Robustness evaluations of DRL in RIS-assisted networks should incorporate disagreement-aware threat models as standard.

Load-bearing premise

The attacker can observe or infer the level of disagreement between the SAC dual critics to identify and target high-uncertainty states.

What would settle it

Measure whether reward corruption still produces comparable degradation in RIS-assisted rates and transmission quality when the attacker lacks any access to or estimate of critic disagreement signals.

Figures

Figures reproduced from arXiv: 2605.20037 by Deemah H. Tashman, Soumaya Cherkaoui.

Figure 1
Figure 1. Figure 1: The system model. The SU-Tx must restrict its transmit power to ensure that the interference experienced at the PU-Rx remains within the established threshold. Hence, the constraint is expressed as [31]–[34] Ps ≤ min{Pm, I gp }, (4) where Pm denotes the maximum power permissible for the SU-Tx, I represents the maximum interference tolerable by the PU-Rx, and gp signifies the channel power gain for the SU-T… view at source ↗
Figure 2
Figure 2. Figure 2: Let r true t denotes the environmental (clean) reward at step t. After the attacker injects small, targeted corrup￾tions to the reward, the reward value that is actually utilized for learning is expressed as r train t = r true t − δ ut, δ > 0, (13) where ut ∈ {0, 1} indicates whether an attack is applied and δ is a bounded magnitude. To mount the attack, we assume the adversary has (gray/white-box) access … view at source ↗
Figure 2
Figure 2. Figure 2: The attack model. We employ the critic disagreement notion since a large gap indicates substantial epistemic uncertainty and gradi￾ent sensitivity in SAC. Altering rewards throughout these periods has a significant impact while remaining sparse and hard to detect. We keep a fixed-length rolling buffer (Gt) of the last w disagreement values to adapt to the agent’s learning dynamics and the current environme… view at source ↗
Figure 5
Figure 5. Figure 5: A comparison between No Attack, Exploration [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
Figure 4
Figure 4. Figure 4: A comparison between No Attack, Periodic [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
read the original abstract

Reward-poisoning attacks present a significant risk to learning-based wireless control systems. Given this, we propose a Disagreement-Guided Reward Poisoning (DGRP) adaptive attack on a Soft Actor-Critic (SAC) agent. In a Cognitive Radio Network (CRN) environment assisted by Reconfigurable Intelligent Surfaces (RIS), the SAC agent is tasked with maximizing the long-term secondary users' (SUs) rate by simultaneously optimizing the transmission power of the SU transmitter and the RIS phase shifts. DGRP corrupts rewards, particularly when the SAC dual critics exhibit substantial disagreement-especially in high-leverage, high-uncertainty states-resulting in distorted value estimations and guiding the policy towards suboptimal actions. Our findings demonstrate that DGRP substantially diminishes the performance improvements typically provided by RIS and degrades transmission quality. We further investigate key attack parameters and determine their impact on learning. In comparison to periodic-timing and exploration-triggered baselines, DGRP consistently causes greater damage, highlighting the necessity of considering disagreement-aware threats when evaluating the robustness of Deep Reinforcement Learning (DRL) in RIS-assisted networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Disagreement-Guided Reward Poisoning (DGRP), an adaptive attack on a Soft Actor-Critic (SAC) agent operating in a Reconfigurable Intelligent Surface (RIS)-assisted Cognitive Radio Network (CRN). The attack selectively corrupts rewards in states where the dual critics disagree substantially, distorting value estimates and guiding the policy toward suboptimal power and phase-shift choices that reduce the rate gains normally provided by RIS. The work compares DGRP to periodic-timing and exploration-triggered baselines, claims superior damage, and examines the sensitivity of attack performance to key parameters such as poisoning magnitude and disagreement threshold.

Significance. If the central claims are substantiated with quantitative evidence and a validated threat model, the paper would usefully highlight how critic disagreement can be exploited for more effective reward-poisoning attacks in DRL-based wireless control. This could motivate disagreement-aware defenses for RIS-assisted resource allocation. The explicit comparison to two non-adaptive baselines and the parameter-impact study are constructive elements; however, the current absence of numerical metrics, error bars, and feasibility analysis for the attacker’s access to critic outputs substantially weakens the immediate contribution.

major comments (2)
  1. Threat Model section: The central claim that DGRP can selectively target high-disagreement states presupposes that an external attacker can observe or accurately infer the outputs (or difference) of the two SAC critics. No ablation, surrogate-model analysis, or threat-model justification is provided to show that this inference is feasible under a realistic black-box deployment of the learned policy; without such evidence the reported performance gap versus the baselines cannot be attributed to the disagreement-guided mechanism.
  2. Evaluation / Results section: The abstract asserts that DGRP “substantially diminishes the performance improvements typically provided by RIS” and “consistently causes greater damage” than the baselines, yet the manuscript supplies no quantitative metrics (e.g., rate degradation percentages, cumulative reward curves with error bars, or statistical significance tests). This omission leaves the magnitude and reliability of the claimed superiority unverifiable.
minor comments (2)
  1. Abstract: Adding at least one concrete numerical result (e.g., “DGRP reduces average SU rate by X % relative to the unattacked RIS baseline”) would strengthen the summary of findings.
  2. Notation and terminology: Ensure that all acronyms (SAC, RIS, CRN, DGRP, SU) are defined at first use and that the distinction between the two critics is made explicit when describing the disagreement signal.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which have helped us improve the clarity and rigor of our manuscript. We address each major comment below and have revised the paper to incorporate additional justification and quantitative evidence.

read point-by-point responses
  1. Referee: Threat Model section: The central claim that DGRP can selectively target high-disagreement states presupposes that an external attacker can observe or accurately infer the outputs (or difference) of the two SAC critics. No ablation, surrogate-model analysis, or threat-model justification is provided to show that this inference is feasible under a realistic black-box deployment of the learned policy; without such evidence the reported performance gap versus the baselines cannot be attributed to the disagreement-guided mechanism.

    Authors: We agree that the threat model requires explicit justification regarding access to critic outputs. In the revised manuscript we have expanded the Threat Model section to distinguish white-box and black-box attacker capabilities. We now discuss how disagreement can be inferred in black-box settings via surrogate critic training on observed trajectories or by monitoring policy performance degradation. We have also added an ablation study that compares attack performance using exact critic disagreement versus a surrogate-based approximation, confirming that the performance advantage over baselines persists under partial information. revision: yes

  2. Referee: Evaluation / Results section: The abstract asserts that DGRP “substantially diminishes the performance improvements typically provided by RIS” and “consistently causes greater damage” than the baselines, yet the manuscript supplies no quantitative metrics (e.g., rate degradation percentages, cumulative reward curves with error bars, or statistical significance tests). This omission leaves the magnitude and reliability of the claimed superiority unverifiable.

    Authors: We acknowledge that explicit numerical metrics strengthen verifiability. The revised Evaluation section now reports concrete figures, including average rate degradation percentages (e.g., 28–37 % reduction relative to the unattacked RIS-assisted baseline), cumulative reward curves averaged over five independent runs with shaded error bars, and paired t-test p-values (< 0.01) confirming that DGRP produces statistically greater damage than the periodic-timing and exploration-triggered baselines. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical attack evaluation stands independently of definitions or self-citations

full rationale

The manuscript defines DGRP as an adaptive reward-poisoning strategy that targets states of high critic disagreement in an SAC agent controlling power and RIS phases. Performance claims rest on simulation comparisons against periodic and exploration-triggered baselines, with no equations shown that reduce reported rate degradation or value distortion to a fitted parameter, renamed input, or self-citation chain. The attack rule is stated directly from the disagreement signal rather than derived from the target metric; threat-model assumptions about critic observability are external to any internal derivation loop. This matches the default expectation of a self-contained empirical study whose central results do not collapse by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard SAC dual-critic structure and the ability to target uncertain states; one tunable attack parameter is implied but not quantified in the abstract.

free parameters (1)
  • poisoning magnitude or disagreement threshold
    The abstract implies the attack strength is chosen to achieve reported degradation but does not specify how it is set.
axioms (1)
  • domain assumption SAC dual critics produce disagreement that reliably indicates high-uncertainty states suitable for targeted poisoning.
    Invoked when describing how DGRP selects states to corrupt rewards.

pith-pipeline@v0.9.0 · 5725 in / 1250 out tokens · 75342 ms · 2026-05-20T06:41:26.382998+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    An Overview and Future Directions on Physical-Layer Security for Cognitive Radio Networks,

    D. H. Tashman et al. , “An Overview and Future Directions on Physical-Layer Security for Cognitive Radio Networks,” IEEE Network, vol. 35, no. 3, pp. 205–211, 2021

  2. [2]

    Advances in Machine Learning-Driven Cognitive Radio for Wireless Networks: A Survey,

    N. A. Khalek et al. , “Advances in Machine Learning-Driven Cognitive Radio for Wireless Networks: A Survey,” IEEE Com- munications Surveys & Tutorials , vol. 26, no. 2, pp. 1201–1237, 2024

  3. [3]

    Towards Improving the Security of Cognitive Radio Networks-Based Energy Harvesting,

    D. H. Tashman et al. , “Towards Improving the Security of Cognitive Radio Networks-Based Energy Harvesting,” in ICC 2022 - IEEE International Conference on Communications , 2022, pp. 3436–3441

  4. [4]

    Secrecy Analysis for Energy Harvesting- Enabled Cognitive Radio Networks in Cascaded Fading Chan- nels,

    D. H. Tashman et al. , “Secrecy Analysis for Energy Harvesting- Enabled Cognitive Radio Networks in Cascaded Fading Chan- nels,” in ICC 2021 - IEEE International Conference on Commu- nications, 2021, pp. 1–6

  5. [5]

    Securing Cognitive Radio Networks via Relay and Jammer-Based Energy Harvesting on Cascaded Channels,

    D. H. Tashman et al. , “Securing Cognitive Radio Networks via Relay and Jammer-Based Energy Harvesting on Cascaded Channels,” in ICC 2023 - IEEE International Conference on Communications, 2023, pp. 3246–3251

  6. [6]

    Overlay Cognitive Radio Networks Enabled Energy Harvesting With Random AF Relays,

    D. H. Tashman et al. , “Overlay Cognitive Radio Networks Enabled Energy Harvesting With Random AF Relays,” IEEE Access, vol. 10, pp. 113 035–113 045, 2022

  7. [7]

    A Survey on Model-Based, Heuristic, and Ma- chine Learning Optimization Approaches in RIS-Aided Wirel ess Networks,

    H. Zhou et al. , “A Survey on Model-Based, Heuristic, and Ma- chine Learning Optimization Approaches in RIS-Aided Wirel ess Networks,” IEEE Communications Surveys & Tutorials , vol. 26, no. 2, pp. 781–823, 2024

  8. [8]

    Maximizing Reliability in Overlay Radio Networks With Time Switching and Power Splitting Energy Harvesting,

    D. H. Tashman et al. , “Maximizing Reliability in Overlay Radio Networks With Time Switching and Power Splitting Energy Harvesting,” IEEE Transactions on Cognitive Communications and Networking , vol. 10, no. 4, pp. 1307–1316, 2024

  9. [9]

    Green machine learning for Internet-of- Things: Current solutions and future challenges,

    H. Moudoud et al. , “Green machine learning for Internet-of- Things: Current solutions and future challenges,” in Green Ma- chine Learning Protocols for Future Communication Network s. CRC Press, 2023, pp. 161–175

  10. [10]

    Hybrid Hierarchical DRL Enabled Resource Al- location for Secure Transmission in Multi-IRS-Assisted Se nsing- Enhanced Spectrum Sharing Networks,

    L. Wang et al., “Hybrid Hierarchical DRL Enabled Resource Al- location for Secure Transmission in Multi-IRS-Assisted Se nsing- Enhanced Spectrum Sharing Networks,” IEEE Transactions on Wireless Communications, vol. 23, no. 6, pp. 6330–6346, 2024

  11. [11]

    Reconfigurable Intelligent Surface for Physical Layer Security in 6G-IoT: Designs, Issues, and Advances,

    W. Khalid et al., “Reconfigurable Intelligent Surface for Physical Layer Security in 6G-IoT: Designs, Issues, and Advances,” IEEE Internet of Things Journal , vol. 11, no. 2, pp. 3599–3613, 2024

  12. [12]

    Dynamic Synergy: Leveraging RIS and Reinforcement Learning for Secure, Adaptive Underlay Cogn i- tive Radio Networks,

    D. H. Tashman et al. , “Dynamic Synergy: Leveraging RIS and Reinforcement Learning for Secure, Adaptive Underlay Cogn i- tive Radio Networks,” in 2025 Global Information Infrastructure and Networking Symposium (GIIS) , 2025, pp. 1–6

  13. [13]

    ML-Enabled Open RAN: A Comprehensive Survey of Architectures, Challenges, and Opportunities,

    M. C. Kirana et al., “ML-Enabled Open RAN: A Comprehensive Survey of Architectures, Challenges, and Opportunities,” IEEE Communications Surveys & Tutorials , pp. 1–1, 2026

  14. [14]

    Communication and Computation O-RAN Re- source Slicing for URLLC Services Using Deep Reinforcement Learning,

    A. Filali et al. , “Communication and Computation O-RAN Re- source Slicing for URLLC Services Using Deep Reinforcement Learning,” IEEE Communications Standards Magazine , vol. 7, no. 1, pp. 66–73, 2023

  15. [15]

    Mean-Field Game and Reinforcement Learning MEC Resource Provisioning for SFC,

    A. Abouaomar et al. , “Mean-Field Game and Reinforcement Learning MEC Resource Provisioning for SFC,” in 2021 IEEE Global Communications Conference (GLOBECOM) , 2021, pp. 1–6

  16. [16]

    Deep Deterministic Policy Gradient to Minimize the Age of Information in Cellular V2X Communications,

    Z. Mlika et al., “Deep Deterministic Policy Gradient to Minimize the Age of Information in Cellular V2X Communications,” IEEE Trans. Intell. Transp. Syst. , vol. 23, no. 12, pp. 23 597–23 612, 2022

  17. [17]

    Securing next-generation networks against eavesdroppers: Fl-enabled drl approach,

    D. H. Tashman et al., “Securing next-generation networks against eavesdroppers: Fl-enabled drl approach,” in 2024 International Wireless Communications and Mobile Computing (IWCMC) , 2024, pp. 1643–1648

  18. [18]

    Federated Deep Reinforcement Learning for Open RAN Slicing in 6G Networks,

    A. Abouaomar et al. , “Federated Deep Reinforcement Learning for Open RAN Slicing in 6G Networks,” IEEE Communications Magazine, vol. 61, no. 2, pp. 126–132, 2023

  19. [19]

    Empowering Security and Trust in 5G and Beyond: A Deep Reinforcement Learning Approach,

    H. Moudoud et al. , “Empowering Security and Trust in 5G and Beyond: A Deep Reinforcement Learning Approach,” IEEE Open Journal of the Communications Society , vol. 4, pp. 2410–2420, 2023

  20. [20]

    Competitive Algorithms and Reinforcement Learning for NOMA in IoT Networks,

    Z. Mlika et al. , “Competitive Algorithms and Reinforcement Learning for NOMA in IoT Networks,” in ICC 2021 - IEEE International Conference on Communications , 2021, pp. 1–6

  21. [21]

    Digital Twin and DRL-Driven Semantic Dissem- ination for 6G Autonomous Driving Service,

    Y . Tao et al. , “Digital Twin and DRL-Driven Semantic Dissem- ination for 6G Autonomous Driving Service,” in GLOBECOM 2023 - 2023 IEEE Global Communications Conference , 2023, pp. 2075–2080

  22. [22]

    Open RAN Slicing for MVNOs With Deep Re- inforcement Learning,

    A. Filali et al. , “Open RAN Slicing for MVNOs With Deep Re- inforcement Learning,” IEEE Internet of Things Journal , vol. 11, no. 10, pp. 18 711–18 725, 2024

  23. [23]

    Efficient reward poisoning attacks on online deep reinforcement learning,

    Y . Xu et al. , “Efficient reward poisoning attacks on online deep reinforcement learning,” arXiv preprint arXiv:2205.14842 , 2022

  24. [24]

    Trustworthy AI-Driven Dynamic Hybrid RIS: Joint Optimization and Reward Poisoning-Resilient Co ntrol in Cognitive MISO Networks,

    D. H. Tashman et al. , “Trustworthy AI-Driven Dynamic Hybrid RIS: Joint Optimization and Reward Poisoning-Resilient Co ntrol in Cognitive MISO Networks,” IEEE Transactions on Network and Service Management , pp. 1–1, 2026

  25. [25]

    Multi-Environment Training Against Re- ward Poisoning Attacks on Deep Reinforcement Learning

    M. Bouhaddi et al. , “Multi-Environment Training Against Re- ward Poisoning Attacks on Deep Reinforcement Learning.” in SECRYPT, 2023, pp. 870–875

  26. [26]

    Black-box targeted reward poisoning attack against online deep reinforcement learning,

    Y . Xu et al. , “Black-box targeted reward poisoning attack against online deep reinforcement learning,” arXiv preprint arXiv:2305.10681, 2023

  27. [27]

    Reward poisoning attacks in deep reinforcement learning based on exploration strategies,

    K. Cai et al. , “Reward poisoning attacks in deep reinforcement learning based on exploration strategies,” Neurocomputing, vol. 553, p. 126578, 2023

  28. [28]

    Overview of RIS-enabled secure transmission in 6G wireless networks,

    J. Bae et al. , “Overview of RIS-enabled secure transmission in 6G wireless networks,” Digital Communications and Networks , 2024

  29. [29]

    Quantum-Aided Active User Detection for Energy-Efficient CD-NOMA in Cognitive Radio Networks,

    D. H. Tashman et al. , “Quantum-Aided Active User Detection for Energy-Efficient CD-NOMA in Cognitive Radio Networks,” in 2025 International Wireless Communications and Mobile Computing (IWCMC) , 2025, pp. 1661–1666

  30. [30]

    Joint Hybrid Transceiver and Reflection Ma- trix Design for RIS-Aided mmWave MIMO Cognitive Radio Systems,

    J. Singh et al. , “Joint Hybrid Transceiver and Reflection Ma- trix Design for RIS-Aided mmWave MIMO Cognitive Radio Systems,” IEEE Transactions on Cognitive Communications and Networking, vol. 11, no. 1, pp. 391–407, 2025

  31. [31]

    Secrecy Outage Probability of Reconfig- urable Intelligent Surface-Aided Cooperative Underlay Co gnitive Radio Network Communications,

    N. D. Nguyen et al. , “Secrecy Outage Probability of Reconfig- urable Intelligent Surface-Aided Cooperative Underlay Co gnitive Radio Network Communications,” in 2021 22nd Asia-Pacific Network Operations and Management Symposium (APNOMS) , 2021, pp. 73–77

  32. [32]

    Physical-Layer Security for Cognitive Radio Networks over Cascaded Rayleigh Fading Channels,

    D. H. Tashman et al. , “Physical-Layer Security for Cognitive Radio Networks over Cascaded Rayleigh Fading Channels,” in GLOBECOM 2020 - 2020 IEEE Global Communications Conference, 2020, pp. 1–6

  33. [33]

    Physical-Layer Security on Maximal Ratio Combining for SIMO Cognitive Radio Networks Over Cascaded κ -µ Fading Channels,

    D. H. Tashman et al., “Physical-Layer Security on Maximal Ratio Combining for SIMO Cognitive Radio Networks Over Cascaded κ -µ Fading Channels,” IEEE Transactions on Cognitive Commu- nications and Networking , vol. 7, no. 4, pp. 1244–1252, 2021

  34. [34]

    On Securing Cognitive Radio Networks- Enabled SWIPT Over Cascaded κ -µ Fading Channels With Multiple Eavesdroppers,

    D. H. Tashman et al. , “On Securing Cognitive Radio Networks- Enabled SWIPT Over Cascaded κ -µ Fading Channels With Multiple Eavesdroppers,” IEEE Transactions on V ehicular Tech- nology, vol. 71, no. 1, pp. 478–488, 2022

  35. [35]

    Performance Optimization of Energy- Harvesting Underlay Cognitive Radio Networks Using Rein- forcement Learning,

    D. H. Tashman et al. , “Performance Optimization of Energy- Harvesting Underlay Cognitive Radio Networks Using Rein- forcement Learning,” in 2023 International Wireless Communi- cations and Mobile Computing (IWCMC) , 2023, pp. 1160–1165

  36. [36]

    Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks,

    D. H. Tashman et al. , “Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks,” i n ICC 2024 - IEEE International Conference on Communications , 2024, pp. 293–298

  37. [37]

    Network Slicing with MEC and Deep Rein- forcement Learning for the Internet of V ehicles,

    Z. Mlika et al. , “Network Slicing with MEC and Deep Rein- forcement Learning for the Internet of V ehicles,” IEEE Network, vol. 35, no. 3, pp. 132–138, 2021

  38. [38]

    Dynamic SDN-Based Radio Access Network Slicing With Deep Reinforcement Learning for URLLC and eMBB Services,

    A. Filali et al. , “Dynamic SDN-Based Radio Access Network Slicing With Deep Reinforcement Learning for URLLC and eMBB Services,” IEEE Trans. Network Sci. Eng. , vol. 9, no. 4, pp. 2174–2187, 2022

  39. [39]

    A Deep Reinforcement Learning Approach for Service Migration in MEC-enabled V ehicular Networks,

    A. Abouaomar et al., “A Deep Reinforcement Learning Approach for Service Migration in MEC-enabled V ehicular Networks,” in 2021 IEEE 46th Conference on Local Computer Networks (LCN) , 2021, pp. 273–280

  40. [40]

    Optimizing Cognitive Networks: Rein- forcement Learning Meets Energy Harvesting Over Cascaded Channels,

    D. H. Tashman et al. , “Optimizing Cognitive Networks: Rein- forcement Learning Meets Energy Harvesting Over Cascaded Channels,” IEEE Systems Journal , vol. 18, no. 4, pp. 1839–1848, 2024

  41. [41]

    Securing Cognitive IoT Networks: Re- inforcement Learning for Adaptive Physical Layer Defense,

    D. H. Tashman et al. , “Securing Cognitive IoT Networks: Re- inforcement Learning for Adaptive Physical Layer Defense, ” in 2024 6th International Conference on Communications, Sign al Processing, and their Applications (ICCSPA) , 2024, pp. 1–6

  42. [42]

    Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-Aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI,

    B. Saglam et al. , “Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-Aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI,” in 2023 IEEE International Conference on Communica- tions W orkshops (ICC W orkshops), 2023, pp. 66–72