
arXiv: 2605.17266 · v1 · pith:JZXSMH5S · submitted 2026-05-17 · 📡 eess.SP

Leveraging Deep Reinforcement Learning for Clustered Cell-Free Networking Over User Mobility

Pith reviewed 2026-05-19 23:22 UTC · model grok-4.3

classification 📡 eess.SP
keywords clustered cell-free networking · deep reinforcement learning · user mobility · network clustering · handover reduction · DDPG · cell-free massive MIMO · dynamic networks

The pith

Deep reinforcement learning partitions cell-free networks into clusters using only one channel estimate per access point.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that a deep deterministic policy gradient framework can handle the clustering of access points in cell-free networks more efficiently than traditional methods, particularly when users are moving. Traditional approaches require measuring channels between every user and every access point, which becomes costly and slow to update as people move around. The new method instead feeds the neural network with just a single channel estimate from each access point, allowing quick re-clustering and lower overhead. This framework can be adjusted for different goals such as maximizing data rates or minimizing power use, and it performs better than previous techniques while cutting down on handovers caused by mobility and staying effective when users enter or exit the network.

Core claim

The authors introduce the DDPG-C²F framework based on deep deterministic policy gradient that learns to partition the cell-free network into non-overlapping subnetworks for joint transmission. It takes as input only one channel estimate per access point rather than the full channel matrix, which reduces measurement and computational costs. The framework is demonstrated to adapt to multiple clustered cell-free problems with varying objectives and constraints, to reduce handover costs under user mobility, and to remain robust in scenarios where users randomly join or leave.

What carries the argument

The DDPG-C²F framework, which uses a deep deterministic policy gradient agent to select network partitions from single per-access-point channel estimates as state input.
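To make the state-to-partition step concrete, here is a minimal sketch of how a deterministic actor could map one channel estimate per AP to a non-overlapping partition. Everything in it is an assumption for illustration: the two-layer actor, the random weights, the Rayleigh-distributed stand-in state, and the argmax decoding are not taken from the paper, which may encode actions quite differently.

```python
import numpy as np

def actor_forward(state, w1, w2):
    """Hypothetical two-layer deterministic actor: state (L,) -> action (L*M,) in [-1, 1]."""
    hidden = np.tanh(state @ w1)
    return np.tanh(hidden @ w2)

def action_to_partition(action, num_aps, num_subnets):
    """Assumed decoding: reshape the action into per-AP scores and pick one subnetwork per AP."""
    scores = action.reshape(num_aps, num_subnets)
    return scores.argmax(axis=1)

rng = np.random.default_rng(0)
L, M, H = 100, 5, 64  # APs and subnetworks as in the figure captions; H is an arbitrary hidden size

# Stand-in state: one channel-gain estimate per AP (synthetic, not measured data).
state = rng.rayleigh(size=L)

# Untrained random weights, used only to exercise the shapes.
w1 = rng.normal(scale=0.1, size=(L, H))
w2 = rng.normal(scale=0.1, size=(H, L * M))

partition = action_to_partition(actor_forward(state, w1, w2), L, M)
```

Because each AP receives exactly one index, the decoded partition is non-overlapping by construction, which is the property the core claim requires.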

If this is right

  • The framework lowers channel measurement costs substantially by needing only one estimate per access point.
  • It enables faster adaptation to user movements, reducing the frequency and cost of handovers.
  • The same trained structure applies to different optimization targets and constraints without major redesign.
  • It maintains performance when the number of active users changes dynamically.
  • It achieves better results than clustering algorithms, graph partitioning, and conventional optimization in simulated scenarios.
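The first point can be checked with simple counting. The sketch below compares the number of channel estimates per clustering decision for a full user-by-AP matrix against one estimate per AP, using the default K = 50 and L = 100 from the figure captions. It assumes one scalar estimate per link and ignores pilot reuse or any other overhead model, so the ratio is indicative only.

```python
def estimates_full_matrix(num_users, num_aps):
    """Estimates needed when every user-AP channel is measured."""
    return num_users * num_aps

def estimates_single_per_ap(num_aps):
    """Estimates needed when each AP measures a single channel."""
    return num_aps

K, L = 50, 100  # default user and AP counts from the figure captions
full = estimates_full_matrix(K, L)
single = estimates_single_per_ap(L)
ratio = full / single  # reduction factor; equals K under these assumptions
```

Under these assumptions the reduction factor equals the number of users, so the saving grows as the network serves more users.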

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If single channel estimates prove sufficient, reinforcement learning could replace full CSI requirements in other network management tasks.
  • The method suggests a path to scaling cell-free systems by reducing pilot overhead for clustering decisions.
  • Further work could test whether policies learned in simulation transfer to real-world mobility without additional fine-tuning.

Load-bearing premise

A single channel estimate per access point supplies enough state information for the neural network to generate effective clustering decisions that work across varying network sizes and real mobility patterns.

What would settle it

Run the clustering once with full channel information and once with the single per-AP estimate, under the same mobility traces, and compare the resulting sum rates and handover counts. If the single-estimate version shows a large performance gap, the cost-saving claim would not hold.
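A sketch of the bookkeeping such a comparison would need is given here. It assumes a handover is counted whenever a user's subnetwork index changes between consecutive time intervals; the paper's exact definition of Φ is not given on this page, and both function names are hypothetical.

```python
import numpy as np

def count_handovers(assignments):
    """Count user-interval pairs whose subnetwork index changes.

    assignments: array-like of shape (T, K), the subnetwork index of each
    of K users over T consecutive time intervals.
    """
    a = np.asarray(assignments)
    return int(np.sum(a[1:] != a[:-1]))

def relative_gap(metric_full, metric_single):
    """Fractional loss of the single-estimate variant against the full-CSI variant."""
    return (metric_full - metric_single) / metric_full

# Toy trace: 3 intervals, 3 users (illustrative values only).
trace = [[0, 1, 2],
         [0, 1, 1],
         [1, 1, 1]]
handovers = count_handovers(trace)
```

Note that this count treats subnetwork labels as stable across intervals. If a method relabels clusters at each step, the labels would need to be matched first, otherwise the count is inflated.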

Figures

Figures reproduced from arXiv: 2605.17266 by Antonio Pérez Yuste, Bo Qian, Junyuan Wang, Ouyang Zhou, Yusheng Ji.

Figure 1: Graphical illustration of clustered cell-free networking.
Figure 2: Detailed architecture of the proposed DDPG-C²F
Figure 3: Balance-aware sum rate Rρ during the training process in the case of joint optimization of sum rate and balance of subnetworks. K = 50. L = 100. M = 5. Vmax = 5 m/s.
  • User-centric clustering [15]: The user-centric benchmark in [15] first clusters users with the K-means algorithm and then assigns each AP to the closest user cluster.
  • Graph partitioning [12]: The graph partitioning benchmark in [12] first merg…
Figure 4: Clustered cell-free networking results of a random network snapshot with the proposed DDPG-C²F
Figure 5: Balance-aware sum rate Rρ with the proposed DDPG-C²F framework and the benchmarks in the case of joint optimization of sum rate and balance of subnetworks. By default, K = 50, L = 100, M = 5 and Vmax = 5 m/s.
Figure 6: (a) Balance of subnetworks ρ, (b) maximum number of channels across all subnetworks Cmax and (c) sum rate Rsum versus the number of subnetworks M in the case of joint optimization of sum rate and balance of subnetworks. K = 50. L = 100. Vmax = 5 m/s.
Figure 7: Clustered cell-free networking results of a random network snapshot with the proposed DDPG-C²F
Figure 8: Balance of subnetworks ρ and sum rate Rsum with the proposed DDPG-C²F framework and the benchmarks in the case of maximization of balance of subnetworks with sum rate constraint. The results of the proposed framework are obtained by setting the rate threshold Rth as the sum rate achieved with the benchmarks in the same dashed circle. K = 50. L = 100. M = 5. Vmax = 5 m/s.
Figure 9: Balance-aware energy efficiency ηee during the training process in the case of joint optimization of energy efficiency and balance of subnetworks. K = 50. L = 100. M = 5. Vmax = 5 m/s.
After learning the anchor features, users and APs are first affiliated with their closest anchors to form subnetworks, and then in each subnetwork, the APs close to users are selected for transmissi…
Figure 10: Balance-aware energy efficiency ηρ with the proposed DDPG-C²F framework and the benchmarks in the case of joint optimization of energy efficiency and balance of subnetworks. By default, K = 50, L = 100, M = 5 and Vmax = 5 m/s.
Figure 11: (a) Balance of subnetworks ρ, (b) maximum number of channels across all subnetworks Cmax and (c) energy efficiency ηee versus the number of subnetworks M in the case of joint optimization of energy efficiency and balance of subnetworks. K = 50. L = 100. Vmax = 5 m/s.
Figure 12: Balance-aware energy efficiency ηρ and sum rate Rsum with the proposed DDPG-C²F framework and the benchmarks in the case of joint optimization of energy efficiency and balance of subnetworks with sum rate constraint. The results of the proposed framework are obtained by setting the rate threshold Rth as the sum rate achieved with the benchmarks in the same dashed circle. K = 50. L = 100. M = 5. Vmax = 5m/…
Figure 14: Number of handovers Φ with the proposed DDPG-C²F framework and the benchmarks versus the maximum speed of users Vmax. K = 50. L = 100. M = 5.
… networking result could vary from one time interval to another due to user mobility. As a result, users need to frequently reassociate with APs, bringing high handover cost. In addition, the number of users is assumed to be fixed in the previous sections, whereas us…
Figure 16: Comparison of the balance-aware sum rate
Figure 17: Comparison of the balance-aware sum rate
Figure 18: Balance-aware sum rate Rρ during the training process in the case of joint optimization of sum rate and balance of subnetworks with (a) σa,max = 0.25 and (b) αϕ = 0.0001, αθ = 0.001. K = 50. L = 100. M = 5. Vmax = 5 m/s.
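Several captions above report a balance of subnetworks ρ without defining it on this page. One plausible candidate, given that the reference list includes Jain's fairness measure [35], is Jain's index applied to per-subnetwork sizes. The sketch below shows that index only as an assumption about what a balance measure could look like; it is not confirmed to be the paper's ρ, and the input values are made up.

```python
def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    For nonnegative inputs that are not all zero, the result lies in
    [1/n, 1], with 1 meaning perfectly equal values.
    """
    values = [float(v) for v in values]
    total = sum(values)
    squares = sum(v * v for v in values)
    if squares == 0.0:
        raise ValueError("at least one value must be nonzero")
    return total * total / (len(values) * squares)

# Hypothetical per-subnetwork sizes for M = 5 subnetworks.
balanced = jain_index([10, 10, 10, 10, 10])  # equal sizes
skewed = jain_index([50, 0, 0, 0, 0])        # everything in one subnetwork
```

With five subnetworks the index ranges from 0.2 (fully skewed) to 1.0 (fully balanced), which matches the 0.0 to 1.0 axis range one would expect for a normalized balance metric.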

Clustered cell-free networking paves a new way for enabling scalable joint transmission among access points (APs) by partitioning the whole network into non-overlapping subnetworks. Previous works adopted clustering algorithms, graph partitioning methods or conventional continuous optimization theories to partition a network based on the channels between all users and all APs, resulting in huge channel measurement and computational costs. This makes these methods difficult to be implemented in practical systems since the optimal network partition could vary frequently due to user mobility. In addition, existing methods were usually designed for specific clustered cell-free networking problems with different optimization algorithms employed. In this paper, we leverage deep reinforcement learning (DRL) for clustered cell-free networking so as to rapidly adapt to user movements in dynamic environments, and propose a deep deterministic policy gradient based clustered cell-free networking (DDPG-C²F) framework that can be adapted in various application scenarios. Moreover, in our framework, only one single channel needs to be estimated at each AP as the input of the neural network, which greatly reduces the channel measurement costs for clustered cell-free networking, and the training and inference costs of our framework. The proposed DDPG-C²F framework is then applied to various clustered cell-free networking problems with different objectives and constraints to demonstrate its performance. Simulation results show that our framework outperforms existing baselines in all scenarios. Moreover, we show that the proposed framework can reduce the handover cost over user mobility, and is robust to dynamic scenarios with random user joining or leaving.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a deep deterministic policy gradient based clustered cell-free networking (DDPG-C²F) framework that partitions cell-free networks into non-overlapping subnetworks using DRL. It claims that only a single channel estimate per access point is needed as neural network input, enabling rapid adaptation to user mobility, reduced handover costs, outperformance over baselines across scenarios with varying objectives/constraints, and robustness to dynamic user join/leave events.

Significance. If the performance claims hold under detailed scrutiny, the work could advance practical deployment of cell-free systems by drastically lowering channel measurement overhead compared to full-matrix methods, while providing a flexible DRL template adaptable to different optimization goals. The emphasis on mobility handling addresses a key practical limitation of prior clustering approaches.

major comments (3)
  1. Abstract: the central claims of outperformance, handover reduction, and robustness rest on simulation results, yet the abstract (and by extension the evaluation) provides no details on simulation parameters, baseline implementations, number of Monte Carlo runs, statistical significance tests, or error bars, preventing assessment of whether the reported gains are reliable or generalizable.
  2. Framework description (state input): the assertion that a single channel estimate per AP suffices as NN state for effective clustering decisions is load-bearing for the reduced-cost claim and all mobility results, but no analysis or ablation is provided showing that this scalar/vector captures the cross-user spatial correlations present in the full channel matrix; standard cell-free objectives depend on the entire matrix, and the paper should demonstrate why mobility-induced changes remain trackable.
  3. Evaluation section: the robustness claim to random user joining/leaving and the generalization across network sizes lack supporting experiments with varying AP/user counts or explicit tests of the single-estimate state under realistic mobility traces; without these, the adaptability advantage over conventional methods cannot be confirmed.
minor comments (2)
  1. Notation: the superscript in DDPG-C²F should be consistently rendered and defined on first use.
  2. References: ensure all cited clustering and DRL baselines are from the most recent relevant literature in cell-free MIMO.
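The reporting the first major comment asks for can be sketched briefly. The snippet below computes a mean and a normal-approximation 95% confidence half-width over independent Monte Carlo runs. The function name is hypothetical, the sample values are placeholders rather than results from the paper, and for small run counts a Student-t quantile would be more appropriate than z = 1.96.

```python
import math
import statistics

def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 denominator) scaled to a standard error.
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width

# Placeholder per-run metric values (illustrative only).
runs = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, half_width = mean_with_ci(runs)
```

Plotting each curve as the mean with error bars of this half-width would let a reader judge whether the reported gaps between methods exceed run-to-run variation.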

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate the revisions planned for the next version of the manuscript.

  1. Referee: Abstract: the central claims of outperformance, handover reduction, and robustness rest on simulation results, yet the abstract (and by extension the evaluation) provides no details on simulation parameters, baseline implementations, number of Monte Carlo runs, statistical significance tests, or error bars, preventing assessment of whether the reported gains are reliable or generalizable.

    Authors: We agree that additional details on the experimental setup would improve reproducibility and allow better assessment of result reliability. In the revised manuscript we will expand the evaluation section to specify simulation parameters, baseline implementation details, the number of Monte Carlo runs, error bars, and statistical significance tests where appropriate. revision: yes

  2. Referee: Framework description (state input): the assertion that a single channel estimate per AP suffices as NN state for effective clustering decisions is load-bearing for the reduced-cost claim and all mobility results, but no analysis or ablation is provided showing that this scalar/vector captures the cross-user spatial correlations present in the full channel matrix; standard cell-free objectives depend on the entire matrix, and the paper should demonstrate why mobility-induced changes remain trackable.

    Authors: The single-channel state per AP is selected specifically to reduce measurement overhead while still permitting the DRL agent to learn clustering policies that adapt to mobility, as evidenced by the reported simulation performance. We acknowledge the absence of an explicit ablation study. We will add a discussion of the rationale for this state representation together with an ablation comparing it against fuller channel information to show that the essential spatial correlations for mobility tracking are retained. revision: yes

  3. Referee: Evaluation section: the robustness claim to random user joining/leaving and the generalization across network sizes lack supporting experiments with varying AP/user counts or explicit tests of the single-estimate state under realistic mobility traces; without these, the adaptability advantage over conventional methods cannot be confirmed.

    Authors: We agree that further experiments would strengthen the robustness and generalization claims. In the revised manuscript we will include additional results with varying numbers of APs and users, together with evaluations under realistic mobility traces that explicitly test the single-estimate state in dynamic join/leave scenarios. revision: yes

Circularity Check

0 steps flagged

No significant circularity in DDPG-C²F framework proposal


The paper introduces a novel DRL-based framework (DDPG-C²F) for dynamic clustered cell-free networking, using a single channel estimate per AP as NN input to enable adaptation to user mobility. It evaluates the approach via simulations against external baselines across multiple scenarios with varying objectives, showing empirical gains in performance, handover reduction, and robustness to join/leave dynamics. No load-bearing step reduces by construction to a fitted input, self-definition, or self-citation chain; the central claims rest on independent simulation results rather than renaming or re-deriving the inputs themselves.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on standard reinforcement-learning modeling assumptions and simulation-based validation rather than new physical axioms.

free parameters (1)
  • DDPG hyperparameters (learning rates, network sizes, exploration noise)
    Typical DRL training choices that are not reported in the abstract but required for the claimed performance.
axioms (1)
  • domain assumption: The clustering decision process can be formulated as a Markov decision process with the chosen single-channel state representation.
    Standard assumption when applying DRL to sequential decision problems in dynamic networks.
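For readers unfamiliar with where these free parameters enter, the sketch below shows two standard DDPG ingredients: the soft target-network update and additive exploration noise on the actor output. The names `tau` and `sigma`, the dictionary parameter layout, and the use of clipped Gaussian noise are assumptions for illustration. The value 0.25 echoes σa,max in the Figure 18 caption, but its exact role in the paper is not stated on this page.

```python
import numpy as np

def soft_update(target, online, tau):
    """Polyak averaging of target parameters: target <- tau * online + (1 - tau) * target."""
    return {name: tau * online[name] + (1.0 - tau) * target[name] for name in target}

def noisy_action(action, sigma, rng, low=-1.0, high=1.0):
    """Add zero-mean Gaussian exploration noise, then clip to the action bounds."""
    noise = rng.normal(scale=sigma, size=action.shape)
    return np.clip(action + noise, low, high)

rng = np.random.default_rng(0)

# Toy parameters: the target starts at zero and the online network at one.
target_params = {"w": np.zeros(2)}
online_params = {"w": np.ones(2)}
target_params = soft_update(target_params, online_params, tau=0.1)

explored = noisy_action(np.array([0.9, -0.9, 0.0]), sigma=0.25, rng=rng)
```

A small `tau` keeps the target network slowly varying, which is the usual stabilizing choice in DDPG, while `sigma` trades exploration against the quality of actions taken during training.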

pith-pipeline@v0.9.0 · 5818 in / 1250 out tokens · 47331 ms · 2026-05-19T23:22:26.827109+00:00 · methodology



Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

  1. [1] O. Zhou, J. Wang, and Y. Ji, “A deep reinforcement learning framework for clustered cell-free networking over user mobility,” in Proc. IEEE WCNC, Mar. 2025.
  2. [2] W. Chen, X. Lin, J. Lee, A. Toskala, S. Sun, C. F. Chiasserini, and L. Liu, “5G-advanced toward 6G: Past, present, and future,” IEEE J. Sel. Areas Commun., vol. 41, no. 6, pp. 1592–1619, Jun. 2023.
  3. [3] J. Wang and L. Dai, “Asymptotic rate analysis of downlink multi-user systems with co-located and distributed antennas,” IEEE Trans. Wireless Commun., vol. 14, no. 6, pp. 3046–3058, Jun. 2015.
  4. [4] H. Huh, A. M. Tulino, and G. Caire, “Network MIMO with linear zero-forcing beamforming: Large system analysis, impact of channel estimation, and reduced-complexity scheduling,” IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 2911–2934, May 2012.
  5. [5] C. Pan, M. Elkashlan, J. Wang, J. Yuan, and H. Lajos, “User-centric C-RAN architecture for ultra-dense 6G networks: Challenges and methodologies,” IEEE Commun. Mag., vol. 56, no. 6, pp. 14–20, Jun. 2018.
  6. [6] H. A. Ammar, R. Adve, S. Shahbazpanahi, G. Boudreau, and K. V. Srinivas, “User-centric cell-free massive MIMO networks: A survey of opportunities, challenges and solutions,” IEEE Commun. Surv. Tutor., vol. 24, no. 1, pp. 611–652, Jan. 2022.
  7. [7] L. Dai, “An uplink capacity analysis of the distributed antenna system (DAS): From cellular DAS to DAS with virtual cells,” IEEE Trans. Wireless Commun., vol. 13, no. 5, pp. 2717–2731, May 2014.
  8. [8] Y. Zhang, S. Bi, and Y.-J. A. Zhang, “User-centric joint transmission in virtual-cell-based ultra-dense networks,” IEEE Trans. Veh. Technol., vol. 67, no. 5, pp. 4640–4644, May 2018.
  9. [9] J. Wang and L. Dai, “Downlink rate analysis for virtual-cell based large-scale distributed antenna systems,” IEEE Trans. Wireless Commun., vol. 15, no. 3, pp. 1998–2011, Mar. 2016.
  10. [10] L. Dai and B. Bai, “Optimal decomposition for large-scale infrastructure-based wireless networks,” IEEE Trans. Wireless Commun., vol. 16, no. 8, pp. 4956–4969, Aug. 2017.
  11. [11] J. Wang, L. Dai, L. Yang, and B. Bai, “Rate-constrained network decomposition for clustered cell-free networking,” in Proc. IEEE ICC, May 2022, pp. 2549–2554.
  12. [12] J. Wang, L. Dai, L. Yang, and B. Bai, “Clustered cell-free networking: A graph partitioning approach,” IEEE Trans. Wireless Commun., vol. 22, no. 8, pp. 5349–5364, Aug. 2023.
  13. [13] M. Yemini and A. J. Goldsmith, “Optimal resource allocation for cellular networks with virtual cell joint decoding,” in Proc. IEEE ISIT, Jul. 2019, pp. 2519–2523.
  14. [14] P. Biswas, R. K. Mallik, and K. B. Letaief, “Optimal access point centric clustering for cell-free massive MIMO using Gaussian mixture model clustering,” IEEE Trans. Mach. Learn. Commun. Netw., vol. 2, pp. 675–687, May 2024.
  15. [15] F. Riera-Palou, G. Femenias, A. G. Armada, and A. Pérez-Neira, “Clustered cell-free massive MIMO,” in Proc. IEEE Globecom, Dec. 2018.
  16. [16] O. Zhou, J. Wang, F. Liu, and J. Wang, “Energy-efficient clustered cell-free networking with access point selection,” IEEE Open J. Commun. Soc., vol. 5, pp. 1551–1565, Mar. 2024.
  17. [17] X. Zeng, J. Wang, K. Yue, M. Dong, and B. Bai, “Tunable weighted kernel k-means for clustered cell-free networking acceleration and beam on-off control,” in Proc. IEEE ICC, Jun. 2024, pp. 4311–4316.
  18. [18] J. Wang, T. Wu, O. Zhou, and Y. Zhu, “Exploring evolutionary spectral clustering for temporal-smoothed clustered cell-free networking,” IEEE Wireless Commun. Lett., vol. 14, no. 2, pp. 494–498, Dec. 2024.
  19. [19] F. Xia and J. Wang, “Complexity-constrained clustered cell-free networking for sum capacity maximization,” in Proc. IEEE ISIT, Jun. 2023, pp. 2691–2696.
  20. [20] B. Ren, H. Hao, Z. Lyu, J. Peng, J. Wang, and H. Wu, “Tight differentiable relaxation of sum ergodic capacity maximization for clustered cell-free networking,” in Proc. IEEE ISIT, Jul. 2024, pp. 2448–2453.
  21. [21] F. Xia, J. Wang, and L. Dai, “Optimizing clustered cell-free networking for sum ergodic capacity maximization with joint processing constraint,” IEEE Trans. Wireless Commun., vol. 24, no. 1, pp. 571–584, Jan. 2025.
  22. [22] C. Deng, B. Ren, Z. Lyu, J. Wang, and H. Wu, “Balanced clustered cell-free networking with individual rate guarantees,” IEEE Trans. Veh. Technol., pp. 1–5, Apr. 2025.
  23. [23] Z. Liu, J. Zhang, Z. Liu, H. Xiao, and B. Ai, “Double-layer power control for mobile cell-free XL-MIMO with multi-agent reinforcement learning,” IEEE Trans. Wireless Commun., vol. 23, no. 5, pp. 4658–4674, May 2024.
  24. [24] C. F. Mendoza, M. Kaneko, M. Rupp, and S. Schwarz, “Accelerated deep reinforcement learning for uplink power control in a dynamic cell-free massive MIMO network,” IEEE Wireless Commun. Lett., vol. 13, no. 6, pp. 1710–1714, Apr. 2024.
  25. [25] L. Luo, J. Zhang, S. Chen, X. Zhang, B. Ai, and D. W. K. Ng, “Downlink power control for cell-free massive MIMO with deep reinforcement learning,” IEEE Trans. Veh. Technol., vol. 71, no. 6, pp. 6772–6777, Mar. 2022.
  26. [26] F. Fredj, Y. Al-Eryani, S. Maghsudi, M. Akrout, and E. Hossain, “Distributed beamforming techniques for cell-free wireless networks using deep reinforcement learning,” IEEE Trans. Cogn. Commun. Netw., vol. 8, no. 2, pp. 1186–1201, Apr. 2022.
  27. [27] Y. Li, C. Zhang, and Y. Huang, “Distributed beam selection for millimeter-wave cell-free massive MIMO based on multi-agent deep reinforcement learning,” in Proc. IEEE WCNC, Apr. 2024.
  28. [28] Y. Al-Eryani and E. Hossain, “Self-organizing mmwave MIMO cell-free networks with hybrid beamforming: A hierarchical DRL-based design,” IEEE Trans. Commun., vol. 70, no. 5, pp. 3169–3185, Mar. 2022.
  29. [29] R. Y. Chang, S.-F. Han, and F.-T. Chien, “Reinforcement learning-based joint cooperation clustering and content caching in cell-free massive MIMO networks,” in Proc. IEEE VTC, Sep. 2021, pp. 1–7.
  30. [30] N. Ghiasi, S. Mashhadi, S. Farahmand, S. M. Razavizadeh, and I. Lee, “Energy efficient AP selection for cell-free massive MIMO systems: Deep reinforcement learning approach,” IEEE Trans. Green Commun. Netw., vol. 7, no. 1, pp. 29–41, Aug. 2023.
  31. [31] Z. Gao, Q. Zhang, J. Liu, Z. Du, and Y. Li, “DRL-based AP selection in downlink cell-free massive MIMO network with pilot contamination,” IEEE Commun. Lett., vol. 28, no. 6, pp. 1432–1436, Apr. 2024.
  32. [32] J. Moon, S. Kim, H. Ju, and B. Shim, “Energy-efficient user association in mmwave/THz ultra-dense network via multi-agent deep reinforcement learning,” IEEE Trans. Green Commun. Netw., vol. 7, no. 2, pp. 692–706, Jan. 2023.
  33. [33] L. Sun, J. Hou, and R. Chapman, “Multi-agent deep reinforcement learning for access point activation strategy in cell-free massive MIMO networks,” in Proc. IEEE Infocom Workshops, May 2023.
  34. [34] B. Banerjee, R. C. Elliott, W. A. Krzymień, and M. Medra, “Access point clustering in cell-free massive MIMO using conventional and federated multi-agent reinforcement learning,” IEEE Trans. Mach. Learn. Commun. Netw., vol. 1, pp. 107–123, Jun. 2023.
  35. [35] R. K. Jain, D.-M. W. Chiu, and W. R. Hawe, “A quantitative measure of fairness and discrimination,” Eastern Res. Lab., Digit. Equip. Corp., Hudson, MA, USA, 1984.
  36. [36] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3133–3174, May 2019.
  37. [37] S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári, “Convergence results for single-step on-policy reinforcement-learning algorithms,” Mach. Learn., vol. 38, no. 3, pp. 287–308, Mar. 2000.
  38. [38] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
  39. [39] H. Tabassum, M. Salehi, and E. Hossain, “Fundamentals of mobility-aware performance characterization of cellular networks: A tutorial,” IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2288–2308, Mar. 2019.
  40. [40] X. Wang, F.-C. Zheng, P. Zhu, and X. You, “Energy-efficient resource allocation in coordinated downlink multicell OFDMA systems,” IEEE Trans. Veh. Technol., vol. 65, no. 3, pp. 1395–1408, Mar. 2016.
  41. [41] T. X. Vu, S. Chatzinotas, S. ShahbazPanahi, and B. Ottersten, “Joint power allocation and access point selection for cell-free massive MIMO,” in Proc. IEEE ICC, Jul. 2020.