FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G

Amin Farajzadeh; Melike Erol-Kantarci

arxiv: 2605.21418 · v1 · pith:675ZDYU2new · submitted 2026-05-20 · 💻 cs.LG · cs.AI· cs.CV· cs.NI

FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G

Amin Farajzadeh , Melike Erol-Kantarci This is my paper

Pith reviewed 2026-05-21 05:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CVcs.NI

keywords federated learningmulti-agent reinforcement learningresource allocationOFDMA6G networksinter-cell interferenceactor-criticdistributed scheduling

0 comments

The pith

Serverless federated critic learning coordinates multi-cell OFDMA resource allocation in 6G by gossiping parameters over the interference graph.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes FedCritic to handle joint subcarrier scheduling and power control in dense 6G networks where inter-cell interference tightly couples decisions across cells. It enforces long-term QoS using virtual-queue deficit weights and employs a multi-agent actor-critic architecture with decentralized policy execution. The key innovation is federating the critic via lightweight gossip-based averaging of parameters along the interference graph, which removes the need for a central coordinator to collect joint trajectories. This yields more stable training and lower overhead than centralized training with decentralized execution approaches. In simulations under aggressive frequency reuse, the method delivers higher average SINR, improved cell-edge performance, greater network sum-rate, and better fairness.

Core claim

FedCritic is a serverless federated multi-agent actor-critic framework with decentralized execution. Unlike CTDE methods that require centralized critic learning and joint trajectory aggregation, FedCritic federates the critic through lightweight gossip-based parameter averaging over the interference graph. This enables stable value estimation without a central coordinator while keeping policies local. Simulations in an interference-rich reuse-1 setting demonstrate improvements in mean SINR and cell-edge rate, higher network-wide average sum-rate and fairness relative to non-coordinated and CTDE baselines, and more stable training with lower coordination overhead.

What carries the argument

FedCritic, a serverless federated multi-agent actor-critic framework that uses gossip-based parameter averaging over the interference graph to federate critic learning while keeping actor policies local.

If this is right

Higher mean SINR and cell-edge rates in interference-rich reuse-1 deployments.
Increased network-wide average sum-rate under long-term per-user QoS constraints.
Improved fairness across users compared with non-coordinated and CTDE baselines.
More stable training and lower coordination overhead than methods requiring joint trajectory aggregation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Gossip-based federation may scale more readily than centralized critics when backhaul capacity limits trajectory sharing in larger ultra-dense networks.
The same virtual-queue plus gossip-critic pattern could apply to other graph-structured multi-agent problems with local interaction constraints.
Performance gains might compound when combined with emerging 6G physical-layer techniques such as intelligent reflecting surfaces or THz links.
Convergence guarantees under varying interference-graph densities remain open for formal analysis.

Load-bearing premise

Lightweight gossip-based parameter averaging over the interference graph enables stable value estimation without a central coordinator.

What would settle it

In the same interference-rich reuse-1 multi-cell OFDMA simulations, observing that FedCritic fails to improve mean SINR, cell-edge rate, sum-rate, fairness, or training stability relative to a CTDE baseline with centralized critic learning would falsify the claimed advantages.

Figures

Figures reproduced from arXiv: 2605.21418 by Amin Farajzadeh, Melike Erol-Kantarci.

**Figure 2.** Figure 2: Distribution of the per-slot average network sum-ra [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: (a) Mean SINR and (b) neighbor-collision rate, over [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 5.** Figure 5: Activity (reuse intensity) heatmaps over BSs and [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

read the original abstract

In sixth-generation (6G) ultra-dense networks, aggressive frequency reuse amplifies inter-cell interference (ICI), making multi-cell orthogonal frequency-division multiple access (OFDMA) scheduling and power control strongly coupled across neighboring cells. We study distributed downlink resource management -- joint subcarrier scheduling and power allocation -- under interference coupling and long-term per-user quality-of-service (QoS) minimum-rate constraints. By using virtual-queue deficit weights to enforce long-term QoS, we develop FedCritic, a serverless federated multi-agent actor-critic framework with decentralized execution. Unlike centralized training with decentralized execution (CTDE) approaches that require centralized critic learning and joint trajectory aggregation, FedCritic federates the critic through lightweight gossip-based parameter averaging over the interference graph, enabling stable value estimation without a central coordinator while keeping policies local. Simulations in an interference-rich reuse-1 setting show that FedCritic improves mean signal-to-interference-plus-noise ratio (SINR) and cell-edge rate, increases network-wide average sum-rate and fairness relative to non-coordinated and CTDE baselines, and achieves more stable training with lower coordination overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FedCritic uses gossip averaging over the interference graph to federate critics in a serverless multi-agent setup for 6G OFDMA, but local training leaves cross-cell value estimates vulnerable.

read the letter

The main point is that FedCritic applies gossip-based parameter averaging to share critic information across cells without a central coordinator, while keeping actor policies local for joint subcarrier scheduling and power control under virtual-queue QoS constraints. This is positioned as an alternative to CTDE methods that aggregate trajectories centrally. The simulations in a reuse-1 interference setting report gains in mean SINR, cell-edge rate, network sum-rate, fairness, and training stability with lower overhead than non-coordinated and CTDE baselines. That framing addresses a concrete engineering issue in ultra-dense 6G networks where ICI couples decisions across cells. The virtual-queue approach for long-term minimum-rate enforcement is a straightforward way to handle QoS in the distributed case, and the gossip mechanism over the interference graph is a reasonable adaptation of federated ideas to this topology. The work shows clear engagement with actor-critic and federated RL literature without obvious internal contradictions. The soft spot is the value estimation step. Local critics train on per-cell trajectories where rewards and next states depend on neighbors' unknown actions, so gradients capture only partial interference. Simple parameter averaging after local updates does not restore the joint-action information a true centralized critic would use; in reuse-1 OFDMA this can produce biased or high-variance estimates even if training curves look smoother. The abstract gives no error bars, exact baseline implementations, or training details, which leaves the reported metric improvements open to question. This paper is for people working on distributed RL for wireless resource allocation. A reader focused on 6G interference management or multi-agent methods in communications would find the specific gossip application and QoS handling worth examining. It has enough of a concrete proposal and empirical results to deserve peer review rather than a desk reject, though revisions would need to strengthen the analysis of critic accuracy under partial observations.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces FedCritic, a serverless federated multi-agent actor-critic framework for joint subcarrier scheduling and power allocation in multi-cell OFDMA networks under inter-cell interference and long-term QoS constraints. It replaces centralized critic training with lightweight gossip-based parameter averaging over the interference graph, keeps policies local, and reports simulation gains in mean SINR, cell-edge rate, network sum-rate, fairness, and training stability versus non-coordinated and CTDE baselines in a reuse-1 setting.

Significance. If the performance claims hold under rigorous verification, the approach offers a practical route to distributed 6G resource management with reduced coordination overhead. The combination of virtual-queue weighting with gossip-averaged critics is a clear technical contribution, and the simulation evidence of both performance and stability improvements is a strength of the work.

major comments (1)

[Section 3.2 and Algorithm 1] The description of the critic update and gossip mechanism (Section 3.2 and Algorithm 1): local critics are trained solely on per-cell trajectories whose rewards and next-states depend on neighboring cells' unknown actions. Gossip averaging after local SGD steps therefore cannot restore the joint-action information that a true centralized critic would use. In a reuse-1 OFDMA setting this mismatch risks biased or high-variance value estimates, which directly undermines the claim that the observed SINR and sum-rate gains arise from accurate value estimation rather than from other algorithmic or simulation artifacts.

minor comments (2)

[Simulation results] Simulation section: error bars, number of independent runs, and exact hyper-parameter settings for the actor-critic networks and gossip rounds are not reported; these details are required to assess statistical significance of the reported gains.
[System model] Notation: the precise construction of the interference graph used for gossip averaging should be stated explicitly, including how edges are determined from the reuse-1 layout.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The major comment raises an important point about the information available to the federated critic. We address it directly below and have revised the manuscript to clarify the approximation and strengthen the supporting analysis.

read point-by-point responses

Referee: [Section 3.2 and Algorithm 1] The description of the critic update and gossip mechanism (Section 3.2 and Algorithm 1): local critics are trained solely on per-cell trajectories whose rewards and next-states depend on neighboring cells' unknown actions. Gossip averaging after local SGD steps therefore cannot restore the joint-action information that a true centralized critic would use. In a reuse-1 OFDMA setting this mismatch risks biased or high-variance value estimates, which directly undermines the claim that the observed SINR and sum-rate gains arise from accurate value estimation rather than from other algorithmic or simulation artifacts.

Authors: We agree that the local trajectories do not contain explicit joint actions and that gossip averaging of critic parameters cannot literally reconstruct the full joint-action value function of a centralized critic. In the revised manuscript we now explicitly state this limitation in Section 3.2 and add a short paragraph explaining that the approach is an approximation: each local critic observes the realized interference (which is a deterministic function of the unknown neighbor actions) as part of its state, and gossip over the interference graph propagates parameter updates that have been shaped by these interference observations. While this does not eliminate all bias or variance relative to a true centralized critic, the design still yields more stable training and higher performance than both non-coordinated and standard CTDE baselines in our experiments. To further address the concern we have added (i) a discussion of the approximation error and (ii) an ablation study that varies gossip frequency and reports the resulting changes in value-estimate variance and final network metrics. These additions make the source of the reported gains more transparent. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents FedCritic as a serverless federated multi-agent actor-critic method that applies gossip-based averaging over the interference graph to enable decentralized critic updates. This builds directly on standard actor-critic and federated learning primitives without any quoted equations or steps that reduce a claimed prediction or result back to a fitted parameter or self-referential definition. Performance claims rest on simulation comparisons to non-coordinated and CTDE baselines rather than on a closed derivation loop. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are evident in the provided text that would force the central result by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review performed on abstract only; specific free parameters such as learning rates or virtual-queue weights are not detailed in available text.

axioms (1)

domain assumption Multi-agent actor-critic reinforcement learning is suitable for joint subcarrier scheduling and power allocation under interference coupling.
The FedCritic construction relies on this established approach in wireless resource management literature.

invented entities (1)

FedCritic no independent evidence
purpose: Serverless federated multi-agent actor-critic framework enabling decentralized critic learning via gossip averaging.
This is the name and core contribution of the proposed method.

pith-pipeline@v0.9.0 · 5746 in / 1422 out tokens · 78563 ms · 2026-05-21T05:11:39.198096+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1]

6G cellular networks: Ma pping the landscape for the IMT-2030 framework,

E. Hossain and A. V era-Rivera, “6G cellular networks: Ma pping the landscape for the IMT-2030 framework,” IEEE Trans. Technol. Soc. , vol. 6, no. 4, pp. 377–392, Dec. 2025

work page 2030
[2]

Operator’s perspective on 6G: 6G services, vision, and spectrum,

M. Na et al., “Operator’s perspective on 6G: 6G services, vision, and spectrum,” IEEE Commun. Mag., vol. 62, no. 8, pp. 178–184, Aug. 2024

work page 2024
[3]

Time synchronous OFDMA for dense wireless access in open-RAN,

F. Mazzenga and A. Vizzarri, “Time synchronous OFDMA for dense wireless access in open-RAN,” IEEE Commun. Lett. , vol. 30, pp. 66–70, 2026

work page 2026
[4]

Interference coordination for autonomo us small cell networks based on distributed learning,

Y . Wang et al., “Interference coordination for autonomo us small cell networks based on distributed learning,” in Proc. IEEE Int. Conf. Commun. (ICC) , Dublin, Ireland, 2020, pp. 1–6

work page 2020
[5]

Interference burden in wireless communications: A comprehensive survey from PHY layer pers pective,

A. Tusha and H. Arslan, “Interference burden in wireless communications: A comprehensive survey from PHY layer pers pective,” IEEE Commun. Surv. Tutor ., vol. 27, no. 4, pp. 2204–2246, Aug. 2025

work page 2025
[6]

Multi-agent reinforcement learning f or resources allocation optimization: a survey,

M. A. Hady et al., “Multi-agent reinforcement learning f or resources allocation optimization: a survey,” Artif. Intell. Rev. , vol. 58, Art. no. 354, Nov. 2025

work page 2025
[7]

A collaborative mu lti-agent deep reinforcement learning-based wireless power allocation w ith centralized training and decentralized execution,

A. Kopic, E. Perenda, and H. Gacanin, “A collaborative mu lti-agent deep reinforcement learning-based wireless power allocation w ith centralized training and decentralized execution,” IEEE Trans. Commun. , vol. 72, no. 11, pp. 7006–7016, Nov. 2024

work page 2024
[8]

Knowledge distillation-based MAPPO app roach of wireless power and spectrum resource joint allocation for 6 G networks,

Y . Zhou et al., “Knowledge distillation-based MAPPO app roach of wireless power and spectrum resource joint allocation for 6 G networks,” in Proc. IEEE Wirel. Commun. Netw. Conf. (WCNC) , Milan, Italy, 2025, pp. 1–6

work page 2025
[9]

Multi-agent reinforcement l earning based distributed multi-user scheduling and beamforming design in multi-cell systems,

S. Bai, Z. Gao, and X. Liao, “Multi-agent reinforcement l earning based distributed multi-user scheduling and beamforming design in multi-cell systems,” IEEE Trans. V eh. Technol. , vol. 74, no. 3, pp. 4432–4444, Mar. 2025

work page 2025
[10]

Federated reinforcement learning for wireless networks: Fundamentals, challenges and future research tr ends,

S. K. Das et al., “Federated reinforcement learning for wireless networks: Fundamentals, challenges and future research tr ends,” IEEE Open J. V eh. Technol., vol. 5, pp. 1400–1440, 2024

work page 2024
[11]

Federated d eep reinforcement learning for the distributed control of nextG wireless netw orks,

P . Tehrani, F. Restuccia, and M. Levorato, “Federated d eep reinforcement learning for the distributed control of nextG wireless netw orks,” in Proc. IEEE Int. Symp. Dyn. Spectr . Access Netw. (DySPAN) , Los Angeles, CA, USA, 2021, pp. 248–253

work page 2021
[12]

Decentralized federated reinforcement learning for user-centric dynamic TFDD control,

Z. Yin et al., “Decentralized federated reinforcement learning for user-centric dynamic TFDD control,” IEEE J. Sel. Topics Signal Process., vol. 17, no. 1, pp. 40–53, Jan. 2023

work page 2023
[13]

MADRL-based uplink joint resource bloc k allocation and power control in multi-cell systems,

Y . Y ang et al., “MADRL-based uplink joint resource bloc k allocation and power control in multi-cell systems,” in Proc. IEEE Wirel. Commun. Netw. Conf. (WCNC) , Glasgow, United Kingdom, 2023, pp. 1–6

work page 2023
[14]

Multi-agent reinforcement learning for d ynamic resource management in 6G in-X subnetworks,

X. Du et al., “Multi-agent reinforcement learning for d ynamic resource management in 6G in-X subnetworks,” IEEE Trans. Wirel. Commun. , vol. 22, no. 3, pp. 1900–1914, Mar. 2023

work page 1900
[15]

Trafﬁc-aware cellular user associ ation via multi-agent reinforcement learning,

Y . Zhang and D. Guo, “Trafﬁc-aware cellular user associ ation via multi-agent reinforcement learning,” in Proc. IEEE 102nd V eh. Technol. Conf. (VTC2025-Fall), Chengdu, China, 2025, pp. 1–5

work page 2025
[16]

An ofﬂine multi-agent reinforc ement learning framework for radio resource management,

E. Eldeeb and H. Alves, “An ofﬂine multi-agent reinforc ement learning framework for radio resource management,” IEEE Trans. Mobile Comput., vol. 25, no. 1, pp. 1137–1150, Jan. 2026

work page 2026
[17]

Optimized power allocation in multi-cell 4G/5G systems using multi-agent deep reinfor cement learning,

N. Y . Mitsuishi, Y . Ma, and J. B. Coder, “Optimized power allocation in multi-cell 4G/5G systems using multi-agent deep reinfor cement learning,” in Proc. IEEE 102nd V eh. Technol. Conf. (VTC-Fall) , Chengdu, China, 2025, pp. 1–6

work page 2025
[18]

S-DIGing: A stochastic gradient tracking algorithm for distributed optimization,

H. Li et al., “S-DIGing: A stochastic gradient tracking algorithm for distributed optimization,” IEEE Trans. Emerg. Top. Comput. Intell. , vol. 6, no. 1, pp. 53–65, Feb. 2022

work page 2022

[1] [1]

6G cellular networks: Ma pping the landscape for the IMT-2030 framework,

E. Hossain and A. V era-Rivera, “6G cellular networks: Ma pping the landscape for the IMT-2030 framework,” IEEE Trans. Technol. Soc. , vol. 6, no. 4, pp. 377–392, Dec. 2025

work page 2030

[2] [2]

Operator’s perspective on 6G: 6G services, vision, and spectrum,

M. Na et al., “Operator’s perspective on 6G: 6G services, vision, and spectrum,” IEEE Commun. Mag., vol. 62, no. 8, pp. 178–184, Aug. 2024

work page 2024

[3] [3]

Time synchronous OFDMA for dense wireless access in open-RAN,

F. Mazzenga and A. Vizzarri, “Time synchronous OFDMA for dense wireless access in open-RAN,” IEEE Commun. Lett. , vol. 30, pp. 66–70, 2026

work page 2026

[4] [4]

Interference coordination for autonomo us small cell networks based on distributed learning,

Y . Wang et al., “Interference coordination for autonomo us small cell networks based on distributed learning,” in Proc. IEEE Int. Conf. Commun. (ICC) , Dublin, Ireland, 2020, pp. 1–6

work page 2020

[5] [5]

Interference burden in wireless communications: A comprehensive survey from PHY layer pers pective,

A. Tusha and H. Arslan, “Interference burden in wireless communications: A comprehensive survey from PHY layer pers pective,” IEEE Commun. Surv. Tutor ., vol. 27, no. 4, pp. 2204–2246, Aug. 2025

work page 2025

[6] [6]

Multi-agent reinforcement learning f or resources allocation optimization: a survey,

M. A. Hady et al., “Multi-agent reinforcement learning f or resources allocation optimization: a survey,” Artif. Intell. Rev. , vol. 58, Art. no. 354, Nov. 2025

work page 2025

[7] [7]

A collaborative mu lti-agent deep reinforcement learning-based wireless power allocation w ith centralized training and decentralized execution,

A. Kopic, E. Perenda, and H. Gacanin, “A collaborative mu lti-agent deep reinforcement learning-based wireless power allocation w ith centralized training and decentralized execution,” IEEE Trans. Commun. , vol. 72, no. 11, pp. 7006–7016, Nov. 2024

work page 2024

[8] [8]

Knowledge distillation-based MAPPO app roach of wireless power and spectrum resource joint allocation for 6 G networks,

Y . Zhou et al., “Knowledge distillation-based MAPPO app roach of wireless power and spectrum resource joint allocation for 6 G networks,” in Proc. IEEE Wirel. Commun. Netw. Conf. (WCNC) , Milan, Italy, 2025, pp. 1–6

work page 2025

[9] [9]

Multi-agent reinforcement l earning based distributed multi-user scheduling and beamforming design in multi-cell systems,

S. Bai, Z. Gao, and X. Liao, “Multi-agent reinforcement l earning based distributed multi-user scheduling and beamforming design in multi-cell systems,” IEEE Trans. V eh. Technol. , vol. 74, no. 3, pp. 4432–4444, Mar. 2025

work page 2025

[10] [10]

Federated reinforcement learning for wireless networks: Fundamentals, challenges and future research tr ends,

S. K. Das et al., “Federated reinforcement learning for wireless networks: Fundamentals, challenges and future research tr ends,” IEEE Open J. V eh. Technol., vol. 5, pp. 1400–1440, 2024

work page 2024

[11] [11]

Federated d eep reinforcement learning for the distributed control of nextG wireless netw orks,

P . Tehrani, F. Restuccia, and M. Levorato, “Federated d eep reinforcement learning for the distributed control of nextG wireless netw orks,” in Proc. IEEE Int. Symp. Dyn. Spectr . Access Netw. (DySPAN) , Los Angeles, CA, USA, 2021, pp. 248–253

work page 2021

[12] [12]

Decentralized federated reinforcement learning for user-centric dynamic TFDD control,

Z. Yin et al., “Decentralized federated reinforcement learning for user-centric dynamic TFDD control,” IEEE J. Sel. Topics Signal Process., vol. 17, no. 1, pp. 40–53, Jan. 2023

work page 2023

[13] [13]

MADRL-based uplink joint resource bloc k allocation and power control in multi-cell systems,

Y . Y ang et al., “MADRL-based uplink joint resource bloc k allocation and power control in multi-cell systems,” in Proc. IEEE Wirel. Commun. Netw. Conf. (WCNC) , Glasgow, United Kingdom, 2023, pp. 1–6

work page 2023

[14] [14]

Multi-agent reinforcement learning for d ynamic resource management in 6G in-X subnetworks,

X. Du et al., “Multi-agent reinforcement learning for d ynamic resource management in 6G in-X subnetworks,” IEEE Trans. Wirel. Commun. , vol. 22, no. 3, pp. 1900–1914, Mar. 2023

work page 1900

[15] [15]

Trafﬁc-aware cellular user associ ation via multi-agent reinforcement learning,

Y . Zhang and D. Guo, “Trafﬁc-aware cellular user associ ation via multi-agent reinforcement learning,” in Proc. IEEE 102nd V eh. Technol. Conf. (VTC2025-Fall), Chengdu, China, 2025, pp. 1–5

work page 2025

[16] [16]

An ofﬂine multi-agent reinforc ement learning framework for radio resource management,

E. Eldeeb and H. Alves, “An ofﬂine multi-agent reinforc ement learning framework for radio resource management,” IEEE Trans. Mobile Comput., vol. 25, no. 1, pp. 1137–1150, Jan. 2026

work page 2026

[17] [17]

Optimized power allocation in multi-cell 4G/5G systems using multi-agent deep reinfor cement learning,

N. Y . Mitsuishi, Y . Ma, and J. B. Coder, “Optimized power allocation in multi-cell 4G/5G systems using multi-agent deep reinfor cement learning,” in Proc. IEEE 102nd V eh. Technol. Conf. (VTC-Fall) , Chengdu, China, 2025, pp. 1–6

work page 2025

[18] [18]

S-DIGing: A stochastic gradient tracking algorithm for distributed optimization,

H. Li et al., “S-DIGing: A stochastic gradient tracking algorithm for distributed optimization,” IEEE Trans. Emerg. Top. Comput. Intell. , vol. 6, no. 1, pp. 53–65, Feb. 2022

work page 2022