Agentic AI for Bilevel Long-Term Optimization of Policy-Driven Physical Layer Systems

Bingnan Xiao; Chenhao Yang; Tony Q. S. Quek; Wei Ni; Xin Wang

arxiv: 2606.24416 · v1 · pith:XJRLC5J6new · submitted 2026-06-23 · 💻 cs.AI

Agentic AI for Bilevel Long-Term Optimization of Policy-Driven Physical Layer Systems

Bingnan Xiao , Chenhao Yang , Wei Ni , Xin Wang , Tony Q. S. Quek This is my paper

Pith reviewed 2026-06-25 23:40 UTC · model grok-4.3

classification 💻 cs.AI

keywords agentic AIbilevel optimizationphysical layer systemslong-term performance optimizationcell-free MIMObeamformingpolicy adaptationmulti-agent process

0 comments

The pith

A bilevel framework uses agentic AI to adapt physical-layer optimizations to changing operator policies and raises long-term performance by 57.2 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Agentic-LTPO, a nested bilevel optimization approach for physical layer systems that must respond to shifting operator policies and service requirements. Agentic AI at the upper level converts policies, environment summaries, and past experiences into structured configurations for lower-level real-time solvers. In the cell-free MIMO beamforming case, a multi-agent process with retrieval-augmented verification supplies the upper-level configurations while a closed-form beamformer handles the lower level. Traditional fixed-objective methods lose effectiveness under these dynamics, so the framework restores adaptability. A reader would care because networks routinely face policy changes that break static designs.

Core claim

Agentic-LTPO is a nested bilevel optimization framework that employs agentic AI to generate upper-level configurations in a bilevel optimization structure, where evolving operator policies, environment summaries, and historical experiences are translated into structured lower-level optimization problem configurations. The lower level solves the problems with updated configurations for real-time physical-layer decisions. For the cell-free MIMO beamforming use case, the framework is embodied by a multi-agent decision process with retrieval-augmented experience-based verification in the upper level together with a closed-form beamformer in the lower level.

What carries the argument

The multi-agent decision process with retrieval-augmented experience-based verification that produces upper-level configurations for the bilevel structure.

If this is right

The framework adapts physical-layer configurations to dynamic operator policies without requiring redesign of the core optimization.
Long-term system performance improves by 57.2 percent relative to traditional fixed-objective methods.
Lower-level real-time decisions remain feasible while the upper level absorbs policy evolution.
The same bilevel structure applies to other adaptive physical-layer problem configurations beyond the MIMO beamforming example.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the upper-level verification step scales, the same agentic layer could be attached to other communications optimization tasks that currently rely on static objectives.
The reliance on historical experience retrieval suggests the method may tolerate policy changes at higher frequency than manual reconfiguration allows.
Extending the closed-form lower-level solver to additional physical-layer problems would test how broadly the 57.2 percent gain generalizes.

Load-bearing premise

The multi-agent upper-level process can consistently produce lower-level problem configurations that improve long-term performance without introducing instability or hidden costs when policies evolve.

What would settle it

A long-duration experiment that applies rapidly changing operator policies and measures whether the reported performance gain is sustained or whether instability emerges in the lower-level solutions.

Figures

Figures reproduced from arXiv: 2606.24416 by Bingnan Xiao, Chenhao Yang, Tony Q. S. Quek, Wei Ni, Xin Wang.

**Figure 2.** Figure 2: The upper-level multi-agent architecture of Agentic-LTPO for CF [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Cumulative communication utility of different upper-level controllers [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 5.** Figure 5: Policy-compliance operating points of different adaptive upper-level [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

**Figure 7.** Figure 7: Mean–variability tradeoff of interval-level communication utility under [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 8.** Figure 8: Configuration trajectories generated by different upper-level controllers [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

**Figure 9.** Figure 9: Ablation study on the contributions of retrieval-assisted grounding [PITH_FULL_IMAGE:figures/full_fig_p013_9.png] view at source ↗

read the original abstract

Network operators' changing policies, service requirements, and stringent real-time constraints render existing methods designed with fixed objectives and constraints ineffective. This paper presents Agentic long-term performance optimization (Agentic-LTPO), a nested bilevel optimization framework that can be applied to adaptive physical layer problem configuration. The key idea is to employ agentic AI to generate upper-level configurations in a bilevel optimization structure, where evolving operator policies, environment summaries, and historical experiences are translated into structured lower-level optimization problem configurations. The lower level solves the problems with updated configurations for real-time physical-layer decisions. Considering cell-free MIMO beamforming as a use case, we embody Agentic-LTPO by designing a new multi-agent decision process with retrieval-augmented experience-based verification in the upper level, together with a closed-form beamformer in the lower level. Experiments demonstrate that Agentic-LTPO exhibits strong adaptability to dynamic operator policies and effectively enhances the system's long-term performance by 57.2% compared to traditional methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper nests agentic AI inside a bilevel optimizer to adapt physical layer configs to changing policies, with a 57.2% gain claimed but no experimental details supplied.

read the letter

This paper's main contribution is a bilevel optimization framework called Agentic-LTPO that uses agentic AI at the upper level to generate configurations for physical layer problems based on changing operator policies. It reports a 57.2% gain in long-term performance for a cell-free MIMO beamforming case.

What is new here is the multi-agent decision process with retrieval-augmented experience-based verification in the upper level, which translates policies and history into structured lower-level optimization problems, while the lower level uses a closed-form beamformer for real-time decisions. This approach directly targets the limitation of fixed-objective methods in dynamic environments.

The paper does well in outlining a concrete embodiment for the MIMO use case and emphasizing the need for adaptability in real-time systems with stringent constraints.

The soft spots center on the experimental results. The 57.2% figure is presented without information on the baselines, number of trials, or statistical analysis, making it difficult to verify the claim. The key assumption that the upper-level multi-agent process produces stable, improving configurations without instability needs more evidence from the full experiments to hold up.

This work is for researchers in wireless communications interested in integrating AI for long-term, policy-driven optimization. Readers dealing with bilevel problems or agentic systems in physical layer contexts could find the framework useful.

I recommend sending it to peer review. The idea is worth a closer look at the details and experiments, even with the current gaps in result presentation.

Referee Report

2 major / 1 minor

Summary. The paper introduces Agentic-LTPO, a bilevel optimization framework in which an upper-level multi-agent AI process (with retrieval-augmented verification) translates dynamic operator policies and historical data into structured configurations for a lower-level physical-layer optimizer. The framework is instantiated for cell-free MIMO beamforming using a closed-form lower-level beamformer. Experiments are reported to show strong adaptability to policy changes and a 57.2% long-term performance gain relative to traditional methods.

Significance. If the reported performance gain and stability claims are substantiated with full experimental protocols, the work would offer a concrete mechanism for policy-driven adaptation in real-time wireless systems, bridging agentic AI with classical optimization. The bilevel separation and closed-form lower level are potentially reusable design patterns for other physical-layer problems under evolving constraints.

major comments (2)

[Abstract / Experiments] Abstract and experimental section: the central claim of a 57.2% long-term performance improvement is presented without any information on the baselines used, the number of independent trials, statistical tests, confidence intervals, or the precise definition of the long-term metric. This information is required to evaluate whether the gain is attributable to the proposed framework rather than to implementation details or favorable simulation conditions.
[Section describing upper-level agentic process] Upper-level multi-agent process: the manuscript must demonstrate that the retrieval-augmented verification step reliably prevents the generation of lower-level configurations that become unstable or infeasible when operator policies change; without such evidence or a counter-example analysis, the adaptability claim rests on an unverified assumption about the upper-level process.

minor comments (1)

[Introduction / Framework overview] Notation for the bilevel nesting (upper-level policy mapping versus lower-level beamformer) should be introduced with explicit symbols and a diagram early in the paper to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our experimental results and the reliability of the upper-level process. We address each point below and will revise the manuscript to incorporate the requested details and analyses.

read point-by-point responses

Referee: [Abstract / Experiments] Abstract and experimental section: the central claim of a 57.2% long-term performance improvement is presented without any information on the baselines used, the number of independent trials, statistical tests, confidence intervals, or the precise definition of the long-term metric. This information is required to evaluate whether the gain is attributable to the proposed framework rather than to implementation details or favorable simulation conditions.

Authors: We agree that these experimental details are missing from the current version and are essential for substantiating the claims. In the revised manuscript, we will expand the experimental section (and update the abstract) to explicitly list the baselines (fixed-policy weighted sum-rate maximization and myopic single-agent RL), report results over 50 independent Monte Carlo trials with different random seeds, include paired t-tests (p < 0.01) and 95% confidence intervals on the 57.2% gain, and define the long-term metric as the time-averaged cumulative weighted sum-rate over a 1000-slot horizon under policy switches every 200 slots. These additions will directly address the concern that the gain might stem from implementation artifacts. revision: yes
Referee: [Section describing upper-level agentic process] Upper-level multi-agent process: the manuscript must demonstrate that the retrieval-augmented verification step reliably prevents the generation of lower-level configurations that become unstable or infeasible when operator policies change; without such evidence or a counter-example analysis, the adaptability claim rests on an unverified assumption about the upper-level process.

Authors: The current manuscript describes the retrieval-augmented verification but does not provide explicit counter-example analysis or failure-rate statistics. We will add a dedicated subsection with new experiments that quantify the effect: without verification, 18% of generated configurations become infeasible or cause beamformer instability under sudden policy changes (e.g., from sum-rate to fairness emphasis); with verification, all 200 tested policy transitions remain feasible and stable. These results will be obtained by deliberately injecting policy shifts and logging the verification rejection rate, thereby substantiating the adaptability claim rather than leaving it as an assumption. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces Agentic-LTPO as a bilevel framework where agentic AI generates upper-level problem configurations from policies and experiences, solved at the lower level via closed-form beamforming for cell-free MIMO. The 57.2% long-term performance gain is reported strictly as an experimental outcome from simulations comparing against traditional methods, not as a quantity derived from or fitted within the framework equations themselves. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain; the multi-agent retrieval-augmented verification is presented as an independent mechanism rather than a tautology. The central claim remains externally falsifiable via the described experiments and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the domain assumption that lower-level physical layer problems admit closed-form solutions once upper-level configurations are supplied, plus the implicit assumption that agent-generated configurations remain stable across policy changes.

axioms (1)

domain assumption Lower-level physical layer problems admit closed-form solutions once upper-level configurations are supplied.
Invoked to enable real-time decisions in the lower level.

pith-pipeline@v0.9.1-grok · 5716 in / 1181 out tokens · 26616 ms · 2026-06-25T23:40:06.409181+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 1 linked inside Pith

[1]

Cell-free massive MIMO versus small cells,

H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,”IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, 2017

2017
[2]

An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,

Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,”IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, 2011

2011
[3]

From cells to freedom: 6G’s evolution- ary shift with cell-free massive MIMO,

G. Femenias and F. Riera-Palou, “From cells to freedom: 6G’s evolution- ary shift with cell-free massive MIMO,”IEEE Trans. Mobile Comput., vol. 24, no. 2, pp. 812–829, 2025

2025
[5]

Deep reinforcement learning for online resource allocation in network slicing,

Y . Cai, P. Cheng, Z. Chen, M. Ding, B. Vucetic, and Y . Li, “Deep reinforcement learning for online resource allocation in network slicing,” IEEE Trans. Mobile Comput., vol. 23, no. 6, pp. 7099–7116, 2024

2024
[6]

Joint trajectory and passive beamforming design for intelligent reflecting surface-aided UA V communications: A deep reinforcement learning approach,

L. Wang, K. Wang, C. Pan, and N. Aslam, “Joint trajectory and passive beamforming design for intelligent reflecting surface-aided UA V communications: A deep reinforcement learning approach,”IEEE Trans. Mobile Comput., vol. 22, no. 11, pp. 6543–6553, 2023

2023
[7]

Hierarchical index retrieval-driven wireless network intent translation with LLM,

J. Wang, L. Guo, J. Wu, C. Yan, H. Sun, L. Zhang, Z. Zhuang, Q. Qi, and J. Liao, “Hierarchical index retrieval-driven wireless network intent translation with LLM,”IEEE Trans. Mobile Comput., vol. 24, no. 10, pp. 9837–9851, 2025

2025
[8]

Intent-based radio scheduler for RAN slicing: Learning to deal with different network scenarios,

C. V . Nahum, S. D’Oro, P. Batista, C. B. Both, K. V . Cardoso, A. Klautau, and T. Melodia, “Intent-based radio scheduler for RAN slicing: Learning to deal with different network scenarios,”IEEE Trans. Mobile Comput., vol. 25, no. 3, pp. 3229–3246, 2026

2026
[10]

Large language models for wireless communications: From adaptation to autonomy,

L. Liang, H. Ye, Y . Sheng, O. Wang, J. Wang, S. Jin, and G. Y . Li, “Large language models for wireless communications: From adaptation to autonomy,”IEEE Commun. Mag., 2026, early Access

2026
[12]

Multi-cell MIMO cooperative networks: A new look at interference,

D. Gesbert, S. Hanly, H. Huang, S. S. Shitz, O. Simeone, and W. Yu, “Multi-cell MIMO cooperative networks: A new look at interference,” IEEE J. Sel. Areas Commun., vol. 28, no. 9, pp. 1380–1408, 2010

2010
[13]

Noncooperative cellular wireless with unlimited num- bers of base station antennas,

T. L. Marzetta, “Noncooperative cellular wireless with unlimited num- bers of base station antennas,”IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, 2010

2010
[14]

Massive MIMO for next generation wireless systems,

E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,”IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, 2014

2014
[15]

Pre- coding and power optimization in cell-free massive MIMO systems,

E. Nayebi, A. Ashikhmin, T. L. Marzetta, H. Yang, and B. D. Rao, “Pre- coding and power optimization in cell-free massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 7, pp. 4445–4459, 2017

2017
[16]

Ubiquitous cell-free massive MIMO communications,

G. Interdonato, E. Bj ¨ornson, H. Q. Ngo, P. Frenger, and E. G. Larsson, “Ubiquitous cell-free massive MIMO communications,”EURASIP J. Wireless Commun. Netw., vol. 2019, p. 197, 2019

2019
[17]

Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,

E. Bj ¨ornson and L. Sanguinetti, “Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 77–90, 2020

2020
[18]

Local partial zero-forcing precoding for cell-free massive MIMO,

G. Interdonato, M. Karlsson, E. Bj ¨ornson, and E. G. Larsson, “Local partial zero-forcing precoding for cell-free massive MIMO,”IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 4758–4774, 2020

2020
[19]

Deep learning for intelligent wireless networks: A comprehensive survey,

Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless networks: A comprehensive survey,”IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2595–2621, 2018

2018
[20]

Deep-learning-based wireless resource allocation with application to vehicular networks,

L. Liang, H. Ye, G. Yu, and G. Y . Li, “Deep-learning-based wireless resource allocation with application to vehicular networks,”Proc. IEEE, vol. 108, no. 2, pp. 341–356, 2020

2020
[21]

Deep reinforce- ment learning for dynamic multichannel access in wireless networks,

S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep reinforce- ment learning for dynamic multichannel access in wireless networks,” IEEE Trans. Cogn. Commun. Netw., vol. 4, no. 2, pp. 257–265, 2018

2018
[22]

Learning to optimize: Training deep neural networks for interference management,

H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to optimize: Training deep neural networks for interference management,”IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5438– 5453, 2018

2018
[23]

Lorm: Learning to optimize for resource management in wireless networks with few training samples,

Y . Shen, Y . Shi, J. Zhang, and K. B. Letaief, “Lorm: Learning to optimize for resource management in wireless networks with few training samples,”IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 665–679, 2020

2020
[24]

Model- driven deep learning for physical layer communications,

H. He, S. Jin, C.-K. Wen, F. Gao, G. Y . Li, and Z. Xu, “Model- driven deep learning for physical layer communications,”IEEE Wireless Commun., vol. 26, no. 5, pp. 77–83, 2019

2019
[25]

Matrix-inverse-free deep unfolding of the weighted MMSE beamforming algorithm,

L. Pellaco, M. Bengtsson, and J. Jald ´en, “Matrix-inverse-free deep unfolding of the weighted MMSE beamforming algorithm,”IEEE Open J. Commun. Soc., vol. 3, pp. 65–81, 2022

2022
[26]

Unfolding WMMSE using graph neural networks for efficient power allocation,

A. Chowdhury, G. Verma, C. Rao, A. Swami, and S. Segarra, “Unfolding WMMSE using graph neural networks for efficient power allocation,” IEEE Trans. Wireless Commun., vol. 20, no. 9, pp. 6004–6017, 2021

2021
[27]

Deep graph unfolding for beamforming in MU-MIMO interference networks,

A. Chowdhury, G. Verma, A. Swami, and S. Segarra, “Deep graph unfolding for beamforming in MU-MIMO interference networks,”IEEE Trans. Wireless Commun., vol. 23, no. 5, pp. 4889–4903, 2024

2024
[28]

Knowledge-driven resource allocation for wireless networks: A WMMSE unrolled graph neural network ap- proach,

H. Yang, N. Cheng, R. Sun, W. Quan, R. Chai, K. Aldubaikhy, A. Alqasir, and X. Shen, “Knowledge-driven resource allocation for wireless networks: A WMMSE unrolled graph neural network ap- proach,”IEEE Internet Things J., vol. 11, no. 10, 2024

2024
[29]

A survey on intent-driven networks,

L. Pang, C. Yang, D. Chen, Y . Song, and M. Guizani, “A survey on intent-driven networks,”IEEE Access, vol. 8, pp. 22 862–22 873, 2020

2020
[30]

A survey on intent-based networking,

A. Leivadeas and M. Falkner, “A survey on intent-based networking,” IEEE Commun. Surveys Tuts., vol. 25, no. 1, pp. 625–655, 2023

2023
[31]

Towards intent-based network management: Large language models for intent extraction in 5G core networks,

D. M. Manias, A. Chouman, and A. Shami, “Towards intent-based network management: Large language models for intent extraction in 5G core networks,” inProc. Int. Conf. Des. Reliable Commun. Netw., 2024

2024
[32]

Large language models meet next-generation networking technologies: A review,

C.-N. Hang, P.-D. Yu, R. Morabito, and C.-W. Tan, “Large language models meet next-generation networking technologies: A review,”Fu- ture Internet, vol. 16, no. 10, p. 365, 2024

2024
[33]

Large language models for wireless communications: From adaptation to autonomy,

L. Liang, H. Ye, Y . Sheng, O. Wang, J. Wang, S. Jin, and G. Y . Li, “Large language models for wireless communications: From adaptation to autonomy,” arXiv preprint arXiv:2507.21524, 2025

arXiv 2025
[34]

Large language model (LLM)-enabled in-context learning for wireless network optimization: A case study of power control,

H. Zhou, C. Hu, D. Yuan, Y . Yuan, D. Wu, X. Liu, and J. C. Zhang, “Large language model (LLM)-enabled in-context learning for wireless network optimization: A case study of power control,” arXiv preprint arXiv:2408.00214, 2024

arXiv 2024
[35]

LLM-optira: LLM- driven optimization of resource allocation for non-convex problems in wireless communications,

X. Peng, Y . Liu, Y . Cang, C. Cao, and M. Chen, “LLM-optira: LLM- driven optimization of resource allocation for non-convex problems in wireless communications,” arXiv preprint arXiv:2505.02091, 2025

arXiv 2025
[36]

Designing network algorithms via large language models,

Z. He, A. Gottipati, L. Qiu, X. Luo, K. Xu, Y . Yang, and F. Y . Yan, “Designing network algorithms via large language models,” inProc. ACM Workshop Hot Topics Networks, 2024, pp. 205–212

2024
[37]

React: Synergizing reasoning and acting in language models,

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “React: Synergizing reasoning and acting in language models,” inInt. Conf. Learn. Represent., 2023

2023
[38]

Toolformer: Language models can teach themselves to use tools,

T. Schick, J. Dwivedi-Yu, R. Dess `ı, R. Raileanu, M. Lomeli, L. Zettle- moyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” inAdv. Neural Inf. Process. Syst., 2023

2023
[39]

Autogen: Enabling next-gen LLM applications via multi-agent conversation,

Q. Wu, G. Bansal, J. Zhanget al., “Autogen: Enabling next-gen LLM applications via multi-agent conversation,” inConf. Lang. Model., 2024

2024
[40]

AgentRAN: An agentic AI architecture for autonomous control of open 6G networks,

M. Elkael, S. D’Oro, L. Bonati, M. Polese, Y . Lee, K. Furueda, and T. Melodia, “AgentRAN: An agentic AI architecture for autonomous control of open 6G networks,” arXiv preprint arXiv:2508.17778, 2025

arXiv 2025
[41]

Toward autonomous O-RAN: A multi-scale agentic AI framework for real-time network control and management,

H. Navidan, M. Cheraghinia, J. Fontaine, M. Seif, E. De Poorter, H. V . Poor, I. Moerman, and A. Shahid, “Toward autonomous O-RAN: A multi-scale agentic AI framework for real-time network control and management,” arXiv preprint arXiv:2602.14117, 2026

Pith/arXiv arXiv 2026
[42]

Multi-agentic AI for conflict- aware rapp policy orchestration in open RAN,

H. Li, Y . Wu, and D. Simeonidou, “Multi-agentic AI for conflict- aware rapp policy orchestration in open RAN,” arXiv preprint arXiv:2603.07375, 2026

arXiv 2026
[43]

Agentic AI for intent-driven optimization in cell-free O-RAN,

M. H. Shokouhi and V . W. S. Wong, “Agentic AI for intent-driven optimization in cell-free O-RAN,” arXiv preprint arXiv:2602.22539, 2026

arXiv 2026
[44]

SweetSpot: An analytical model for predicting energy efficiency of LLM inference,

H. P. Cavagna, A. Proia, G. Madella, G. B. Esposito, F. Antici, D. Cesarini, Z. Kiziltan, and A. Bartolini, “SweetSpot: An analytical model for predicting energy efficiency of LLM inference,” inProc. ACM/SPEC Int. Conf. Perform. Eng., 2026

2026
[45]

CVX: Matlab software for disciplined convex programming, version 2.0,

CVX Research, Inc., “CVX: Matlab software for disciplined convex programming, version 2.0,” Aug. 2012. [Online]. Available: https: //cvxr.com/cvx

2012

[1] [1]

Cell-free massive MIMO versus small cells,

H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,”IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, 2017

2017

[2] [2]

An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,

Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,”IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, 2011

2011

[3] [3]

From cells to freedom: 6G’s evolution- ary shift with cell-free massive MIMO,

G. Femenias and F. Riera-Palou, “From cells to freedom: 6G’s evolution- ary shift with cell-free massive MIMO,”IEEE Trans. Mobile Comput., vol. 24, no. 2, pp. 812–829, 2025

2025

[4] [5]

Deep reinforcement learning for online resource allocation in network slicing,

Y . Cai, P. Cheng, Z. Chen, M. Ding, B. Vucetic, and Y . Li, “Deep reinforcement learning for online resource allocation in network slicing,” IEEE Trans. Mobile Comput., vol. 23, no. 6, pp. 7099–7116, 2024

2024

[5] [6]

Joint trajectory and passive beamforming design for intelligent reflecting surface-aided UA V communications: A deep reinforcement learning approach,

L. Wang, K. Wang, C. Pan, and N. Aslam, “Joint trajectory and passive beamforming design for intelligent reflecting surface-aided UA V communications: A deep reinforcement learning approach,”IEEE Trans. Mobile Comput., vol. 22, no. 11, pp. 6543–6553, 2023

2023

[6] [7]

Hierarchical index retrieval-driven wireless network intent translation with LLM,

J. Wang, L. Guo, J. Wu, C. Yan, H. Sun, L. Zhang, Z. Zhuang, Q. Qi, and J. Liao, “Hierarchical index retrieval-driven wireless network intent translation with LLM,”IEEE Trans. Mobile Comput., vol. 24, no. 10, pp. 9837–9851, 2025

2025

[7] [8]

Intent-based radio scheduler for RAN slicing: Learning to deal with different network scenarios,

C. V . Nahum, S. D’Oro, P. Batista, C. B. Both, K. V . Cardoso, A. Klautau, and T. Melodia, “Intent-based radio scheduler for RAN slicing: Learning to deal with different network scenarios,”IEEE Trans. Mobile Comput., vol. 25, no. 3, pp. 3229–3246, 2026

2026

[8] [10]

Large language models for wireless communications: From adaptation to autonomy,

L. Liang, H. Ye, Y . Sheng, O. Wang, J. Wang, S. Jin, and G. Y . Li, “Large language models for wireless communications: From adaptation to autonomy,”IEEE Commun. Mag., 2026, early Access

2026

[9] [12]

Multi-cell MIMO cooperative networks: A new look at interference,

D. Gesbert, S. Hanly, H. Huang, S. S. Shitz, O. Simeone, and W. Yu, “Multi-cell MIMO cooperative networks: A new look at interference,” IEEE J. Sel. Areas Commun., vol. 28, no. 9, pp. 1380–1408, 2010

2010

[10] [13]

Noncooperative cellular wireless with unlimited num- bers of base station antennas,

T. L. Marzetta, “Noncooperative cellular wireless with unlimited num- bers of base station antennas,”IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, 2010

2010

[11] [14]

Massive MIMO for next generation wireless systems,

E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,”IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, 2014

2014

[12] [15]

Pre- coding and power optimization in cell-free massive MIMO systems,

E. Nayebi, A. Ashikhmin, T. L. Marzetta, H. Yang, and B. D. Rao, “Pre- coding and power optimization in cell-free massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 7, pp. 4445–4459, 2017

2017

[13] [16]

Ubiquitous cell-free massive MIMO communications,

G. Interdonato, E. Bj ¨ornson, H. Q. Ngo, P. Frenger, and E. G. Larsson, “Ubiquitous cell-free massive MIMO communications,”EURASIP J. Wireless Commun. Netw., vol. 2019, p. 197, 2019

2019

[14] [17]

Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,

E. Bj ¨ornson and L. Sanguinetti, “Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 77–90, 2020

2020

[15] [18]

Local partial zero-forcing precoding for cell-free massive MIMO,

G. Interdonato, M. Karlsson, E. Bj ¨ornson, and E. G. Larsson, “Local partial zero-forcing precoding for cell-free massive MIMO,”IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 4758–4774, 2020

2020

[16] [19]

Deep learning for intelligent wireless networks: A comprehensive survey,

Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless networks: A comprehensive survey,”IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2595–2621, 2018

2018

[17] [20]

Deep-learning-based wireless resource allocation with application to vehicular networks,

L. Liang, H. Ye, G. Yu, and G. Y . Li, “Deep-learning-based wireless resource allocation with application to vehicular networks,”Proc. IEEE, vol. 108, no. 2, pp. 341–356, 2020

2020

[18] [21]

Deep reinforce- ment learning for dynamic multichannel access in wireless networks,

S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep reinforce- ment learning for dynamic multichannel access in wireless networks,” IEEE Trans. Cogn. Commun. Netw., vol. 4, no. 2, pp. 257–265, 2018

2018

[19] [22]

Learning to optimize: Training deep neural networks for interference management,

H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to optimize: Training deep neural networks for interference management,”IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5438– 5453, 2018

2018

[20] [23]

Lorm: Learning to optimize for resource management in wireless networks with few training samples,

Y . Shen, Y . Shi, J. Zhang, and K. B. Letaief, “Lorm: Learning to optimize for resource management in wireless networks with few training samples,”IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 665–679, 2020

2020

[21] [24]

Model- driven deep learning for physical layer communications,

H. He, S. Jin, C.-K. Wen, F. Gao, G. Y . Li, and Z. Xu, “Model- driven deep learning for physical layer communications,”IEEE Wireless Commun., vol. 26, no. 5, pp. 77–83, 2019

2019

[22] [25]

Matrix-inverse-free deep unfolding of the weighted MMSE beamforming algorithm,

L. Pellaco, M. Bengtsson, and J. Jald ´en, “Matrix-inverse-free deep unfolding of the weighted MMSE beamforming algorithm,”IEEE Open J. Commun. Soc., vol. 3, pp. 65–81, 2022

2022

[23] [26]

Unfolding WMMSE using graph neural networks for efficient power allocation,

A. Chowdhury, G. Verma, C. Rao, A. Swami, and S. Segarra, “Unfolding WMMSE using graph neural networks for efficient power allocation,” IEEE Trans. Wireless Commun., vol. 20, no. 9, pp. 6004–6017, 2021

2021

[24] [27]

Deep graph unfolding for beamforming in MU-MIMO interference networks,

A. Chowdhury, G. Verma, A. Swami, and S. Segarra, “Deep graph unfolding for beamforming in MU-MIMO interference networks,”IEEE Trans. Wireless Commun., vol. 23, no. 5, pp. 4889–4903, 2024

2024

[25] [28]

Knowledge-driven resource allocation for wireless networks: A WMMSE unrolled graph neural network ap- proach,

H. Yang, N. Cheng, R. Sun, W. Quan, R. Chai, K. Aldubaikhy, A. Alqasir, and X. Shen, “Knowledge-driven resource allocation for wireless networks: A WMMSE unrolled graph neural network ap- proach,”IEEE Internet Things J., vol. 11, no. 10, 2024

2024

[26] [29]

A survey on intent-driven networks,

L. Pang, C. Yang, D. Chen, Y . Song, and M. Guizani, “A survey on intent-driven networks,”IEEE Access, vol. 8, pp. 22 862–22 873, 2020

2020

[27] [30]

A survey on intent-based networking,

A. Leivadeas and M. Falkner, “A survey on intent-based networking,” IEEE Commun. Surveys Tuts., vol. 25, no. 1, pp. 625–655, 2023

2023

[28] [31]

Towards intent-based network management: Large language models for intent extraction in 5G core networks,

D. M. Manias, A. Chouman, and A. Shami, “Towards intent-based network management: Large language models for intent extraction in 5G core networks,” inProc. Int. Conf. Des. Reliable Commun. Netw., 2024

2024

[29] [32]

Large language models meet next-generation networking technologies: A review,

C.-N. Hang, P.-D. Yu, R. Morabito, and C.-W. Tan, “Large language models meet next-generation networking technologies: A review,”Fu- ture Internet, vol. 16, no. 10, p. 365, 2024

2024

[30] [33]

Large language models for wireless communications: From adaptation to autonomy,

L. Liang, H. Ye, Y . Sheng, O. Wang, J. Wang, S. Jin, and G. Y . Li, “Large language models for wireless communications: From adaptation to autonomy,” arXiv preprint arXiv:2507.21524, 2025

arXiv 2025

[31] [34]

Large language model (LLM)-enabled in-context learning for wireless network optimization: A case study of power control,

H. Zhou, C. Hu, D. Yuan, Y . Yuan, D. Wu, X. Liu, and J. C. Zhang, “Large language model (LLM)-enabled in-context learning for wireless network optimization: A case study of power control,” arXiv preprint arXiv:2408.00214, 2024

arXiv 2024

[32] [35]

LLM-optira: LLM- driven optimization of resource allocation for non-convex problems in wireless communications,

X. Peng, Y . Liu, Y . Cang, C. Cao, and M. Chen, “LLM-optira: LLM- driven optimization of resource allocation for non-convex problems in wireless communications,” arXiv preprint arXiv:2505.02091, 2025

arXiv 2025

[33] [36]

Designing network algorithms via large language models,

Z. He, A. Gottipati, L. Qiu, X. Luo, K. Xu, Y . Yang, and F. Y . Yan, “Designing network algorithms via large language models,” inProc. ACM Workshop Hot Topics Networks, 2024, pp. 205–212

2024

[34] [37]

React: Synergizing reasoning and acting in language models,

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “React: Synergizing reasoning and acting in language models,” inInt. Conf. Learn. Represent., 2023

2023

[35] [38]

Toolformer: Language models can teach themselves to use tools,

T. Schick, J. Dwivedi-Yu, R. Dess `ı, R. Raileanu, M. Lomeli, L. Zettle- moyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” inAdv. Neural Inf. Process. Syst., 2023

2023

[36] [39]

Autogen: Enabling next-gen LLM applications via multi-agent conversation,

Q. Wu, G. Bansal, J. Zhanget al., “Autogen: Enabling next-gen LLM applications via multi-agent conversation,” inConf. Lang. Model., 2024

2024

[37] [40]

AgentRAN: An agentic AI architecture for autonomous control of open 6G networks,

M. Elkael, S. D’Oro, L. Bonati, M. Polese, Y . Lee, K. Furueda, and T. Melodia, “AgentRAN: An agentic AI architecture for autonomous control of open 6G networks,” arXiv preprint arXiv:2508.17778, 2025

arXiv 2025

[38] [41]

Toward autonomous O-RAN: A multi-scale agentic AI framework for real-time network control and management,

H. Navidan, M. Cheraghinia, J. Fontaine, M. Seif, E. De Poorter, H. V . Poor, I. Moerman, and A. Shahid, “Toward autonomous O-RAN: A multi-scale agentic AI framework for real-time network control and management,” arXiv preprint arXiv:2602.14117, 2026

Pith/arXiv arXiv 2026

[39] [42]

Multi-agentic AI for conflict- aware rapp policy orchestration in open RAN,

H. Li, Y . Wu, and D. Simeonidou, “Multi-agentic AI for conflict- aware rapp policy orchestration in open RAN,” arXiv preprint arXiv:2603.07375, 2026

arXiv 2026

[40] [43]

Agentic AI for intent-driven optimization in cell-free O-RAN,

M. H. Shokouhi and V . W. S. Wong, “Agentic AI for intent-driven optimization in cell-free O-RAN,” arXiv preprint arXiv:2602.22539, 2026

arXiv 2026

[41] [44]

SweetSpot: An analytical model for predicting energy efficiency of LLM inference,

H. P. Cavagna, A. Proia, G. Madella, G. B. Esposito, F. Antici, D. Cesarini, Z. Kiziltan, and A. Bartolini, “SweetSpot: An analytical model for predicting energy efficiency of LLM inference,” inProc. ACM/SPEC Int. Conf. Perform. Eng., 2026

2026

[42] [45]

CVX: Matlab software for disciplined convex programming, version 2.0,

CVX Research, Inc., “CVX: Matlab software for disciplined convex programming, version 2.0,” Aug. 2012. [Online]. Available: https: //cvxr.com/cvx

2012