pith. machine review for the scientific record.

arXiv: 2604.21587 · v1 · submitted 2026-04-23 · 💻 cs.IT · math.IT

Recognition: unknown

Generative Learning Enhanced Intelligent Resource Management for Cell-Free Delay Deterministic Communications

Cheng Zhang, Shuangbo Xiong, Wen Wang, Wenwu Yu, Yongming Huang

Pith reviewed 2026-05-08 13:43 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords cell-free MIMO · energy efficiency · constrained Markov decision process · reinforcement learning pretraining · resource allocation · delay constraints · Gaussian mixture model · PPO algorithm

The pith

A virtual constrained Markov decision process with evidence-aware Gaussian mixture modeling enables safer and more sample-efficient reinforcement learning for energy-efficient resource allocation in cell-free MIMO systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a safe reinforcement learning method for resource allocation in cell-free multiple-input multiple-output networks to maximize energy efficiency while respecting delay violation constraints. It proposes an offline pretraining framework built on a virtual constrained Markov decision process that includes separate modules for reward and cost prediction, initial state distribution, and state transitions. An evidence-aware conditional Gaussian mixture model handles sparse data and distribution shifts in the transition module. If the approach succeeds, agents can begin with substantially higher performance and lower risk before any real-system interaction occurs. A reader would care because this reduces the safety and efficiency problems that currently limit deployment of learning-based controllers in delay-sensitive wireless applications.

Core claim

By modeling the cell-free MIMO resource allocation task as a virtual constrained Markov decision process and pretraining a proximal policy optimization agent offline, the framework lets the agent reach twice the initial energy efficiency, sustain only a 1 percent delay constraint violation rate, converge to a 4.7 percent higher final energy efficiency, and cut exploration steps by half relative to a non-pretrained baseline, all while matching diffusion-model performance at one-fourteenth the computational cost.

What carries the argument

The virtual constrained Markov decision process (CMDP) whose state-transition module is realized by an evidence-aware conditional Gaussian mixture model (EA-CGMM) to support safe offline pretraining of the primal-dual PPO policy.
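
One common way to write the constrained problem that the virtual CMDP stands in for is sketched below. The discounted-sum form and every symbol are notational assumptions made here for illustration ($r_t$ for the per-step energy-efficiency reward, $c_t$ for the per-step delay-violation cost, $\delta$ for the allowed violation level, $\gamma$ for the discount factor, $\lambda$ for the dual variable); this page does not reproduce the paper's own definitions.

```latex
\begin{aligned}
\max_{\pi}\ J_R(\pi) &= \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r_t\Big]
\quad \text{s.t.} \quad
J_C(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c_t\Big] \le \delta, \\
\mathcal{L}(\pi, \lambda) &= J_R(\pi) - \lambda\,\big(J_C(\pi) - \delta\big), \qquad \lambda \ge 0, \\
(\pi^{\star}, \lambda^{\star}) &\in \arg\min_{\lambda \ge 0}\ \max_{\pi}\ \mathcal{L}(\pi, \lambda).
\end{aligned}
```

Under this reading, a primal-dual method alternates a policy-improvement step on $\mathcal{L}$ with an adjustment of $\lambda$, and pretraining would run that loop against the learned reward, cost, initial-state, and transition modules rather than the live network.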

If this is right

  • The pretrained agent begins with twice the energy efficiency of the non-pretrained baseline.
  • It maintains a delay violation rate of only 1 percent throughout learning.
  • Final converged energy efficiency is 4.7 percent higher and requires 50 percent fewer exploration steps.
  • The framework delivers performance comparable to diffusion-model pretraining at 14 times lower computational complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same virtual-CMDP construction could be applied to other constrained wireless problems such as power control or user scheduling in different network topologies.
  • If the EA-CGMM successfully captures distribution drift, the method may reduce the volume of real-world interaction data needed for safe learning in time-varying channels.
  • The observed reduction in exploration steps suggests the approach could scale to larger cell-free deployments where online trial-and-error would be prohibitively costly or disruptive.
  • Hybrid offline-online training pipelines that begin with this style of virtual pretraining might become a practical route for deploying learning agents in live 6G-style networks.

Load-bearing premise

The virtual CMDP faithfully reproduces the dynamics of the real cell-free MIMO system so that performance gains observed in simulation transfer to actual deployments without large degradation.

What would settle it

Running the pretrained agent on a physical cell-free MIMO testbed and measuring whether its initial energy efficiency is at least twice that of the cold-start baseline while keeping delay violations at or below 1 percent; a clear shortfall in either metric would falsify the transfer claim.
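
The acceptance test described above can be written down as a small check. This is a minimal sketch: the function names, the use of per-packet delays, and the definition of the violation rate as the fraction of packets that miss a deadline are assumptions for illustration, not definitions taken from the paper.

```python
def empirical_violation_rate(delays_ms, deadline_ms):
    """Fraction of observed packet delays that exceed the deadline (assumed definition)."""
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    return sum(1 for d in delays_ms if d > deadline_ms) / len(delays_ms)


def transfer_claim_holds(ee_pretrained, ee_cold_start, violation_rate,
                         min_gain=2.0, max_violation=0.01):
    """True only if both thresholds quoted in the review text are met."""
    ee_ok = ee_pretrained >= min_gain * ee_cold_start
    delay_ok = violation_rate <= max_violation
    return ee_ok and delay_ok


if __name__ == "__main__":
    # Hypothetical testbed measurements, not data from the paper.
    rate = empirical_violation_rate([0.4, 0.7, 0.9, 1.3], deadline_ms=1.0)
    print(rate, transfer_claim_holds(2.1, 1.0, rate))
```

A shortfall in either condition makes the function return `False`, which mirrors the "clear shortfall in either metric" wording of the falsification criterion.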

Figures

Figures reproduced from arXiv: 2604.21587 by Cheng Zhang, Shuangbo Xiong, Wen Wang, Wenwu Yu, Yongming Huang.

Figure 1: Proposed pretraining framework based on CMDP modeling.
Figure 3: The structure of VAE-ChMDN. To address these limitations, we employ our Cholesky decomposition-based VAE-Mixture Density Network (VAE-ChMDN) [49] that integrates VAE with MDN featuring full-covariance GMM modeling. This hybrid architecture employs a novel Cholesky-decomposition based training mechanism that simultaneously ensures the validity of covariance matrix and numerical stability during training pro…
Figure 4: The joint distribution over $(s_{t+1}, w_t)$ is first modeled by a three-component GMM. After marginalizing $w_t$, component-wise density evaluation is performed. Using the Mahalanobis distance in Eq. (34) together with the chi-squared distribution property, unreliable components (e.g., component 2) are filtered out, and the mixture weights are uniformly redistributed among the remaining components. The conditional…
Figure 5: Partial top view of the “O1” scenario [51], showing selected APs (6, 17, 18) and constrained UE mobility within User Grid 3. The selected APs and User Grid constraint are marked within red boxes. B. Performance of Module Implementations in Virtual CMDP: In this subsection, our proposed virtual CMDP module implementations are evaluated and compared with corresponding SOTA and baseline methods. Specifically,…
Figure 6: Comparison of prediction residual distributions between KAN and MLP architectures, where horizontal axis represents ground truth and vertical axis…
Figure 7: Visualization of conditional generative modeling on a toy example using two half-moon benchmark, where horizontal axis represents target condition…
Figure 8: Policy evolution during pretraining phase: (a) Average EE across…
Figure 11: Performance comparison of pretrained and non-pretrained PPO…
Figure 9: Learning curves comparison of pretrained agents and non-pretrained…
Figure 10: Log inference FLOPs of each module versus system scale…
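
The text attached to Figure 3 mentions a Cholesky-based mechanism that keeps mixture covariances valid. A minimal sketch of the general idea follows, for a single 2x2 covariance: the network is imagined to emit three unconstrained numbers, which are mapped to a lower-triangular factor with a positive diagonal. The softplus mapping, the `eps` floor, and all names are assumptions for illustration and may differ from what VAE-ChMDN actually does.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); positive for any finite x in practical ranges."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def covariance_from_raw(a, b, c, eps=1e-6):
    """Map unconstrained outputs (a, b, c) to Sigma = L @ L.T with
    L = [[l11, 0], [l21, l22]] and a strictly positive diagonal."""
    l11 = softplus(a) + eps
    l21 = b
    l22 = softplus(c) + eps
    return [[l11 * l11, l11 * l21],
            [l11 * l21, l21 * l21 + l22 * l22]]


def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] > 0.0 and det > 0.0


if __name__ == "__main__":
    sigma = covariance_from_raw(0.3, -1.2, 0.0)
    print(sigma, is_positive_definite_2x2(sigma))
```

Because the determinant of `Sigma` equals `(l11 * l22) ** 2`, any raw output yields a symmetric positive-definite matrix by construction, so no projection or clipping of the covariance itself is needed during training.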

Original abstract

The cell-free multiple-input multiple-output (CF-MIMO) architecture significantly enhances wireless network performance, offering a promising solution for delay-sensitive applications. This paper investigates the resource allocation problem in CF-MIMO systems, aiming to maximize energy efficiency (EE) while satisfying a delay violation rate constraint. We design a Proximal Policy Optimization (PPO) algorithm with a primal-dual method to solve it. To address the low sample efficiency and safety risks caused by the cold start of the designed safe deep reinforcement learning (DRL) method, we propose a novel offline pretraining framework based on virtual constrained Markov decision process (CMDP) modeling. The virtual CMDP consists of a reward and cost prediction module, an initial-state distribution module, and a state transition module. Notably, we propose an evidence-aware conditional Gaussian Mixture Model (EA-CGMM) inference approach to mitigate data sparsity and distribution drift issues in state transition modeling. Simulation results demonstrate the effectiveness of CMDP modeling and validate the safety and efficiency of the proposed pretraining framework. Specifically, compared with a non-pretrained baseline, the agent pretrained through our proposed framework achieves twice the initial EE and maintains a low delay constraint violation rate of $1\%$, while ultimately converging to an EE that is $4.7\%$ higher with a $50\%$ reduction in exploration steps. Additionally, our implementation of the proposed pretraining framework performs comparably to the state-of-the-art (SOTA) diffusion model-based implementation while achieving a $14$-fold reduction in computational complexity.
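
The EA-CGMM inference step, as outlined in the Figure 4 caption, can be illustrated with scalar variables. The sketch below rests on several assumptions that the page does not confirm: the condition is $w_t$ and the target is $s_{t+1}$; a component is judged unreliable when the squared Mahalanobis distance of the query to its marginal over $w_t$ exceeds a 95% chi-squared quantile; the weight mass of rejected components is split equally among the survivors; and the nearest component is kept if all are rejected. The component parameters are made up.

```python
import math

# Hypothetical joint Gaussians over (s_next, w); all numbers are illustrative.
COMPONENTS = [
    {"pi": 0.5, "mu_s": 1.0, "mu_w": 0.0, "var_s": 1.0, "var_w": 1.0, "cov_sw": 0.5},
    {"pi": 0.3, "mu_s": 5.0, "mu_w": 10.0, "var_s": 1.0, "var_w": 1.0, "cov_sw": 0.2},
    {"pi": 0.2, "mu_s": 2.0, "mu_w": 1.0, "var_s": 2.0, "var_w": 1.5, "cov_sw": -0.3},
]
CHI2_95_1DOF = 3.841  # 95% quantile of the chi-squared law with 1 degree of freedom


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def ea_conditional_gmm(w, components, threshold=CHI2_95_1DOF):
    """Return (weights, means, variances) of the gated mixture p(s_next | w)."""
    n = len(components)
    # Squared Mahalanobis distance of the query to each marginal over w.
    d2 = [(w - c["mu_w"]) ** 2 / c["var_w"] for c in components]
    keep = [d <= threshold for d in d2]
    if not any(keep):
        nearest = d2.index(min(d2))  # assumed fallback: keep the closest component
        keep = [i == nearest for i in range(n)]
    # Usual conditional-GMM responsibilities from the marginal densities at w.
    resp = [c["pi"] * normal_pdf(w, c["mu_w"], c["var_w"]) for c in components]
    total = sum(resp)
    resp = [r / total for r in resp] if total > 0.0 else [1.0 / n] * n
    # Mass of rejected components is shared equally among the survivors.
    removed = sum(r for r, k in zip(resp, keep) if not k)
    share = removed / sum(keep)
    weights = [r + share if k else 0.0 for r, k in zip(resp, keep)]
    # Per-component Gaussian conditioning of s_next on w.
    means = [c["mu_s"] + c["cov_sw"] / c["var_w"] * (w - c["mu_w"]) for c in components]
    variances = [c["var_s"] - c["cov_sw"] ** 2 / c["var_w"] for c in components]
    return weights, means, variances


if __name__ == "__main__":
    print(ea_conditional_gmm(0.5, COMPONENTS))
```

For the query `w = 0.5`, the second component sits far from the query in the `w` marginal and is gated out, so the predicted next state is drawn only from components that have local evidence. In the vector-valued case the scalar divisions become solves against the `w`-block of each covariance, and the chi-squared degrees of freedom equal the dimension of `w`.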

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper addresses energy-efficient resource allocation in cell-free MIMO systems under delay violation constraints by formulating it as a constrained Markov decision process solved via PPO with a primal-dual method. To mitigate cold-start issues in safe DRL, it introduces an offline pretraining framework based on a virtual CMDP whose components include reward/cost predictors, initial-state distribution, and state transitions modeled by a proposed evidence-aware conditional Gaussian mixture model (EA-CGMM) to handle sparsity and drift. Simulations claim that the pretrained agent achieves 2× initial EE, 1% delay violation rate, 4.7% higher converged EE, and 50% fewer exploration steps versus a non-pretrained baseline, while matching SOTA diffusion-model performance at 14× lower complexity.
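
For readers unfamiliar with primal-dual safe RL, the dual half of such a method can be sketched in a few lines. This is a generic illustration under assumptions (a single scalar multiplier, a projected gradient step, hypothetical step size and measurements), not the update rule stated in the paper; the PPO policy update that would maximize the penalized signal is omitted.

```python
def dual_update(lam, avg_cost, cost_limit, step_size):
    """Projected ascent on the multiplier: grow it while the measured cost
    exceeds the limit, shrink it otherwise, and never let it go negative."""
    return max(0.0, lam + step_size * (avg_cost - cost_limit))


def penalized_reward(reward, cost, lam):
    """Scalar signal a policy step would maximize for a fixed multiplier."""
    return reward - lam * cost


if __name__ == "__main__":
    lam = 0.0
    cost_limit = 0.01  # e.g. a 1% delay violation budget
    # Hypothetical violation rates measured over successive rollout batches.
    for violation in [0.08, 0.05, 0.02, 0.01, 0.005]:
        lam = dual_update(lam, violation, cost_limit, step_size=10.0)
        print(f"violation={violation:.3f} -> lambda={lam:.3f}")
```

The multiplier rises while the violation rate is above budget, which makes violations more expensive for the policy, and relaxes once the constraint holds. A cold-start agent begins with an uninformed policy and multiplier, which is the situation the pretraining framework is meant to avoid.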

Significance. If the transfer claims are substantiated, the work offers a practical, lower-complexity generative pretraining approach for safe RL in delay-sensitive wireless systems, with concrete quantitative gains over baselines and a complexity advantage over diffusion methods. The explicit comparison to non-pretrained and SOTA methods, along with the CMDP decomposition, provides a clear benchmark for future generative-RL work in communications.

major comments (2)
  1. [Simulation results] Simulation results section: All reported gains (2× initial EE, 1% violation, +4.7% final EE, 50% step reduction) are obtained by training and evaluating the PPO agent inside the same virtual CMDP whose transition kernel is the EA-CGMM fitted to the pretraining data. No mismatch experiments (different mobility patterns, hardware impairments, or pilot contamination not captured by the CGMM) are described, so the safety and zero-shot transfer claims rest on an untested modeling assumption.
  2. [Virtual CMDP modeling] Virtual CMDP and EA-CGMM modeling sections: The paper asserts that the virtual CMDP accurately represents real CF-MIMO dynamics and that EA-CGMM mitigates distribution drift, yet provides no quantitative validation (e.g., prediction error on held-out real traces or sensitivity analysis) that would confirm these modules are load-bearing for the observed performance lift.
minor comments (2)
  1. [Abstract] The abstract and simulation description omit key experimental details such as number of Monte Carlo runs, statistical significance tests, exact baseline implementations, and hyperparameter settings for the PPO/primal-dual agent.
  2. [EA-CGMM inference] Notation for the EA-CGMM parameters (evidence weighting, mixture components) and the precise form of the reward/cost predictors could be clarified with explicit equations to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the validation of the virtual CMDP and transfer performance.

Point-by-point responses
  1. Referee: [Simulation results] Simulation results section: All reported gains (2× initial EE, 1% violation, +4.7% final EE, 50% step reduction) are obtained by training and evaluating the PPO agent inside the same virtual CMDP whose transition kernel is the EA-CGMM fitted to the pretraining data. No mismatch experiments (different mobility patterns, hardware impairments, or pilot contamination not captured by the CGMM) are described, so the safety and zero-shot transfer claims rest on an untested modeling assumption.

    Authors: We agree that the reported gains are demonstrated within the virtual CMDP environment. This is by design for offline pretraining, where the EA-CGMM is fitted to collected system data to approximate dynamics. To substantiate robustness and transfer, we will add new experiments in the revised manuscript evaluating the pretrained agent in online environments with altered mobility patterns, hardware impairments, and pilot contamination levels not fully represented in the pretraining data. revision: yes

  2. Referee: [Virtual CMDP modeling] Virtual CMDP and EA-CGMM modeling sections: The paper asserts that the virtual CMDP accurately represents real CF-MIMO dynamics and that EA-CGMM mitigates distribution drift, yet provides no quantitative validation (e.g., prediction error on held-out real traces or sensitivity analysis) that would confirm these modules are load-bearing for the observed performance lift.

    Authors: We will incorporate quantitative validation in the revised manuscript. This includes the mean squared error of EA-CGMM state transition predictions on held-out simulation traces and a sensitivity analysis varying the number of Gaussian components and evidence threshold to quantify their effect on pretraining gains and drift mitigation. revision: yes
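
The validation promised in this response amounts to scoring candidate transition models on traces they were not fitted to. A minimal sketch follows, assuming scalar states, point predictions (for a mixture, the weighted mean of the component means), and hypothetical stand-in predictors; it does not reproduce the authors' evaluation protocol.

```python
def mixture_mean(weights, means):
    """Point prediction of a mixture: weighted average of component means."""
    return sum(w * m for w, m in zip(weights, means))


def heldout_mse(predict, traces):
    """Mean squared error of predict(condition) against the observed next state.

    traces: list of (condition, next_state) pairs kept out of model fitting.
    """
    if not traces:
        raise ValueError("need at least one held-out pair")
    return sum((predict(w) - s_next) ** 2 for w, s_next in traces) / len(traces)


def sensitivity_sweep(candidates, traces):
    """Score each named candidate (e.g. one per component count or threshold)."""
    return {name: heldout_mse(fn, traces) for name, fn in candidates.items()}


if __name__ == "__main__":
    traces = [(1.0, 2.0), (2.0, 5.0)]  # hypothetical held-out pairs
    candidates = {"linear_2x": lambda w: 2.0 * w, "linear_3x": lambda w: 3.0 * w}
    print(sensitivity_sweep(candidates, traces))
```

Mean squared error only scores the point prediction; a held-out log-likelihood would also reflect how well the mixture captures spread and multimodality, which matters for a generative transition model.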

Circularity Check

0 steps flagged

No significant circularity in derivation or claims.

Full rationale

The paper defines a PPO primal-dual solver for the EE maximization under delay constraints, then introduces an offline pretraining stage that fits an EA-CGMM transition model to generate a virtual CMDP. Performance is reported via simulation comparisons against a non-pretrained baseline and a diffusion-model baseline, yielding concrete deltas (2× initial EE, 1% violation, +4.7% final EE, 50% fewer steps). These metrics are obtained by running the trained policy inside the simulator and measuring against the same simulator's ground-truth trajectories; they do not reduce to a fitted parameter being relabeled as a prediction, nor does any load-bearing step rely on a self-citation whose content is itself unverified. The virtual-model assumption is stated explicitly as an approximation whose fidelity is tested only inside the synthetic environment, but that is a modeling limitation rather than a circular derivation. No equation or section equates the claimed gain to the fitting procedure by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The central claims rest on newly introduced modeling constructs whose validity is asserted via simulation outcomes only, with no external benchmarks or proofs provided in the abstract.

invented entities (2)
  • virtual constrained Markov decision process (CMDP) · no independent evidence
    purpose: To enable offline pretraining by modeling reward, cost, initial state, and transition dynamics for the resource allocation problem
    Introduced to address cold-start and safety issues in the DRL method
  • evidence-aware conditional Gaussian Mixture Model (EA-CGMM) · no independent evidence
    purpose: To model state transitions while mitigating data sparsity and distribution drift
    Proposed specifically for the state transition module in the virtual CMDP

pith-pipeline@v0.9.0 · 5570 in / 1393 out tokens · 46983 ms · 2026-05-08T13:43:29.084360+00:00 · methodology

Reference graph

Works this paper leans on

58 extracted references · 5 canonical work pages · 4 internal anchors

  [1] B. Varga, J. Farkas, D. Fedyk, L. Berger, and D. Brungard, “The quick and the dead: The rise of deterministic networks,” ComSoc Technology News (CTN), Feb. 2021.

  [2] P. M. Rost and T. Kolding, “Performance of integrated 3GPP 5G and IEEE TSN networks,” IEEE Communications Standards Magazine, vol. 6, no. 2, pp. 51–56, 2022.

  [3] K. Zanbouri, M. Noor-A-Rahim, J. John, C. J. Sreenan, H. V. Poor, and D. Pesch, “A comprehensive survey of wireless time-sensitive networking (TSN): Architecture, technologies, applications, and open issues,” IEEE Communications Surveys & Tutorials, 2024.

  [4] G. P. Sharma, D. Patel, J. Sachs, M. De Andrade, J. Farkas, J. Harmatos, B. Varga, H.-P. Bernhard, R. Muzaffar, M. Ahmed et al., “Toward deterministic communications in 6G networks: State of the art, open challenges and the way forward,” IEEE Access, vol. 11, pp. 106898–106923, 2023.

  [5] L. Li, W. Chen, and K. B. Letaief, “Simple bounds on delay-constrained capacity and delay-violation probability of joint queue and channel-aware wireless transmissions,” IEEE Transactions on Wireless Communications, vol. 22, no. 4, pp. 2744–2759, 2022.

  [6] S. Schiessl, H. Al-Zubaidy, M. Skoglund, and J. Gross, “Delay performance of wireless communications with imperfect CSI and finite-length coding,” IEEE Transactions on Communications, vol. 66, no. 12, pp. 6527–6541, 2018.

  [7] H. Ren, C. Pan, Y. Deng, M. Elkashlan, and A. Nallanathan, “Joint power and blocklength optimization for URLLC in a factory automation scenario,” IEEE Transactions on Wireless Communications, vol. 19, no. 3, pp. 1786–1801, 2020.

  [8] K. Li, P. Zhu, Y. Wang, F.-C. Zheng, and X. You, “Joint uplink and downlink resource allocation toward energy-efficient transmission for URLLC,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 7, pp. 2176–2192, 2023.

  [9] F. Meng, C. Zhang, Y. Huang, and X. You, “Delay deterministic cell-free MIMO transmission via safety reinforcement learning,” IEEE Transactions on Wireless Communications, 2025.

  [10] 3GPP, “Study on scenarios and requirements for next generation access technologies,” 3rd Generation Partnership Project (3GPP), Tech. Rep. TS 38.913, 2024.

  [11] Z. Zhang, X. You, D. Wang, X. Xia, P. Zhu, Y. Jiang, C. Liang, and J. Wang, “Performance of multidevice downlink cell-free system under finite blocklength for URLLC with hard deadlines,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 7, pp. 2090–2106, 2023.

  [12] C. Li, W. Chen, and H. V. Poor, “Diversity enabled low-latency wireless communications with hard delay constraints,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 7, pp. 2107–2122, 2023.

  [13] C. She, C. Yang, and T. Q. S. Quek, “Radio resource management for ultra-reliable and low-latency communications,” IEEE Communications Magazine, vol. 55, no. 6, pp. 72–78, 2017.

  [14] A. El-mekkawi, X. Hesselbach, and J. R. Piney, “Evaluating the impact of delay constraints in network services for intelligent network slicing based on SKM model,” Journal of Communications and Networks, vol. 23, no. 4, pp. 281–298, 2021.

  [15] L. Zhao, S. Yang, X. Chi, W. Chen, and S. Ma, “Achieving energy-efficient uplink URLLC with MIMO-aided grant-free access,” IEEE Transactions on Wireless Communications, vol. 21, no. 2, pp. 1407–1420, 2022.

  [16] Y. Hu, Y. Li, M. C. Gursoy, S. Velipasalar, and A. Schmeink, “Throughput analysis of low-latency IoT systems with QoS constraints and finite blocklength codes,” IEEE Transactions on Vehicular Technology, vol. 69, no. 3, pp. 3093–3104, 2020.

  [17] C. Wu, H. Lu, Y. Chen, and L. Qin, “Cross-layer optimization for statistical QoS provision in C-RAN with finite-length coding,” IEEE Transactions on Communications, vol. 72, no. 6, pp. 3393–3407, 2024.

  [18] G. Ding, S. Liu, J. Yuan, and G. Yu, “Joint URLLC traffic scheduling and resource allocation for semantic communication systems,” IEEE Transactions on Wireless Communications, vol. 23, no. 7, pp. 7278–7290, 2023.

  [19] W. Huang, S. Xiao, L. Wu, C. Kai, S. He, and C. Li, “Achievable rate region for URLLC interference channel with finite blocklength transmission,” IEEE Transactions on Vehicular Technology, vol. 72, no. 7, pp. 8857–8868, 2023.

  [20] Y. Shi, L. Lian, Y. Shi, Z. Wang, Y. Zhou, L. Fu, L. Bai, J. Zhang, and W. Zhang, “Machine learning for large-scale optimization in 6G wireless networks,” IEEE Communications Surveys & Tutorials, vol. 25, no. 4, pp. 2088–2132, 2023.

  [21] H. Meng, N. Xin, H. Qin, and D. Zhao, “A recursive DRL-based resource allocation method for multibeam satellite communication systems,” Chinese Journal of Electronics, vol. 33, no. 5, pp. 1286–1295, 2024.

  [22] Z. Tao, W. Xu, Y. Huang, X. Wang, and X. You, “Wireless network digital twin for 6G: Generative AI as a key enabler,” IEEE Wireless Communications, vol. 31, no. 4, pp. 24–31, 2024.

  [23] K. T. K. Cheung, S. Yang, and L. Hanzo, “Achieving maximum energy-efficiency in multi-relay OFDMA cellular networks: A fractional programming approach,” IEEE Transactions on Communications, vol. 61, no. 7, pp. 2746–2757, 2013.

  [24] K. T. K. Cheung, S. Yang, and L. Hanzo, “Spectral and energy spectral efficiency optimization of joint transmit and receive beamforming based multi-relay MIMO-OFDMA cellular networks,” IEEE Transactions on Wireless Communications, vol. 13, no. 11, pp. 6147–6165, 2014.

  [25] W. Jing, Z. Lu, X. Wen, Z. Hu, and S. Yang, “Flexible resource allocation for joint optimization of energy and spectral efficiency in OFDMA multi-cell networks,” IEEE Communications Letters, vol. 19, no. 3, pp. 451–454, 2015.

  [26] K. T. K. Cheung, S. Yang, and L. Hanzo, “Distributed energy spectral efficiency optimization for partial/full interference alignment in multi-user multi-relay multi-cell MIMO systems,” IEEE Transactions on Signal Processing, vol. 64, no. 4, pp. 882–896, 2015.

  [27] F. Tan, T. Lv, and S. Yang, “Power allocation optimization for energy-efficient massive MIMO aided multi-pair decode-and-forward relay systems,” IEEE Transactions on Communications, vol. 65, no. 6, pp. 2368–2381, 2017.

  [28] X. Miao, S. Yang, C. Wang, S. Wang, and L. Hanzo, “On the energy efficiency of interference alignment in the k-user interference channel,” IEEE Access, vol. 7, pp. 97253–97263, 2019.

  [29] T. Abrão, L. D. H. Sampaio, S. Yang, K. T. K. Cheung, P. J. E. Jeszensky, and L. Hanzo, “Energy efficient OFDMA networks maintaining statistical QoS guarantees for delay-sensitive traffic,” IEEE Access, vol. 4, pp. 774–791, 2016.

  [30] T. Abrão, S. Yang, L. D. H. Sampaio, P. J. E. Jeszensky, and L. Hanzo, “Achieving maximum effective capacity in OFDMA networks operating under statistical delay guarantee,” IEEE Access, vol. 5, pp. 14333–14346, 2017.

  [31] H. Zhang, L. Feng, X. Liu, K. Long, and G. K. Karagiannidis, “User scheduling and task offloading in multi-tier computing 6G vehicular network,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 2, pp. 446–456, 2022.

  [32] W. Wu, P. Yang, W. Zhang, C. Zhou, and X. Shen, “Accuracy-guaranteed collaborative DNN inference in industrial IoT via deep reinforcement learning,” IEEE Transactions on Industrial Informatics, vol. 17, no. 7, pp. 4988–4998, 2020.

  [33] M. Wu, Y. Xiao, Y. Gao, and M. Xiao, “Digital twin for UAV-RIS assisted vehicular communication systems,” IEEE Transactions on Wireless Communications, vol. 23, no. 7, pp. 7638–7651, 2024.

  [34] Y. Gong, Y. Wei, Z. Feng, F. R. Yu, and Y. Zhang, “Resource allocation for integrated sensing and communication in digital twin enabled internet of vehicles,” IEEE Transactions on Vehicular Technology, vol. 72, no. 4, pp. 4510–4524, 2023.

  [35] S. Kurma, M. Katwe, K. Singh, C. Pan, S. Mumtaz, and C.-P. Li, “RIS-empowered MEC for URLLC systems with digital-twin-driven architecture,” IEEE Transactions on Communications, vol. 72, no. 4, pp. 1983–1997, 2024.

  [36] Z. Zhang, Y. Huang, C. Zhang, Q. Zheng, L. Yang, and X. You, “Digital twin-enhanced deep reinforcement learning for resource management in networks slicing,” IEEE Transactions on Communications, vol. 72, no. 10, pp. 6209–6224, 2024.

  [37] A. Machumilane, P. Cassara, and A. Gotta, “Toward a fully-observable Markov decision process with generative models for integrated 6G-non-terrestrial networks,” IEEE Open Journal of the Communications Society, vol. 4, pp. 1913–1930, 2023.

  [38] Y. Lu, G. Zhao, C. Chakraborty, C. Xu, L. Yang, and K. Yu, “Time-sensitive networking-driven deterministic low-latency communication for real-time telemedicine and e-health services,” IEEE Transactions on Consumer Electronics, vol. 69, no. 4, pp. 734–744, 2023.

  [39] S. Xiong, Y. Huang, S. He, and C. Zhang, “Enhancing radio resource management in RAN slicing by diffusion model and digital twin,” IEEE Transactions on Communications, 2025.

  [40] H. Chai, H. Wang, T. Li, and Z. Wang, “Generative AI-driven digital twin for mobile networks,” IEEE Network, 2024.

  [41] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.

  [42] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.

  [43] Y.-J. Shih, S.-L. Wu, F. Zalkow, M. Müller, and Y.-H. Yang, “Theme transformer: Symbolic music generation with theme-conditioned transformer,” IEEE Transactions on Multimedia, vol. 25, pp. 3495–3508, 2023.

  [44] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” 2017. [Online]. Available: https://arxiv.org/abs/1707.06347

  [45] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljačić, T. Y. Hou, and M. Tegmark, “KAN: Kolmogorov-Arnold networks,” 2025. [Online]. Available: https://arxiv.org/abs/2404.19756

  [46] B. Jiang, Y. Wang, Q. Wang, and H. Geng, “A novel interpretable short-term load forecasting method based on Kolmogorov-Arnold networks,” IEEE Transactions on Power Systems, vol. 40, no. 1, pp. 1180–1183, 2025.

  [47] M. Liu, F. Bossmann, and J. Ma, “Kolmogorov-Arnold networks for semi-supervised impedance inversion,” IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1–5, 2025.

  [48] J. Braun and M. Griebel, “On a constructive proof of Kolmogorov’s superposition theorem,” Constructive Approximation, vol. 30, pp. 653–675, 2009.

  [49] C. Zhang, S. Xiong, M. He, L. Wei, Y. Huang, and W. Zhang, “Generative learning-powered probing beam optimization for Cell-Free hybrid beamforming,” IEEE Wireless Communications Letters, vol. 13, no. 12, pp. 3380–3384, 2024.

  [50] J. Kruse, “Technical report: Training mixture density networks with full covariance matrices,” arXiv preprint arXiv:2003.05739, 2020.

  [51] A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications,” in Proc. of Information Theory and Applications Workshop (ITA), San Diego, CA, Feb 2019, pp. 1–8.

  [52] Remcom, “Wireless InSite,” http://www.remcom.com/wireless-insite

  [53] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020.

  [54] J. Ho and T. Salimans, “Classifier-free diffusion guidance,” arXiv preprint arXiv:2207.12598, 2022.

  [55] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test,” J. Mach. Learn. Res., vol. 13, pp. 723–773, Mar. 2012.

  [56] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III. Springer, 2015, pp. 234–241.

  [57] M. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool Publishers, 2010.