Temporally Encoded Double DQN for Proactive PRB Allocation in O-RAN Enabled Industrial Networks
Pith reviewed 2026-06-29 00:02 UTC · model grok-4.3
The pith
An LSTM encoder inside Double DQN enables proactive PRB allocation that raises slice satisfaction and buffer stability under industrial traffic loads.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Embedding an LSTM encoder inside a Double DQN xApp lets the controller learn sequential dependencies among slice KPIs and issue predictive PRB decisions that maintain higher slice satisfaction and buffer stability than non-temporal baselines, with the advantage growing as the input sequence length increases under the CTMC traffic model.
What carries the argument
LSTM encoder placed before the Double DQN that processes time-ordered slice KPIs to produce predictive Q-values for proactive PRB allocation.
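The mechanism described above can be sketched in a few lines. This is a minimal, untrained illustration and not the authors' implementation: the class name `LstmQNet`, the layer sizes, the three-feature KPI vectors, and the omission of bias terms are all assumptions made for brevity. It shows an LSTM pass over a time-ordered KPI sequence, a linear Q-head over the final hidden state, and the standard Double DQN target in which the online network selects the next action and the target network evaluates it.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class LstmQNet:
    """Tiny LSTM encoder followed by a linear Q-head (hypothetical, biases omitted)."""

    def __init__(self, n_kpi, n_hidden, n_actions, seed):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate (input, forget, output, candidate),
        # each acting on the concatenation [x_t ; h_{t-1}].
        self.gates = {name: rand_matrix(n_hidden, n_kpi + n_hidden, rng) for name in "ifog"}
        self.q_head = rand_matrix(n_actions, n_hidden, rng)

    def encode(self, kpi_seq):
        """Run the LSTM over the KPI sequence and return the last hidden state."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in kpi_seq:
            z = list(x) + h
            i = [sigmoid(v) for v in matvec(self.gates["i"], z)]
            f = [sigmoid(v) for v in matvec(self.gates["f"], z)]
            o = [sigmoid(v) for v in matvec(self.gates["o"], z)]
            g = [math.tanh(v) for v in matvec(self.gates["g"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

    def q_values(self, kpi_seq):
        """One Q-value per candidate PRB allocation action."""
        return matvec(self.q_head, self.encode(kpi_seq))

def double_dqn_target(online, target, reward, next_seq, gamma=0.99, done=False):
    """Double DQN: the online net picks the action, the target net scores it."""
    if done:
        return reward
    q_online = online.q_values(next_seq)
    best = max(range(len(q_online)), key=q_online.__getitem__)
    return reward + gamma * target.q_values(next_seq)[best]

# Illustrative inputs only: three time steps of three made-up slice KPIs.
online = LstmQNet(n_kpi=3, n_hidden=4, n_actions=5, seed=0)
target = LstmQNet(n_kpi=3, n_hidden=4, n_actions=5, seed=1)
next_seq = [[0.2, 0.5, 0.1], [0.4, 0.6, 0.0], [0.9, 0.7, 0.3]]
y = double_dqn_target(online, target, reward=1.0, next_seq=next_seq)
```

Decoupling action selection from action evaluation is what distinguishes Double DQN from plain DQN; the LSTM only changes what the networks see as "state", replacing a single KPI snapshot with an encoded history.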
If this is right
- Slice satisfaction rises under both moderate and heavy loads.
- Buffer occupancy becomes more stable, reducing latency violations.
- Longer KPI history windows produce the largest measured improvements.
- The approach meets URLLC reliability targets more consistently than reactive schedulers in non-stationary conditions.
Where Pith is reading between the lines
- The same temporal encoding could be applied to other time-correlated 5G slices if their KPI sequences exhibit similar autocorrelation.
- Replacing the CTMC generator with online learning from live counters would remove the main modeling assumption and allow direct comparison on real hardware.
- The framework implies that over-provisioning margins can be tightened once the allocator anticipates process bursts instead of reacting to them.
Load-bearing premise
The continuous-time Markov chain traffic model accurately reproduces the concurrency and burst patterns of real industrial machines.
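To make the premise concrete, here is a hedged sketch of one CTMC that produces machine concurrency: each of `n_machines` machines switches between idle and active independently with exponential holding times. The paper's actual state space and rates are not given in this summary, so the two-state-per-machine structure, the function names, and the rate values below are assumptions chosen purely for illustration.

```python
import random

def simulate_machine_ctmc(n_machines, on_rate, off_rate, horizon, seed=0):
    """Gillespie-style simulation of the number of concurrently active machines.

    Assumed model: each idle machine turns on at rate on_rate and each
    active machine turns off at rate off_rate (a birth-death chain).
    """
    rng = random.Random(seed)
    t, active = 0.0, 0
    trajectory = [(t, active)]
    while True:
        up = (n_machines - active) * on_rate    # total rate of idle -> active jumps
        down = active * off_rate                # total rate of active -> idle jumps
        total = up + down
        t += rng.expovariate(total)             # exponential sojourn time in this state
        if t >= horizon:
            break
        active += 1 if rng.random() < up / total else -1
        trajectory.append((t, active))
    return trajectory

def time_average(trajectory, horizon):
    """Time-weighted mean number of active machines over [0, horizon]."""
    area = 0.0
    ends = trajectory[1:] + [(horizon, None)]
    for (t0, k), (t1, _) in zip(trajectory, ends):
        area += k * (t1 - t0)
    return area / horizon

# Hypothetical parameters, not taken from the paper.
traj = simulate_machine_ctmc(n_machines=8, on_rate=0.5, off_rate=1.0, horizon=2000.0)
avg_active = time_average(traj, 2000.0)
```

For this assumed model the stationary mean number of active machines is `n_machines * on_rate / (on_rate + off_rate)`, which gives a quick sanity check on the simulator. Whether real machine traces show exponential holding times at all is exactly the open question the premise rests on.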
What would settle it
Run the same xApp on a live O-RAN testbed with recorded industrial machine traces and check whether the reported gains in slice satisfaction and buffer stability disappear or shrink.
Original abstract
Fifth-generation (5G) wireless systems are increasingly adopted in smart manufacturing to support heterogeneous industrial workloads through services such as enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low-Latency Communication (URLLC). However, industrial traffic is inherently process-driven and temporally correlated. As a result, static or reactive schedulers in the Open Radio Access Network (O-RAN) are inadequate for such non-stationary conditions, leading to sub-optimal utilization and violation of latency-reliability guarantees. This paper proposes a temporal-aware deep reinforcement learning (DRL) xApp for proactive Physical Resource Block (PRB) allocation in O-RAN-enabled industrial networks. The proposed framework integrates a long short-term memory (LSTM) encoder within a Double Deep Q-Network (DQN) to model sequential dependencies among slice-level Key Performance Indicators (KPIs), enabling predictive and stable decision-making. A continuous-time Markov chain (CTMC) traffic model is incorporated to emulate machine concurrency and process burstiness. Experimental results show that the LSTM-Double DQN improves slice satisfaction and buffer stability under moderate and heavy load, with the longest sequence window providing the strongest gains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes integrating an LSTM encoder into a Double DQN xApp for proactive PRB allocation in O-RAN industrial networks to handle temporally correlated traffic. A CTMC model emulates machine concurrency and burstiness; simulation results are reported to show gains in slice satisfaction and buffer stability under moderate/heavy load, with the longest sequence window performing best.
Significance. If the results hold under a traffic model validated against real traces, the work would offer a concrete approach to predictive resource allocation that accounts for process-driven correlations, potentially improving utilization and QoS guarantees in 5G industrial deployments beyond static or reactive baselines.
major comments (2)
- [Abstract] Abstract: the claim that LSTM-Double DQN improves slice satisfaction and buffer stability is presented without any description of the experimental setup, choice of baselines, number of runs, statistical significance tests, or error bars, rendering the central empirical claim unverifiable from the manuscript.
- [Traffic model] Traffic model section: the CTMC is used exclusively to generate all reported results, yet no evidence is supplied that its parameters were fitted to or tested against actual industrial packet traces (inter-arrival distributions, burst lengths, or temporal correlations); this is load-bearing for whether the observed gains generalize beyond the synthetic model.
minor comments (1)
- Notation for the LSTM state encoding and the precise definition of the sequence window length should be clarified with an accompanying diagram or pseudocode.
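As an illustration of the kind of pseudocode this comment asks for, the sequence window can be read as a fixed-length buffer holding the most recent KPI vectors in time order. The sketch below is one plausible definition, not the paper's: the class name `KpiWindow`, the zero-padding used before the buffer fills, and the two-feature KPI vectors are all assumptions.

```python
from collections import deque

class KpiWindow:
    """Fixed-length, time-ordered buffer of slice KPI vectors (hypothetical)."""

    def __init__(self, seq_len, n_features):
        self.seq_len = seq_len
        pad = tuple([0.0] * n_features)
        # Assumption: the window is zero-padded until seq_len real samples arrive.
        self.buf = deque([pad] * seq_len, maxlen=seq_len)

    def push(self, kpi_vector):
        """Append the newest KPI vector and return the window, oldest first."""
        self.buf.append(tuple(kpi_vector))
        return list(self.buf)

# Illustrative values only: (buffer occupancy, throughput) per reporting period.
window = KpiWindow(seq_len=3, n_features=2)
state = window.push([1.0, 2.0])
for sample in ([3.0, 4.0], [5.0, 6.0], [7.0, 8.0]):
    state = window.push(sample)
```

Under this reading, the sequence window length is simply `seq_len`, the number of consecutive reporting periods the encoder sees per decision. The padding policy matters at episode start and is worth stating explicitly in the manuscript.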
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help improve the clarity and framing of our work. We address each major comment below.
- Referee: [Abstract] Abstract: the claim that LSTM-Double DQN improves slice satisfaction and buffer stability is presented without any description of the experimental setup, choice of baselines, number of runs, statistical significance tests, or error bars, rendering the central empirical claim unverifiable from the manuscript.
  Authors: We agree that the abstract would benefit from additional methodological context to support verifiability of the claims. In the revised manuscript we will expand the abstract to briefly note the simulation environment (O-RAN xApp evaluated via ns-3 with the CTMC traffic generator), the baselines (standard Double DQN and non-predictive threshold-based allocation), the use of multiple independent runs with results reported as averages accompanied by error bars, and that reported gains are observed under moderate and heavy load regimes. Full experimental details, including any statistical tests, will remain in the body of the paper.
  revision: yes
- Referee: [Traffic model] Traffic model section: the CTMC is used exclusively to generate all reported results, yet no evidence is supplied that its parameters were fitted to or tested against actual industrial packet traces (inter-arrival distributions, burst lengths, or temporal correlations); this is load-bearing for whether the observed gains generalize beyond the synthetic model.
  Authors: The CTMC is a synthetic model constructed to reproduce machine concurrency and process-driven burstiness characteristic of industrial traffic; its parameters were not obtained by fitting to real packet traces. We will revise the traffic-model section to state explicitly that the model is synthetic and motivated by typical industrial process behaviors rather than calibrated to measured traces. We will also add a short limitations paragraph noting that broader generalization would require validation against real industrial traces and identifying this as future work. We cannot supply such validation in the current revision because the requisite proprietary trace data are not available to the authors.
  revision: partial
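If trace data did become available, the calibration the referee asks for could start from something as simple as the sketch below. It assumes that burst or idle durations have already been extracted from a trace (the numbers here are placeholders, not measurements) and uses the standard maximum-likelihood estimate for an exponential holding time, rate = n / sum of durations. The coefficient of variation is included as a rough plausibility check, since it equals 1 for an exponential distribution.

```python
import statistics

def estimate_exponential_rate(durations):
    """Maximum-likelihood rate of an exponential holding time: n / total time."""
    if not durations or any(d <= 0 for d in durations):
        raise ValueError("durations must be a non-empty list of positive values")
    return len(durations) / sum(durations)

def coefficient_of_variation(durations):
    """Std/mean of the durations; values far from 1 argue against an exponential fit."""
    return statistics.pstdev(durations) / statistics.mean(durations)

# Placeholder burst durations in seconds (hypothetical, not from any real trace).
burst_durations = [0.5, 1.5, 1.0]
off_rate_hat = estimate_exponential_rate(burst_durations)
cv = coefficient_of_variation(burst_durations)
```

A coefficient of variation well above or below 1 on real traces would suggest that a plain CTMC is too restrictive and that a phase-type or semi-Markov model might be a closer match.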
Circularity Check
No circularity: derivation relies on independent simulation benchmarks
The paper describes a proposed LSTM-Double DQN xApp trained within a CTMC-generated traffic environment and reports comparative performance gains against baselines in slice satisfaction and buffer stability. No equations, definitions, or self-citations are quoted that reduce any claimed prediction or result to its own inputs by construction (e.g., no fitted parameter renamed as a prediction, no uniqueness theorem imported from prior self-work, and no ansatz smuggled via citation). The experimental framing treats the CTMC as an external emulation tool whose outputs serve as independent test cases, keeping the central claim self-contained against those benchmarks rather than tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: the CTMC traffic model emulates machine concurrency and process burstiness