Temporally Encoded Double DQN for Proactive PRB Allocation in O-RAN Enabled Industrial Networks

Elahe Delavari; Junaid Farooq; Xingqi Wu

arxiv: 2605.30630 · v1 · pith:434ZWNZRnew · submitted 2026-05-28 · 💻 cs.NI

Temporally Encoded Double DQN for Proactive PRB Allocation in O-RAN Enabled Industrial Networks

Elahe Delavari , Xingqi Wu , Junaid Farooq This is my paper

Pith reviewed 2026-06-29 00:02 UTC · model grok-4.3

classification 💻 cs.NI

keywords O-RANDeep Reinforcement LearningPRB AllocationIndustrial NetworksLSTMSlice SatisfactionProactive SchedulingURLLC

0 comments

The pith

An LSTM encoder inside Double DQN enables proactive PRB allocation that raises slice satisfaction and buffer stability under industrial traffic loads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that industrial workloads produce temporally correlated, process-driven traffic that defeats static or reactive PRB schedulers in O-RAN. It therefore builds an xApp that feeds slice KPI sequences into an LSTM before the Double DQN selects allocations, letting the agent anticipate future demand rather than react after buffers grow. A CTMC model supplies the bursty arrival patterns that arise from concurrent machines. Experiments under moderate and heavy loads report higher slice satisfaction and steadier buffers, with the longest observation window delivering the largest measured gains.

Core claim

Embedding an LSTM encoder inside a Double DQN xApp lets the controller learn sequential dependencies among slice KPIs and issue predictive PRB decisions that maintain higher slice satisfaction and buffer stability than non-temporal baselines, with the advantage growing as the input sequence length increases under the CTMC traffic model.

What carries the argument

LSTM encoder placed before the Double DQN that processes time-ordered slice KPIs to produce predictive Q-values for proactive PRB allocation.

If this is right

Slice satisfaction rises under both moderate and heavy loads.
Buffer occupancy becomes more stable, reducing latency violations.
Longer KPI history windows produce the largest measured improvements.
The approach meets URLLC reliability targets more consistently than reactive schedulers in non-stationary conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same temporal encoding could be applied to other time-correlated 5G slices if their KPI sequences exhibit similar autocorrelation.
Replacing the CTMC generator with online learning from live counters would remove the main modeling assumption and allow direct comparison on real hardware.
The framework implies that over-provisioning margins can be tightened once the allocator anticipates process bursts instead of reacting to them.

Load-bearing premise

The continuous-time Markov chain traffic model accurately reproduces the concurrency and burst patterns of real industrial machines.

What would settle it

Run the same xApp on a live O-RAN testbed with recorded industrial machine traces and check whether the reported gains in slice satisfaction and buffer stability disappear or shrink.

Figures

Figures reproduced from arXiv: 2605.30630 by Elahe Delavari, Junaid Farooq, Xingqi Wu.

**Figure 2.** Figure 2: Proposed LSTM–Double DQN xApp. greedy policy, and transitions are sampled from a replay buffer for stable training. The MLP–Double DQN processes the instantaneous sixdimensional state through two fully connected layers of width 256 with ReLU activations. A final linear layer outputs Q-values for the nine discrete PRB-adjustment actions. This feedforward design corresponds to the memoryless agent used as a… view at source ↗

**Figure 3.** Figure 3: Training episodic return comparison for the MLP-Double [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 6.** Figure 6: Buffer vs. number of UEs. [2] Ericsson Research, “Boosting smart manufacturing with 5G wireless connectivity,” Ericsson Technology Review, 2019. [3] S. Ludwig et al., “Reference network and localization architecture for smart manufacturing based on 5G,” in Advances in System-Integrated Intelligence, (Cham), pp. 470–479, Springer International Publishing, 2023. [4] S. Wijethilaka and M. Liyanage, “Survey on… view at source ↗

**Figure 5.** Figure 5: Efficiency vs. number of UEs. The results confirm that temporally encoded DRL agents provide improved stability and robustness compared to memoryless counterparts. In particular, the LSTM–Double DQN with longer sequence lengths (ℓ = 16) consistently outperforms both the MLP and shorter LSTM variants across all metrics. Short sequences (e.g., ℓ = 4) may fail to capture the CTMC structure, leading to noisy … view at source ↗

read the original abstract

Fifth-generation (5G) wireless systems are increasingly adopted in smart manufacturing to support heterogeneous industrial workloads through services such as enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low-Latency Communication (URLLC). However, industrial traffic is inherently process-driven and temporally correlated. So, static or reactive schedulers in the Open Radio Access Network (O-RAN) are inadequate for such non-stationary conditions, leading to sub-optimal utilization and violation of latency-reliability guarantees. This paper proposes a temporal-aware deep reinforcement learning (DRL) xApp for proactive Physical Resource Block (PRB) allocation in O-RAN-enabled industrial networks. The proposed framework integrates a long short-term memory (LSTM) encoder within a Double Deep Q-Network (DQN) to model sequential dependencies among slice-level Key Performance Indicators (KPIs), enabling predictive and stable decision-making. A continuous-time Markov chain (CTMC) traffic model is incorporated to emulate machine concurrency and process burstiness. Experimental results show that the LSTM-Double DQN improves slice satisfaction, and buffer stability under moderate and heavy load, with the longest sequence window providing the strongest gains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows simulation gains from adding LSTM to Double DQN for proactive PRB allocation under a CTMC traffic model, but the model itself is unvalidated against real industrial traces.

read the letter

The core takeaway is that this work applies an LSTM encoder inside Double DQN to capture temporal patterns in industrial slice KPIs and claims better slice satisfaction and buffer stability than baselines, especially with longer history windows. The setup targets O-RAN xApps for 5G industrial use cases with eMBB and URLLC traffic.

What is new is the specific integration of the LSTM inside the Double DQN agent for this O-RAN PRB allocation task. Prior DRL schedulers exist, but the temporal encoding step for process-driven industrial traffic is a reasonable incremental step. The CTMC traffic generator is also presented as a way to capture machine concurrency and burstiness.

The paper does a clean job of framing the problem: static or reactive schedulers fall short when traffic is non-stationary, and the agent is meant to act proactively. The reported trends with sequence length are at least internally consistent within the simulation.

The main soft spot is the traffic model. The results rest entirely on a CTMC that emulates concurrency and burstiness, yet there is no indication the parameters were fitted or tested against actual packet traces from manufacturing equipment. If the synthetic arrivals and correlations do not match live workloads, the performance edge is tied to the model rather than the algorithm. The abstract also gives no experimental details on baselines, run counts, or statistical tests, which makes it hard to judge how robust the gains are.

This paper is aimed at researchers working on reinforcement learning for wireless resource allocation in O-RAN or industrial 5G. A reader already familiar with DQN variants and O-RAN xApp design will get the most out of it. It is worth sending to peer review because the problem is practical and the proposed combination is clearly described, even though the traffic validation gap will need to be addressed in revision.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes integrating an LSTM encoder into a Double DQN xApp for proactive PRB allocation in O-RAN industrial networks to handle temporally correlated traffic. A CTMC model emulates machine concurrency and burstiness; simulation results are reported to show gains in slice satisfaction and buffer stability under moderate/heavy load, with the longest sequence window performing best.

Significance. If the results hold under a traffic model validated against real traces, the work would offer a concrete approach to predictive resource allocation that accounts for process-driven correlations, potentially improving utilization and QoS guarantees in 5G industrial deployments beyond static or reactive baselines.

major comments (2)

[Abstract] Abstract: the claim that LSTM-Double DQN improves slice satisfaction and buffer stability is presented without any description of the experimental setup, choice of baselines, number of runs, statistical significance tests, or error bars, rendering the central empirical claim unverifiable from the manuscript.
[Traffic model] Traffic model section: the CTMC is used exclusively to generate all reported results, yet no evidence is supplied that its parameters were fitted to or tested against actual industrial packet traces (inter-arrival distributions, burst lengths, or temporal correlations); this is load-bearing for whether the observed gains generalize beyond the synthetic model.

minor comments (1)

Notation for the LSTM state encoding and the precise definition of the sequence window length should be clarified with an accompanying diagram or pseudocode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help improve the clarity and framing of our work. We address each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that LSTM-Double DQN improves slice satisfaction and buffer stability is presented without any description of the experimental setup, choice of baselines, number of runs, statistical significance tests, or error bars, rendering the central empirical claim unverifiable from the manuscript.

Authors: We agree that the abstract would benefit from additional methodological context to support verifiability of the claims. In the revised manuscript we will expand the abstract to briefly note the simulation environment (O-RAN xApp evaluated via ns-3 with the CTMC traffic generator), the baselines (standard Double DQN and non-predictive threshold-based allocation), the use of multiple independent runs with results reported as averages accompanied by error bars, and that reported gains are observed under moderate and heavy load regimes. Full experimental details, including any statistical tests, will remain in the body of the paper. revision: yes
Referee: [Traffic model] Traffic model section: the CTMC is used exclusively to generate all reported results, yet no evidence is supplied that its parameters were fitted to or tested against actual industrial packet traces (inter-arrival distributions, burst lengths, or temporal correlations); this is load-bearing for whether the observed gains generalize beyond the synthetic model.

Authors: The CTMC is a synthetic model constructed to reproduce machine concurrency and process-driven burstiness characteristic of industrial traffic; its parameters were not obtained by fitting to real packet traces. We will revise the traffic-model section to state explicitly that the model is synthetic and motivated by typical industrial process behaviors rather than calibrated to measured traces. We will also add a short limitations paragraph noting that broader generalization would require validation against real industrial traces and identifying this as future work. We cannot supply such validation in the current revision because the requisite proprietary trace data are not available to the authors. revision: partial

Circularity Check

0 steps flagged

No circularity: derivation relies on independent simulation benchmarks

full rationale

The paper describes a proposed LSTM-Double DQN xApp trained within a CTMC-generated traffic environment and reports comparative performance gains against baselines in slice satisfaction and buffer stability. No equations, definitions, or self-citations are quoted that reduce any claimed prediction or result to its own inputs by construction (e.g., no fitted parameter renamed as a prediction, no uniqueness theorem imported from prior self-work, and no ansatz smuggled via citation). The experimental framing treats the CTMC as an external emulation tool whose outputs serve as independent test cases, keeping the central claim self-contained against those benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review prevents full enumeration; CTMC model and LSTM sequence modeling are invoked without stated validation against real traces.

axioms (1)

domain assumption CTMC traffic model emulates machine concurrency and process burstiness
Invoked to generate test scenarios for the DRL agent

pith-pipeline@v0.9.1-grok · 5744 in / 1124 out tokens · 18630 ms · 2026-06-29T00:02:53.062965+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references

[1]

Wireless communications for smart manufacturing and industrial IoT: Existing technologies, 5G, and beyond,

M. Noor-A-Rahim, R. Zhang, M. Bennis, and H. V . Poor, “Wireless communications for smart manufacturing and industrial IoT: Existing technologies, 5G, and beyond,”IEEE Communications Surveys & Tutori- als, vol. 24, no. 3, pp. 1621–1661, 2022. 1 2 4 6 8 Number of UEs 101 102 103 104 Buffer (log scale) Buffer (Avg: eMBB & URLLC) vs Number of UEs MLP LSTM (s...

2022
[2]

Boosting smart manufacturing with 5G wireless connectivity,

Ericsson Research, “Boosting smart manufacturing with 5G wireless connectivity,”Ericsson Technology Review, 2019

2019
[3]

Reference network and localization architecture for smart manufacturing based on 5G,

S. Ludwiget al., “Reference network and localization architecture for smart manufacturing based on 5G,” inAdvances in System-Integrated Intelligence, (Cham), pp. 470–479, Springer International Publishing, 2023

2023
[4]

Survey on Network Slicing for Internet of Things Realization in 5G Networks,

S. Wijethilaka and M. Liyanage, “Survey on Network Slicing for Internet of Things Realization in 5G Networks,”IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 957–994, 2021

2021
[5]

Demonstration of closed loop AI-Driven RAN controllers using O-RAN SDR testbed,

N. H. Stephenson, A. J. Chiejina, N. B. Kabigting, and V . K. Shah, “Demonstration of closed loop AI-Driven RAN controllers using O-RAN SDR testbed,” inMILCOM 2023 - 2023 IEEE Military Communications Conference (MILCOM), pp. 241–242, 2023

2023
[6]

Anomaly detection for mitigating xApp and E2 interface threats in O-RAN near-RT RIC,

C.-F. Hung, C.-H. Tseng, and S.-M. Cheng, “Anomaly detection for mitigating xApp and E2 interface threats in O-RAN near-RT RIC,”IEEE Open Journal of the Communications Society, vol. 6, pp. 1682–1694, 2025

2025
[7]

RIC-O: Efficient placement of a disaggregated and distributed RAN intelligent controller with dynamic clustering of radio nodes,

G. M. Almeida, G. Z. Bruno, A. Huff, M. Hiltunen, E. P. Duarte, C. B. Both, and K. V . Cardoso, “RIC-O: Efficient placement of a disaggregated and distributed RAN intelligent controller with dynamic clustering of radio nodes,”IEEE Journal on Selected Areas in Communications, vol. 42, no. 2, pp. 446–459, 2024

2024
[8]

Understand- ing O-RAN: Architecture, Interfaces, Algorithms, Security, and Research Challenges,

M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “Understand- ing O-RAN: Architecture, Interfaces, Algorithms, Security, and Research Challenges,”IEEE Communications Surveys & Tutorials, vol. 25, no. 2, pp. 1376–1411, 2023

2023
[9]

Reinforcement learning for dynamic resource optimization in 5G radio access network slicing,

Y . Shi, Y . E. Sagduyu, and T. Erpek, “Reinforcement learning for dynamic resource optimization in 5G radio access network slicing,” in2020 IEEE 25th international workshop on computer aided modeling and design of communication links and networks (CAMAD), pp. 1–6, IEEE, 2020

2020
[10]

Deep reinforcement learning-based network slicing for beyond 5G,

K. Suh, S. Kim, Y . Ahn, S. Kim, H. Ju, and B. Shim, “Deep reinforcement learning-based network slicing for beyond 5G,”IEEE Access, vol. 10, pp. 7384–7395, 2022

2022
[11]

Intelligent resource slicing for eMBB and URLLC coexistence in 5G and beyond: A deep reinforcement learning based approach,

M. Alsenwi, N. H. Tran, M. Bennis, S. R. Pandey, A. K. Bairagi, and C. S. Hong, “Intelligent resource slicing for eMBB and URLLC coexistence in 5G and beyond: A deep reinforcement learning based approach,”IEEE Transactions on Wireless Communications, vol. 20, no. 7, pp. 4585–4600, 2021

2021
[12]

The LSTM-Based Advantage Actor-Critic Learning for Resource Management in Network Slicing With User Mobility,

R. Li, C. Wang, Z. Zhao, R. Guo, and H. Zhang, “The LSTM-Based Advantage Actor-Critic Learning for Resource Management in Network Slicing With User Mobility,”IEEE Communications Letters, vol. 24, pp. 2005–2009, Sept. 2020

2005
[13]

LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UA V-Assisted Sensor Network,

K. Li, W. Ni, and F. Dressler, “LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UA V-Assisted Sensor Network,”IEEE Internet of Things Journal, vol. 9, pp. 4179–4189, Mar. 2022

2022
[14]

Open RAN LSTM traffic prediction and slice management using deep reinforcement learning,

F. Lotfi and F. Afghah, “Open RAN LSTM traffic prediction and slice management using deep reinforcement learning,” in2023 57th Asilomar Conference on Signals, Systems, and Computers, pp. 646–650, IEEE, 2023

2023
[15]

Deep Reinforcement Learning for Online Resource Allocation in Network Slicing,

Y . Cai, P. Cheng, Z. Chen, M. Ding, B. Vucetic, and Y . Li, “Deep Reinforcement Learning for Online Resource Allocation in Network Slicing,”IEEE Transactions on Mobile Computing, vol. 23, pp. 7099– 7116, June 2024

2024
[16]

AI-RAN simulator

“AI-RAN simulator.” https://github.com/ntutangyun/ai-ran-sim. Accessed: 2025-11-24

2025

[1] [1]

Wireless communications for smart manufacturing and industrial IoT: Existing technologies, 5G, and beyond,

M. Noor-A-Rahim, R. Zhang, M. Bennis, and H. V . Poor, “Wireless communications for smart manufacturing and industrial IoT: Existing technologies, 5G, and beyond,”IEEE Communications Surveys & Tutori- als, vol. 24, no. 3, pp. 1621–1661, 2022. 1 2 4 6 8 Number of UEs 101 102 103 104 Buffer (log scale) Buffer (Avg: eMBB & URLLC) vs Number of UEs MLP LSTM (s...

2022

[2] [2]

Boosting smart manufacturing with 5G wireless connectivity,

Ericsson Research, “Boosting smart manufacturing with 5G wireless connectivity,”Ericsson Technology Review, 2019

2019

[3] [3]

Reference network and localization architecture for smart manufacturing based on 5G,

S. Ludwiget al., “Reference network and localization architecture for smart manufacturing based on 5G,” inAdvances in System-Integrated Intelligence, (Cham), pp. 470–479, Springer International Publishing, 2023

2023

[4] [4]

Survey on Network Slicing for Internet of Things Realization in 5G Networks,

S. Wijethilaka and M. Liyanage, “Survey on Network Slicing for Internet of Things Realization in 5G Networks,”IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 957–994, 2021

2021

[5] [5]

Demonstration of closed loop AI-Driven RAN controllers using O-RAN SDR testbed,

N. H. Stephenson, A. J. Chiejina, N. B. Kabigting, and V . K. Shah, “Demonstration of closed loop AI-Driven RAN controllers using O-RAN SDR testbed,” inMILCOM 2023 - 2023 IEEE Military Communications Conference (MILCOM), pp. 241–242, 2023

2023

[6] [6]

Anomaly detection for mitigating xApp and E2 interface threats in O-RAN near-RT RIC,

C.-F. Hung, C.-H. Tseng, and S.-M. Cheng, “Anomaly detection for mitigating xApp and E2 interface threats in O-RAN near-RT RIC,”IEEE Open Journal of the Communications Society, vol. 6, pp. 1682–1694, 2025

2025

[7] [7]

RIC-O: Efficient placement of a disaggregated and distributed RAN intelligent controller with dynamic clustering of radio nodes,

G. M. Almeida, G. Z. Bruno, A. Huff, M. Hiltunen, E. P. Duarte, C. B. Both, and K. V . Cardoso, “RIC-O: Efficient placement of a disaggregated and distributed RAN intelligent controller with dynamic clustering of radio nodes,”IEEE Journal on Selected Areas in Communications, vol. 42, no. 2, pp. 446–459, 2024

2024

[8] [8]

Understand- ing O-RAN: Architecture, Interfaces, Algorithms, Security, and Research Challenges,

M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “Understand- ing O-RAN: Architecture, Interfaces, Algorithms, Security, and Research Challenges,”IEEE Communications Surveys & Tutorials, vol. 25, no. 2, pp. 1376–1411, 2023

2023

[9] [9]

Reinforcement learning for dynamic resource optimization in 5G radio access network slicing,

Y . Shi, Y . E. Sagduyu, and T. Erpek, “Reinforcement learning for dynamic resource optimization in 5G radio access network slicing,” in2020 IEEE 25th international workshop on computer aided modeling and design of communication links and networks (CAMAD), pp. 1–6, IEEE, 2020

2020

[10] [10]

Deep reinforcement learning-based network slicing for beyond 5G,

K. Suh, S. Kim, Y . Ahn, S. Kim, H. Ju, and B. Shim, “Deep reinforcement learning-based network slicing for beyond 5G,”IEEE Access, vol. 10, pp. 7384–7395, 2022

2022

[11] [11]

Intelligent resource slicing for eMBB and URLLC coexistence in 5G and beyond: A deep reinforcement learning based approach,

M. Alsenwi, N. H. Tran, M. Bennis, S. R. Pandey, A. K. Bairagi, and C. S. Hong, “Intelligent resource slicing for eMBB and URLLC coexistence in 5G and beyond: A deep reinforcement learning based approach,”IEEE Transactions on Wireless Communications, vol. 20, no. 7, pp. 4585–4600, 2021

2021

[12] [12]

The LSTM-Based Advantage Actor-Critic Learning for Resource Management in Network Slicing With User Mobility,

R. Li, C. Wang, Z. Zhao, R. Guo, and H. Zhang, “The LSTM-Based Advantage Actor-Critic Learning for Resource Management in Network Slicing With User Mobility,”IEEE Communications Letters, vol. 24, pp. 2005–2009, Sept. 2020

2005

[13] [13]

LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UA V-Assisted Sensor Network,

K. Li, W. Ni, and F. Dressler, “LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UA V-Assisted Sensor Network,”IEEE Internet of Things Journal, vol. 9, pp. 4179–4189, Mar. 2022

2022

[14] [14]

Open RAN LSTM traffic prediction and slice management using deep reinforcement learning,

F. Lotfi and F. Afghah, “Open RAN LSTM traffic prediction and slice management using deep reinforcement learning,” in2023 57th Asilomar Conference on Signals, Systems, and Computers, pp. 646–650, IEEE, 2023

2023

[15] [15]

Deep Reinforcement Learning for Online Resource Allocation in Network Slicing,

Y . Cai, P. Cheng, Z. Chen, M. Ding, B. Vucetic, and Y . Li, “Deep Reinforcement Learning for Online Resource Allocation in Network Slicing,”IEEE Transactions on Mobile Computing, vol. 23, pp. 7099– 7116, June 2024

2024

[16] [16]

AI-RAN simulator

“AI-RAN simulator.” https://github.com/ntutangyun/ai-ran-sim. Accessed: 2025-11-24

2025