arxiv: 2605.30630 · v1 · pith:434ZWNZRnew · submitted 2026-05-28 · 💻 cs.NI

Temporally Encoded Double DQN for Proactive PRB Allocation in O-RAN Enabled Industrial Networks

Pith reviewed 2026-06-29 00:02 UTC · model grok-4.3

classification 💻 cs.NI
keywords O-RAN · Deep Reinforcement Learning · PRB Allocation · Industrial Networks · LSTM · Slice Satisfaction · Proactive Scheduling · URLLC

The pith

An LSTM encoder inside Double DQN enables proactive PRB allocation that raises slice satisfaction and buffer stability under industrial traffic loads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that industrial workloads produce temporally correlated, process-driven traffic that defeats static or reactive PRB schedulers in O-RAN. It therefore builds an xApp that feeds slice KPI sequences into an LSTM before the Double DQN selects allocations, letting the agent anticipate future demand rather than react after buffers grow. A CTMC model supplies the bursty arrival patterns that arise from concurrent machines. Experiments under moderate and heavy loads report higher slice satisfaction and steadier buffers, with the longest observation window delivering the largest measured gains.

Core claim

Embedding an LSTM encoder inside a Double DQN xApp lets the controller learn sequential dependencies among slice KPIs and issue predictive PRB decisions that maintain higher slice satisfaction and buffer stability than non-temporal baselines, with the advantage growing as the input sequence length increases under the CTMC traffic model.

What carries the argument

An LSTM encoder that processes time-ordered slice KPIs and feeds the Double DQN, so that the Q-values the agent outputs support proactive rather than reactive PRB allocation.
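
The Double DQN half of this mechanism can be sketched in a few lines. This is a minimal illustration, not the authors' code: the reward, discount factor, and Q-value lists are made-up placeholders, and the LSTM encoder is abstracted away as whatever produced those Q-values. It shows the standard Double DQN target, in which the online network picks the next action and the target network scores it.

```python
def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Standard Double DQN bootstrap target for a single transition.

    q_online_next / q_target_next: Q-values of the next state from the online
    and target networks (hypothetical inputs, one entry per action).
    """
    if done:
        return reward
    # The online network selects the greedy next action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates it, which curbs overestimation.
    return reward + gamma * q_target_next[best_action]


# Placeholder values, chosen only to make the difference visible.
q_online = [0.2, 1.5, 0.7]
q_target = [0.3, 0.9, 2.0]
y = double_dqn_target(1.0, q_online, q_target, gamma=0.9)
```

With these placeholder numbers the online network prefers action 1, so the target uses the target network's value for action 1 (0.9) rather than its own maximum (2.0), which is what a plain DQN target would bootstrap from.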

If this is right

  • Slice satisfaction rises under both moderate and heavy loads.
  • Buffer occupancy becomes more stable, reducing latency violations.
  • Longer KPI history windows produce the largest measured improvements.
  • The approach meets URLLC reliability targets more consistently than reactive schedulers in non-stationary conditions.
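
The "KPI history window" in the bullets above can be pictured as a fixed-length buffer of recent observations. The sketch below is an assumption-laden illustration: the class name `KpiWindow`, the padding rule at episode start, and the dummy six-dimensional vectors are hypothetical; only the idea of a length-ℓ sequence (the paper's results mention ℓ = 4 and ℓ = 16) comes from the source.

```python
from collections import deque


class KpiWindow:
    """Keeps the last `length` KPI vectors as the input sequence for a recurrent encoder."""

    def __init__(self, length):
        self.length = length
        self.buffer = deque(maxlen=length)

    def push(self, kpi_vector):
        if not self.buffer:
            # Assumption: pad the start of an episode by repeating the first observation.
            self.buffer.extend(list(kpi_vector) for _ in range(self.length))
        else:
            # deque(maxlen=...) silently drops the oldest entry.
            self.buffer.append(list(kpi_vector))
        return [list(v) for v in self.buffer]


window = KpiWindow(length=4)
for step in range(6):
    seq = window.push([float(step)] * 6)  # dummy six-dimensional KPI vector
```

After six pushes the window holds the observations from steps 2 to 5, oldest first; a longer `length` simply exposes more history to the encoder at the cost of a larger input.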

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same temporal encoding could be applied to other time-correlated 5G slices if their KPI sequences exhibit similar autocorrelation.
  • Replacing the CTMC generator with online learning from live counters would remove the main modeling assumption and allow direct comparison on real hardware.
  • The framework implies that over-provisioning margins can be tightened once the allocator anticipates process bursts instead of reacting to them.

Load-bearing premise

The continuous-time Markov chain traffic model accurately reproduces the concurrency and burst patterns of real industrial machines.
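
To make the premise concrete, here is a minimal sketch of one way a CTMC can generate machine concurrency. The paper's actual state space and rates are not given on this page, so everything below is an assumption: each machine is modelled as an independent on/off source, the state is the number of active machines, and the rates are arbitrary.

```python
import random


def simulate_active_machines(n_machines, rate_on, rate_off, horizon, seed=0):
    """Gillespie simulation of a birth-death CTMC.

    State k = number of machines currently active. Each idle machine switches
    on at rate_on and each active machine switches off at rate_off.
    All parameter values are illustrative, not taken from the paper.
    Returns the list of (time, k) jump points.
    """
    rng = random.Random(seed)
    t, k = 0.0, 0
    path = [(t, k)]
    while True:
        up = (n_machines - k) * rate_on    # total rate of k -> k + 1
        down = k * rate_off                # total rate of k -> k - 1
        total = up + down
        t += rng.expovariate(total)        # exponential holding time
        if t >= horizon:
            break
        k += 1 if rng.random() < up / total else -1
        path.append((t, k))
    return path


path = simulate_active_machines(n_machines=8, rate_on=0.5, rate_off=1.0, horizon=50.0)
```

Whether a generator of this kind matches real factory traffic is exactly the open question the premise names; the sketch only shows what is being assumed.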

What would settle it

Run the same xApp on a live O-RAN testbed with recorded industrial machine traces and check whether the reported gains in slice satisfaction and buffer stability disappear or shrink.

Figures

Figures reproduced from arXiv: 2605.30630 by Elahe Delavari, Junaid Farooq, Xingqi Wu.

Figure 1: O-RAN–enabled industrial network architecture with …
Figure 2: Proposed LSTM–Double DQN xApp.
… greedy policy, and transitions are sampled from a replay buffer for stable training. The MLP–Double DQN processes the instantaneous six-dimensional state through two fully connected layers of width 256 with ReLU activations. A final linear layer outputs Q-values for the nine discrete PRB-adjustment actions. This feedforward design corresponds to the memoryless agent used as a…
Figure 3: Training episodic return comparison for the MLP-Double …
Figure 6: Buffer vs. number of UEs.
Figure 5: Efficiency vs. number of UEs.
The results confirm that temporally encoded DRL agents provide improved stability and robustness compared to memoryless counterparts. In particular, the LSTM–Double DQN with longer sequence lengths (ℓ = 16) consistently outperforms both the MLP and shorter LSTM variants across all metrics. Short sequences (e.g., ℓ = 4) may fail to capture the CTMC structure, leading to noisy …
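
The text accompanying Figure 2 describes the memoryless MLP baseline: a six-dimensional state, two fully connected layers of width 256 with ReLU, and a linear output over nine actions. The forward pass below is a dependency-free sketch of that shape only. The weight initialisation, the placeholder state values, and the helper names are assumptions, and no training is shown.

```python
import random

STATE_DIM, HIDDEN, N_ACTIONS = 6, 256, 9  # sizes quoted in the Figure 2 text


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (the initialisation scheme is an assumption)."""
    scale = (1.0 / n_in) ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def q_values(state, layers):
    """Forward pass 6 -> 256 -> 256 -> 9, with ReLU on the two hidden layers only."""
    h = relu(linear(state, layers[0]))
    h = relu(linear(h, layers[1]))
    return linear(h, layers[2])


rng = random.Random(0)
layers = [init_layer(STATE_DIM, HIDDEN, rng),
          init_layer(HIDDEN, HIDDEN, rng),
          init_layer(HIDDEN, N_ACTIONS, rng)]
state = [0.1, 0.4, 0.0, 0.9, 0.3, 0.5]  # placeholder KPI vector
q = q_values(state, layers)
greedy_action = max(range(N_ACTIONS), key=q.__getitem__)
```

Because this network sees only the current state, it has no way to distinguish a burst that is starting from one that is ending, which is the gap the LSTM variant is meant to close.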
Original abstract

Fifth-generation (5G) wireless systems are increasingly adopted in smart manufacturing to support heterogeneous industrial workloads through services such as enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low-Latency Communication (URLLC). However, industrial traffic is inherently process-driven and temporally correlated, so static or reactive schedulers in the Open Radio Access Network (O-RAN) are inadequate for such non-stationary conditions, leading to sub-optimal utilization and violation of latency-reliability guarantees. This paper proposes a temporal-aware deep reinforcement learning (DRL) xApp for proactive Physical Resource Block (PRB) allocation in O-RAN-enabled industrial networks. The proposed framework integrates a long short-term memory (LSTM) encoder within a Double Deep Q-Network (DQN) to model sequential dependencies among slice-level Key Performance Indicators (KPIs), enabling predictive and stable decision-making. A continuous-time Markov chain (CTMC) traffic model is incorporated to emulate machine concurrency and process burstiness. Experimental results show that the LSTM-Double DQN improves slice satisfaction and buffer stability under moderate and heavy load, with the longest sequence window providing the strongest gains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes integrating an LSTM encoder into a Double DQN xApp for proactive PRB allocation in O-RAN industrial networks to handle temporally correlated traffic. A CTMC model emulates machine concurrency and burstiness; simulation results are reported to show gains in slice satisfaction and buffer stability under moderate/heavy load, with the longest sequence window performing best.

Significance. If the results hold under a traffic model validated against real traces, the work would offer a concrete approach to predictive resource allocation that accounts for process-driven correlations, potentially improving utilization and QoS guarantees in 5G industrial deployments beyond static or reactive baselines.

major comments (2)
  1. [Abstract] Abstract: the claim that LSTM-Double DQN improves slice satisfaction and buffer stability is presented without any description of the experimental setup, choice of baselines, number of runs, statistical significance tests, or error bars, rendering the central empirical claim unverifiable from the manuscript.
  2. [Traffic model] Traffic model section: the CTMC is used exclusively to generate all reported results, yet no evidence is supplied that its parameters were fitted to or tested against actual industrial packet traces (inter-arrival distributions, burst lengths, or temporal correlations); this is load-bearing for whether the observed gains generalize beyond the synthetic model.
minor comments (1)
  1. Notation for the LSTM state encoding and the precise definition of the sequence window length should be clarified with an accompanying diagram or pseudocode.
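
Major comment 1 asks for multiple runs and error bars. A minimal sketch of how such a summary could be computed is given below. The scores are made-up numbers, not results from the paper, and the normal-approximation interval is one common choice among several.

```python
import statistics


def mean_with_error_bar(run_scores, z=1.96):
    """Mean and half-width of a normal-approximation 95% interval across independent runs."""
    n = len(run_scores)  # needs at least two runs for a sample standard deviation
    mean = statistics.fmean(run_scores)
    sem = statistics.stdev(run_scores) / n ** 0.5  # standard error of the mean
    return mean, z * sem


# Hypothetical slice-satisfaction scores from five random seeds.
scores = [0.91, 0.88, 0.93, 0.90, 0.89]
mean, half_width = mean_with_error_bar(scores)
```

With only a handful of seeds, a Student-t multiplier in place of `z=1.96` would give a wider and more honest interval.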

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help improve the clarity and framing of our work. We address each major comment below.

  1. Referee: [Abstract] Abstract: the claim that LSTM-Double DQN improves slice satisfaction and buffer stability is presented without any description of the experimental setup, choice of baselines, number of runs, statistical significance tests, or error bars, rendering the central empirical claim unverifiable from the manuscript.

    Authors: We agree that the abstract would benefit from additional methodological context to support verifiability of the claims. In the revised manuscript we will expand the abstract to briefly note the simulation environment (O-RAN xApp evaluated via ns-3 with the CTMC traffic generator), the baselines (standard Double DQN and non-predictive threshold-based allocation), the use of multiple independent runs with results reported as averages accompanied by error bars, and that reported gains are observed under moderate and heavy load regimes. Full experimental details, including any statistical tests, will remain in the body of the paper. revision: yes

  2. Referee: [Traffic model] Traffic model section: the CTMC is used exclusively to generate all reported results, yet no evidence is supplied that its parameters were fitted to or tested against actual industrial packet traces (inter-arrival distributions, burst lengths, or temporal correlations); this is load-bearing for whether the observed gains generalize beyond the synthetic model.

    Authors: The CTMC is a synthetic model constructed to reproduce machine concurrency and process-driven burstiness characteristic of industrial traffic; its parameters were not obtained by fitting to real packet traces. We will revise the traffic-model section to state explicitly that the model is synthetic and motivated by typical industrial process behaviors rather than calibrated to measured traces. We will also add a short limitations paragraph noting that broader generalization would require validation against real industrial traces and identifying this as future work. We cannot supply such validation in the current revision because the requisite proprietary trace data are not available to the authors. revision: partial
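
If trace data did become available, the simplest calibration step would be to estimate the CTMC's transition rates from observed dwell times. The sketch below assumes a two-state (idle/active) machine and a hypothetical trace format; neither is taken from the paper. It uses the standard maximum-likelihood estimate, transitions out of a state divided by total time spent in it.

```python
def estimate_on_off_rates(events, end_time):
    """Maximum-likelihood rates for a two-state (idle/active) CTMC from one trace.

    events: time-ordered list of (timestamp, new_state) with alternating states
            "idle" and "active"; the first entry gives the initial state.
            This trace format is a hypothetical example.
    Returns (rate_on, rate_off). Both states must appear in the trace.
    """
    dwell = {"idle": 0.0, "active": 0.0}
    exits = {"idle": 0, "active": 0}
    for (t0, state), (t1, _) in zip(events, events[1:]):
        dwell[state] += t1 - t0
        exits[state] += 1
    last_t, last_state = events[-1]
    dwell[last_state] += end_time - last_t  # censored final stay: time counts, no exit
    return exits["idle"] / dwell["idle"], exits["active"] / dwell["active"]


trace = [(0.0, "idle"), (2.0, "active"), (3.0, "idle"), (7.0, "active"), (8.0, "idle")]
rate_on, rate_off = estimate_on_off_rates(trace, end_time=10.0)
```

Matching rates is only a first check; the referee's point about burst lengths and temporal correlations would need a comparison of the full dwell-time distributions, since real machines need not have exponential holding times.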

Circularity Check

0 steps flagged

No circularity: derivation relies on independent simulation benchmarks


The paper describes a proposed LSTM-Double DQN xApp trained within a CTMC-generated traffic environment and reports comparative performance gains against baselines in slice satisfaction and buffer stability. No equations, definitions, or self-citations are quoted that reduce any claimed prediction or result to its own inputs by construction (e.g., no fitted parameter renamed as a prediction, no uniqueness theorem imported from prior self-work, and no ansatz smuggled via citation). The experimental framing treats the CTMC as an external emulation tool whose outputs serve as independent test cases, keeping the central claim self-contained against those benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review prevents full enumeration; CTMC model and LSTM sequence modeling are invoked without stated validation against real traces.

axioms (1)
  • domain assumption: CTMC traffic model emulates machine concurrency and process burstiness
    Invoked to generate test scenarios for the DRL agent

pith-pipeline@v0.9.1-grok · 5744 in / 1124 out tokens · 18630 ms · 2026-06-29T00:02:53.062965+00:00 · methodology


Reference graph

Works this paper leans on

16 extracted references

  1. [1]

    Wireless communications for smart manufacturing and industrial IoT: Existing technologies, 5G, and beyond,

    M. Noor-A-Rahim, R. Zhang, M. Bennis, and H. V. Poor, “Wireless communications for smart manufacturing and industrial IoT: Existing technologies, 5G, and beyond,” IEEE Communications Surveys & Tutorials, vol. 24, no. 3, pp. 1621–1661, 2022

  2. [2]

    Boosting smart manufacturing with 5G wireless connectivity,

    Ericsson Research, “Boosting smart manufacturing with 5G wireless connectivity,” Ericsson Technology Review, 2019

  3. [3]

    Reference network and localization architecture for smart manufacturing based on 5G,

    S. Ludwig et al., “Reference network and localization architecture for smart manufacturing based on 5G,” in Advances in System-Integrated Intelligence, (Cham), pp. 470–479, Springer International Publishing, 2023

  4. [4]

    Survey on Network Slicing for Internet of Things Realization in 5G Networks,

    S. Wijethilaka and M. Liyanage, “Survey on Network Slicing for Internet of Things Realization in 5G Networks,” IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 957–994, 2021

  5. [5]

    Demonstration of closed loop AI-Driven RAN controllers using O-RAN SDR testbed,

    N. H. Stephenson, A. J. Chiejina, N. B. Kabigting, and V. K. Shah, “Demonstration of closed loop AI-Driven RAN controllers using O-RAN SDR testbed,” in MILCOM 2023 - 2023 IEEE Military Communications Conference (MILCOM), pp. 241–242, 2023

  6. [6]

    Anomaly detection for mitigating xApp and E2 interface threats in O-RAN near-RT RIC,

    C.-F. Hung, C.-H. Tseng, and S.-M. Cheng, “Anomaly detection for mitigating xApp and E2 interface threats in O-RAN near-RT RIC,” IEEE Open Journal of the Communications Society, vol. 6, pp. 1682–1694, 2025

  7. [7]

    RIC-O: Efficient placement of a disaggregated and distributed RAN intelligent controller with dynamic clustering of radio nodes,

    G. M. Almeida, G. Z. Bruno, A. Huff, M. Hiltunen, E. P. Duarte, C. B. Both, and K. V. Cardoso, “RIC-O: Efficient placement of a disaggregated and distributed RAN intelligent controller with dynamic clustering of radio nodes,” IEEE Journal on Selected Areas in Communications, vol. 42, no. 2, pp. 446–459, 2024

  8. [8]

    Understanding O-RAN: Architecture, Interfaces, Algorithms, Security, and Research Challenges,

    M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “Understanding O-RAN: Architecture, Interfaces, Algorithms, Security, and Research Challenges,” IEEE Communications Surveys & Tutorials, vol. 25, no. 2, pp. 1376–1411, 2023

  9. [9]

    Reinforcement learning for dynamic resource optimization in 5G radio access network slicing,

    Y. Shi, Y. E. Sagduyu, and T. Erpek, “Reinforcement learning for dynamic resource optimization in 5G radio access network slicing,” in 2020 IEEE 25th international workshop on computer aided modeling and design of communication links and networks (CAMAD), pp. 1–6, IEEE, 2020

  10. [10]

    Deep reinforcement learning-based network slicing for beyond 5G,

    K. Suh, S. Kim, Y. Ahn, S. Kim, H. Ju, and B. Shim, “Deep reinforcement learning-based network slicing for beyond 5G,” IEEE Access, vol. 10, pp. 7384–7395, 2022

  11. [11]

    Intelligent resource slicing for eMBB and URLLC coexistence in 5G and beyond: A deep reinforcement learning based approach,

    M. Alsenwi, N. H. Tran, M. Bennis, S. R. Pandey, A. K. Bairagi, and C. S. Hong, “Intelligent resource slicing for eMBB and URLLC coexistence in 5G and beyond: A deep reinforcement learning based approach,” IEEE Transactions on Wireless Communications, vol. 20, no. 7, pp. 4585–4600, 2021

  12. [12]

    The LSTM-Based Advantage Actor-Critic Learning for Resource Management in Network Slicing With User Mobility,

    R. Li, C. Wang, Z. Zhao, R. Guo, and H. Zhang, “The LSTM-Based Advantage Actor-Critic Learning for Resource Management in Network Slicing With User Mobility,” IEEE Communications Letters, vol. 24, pp. 2005–2009, Sept. 2020

  13. [13]

    LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UAV-Assisted Sensor Network,

    K. Li, W. Ni, and F. Dressler, “LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UAV-Assisted Sensor Network,” IEEE Internet of Things Journal, vol. 9, pp. 4179–4189, Mar. 2022

  14. [14]

    Open RAN LSTM traffic prediction and slice management using deep reinforcement learning,

    F. Lotfi and F. Afghah, “Open RAN LSTM traffic prediction and slice management using deep reinforcement learning,” in 2023 57th Asilomar Conference on Signals, Systems, and Computers, pp. 646–650, IEEE, 2023

  15. [15]

    Deep Reinforcement Learning for Online Resource Allocation in Network Slicing,

    Y. Cai, P. Cheng, Z. Chen, M. Ding, B. Vucetic, and Y. Li, “Deep Reinforcement Learning for Online Resource Allocation in Network Slicing,” IEEE Transactions on Mobile Computing, vol. 23, pp. 7099–7116, June 2024

  16. [16]

    AI-RAN simulator

    “AI-RAN simulator.” https://github.com/ntutangyun/ai-ran-sim. Accessed: 2025-11-24