arxiv: 2605.14339 · v1 · submitted 2026-05-14 · 💻 cs.NI

Sub-Band Full Duplex Resource Allocation: A Predictive Deep Reinforcement Learning Approach

Abhiram D, Aiswarya Rajan, Arin Shemeem, Vipindev Adat Vasudevan, Abdulla P

Pith reviewed 2026-05-15 01:57 UTC · model grok-4.3

Keywords: sub-band full duplex, resource allocation, traffic prediction, Bi-LSTM, DDQN, reinforcement learning, 6G networks, spectrum utilization

The pith

A hybrid Bi-LSTM and DDQN framework enables proactive sub-band allocation in SBFD systems by using traffic forecasts to guide real-time decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a predictive deep reinforcement learning method for managing sub-band allocation in sub-band full duplex wireless systems. It combines a Bi-LSTM model to forecast upcoming traffic with a DDQN agent that uses those forecasts plus current queue information to choose uplink versus downlink sub-band splits. The approach shifts from static or purely reactive allocation to proactive scheduling that matches resources to expected demand. Evaluation shows the prediction component captures bursty patterns accurately while the agent adjusts splits to cut queue buildup and raise overall spectrum use.
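As a rough illustration of how forecasts and queue information might be combined into a single decision input, the sketch below assembles a state vector and picks a split with an epsilon-greedy rule. The action set `UL_SHARES`, the four-element state layout, the function names, and the stand-in Q-values are all assumptions made for illustration; this summary does not give the paper's actual state and action definitions.

```python
import numpy as np

# Hypothetical discrete action set: fraction of sub-band resources given to UL.
UL_SHARES = np.array([0.2, 0.35, 0.5, 0.65, 0.8])

def build_state(pred_ul, pred_dl, queue_ul, queue_dl):
    """Concatenate forecast demand and current queue lengths into one vector."""
    return np.array([pred_ul, pred_dl, queue_ul, queue_dl], dtype=np.float64)

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice over the discrete split ratios."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: random split
    return int(np.argmax(q_values))              # exploit: best-valued split

rng = np.random.default_rng(0)
state = build_state(pred_ul=8.0, pred_dl=2.0, queue_ul=3.0, queue_dl=0.0)
# Stand-in for a trained Q-network's output: one made-up value per action.
q_values = np.array([0.1, 0.3, 0.2, 0.9, 0.4])
action = select_action(q_values, epsilon=0.0, rng=rng)
ul_share = UL_SHARES[action]
```

With `epsilon=0.0` the choice is purely greedy, so the agent takes the action with the highest Q-value; a nonzero `epsilon` would occasionally try other splits during training.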

Core claim

The central claim is that the Bi-LSTM-DDQN combination allows SBFD systems to set dynamic uplink-downlink sub-band ratios based on predicted traffic demand and observed queues, producing higher spectrum utilization and lower delays than fixed or non-predictive baselines under varying loads.

What carries the argument

The hybrid Bi-LSTM-DDQN framework, where the Bi-LSTM generates forecasts of future traffic and the DDQN agent selects sub-band split ratios using both forecasts and live queue states.
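For readers unfamiliar with what distinguishes a DDQN from a plain DQN, the sketch below computes Double DQN training targets for a small batch: the online network selects the next action and the target network evaluates it. The array shapes, the discount factor, and the numbers are illustrative assumptions, not values from the paper.

```python
import numpy as np

def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN targets for a batch of transitions.

    q_online_next, q_target_next: arrays of shape (batch, n_actions) holding
    next-state Q-values from the online and target networks respectively.
    """
    # The online network picks the next action ...
    best_actions = np.argmax(q_online_next, axis=1)
    # ... and the target network evaluates it, which reduces overestimation.
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    # Terminal transitions (dones == 1) get no bootstrapped value.
    return rewards + gamma * (1.0 - dones) * next_values

rewards = np.array([1.0, -2.0])
dones = np.array([0.0, 1.0])
q_online_next = np.array([[0.5, 2.0, 1.0], [3.0, 0.0, 1.0]])
q_target_next = np.array([[4.0, 1.5, 9.0], [7.0, 8.0, 6.0]])
targets = ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.9)
```

In the first transition the online network prefers action 1, so the target uses the target network's value for action 1 (1.5) rather than its own maximum (9.0).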

If this is right

Spectrum utilization rises because sub-band splits track predicted demand instead of remaining fixed.
Queue lengths fall as the system schedules resources ahead of traffic arrivals.
Waste from static allocation is avoided because splits adapt continuously to observed and forecast loads.
The overall system supports autonomous operation suitable for 6G environments with highly variable traffic.
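The mechanism behind the first two points can be seen in a toy two-queue simulation. The sketch below compares a fixed 50/50 split with a split proportional to the true arrivals, which stands in for a perfect forecast. The trace, the capacity of 10 units per slot, and the oracle policy are made-up assumptions; nothing here reproduces the paper's simulator or results.

```python
def simulate(arrivals, capacity, choose_ul_units):
    """Run a two-queue slot simulation and return (served, final_backlog).

    arrivals: list of (ul, dl) integer arrivals per slot.
    choose_ul_units: function (ul_arrival, dl_arrival) -> resource units for UL.
    """
    q_ul = q_dl = served = 0
    for a_ul, a_dl in arrivals:
        ul_units = choose_ul_units(a_ul, a_dl)
        dl_units = capacity - ul_units
        q_ul += a_ul
        q_dl += a_dl
        s_ul = min(q_ul, ul_units)  # cannot serve more than is queued
        s_dl = min(q_dl, dl_units)
        q_ul -= s_ul
        q_dl -= s_dl
        served += s_ul + s_dl
    return served, q_ul + q_dl

CAPACITY = 10
# Hypothetical bursty trace: five UL-heavy slots, then five DL-heavy slots.
arrivals = [(8, 2)] * 5 + [(2, 8)] * 5

fixed = simulate(arrivals, CAPACITY, lambda ul, dl: CAPACITY // 2)
# Oracle split proportional to true arrivals, standing in for a perfect forecast.
tracking = simulate(
    arrivals, CAPACITY, lambda ul, dl: round(CAPACITY * ul / (ul + dl))
)
```

On this trace the fixed split serves 85 of the 100 arriving units and ends with a backlog of 15, while the oracle split serves all 100 and leaves no backlog. The gap only illustrates the mechanism; it says nothing about the size of the gains reported in the paper.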

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same prediction-plus-reinforcement structure could transfer to resource problems in other duplexing or multiple-access schemes that face bursty demand.
Periodic retraining on fresh data would likely be needed to maintain prediction quality as traffic statistics evolve over time.
Scaling the approach to multi-cell networks would require testing whether the DDQN state space remains manageable.

Load-bearing premise

The Bi-LSTM continues to predict accurately on real-world traffic patterns never seen in training, and the DDQN agent converges to stable allocation policies without large overhead or instability.

What would settle it

Feed the trained system a set of live network traffic traces, then check whether prediction error exceeds the level implied by the reported accuracy, or whether the resulting allocations produce sustained queue growth or policy oscillation.
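The three quantities named in this test could be measured along the following lines. The function names, the least-squares slope as a growth indicator, and the switch rate as an oscillation indicator are assumptions chosen for illustration; the paper may define its metrics differently.

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-square prediction error over a trace."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def queue_growth_slope(queue_lengths):
    """Least-squares slope of queue length per slot; a persistently
    positive slope suggests sustained growth."""
    q = np.asarray(queue_lengths, dtype=float)
    slots = np.arange(len(q))
    slope, _intercept = np.polyfit(slots, q, 1)
    return float(slope)

def switch_rate(actions):
    """Fraction of consecutive slots in which the chosen split changed."""
    a = np.asarray(actions)
    return float(np.mean(a[1:] != a[:-1]))

# Made-up traces, for demonstration only.
err = rmse([4, 6, 8], [5, 5, 8])
slope = queue_growth_slope([0, 2, 4, 6, 8])
rate = switch_rate([1, 3, 1, 3, 1])
```

Thresholds for what counts as "sustained" growth or "oscillation" would have to be chosen per deployment; none are implied here.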

Figures

Figures reproduced from arXiv: 2605.14339 by Abdulla P, Abhiram D, Aiswarya Rajan, Arin Shemeem, Vipindev Adat Vasudevan.

**Figure 1.** Slot split in TDD and SBFD. The SBFD system partitions resources for DL and UL in both the time domain and frequency domain, providing flexibility for adaptation to different types of traffic. In the SBFD pattern, the DL frequency is split, separated by a guard band, and UL is sandwiched between two DL resource blocks, such as “DUD”, within a slot. Based on the latest 5G specification [6], almost all the bands …

**Figure 3.** Bi-LSTM traffic prediction performance for UL and DL over a 10-slot …

**Figure 4.** Dynamic Allocation based on UL demand in DDQN Model vs SAC

**Figure 5.** Dynamic Allocation on DL demand in DDQN model vs SAC-Discrete

Original abstract

This paper presents a predictive deep learning framework for dynamic sub-band allocation in Sub-Band Full Duplex (SBFD) systems, addressing the challenge of balancing uplink (UL) and downlink (DL) performance under highly dynamic traffic conditions. The key contribution lies in integrating a hybrid Bidirectional Long Short-Term Memory (Bi-LSTM) model for traffic forecasting with a Double Deep Q-Network (DDQN) for real-time resource allocation. Using both predicted traffic and current queue states, the proposed system enables proactive scheduling based on traffic demand. Evaluation results show that the prediction model achieves high accuracy in capturing bursty traffic patterns, while the DDQN agent effectively adapts UL/DL split ratios according to traffic variations. The framework improves spectrum utilization, reduces queue buildup, and avoids inefficient static configurations. The proposed approach demonstrates that combining predictive intelligence with reinforcement learning significantly enhances the efficiency and adaptability of SBFD systems, making it a strong candidate for autonomous resource management in future 6G networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. This paper proposes a predictive deep reinforcement learning framework for dynamic sub-band allocation in Sub-Band Full Duplex (SBFD) systems. It integrates a Bi-LSTM model for forecasting bursty traffic patterns with a DDQN agent that uses both predicted traffic and current queue states to adaptively determine UL/DL sub-band splits in real time. Simulation results indicate that the approach achieves high prediction accuracy, improves spectrum utilization, reduces queue buildup, and outperforms static configurations under dynamic traffic conditions.

Significance. If the central claims hold under rigorous validation, the work could contribute to autonomous resource management in 6G networks by showing how predictive models combined with RL enable proactive, traffic-aware scheduling in SBFD systems. The integration addresses a practical challenge in balancing UL/DL performance without relying on inefficient fixed allocations.

major comments (2)

[Evaluation section] The results separate Bi-LSTM prediction accuracy from final DDQN allocation metrics without a joint sensitivity study that injects realistic forecast noise into the DDQN state and re-measures gains over static baselines. This is load-bearing for the claim of stable, beneficial performance, as modest errors on unseen bursty patterns can lead to over-allocation and increased queue buildup.
[Proposed framework and results] No explicit held-out validation, non-ML baselines, or analysis of how Bi-LSTM forecast errors propagate into DDQN decisions is provided. The central claim that the framework significantly enhances efficiency requires demonstrating that predictions remain sufficiently accurate for the agent to converge to stable policies without excessive overhead.
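One way to run the sensitivity study requested in the first comment is to resample past forecast errors and add them to the forecast before it enters the agent's state. The sketch below does this with a simple residual bootstrap. The function name, the `scale` knob for sweeping noise levels, and the sample numbers are assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def perturb_forecast(forecast, residuals, scale, rng):
    """Add resampled forecast errors to a forecast vector.

    residuals: 1-D array of past (actual - predicted) errors.
    scale: multiplier for sweeping noise levels (0 disables the noise).
    """
    forecast = np.asarray(forecast, dtype=float)
    noise = rng.choice(residuals, size=forecast.shape, replace=True)
    # Traffic demand cannot be negative, so clip after adding noise.
    return np.clip(forecast + scale * noise, 0.0, None)

rng = np.random.default_rng(42)
residuals = np.array([-1.5, -0.5, 0.0, 0.5, 2.0])  # made-up error sample
forecast = np.array([8.0, 2.0])                    # made-up UL, DL forecast
clean = perturb_forecast(forecast, residuals, scale=0.0, rng=rng)
noisy = perturb_forecast(forecast, residuals, scale=1.0, rng=rng)
```

Repeating the evaluation for several values of `scale` and comparing against the static baseline at each level would show where, if anywhere, the predictive gains disappear.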

minor comments (1)

[Abstract] Specific quantitative metrics, baselines, error bars, and validation details are missing, which would allow readers to better gauge the magnitude of reported improvements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate additional validation analyses that strengthen the claims regarding robustness and efficiency.

Referee: [Evaluation section] The results separate Bi-LSTM prediction accuracy from final DDQN allocation metrics without a joint sensitivity study that injects realistic forecast noise into the DDQN state and re-measures gains over static baselines. This is load-bearing for the claim of stable, beneficial performance, as modest errors on unseen bursty patterns can lead to over-allocation and increased queue buildup.

Authors: We agree that a joint sensitivity study is necessary to confirm robustness. In the revised manuscript we will add a dedicated subsection that injects realistic forecast noise (drawn from the observed Bi-LSTM error distribution on bursty test traces) directly into the DDQN state vector. We will then re-evaluate spectrum utilization and queue-length metrics against the static baseline across multiple noise levels, explicitly showing that performance gains persist for error magnitudes typical of the target traffic.
Revision: yes

Referee: [Proposed framework and results] No explicit held-out validation, non-ML baselines, or analysis of how Bi-LSTM forecast errors propagate into DDQN decisions is provided. The central claim that the framework significantly enhances efficiency requires demonstrating that predictions remain sufficiently accurate for the agent to converge to stable policies without excessive overhead.

Authors: We will clarify the data partitioning (70% training, 15% validation, 15% held-out test) and report Bi-LSTM accuracy on the unseen test set. Non-ML baselines (fixed 50/50 split and random allocation) will be added to all performance tables. We will also include a propagation analysis that correlates Bi-LSTM MSE with DDQN policy stability and training overhead, demonstrating convergence to stable policies once prediction accuracy exceeds 85% with negligible extra computational cost relative to non-predictive RL.
Revision: yes
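A 70/15/15 partition of a traffic trace could be implemented as below. The sketch assumes the split is chronological, so that the test set lies strictly after the training data in time; the rebuttal does not state this, and a shuffled split would risk leaking future traffic into training. The function name and the placeholder trace are hypothetical.

```python
def chronological_split(series, train_pct=70, val_pct=15):
    """Split a time-ordered sequence into train/val/test without shuffling.

    Percentages are integers so the cut points are exact; whatever remains
    after the training and validation portions becomes the test set.
    """
    n = len(series)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return series[:train_end], series[train_end:val_end], series[val_end:]

trace = list(range(100))  # placeholder for a per-slot traffic trace
train, val, test = chronological_split(trace)
```

For a windowed forecaster, the input windows near each boundary would also need care so that no window spans two partitions.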

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes a standard pipeline: train a Bi-LSTM on traffic traces to produce forecasts, then feed those forecasts plus queue states into a DDQN whose state-action space and reward are defined independently of the predictor outputs. No equation or procedure reduces a claimed prediction or allocation gain to a fitted parameter by construction, nor does any load-bearing step rest on a self-citation chain or imported uniqueness theorem. The derivation therefore remains self-contained against external traffic data and RL benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No new free parameters, axioms, or invented entities are introduced; the framework uses standard neural network and reinforcement learning components.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1] S. I. Loutfi, U. Tureli, and I. Shayea, “Augmented reality with mobility awareness in mobile edge computing over 6G network: A survey,” in 2023 10th International Conference on Wireless Networks and Mobile Communications (WINCOM), 2023, pp. 1–6.

[2] H. H. J. Mahdi, L. Fouad, F. H. T. Hussain, A. J. Kadhim, M. A. Mohammed, and N. A. Othman, “Augmented and virtual reality services supported by 6G for improving smart cities,” in 2024 IEEE 9th International Conference on Engineering Technologies and Applied Sciences (ICETAS), 2024, pp. 1–6.

[3] W. Jiang, B. Han, M. A. Habibi, and H. D. Schotten, “The road towards 6G: A comprehensive survey,” IEEE Open Journal of the Communications Society, vol. 2, pp. 334–366, 2021.

[4] X. Wei, J. Li, C. Liang, and R. Liu, “Performance analysis of subband full duplex for 5G-Advanced and 6G networks through simulations and field tests,” IEEE Open Journal of the Communications Society, vol. 4, pp. 2572–2585, 2023.

[5] T. Chen, S. Garimapati, I. Kadota, T. Dinc, S. L. Garimella, M. Kohli, A. S. Levin, G. Zussman, and H. Krishnaswamy, “Subband full-duplex large-scale deployed network designs and tradeoffs,” Proceedings of the IEEE, vol. 112, no. 8, pp. 1054–1084, 2024.

[6] H. Li, C. Sun, S. Wang, T. Cui, X. Wang, Y. Gong, and W. Zhang, “5G sub-band full duplex: 3GPP standardization progress and performance analysis,” in 2024 IEEE/CIC International Conference on Communications in China (ICCC), 2024, pp. 1–6.

[7] M. Uusitalo et al., “6G vision, value, use cases and technologies from European 6G flagship project Hexa-X,” IEEE Access, vol. 11, pp. 26 004–26 020, 2023.

[8] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “The performance of LSTM and BiLSTM in forecasting time series,” in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 3285–3292.

[9] A. Iqbal, M.-L. Tham, and Y. C. Chang, “Double deep Q-network-based energy-efficient resource allocation in cloud radio access network,” IEEE Access, vol. 9, pp. 20 440–20 449, 2021.

[10] R. Bellman, “A Markovian decision process,” Indiana University Mathematics Journal, vol. 6, no. 4, pp. 679–684, 1957.

[11] S. Wu, S. Zhang, Z. Xu, and Z. Pan, “Flexible resource allocation scheme for non-overlapping subband full duplex systems,” in 2023 International Conference on Wireless Communications and Signal Processing (WCSP), 2023, pp. 1067–1072.

[12] X. Han, R. Liu, X. Liu, C. Liang, X. Wei, Y. Hao, Z. Zhang, and S. Jin, “Interference mitigation for non-overlapping sub-band full duplex for 5G-Advanced wireless networks,” IEEE Access, vol. 11, pp. 1894–1910, 2022.

[13] A. H. El Fawal, A. Mansour, and A. Nasser, “Markov-modulated Poisson process modeling for machine-to-machine heterogeneous traffic,” Applied Sciences, vol. 14, no. 18, p. 8561, 2024.

[14] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 2015.

[15] M. Mokhtari, P. Kela, G. Pocovi, R. Maldonado, and K. I. Pedersen, “Deep reinforcement learning based dynamic sub-band full duplex for 5G-Advanced and 6G,” in 2025 IEEE 101st Vehicular Technology Conference (VTC2025-Spring), 2025, pp. 1–6.