Recognition: no theorem link
Sub-Band Full Duplex Resource Allocation: A Predictive Deep Reinforcement Learning Approach
Pith reviewed 2026-05-15 01:57 UTC · model grok-4.3
The pith
A hybrid Bi-LSTM and DDQN framework enables proactive sub-band allocation in SBFD systems by using traffic forecasts to guide real-time decisions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the Bi-LSTM-DDQN combination allows SBFD systems to set dynamic uplink-downlink sub-band ratios based on predicted traffic demand and observed queues, producing higher spectrum utilization and lower delays than fixed or non-predictive baselines under varying loads.
What carries the argument
The hybrid Bi-LSTM-DDQN framework, where the Bi-LSTM generates forecasts of future traffic and the DDQN agent selects sub-band split ratios using both forecasts and live queue states.
If this is right
- Spectrum utilization rises because sub-band splits track predicted demand instead of remaining fixed.
- Queue lengths fall as the system schedules resources ahead of traffic arrivals.
- Static allocation waste is eliminated by continuous adaptation to observed and forecasted loads.
- The overall system supports autonomous operation suitable for 6G environments with highly variable traffic.
Where Pith is reading between the lines
- The same prediction-plus-reinforcement structure could transfer to resource problems in other duplexing or multiple-access schemes that face bursty demand.
- Periodic retraining on fresh data would likely be needed to maintain prediction quality as traffic statistics evolve over time.
- Scaling the approach to multi-cell networks would require testing whether the DDQN state space remains manageable.
Load-bearing premise
The Bi-LSTM continues to predict accurately on real-world traffic patterns never seen in training and the DDQN agent converges to stable allocation policies without large overhead or instability.
What would settle it
Feeding the trained system a set of live network traffic traces and measuring whether prediction error exceeds the reported accuracy or whether the resulting allocations produce sustained queue growth or policy oscillation.
Figures
read the original abstract
This paper presents a predictive deep learning framework for dynamic sub-band allocation in Sub-Band Full Duplex (SBFD) systems, addressing the challenge of balancing uplink (UL) and downlink (DL) performance under highly dynamic traffic conditions. The key contribution lies in integrating a hybrid Bidirectional Long Short-Term Memory (Bi-LSTM) model for traffic forecasting with a Double Deep Q-Network (DDQN) for real-time resource allocation. Using both predicted traffic and current queue states, the proposed system enables proactive scheduling based on traffic demand. Evaluation results show that the prediction model achieves high accuracy in capturing bursty traffic patterns, while the DDQN agent effectively adapts UL/DL split ratios according to traffic variations. The framework improves spectrum utilization, reduces queue buildup, and avoids inefficient static configurations. The proposed approach demonstrates that combining predictive intelligence with reinforcement learning significantly enhances the efficiency and adaptability of SBFD systems, making it a strong candidate for autonomous resource management in future 6G networks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper proposes a predictive deep reinforcement learning framework for dynamic sub-band allocation in Sub-Band Full Duplex (SBFD) systems. It integrates a Bi-LSTM model for forecasting bursty traffic patterns with a DDQN agent that uses both predicted traffic and current queue states to adaptively determine UL/DL sub-band splits in real time. Simulation results indicate that the approach achieves high prediction accuracy, improves spectrum utilization, reduces queue buildup, and outperforms static configurations under dynamic traffic conditions.
Significance. If the central claims hold under rigorous validation, the work could contribute to autonomous resource management in 6G networks by showing how predictive models combined with RL enable proactive, traffic-aware scheduling in SBFD systems. The integration addresses a practical challenge in balancing UL/DL performance without relying on inefficient fixed allocations.
major comments (2)
- [Evaluation section] Evaluation section: The results separate Bi-LSTM prediction accuracy from final DDQN allocation metrics without a joint sensitivity study that injects realistic forecast noise into the DDQN state and re-measures gains over static baselines. This is load-bearing for the claim of stable, beneficial performance, as modest errors on unseen bursty patterns can lead to over-allocation and increased queue buildup.
- [Proposed framework and results] Proposed framework and results: No explicit held-out validation, non-ML baselines, or analysis of how Bi-LSTM forecast errors propagate into DDQN decisions is provided. The central claim that the framework significantly enhances efficiency requires demonstrating that predictions remain sufficiently accurate for the agent to converge to stable policies without excessive overhead.
minor comments (1)
- [Abstract] Abstract: Specific quantitative metrics, baselines, error bars, and validation details are missing, which would allow readers to better gauge the magnitude of reported improvements.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate additional validation analyses that strengthen the claims regarding robustness and efficiency.
read point-by-point responses
-
Referee: [Evaluation section] Evaluation section: The results separate Bi-LSTM prediction accuracy from final DDQN allocation metrics without a joint sensitivity study that injects realistic forecast noise into the DDQN state and re-measures gains over static baselines. This is load-bearing for the claim of stable, beneficial performance, as modest errors on unseen bursty patterns can lead to over-allocation and increased queue buildup.
Authors: We agree that a joint sensitivity study is necessary to confirm robustness. In the revised manuscript we will add a dedicated subsection that injects realistic forecast noise (drawn from the observed Bi-LSTM error distribution on bursty test traces) directly into the DDQN state vector. We will then re-evaluate spectrum utilization and queue-length metrics against the static baseline across multiple noise levels, explicitly showing that performance gains persist for error magnitudes typical of the target traffic. revision: yes
-
Referee: [Proposed framework and results] Proposed framework and results: No explicit held-out validation, non-ML baselines, or analysis of how Bi-LSTM forecast errors propagate into DDQN decisions is provided. The central claim that the framework significantly enhances efficiency requires demonstrating that predictions remain sufficiently accurate for the agent to converge to stable policies without excessive overhead.
Authors: We will clarify the data partitioning (70 % training, 15 % validation, 15 % held-out test) and report Bi-LSTM accuracy on the unseen test set. Non-ML baselines (fixed 50/50 split and random allocation) will be added to all performance tables. We will also include a propagation analysis that correlates Bi-LSTM MSE with DDQN policy stability and training overhead, demonstrating convergence to stable policies once prediction accuracy exceeds 85 % with negligible extra computational cost relative to non-predictive RL. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper describes a standard pipeline: train a Bi-LSTM on traffic traces to produce forecasts, then feed those forecasts plus queue states into a DDQN whose state-action space and reward are defined independently of the predictor outputs. No equation or procedure reduces a claimed prediction or allocation gain to a fitted parameter by construction, nor does any load-bearing step rest on a self-citation chain or imported uniqueness theorem. The derivation therefore remains self-contained against external traffic data and RL benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Augmented reality with mobility awareness in mobile edge computing over 6g network: A survey,
S. I. Loutfi, U. Tureli, and I. Shayea, “Augmented reality with mobility awareness in mobile edge computing over 6g network: A survey,” in 2023 10th International Conference on Wireless Networks and Mobile Communications (WINCOM), 2023, pp. 1–6
work page 2023
-
[2]
Augmented and virtual reality services supported by 6g for improving smart cities,
H. H. J. Mahdi, L. Fouad, F. H. T. Hussain, A. J. Kadhim, M. A. Mohammed, and N. A. Othman, “Augmented and virtual reality services supported by 6g for improving smart cities,” in2024 IEEE 9th International Conference on Engineering Technologies and Applied Sciences (ICETAS), 2024, pp. 1–6
work page 2024
-
[3]
The road towards 6g: A comprehensive survey,
W. Jiang, B. Han, M. A. Habibi, and H. D. Schotten, “The road towards 6g: A comprehensive survey,”IEEE Open Journal of the Communications Society, vol. 2, pp. 334–366, 2021
work page 2021
-
[4]
X. Wei, J. Li, C. Liang, and R. Liu, “Performance analysis of subband full duplex for 5g-advanced and 6g networks through simulations and field tests,”IEEE Open Journal of the Communications Society, vol. 4, pp. 2572–2585, 2023
work page 2023
-
[5]
Subband full-duplex large-scale deployed network designs and tradeoffs,
T. Chen, S. Garimapati, I. Kadota, T. Dinc, S. L. Garimella, M. Kohli, A. S. Levin, G. Zussman, and H. Krishnaswamy, “Subband full-duplex large-scale deployed network designs and tradeoffs,”Proceedings of the IEEE, vol. 112, no. 8, pp. 1054–1084, 2024
work page 2024
-
[6]
5g sub- band full duplex: 3gpp standardization progress and performance analysis,
H. Li, C. Sun, S. Wang, T. Cui, X. Wang, Y . Gong, and W. Zhang, “5g sub- band full duplex: 3gpp standardization progress and performance analysis,” in2024 IEEE/CIC International Conference on Communications in China (ICCC), 2024, pp. 1–6
work page 2024
-
[7]
6g vision, value, use cases and technologies from european 6g flagship project hexa-x,
M. Uusitaloet al., “6g vision, value, use cases and technologies from european 6g flagship project hexa-x,”IEEE Access, vol. 11, pp. 26 004– 26 020, 2023
work page 2023
-
[8]
The performance of lstm and bilstm in forecasting time series,
S. Siami-Namini, N. Tavakoli, and A. S. Namin, “The performance of lstm and bilstm in forecasting time series,” in2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 3285–3292
work page 2019
-
[9]
Double deep q-network-based energy-efficient resource allocation in cloud radio access network,
A. Iqbal, M.-L. Tham, and Y . C. Chang, “Double deep q-network-based energy-efficient resource allocation in cloud radio access network,”IEEE Access, vol. 9, pp. 20 440–20 449, 2021
work page 2021
-
[10]
R. Bellman, “A markovian decision process,”Indiana University Mathe- matics Journal, vol. 6, no. 4, pp. 679–684, 1957
work page 1957
-
[11]
Flexible resource allocation scheme for non-overlapping subband full duplex systems,
S. Wu, S. Zhang, Z. Xu, and Z. Pan, “Flexible resource allocation scheme for non-overlapping subband full duplex systems,” in2023 International Conference on Wireless Communications and Signal Processing (WCSP), 2023, pp. 1067–1072
work page 2023
-
[12]
Interference mitigation for non-overlapping sub-band full duplex for 5g-advanced wireless networks,
X. Han, R. Liu, X. Liu, C. Liang, X. Wei, Y . Hao, Z. Zhang, and S. Jin, “Interference mitigation for non-overlapping sub-band full duplex for 5g-advanced wireless networks,”IEEE Access, vol. 11, pp. 1894–1910, 2022
work page 1910
-
[13]
Markov-modulated poisson process modeling for machine-to-machine heterogeneous traffic,
A. H. El Fawal, A. Mansour, and A. Nasser, “Markov-modulated poisson process modeling for machine-to-machine heterogeneous traffic,”Applied Sciences, vol. 14, no. 18, p. 8561, 2024
work page 2024
-
[14]
Adam: A method for stochastic optimization,
D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 2015
work page 2015
-
[15]
Deep reinforcement learning based dynamic sub-band full duplex for 5g- advanced and 6g,
M. Mokhtari, P. Kela, G. Pocovi, R. Maldonado, and K. I. Pedersen, “Deep reinforcement learning based dynamic sub-band full duplex for 5g- advanced and 6g,” in2025 IEEE 101st V ehicular Technology Conference (VTC2025-Spring), 2025, pp. 1–6
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.