arxiv: 2605.06437 · v1 · submitted 2026-05-07 · 📡 eess.SY · cs.SY

Recognition: unknown

Distributed Online Learning for Time-Critical Communication in 6G Industrial Subnetworks

Gilberto Berardinelli, Hossam Farag, Samira Abdelrahman

Pith reviewed 2026-05-08 06:27 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords 6G · industrial subnetworks · medium access control · deep reinforcement learning · alarm delivery · time-critical communication · distributed learning · contention management

The pith

A distributed DRL protocol lets local access points in 6G industrial subnetworks learn transmission patterns from a shared contention signal to raise the odds of delivering alarms on time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a distributed deep reinforcement learning method for medium access control that addresses time-critical alarm reporting in large-scale 6G industrial settings with mobility and bursty traffic. Each local access point uses a lightweight neural network and epsilon-greedy policy to infer contention from a broadcast signature signal and pick its own transmission pattern over available channels. This setup is intended to outperform conventional random-access schemes when multiple subnetworks activate at once after a common event. Simulations show the method delivers higher in-time alarm success rates that improve as network density rises.

Core claim

Each local access point autonomously learns, in an online fashion, to map a broadcast contention-signature signal to a transmission pattern over the available channels via a lightweight deep neural network and an epsilon-greedy policy, yielding a higher probability of in-time alarm delivery than random-access baselines and improved scalability as the number of subnetworks grows.

What carries the argument

Lightweight deep neural network with epsilon-greedy policy that processes a broadcast contention-signature signal to select channel transmission patterns in a fully distributed online setting.
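
The selection step described here can be illustrated with a minimal sketch. It assumes the neural network outputs one estimated value per candidate transmission pattern; the name `select_pattern`, the `scores` list, and the example numbers are hypothetical and are not taken from the paper, which may encode patterns and rewards differently.

```python
import random


def select_pattern(scores, epsilon, rng):
    """Return the index of the transmission pattern to use.

    scores  : one estimated value per candidate pattern (assumed DNN output)
    epsilon : exploration probability in [0, 1]
    rng     : a random.Random instance, passed in so runs are reproducible
    """
    if rng.random() < epsilon:
        # Explore: pick any candidate pattern uniformly at random.
        return rng.randrange(len(scores))
    # Exploit: pick the pattern with the highest estimated value.
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical scores for three candidate patterns (illustration only).
scores = [0.1, 0.7, 0.2]
greedy_choice = select_pattern(scores, epsilon=0.0, rng=random.Random(0))
```

With `epsilon=0.0` the call always exploits and returns the index of the largest score; a small positive `epsilon` keeps some exploration alive, which is what lets an online learner keep adapting as contention changes.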

If this is right

Relative to the benchmark random-access schemes, the probability of in-time alarm delivery rises by at least 7 percent with 40 subnetworks and by 21 percent with 60 subnetworks.
Performance advantage over random-access schemes grows with increasing network density.
Each subnetwork operates without centralized coordination while still adapting to shared contention conditions.
The protocol remains functional under mobility and bursty alarm traffic patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The online learning loop could allow the system to track slow changes in traffic statistics without periodic retraining.
If the contention-signature signal remains reliable at scale, similar distributed learning could be layered onto other industrial wireless protocols.
The approach implies that explicit coordination messages may be replaceable by simple broadcast signatures in dense time-critical networks.

Load-bearing premise

The simulation models accurately represent real-world conditions including mobility, bursty event-driven traffic, and radio resource constraints.

What would settle it

A real 6G industrial testbed deployment with 60 simultaneously active subnetworks that measures the fraction of alarms delivered before deadline under the learned policy versus random-access baselines.

Figures

Figures reproduced from arXiv: 2605.06437 by Gilberto Berardinelli, Hossam Farag, Samira Abdelrahman.

**Figure 1.** Network model of an industrial scenario comprising …

**Figure 2.** Illustration of an emergency event within a Z-subnet …

**Figure 4.** DTMC model for a tagged alarm packet. Next, we calculate the probability that the alarm packet is delivered within its deadline of D slots. Define the packet age state d ∈ {0, 1, …, D} as the number of elapsed slots since alarm generation. We construct a discrete-time Markov chain (DTMC) with transient states 0, 1, …, D and two absorbing states: S (success: packet received) and F (failure: deadli…

**Figure 5.** Architecture of the considered DNN.

**Figure 6.** Illustration of the proposed protocol with …

**Figure 7.** Evolution of the MSE value over the training phase.

**Figure 9.** Performance comparison of probability of in-time alarm …

**Figure 10.** Performance comparison of probability of in-time …

**Figure 12.** Performance comparison of probability of in-time …
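
The Figure 4 caption describes a packet-age Markov chain with absorbing states S and F. A minimal sketch of how the in-time delivery probability could be read off such a chain follows. It makes assumptions that the truncated caption does not confirm: at each age d < D the packet is received with a per-slot probability `p_success[d]`, otherwise the age advances by one slot, and reaching age D leads to F. The function name and the example probabilities are hypothetical.

```python
def in_time_delivery_probability(p_success, deadline):
    """Probability of absorption in S, starting from packet age 0.

    p_success : assumed per-slot reception probabilities, indexed by age
    deadline  : D, the number of slots available before the packet expires
    """
    alive = 1.0      # probability mass still in a transient (age) state
    delivered = 0.0  # probability mass already absorbed in S
    for age in range(deadline):
        delivered += alive * p_success[age]  # transition age -> S
        alive *= 1.0 - p_success[age]        # transition age -> age + 1
    # Whatever mass is still alive at age D is absorbed in F.
    return delivered


# Hypothetical example: constant per-slot success probability 0.2, D = 5.
prob = in_time_delivery_probability([0.2] * 5, deadline=5)
```

Under a constant per-slot probability p this reduces to 1 - (1 - p)^D, which is a quick sanity check on any fuller model in which the per-slot probabilities depend on contention.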

Original abstract

6G industrial in-X subnetworks are expected to support highly time-critical alarm reporting in large-scale environments characterized by mobility, bursty event-driven traffic, and limited radio resources. In such settings, conventional medium access solutions are ill-suited to guarantee reliable delivery of critical traffic, e.g., emergency alarms, within strict deadlines, especially when multiple subnetworks become simultaneously active after a common alarm event, a scenario widely referred to as medium access with a shared message. This paper proposes a distributed deep reinforcement learning (DRL)-based medium access control protocol for timely alarm transmission in time-critical industrial subnetworks. The proposed method enables each local access point (LAP) to learn, in an online manner, to infer contention conditions from a broadcast contention-signature signal and to autonomously select a transmission pattern over the available channels using a lightweight deep neural network and an ε-greedy policy. Simulation results demonstrate that the proposed approach consistently achieves a higher probability of in-time alarm delivery than benchmark random-access schemes, while exhibiting better scalability with increasing network density. For instance, the proposed method improves the probability of in-time alarm delivery by at least 7% with a network size of 40 subnetworks, while the gain increases to 21% when the number of subnetworks increases to 60.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This applies online DRL with a contention-signature signal to MAC for time-critical alarms in 6G industrial subnetworks and reports simulation gains that grow with density, but the contribution stays empirical and application-specific.

The main thing here is a distributed online DRL protocol that lets each local access point infer contention from a broadcast signature signal and then pick a transmission pattern autonomously with a lightweight neural net and epsilon-greedy policy. It targets the shared-message case where multiple subnetworks activate together after an alarm event in mobile, bursty industrial settings.

The simulations show the approach beats random-access baselines on in-time delivery probability, with the margin widening from at least 7% at 40 subnetworks to 21% at 60. That scaling behavior is the clearest practical result. The specific combination of the signature signal for online inference in this multi-subnetwork alarm scenario looks like the fresh piece, even though DRL for MAC has appeared before. The paper frames the problem cleanly and keeps the agent simple enough for distributed use.

The soft spot is that everything rests on simulation. The gains are presented as empirical rather than guaranteed, and they depend on how well the mobility, traffic burst, and radio models match deployment conditions. The stress-test finds no internal contradictions or missing baselines, but the abstract leaves out training details, run-to-run variance, and exact parameter settings, so the evidence strength stays moderate until those are checked. No circular definitions or invented entities show up.

This is for researchers and engineers focused on 6G industrial IoT, time-sensitive networking, or applied reinforcement learning in wireless systems. A reader working on practical MAC protocols would find the protocol description and scaling results useful. It deserves peer review because the problem is relevant, the method is coherent, and the empirical comparison is direct enough for referees to evaluate and improve.

Referee Report

2 major / 2 minor

Summary. The paper proposes a distributed deep reinforcement learning (DRL)-based medium access control (MAC) protocol for time-critical alarm reporting in 6G industrial in-X subnetworks. Each local access point (LAP) learns online to infer contention conditions from a broadcast contention-signature signal and selects a transmission pattern over available channels via a lightweight deep neural network and an epsilon-greedy policy. Simulation results show the approach achieves higher probability of in-time alarm delivery than benchmark random-access schemes, with reported gains of at least 7% at 40 subnetworks increasing to 21% at 60 subnetworks, along with better scalability as network density grows.

Significance. If the simulation results hold, the work would be significant for 6G industrial IoT by addressing the shared-message medium access problem in dense, mobile, bursty-traffic environments through a practical online learning solution. The lightweight DNN design and distributed nature are strengths for real-time deployment, and the empirical scalability gains provide concrete evidence of advantage over conventional schemes in a challenging setting.

major comments (2)

[Simulation Results] Simulation Results section: the reported 7% and 21% gains in probability of in-time alarm delivery are presented as point values without specifying the number of Monte Carlo runs, variance across runs, or any statistical significance testing; this weakens support for the claim of consistent outperformance and scalability.
[System Model] System Model section: the modeling assumptions for bursty event-driven traffic (e.g., exact arrival process and rate parameters) and mobility (e.g., velocity distribution and correlation across subnetworks) are not fully specified with numerical values, which is load-bearing for assessing whether the observed gains would translate to practical 6G deployments.
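
The kind of reporting requested in the first major comment can be sketched as follows. This is a generic normal-approximation (Wald) confidence interval for a delivery probability estimated from independent trials; it is not the authors' evaluation code, and the function name and the counts in the example are made up for illustration.

```python
import math


def success_rate_with_ci(successes, trials, z=1.96):
    """Estimate a success probability and an approximate 95% interval.

    successes : number of alarms delivered before their deadline
    trials    : number of independent simulated alarm events
    z         : normal quantile (1.96 corresponds to roughly 95% coverage)
    """
    rate = successes / trials
    half_width = z * math.sqrt(rate * (1.0 - rate) / trials)
    # Clamp to [0, 1] because the approximation can overshoot near the edges.
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# Hypothetical counts: 900 in-time deliveries out of 1000 simulated alarms.
rate, low, high = success_rate_with_ci(successes=900, trials=1000)
```

Reporting such an interval for both the proposed scheme and each baseline would show whether a gap of a few percent is larger than the run-to-run noise. For rates very close to 0 or 1 a Wilson interval is usually a safer choice than this approximation.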

minor comments (2)

[Abstract] Abstract: the phrase 'at least 7%' should be clarified with the exact network and traffic conditions under which the minimum gain occurs, to avoid ambiguity.
[Introduction] Notation and acronyms: ensure LAP, DRL, and MAC are defined at first use in the main text, and consider adding a table summarizing key simulation parameters for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address the two major comments point by point below. We will update the manuscript accordingly to incorporate the suggested clarifications.

Referee: [Simulation Results] Simulation Results section: the reported 7% and 21% gains in probability of in-time alarm delivery are presented as point values without specifying the number of Monte Carlo runs, variance across runs, or any statistical significance testing; this weakens support for the claim of consistent outperformance and scalability.

Authors: We agree that providing details on the simulation methodology would strengthen the support for our claims. The results in the paper are derived from extensive Monte Carlo simulations, but the exact number of runs and variance measures are not explicitly stated in the current version. In the revised manuscript, we will add this information to the Simulation Results section, including the number of independent runs performed, observed variance, and a brief discussion of statistical significance to demonstrate the consistency of the outperformance.

revision: yes

Referee: [System Model] System Model section: the modeling assumptions for bursty event-driven traffic (e.g., exact arrival process and rate parameters) and mobility (e.g., velocity distribution and correlation across subnetworks) are not fully specified with numerical values, which is load-bearing for assessing whether the observed gains would translate to practical 6G deployments.

Authors: We thank the referee for highlighting this aspect. While the System Model section describes the traffic as bursty event-driven and the mobility model, we concur that explicit numerical values would improve clarity and allow better evaluation of practical applicability. We will revise the manuscript to include a table or subsection with all key parameter values used in the simulations, such as the arrival process details, rate parameters, velocity distributions, and assumptions regarding correlation across subnetworks.

revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims rest on direct simulation comparisons

The paper proposes a DRL-based distributed MAC protocol and evaluates it through simulations against random-access baselines under modeled conditions (mobility, bursty traffic). No derivation chain, equations, or predictions are presented that reduce by construction to fitted parameters, self-definitions, or self-citation load-bearing steps. The reported gains (7-21%) are empirical outcomes of the protocol implementation versus benchmarks, with no renaming of known results or ansatz smuggling. The evaluation is self-contained against external simulation benchmarks and does not invoke uniqueness theorems or prior self-citations as the central justification.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract describes an empirical simulation study of a learning-based protocol with no analytical derivations, so no free parameters, axioms, or invented entities are explicitly introduced or required for the central claim.
