Dynamic Scheduling for Flexible Manufacturing Systems Based on Multi-Agent Deep Reinforcement Learning and Petri Nets
Pith reviewed 2026-07-01 03:45 UTC · model grok-4.3
The pith
A multi-agent reinforcement learning method built on timed Petri nets and their basis reachability graphs solves dynamic scheduling in flexible manufacturing systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The scheduling problem is formulated as a Markov decision process (MDP) with timed Petri nets, where the future evolution of the system depends exclusively on the current marking and the subsequently executed transitions, independent of historical trajectories. The state space and action space of the MDP are constructed using the notion of basis reachability graph of Petri nets to alleviate the state explosion problem. A hierarchical dense reward function is constructed by integrating stepwise guidance with terminal evaluation. Then, a multi-agent proximal policy optimization algorithm is employed for model training under the centralized training and decentralized execution paradigm. Numeric
What carries the argument
The basis reachability graph of the timed Petri net, which provides a compact yet sufficient representation of system states for constructing the Markov decision process.
If this is right
- The compact state representation accelerates convergence of the reinforcement learning model.
- Dynamic events such as new order arrivals, cancellations, and machine failures can be handled in real time.
- The hierarchical reward function improves learning efficiency over purely terminal rewards.
- Centralized training with decentralized execution allows efficient multi-agent coordination.
Where Pith is reading between the lines
- Manufacturers could integrate this scheduler directly with existing Petri-net models of their production lines.
- Extending the approach to include energy consumption or quality metrics in the reward could address additional factory goals.
- The same MDP construction might apply to scheduling in other timed discrete-event systems like transportation networks.
Load-bearing premise
The basis reachability graph of the timed Petri net supplies a compact yet sufficient state representation that preserves all information needed for optimal dynamic decisions.
What would settle it
A direct comparison on benchmark flexible manufacturing system instances showing that the multi-agent method produces longer completion times or fails to recover from machine failures as well as conventional rescheduling methods would disprove the performance claim.
Figures
read the original abstract
This paper investigates dynamic scheduling for flexible manufacturing systems (FMSs) subject to dynamic events, such as new order arrivals, temporary order cancellations, and machine failures. Traditional methods often face significant challenges in achieving real-time responsiveness under such conditions. To address this issue, the scheduling problem is formulated as a Markov decision process (MDP) with timed Petri nets, where the future evolution of the system depends exclusively on the current marking and the subsequently executed transitions, independent of historical trajectories. The state space and action space of the MDP are constructed using the notion of basis reachability graph (a compact state space representation) of Petri nets to alleviate the state explosion problem, thereby accelerating model training convergence. Meanwhile, a hierarchical dense reward function is constructed by integrating stepwise guidance with terminal evaluation. Then, a multi-agent proximal policy optimization algorithm is employed for model training under the centralized training and decentralized execution paradigm to improve scheduling efficiency. Numerical experiments are conducted involving typical dynamic events, and the results demonstrate that the proposed method can effectively handle dynamic events and achieve superior scheduling performance compared with conventional approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formulates dynamic scheduling for flexible manufacturing systems under events such as order arrivals, cancellations, and machine failures as a Markov decision process (MDP) whose state space is the basis reachability graph (BRG) of a timed Petri net. A hierarchical dense reward is defined, and a multi-agent proximal policy optimization algorithm is trained under centralized training with decentralized execution. Numerical experiments are reported to show that the resulting policies handle the dynamic events and outperform conventional approaches.
Significance. If the BRG-based MDP is Markovian and the reported performance gains are reproducible, the work would demonstrate a practical route to combining Petri-net state compression with multi-agent RL for real-time manufacturing control, addressing the state-explosion barrier that limits many formal scheduling methods.
major comments (1)
- [Abstract and MDP construction] Abstract (MDP construction paragraph): the claim that the BRG supplies a state representation that 'preserves all information needed for optimal dynamic decisions' is load-bearing for the superiority result. When machine-failure events are injected, two markings that collapse to the same basis marking can differ in enabled transitions or residual firing times; if those distinctions affect recovery rewards, the resulting MDP is not Markovian with respect to the true objective, so the multi-agent PPO policy cannot be guaranteed to dominate baselines that retain the full timed marking.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The major comment raises an important point about the Markov property of the BRG-based MDP under machine failures, which we address below.
read point-by-point responses
-
Referee: Abstract (MDP construction paragraph): the claim that the BRG supplies a state representation that 'preserves all information needed for optimal dynamic decisions' is load-bearing for the superiority result. When machine-failure events are injected, two markings that collapse to the same basis marking can differ in enabled transitions or residual firing times; if those distinctions affect recovery rewards, the resulting MDP is not Markovian with respect to the true objective, so the multi-agent PPO policy cannot be guaranteed to dominate baselines that retain the full timed marking.
Authors: We appreciate the referee highlighting this subtlety in the timed setting. The manuscript formulates the problem using a timed Petri net whose evolution is stated to depend only on the current marking and fired transitions. The BRG is adopted as the state space precisely because, per its standard definition in the Petri-net scheduling literature, each basis marking encodes the essential reachability information needed to determine enabled transitions and future behavior for the considered events (including failures, which are modeled as exogenous transitions leading to updated markings). Residual firing times are handled via the timed net semantics, with the hierarchical reward incorporating timing-based penalties. We agree that an explicit argument showing the BRG mapping preserves the Markov property for the dynamic events in our experiments would strengthen the presentation. We will therefore revise the abstract and Section III (MDP construction) to add a short paragraph referencing the BRG properties and clarifying the handling of failures and timing, without altering the core method or results. This constitutes a partial revision. revision: partial
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper formulates dynamic scheduling as an MDP whose state space is the basis reachability graph of a timed Petri net, constructs a hierarchical reward, and trains a multi-agent PPO policy; performance is then evaluated empirically on numerical instances with injected dynamic events. None of these steps reduces a claimed prediction or optimality result to a fitted parameter or self-citation by construction. The sufficiency of the BRG representation is stated as an assumption whose validity is assessed by the reported experiments rather than asserted definitionally. No load-bearing self-citation chain or renaming of known results appears in the provided text.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Industry 4.0: Smart scheduling,
D. A. Rossit, F. Tohm ´e, and M. Frutos, “Industry 4.0: Smart scheduling,” Int. J. Prod. Res., vol. 57, no. 12, p. 3802–3813, Aug. 2018
2018
-
[2]
Scheduling of resource allocation systems with timed Petri nets: A survey,
B. Huang, M. Zhou, X. S. Lu, and A. Abusorrah, “Scheduling of resource allocation systems with timed Petri nets: A survey,”ACM Comput. Surv., vol. 55, no. 11, p. 1–27, Feb. 2023
2023
-
[3]
Industry 4.0: contributions of holonic manufacturing control architectures and future challenges,
W. Derigent, O. Cardin, and D. Trentesaux, “Industry 4.0: contributions of holonic manufacturing control architectures and future challenges,” J. Intell. Manuf., vol. 32, no. 7, p. 1797–1818, Jan. 2020
2020
-
[4]
Scheduling internal operations in post-distribution cross docking systems,
M. P. Fanti, G. Stecco, and W. Ukovich, “Scheduling internal operations in post-distribution cross docking systems,”IEEE Trans. Autom. Sci. Eng., vol. 13, no. 1, pp. 296–312, Jan. 2016
2016
-
[5]
Performance safety enforcement in stochastic event graphs against boost and slow attacks,
Z. He and Z. Ma, “Performance safety enforcement in stochastic event graphs against boost and slow attacks,”Nonlinear Anal. Hybrid Syst, vol. 41, p. 101057, Aug. 2021
2021
-
[6]
Optimization of deterministic timed weighted marked graphs,
Z. He, Z. Li, and A. Giua, “Optimization of deterministic timed weighted marked graphs,”IEEE Trans. Autom. Sci. Eng., vol. 14, no. 2, pp. 1084– 1095, Nov. 2017
2017
-
[7]
Performance optimization for timed weighted marked graphs under infinite server semantics,
Z. He, Z. Li, and A. Giua, “Performance optimization for timed weighted marked graphs under infinite server semantics,”IEEE Trans. Autom. Control, vol. 63, no. 8, p. 2573–2580, Aug. 2018
2018
-
[8]
A survey on performance evaluations in time-dependent petri nets,
M. P. Fanti, R. Liu, S. Zhang, Z. He, and D. Lefebvre, “A survey on performance evaluations in time-dependent petri nets,”Discrete Event Dynamic Systems, vol. 36, no. 1, p. 13, 2026
2026
-
[9]
Design of supervisors for linear marking specifications in labeled Petri nets,
Z. Ma, Z. He, Z. Li, and A. Giua, “Design of supervisors for linear marking specifications in labeled Petri nets,”Automatica, vol. 136, p. 110031, Feb. 2022
2022
-
[10]
FMS scheduling with knowl- edge based genetic algorithm approach,
A. Prakash, F. T. Chan, and S. Deshmukh, “FMS scheduling with knowl- edge based genetic algorithm approach,”Expert Syst. Appl., vol. 38, no. 4, p. 3161–3171, Apr. 2011
2011
-
[11]
Deadlock-free genetic scheduling algorithm for automated manufacturing systems based on deadlock control policy,
K. Xing, L. Han, M. Zhou, and F. Wang, “Deadlock-free genetic scheduling algorithm for automated manufacturing systems based on deadlock control policy,”IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 42, no. 3, pp. 603–615, Jun. 2012
2012
-
[12]
A Petri net-based particle swarm optimization approach for scheduling deadlock-prone flexible manufacturing systems,
L. Han, K. Xing, X. Chen, and F. Xiong, “A Petri net-based particle swarm optimization approach for scheduling deadlock-prone flexible manufacturing systems,”J. Intell. Manuf., vol. 29, no. 5, pp. 1083–1096, Oct. 2015
2015
-
[13]
Deadlock-free scheduling of flexible assembly systems based on Petri nets and Local search,
J. Luo, Z. Liu, M. Zhou, and K. Xing, “Deadlock-free scheduling of flexible assembly systems based on Petri nets and Local search,”IEEE Trans. Syst. Man Cybern.: Syst., vol. 50, no. 10, p. 3658–3669, Oct. 2020
2020
-
[14]
Scheduling flexible manufacturing systems using Petri nets and heuristic search,
D. Y . Lee and F. DiCesare, “Scheduling flexible manufacturing systems using Petri nets and heuristic search,”IEEE Trans. Autom. Sci. Eng., vol. 10, no. 2, pp. 123–132, Apr. 1994
1994
-
[15]
Scheduling AMSs with generalized Petri nets and highly informed heuristic search,
F. Yuan, B. Huang, J. Lv, and M. Cui, “Scheduling AMSs with generalized Petri nets and highly informed heuristic search,”Computers & Operations Research, vol. 175, p. 106912, Mar. 2025
2025
-
[16]
Scheduling robotic cellular manufacturing systems with timed Petri net, A* search, and admissible heuristic function,
B. Huang, M. Zhou, A. Abusorrah, and K. Sedraoui, “Scheduling robotic cellular manufacturing systems with timed Petri net, A* search, and admissible heuristic function,”IEEE Trans. Autom. Sci. Eng., vol. 19, no. 1, pp. 243–250, Jan. 2022
2022
-
[17]
Heuristic scheduling for robotic job shops using Petri nets and artificial potential fields,
S. Yi and J. Luo, “Heuristic scheduling for robotic job shops using Petri nets and artificial potential fields,”IEEE Trans. Autom. Sci. Eng., vol. 22, p. 7556–7568, 2025
2025
-
[18]
Deadlock-free scheduling of automated manufacturing systems using Petri nets and hybrid heuristic search,
J. Luo, K. Xing, M. Zhou, X. Li, and X. Wang, “Deadlock-free scheduling of automated manufacturing systems using Petri nets and hybrid heuristic search,”IEEE Trans. Syst. Man Cybern.: Syst., vol. 45, no. 3, pp. 530–541, Mar. 2015
2015
-
[19]
Scheduling of a class of partial routing FMS in uncertain environments with beam search,
G. Cherif, E. Leclercq, and D. Lefebvre, “Scheduling of a class of partial routing FMS in uncertain environments with beam search,”J. Intell. Manuf., vol. 34, no. 2, pp. 493–514, Aug. 2021
2021
-
[20]
A new hybrid filtered beam search algorithm for deadlock-free scheduling of flexible manufacturing systems using Petri nets,
G. Mej ´ıa and K. Ni˜no, “A new hybrid filtered beam search algorithm for deadlock-free scheduling of flexible manufacturing systems using Petri nets,”Comput. Ind. Eng., vol. 108, pp. 165–176, Jun. 2017
2017
-
[21]
Z. He, N. Li, N. Ran, and L. Li, “Scheduling of flexible manufacturing systems based on place-timed Petri nets and basis reachability graphs,” IEEE Trans. Control Syst. Technol., p. 1–13, 2026. [Online]. Available: http://dx.doi.org/10.1109/TCST.2026.3664466
-
[22]
Manufacturing scheduling using colored Petri nets and reinforcement learning,
M. Drakaki and P. Tzionas, “Manufacturing scheduling using colored Petri nets and reinforcement learning,”Appl. Sci.-Basel, vol. 7, no. 2, p. 136, Feb. 2017
2017
-
[23]
Reinforcement learning for robotic flow shop scheduling with processing time variations,
J.-H. Lee and H.-J. Kim, “Reinforcement learning for robotic flow shop scheduling with processing time variations,”Int. J. Prod. Res., vol. 60, no. 7, p. 2346–2368, Feb. 2021
2021
-
[24]
Scheduling of robotic cellular manufacturing systems with timed Petri nets and reinforcement learning,
Z. Yao, B. Huang, J. Lv, X. Lu, M. Cui, and S. Yu, “Scheduling of robotic cellular manufacturing systems with timed Petri nets and reinforcement learning,” in2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, Oct. 2024, p. 2333–2338
2024
-
[25]
Introducing PetriRL: An innovative framework for JSSP resolution integrating Petri nets and event-based reinforcement learning,
S. Lassoued and A. Schwung, “Introducing PetriRL: An innovative framework for JSSP resolution integrating Petri nets and event-based reinforcement learning,”Journal of Manufacturing Systems, vol. 74, p. 690–702, Jun. 2024
2024
-
[26]
Research on deadlock-free scheduling of flexible manufacturing systems based on reinforcement learning,
X. Li and J. Gao, “Research on deadlock-free scheduling of flexible manufacturing systems based on reinforcement learning,” in2024 4th International Symposium on Artificial Intelligence and Intelligent Man- ufacturing (AIIM). IEEE, Dec. 2024, p. 64–73
2024
-
[27]
Petri-net- based dynamic scheduling of flexible manufacturing system via deep reinforcement learning with graph convolutional network,
L. Hu, Z. Liu, W. Hu, Y . Wang, J. Tan, and F. Wu, “Petri-net- based dynamic scheduling of flexible manufacturing system via deep reinforcement learning with graph convolutional network,”Journal of Manufacturing Systems, vol. 55, p. 1–14, Apr. 2020
2020
-
[28]
Heuristic scheduling method for FMSs based on p-timed Petri nets and deep learning,
J. Li, J. Luo, X. Li, S. Yi, and C. Pan, “Heuristic scheduling method for FMSs based on p-timed Petri nets and deep learning,” in2022 IEEE International Conference on Networking, Sensing and Control (ICNSC). IEEE, Dec. 2022, p. 1–6
2022
-
[29]
Deep reinforcement learning for near-optimal control sequence in discrete event systems,
R. Liu, A. M. Mangini, and M. P. Fanti, “Deep reinforcement learning for near-optimal control sequence in discrete event systems,”IEEE Transactions on Automation Science and Engineering, vol. 23, pp. 8086– 8096, 2026
2026
-
[30]
Petri-net-based deep rein- forcement learning for real-time scheduling of automated manufacturing systems,
J. Luo, S. Yi, Z. Lin, H. Zhang, and J. Zhou, “Petri-net-based deep rein- forcement learning for real-time scheduling of automated manufacturing systems,”J. Manuf. Syst., vol. 74, pp. 995–1008, Jun. 2024
2024
-
[31]
Distributed real-time scheduling in cloud manufacturing by deep reinforcement learning,
L. Zhang, C. Yang, Y . Yan, and Y . Hu, “Distributed real-time scheduling in cloud manufacturing by deep reinforcement learning,”IEEE Trans. Ind. Inf., vol. 18, no. 12, p. 8999–9007, Dec. 2022
2022
-
[32]
A Petri net based deadlock prevention policy for flexible manufacturing systems,
J. Ezpeleta, J. Colom, and J. Martinez, “A Petri net based deadlock prevention policy for flexible manufacturing systems,”IEEE Trans. Robot. Autom, vol. 11, no. 2, p. 173–184, Apr. 1995
1995
-
[33]
Timed state graphs for scheduling a class of place-timed petri nets,
J. Zhou, J. Luo, and S. Yi, “Timed state graphs for scheduling a class of place-timed petri nets,” in2023 IEEE International Conference on Networking, Sensing and Control (ICNSC). IEEE, Oct. 2023, p. 1–6
2023
-
[34]
Basis marking representation of Petri net reachability spaces and its application to the reachability problem,
Z. Ma, Y . Tong, Z. Li, and A. Giua, “Basis marking representation of Petri net reachability spaces and its application to the reachability problem,”IEEE Trans. Autom. Control, vol. 62, no. 3, pp. 1078–1093, Mar. 2017
2017
-
[35]
Design of optimal control sequences in Petri nets using basis marking analysis,
Z. Ma, M. Zou, J. Zhang, and Z. Li, “Design of optimal control sequences in Petri nets using basis marking analysis,”IEEE Trans. Autom. Control, vol. 67, no. 7, pp. 3685–3692, Jul. 2022
2022
-
[36]
The surprising effectiveness of PPO in cooperative multi-agent games,
C. Yu, A. Velu, E. Vinitsky, J. Gao, Y . Wang, A. Bayen, and Y . Wu, “The surprising effectiveness of PPO in cooperative multi-agent games,”Ad- vances in Neural Information Processing Systems, vol. 35, pp. 24 611– 24 624, 2022
2022
-
[37]
A multi-action deep reinforcement learning framework for flexible job- shop scheduling problem,
K. Lei, P. Guo, W. Zhao, Y . Wang, L. Qian, X. Meng, and L. Tang, “A multi-action deep reinforcement learning framework for flexible job- shop scheduling problem,”Expert Syst. Appl., vol. 205, p. 117796, Nov. 2022
2022
-
[38]
Rectified linear units improve restricted boltz- mann machines,
V . Nair and G. E. Hinton, “Rectified linear units improve restricted boltz- mann machines,” inProceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807–814
2010
-
[39]
Multiple-objective scheduling and real- time dispatching for the semiconductor manufacturing system,
Y . Lee, Z. Jiang, and H. Liu, “Multiple-objective scheduling and real- time dispatching for the semiconductor manufacturing system,”Com- puters & amp; Operations Research, vol. 36, no. 3, p. 866–884, Mar. 2009
2009
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.