pith. sign in

arxiv: 2606.31737 · v1 · pith:BGDA5OTWnew · submitted 2026-06-30 · 📡 eess.SY · cs.SY

Dynamic Scheduling for Flexible Manufacturing Systems Based on Multi-Agent Deep Reinforcement Learning and Petri Nets

Pith reviewed 2026-07-01 03:45 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords flexible manufacturing systemsdynamic schedulingPetri netsmulti-agent deep reinforcement learningMarkov decision processbasis reachability graphproximal policy optimization
0
0 comments X

The pith

A multi-agent reinforcement learning method built on timed Petri nets and their basis reachability graphs solves dynamic scheduling in flexible manufacturing systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper formulates dynamic scheduling for flexible manufacturing systems as a Markov decision process using timed Petri nets. The future system evolution depends only on the current state and next transitions, so historical data is not needed. A compact basis reachability graph reduces the state space to make training feasible. Hierarchical rewards combine immediate guidance with final performance scores, and multi-agent proximal policy optimization trains the system under centralized training and decentralized execution. Tests with order changes and machine failures show the method handles these events and beats standard scheduling techniques.

Core claim

The scheduling problem is formulated as a Markov decision process (MDP) with timed Petri nets, where the future evolution of the system depends exclusively on the current marking and the subsequently executed transitions, independent of historical trajectories. The state space and action space of the MDP are constructed using the notion of basis reachability graph of Petri nets to alleviate the state explosion problem. A hierarchical dense reward function is constructed by integrating stepwise guidance with terminal evaluation. Then, a multi-agent proximal policy optimization algorithm is employed for model training under the centralized training and decentralized execution paradigm. Numeric

What carries the argument

The basis reachability graph of the timed Petri net, which provides a compact yet sufficient representation of system states for constructing the Markov decision process.

If this is right

  • The compact state representation accelerates convergence of the reinforcement learning model.
  • Dynamic events such as new order arrivals, cancellations, and machine failures can be handled in real time.
  • The hierarchical reward function improves learning efficiency over purely terminal rewards.
  • Centralized training with decentralized execution allows efficient multi-agent coordination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Manufacturers could integrate this scheduler directly with existing Petri-net models of their production lines.
  • Extending the approach to include energy consumption or quality metrics in the reward could address additional factory goals.
  • The same MDP construction might apply to scheduling in other timed discrete-event systems like transportation networks.

Load-bearing premise

The basis reachability graph of the timed Petri net supplies a compact yet sufficient state representation that preserves all information needed for optimal dynamic decisions.

What would settle it

A direct comparison on benchmark flexible manufacturing system instances showing that the multi-agent method produces longer completion times or fails to recover from machine failures as well as conventional rescheduling methods would disprove the performance claim.

Figures

Figures reproduced from arXiv: 2606.31737 by Carla Seatzu, Liang Li, Ning Li, Ning Ran, Zhou He.

Figure 1
Figure 1. Figure 1: Flow chart of the proposed scheduling method. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: An industrial FMS. The P-TdPN of the FMS is illustrated in [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The P-TdPN of the FMS in Example 1. C. Problem Statement Given a P-TdPN N = (P, T, P re, P ost, D) for an FMS as defined in Definition 1, the dynamic scheduling problem can be formally stated as follows. The system starts from an initial marking M0, in which tokens in start places PS represent jobs to be processed, and tokens in resource places PR indicate the [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The BRG B of the P-TdPN in [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The interaction between the policy and value networks. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Convergence curve during training. IV. EXPERIMENTAL RESULTS The proposed algorithm is implemented in Python and executed on a personal computer equipped with a 2.1 GHz processor and 16 GB of RAM. In the subsequent subsections, several benchmark instances are evaluated, and the results are compared with those obtained from existing methods. The source code corresponding to all experimental cases is publicly… view at source ↗
Figure 8
Figure 8. Figure 8: Study of reward components on Makespan. increased makespan, as shown in [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Study of reward components on success rate. [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
read the original abstract

This paper investigates dynamic scheduling for flexible manufacturing systems (FMSs) subject to dynamic events, such as new order arrivals, temporary order cancellations, and machine failures. Traditional methods often face significant challenges in achieving real-time responsiveness under such conditions. To address this issue, the scheduling problem is formulated as a Markov decision process (MDP) with timed Petri nets, where the future evolution of the system depends exclusively on the current marking and the subsequently executed transitions, independent of historical trajectories. The state space and action space of the MDP are constructed using the notion of basis reachability graph (a compact state space representation) of Petri nets to alleviate the state explosion problem, thereby accelerating model training convergence. Meanwhile, a hierarchical dense reward function is constructed by integrating stepwise guidance with terminal evaluation. Then, a multi-agent proximal policy optimization algorithm is employed for model training under the centralized training and decentralized execution paradigm to improve scheduling efficiency. Numerical experiments are conducted involving typical dynamic events, and the results demonstrate that the proposed method can effectively handle dynamic events and achieve superior scheduling performance compared with conventional approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper formulates dynamic scheduling for flexible manufacturing systems under events such as order arrivals, cancellations, and machine failures as a Markov decision process (MDP) whose state space is the basis reachability graph (BRG) of a timed Petri net. A hierarchical dense reward is defined, and a multi-agent proximal policy optimization algorithm is trained under centralized training with decentralized execution. Numerical experiments are reported to show that the resulting policies handle the dynamic events and outperform conventional approaches.

Significance. If the BRG-based MDP is Markovian and the reported performance gains are reproducible, the work would demonstrate a practical route to combining Petri-net state compression with multi-agent RL for real-time manufacturing control, addressing the state-explosion barrier that limits many formal scheduling methods.

major comments (1)
  1. [Abstract and MDP construction] Abstract (MDP construction paragraph): the claim that the BRG supplies a state representation that 'preserves all information needed for optimal dynamic decisions' is load-bearing for the superiority result. When machine-failure events are injected, two markings that collapse to the same basis marking can differ in enabled transitions or residual firing times; if those distinctions affect recovery rewards, the resulting MDP is not Markovian with respect to the true objective, so the multi-agent PPO policy cannot be guaranteed to dominate baselines that retain the full timed marking.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The major comment raises an important point about the Markov property of the BRG-based MDP under machine failures, which we address below.

read point-by-point responses
  1. Referee: Abstract (MDP construction paragraph): the claim that the BRG supplies a state representation that 'preserves all information needed for optimal dynamic decisions' is load-bearing for the superiority result. When machine-failure events are injected, two markings that collapse to the same basis marking can differ in enabled transitions or residual firing times; if those distinctions affect recovery rewards, the resulting MDP is not Markovian with respect to the true objective, so the multi-agent PPO policy cannot be guaranteed to dominate baselines that retain the full timed marking.

    Authors: We appreciate the referee highlighting this subtlety in the timed setting. The manuscript formulates the problem using a timed Petri net whose evolution is stated to depend only on the current marking and fired transitions. The BRG is adopted as the state space precisely because, per its standard definition in the Petri-net scheduling literature, each basis marking encodes the essential reachability information needed to determine enabled transitions and future behavior for the considered events (including failures, which are modeled as exogenous transitions leading to updated markings). Residual firing times are handled via the timed net semantics, with the hierarchical reward incorporating timing-based penalties. We agree that an explicit argument showing the BRG mapping preserves the Markov property for the dynamic events in our experiments would strengthen the presentation. We will therefore revise the abstract and Section III (MDP construction) to add a short paragraph referencing the BRG properties and clarifying the handling of failures and timing, without altering the core method or results. This constitutes a partial revision. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper formulates dynamic scheduling as an MDP whose state space is the basis reachability graph of a timed Petri net, constructs a hierarchical reward, and trains a multi-agent PPO policy; performance is then evaluated empirically on numerical instances with injected dynamic events. None of these steps reduces a claimed prediction or optimality result to a fitted parameter or self-citation by construction. The sufficiency of the BRG representation is stated as an assumption whose validity is assessed by the reported experiments rather than asserted definitionally. No load-bearing self-citation chain or renaming of known results appears in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies almost no explicit free parameters, axioms, or invented entities; the MDP formulation and reward construction are treated as standard background.

pith-pipeline@v0.9.1-grok · 5727 in / 1020 out tokens · 31417 ms · 2026-07-01T03:45:53.855220+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 1 canonical work pages

  1. [1]

    Industry 4.0: Smart scheduling,

    D. A. Rossit, F. Tohm ´e, and M. Frutos, “Industry 4.0: Smart scheduling,” Int. J. Prod. Res., vol. 57, no. 12, p. 3802–3813, Aug. 2018

  2. [2]

    Scheduling of resource allocation systems with timed Petri nets: A survey,

    B. Huang, M. Zhou, X. S. Lu, and A. Abusorrah, “Scheduling of resource allocation systems with timed Petri nets: A survey,”ACM Comput. Surv., vol. 55, no. 11, p. 1–27, Feb. 2023

  3. [3]

    Industry 4.0: contributions of holonic manufacturing control architectures and future challenges,

    W. Derigent, O. Cardin, and D. Trentesaux, “Industry 4.0: contributions of holonic manufacturing control architectures and future challenges,” J. Intell. Manuf., vol. 32, no. 7, p. 1797–1818, Jan. 2020

  4. [4]

    Scheduling internal operations in post-distribution cross docking systems,

    M. P. Fanti, G. Stecco, and W. Ukovich, “Scheduling internal operations in post-distribution cross docking systems,”IEEE Trans. Autom. Sci. Eng., vol. 13, no. 1, pp. 296–312, Jan. 2016

  5. [5]

    Performance safety enforcement in stochastic event graphs against boost and slow attacks,

    Z. He and Z. Ma, “Performance safety enforcement in stochastic event graphs against boost and slow attacks,”Nonlinear Anal. Hybrid Syst, vol. 41, p. 101057, Aug. 2021

  6. [6]

    Optimization of deterministic timed weighted marked graphs,

    Z. He, Z. Li, and A. Giua, “Optimization of deterministic timed weighted marked graphs,”IEEE Trans. Autom. Sci. Eng., vol. 14, no. 2, pp. 1084– 1095, Nov. 2017

  7. [7]

    Performance optimization for timed weighted marked graphs under infinite server semantics,

    Z. He, Z. Li, and A. Giua, “Performance optimization for timed weighted marked graphs under infinite server semantics,”IEEE Trans. Autom. Control, vol. 63, no. 8, p. 2573–2580, Aug. 2018

  8. [8]

    A survey on performance evaluations in time-dependent petri nets,

    M. P. Fanti, R. Liu, S. Zhang, Z. He, and D. Lefebvre, “A survey on performance evaluations in time-dependent petri nets,”Discrete Event Dynamic Systems, vol. 36, no. 1, p. 13, 2026

  9. [9]

    Design of supervisors for linear marking specifications in labeled Petri nets,

    Z. Ma, Z. He, Z. Li, and A. Giua, “Design of supervisors for linear marking specifications in labeled Petri nets,”Automatica, vol. 136, p. 110031, Feb. 2022

  10. [10]

    FMS scheduling with knowl- edge based genetic algorithm approach,

    A. Prakash, F. T. Chan, and S. Deshmukh, “FMS scheduling with knowl- edge based genetic algorithm approach,”Expert Syst. Appl., vol. 38, no. 4, p. 3161–3171, Apr. 2011

  11. [11]

    Deadlock-free genetic scheduling algorithm for automated manufacturing systems based on deadlock control policy,

    K. Xing, L. Han, M. Zhou, and F. Wang, “Deadlock-free genetic scheduling algorithm for automated manufacturing systems based on deadlock control policy,”IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 42, no. 3, pp. 603–615, Jun. 2012

  12. [12]

    A Petri net-based particle swarm optimization approach for scheduling deadlock-prone flexible manufacturing systems,

    L. Han, K. Xing, X. Chen, and F. Xiong, “A Petri net-based particle swarm optimization approach for scheduling deadlock-prone flexible manufacturing systems,”J. Intell. Manuf., vol. 29, no. 5, pp. 1083–1096, Oct. 2015

  13. [13]

    Deadlock-free scheduling of flexible assembly systems based on Petri nets and Local search,

    J. Luo, Z. Liu, M. Zhou, and K. Xing, “Deadlock-free scheduling of flexible assembly systems based on Petri nets and Local search,”IEEE Trans. Syst. Man Cybern.: Syst., vol. 50, no. 10, p. 3658–3669, Oct. 2020

  14. [14]

    Scheduling flexible manufacturing systems using Petri nets and heuristic search,

    D. Y . Lee and F. DiCesare, “Scheduling flexible manufacturing systems using Petri nets and heuristic search,”IEEE Trans. Autom. Sci. Eng., vol. 10, no. 2, pp. 123–132, Apr. 1994

  15. [15]

    Scheduling AMSs with generalized Petri nets and highly informed heuristic search,

    F. Yuan, B. Huang, J. Lv, and M. Cui, “Scheduling AMSs with generalized Petri nets and highly informed heuristic search,”Computers & Operations Research, vol. 175, p. 106912, Mar. 2025

  16. [16]

    Scheduling robotic cellular manufacturing systems with timed Petri net, A* search, and admissible heuristic function,

    B. Huang, M. Zhou, A. Abusorrah, and K. Sedraoui, “Scheduling robotic cellular manufacturing systems with timed Petri net, A* search, and admissible heuristic function,”IEEE Trans. Autom. Sci. Eng., vol. 19, no. 1, pp. 243–250, Jan. 2022

  17. [17]

    Heuristic scheduling for robotic job shops using Petri nets and artificial potential fields,

    S. Yi and J. Luo, “Heuristic scheduling for robotic job shops using Petri nets and artificial potential fields,”IEEE Trans. Autom. Sci. Eng., vol. 22, p. 7556–7568, 2025

  18. [18]

    Deadlock-free scheduling of automated manufacturing systems using Petri nets and hybrid heuristic search,

    J. Luo, K. Xing, M. Zhou, X. Li, and X. Wang, “Deadlock-free scheduling of automated manufacturing systems using Petri nets and hybrid heuristic search,”IEEE Trans. Syst. Man Cybern.: Syst., vol. 45, no. 3, pp. 530–541, Mar. 2015

  19. [19]

    Scheduling of a class of partial routing FMS in uncertain environments with beam search,

    G. Cherif, E. Leclercq, and D. Lefebvre, “Scheduling of a class of partial routing FMS in uncertain environments with beam search,”J. Intell. Manuf., vol. 34, no. 2, pp. 493–514, Aug. 2021

  20. [20]

    A new hybrid filtered beam search algorithm for deadlock-free scheduling of flexible manufacturing systems using Petri nets,

    G. Mej ´ıa and K. Ni˜no, “A new hybrid filtered beam search algorithm for deadlock-free scheduling of flexible manufacturing systems using Petri nets,”Comput. Ind. Eng., vol. 108, pp. 165–176, Jun. 2017

  21. [21]

    Scheduling of flexible manufacturing systems based on place-timed Petri nets and basis reachability graphs,

    Z. He, N. Li, N. Ran, and L. Li, “Scheduling of flexible manufacturing systems based on place-timed Petri nets and basis reachability graphs,” IEEE Trans. Control Syst. Technol., p. 1–13, 2026. [Online]. Available: http://dx.doi.org/10.1109/TCST.2026.3664466

  22. [22]

    Manufacturing scheduling using colored Petri nets and reinforcement learning,

    M. Drakaki and P. Tzionas, “Manufacturing scheduling using colored Petri nets and reinforcement learning,”Appl. Sci.-Basel, vol. 7, no. 2, p. 136, Feb. 2017

  23. [23]

    Reinforcement learning for robotic flow shop scheduling with processing time variations,

    J.-H. Lee and H.-J. Kim, “Reinforcement learning for robotic flow shop scheduling with processing time variations,”Int. J. Prod. Res., vol. 60, no. 7, p. 2346–2368, Feb. 2021

  24. [24]

    Scheduling of robotic cellular manufacturing systems with timed Petri nets and reinforcement learning,

    Z. Yao, B. Huang, J. Lv, X. Lu, M. Cui, and S. Yu, “Scheduling of robotic cellular manufacturing systems with timed Petri nets and reinforcement learning,” in2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, Oct. 2024, p. 2333–2338

  25. [25]

    Introducing PetriRL: An innovative framework for JSSP resolution integrating Petri nets and event-based reinforcement learning,

    S. Lassoued and A. Schwung, “Introducing PetriRL: An innovative framework for JSSP resolution integrating Petri nets and event-based reinforcement learning,”Journal of Manufacturing Systems, vol. 74, p. 690–702, Jun. 2024

  26. [26]

    Research on deadlock-free scheduling of flexible manufacturing systems based on reinforcement learning,

    X. Li and J. Gao, “Research on deadlock-free scheduling of flexible manufacturing systems based on reinforcement learning,” in2024 4th International Symposium on Artificial Intelligence and Intelligent Man- ufacturing (AIIM). IEEE, Dec. 2024, p. 64–73

  27. [27]

    Petri-net- based dynamic scheduling of flexible manufacturing system via deep reinforcement learning with graph convolutional network,

    L. Hu, Z. Liu, W. Hu, Y . Wang, J. Tan, and F. Wu, “Petri-net- based dynamic scheduling of flexible manufacturing system via deep reinforcement learning with graph convolutional network,”Journal of Manufacturing Systems, vol. 55, p. 1–14, Apr. 2020

  28. [28]

    Heuristic scheduling method for FMSs based on p-timed Petri nets and deep learning,

    J. Li, J. Luo, X. Li, S. Yi, and C. Pan, “Heuristic scheduling method for FMSs based on p-timed Petri nets and deep learning,” in2022 IEEE International Conference on Networking, Sensing and Control (ICNSC). IEEE, Dec. 2022, p. 1–6

  29. [29]

    Deep reinforcement learning for near-optimal control sequence in discrete event systems,

    R. Liu, A. M. Mangini, and M. P. Fanti, “Deep reinforcement learning for near-optimal control sequence in discrete event systems,”IEEE Transactions on Automation Science and Engineering, vol. 23, pp. 8086– 8096, 2026

  30. [30]

    Petri-net-based deep rein- forcement learning for real-time scheduling of automated manufacturing systems,

    J. Luo, S. Yi, Z. Lin, H. Zhang, and J. Zhou, “Petri-net-based deep rein- forcement learning for real-time scheduling of automated manufacturing systems,”J. Manuf. Syst., vol. 74, pp. 995–1008, Jun. 2024

  31. [31]

    Distributed real-time scheduling in cloud manufacturing by deep reinforcement learning,

    L. Zhang, C. Yang, Y . Yan, and Y . Hu, “Distributed real-time scheduling in cloud manufacturing by deep reinforcement learning,”IEEE Trans. Ind. Inf., vol. 18, no. 12, p. 8999–9007, Dec. 2022

  32. [32]

    A Petri net based deadlock prevention policy for flexible manufacturing systems,

    J. Ezpeleta, J. Colom, and J. Martinez, “A Petri net based deadlock prevention policy for flexible manufacturing systems,”IEEE Trans. Robot. Autom, vol. 11, no. 2, p. 173–184, Apr. 1995

  33. [33]

    Timed state graphs for scheduling a class of place-timed petri nets,

    J. Zhou, J. Luo, and S. Yi, “Timed state graphs for scheduling a class of place-timed petri nets,” in2023 IEEE International Conference on Networking, Sensing and Control (ICNSC). IEEE, Oct. 2023, p. 1–6

  34. [34]

    Basis marking representation of Petri net reachability spaces and its application to the reachability problem,

    Z. Ma, Y . Tong, Z. Li, and A. Giua, “Basis marking representation of Petri net reachability spaces and its application to the reachability problem,”IEEE Trans. Autom. Control, vol. 62, no. 3, pp. 1078–1093, Mar. 2017

  35. [35]

    Design of optimal control sequences in Petri nets using basis marking analysis,

    Z. Ma, M. Zou, J. Zhang, and Z. Li, “Design of optimal control sequences in Petri nets using basis marking analysis,”IEEE Trans. Autom. Control, vol. 67, no. 7, pp. 3685–3692, Jul. 2022

  36. [36]

    The surprising effectiveness of PPO in cooperative multi-agent games,

    C. Yu, A. Velu, E. Vinitsky, J. Gao, Y . Wang, A. Bayen, and Y . Wu, “The surprising effectiveness of PPO in cooperative multi-agent games,”Ad- vances in Neural Information Processing Systems, vol. 35, pp. 24 611– 24 624, 2022

  37. [37]

    A multi-action deep reinforcement learning framework for flexible job- shop scheduling problem,

    K. Lei, P. Guo, W. Zhao, Y . Wang, L. Qian, X. Meng, and L. Tang, “A multi-action deep reinforcement learning framework for flexible job- shop scheduling problem,”Expert Syst. Appl., vol. 205, p. 117796, Nov. 2022

  38. [38]

    Rectified linear units improve restricted boltz- mann machines,

    V . Nair and G. E. Hinton, “Rectified linear units improve restricted boltz- mann machines,” inProceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807–814

  39. [39]

    Multiple-objective scheduling and real- time dispatching for the semiconductor manufacturing system,

    Y . Lee, Z. Jiang, and H. Liu, “Multiple-objective scheduling and real- time dispatching for the semiconductor manufacturing system,”Com- puters & amp; Operations Research, vol. 36, no. 3, p. 866–884, Mar. 2009