pith. machine review for the scientific record.

arxiv: 2604.26126 · v2 · submitted 2026-04-28 · 📡 eess.SY · cs.SY · stat.ML

Recognition: unknown

Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems

Junya Ikemoto, Satoshi Maruyama, Kazumune Hashimoto

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 14:42 UTC · model grok-4.3

classification 📡 eess.SY · cs.SY · stat.ML
keywords deep reinforcement learning · event-triggered control · artificial pancreas · networked control systems · semi-Markov decision process · blood glucose regulation · insulin delivery · communication efficiency

The pith

A rule based on blood glucose changes lets deep reinforcement learning trigger insulin updates only when needed in networked artificial pancreas systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to apply deep reinforcement learning to artificial pancreas control under network constraints by replacing periodic updates with event-triggered ones. Instead of forcing the learner to decide both insulin doses and when to communicate, a simple rule watches for blood glucose shifts to set the update times. This turns the irregular timing into a semi-Markov decision process that an existing reinforcement learning algorithm can handle after modest extension. Experiments indicate the approach lowers communication frequency yet keeps glucose regulation performance intact. The result matters for battery-powered medical devices that must conserve network and energy resources while still protecting patients from glucose excursions.

Core claim

The authors introduce a deep reinforcement learning controller for networked artificial pancreas systems that uses a rule-based criterion on blood glucose changes to decide control update instants. By avoiding joint learning of dosing policy and timing, the problem is cast as a semi-Markov decision process; a standard DRL algorithm is extended to this setting. Numerical experiments confirm that communication frequency drops while blood glucose control performance is preserved.

What carries the argument

A rule-based criterion defined by changes in blood glucose levels that triggers controller updates at irregular intervals without requiring the learner to discover the timing policy.
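A minimal sketch of how such a trigger and its surrounding event-triggered loop might look. The threshold value, function names, and reset-on-update behavior here are illustrative assumptions; the paper's exact criterion, units, and filtering are not specified in this summary:

```python
def glucose_change_trigger(cgm_history, threshold_mgdl=10.0):
    """Fire an update event when the CGM reading has drifted more than
    `threshold_mgdl` (hypothetical value) from the reading taken at the
    last control update. cgm_history[0] is that last-update reading."""
    return abs(cgm_history[-1] - cgm_history[0]) >= threshold_mgdl


def run_episode(cgm_stream, controller, threshold_mgdl=10.0):
    """Event-triggered loop: the controller is consulted (one network
    transmission) only when the trigger fires, so communication events
    scale with glucose activity rather than with wall-clock time."""
    updates = 0
    history = [cgm_stream[0]]
    controller(cgm_stream[0])  # initial dose at episode start
    for reading in cgm_stream[1:]:
        history.append(reading)
        if glucose_change_trigger(history, threshold_mgdl):
            controller(reading)  # transmit new state, receive new dose
            updates += 1
            history = [reading]  # restart drift measurement
    return updates
```

With a flat glucose trace the controller is never re-contacted; with a steadily drifting trace, updates arrive only as often as the drift crosses the threshold, which is the communication saving the paper claims.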

If this is right

  • Control actions are issued only at irregular intervals driven by observed glucose dynamics.
  • Network communication load decreases compared with fixed-periodic update schemes.
  • Blood glucose regulation quality remains comparable to continuously updated controllers.
  • The reinforcement learning task complexity is lowered by separating timing decisions from the dosing policy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same separation of timing rule from action policy could simplify deep reinforcement learning use in other battery-constrained networked medical controllers.
  • The glucose-change trigger might be replaced or augmented by additional physiological signals if clinical data show it occasionally misses rapid excursions.
  • Longer device runtime or reduced network congestion in home-based diabetes management systems would follow if the communication savings hold in real patient deployments.

Load-bearing premise

Changes in blood glucose levels alone are sufficient to decide when control updates must occur without missing critical events or allowing unsafe glucose excursions between updates.

What would settle it

A side-by-side simulation or trial in which the event-triggered controller permits blood glucose to reach unsafe levels or shows clear degradation in regulation metrics that the periodic-update version prevents.

Figures

Figures reproduced from arXiv: 2604.26126 by Junya Ikemoto, Satoshi Maruyama, Kazumune Hashimoto.

Figure 1. Illustration of a networked AP system. The controller, implemented… view at source ↗
Figure 3. Illustration of the glucose-insulin dynamics (S2008). The mathemat… view at source ↗
Figure 4. Time responses under the policy learned by CGM-ETPPO for… view at source ↗
Figure 5. Histograms of the interval-averaged CGM values and the correspond… view at source ↗
Figure 6. Time responses under the policy learned by CGM-ETPPO with… view at source ↗
read the original abstract

This paper proposes a deep reinforcement learning (DRL)-based event-triggered controller design for networked artificial pancreas (AP) systems. Although existing DRL-based AP controllers typically assume periodic control updates, networked control systems (NCSs) require a reduction in communication frequency to achieve energy-efficient operation, which is directly tied to control updates. However, jointly learning both insulin dosing and update timing significantly increases the complexity of the learning problem. To alleviate this complexity, we develop a practical DRL-based controller design that avoids explicitly learning update timing by introducing a rule-based criterion defined by changes in blood glucose. As a result, decision-making occurs at irregular intervals, and the problem is naturally formulated as a semi-Markov decision process (SMDP), for which we extend a standard DRL algorithm. Numerical experiments demonstrate that the proposed method improves communication efficiency while maintaining control performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a DRL-based event-triggered controller for networked artificial pancreas systems. To avoid the complexity of jointly learning insulin dosing and update timing, it introduces a rule-based trigger defined by changes in blood glucose levels, converting the problem into an SMDP for which a standard DRL algorithm is extended. Numerical experiments are presented to support the claim of improved communication efficiency while maintaining control performance.

Significance. If the empirical results hold under rigorous validation, the approach offers a practical route to energy-efficient networked control for safety-critical medical devices by reducing update frequency without explicit timing optimization. The use of an independent rule-based trigger to enable SMDP formulation is a pragmatic engineering choice that could generalize to other NCS applications.

major comments (2)
  1. [Abstract and §3 (method)] The central claim that the rule-based BG-change criterion is sufficient to decide update instants without missing critical events rests on an unanalyzed assumption; no analytic bound, worst-case detection latency, or sensitivity analysis to sensor noise/meals/exercise is provided, directly undermining the 'maintained performance' half of the headline result.
  2. [§4 (experiments)] The reported numerical results compare communication efficiency and control metrics, but lack explicit baselines (e.g., periodic DRL, other event-triggered methods), statistical significance tests, or safety-specific outcomes such as time-in-range, hypoglycemia events, or maximum glucose excursions during transients; without these, the 'maintained performance' conclusion cannot be assessed as load-bearing.
minor comments (2)
  1. [§3] Notation for the SMDP transition kernel and the precise definition of the BG-change threshold (including units and filtering) should be stated explicitly in §3 to allow reproduction.
  2. [§4] Figure captions and axis labels in the experimental plots should include error bars or confidence intervals if multiple runs were performed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments, which help improve the clarity and rigor of our work. We address each major comment below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract and §3] The central claim that the rule-based BG-change criterion is sufficient to decide update instants without missing critical events rests on an unanalyzed assumption; no analytic bound, worst-case detection latency, or sensitivity analysis to sensor noise/meals/exercise is provided, directly undermining the 'maintained performance' half of the headline result.

    Authors: We recognize that the paper lacks a formal analysis of the rule-based trigger's ability to detect critical events. The trigger is designed based on domain knowledge of blood glucose dynamics to capture significant deviations that warrant insulin adjustments. While providing analytic bounds or worst-case latency would be ideal, it is challenging due to the stochastic nature of the system and disturbances. In the revised version, we will include an empirical sensitivity analysis to sensor noise, meals, and exercise scenarios, demonstrating that the performance remains robust. This will support the claim without overclaiming theoretical guarantees. revision: partial

  2. Referee: [§4] The reported numerical results compare communication efficiency and control metrics, but lack explicit baselines (e.g., periodic DRL, other event-triggered methods), statistical significance tests, or safety-specific outcomes such as time-in-range, hypoglycemia events, or maximum glucose excursions during transients; without these, the 'maintained performance' conclusion cannot be assessed as load-bearing.

    Authors: We agree that the experimental evaluation can be strengthened. The current results focus on communication reduction and basic control metrics in simulation. For the revision, we will add comparisons against periodic DRL controllers and relevant event-triggered methods. We will perform multiple simulation runs and include statistical significance testing (e.g., t-tests or Wilcoxon tests). Additionally, we will report standard safety metrics for AP systems, including time-in-range (70-180 mg/dL), time in hypoglycemia, and peak glucose excursions during meal challenges and other transients. These will be presented in updated tables and figures. revision: yes
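The safety metrics the rebuttal promises are straightforward to compute from a CGM trace. A sketch under the usual clinical definitions (70–180 mg/dL target range); the function name and uniform-sampling assumption are illustrative, not taken from the paper:

```python
def glycemic_metrics(cgm_trace, low=70.0, high=180.0):
    """Fraction of samples in range, fraction below range (hypoglycemia),
    and peak excursion, from a list of CGM readings in mg/dL.
    Assumes uniformly spaced samples, so sample fractions equal
    time fractions (time-in-range, time-in-hypoglycemia)."""
    n = len(cgm_trace)
    tir = sum(low <= g <= high for g in cgm_trace) / n
    t_hypo = sum(g < low for g in cgm_trace) / n
    peak = max(cgm_trace)
    return {"time_in_range": tir, "time_hypo": t_hypo, "peak_mgdl": peak}
```

Reporting these alongside communication counts, for both the event-triggered and periodic controllers, is what would let the 'maintained performance' claim bear weight.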

Circularity Check

0 steps flagged

No circularity: rule-based trigger is independent design choice, SMDP extension is standard

full rationale

The paper's central methodological step is to adopt an external rule-based blood-glucose change criterion so that timing is not learned jointly with dosing; this converts the problem to an SMDP that is then solved by extending a standard DRL algorithm. Neither the rule nor the SMDP formulation is derived from the DRL policy or from any fitted parameter inside the paper. The reported numerical improvement is an empirical outcome, not a quantity that reduces by construction to the inputs via the paper's own equations. No self-citation chain, uniqueness theorem, or ansatz smuggling is invoked in the provided text to justify the core claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract supplies no explicit free parameters, invented entities, or detailed axioms beyond reliance on standard DRL and control-theory concepts; the rule-based trigger is introduced as a practical simplification.

axioms (1)
  • domain assumption: Standard deep reinforcement learning algorithms can be extended to semi-Markov decision processes arising from event-triggered control.
    The paper states it extends a standard DRL algorithm to handle the SMDP formulation induced by irregular update times.
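The extension this axiom refers to typically amounts to discounting by the elapsed inter-event time rather than by a fixed step count. A sketch of the variable-interval discounted return (standard SMDP bookkeeping, not the paper's exact algorithm):

```python
def smdp_return(rewards, intervals, gamma=0.99):
    """Discounted return over irregular decision instants.

    rewards[k] is the reward accrued over the k-th inter-event interval
    and intervals[k] its duration in time steps; each reward is
    discounted by gamma ** t_k, where t_k is the total time elapsed
    before interval k, instead of gamma ** k as in a fixed-period MDP.
    """
    g, elapsed = 0.0, 0.0
    for r, tau in zip(rewards, intervals):
        g += (gamma ** elapsed) * r
        elapsed += tau
    return g
```

When every interval has unit length this reduces to the ordinary MDP return, which is why a standard DRL algorithm can be extended to the SMDP setting with modest changes to its return and advantage estimates.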

pith-pipeline@v0.9.0 · 5458 in / 1166 out tokens · 143076 ms · 2026-05-07T14:42:27.521380+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.
