Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems
Pith reviewed 2026-05-07 14:42 UTC · model grok-4.3
The pith
A rule based on blood glucose changes lets deep reinforcement learning trigger insulin updates only when needed in networked artificial pancreas systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce a deep reinforcement learning controller for networked artificial pancreas systems that uses a rule-based criterion on blood glucose changes to decide control update instants. Because the trigger, rather than the learner, sets the timing, the dosing policy and update schedule need not be learned jointly; the resulting irregular decision instants cast the problem as a semi-Markov decision process, for which a standard DRL algorithm is extended. Numerical experiments indicate that communication frequency drops while blood glucose control performance is preserved.
What carries the argument
A rule-based criterion defined by changes in blood glucose levels that triggers controller updates at irregular intervals without requiring the learner to discover the timing policy.
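The excerpt does not give the paper's exact trigger rule, so the following is only a minimal sketch of what such a change-based criterion could look like; the function name and the threshold value are hypothetical, not taken from the paper:

```python
# Hypothetical sketch of a change-based event trigger (the paper's exact
# rule, threshold value, and any filtering are not given in this excerpt).
def should_update(bg_now: float, bg_at_last_update: float,
                  threshold_mgdl: float = 10.0) -> bool:
    """Fire a control update when blood glucose has drifted more than
    threshold_mgdl (mg/dL) since the last transmitted action."""
    return abs(bg_now - bg_at_last_update) >= threshold_mgdl

# Example: glucose drifted from 120 to 135 mg/dL, so an update fires.
print(should_update(135.0, 120.0))  # True
```

Between firings the last transmitted insulin command stays in force, which is what makes the update intervals irregular.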
If this is right
- Control actions are issued only at irregular intervals driven by observed glucose dynamics.
- Network communication load decreases compared with fixed-periodic update schemes.
- Blood glucose regulation quality remains comparable to continuously updated controllers.
- The reinforcement learning task complexity is lowered by separating timing decisions from the dosing policy.
Where Pith is reading between the lines
- The same separation of timing rule from action policy could simplify deep reinforcement learning use in other battery-constrained networked medical controllers.
- The glucose-change trigger might be replaced or augmented by additional physiological signals if clinical data show it occasionally misses rapid excursions.
- Longer device runtime or reduced network congestion in home-based diabetes management systems would follow if the communication savings hold in real patient deployments.
Load-bearing premise
Changes in blood glucose levels alone are sufficient to decide when control updates must occur without missing critical events or allowing unsafe glucose excursions between updates.
What would settle it
A side-by-side simulation or trial in which the event-triggered controller permits blood glucose to reach unsafe levels or shows clear degradation in regulation metrics that the periodic-update version prevents.
Figures
Fig. 7 (from the paper): Time responses under the policy learned by CGM-ETPPO with the variable threshold scheme for adult#009; the first, second, and third plots show CGM value, insulin infusion rate, and the variable CGM-threshold, respectively.
Original abstract
This paper proposes a deep reinforcement learning (DRL)-based event-triggered controller design for networked artificial pancreas (AP) systems. Although existing DRL-based AP controllers typically assume periodic control updates, networked control systems (NCSs) require a reduction in communication frequency to achieve energy-efficient operation, which is directly tied to control updates. However, jointly learning both insulin dosing and update timing significantly increases the complexity of the learning problem. To alleviate this complexity, we develop a practical DRL-based controller design that avoids explicitly learning update timing by introducing a rule-based criterion defined by changes in blood glucose. As a result, decision-making occurs at irregular intervals, and the problem is naturally formulated as a semi-Markov decision process (SMDP), for which we extend a standard DRL algorithm. Numerical experiments demonstrate that the proposed method improves communication efficiency while maintaining control performance.
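The abstract's SMDP formulation hinges on one detail worth making concrete: when decision instants are irregular, returns are discounted by elapsed time rather than by step count. The paper's exact algorithmic extension is not shown in this excerpt; the sketch below only illustrates the standard SMDP discounting idea under that assumption:

```python
# Illustrative return computation for an SMDP with variable sojourn times.
# Rewards accrued at irregular decision instants are discounted by the
# cumulative elapsed time, not by a fixed step count. (The paper's exact
# extension of the DRL algorithm is not given in this excerpt.)
def smdp_return(rewards, sojourn_times, gamma=0.99):
    """Discounted return G = sum_k gamma**t_k * r_k, where t_k is the
    time elapsed before decision epoch k."""
    g, elapsed = 0.0, 0.0
    for r, tau in zip(rewards, sojourn_times):
        g += (gamma ** elapsed) * r
        elapsed += tau  # later rewards are discounted by total elapsed time
    return g

# Three updates separated by unequal gaps (e.g., 5, 15, and 30 minutes):
print(smdp_return([1.0, 1.0, 1.0], [5, 15, 30]))
```

Under a fixed-period scheme all sojourn times are equal and this reduces to the ordinary MDP return, which is why the periodic baseline is a special case.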
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a DRL-based event-triggered controller for networked artificial pancreas systems. To avoid the complexity of jointly learning insulin dosing and update timing, it introduces a rule-based trigger defined by changes in blood glucose levels, converting the problem into an SMDP for which a standard DRL algorithm is extended. Numerical experiments are presented to support the claim of improved communication efficiency while maintaining control performance.
Significance. If the empirical results hold under rigorous validation, the approach offers a practical route to energy-efficient networked control for safety-critical medical devices by reducing update frequency without explicit timing optimization. The use of an independent rule-based trigger to enable SMDP formulation is a pragmatic engineering choice that could generalize to other NCS applications.
major comments (2)
- [Abstract and §3] Abstract and §3 (method): The central claim that the rule-based BG-change criterion is sufficient to decide update instants without missing critical events rests on an unanalyzed assumption; no analytic bound, worst-case detection latency, or sensitivity analysis to sensor noise/meals/exercise is provided, directly undermining the 'maintained performance' half of the headline result.
- [§4] §4 (experiments): The reported numerical results compare communication efficiency and control metrics, but lack explicit baselines (e.g., periodic DRL, other event-triggered methods), statistical significance tests, or safety-specific outcomes such as time-in-range, hypoglycemia events, or maximum glucose excursions during transients; without these, the 'maintained performance' conclusion cannot be assessed as load-bearing.
minor comments (2)
- [§3] Notation for the SMDP transition kernel and the precise definition of the BG-change threshold (including units and filtering) should be stated explicitly in §3 to allow reproduction.
- [§4] Figure captions and axis labels in the experimental plots should include error bars or confidence intervals if multiple runs were performed.
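To make the reproducibility request in the first minor comment concrete: an explicit trigger specification might pair a threshold in mg/dL with a stated noise filter. The moving-average filter and window size below are purely illustrative, not taken from the paper:

```python
# Hypothetical illustration of the detail requested in §3: a trigger
# compared against a filtered CGM signal, with explicit units (mg/dL)
# and an explicit filter (3-sample moving average; window is illustrative).
def filtered(cgm, k=3):
    """Moving average over the last k samples of a CGM trace in mg/dL."""
    return [sum(cgm[max(0, i - k + 1): i + 1]) / len(cgm[max(0, i - k + 1): i + 1])
            for i in range(len(cgm))]

print(filtered([100, 106, 112, 118]))  # smoothed trace in mg/dL
```

Stating the window length and units this explicitly is what would let another group reproduce the trigger's behavior on noisy sensor data.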
Simulated Author's Rebuttal
We thank the referee for the insightful comments, which help improve the clarity and rigor of our work. We address each major comment below and outline the revisions we will make.
Point-by-point responses
- Referee: [Abstract and §3] The central claim that the rule-based BG-change criterion is sufficient to decide update instants without missing critical events rests on an unanalyzed assumption; no analytic bound, worst-case detection latency, or sensitivity analysis to sensor noise/meals/exercise is provided, directly undermining the 'maintained performance' half of the headline result.
Authors: We recognize that the paper lacks a formal analysis of the rule-based trigger's ability to detect critical events. The trigger is designed based on domain knowledge of blood glucose dynamics to capture significant deviations that warrant insulin adjustments. While providing analytic bounds or worst-case latency would be ideal, it is challenging due to the stochastic nature of the system and disturbances. In the revised version, we will include an empirical sensitivity analysis to sensor noise, meals, and exercise scenarios, demonstrating that the performance remains robust. This will support the claim without overclaiming theoretical guarantees. (Revision: partial)
- Referee: [§4] The reported numerical results compare communication efficiency and control metrics, but lack explicit baselines (e.g., periodic DRL, other event-triggered methods), statistical significance tests, or safety-specific outcomes such as time-in-range, hypoglycemia events, or maximum glucose excursions during transients; without these, the 'maintained performance' conclusion cannot be assessed as load-bearing.
Authors: We agree that the experimental evaluation can be strengthened. The current results focus on communication reduction and basic control metrics in simulation. For the revision, we will add comparisons against periodic DRL controllers and relevant event-triggered methods. We will perform multiple simulation runs and include statistical significance testing (e.g., t-tests or Wilcoxon tests). Additionally, we will report standard safety metrics for AP systems, including time-in-range (70–180 mg/dL), time in hypoglycemia, and peak glucose excursions during meal challenges and other transients. These will be presented in updated tables and figures. (Revision: yes)
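The safety metrics promised here are standard and easy to pin down; a minimal sketch, assuming a CGM trace sampled at uniform intervals in mg/dL (function and field names are illustrative, not the authors'):

```python
# Sketch of the safety metrics promised in the revision, computed from a
# uniformly sampled CGM trace in mg/dL. Range boundaries follow the common
# 70-180 mg/dL time-in-range convention cited in the rebuttal.
def safety_metrics(cgm_trace):
    n = len(cgm_trace)
    tir = sum(70 <= g <= 180 for g in cgm_trace) / n   # fraction of time in range
    hypo = sum(g < 70 for g in cgm_trace) / n          # fraction of time below range
    peak = max(cgm_trace)                              # worst hyperglycemic excursion
    return {"time_in_range": tir, "time_hypo": hypo, "peak_mgdl": peak}

print(safety_metrics([65, 90, 150, 200, 110]))
```

With uniform sampling the fractions are direct time proportions; irregular sampling would require weighting each reading by its interval.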
Circularity Check
No circularity: rule-based trigger is independent design choice, SMDP extension is standard
Full rationale
The paper's central methodological step is to adopt an external rule-based blood-glucose change criterion so that timing is not learned jointly with dosing; this converts the problem to an SMDP that is then solved by extending a standard DRL algorithm. Neither the rule nor the SMDP formulation is derived from the DRL policy or from any fitted parameter inside the paper. The reported numerical improvement is an empirical outcome, not a quantity that reduces by construction to the inputs via the paper's own equations. No self-citation chain, uniqueness theorem, or ansatz smuggling is invoked in the provided text to justify the core claim.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Standard deep reinforcement learning algorithms can be extended to semi-Markov decision processes arising from event-triggered control.
Reference graph
Works this paper leans on
- [1] American Diabetes Association, "Diagnosis and Classification of Diabetes Mellitus," Diabetes Care, vol. 37, no. 1, pp. 581–590, 2013.
- [2] S. A. Antar et al., "Diabetes Mellitus: Classification, Mediators, and Complications; A Gate to Identify Potential Targets for the Development of New Effective Treatments," Biomedicine & Pharmacotherapy, vol. 168, 115734, 2023.
- [3] American Diabetes Association, "2. Diagnosis and Classification of Diabetes: Standards of Care in Diabetes—2025," Diabetes Care, vol. 48, no. 1, pp. S27–S49, 2025.
- [4] W. V. Tamborlane et al., "Continuous Glucose Monitoring and Intensive Treatment of Type 1 Diabetes," New England Journal of Medicine, vol. 359, no. 14, pp. 1464–1476, 2008.
- [5] C. K. Boughton et al., "Hybrid Closed-Loop Glucose Control Compared with Sensor Augmented Pump Therapy in Older Adults with Type 1 Diabetes: An Open-Label Multicentre, Multinational, Randomised, Crossover Study," The Lancet Healthy Longevity, vol. 3, no. 3, pp. e135–e142, 2022.
- [6] J. Ware et al., "Cambridge Hybrid Closed-Loop Algorithm in Children and Adolescents with Type 1 Diabetes: A Multicentre 6-Month Randomised Controlled Trial," The Lancet Digital Health, vol. 4, no. 4, pp. e245–e255, 2022.
- [7] G. M. Steil et al., "Feasibility of Automating Insulin Delivery for the Treatment of Type 1 Diabetes," Diabetes, vol. 55, no. 12, pp. 3344–3350, 2006.
- [8] B. P. Kovatchev et al., "In Silico Preclinical Trials: A Proof of Concept in Closed-Loop Control of Type 1 Diabetes," Journal of Diabetes Science and Technology, vol. 3, no. 1, pp. 44–55, 2009.
- [9] C. D. Man et al., "The UVA/PADOVA Type 1 Diabetes Simulator: New Features," Journal of Diabetes Science and Technology, vol. 8, no. 1, pp. 26–34, 2014.
- [10] L. Magni et al., "Model Predictive Control of Type 1 Diabetes: An In Silico Trial," Journal of Diabetes Science and Technology, vol. 1, no. 6, pp. 804–812, 2007.
- [11] C. S. Hughes et al., "Hypoglycemia Prevention via Pump Attenuation and Red-Yellow-Green 'Traffic' Lights Using Continuous Glucose Monitoring and Insulin Pump Data," Journal of Diabetes Science and Technology, vol. 4, no. 5, pp. 1146–1155, 2010.
- [12] P. Soru et al., "MPC Based Artificial Pancreas: Strategies for Individualization and Meal Compensation," Annual Reviews in Control, vol. 36, no. 1, pp. 118–128, 2012.
- [13] C. Toffanin et al., "Artificial Pancreas: Model Predictive Control Design from Clinical Experience," Journal of Diabetes Science and Technology, vol. 7, no. 6, pp. 1470–1483, 2013.
- [14] M. Breton et al., "Fully Integrated Artificial Pancreas in Type 1 Diabetes Modular Closed-Loop Glucose Control Maintains Near Normoglycemia," Diabetes, vol. 61, no. 9, pp. 2230–2237, 2012.
- [15] P. Keith-Hynes et al., "The Diabetes Assistant: A Smartphone-Based System for Real-Time Control of Blood Glucose," Electronics, vol. 3, no. 4, pp. 609–623, 2014.
- [16] R. A. Lal et al., "Realizing a Closed-Loop (Artificial Pancreas) System for the Treatment of Type 1 Diabetes," Endocrine Reviews, vol. 40, no. 6, pp. 1521–1546, 2019.
- [17] R. Reiter et al., "Synthesis of Model Predictive Control and Reinforcement Learning: Survey and Classification," Annual Reviews in Control, vol. 61, 101045, 2026.
- [18] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Second Edition, MIT Press, 2018.
- [19] H. Dong et al., Deep Reinforcement Learning: Fundamentals, Research and Applications, Springer, 2021.
- [20] L. Da et al., "A Survey of Sim-to-Real Methods in RL: Progress, Prospects and Challenges with Foundation Models," arXiv preprint, arXiv:2502.13187, 2025.
- [21] W. Zhao et al., "Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey," in Proc. of IEEE Symposium Series on Computational Intelligence, pp. 737–744, 2020.
- [22] S. Lee et al., "Toward a Fully Automated Artificial Pancreas System Using a Bioinspired Reinforcement Learning Design: In Silico Validation," IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 2, pp. 536–546, 2021.
- [23] X.-M. Zhang et al., "Networked Control Systems: A Survey of Trends and Techniques," IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 1, pp. 1–17, 2019.
- [24] W. P. M. H. Heemels et al., "An Introduction to Event-Triggered and Self-Triggered Control," in Proc. of 51st IEEE Conference on Decision and Control (CDC), pp. 3270–3285, 2012.
- [25] N. Funk et al., "Learning Event-Triggered Control from Data through Joint Optimization," IFAC Journal of Systems and Control, vol. 16, 100144, 2021.
- [26] A. Termehchi and M. Rasti, "A Learning Approach for Joint Design of Event-triggered Control and Power-Efficient Resource Allocation," IEEE Transactions on Vehicular Technology, vol. 71, no. 6, pp. 6322–6334, 2022.
- [27] L. Kesper et al., "Toward Multi-Agent Reinforcement Learning for Distributed Event-Triggered Control," in Proc. of 5th Annual Conference on Learning for Dynamics and Control, vol. 211, pp. 1072–1085, 2023.
- [28] R. S. Sutton et al., "Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning," Artificial Intelligence, vol. 112, no. 1–2, pp. 181–211, 1999.
- [29] R. Mauseth et al., "Use of a 'Fuzzy Logic' Controller in a Closed-Loop Artificial Pancreas," Diabetes Technology & Therapeutics, vol. 15, no. 8, pp. 628–633, 2013.
- [30] M. K. Bothe et al., "The Use of Reinforcement Learning Algorithms to Meet the Challenges of an Artificial Pancreas," Expert Review of Medical Devices, vol. 10, no. 5, pp. 661–673, 2014.
- [31] E. Daskalaki et al., "Model-Free Machine Learning in Biomedicine: Feasibility Study in Type 1 Diabetes," PLoS One, vol. 11, no. 7, e0158722, 2016.
- [32] Q. Sun et al., "A Dual Mode Adaptive Basal-Bolus Advisor Based on Reinforcement Learning," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 6, pp. 2633–2641, 2019.
- [33] M. Tejedor et al., "Reinforcement Learning Application in Diabetes Blood Glucose Control: A Systematic Review," Artificial Intelligence in Medicine, vol. 104, 101836, 2020.
- [34] I. Fox et al., "Deep Reinforcement Learning for Closed-Loop Blood Glucose Control," in Proc. of Machine Learning for Healthcare Conference, pp. 508–536, 2020.
- [35] "AndroidAPS," [Online]. Available: https://androidaps.readthedocs.io
- [36] R. Wang et al., "Deep Reinforcement Learning for Continuous-time Self-triggered Control," IFAC-PapersOnLine, vol. 54, no. 14, pp. 203–208, 2021.
- [37] H. Wan et al., "Model-Free Self-Triggered Control Based on Deep Reinforcement Learning for Unknown Nonlinear Systems," International Journal of Robust and Nonlinear Control, vol. 33, no. 3, pp. 2238–2250, 2023.
- [38] R. S. Sutton et al., "Policy Gradient Methods for Reinforcement Learning with Function Approximation," in Proc. of Advances in Neural Information Processing Systems 12 (NIPS 1999), pp. 1057–1063, 1999.
- [39] J. Schulman et al., "High-Dimensional Continuous Control Using Generalized Advantage Estimation," arXiv preprint, arXiv:1506.02438, 2015.
- [40] V. Mnih et al., "Asynchronous Methods for Deep Reinforcement Learning," in Proc. of the 33rd International Conference on Machine Learning, vol. 48, pp. 1928–1937, 2016.
- [41] J. Schulman et al., "Trust Region Policy Optimization," in Proc. of the 32nd International Conference on Machine Learning, vol. 37, pp. 1889–1897, 2015.
- [42] J. Schulman et al., "Proximal Policy Optimization Algorithms," arXiv preprint, arXiv:1707.06347, 2017.
- [43] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv preprint, arXiv:1412.6980, 2014.
- [44] L. Engstrom et al., "Implementation Matters in Deep RL: A Case Study on PPO and TRPO," arXiv preprint, arXiv:2005.12729, 2020.
- [45] M. Andrychowicz et al., "What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study," arXiv preprint, arXiv:2006.05990, 2020.
- [46] E. Hairer et al., Solving Ordinary Differential Equations I, Springer, 1993.
- [47] C. D. Man et al., "Meal Simulation Model of the Glucose-Insulin System," IEEE Transactions on Biomedical Engineering, vol. 54, no. 10, pp. 1740–1749, 2007.
- [48] T. Battelino et al., "Clinical Targets for Continuous Glucose Monitoring Data Interpretation: Recommendations From the International Consensus on Time in Range," Diabetes Care, vol. 42, no. 8, pp. 1593–1603, 2019.
- [49] Y. Chen et al., "Reinforcement Learning for Robot Navigation with Adaptive Forward Simulation Time (AFST) in a Semi-Markov Model," in Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3597–3604, 2023.
- [50] J. Xie, "Simglucose v0.2.1," [Online]. Available: https://github.com/jxx123/simglucose
- [51] M. Bloor et al., "Control-Informed Reinforcement Learning for Chemical Processes," Industrial & Engineering Chemistry Research, vol. 64, no. 9, pp. 4966–4978, 2026.