Blinded Multi-Rater Comparative Evaluation of a Large Language Model and Clinician-Authored Responses in CGM-Informed Diabetes Counseling
Pith reviewed 2026-05-10 11:37 UTC · model grok-4.3
The pith
Retrieval-grounded large language model responses scored higher than clinician-authored ones in blinded ratings for CGM diabetes counseling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a blinded multi-rater evaluation, the retrieval-grounded LLM conversational agent produced responses with a mean quality score of 4.37 compared with 3.58 for clinician-authored responses, yielding an estimated mean difference of 0.782 points (95% CI 0.692-0.872; P<.001), with the largest gains in empathy (1.062 points) and actionability (0.992 points); safety flag distributions were similar, with major concerns occurring in only 0.7% of ratings for each group.
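The reported intervals can be sanity-checked with simple arithmetic. The sketch below assumes the intervals are symmetric normal-approximation (Wald) 95% intervals, which the source does not state; under that assumption it recovers the midpoint, the implied standard error, and a z statistic from each pair of bounds. The function name `summarize_ci` is illustrative only.

```python
# Back-of-envelope check of the reported confidence intervals.
# Assumption (not stated in the source): symmetric Wald-type 95% intervals.
Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def summarize_ci(lower: float, upper: float) -> dict:
    """Recover midpoint, implied standard error, and z statistic from CI bounds."""
    midpoint = (lower + upper) / 2
    std_error = (upper - lower) / (2 * Z_95)
    return {"midpoint": midpoint, "std_error": std_error, "z": midpoint / std_error}


# Bounds copied from the abstract.
overall = summarize_ci(0.692, 0.872)        # reported estimate: 0.782
empathy = summarize_ci(0.948, 1.177)        # reported estimate: 1.062
actionability = summarize_ci(0.877, 1.106)  # reported estimate: 0.992

for name, result in [("overall", overall), ("empathy", empathy),
                     ("actionability", actionability)]:
    print(f"{name}: midpoint={result['midpoint']:.4f}, "
          f"SE={result['std_error']:.4f}, z={result['z']:.1f}")
```

Each midpoint agrees with its reported point estimate to within rounding, which is what a symmetric interval would produce; it does not confirm how the authors actually computed the intervals.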
What carries the argument
A retrieval-grounded LLM-based conversational agent that produces plain-language explanations of CGM data and diabetes counseling support while avoiding individualized therapeutic advice.
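As a rough illustration of what "retrieval-grounded" means, the toy sketch below ranks a few reference snippets against a question and builds a prompt that restricts the answer to the retrieved text. Everything in it is an assumption made for illustration: the snippets, the bag-of-words cosine similarity, and the names `retrieve` and `build_prompt` are hypothetical and are not the authors' system, whose retrieval components are not described in this summary.

```python
import math
from collections import Counter

# Hypothetical reference snippets; not taken from the paper's knowledge base.
SNIPPETS = [
    "Time in range is the percentage of readings between 70 and 180 mg/dL.",
    "Coefficient of variation describes glucose variability.",
    "Diabetes distress is the emotional burden of managing diabetes.",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts with basic punctuation stripped."""
    cleaned = text.lower().replace(".", " ").replace("?", " ")
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, snippets=SNIPPETS, k: int = 1) -> list:
    """Return the k snippets most similar to the question."""
    query = tokenize(question)
    ranked = sorted(snippets, key=lambda s: cosine(query, tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Assemble a grounded prompt; the LLM call itself is out of scope here."""
    context = "\n".join(retrieve(question))
    return (
        "Answer using only the context below. Use plain language and do not "
        "give individualized treatment advice.\n"
        f"Context:\n{context}\nQuestion: {question}"
    )


print(build_prompt("What does time in range mean?"))
```

A production system would typically replace the word-count similarity with learned embeddings and a vector index; the toy version is only meant to show where the grounding text enters the prompt.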
Load-bearing premise
That the twelve constructed cases drawn from public datasets adequately represent the variety and complexity of real patient CGM records and questions, and that blinded clinician ratings accurately forecast what patients would experience in actual consultations.
What would settle it
A prospective study in which real patients use the conversational agent before or during clinic visits and researchers measure changes in patient understanding, adherence, or consultation errors relative to standard care.
Original abstract
Background: Continuous glucose monitoring (CGM) is central to diabetes care, but explaining CGM patterns clearly and empathetically remains time-intensive. Evidence for retrieval-grounded large language model (LLM) systems in CGM-informed counseling remains limited.
Objective: To evaluate whether a retrieval-grounded LLM-based conversational agent (CA) could support patient understanding of CGM data and preparation for routine diabetes consultations.
Methods: We developed a retrieval-grounded LLM-based CA for CGM interpretation and diabetes counseling support. The system generated plain-language responses while avoiding individualized therapeutic advice. Twelve CGM-informed cases were constructed from publicly available datasets. Between Oct 2025 and Feb 2026, 6 senior UK diabetes clinicians each reviewed 2 assigned cases and answered 24 questions. In a blinded multi-rater evaluation, each CA-generated and clinician-authored response was independently rated by 3 clinicians on 6 quality dimensions. Safety flags and perceived source labels were also recorded. Primary analyses used linear mixed-effects models.
Results: A total of 288 unique responses (144 CA and 144 clinician) generated 864 ratings. The CA received higher quality scores than clinician responses (mean 4.37 vs 3.58), with an estimated mean difference of 0.782 points (95% CI 0.692-0.872; P<.001). The largest differences were for empathy (1.062, 95% CI 0.948-1.177) and actionability (0.992, 95% CI 0.877-1.106). Safety flag distributions were similar, with major concerns rare in both groups (3/432, 0.7% each).
Conclusions: Retrieval-grounded LLM systems may have value as adjunct tools for CGM review, patient education, and preconsultation preparation. However, these findings do not support autonomous therapeutic decision-making or unsupervised real-world use.
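The counts in the abstract follow from the stated design, and the arithmetic can be laid out explicitly. The sketch below uses only figures from the abstract; the one assumption, marked in the comments, is that each clinician-authored response has exactly one CA counterpart, which matches the reported 144/144 split.

```python
# Design figures taken from the abstract.
clinicians = 6
questions_per_clinician = 24
raters_per_response = 3
major_concerns_per_arm = 3

clinician_responses = clinicians * questions_per_clinician
# Assumption: one CA response per clinician response (consistent with 144 vs 144).
ca_responses = clinician_responses

total_responses = clinician_responses + ca_responses
total_ratings = total_responses * raters_per_response
ratings_per_arm = total_ratings // 2
major_concern_rate = major_concerns_per_arm / ratings_per_arm

print(f"responses: {total_responses}")            # 288
print(f"ratings: {total_ratings}")                # 864
print(f"ratings per arm: {ratings_per_arm}")      # 432
print(f"major concern rate: {major_concern_rate:.1%}")  # 0.7%
```

With only 3 flagged ratings per arm, the safety comparison rests on very few events, so "similar" safety distributions should be read as an absence of detected difference rather than as demonstrated equivalence.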
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the development and blinded multi-rater evaluation of a retrieval-grounded LLM-based conversational agent (CA) for CGM interpretation and diabetes counseling. Twelve cases were constructed from public datasets; six senior clinicians authored responses to 24 questions across assigned cases, and each of the resulting 288 responses was rated by three independent clinicians on six quality dimensions plus safety flags. Linear mixed-effects models showed CA responses scored higher overall (mean 4.37 vs 3.58, estimated difference 0.782, 95% CI 0.692-0.872, P<.001), with largest gains in empathy (1.062) and actionability (0.992); safety flags were comparable and major concerns rare (0.7% each). The authors conclude that such systems may serve as adjuncts for patient education and pre-consultation preparation but not for autonomous therapeutic use.
Significance. If the comparative results hold, the work supplies concrete evidence that a retrieval-grounded LLM can produce responses rated superior to clinician-authored ones in a blinded setting, especially on empathy and actionability, while maintaining similar safety profiles. The blinded design, use of three raters per response, and linear mixed-effects modeling are clear methodological strengths that support the internal validity of the quality-score differences. This could inform the design of LLM adjuncts that reduce clinician time on routine CGM explanations. The small scale and synthetic nature of the cases, however, constrain how far the findings can be extrapolated to real-world clinical impact or patient outcomes.
major comments (1)
- [Methods] Methods (case construction): The 12 CGM-informed cases are stated to have been 'constructed from publicly available datasets,' yet no explicit selection criteria, stratification by CGM pattern complexity, patient demographics, comorbidities, or edge cases are supplied. This detail is load-bearing for the central claim because the headline quality differences (e.g., empathy difference of 1.062) and the interpretation that the CA 'may have value as adjunct tools' rest on the assumption that these synthetic cases adequately proxy real patient consultations; without it, the observed advantages cannot be confidently generalized.
minor comments (3)
- [Abstract] Abstract: The abstract supplies limited information on the precise rating scale (presumably 1-5), the exact six quality dimensions, and any inter-rater reliability statistics, which would help readers assess the magnitude and robustness of the reported mean differences.
- [Results] Results: The linear mixed-effects model specification (fixed effects, random effects for rater and case, covariance structure) is not described, preventing full evaluation of how the estimated mean difference of 0.782 and its confidence interval were derived.
- [Discussion] Discussion: The manuscript does not report any patient-reported outcome measures or real-time CGM stream validation, which would be useful context even if outside the current scope.
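To make the request about model specification concrete, one plausible form of the linear mixed-effects model is written out below. This is a sketch under assumptions, not the authors' reported model: it supposes crossed random intercepts for rater and case, a single fixed effect for response source, and normally distributed errors. The symbols and indices are introduced here for illustration.

```latex
% Hypothetical specification; the manuscript's actual model is not described.
% i indexes responses, j indexes raters, k indexes cases.
\[
  y_{ijk} = \beta_0 + \beta_1\,\mathrm{CA}_{i} + u_j + v_k + \varepsilon_{ijk},
  \qquad
  u_j \sim \mathcal{N}(0, \sigma_u^2),\quad
  v_k \sim \mathcal{N}(0, \sigma_v^2),\quad
  \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
\]
```

Here $y_{ijk}$ is the quality score that rater $j$ gives to response $i$ for case $k$, and $\mathrm{CA}_i$ equals 1 for a CA-generated response and 0 for a clinician-authored one. Under this form, $\beta_1$ would correspond to the reported mean difference of 0.782; a random effect for the authoring clinician or for the question would be natural additions that the manuscript would need to confirm or rule out.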
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the study's blinded design and methodological strengths, as well as for the constructive comment on case construction. We address the major comment below and commit to revisions that enhance transparency.
Point-by-point responses
Referee: [Methods] Methods (case construction): The 12 CGM-informed cases are stated to have been 'constructed from publicly available datasets,' yet no explicit selection criteria, stratification by CGM pattern complexity, patient demographics, comorbidities, or edge cases are supplied. This detail is load-bearing for the central claim because the headline quality differences (e.g., empathy difference of 1.062) and the interpretation that the CA 'may have value as adjunct tools' rest on the assumption that these synthetic cases adequately proxy real patient consultations; without it, the observed advantages cannot be confidently generalized.
Authors: We agree that explicit details on case construction are necessary to allow readers to evaluate the representativeness of the cases and the generalizability of the quality differences. The original manuscript provided only a high-level statement to maintain conciseness, but this omission limits assessment of how well the cases reflect real consultations. In the revised manuscript we will add a dedicated Methods subsection that specifies the publicly available datasets used, the selection criteria applied, stratification by CGM pattern complexity (e.g., glycemic variability, hypo- and hyperglycemia), patient demographics, comorbidities, and deliberate inclusion of both typical and edge cases. We will also describe how the 24 questions were derived from the cases. These additions will directly support the interpretation of the results while preserving the blinded evaluation and statistical analyses. We view this as a straightforward improvement that strengthens rather than alters the core findings.
Revision: yes
Circularity Check
No significant circularity: empirical evaluation with independent ratings
The paper reports an empirical study: 12 constructed cases, clinician-authored responses, blinded ratings by 3 clinicians per response on 6 dimensions, and linear mixed-effects modeling of the resulting 864 ratings. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are invoked to derive the central quality-score differences. The analysis directly compares observed ratings between CA and clinician responses; the statistical model estimates mean differences from the data without reducing to self-definition or prior author results. The 12-case construction and proxy nature of clinician ratings are acknowledged limitations but do not create circularity in the reported findings.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: The assumptions underlying linear mixed-effects models are valid for analyzing the ordinal rating data.