pith. machine review for the scientific record.

arxiv: 2604.15124 · v1 · submitted 2026-04-16 · 💻 cs.CL

Recognition: unknown

Blinded Multi-Rater Comparative Evaluation of a Large Language Model and Clinician-Authored Responses in CGM-Informed Diabetes Counseling

Alvina Lai, Aristeidis Vagenas, Christo Albor, Emmanouil Korakas, Hengrui Zhang, Irshad Ahamed, Justin Healy, Kezhi Li, Zhijun Guo

Pith reviewed 2026-05-10 11:37 UTC · model grok-4.3

classification 💻 cs.CL
keywords large language model · continuous glucose monitoring · diabetes counseling · clinical evaluation · patient education · retrieval-augmented generation · blinded evaluation

The pith

Retrieval-grounded large language model responses scored higher than clinician-authored ones in blinded ratings for CGM diabetes counseling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests a retrieval-grounded LLM conversational agent designed to explain continuous glucose monitoring patterns and support diabetes counseling without giving personalized medical advice. Researchers built twelve cases from public datasets and had six senior clinicians answer the same questions, then collected blinded ratings from three other clinicians on each response across six quality dimensions. The agent’s outputs received higher average scores than the clinicians’ own answers, driven mainly by better empathy and actionability, while the rate of safety concerns stayed comparably low. These results point to a possible role for such systems in helping patients prepare for routine visits and understand their CGM data more clearly.

Core claim

In a blinded multi-rater evaluation, the retrieval-grounded LLM conversational agent produced responses with a mean quality score of 4.37 compared with 3.58 for clinician-authored responses, yielding an estimated mean difference of 0.782 points (95% CI 0.692-0.872; P<.001), with the largest gains in empathy (1.062 points) and actionability (0.992 points); safety flag distributions were similar, with major concerns occurring in only 0.7% of ratings for each group.

What carries the argument

A retrieval-grounded LLM-based conversational agent that produces plain-language explanations of CGM data and diabetes counseling support while avoiding individualized therapeutic advice.

Load-bearing premise

That the twelve constructed cases drawn from public datasets adequately represent the variety and complexity of real patient CGM records and questions, and that blinded clinician ratings accurately forecast what patients would experience in actual consultations.

What would settle it

A prospective study in which real patients use the conversational agent before or during clinic visits and researchers measure changes in patient understanding, adherence, or consultation errors relative to standard care.

Original abstract

Background: Continuous glucose monitoring (CGM) is central to diabetes care, but explaining CGM patterns clearly and empathetically remains time-intensive. Evidence for retrieval-grounded large language model (LLM) systems in CGM-informed counseling remains limited. Objective: To evaluate whether a retrieval-grounded LLM-based conversational agent (CA) could support patient understanding of CGM data and preparation for routine diabetes consultations. Methods: We developed a retrieval-grounded LLM-based CA for CGM interpretation and diabetes counseling support. The system generated plain-language responses while avoiding individualized therapeutic advice. Twelve CGM-informed cases were constructed from publicly available datasets. Between Oct 2025 and Feb 2026, 6 senior UK diabetes clinicians each reviewed 2 assigned cases and answered 24 questions. In a blinded multi-rater evaluation, each CA-generated and clinician-authored response was independently rated by 3 clinicians on 6 quality dimensions. Safety flags and perceived source labels were also recorded. Primary analyses used linear mixed-effects models. Results: A total of 288 unique responses (144 CA and 144 clinician) generated 864 ratings. The CA received higher quality scores than clinician responses (mean 4.37 vs 3.58), with an estimated mean difference of 0.782 points (95% CI 0.692-0.872; P<.001). The largest differences were for empathy (1.062, 95% CI 0.948-1.177) and actionability (0.992, 95% CI 0.877-1.106). Safety flag distributions were similar, with major concerns rare in both groups (3/432, 0.7% each). Conclusions: Retrieval-grounded LLM systems may have value as adjunct tools for CGM review, patient education, and preconsultation preparation. However, these findings do not support autonomous therapeutic decision-making or unsupervised real-world use.
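The counts and the interval quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the abstract; the variable names are illustrative, and backing a standard error out of the 95% CI with z = 1.96 is a normal-approximation assumption of this sketch, not something the paper reports.

```python
# Cross-check of figures quoted in the abstract.
# Assumption (not from the paper): the 95% CI is symmetric and based on a
# normal approximation, so half-width = 1.96 * standard error.

clinicians = 6
questions_per_clinician = 24
raters_per_response = 3

clinician_responses = clinicians * questions_per_clinician   # 144
ca_responses = clinician_responses                           # one CA answer per question
total_responses = clinician_responses + ca_responses         # 288
total_ratings = total_responses * raters_per_response        # 864
ratings_per_group = total_ratings // 2                       # 432

# Major safety concerns: 3 ratings out of 432 in each group.
major_flag_rate = 3 / ratings_per_group

# Estimated mean difference and its 95% CI, as reported.
diff, ci_low, ci_high = 0.782, 0.692, 0.872
ci_midpoint = (ci_low + ci_high) / 2
implied_se = (ci_high - ci_low) / (2 * 1.96)
implied_z = diff / implied_se

# Difference of the raw group means, for comparison with the model estimate.
raw_gap = 4.37 - 3.58

print(f"responses={total_responses}, ratings={total_ratings}")
print(f"major flag rate={major_flag_rate:.1%}")
print(f"implied SE={implied_se:.4f}, implied z={implied_z:.1f}")
print(f"raw mean gap={raw_gap:.2f} vs model estimate={diff}")
```

The response and rating totals reproduce the reported 288 and 864, and the interval is centered on the point estimate. The raw gap between group means (0.79) is close to, but not the same quantity as, the model-based estimate of 0.782.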

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript describes the development and blinded multi-rater evaluation of a retrieval-grounded LLM-based conversational agent (CA) for CGM interpretation and diabetes counseling. Twelve cases were constructed from public datasets; six senior clinicians each authored responses to 24 questions across their assigned cases, and each of the 288 responses (144 CA, 144 clinician) was rated by three independent clinicians on six quality dimensions plus safety flags. Linear mixed-effects models showed CA responses scored higher overall (mean 4.37 vs 3.58, estimated difference 0.782, 95% CI 0.692-0.872, P<.001), with largest gains in empathy (1.062) and actionability (0.992); safety flags were comparable and major concerns rare (0.7% each). The authors conclude that such systems may serve as adjuncts for patient education and pre-consultation preparation but not for autonomous therapeutic use.

Significance. If the comparative results hold, the work supplies concrete evidence that a retrieval-grounded LLM can produce responses rated superior to clinician-authored ones in a blinded setting, especially on empathy and actionability, while maintaining similar safety profiles. The blinded design, use of three raters per response, and linear mixed-effects modeling are clear methodological strengths that support the internal validity of the quality-score differences. This could inform the design of LLM adjuncts that reduce clinician time on routine CGM explanations. The small scale and synthetic nature of the cases, however, constrain how far the findings can be extrapolated to real-world clinical impact or patient outcomes.

major comments (1)
  1. [Methods] Methods (case construction): The 12 CGM-informed cases are stated to have been 'constructed from publicly available datasets,' yet no explicit selection criteria, stratification by CGM pattern complexity, patient demographics, comorbidities, or edge cases are supplied. This detail is load-bearing for the central claim because the headline quality differences (e.g., empathy difference of 1.062) and the interpretation that the CA 'may have value as adjunct tools' rest on the assumption that these synthetic cases adequately proxy real patient consultations; without it, the observed advantages cannot be confidently generalized.
minor comments (3)
  1. [Abstract] Abstract: The abstract supplies limited information on the precise rating scale (presumably 1-5), the exact six quality dimensions, and any inter-rater reliability statistics, which would help readers assess the magnitude and robustness of the reported mean differences.
  2. [Results] Results: The linear mixed-effects model specification (fixed effects, random effects for rater and case, covariance structure) is not described, preventing full evaluation of how the estimated mean difference of 0.782 and its confidence interval were derived.
  3. [Discussion] Discussion: The manuscript does not report any patient-reported outcome measures or real-time CGM stream validation, which would be useful context even if outside the current scope.
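For orientation on the second minor comment, one specification a reader might expect is a crossed random-intercept model. This is a sketch under stated assumptions, not the authors' model: the manuscript does not describe its fixed or random effects, and every symbol below is introduced here for illustration only.

```latex
y_{ijk} = \beta_0 + \beta_1\,\mathrm{CA}_i + u_j + v_k + \varepsilon_{ijk},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),\quad
v_k \sim \mathcal{N}(0, \sigma_v^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ijk}$ would be the quality score that rater $j$ gives to response $i$ on case $k$, $\mathrm{CA}_i$ is 1 for a CA-generated response and 0 for a clinician-authored one, and $u_j$ and $v_k$ are random intercepts for rater and case. Under this reading, $\beta_1$ is the quantity reported as the estimated mean difference (0.782). Whether the authors also included random effects for question or responding clinician, or a different covariance structure, cannot be determined from the abstract.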

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation of the study's blinded design and methodological strengths, as well as for the constructive comment on case construction. We address the major comment below and commit to revisions that enhance transparency.

Point-by-point responses
  1. Referee: [Methods] Methods (case construction): The 12 CGM-informed cases are stated to have been 'constructed from publicly available datasets,' yet no explicit selection criteria, stratification by CGM pattern complexity, patient demographics, comorbidities, or edge cases are supplied. This detail is load-bearing for the central claim because the headline quality differences (e.g., empathy difference of 1.062) and the interpretation that the CA 'may have value as adjunct tools' rest on the assumption that these synthetic cases adequately proxy real patient consultations; without it, the observed advantages cannot be confidently generalized.

    Authors: We agree that explicit details on case construction are necessary to allow readers to evaluate the representativeness of the cases and the generalizability of the quality differences. The original manuscript provided only a high-level statement to maintain conciseness, but this omission limits assessment of how well the cases reflect real consultations. In the revised manuscript we will add a dedicated Methods subsection that specifies the publicly available datasets used, the selection criteria applied, stratification by CGM pattern complexity (e.g., glycemic variability, hypo- and hyperglycemia), patient demographics, comorbidities, and deliberate inclusion of both typical and edge cases. We will also describe how the 24 questions were derived from the cases. These additions will directly support the interpretation of the results while preserving the blinded evaluation and statistical analyses. We view this as a straightforward improvement that strengthens rather than alters the core findings. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical evaluation with independent ratings

full rationale

The paper reports an empirical study: 12 constructed cases, clinician-authored responses, blinded ratings by 3 clinicians per response on 6 dimensions, and linear mixed-effects modeling of the resulting 864 ratings. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are invoked to derive the central quality-score differences. The analysis directly compares observed ratings between CA and clinician responses; the statistical model estimates mean differences from the data without reducing to self-definition or prior author results. The 12-case construction and proxy nature of clinician ratings are acknowledged limitations but do not create circularity in the reported findings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on empirical data collection and standard statistical analysis rather than novel theoretical constructs or fitted parameters beyond the study design.

axioms (1)
  • standard math: The assumptions underlying linear mixed-effects models are valid for analyzing the ordinal rating data.
    Primary analyses used these models as stated in the abstract.

pith-pipeline@v0.9.0 · 5691 in / 1279 out tokens · 43075 ms · 2026-05-10T11:37:33.875029+00:00 · methodology

Reference graph

Works this paper leans on

113 extracted references · 50 canonical work pages

  1. [1]

    American Diabetes Association Professional Practice Committee. 7. Diabetes technology: standards of care in diabetes -2024. Diabetes Care . 2024;47(Suppl 1):S126-S144. doi:10.2337/dc24-S007. PMID:38078575

  2. [2]

    Diabetes atlas

    International Diabetes Federation . Diabetes atlas . 11th ed. Brussels, Belgium: International Diabetes Federation; 2025. ISBN:978 -2-930229- 96-6

  3. [3]

    Digital management of diabetes global research trends: a bibliometric study

    Zhu S, Bian H, Zhan J, Ni L, Huo L, et al. Digital management of diabetes global research trends: a bibliometric study. Front Med (Lausanne) . 2025;12:1620307. doi:10.3389/fmed.2025.1620307. PMID:41164162

  4. [4]

    A community-codesigned LLM-powered chatbot for primary care: a randomized controlled trial

    Li S, Li Y, Zhou S, et al. A community-codesigned LLM-powered chatbot for primary care: a randomized controlled trial. Nat Health . 2026;1(2):238 -

  5. [5]

    PMID:41659358

    doi:10.1038/s44360-025-00021-w. PMID:41659358

  6. [6]

    Expanding the role of continuous glucose monitoring in modern diabetes care beyond type 1 disease

    Klupa T, Czupryniak L, Dzida G, et al. Expanding the role of continuous glucose monitoring in modern diabetes care beyond type 1 disease. Diabetes Ther. 2023;14(8):1241-1266. doi:10.1007/s13300-023-01431-3. PMID:37322319

  7. [7]

    Understanding continuous glucose monitoring data

    Bergenstal RM. Understanding continuous glucose monitoring data. In: Hirsch IB, editor. Role of continuous glucose monitoring in diabetes treatment. Arlington, VA: American Diabetes Association; 2018. ISBN:978- 1-58040-714-3. PMID:34251769

  8. [8]

    Enhancing self - management in type 1 diabetes with wearables and deep learning

    Zhu T, Uduku C, Li K, Herrero P, Oliver N, Georgiou P. Enhancing self - management in type 1 diabetes with wearables and deep learning. NPJ Digit Med . 2022;5(1):78. doi:10.1038/s41746 -022-00626-5. PMID:35760819

  9. [9]

    Intermittent use of continuous glucose monitoring in type 2 diabetes is preferred: a qualitative study of patients' experiences

    Bendixen BE, Madsen H, Thomsen RW, et al. Intermittent use of continuous glucose monitoring in type 2 diabetes is preferred: a qualitative study of patients' experiences. J Diabetes Sci Technol . 2025. doi:10.1177/19322968251314629. PMID:40116013

  10. [10]

    ElSayed NA, Aleppo G, Aroda VR, et al; on behalf of the American Diabetes Association. 9. Pharmacologic approaches to glycemic treatment: standards of care in diabetes-2023. Diabetes Care. 2023;46(Suppl 1):S140- S157. doi:10.2337/dc23-S009. PMID:36507650

  11. [11]

    Using continuous glucose monitoring data in daily clinical practice

    Martens TW, Simonson GD, Bergenstal RM. Using continuous glucose monitoring data in daily clinical practice. Cleve Clin J Med . 2024;91(10):611-620. doi:10.3949/ccjm.91a.23090. PMID:39353661

  12. [12]

    Rates and correlates of uptake of continuous glucose monitors among adults with type 2 diabetes in primary care and endocrinology settings

    Mayberry LS, Guy C, Hendrickson CD, McCoy AB, Elasy T . Rates and correlates of uptake of continuous glucose monitors among adults with type 2 diabetes in primary care and endocrinology settings. J Gen Intern Med. 2023;38(11):2546 -2552. doi:10.1007/s11606 -023-08222-3. PMID:37254011

  13. [13]

    Dexcom Clarity Reports Overview

    Dexcom. Dexcom Clarity Reports Overview. Dexcom. URL: https://provider.dexcom.com/education-research/cgm-education- use/product-information/dexcom-clarity-reports-overview [accessed 2026-04-02]

  14. [14]

    FreeStyle Libre software reports tour

    Abbott. FreeStyle Libre software reports tour. LibreView. 2024. URL: https://www.libreview.com/files/documents/en- GB/FSReportTour_2024-08-19.pdf [accessed 2026-04-02]

  15. [15]

    Patient experiences of continuous glucose monitoring and sensor-augmented insulin pump therapy for diabetes: a systematic review of qualitative studies

    Natale P, et al. Patient experiences of continuous glucose monitoring and sensor-augmented insulin pump therapy for diabetes: a systematic review of qualitative studies. J Diabetes . 2023;15(12):1048 -1069. doi:10.1111/1753-0407.13454. PMID:37551735

  16. [16]

    Patients' and caregivers' experiences of using continuous glucose monitoring to support diabetes self - management: qualitative study

    Lawton J, Blackburn M, Allen J, et al. Patients' and caregivers' experiences of using continuous glucose monitoring to support diabetes self - management: qualitative study. BMC Endocr Disord . 2018;18(1):12. doi:10.1186/s12902-018-0239-1. PMID:29458348

  17. [17]

    Glucose interpretation meaning and action (GIMA): insights to blood glucose user interface interpretation in type 1 diabetes

    Kongdee R, Parsia B, Thabit H, Harper S. Glucose interpretation meaning and action (GIMA): insights to blood glucose user interface interpretation in type 1 diabetes. Digit Health . 2025;11:20552076251332580. doi:10.1177/20552076251332580. PMID:40351844

  18. [18]

    Chapter 3: Diabetes distress [Internet]

    Diabetes UK. Chapter 3: Diabetes distress [Internet]. Diabetes UK. Available from: https://www.diabetes.org.uk/for-professionals/improving- care/good-practice/psychological-care/emotional-health-professionals- guide/chapter-3-diabetes-distress. Accessed 2026 Mar 17

  19. [19]

    Diabetes and depression

    Holt RI, de Groot M, Golden SH. Diabetes and depression. Curr Diab Rep. 2014 Jun;14(6):491. doi: 10.1007/s11892 -014-0491-3. PMID: 24743941; PMCID: PMC4476048

  20. [20]

    Links between diabetes and depression [Internet]

    Diabetes UK. Links between diabetes and depression [Internet]. Diabetes UK. Available from: https://www.diabetes.org.uk/living-with- diabetes/emotional-wellbeing/depression. Accessed 2026 Mar 17

  21. [21]

    Large language models for diabetes care: potentials and prospects

    Sheng B, Guan Z, Lim LL, Jiang Z, Mathioudakis N, Li J, et al. Large language models for diabetes care: potentials and prospects. Sci Bull (Beijing) . 2024;69(5):583-588. doi:10.1016/j.scib.2024.01.004. PMID:38220476

  22. [22]

    Large language models in clinical trials: applications, technical advances, and future directions

    Lin A, Wang Z, Jiang A, Chen L, Qi C, Zhu L, et al. Large language models in clinical trials: applications, technical advances, and future directions. BMC Med. 2025;23(1):563. doi:10.1186/s12916-025-04348-9. PMID:41088200

  23. [23]

    Large language models for mental health applications: systematic review

    Guo Z, Lai A, Thygesen JH, Farrington J, Keen T , Li K. Large language models for mental health applications: systematic review. JMIR Ment Health . 2024;11:e57400. doi:10.2196/57400. PMID:39423368

  24. [24]

    Embracing the future of medical education with large language model –based virtual patients: scoping review

    Zeng J, Qi W, Shen S, Liu X, Li S, Wang B, et al. Embracing the future of medical education with large language model –based virtual patients: scoping review. J Med Internet Res . 2025;27:e79091. doi:10.2196/79091. PMID:41232097

  25. [25]

    Knowledge -practice performance gap in clinical large language models: systematic review of 39 benchmarks

    Gong EJ, Bang CS, Lee JJ, Baik GH. Knowledge -practice performance gap in clinical large language models: systematic review of 39 benchmarks. J Med Internet Res. 2025;27:e84120. doi:10.2196/84120. PMID:41325597

  26. [26]

    Development and evaluation of HopeBot: an LLM -based chatbot for structured and interactive PHQ -9 depression screening

    Guo Z, Lai A, Ive J, Petcu A, Wang Y, Qi L, et al. Development and evaluation of HopeBot: an LLM -based chatbot for structured and interactive PHQ -9 depression screening. arXiv [Preprint]. 2025. Available from: https://doi.org/10.48550/arXiv.2507.05984. Accessed 2026 Mar 17

  27. [27]

    The effectiveness of a custom AI chatbot for type 2 diabetes mellitus health literacy: development and evaluation study

    Kelly A, Noctor E, Ryan L, van de Ven P. The effectiveness of a custom AI chatbot for type 2 diabetes mellitus health literacy: development and evaluation study. J Med Internet Res. 2025;27:e70131. doi:10.2196/70131. PMID:40324160

  28. [28]

    Parameswaran V, Bernard J, Bernard A, Deo N, Tsung S, Lyytinen K, et al. Evaluating large language models and retrieval -augmented generation enhancement for delivering guideline -adherent nutrition information for cardiovascular disease prevention: cross -sectional study. J Med Internet Res. 2025;27:e78625. doi:10.2196/78625. PMID:41057043

  29. [29]

    Generative AI chatbot for diabetes management: formative 2-part qualitative study using DTalksBot involving patients and clinicians

    Jeon S, Lee S, Kim EH, Eun J, Lee K, Lim H, Lee J. Generative AI chatbot for diabetes management: formative 2-part qualitative study using DTalksBot involving patients and clinicians. JMIR Form Res . 2025;9:e72553. doi:10.2196/72553. PMID:41223424

  30. [31]

    LLM -CGM: a benchmark for large language model - enabled querying of continuous glucose monitoring data for conversational diabetes management

    Healey E, Kohane IS. LLM -CGM: a benchmark for large language model - enabled querying of continuous glucose monitoring data for conversational diabetes management. Pac Symp Biocomput . 2025;30:82 -93. doi:10.1142/9789819807024_0007. PMID:39670363

  31. [32]

    A fully automated artificial intelligence method for non-invasive, imaging-based identification of gene tic alterations in glioblastomas

    Healey E, Tan ALM, Flint KL, Ruiz JL, Kohane IS. A case study on using a large language model to analyze continuous glucose monitoring data. Sci Rep. 2025;15(1):1143. doi:10.1038/s41598 -024-84003-0. PMID:39774031

  32. [33]

    Diabetes education and support tele-visit needs differ in duration, content, and satisfaction in older versus younger adults

    Greenfield M, Stuber D, Stegman -Barber D, Kemmis K, Matthews B, Feuerstein-Simon CB, et al. Diabetes education and support tele-visit needs differ in duration, content, and satisfaction in older versus younger adults. Telemed Rep. 2022;3(1):107 -116. doi:10 .1089/tmr.2022.0007. PMID:35720451

  33. [34]

    Applied techniques for putting pre-visit planning in clinical practice to empower patient -centered care in the pandemic era: a systematic review and framework suggestion

    Gholamzadeh M, Abtahi H, Ghazisaeeidi M. Applied techniques for putting pre-visit planning in clinical practice to empower patient -centered care in the pandemic era: a systematic review and framework suggestion. BMC Health Serv Res. 2021;21(1):458. doi:10. 1186/s12913-021-06456-7. PMID:33985502

  34. [35]

    Expert recommendations for using time -in- range and other continuous glucose monitoring metrics to achieve patient- centered glycemic control in peopl e with diabetes

    Bellido V, Aguilera E, Cardona-Hernandez R, Diaz-Soto G, Gonzalez Perez de Villar N, Picon-Cesar MJ, et al. Expert recommendations for using time -in- range and other continuous glucose monitoring metrics to achieve patient- centered glycemic control in peopl e with diabetes. J Diabetes Sci Technol . 2023;17(5):1326-1336. doi:10.1177/19322968221088601. PM...

  35. [36]

    What approvals and decisions do I need? Health Research Authority

    Health Research Authority. What approvals and decisions do I need? Health Research Authority. URL: https://www.hra.nhs.uk/approvals - amendments/what-approvals-do-i-need/ [accessed 2026-04-02]

  36. [37]

    Diabetes Datasets -ShanghaiT1DM and ShanghaiT2DM [dataset]

    Zhu J. Diabetes Datasets -ShanghaiT1DM and ShanghaiT2DM [dataset]. figshare. 2022. doi:10.6084/m9.figshare.20444397.v3

  37. [38]

    Charlotte, NC: University of North Carolina at Charlotte

    ioT1DM Dataset [dataset on the internet]. Charlotte, NC: University of North Carolina at Charlotte. Available from: https://webpages.charlotte.edu/rbunescu/data/ohiot1dm/OhioT1DM- dataset.html. Accessed 2026 Mar 17

  38. [39]

    International consensus on use of continuous glucose monitoring

    Danne T, Nimri R, Battelino T, Bergenstal RM, Close KL, DeVries JH, et al. International consensus on use of continuous glucose monitoring. Diabetes Care. 2017;40(12):1631-1640. doi:10.2337/dc17-1600. PMID:29162583

  39. [40]

    Utilizing the new glucometrics: a practical guide to ambulatory glucose profile interpretation

    Doupis J, Horton ES. Utilizing the new glucometrics: a practical guide to ambulatory glucose profile interpretation. touchREV Endocrinol . 2022;18(1):20-26. doi:10.17925/EE.2022.18.1.20. PMID:35949362

  40. [41]

    Stepwise approach to continuous glucose monitoring interpretation for internists and family physicians

    Szmuilowicz ED, Aleppo G. Stepwise approach to continuous glucose monitoring interpretation for internists and family physicians. Postgrad Med. 2022;134(8):743 -751. doi:10.1080/00325481.2022.2110507. PMID:35930313

  41. [42]

    Quick guide: interpreting CGM data [Internet]

    DiabetesontheNet. Quick guide: interpreting CGM data [Internet]. 2020. Available from: https://diabetesonthenet.com/journal-diabetes- nursing/quick-guide-interpreting-cgm-data/. Accessed 2026 Mar 17

  42. [43]

    Quality statement 4: continuous glucose monitoring for adults who use insulin and need help monitoring their blood glucose [Internet]

    NICE. Quality statement 4: continuous glucose monitoring for adults who use insulin and need help monitoring their blood glucose [Internet]. 2023. Available from: https://www.nice.org.uk/guidance/qs209/chapter/Quality- statement-4-Continuous-glucose-monitoring-for-adults-who-use-insulin- and-need-help-monitoring-their-blood-glucose. Accessed 2026 Mar 17

  43. [44]

    AGP report [Internet]

    Accu-Chek. AGP report [Internet]. Available from: https://www.accu- chek.co.uk/training/cgm/agp-report. Accessed 2026 Mar 17

  44. [45]

    Managing diabetes [Internet]

    National Institute of Diabetes and Digestive and Kidney Diseases. Managing diabetes [Internet]. Available from: https://www.niddk.nih.gov/health- information/diabetes/overview/managing-diabetes. Accessed 2026 Mar 17

  45. [46]

    Diabetes: what it is, causes, symptoms, treatment and types [Internet]

    Cleveland Clinic. Diabetes: what it is, causes, symptoms, treatment and types [Internet]. 2023 Feb 17. Available from: https://my.clevelandclinic.org/health/diseases/7104-diabetes. Accessed 2026 Mar 17

  46. [47]

    Clinical targets for continuous glucose monitoring data interpretation: recommendations from the international consensus on time in range

    Battelino T , Danne T , Bergenstal RM, Amiel SA, Beck R, Biester T , et al. Clinical targets for continuous glucose monitoring data interpretation: recommendations from the international consensus on time in range. Diabetes Care . 2019;42(8):1593 -1603. doi:10.2337/dci19 -0028. PMID:31177185

  47. [48]

    Type 2 diabetes in adults: management [Internet]

    NICE. Type 2 diabetes in adults: management [Internet]. 2015 Dec 2. Available from: https://www.nice.org.uk/guidance/ng28. Accessed 2026 Mar 17

  48. [49]

    ElSayed NA, Aleppo G, Aroda VR, Bannuru RR, Brown FM, Bruemmer D, et al. 5. Facilitating positive health behaviors and well-being to improve health outcomes: standards of care in diabetes -2023. Diabetes Care . 2023;46(Suppl 1):S68-S96. doi:10.2337/dc23-S005. PMID:36507648

  49. [50]

    What is diabetes distress and burnout? [Internet]

    Diabetes UK. What is diabetes distress and burnout? [Internet]. Available from: https://www.diabetes.org.uk/living-with-diabetes/emotional- wellbeing/diabetes-burnout. Accessed 2026 Mar 17

  50. [51]

    Diabetes and your emotions [Internet]

    Diabetes UK . Diabetes and your emotions [Internet]. Available from: https://www.diabetes.org.uk/living-with-diabetes/emotional-wellbeing. Accessed 2026 Mar 17

  51. [52]

    10 tips to ease diabetes stress [Internet]

    American Diabetes Association. 10 tips to ease diabetes stress [Internet]. Available from: https://diabetes.org/health-wellness/mental- health/ease-diabetes-care-stress. Accessed 2026 Mar 17

  52. [53]

    MedDialog: large-scale medical dialogue datasets

    Zeng G, Yang W, Ju Z, Yang M, Zhang J, Zhou H, et al. MedDialog: large-scale medical dialogue datasets. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) . Stroudsburg, PA: Association for Computational Linguistics; 2020:9241 -9250. doi:10.18653/v1/2020.emnlp-main.743

  53. [54]

    OpenAIEmbeddings integration [Internet]

    LangChain. OpenAIEmbeddings integration [Internet]. Available from: https://docs.langchain.com/oss/python/integrations/text_embedding/ope nai. Accessed 2026 Mar 17

  54. [55]

    Faiss [Internet]

    Meta AI. Faiss [Internet]. Available from: https://ai.meta.com/tools/faiss/. Accessed 2026 Mar 17

  55. [56]

    Application of chatbots to help patients self-manage diabetes: systematic review and meta-analysis

    Wu Y, Zhang J, Ge P, Duan T, Zhou J, Wu Y, et al. Application of chatbots to help patients self-manage diabetes: systematic review and meta-analysis. J Med Internet Res. 2024;26:e60380. doi:10.2196/60380. PMID:39626235

  56. [57]

    Journal of Chiropractic Medicine , author =

    Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med . 2016;15(2):155 -163. doi:10.1016/j.jcm.2016.02.012. PMID:27330520

  57. [58]

    Using cluster bootstrapping to analyze nested data with a few clusters

    Huang FL. Using cluster bootstrapping to analyze nested data with a few clusters. Educ Psychol Meas. 2018;78(2):297-318

  58. [59]

    Mixed-effects model: a useful statistical tool for longitudinal and cluster studies

    Silveira L TY, Ferreira JC, Patino CM. Mixed-effects model: a useful statistical tool for longitudinal and cluster studies. J Bras Pneumol . 2023;49(2):e20230137. doi:10.36416/1806 -3756/e20230137. PMID:37194822

  59. [60]

    Should we use one -sided or two -sided P values in tests of significance? Clin Exp Pharmacol Physiol

    Ludbrook J. Should we use one -sided or two -sided P values in tests of significance? Clin Exp Pharmacol Physiol . 2013;40(6):357 -361. doi:10.1111/1440-1681.12086. PMID:23551169

  60. [61]

    Descriptive statistics and normality tests for statistical data

    Mishra P , Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive statistics and normality tests for statistical data. Ann Card Anaesth . 2019;22(1):67-72. doi:10.4103/aca.ACA_157_18. PMID:30648682

  61. [62]

    Wilcoxon signed ranks test [Internet]

    ScienceDirect Topics. Wilcoxon signed ranks test [Internet]. Available from: https://www.sciencedirect.com/topics/medicine-and-dentistry/wilcoxon- signed-ranks-test. Accessed 2026 Mar 17

  62. [63]

    Wald tests of singular hypotheses

    Drton M, Xiao H. Wald tests of singular hypotheses. Bernoulli. 2016;22(1):38-59. doi:10.3150/14-BEJ620

  63. [64]

    The Turing test: the first 50 years

    French RM. The Turing test: the first 50 years. Trends Cogn Sci . 2000;4(3):115-122. doi:10.1016/S1364 -6613(00)01453-4. PMID:10689346

  64. [65]

    The binomial test [Internet]

    Technology Networks. The binomial test [Internet]. 2024 Mar 26. Available from: http://www.technologynetworks.com/informatics/articles/the- binomial-test-366022. Accessed 2026 Mar 17

  65. [66]

    Denniston, Melanie J

    Cruz Rivera S, Liu X, Chan AW, Denniston AK, Calvert MJ, et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT -AI extension. Nat Med . 2020;26(9):1351 -1363. doi:10.1038/s41591-020-1037-7. PMID:32908284

  66. [67]

    Kruskal -Wallis H -test for oneway analysis of variance (ANOVA) by ranks

    MacFarland TW, Yates JM. Kruskal -Wallis H -test for oneway analysis of variance (ANOVA) by ranks. In: MacFarland TW, Yates JM, editors. Introduction to nonparametric statistics for the biological sciences using R . Cham, Switzerland: Springer International Publishing; 2016:177 -211. doi:10.1007/978-3-319-30634-6_6. ISBN:978-3-319-30633-9

  67. [68]

    Foundation models and intelligent decision-making: progress, challenges, and perspectives

    Huang J, et al. Foundation models and intelligent decision-making: progress, challenges, and perspectives. Innovation (Camb). 2025;6(6):100948. doi:10.1016/j.xinn.2025.100948. PMID:40528892

  68. [69]

    Artificial intelligence tools in supporting healthcare professionals for tailored patient care

    Kim J, et al. Artificial intelligence tools in supporting healthcare professionals for tailored patient care. NPJ Digit Med. 2025;8(1):210. doi:10.1038/s41746-025-01604-3. PMID:40240489

  69. [70]

    Wearable devices and AI-driven remote monitoring in cardiovascular medicine: a narrative review

    Gaoudam N, Sakhamudi SK, Kamal B, Addla N, Reddy EP, Ambala M, et al. Wearable devices and AI-driven remote monitoring in cardiovascular medicine: a narrative review. Cureus. 2025;17(8):e90208. doi:10.7759/cureus.90208. PMID:40964568

  70. [71]

    Challenges and recommendations for wearable devices in digital health: data quality, interoperability, health equity, fairness

    Canali S, Schiaffonati V, Aliverti A. Challenges and recommendations for wearable devices in digital health: data quality, interoperability, health equity, fairness. PLOS Digit Health. 2022;1(10):e0000104. doi:10.1371/journal.pdig.0000104. PMID:36812619

  71. [72]

    Investigating sources of inaccuracy in wearable optical heart rate sensors

    Bent B, Goldstein BA, Kibbe WA, Dunn JP. Investigating sources of inaccuracy in wearable optical heart rate sensors. NPJ Digit Med. 2020;3:18. doi:10.1038/s41746-020-0226-6

  72. [73]

    Artificial intelligence in mobile health applications: a comprehensive review of its role in diabetes care

    Li WJ, Li LZ. Artificial intelligence in mobile health applications: a comprehensive review of its role in diabetes care. World J Methodol. 2026;16(1):107488. doi:10.5662/wjm.v16.i1.107488. PMID:41809156

  73. [74]

    Limits to the evaluation of the accuracy of continuous glucose monitoring systems by clinical trials

    Schrangl P, Reiterer F, Heinemann L, Freckmann G, del Re L. Limits to the evaluation of the accuracy of continuous glucose monitoring systems by clinical trials. Biosensors (Basel). 2018;8(2):50. doi:10.3390/bios8020050. PMID:29783669

  74. [75]

    GLP-1 agonists [Internet]

    Diabetes UK. GLP-1 agonists [Internet]. Diabetes UK. Available from: https://www.diabetes.org.uk/about-diabetes/looking-after-diabetes/treatments/tablets-and-medication/glp-1. Accessed 2026 Mar 17

  75. [76]

    Inter-observer agreement and reliability assessment for observational studies of clinical work

    Walter SR, Dunsmuir WTM, Westbrook JI. Inter-observer agreement and reliability assessment for observational studies of clinical work. J Biomed Inform. 2019;100:103317. doi:10.1016/j.jbi.2019.103317. PMID:31654801.

    Multimedia Appendix 1
    Patient Name: Steven (ID: 1002)
    Date: Based on CGM Data – May 2021

  76. [77]

    Demographics & Medical History
    • Age: 36 years
    • Sex: Male
    • Height / Weight / BMI: 172 cm / 68 kg / BMI: 23.0
    • Occupation: Graphic designer (mostly sedentary, 8+ hours/day screen time)
    • Living Situation: Lives with partner; meals are prepared at home on weekdays, dine-out on weekends
    • Diabetes Type: Type 1 Diabetes Mellitus (diagnosed at age 30)
    • Dur...

  77. [78]

    Treatment & Medication
    • Insulin Regimen:
      o Delivery: Continuous Subcutaneous Insulin Infusion (CSII) – Novolin R
      o Basal Insulin:
        ▪ Mean basal rate: 0.68 IU/h
        ▪ Median basal rate: 0.7 IU/h
      o Bolus Insulin:
        ▪ Total bolus injections recorded: 28
        ▪ Mean bolus dose: 5.46 IU
        ▪ Administered manually before meals based on experience (no auto calculator)
    • Other...

  78. [79]

    CGM Monitoring Summary Monitoring duration: 11 days CGM wear time: 89.8% Total glucose readings: 948 Metric Value Recommended Target* Mean glucose 136.0 mg/dL (7.6 mmol/L) — GMI (Estimated HbA1c) 6.56% — Standard deviation 59.9 mg/dL (3.3 mmol/L) — Coefficient of variation 44.0% <36% Time in Range (70–180 mg/dL [3.9–10.0 mmol/L]) 74.8% >70% Time Below Ran...

  79. [80]

    Lifestyle & Daily Routine
    • Diet:
      o Weekdays: three regular meals at home
        ▪ Breakfast (8:30 AM): oatmeal with milk, egg or toast
        ▪ Lunch (1:00 PM): rice/noodles with vegetables and meat
        ▪ Dinner (7:00 PM): usually stir-fried dishes with rice or soup
      o Weekends: dine out (hotpot, fast food, or noodles), heavier carbohydrate load
      o Occasional afternoon snac...

  80. [81]

    Self-Management Behavior
    • Insulin Administration:
      o Comfortable using CSII independently
      o Adjusts basal settings when necessary (e.g., illness, heavy meals)
      o Does not use bolus calculator; estimates boluses based on food size/type
      o Fingerstick glucose checks: 2–3 times/week, especially to confirm CGM readings <70 mg/dL (<3.9 mmol/L) and to rule out fa...

Showing first 80 references.