arxiv: 2604.15124 · v1 · submitted 2026-04-16 · 💻 cs.CL

Recognition: unknown

Blinded Multi-Rater Comparative Evaluation of a Large Language Model and Clinician-Authored Responses in CGM-Informed Diabetes Counseling

Alvina Lai, Aristeidis Vagenas, Christo Albor, Emmanouil Korakas, Hengrui Zhang, Irshad Ahamed, Justin Healy, Kezhi Li, Zhijun Guo


Pith reviewed 2026-05-10 11:37 UTC · model grok-4.3

classification 💻 cs.CL

keywords large language model · continuous glucose monitoring · diabetes counseling · clinical evaluation · patient education · retrieval-augmented generation · blinded evaluation


The pith

Retrieval-grounded large language model responses scored higher than clinician-authored ones in blinded ratings for CGM diabetes counseling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests a retrieval-grounded LLM conversational agent designed to explain continuous glucose monitoring (CGM) patterns and support diabetes counseling without giving personalized medical advice. The researchers built twelve cases from public datasets, had six senior clinicians answer the same questions the agent answered, and then collected blinded ratings of every response from three clinicians across six quality dimensions. The agent's outputs received higher average scores than the clinicians' answers, driven mainly by empathy and actionability, while safety concerns remained similarly rare in both groups. These results point to a possible role for such systems in helping patients understand their CGM data and prepare for routine visits.

Core claim

In a blinded multi-rater evaluation, the retrieval-grounded LLM conversational agent produced responses with a mean quality score of 4.37 compared with 3.58 for clinician-authored responses, yielding an estimated mean difference of 0.782 points (95% CI 0.692-0.872; P<.001), with the largest gains in empathy (1.062 points) and actionability (0.992 points); safety flag distributions were similar, with major concerns occurring in only 0.7% of ratings for each group.
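A quick arithmetic check on these figures, sketched in Python. It assumes the 95% CI is a symmetric Wald-type interval (estimate ± 1.96 × SE). If the authors used a t-based or bootstrap interval, the implied SE would differ slightly. The raw difference of the two reported means (0.79) need not equal the model-based estimate (0.782). The means are rounded to two decimals, and a mixed-effects model partitions variance across clustering units whose exact structure is not reported on this page.

```python
# Reported figures from the core claim
mean_ca, mean_clinician = 4.37, 3.58
estimate = 0.782              # model-based mean difference
ci_low, ci_high = 0.692, 0.872

# Difference of the rounded raw means (not the same quantity as the model estimate)
raw_diff = round(mean_ca - mean_clinician, 2)

# Assumption: symmetric Wald-type 95% CI, i.e. estimate +/- 1.96 * SE
ci_mid = (ci_low + ci_high) / 2
se = (ci_high - ci_low) / 2 / 1.96
z = estimate / se

print(f"raw difference of means: {raw_diff:.2f}")
print(f"CI midpoint: {ci_mid:.3f}")
print(f"implied SE: {se:.4f}, implied z: {z:.1f}")
```

The CI midpoint matches the point estimate, and an implied z of about 17 is consistent with the reported P<.001.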

What carries the argument

A retrieval-grounded LLM-based conversational agent that produces plain-language explanations of CGM data and diabetes counseling support while avoiding individualized therapeutic advice.
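The page does not describe the retrieval pipeline. However, the reference list cites OpenAI embeddings via LangChain and the Faiss similarity-search library, which suggests an embed, retrieve, then generate loop. The Python sketch below illustrates only the retrieval-grounding idea, under clearly stated assumptions:

- Word-count vectors stand in for a neural embedding model.
- A sorted list stands in for a Faiss index.
- The snippets are hypothetical.
- `retrieve` and `build_prompt` are illustrative names, not the authors' code.

No LLM is called. The sketch stops at the grounded prompt, which also restates the no-individualized-advice rule.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge snippets; the real system's corpus is not described here.
SNIPPETS = [
    "Time in range is the percentage of readings between 70 and 180 mg/dL.",
    "Time below range counts readings under 70 mg/dL and signals hypoglycemia risk.",
    "The coefficient of variation summarizes glycemic variability.",
    "Diabetes distress is common and worth raising with your care team.",
]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a neural embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k snippets most similar to the question (brute-force search)."""
    q = embed(question)
    ranked = sorted(SNIPPETS, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Ground the answer in retrieved text and restate the no-individualized-advice rule."""
    context = "\n".join(f"- {s}" for s in retrieve(question))
    return (
        "Answer in plain language using only the context below. "
        "Do not recommend dose changes or other individualized treatment.\n"
        f"Context:\n{context}\nQuestion: {question}"
    )


print(build_prompt("What does time below range mean for my readings?"))
```

Grounding the prompt in retrieved text is what lets a system like this cite guideline-style explanations rather than rely on the model's unconstrained recall. The safety constraint lives in the prompt, which is one of several possible places to enforce it.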

Load-bearing premise

The argument assumes that the twelve constructed cases drawn from public datasets adequately represent the variety and complexity of real patient CGM records and questions, and that blinded clinician ratings accurately forecast what patients would experience in actual consultations.

What would settle it

A prospective study in which real patients use the conversational agent before or during clinic visits and researchers measure changes in patient understanding, adherence, or consultation errors relative to standard care.

Original abstract

Continuous glucose monitoring (CGM) is central to diabetes care, but explaining CGM patterns clearly and empathetically remains time-intensive. Evidence for retrieval-grounded large language model (LLM) systems in CGM-informed counseling remains limited.

This study aimed to evaluate whether a retrieval-grounded LLM-based conversational agent (CA) could support patient understanding of CGM data and preparation for routine diabetes consultations.

We developed a retrieval-grounded LLM-based CA for CGM interpretation and diabetes counseling support. The system generated plain-language responses while avoiding individualized therapeutic advice. Twelve CGM-informed cases were constructed from publicly available datasets. Between Oct 2025 and Feb 2026, 6 senior UK diabetes clinicians each reviewed 2 assigned cases and answered 24 questions. In a blinded multi-rater evaluation, each CA-generated and clinician-authored response was independently rated by 3 clinicians on 6 quality dimensions. Safety flags and perceived source labels were also recorded. Primary analyses used linear mixed-effects models.

A total of 288 unique responses (144 CA and 144 clinician) generated 864 ratings. The CA received higher quality scores than clinician responses (mean 4.37 vs 3.58), with an estimated mean difference of 0.782 points (95% CI 0.692-0.872; P<.001). The largest differences were for empathy (1.062, 95% CI 0.948-1.177) and actionability (0.992, 95% CI 0.877-1.106). Safety flag distributions were similar, with major concerns rare in both groups (3/432, 0.7% each).

Retrieval-grounded LLM systems may have value as adjunct tools for CGM review, patient education, and preconsultation preparation. However, these findings do not support autonomous therapeutic decision-making or unsupervised real-world use.
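The abstract says that perceived source labels were recorded, but this page does not give the analysis. The reference list includes a source on the binomial test. One plausible approach, an assumption not confirmed here, is an exact binomial test of whether raters guessed the true source (CA vs clinician) more often than the 50% expected by chance. The sketch below implements that test in plain Python. It assumes binary guesses, and the counts are hypothetical, not study results.

```python
from math import comb


def binom_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probabilities of every outcome that is no more likely than the
    observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # relative tolerance for floating-point ties
    return min(1.0, sum(prob for prob in pmf if prob <= cutoff))


# Hypothetical example: 60 of 100 source guesses correct
p_value = binom_test_two_sided(60, 100)
print(f"p = {p_value:.4f}")
```

The function handles a few hundred ratings directly (for example n=864). For n much above 1,000, the float conversion of `comb(n, i)` overflows, and a library routine such as SciPy's `binomtest` is the safer choice.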

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LLM scored higher than clinicians on blinded ratings for CGM responses, mainly on empathy and actionability, but the 12 synthetic cases make real-world claims tentative.


The main thing to know is that their retrieval-grounded LLM agent produced responses rated higher overall than those written by senior clinicians, with the clearest advantages in empathy and actionability, while safety flags stayed low and similar across both sets. They ran this as a blinded multi-rater study, fitting linear mixed-effects models to 864 ratings from 12 cases constructed from public datasets. Six UK diabetes clinicians each handled two cases and answered 24 questions about them; three clinicians then scored every response on six dimensions without knowing its source. The mean difference came out to 0.78 points in the LLM's favor, with a narrow confidence interval (0.69 to 0.87) that supports the gap. That setup is cleaner than many LLM evaluation papers in clinical domains, and the headline numbers are reported plainly enough to sanity-check yourself. The design deserves credit for using independent raters and avoiding obvious self-referential scoring.

The soft spot is the narrow evidence base. Twelve constructed cases from public sources cannot cover the range of real patient CGM streams, comorbidities, or unusual patterns that show up in clinics. There are no patient ratings, no measures of actual comprehension or behavior change, and no real-time data streams. Clinician raters, acting as proxies for patients, can favor polished text even when the content is equivalent, so the empathy edge might not hold when patients are the end users. The paper stays within its bounds by stating that the findings do not support autonomous use, which is fair.

This is for researchers working on LLM tools for patient education in endocrinology or digital health. Readers who care about controlled evaluation methods will find the blinded rating protocol and the mixed-effects analysis useful. It deserves peer review because the execution is structured and the results are specific enough for referees to evaluate the limits directly. More detail on case construction would help, but the core comparison stands on its own terms.

Referee Report

1 major / 3 minor

Summary. The manuscript describes the development and blinded multi-rater evaluation of a retrieval-grounded LLM-based conversational agent (CA) for CGM interpretation and diabetes counseling. Twelve cases were constructed from public datasets; six senior clinicians authored responses to 24 questions across assigned cases, and each of the resulting 288 responses was rated by three independent clinicians on six quality dimensions plus safety flags. Linear mixed-effects models showed CA responses scored higher overall (mean 4.37 vs 3.58, estimated difference 0.782, 95% CI 0.692-0.872, P<.001), with largest gains in empathy (1.062) and actionability (0.992); safety flags were comparable and major concerns rare (0.7% each). The authors conclude that such systems may serve as adjuncts for patient education and pre-consultation preparation but not for autonomous therapeutic use.
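The study's counts fit together as shown below. The sketch makes two assumptions. First, each of the 12 cases went to exactly one clinician (6 × 2 = 12). Second, the CA answered the same 144 questions. Both are consistent with the abstract's totals but are not stated outright.

```python
clinicians = 6
cases_per_clinician = 2
questions_per_clinician = 24
raters_per_response = 3
major_flags_per_arm = 3  # as reported: 3/432 in each group

# Assumption: every case went to exactly one clinician
cases = clinicians * cases_per_clinician                             # 12
questions_per_case = questions_per_clinician // cases_per_clinician  # 12
clinician_responses = clinicians * questions_per_clinician           # 144
ca_responses = clinician_responses        # assumption: CA answered the same questions
total_responses = clinician_responses + ca_responses                 # 288
total_ratings = total_responses * raters_per_response                # 864
ratings_per_arm = total_ratings // 2                                 # 432

print(cases, questions_per_case, total_responses, total_ratings, ratings_per_arm)
print(f"major-concern rate per arm: {major_flags_per_arm / ratings_per_arm:.1%}")
```

The 0.7% figure is therefore a rate per rating, not per response. The 3 flagged ratings could in principle come from fewer than 3 distinct responses.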

Significance. If the comparative results hold, the work supplies concrete evidence that a retrieval-grounded LLM can produce responses rated superior to clinician-authored ones in a blinded setting, especially on empathy and actionability, while maintaining similar safety profiles. The blinded design, use of three raters per response, and linear mixed-effects modeling are clear methodological strengths that support the internal validity of the quality-score differences. This could inform the design of LLM adjuncts that reduce clinician time on routine CGM explanations. The small scale and synthetic nature of the cases, however, constrain how far the findings can be extrapolated to real-world clinical impact or patient outcomes.

major comments (1)

[Methods] Methods (case construction): The 12 CGM-informed cases are stated to have been 'constructed from publicly available datasets,' yet no explicit selection criteria, stratification by CGM pattern complexity, patient demographics, comorbidities, or edge cases are supplied. This detail is load-bearing for the central claim because the headline quality differences (e.g., empathy difference of 1.062) and the interpretation that the CA 'may have value as adjunct tools' rest on the assumption that these synthetic cases adequately proxy real patient consultations; without it, the observed advantages cannot be confidently generalized.

minor comments (3)

[Abstract] Abstract: The abstract supplies limited information on the precise rating scale (presumably 1-5), the exact six quality dimensions, and any inter-rater reliability statistics, which would help readers assess the magnitude and robustness of the reported mean differences.
[Results] Results: The linear mixed-effects model specification (fixed effects, random effects for rater and case, covariance structure) is not described, preventing full evaluation of how the estimated mean difference of 0.782 and its confidence interval were derived.
[Discussion] Discussion: The manuscript does not report any patient-reported outcome measures or real-time CGM stream validation, which would be useful context even if outside the current scope.
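The referee asks for inter-rater reliability statistics. The reference list cites Koo and Li's guideline on intraclass correlation coefficients, so the authors plausibly reported ICCs, though this page does not state the form. Below is a minimal sketch of one option, the one-way random-effects, single-rater ICC(1,1). This form suits designs where different responses may be scored by different raters, which is an assumption about this study. The ratings matrix is hypothetical.

```python
from statistics import mean


def icc1(ratings: list[list[float]]) -> float:
    """One-way random-effects, single-rater ICC(1,1).

    ratings: one inner list per response, each holding the same number k of scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-response and within-response sums of squares
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores: 4 responses, 3 ratings each
example = [[5, 4, 5], [3, 3, 4], [2, 2, 2], [4, 5, 4]]
print(f"ICC(1,1) = {icc1(example):.3f}")
```

ICC(1,1) describes the reliability of a single rater's score. The reliability of the mean of the k=3 ratings, (MS_between - MS_within) / MS_between, is higher whenever ICC(1,1) is positive.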

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation of the study's blinded design and methodological strengths, as well as for the constructive comment on case construction. We address the major comment below and commit to revisions that enhance transparency.


Referee: [Methods] Methods (case construction): The 12 CGM-informed cases are stated to have been 'constructed from publicly available datasets,' yet no explicit selection criteria, stratification by CGM pattern complexity, patient demographics, comorbidities, or edge cases are supplied. This detail is load-bearing for the central claim because the headline quality differences (e.g., empathy difference of 1.062) and the interpretation that the CA 'may have value as adjunct tools' rest on the assumption that these synthetic cases adequately proxy real patient consultations; without it, the observed advantages cannot be confidently generalized.

Authors: We agree that explicit details on case construction are necessary to allow readers to evaluate the representativeness of the cases and the generalizability of the quality differences. The original manuscript provided only a high-level statement to maintain conciseness, but this omission limits assessment of how well the cases reflect real consultations. In the revised manuscript we will add a dedicated Methods subsection that specifies the publicly available datasets used, the selection criteria applied, stratification by CGM pattern complexity (e.g., glycemic variability, hypo- and hyperglycemia), patient demographics, comorbidities, and deliberate inclusion of both typical and edge cases. We will also describe how the 24 questions were derived from the cases. These additions will directly support the interpretation of the results while preserving the blinded evaluation and statistical analyses. We view this as a straightforward improvement that strengthens rather than alters the core findings. revision: yes
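One way to make "stratification by CGM pattern complexity" concrete is to compute the consensus CGM metrics for each case and label the patterns that miss their targets. The thresholds below follow the international consensus on time in range (Battelino et al. 2019, reference [47] in the list): TIR 70-180 mg/dL, TBR target <4% below 70 mg/dL, TAR target <25% above 180 mg/dL, and CV target ≤36%. Whether the authors stratified this way is unknown, and `cgm_summary` and `stratify` are hypothetical helpers. Units are assumed to be mg/dL. UK clinics often report mmol/L, where 70 and 180 mg/dL correspond to about 3.9 and 10.0 mmol/L.

```python
from statistics import mean, pstdev


def cgm_summary(glucose: list[float]) -> dict[str, float]:
    """Percent of readings in consensus glucose bands (mg/dL) plus CV (%)."""
    n = len(glucose)

    def pct(cond) -> float:
        return 100.0 * sum(1 for g in glucose if cond(g)) / n

    return {
        "TBR_lt54": pct(lambda g: g < 54),
        "TBR_54_69": pct(lambda g: 54 <= g < 70),
        "TIR_70_180": pct(lambda g: 70 <= g <= 180),
        "TAR_181_250": pct(lambda g: 180 < g <= 250),
        "TAR_gt250": pct(lambda g: g > 250),
        # Population SD; some tools use the sample SD instead
        "CV": 100.0 * pstdev(glucose) / mean(glucose),
    }


def stratify(m: dict[str, float]) -> list[str]:
    """Hypothetical labels: flag any metric that misses its consensus target."""
    labels = []
    if m["TBR_lt54"] + m["TBR_54_69"] >= 4:      # target: <4% below 70 mg/dL
        labels.append("hypoglycemia")
    if m["TAR_181_250"] + m["TAR_gt250"] >= 25:  # target: <25% above 180 mg/dL
        labels.append("hyperglycemia")
    if m["CV"] > 36:                             # target: CV <= 36%
        labels.append("high variability")
    return labels or ["near target"]


# Hypothetical trace of readings in mg/dL
trace = [65, 90, 120, 150, 200, 260, 110, 95]
metrics = cgm_summary(trace)
print(metrics)
print(stratify(metrics))
```

Real CGM exports contain hundreds of readings per day. The same functions apply unchanged, but a study would also need rules for data sufficiency (for example, minimum sensor wear time) before computing these percentages.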

Circularity Check

0 steps flagged

No significant circularity: empirical evaluation with independent ratings


The paper reports an empirical study: 12 constructed cases, clinician-authored responses, blinded ratings by 3 clinicians per response on 6 dimensions, and linear mixed-effects modeling of the resulting 864 ratings. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are invoked to derive the central quality-score differences. The analysis directly compares observed ratings between CA and clinician responses; the statistical model estimates mean differences from the data without reducing to self-definition or prior author results. The 12-case construction and proxy nature of clinician ratings are acknowledged limitations but do not create circularity in the reported findings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on empirical data collection and standard statistical analysis rather than novel theoretical constructs or fitted parameters beyond the study design.

axioms (1)

standard math: The assumptions underlying linear mixed-effects models are valid for analyzing the ordinal rating data.
Primary analyses used these models as stated in the abstract.
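The page does not give the model specification (see the referee's second minor comment). One plausible form is shown below as an assumption, not the authors' model. It puts random intercepts on case, rater, and response, because each response receives three ratings.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{CA}_i + u_{c(i)} + v_j + w_i + \varepsilon_{ij},
\qquad
u_c \sim \mathcal{N}(0,\sigma_u^2),\;
v_j \sim \mathcal{N}(0,\sigma_v^2),\;
w_i \sim \mathcal{N}(0,\sigma_w^2),\;
\varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2)
```

Here y_ij is the score that rater j gave response i, CA_i is 1 for CA responses and 0 for clinician responses, and c(i) is the case that response i belongs to. Under this form, the coefficient on CA_i is the adjusted CA-minus-clinician difference (reported as 0.782). If y_ij is a single-dimension score on a bounded integer scale, the axiom above is exactly the assumption that Gaussian residuals are adequate for it. A cumulative-link (ordinal) mixed model would be a natural robustness check.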

pith-pipeline@v0.9.0 · 5691 in / 1279 out tokens · 43075 ms · 2026-05-10T11:37:33.875029+00:00 · methodology


Reference graph

Works this paper leans on

113 extracted references · 50 canonical work pages

[1]

American Diabetes Association Professional Practice Committee. 7. Diabetes technology: standards of care in diabetes-2024. Diabetes Care. 2024;47(Suppl 1):S126-S144. doi:10.2337/dc24-S007. PMID:38078575

work page doi:10.2337/dc24-s007 2024
[2]

Diabetes atlas

International Diabetes Federation. Diabetes atlas. 11th ed. Brussels, Belgium: International Diabetes Federation; 2025. ISBN:978-2-930229-96-6

2025
[3]

Digital management of diabetes global research trends: a bibliometric study

Zhu S, Bian H, Zhan J, Ni L, Huo L, et al. Digital management of diabetes global research trends: a bibliometric study. Front Med (Lausanne). 2025;12:1620307. doi:10.3389/fmed.2025.1620307. PMID:41164162

work page doi:10.3389/fmed.2025.1620307 2025
[4]

A community-codesigned LLM-powered chatbot for primary care: a randomized controlled trial

Li S, Li Y, Zhou S, et al. A community-codesigned LLM-powered chatbot for primary care: a randomized controlled trial. Nat Health. 2026;1(2):238-. doi:10.1038/s44360-025-00021-w. PMID:41659358

work page doi:10.1038/s44360-025-00021-w 2026
[6]

Expanding the role of continuous glucose monitoring in modern diabetes care beyond type 1 disease

Klupa T, Czupryniak L, Dzida G, et al. Expanding the role of continuous glucose monitoring in modern diabetes care beyond type 1 disease. Diabetes Ther. 2023;14(8):1241-1266. doi:10.1007/s13300-023-01431-3. PMID:37322319

work page doi:10.1007/s13300-023-01431-3 2023
[7]

Understanding continuous glucose monitoring data

Bergenstal RM. Understanding continuous glucose monitoring data. In: Hirsch IB, editor. Role of continuous glucose monitoring in diabetes treatment. Arlington, VA: American Diabetes Association; 2018. ISBN:978-1-58040-714-3. PMID:34251769

2018
[8]

Enhancing self-management in type 1 diabetes with wearables and deep learning

Zhu T, Uduku C, Li K, Herrero P, Oliver N, Georgiou P. Enhancing self-management in type 1 diabetes with wearables and deep learning. NPJ Digit Med. 2022;5(1):78. doi:10.1038/s41746-022-00626-5. PMID:35760819

work page doi:10.1038/s41746-022-00626-5 2022
[9]

Intermittent use of continuous glucose monitoring in type 2 diabetes is preferred: a qualitative study of patients' experiences

Bendixen BE, Madsen H, Thomsen RW, et al. Intermittent use of continuous glucose monitoring in type 2 diabetes is preferred: a qualitative study of patients' experiences. J Diabetes Sci Technol. 2025. doi:10.1177/19322968251314629. PMID:40116013

work page doi:10.1177/19322968251314629 2025
[10]

ElSayed NA, Aleppo G, Aroda VR, et al; on behalf of the American Diabetes Association. 9. Pharmacologic approaches to glycemic treatment: standards of care in diabetes-2023. Diabetes Care. 2023;46(Suppl 1):S140-S157. doi:10.2337/dc23-S009. PMID:36507650

work page doi:10.2337/dc23-s009 2023
[11]

Using continuous glucose monitoring data in daily clinical practice

Martens TW, Simonson GD, Bergenstal RM. Using continuous glucose monitoring data in daily clinical practice. Cleve Clin J Med. 2024;91(10):611-620. doi:10.3949/ccjm.91a.23090. PMID:39353661

work page doi:10.3949/ccjm.91a.23090 2024
[12]

Rates and correlates of uptake of continuous glucose monitors among adults with type 2 diabetes in primary care and endocrinology settings

Mayberry LS, Guy C, Hendrickson CD, McCoy AB, Elasy T. Rates and correlates of uptake of continuous glucose monitors among adults with type 2 diabetes in primary care and endocrinology settings. J Gen Intern Med. 2023;38(11):2546-2552. doi:10.1007/s11606-023-08222-3. PMID:37254011

work page doi:10.1007/s11606-023-08222-3 2023
[13]

Dexcom Clarity Reports Overview

Dexcom. Dexcom Clarity Reports Overview. Dexcom. URL: https://provider.dexcom.com/education-research/cgm-education-use/product-information/dexcom-clarity-reports-overview [accessed 2026-04-02]

2026
[14]

FreeStyle Libre software reports tour

Abbott. FreeStyle Libre software reports tour. LibreView. 2024. URL: https://www.libreview.com/files/documents/en-GB/FSReportTour_2024-08-19.pdf [accessed 2026-04-02]

2024
[15]

Patient experiences of continuous glucose monitoring and sensor-augmented insulin pump therapy for diabetes: a systematic review of qualitative studies

Natale P, et al. Patient experiences of continuous glucose monitoring and sensor-augmented insulin pump therapy for diabetes: a systematic review of qualitative studies. J Diabetes. 2023;15(12):1048-1069. doi:10.1111/1753-0407.13454. PMID:37551735

work page doi:10.1111/1753-0407.13454 2023
[16]

Patients' and caregivers' experiences of using continuous glucose monitoring to support diabetes self-management: qualitative study

Lawton J, Blackburn M, Allen J, et al. Patients' and caregivers' experiences of using continuous glucose monitoring to support diabetes self-management: qualitative study. BMC Endocr Disord. 2018;18(1):12. doi:10.1186/s12902-018-0239-1. PMID:29458348

work page doi:10.1186/s12902-018-0239-1 2018
[17]

Glucose interpretation meaning and action (GIMA): insights to blood glucose user interface interpretation in type 1 diabetes

Kongdee R, Parsia B, Thabit H, Harper S. Glucose interpretation meaning and action (GIMA): insights to blood glucose user interface interpretation in type 1 diabetes. Digit Health. 2025;11:20552076251332580. doi:10.1177/20552076251332580. PMID:40351844

work page doi:10.1177/20552076251332580 2025
[18]

Chapter 3: Diabetes distress [Internet]

Diabetes UK. Chapter 3: Diabetes distress [Internet]. Diabetes UK. Available from: https://www.diabetes.org.uk/for-professionals/improving-care/good-practice/psychological-care/emotional-health-professionals-guide/chapter-3-diabetes-distress. Accessed 2026 Mar 17

2026
[19]

Diabetes and depression

Holt RI, de Groot M, Golden SH. Diabetes and depression. Curr Diab Rep. 2014 Jun;14(6):491. doi:10.1007/s11892-014-0491-3. PMID:24743941; PMCID:PMC4476048

work page doi:10.1007/s11892-014-0491-3 2014
[20]

Links between diabetes and depression [Internet]

Diabetes UK. Links between diabetes and depression [Internet]. Diabetes UK. Available from: https://www.diabetes.org.uk/living-with-diabetes/emotional-wellbeing/depression. Accessed 2026 Mar 17

2026
[21]

Large language models for diabetes care: potentials and prospects

Sheng B, Guan Z, Lim LL, Jiang Z, Mathioudakis N, Li J, et al. Large language models for diabetes care: potentials and prospects. Sci Bull (Beijing). 2024;69(5):583-588. doi:10.1016/j.scib.2024.01.004. PMID:38220476

work page doi:10.1016/j.scib.2024.01.004 2024
[22]

Large language models in clinical trials: applications, technical advances, and future directions

Lin A, Wang Z, Jiang A, Chen L, Qi C, Zhu L, et al. Large language models in clinical trials: applications, technical advances, and future directions. BMC Med. 2025;23(1):563. doi:10.1186/s12916-025-04348-9. PMID:41088200

work page doi:10.1186/s12916-025-04348-9 2025
[23]

Large language models for mental health applications: systematic review

Guo Z, Lai A, Thygesen JH, Farrington J, Keen T, Li K. Large language models for mental health applications: systematic review. JMIR Ment Health. 2024;11:e57400. doi:10.2196/57400. PMID:39423368

work page doi:10.2196/57400 2024
[24]

Embracing the future of medical education with large language model-based virtual patients: scoping review

Zeng J, Qi W, Shen S, Liu X, Li S, Wang B, et al. Embracing the future of medical education with large language model-based virtual patients: scoping review. J Med Internet Res. 2025;27:e79091. doi:10.2196/79091. PMID:41232097

work page doi:10.2196/79091 2025
[25]

Knowledge-practice performance gap in clinical large language models: systematic review of 39 benchmarks

Gong EJ, Bang CS, Lee JJ, Baik GH. Knowledge-practice performance gap in clinical large language models: systematic review of 39 benchmarks. J Med Internet Res. 2025;27:e84120. doi:10.2196/84120. PMID:41325597

work page doi:10.2196/84120 2025
[26]

Development and evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening

Guo Z, Lai A, Ive J, Petcu A, Wang Y, Qi L, et al. Development and evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening. arXiv [Preprint]. 2025. Available from: https://doi.org/10.48550/arXiv.2507.05984. Accessed 2026 Mar 17

work page doi:10.48550/arxiv.2507.05984 2025
[27]

The effectiveness of a custom AI chatbot for type 2 diabetes mellitus health literacy: development and evaluation study

Kelly A, Noctor E, Ryan L, van de Ven P. The effectiveness of a custom AI chatbot for type 2 diabetes mellitus health literacy: development and evaluation study. J Med Internet Res. 2025;27:e70131. doi:10.2196/70131. PMID:40324160

work page doi:10.2196/70131 2025
[28]

Parameswaran V, Bernard J, Bernard A, Deo N, Tsung S, Lyytinen K, et al. Evaluating large language models and retrieval-augmented generation enhancement for delivering guideline-adherent nutrition information for cardiovascular disease prevention: cross-sectional study. J Med Internet Res. 2025;27:e78625. doi:10.2196/78625. PMID:41057043

work page doi:10.2196/78625 2025
[29]

Generative AI chatbot for diabetes management: formative 2-part qualitative study using DTalksBot involving patients and clinicians

Jeon S, Lee S, Kim EH, Eun J, Lee K, Lim H, Lee J. Generative AI chatbot for diabetes management: formative 2-part qualitative study using DTalksBot involving patients and clinicians. JMIR Form Res. 2025;9:e72553. doi:10.2196/72553. PMID:41223424

work page doi:10.2196/72553 2025
[31]

LLM-CGM: a benchmark for large language model-enabled querying of continuous glucose monitoring data for conversational diabetes management

Healey E, Kohane IS. LLM-CGM: a benchmark for large language model-enabled querying of continuous glucose monitoring data for conversational diabetes management. Pac Symp Biocomput. 2025;30:82-93. doi:10.1142/9789819807024_0007. PMID:39670363

work page doi:10.1142/9789819807024_0007 2025
[32]

A case study on using a large language model to analyze continuous glucose monitoring data

Healey E, Tan ALM, Flint KL, Ruiz JL, Kohane IS. A case study on using a large language model to analyze continuous glucose monitoring data. Sci Rep. 2025;15(1):1143. doi:10.1038/s41598-024-84003-0. PMID:39774031

work page doi:10.1038/s41598-024-84003-0 2025
[33]

Diabetes education and support tele-visit needs differ in duration, content, and satisfaction in older versus younger adults

Greenfield M, Stuber D, Stegman-Barber D, Kemmis K, Matthews B, Feuerstein-Simon CB, et al. Diabetes education and support tele-visit needs differ in duration, content, and satisfaction in older versus younger adults. Telemed Rep. 2022;3(1):107-116. doi:10.1089/tmr.2022.0007. PMID:35720451

work page arXiv 2022
[34]

Applied techniques for putting pre-visit planning in clinical practice to empower patient-centered care in the pandemic era: a systematic review and framework suggestion

Gholamzadeh M, Abtahi H, Ghazisaeeidi M. Applied techniques for putting pre-visit planning in clinical practice to empower patient-centered care in the pandemic era: a systematic review and framework suggestion. BMC Health Serv Res. 2021;21(1):458. doi:10.1186/s12913-021-06456-7. PMID:33985502

2021
[35]

Expert recommendations for using time-in-range and other continuous glucose monitoring metrics to achieve patient-centered glycemic control in people with diabetes

Bellido V, Aguilera E, Cardona-Hernandez R, Diaz-Soto G, Gonzalez Perez de Villar N, Picon-Cesar MJ, et al. Expert recommendations for using time-in-range and other continuous glucose monitoring metrics to achieve patient-centered glycemic control in people with diabetes. J Diabetes Sci Technol. 2023;17(5):1326-1336. doi:10.1177/19322968221088601. PM...

work page doi:10.1177/19322968221088601 2023
[36]

What approvals and decisions do I need? Health Research Authority

Health Research Authority. What approvals and decisions do I need? Health Research Authority. URL: https://www.hra.nhs.uk/approvals-amendments/what-approvals-do-i-need/ [accessed 2026-04-02]

2026
[37]

Diabetes Datasets-ShanghaiT1DM and ShanghaiT2DM [dataset]

Zhu J. Diabetes Datasets-ShanghaiT1DM and ShanghaiT2DM [dataset]. figshare. 2022. doi:10.6084/m9.figshare.20444397.v3

work page doi:10.6084/m9.figshare.20444397.v3 2022
[38]

OhioT1DM Dataset [dataset on the internet]

OhioT1DM Dataset [dataset on the internet]. Charlotte, NC: University of North Carolina at Charlotte. Available from: https://webpages.charlotte.edu/rbunescu/data/ohiot1dm/OhioT1DM-dataset.html. Accessed 2026 Mar 17

2026
[39]

International consensus on use of continuous glucose monitoring

Danne T, Nimri R, Battelino T, Bergenstal RM, Close KL, DeVries JH, et al. International consensus on use of continuous glucose monitoring. Diabetes Care. 2017;40(12):1631-1640. doi:10.2337/dc17-1600. PMID:29162583

work page doi:10.2337/dc17-1600 2017
[40]

Utilizing the new glucometrics: a practical guide to ambulatory glucose profile interpretation

Doupis J, Horton ES. Utilizing the new glucometrics: a practical guide to ambulatory glucose profile interpretation. touchREV Endocrinol. 2022;18(1):20-26. doi:10.17925/EE.2022.18.1.20. PMID:35949362

work page doi:10.17925/ee.2022.18.1.20 2022
[41]

Stepwise approach to continuous glucose monitoring interpretation for internists and family physicians

Szmuilowicz ED, Aleppo G. Stepwise approach to continuous glucose monitoring interpretation for internists and family physicians. Postgrad Med. 2022;134(8):743-751. doi:10.1080/00325481.2022.2110507. PMID:35930313

work page doi:10.1080/00325481.2022.2110507 2022
[42]

Quick guide: interpreting CGM data [Internet]

DiabetesontheNet. Quick guide: interpreting CGM data [Internet]. 2020. Available from: https://diabetesonthenet.com/journal-diabetes-nursing/quick-guide-interpreting-cgm-data/. Accessed 2026 Mar 17

2020
[43]

Quality statement 4: continuous glucose monitoring for adults who use insulin and need help monitoring their blood glucose [Internet]

NICE. Quality statement 4: continuous glucose monitoring for adults who use insulin and need help monitoring their blood glucose [Internet]. 2023. Available from: https://www.nice.org.uk/guidance/qs209/chapter/Quality-statement-4-Continuous-glucose-monitoring-for-adults-who-use-insulin-and-need-help-monitoring-their-blood-glucose. Accessed 2026 Mar 17

2023
[44]

AGP report [Internet]

Accu-Chek. AGP report [Internet]. Available from: https://www.accu-chek.co.uk/training/cgm/agp-report. Accessed 2026 Mar 17

2026
[45]

Managing diabetes [Internet]

National Institute of Diabetes and Digestive and Kidney Diseases. Managing diabetes [Internet]. Available from: https://www.niddk.nih.gov/health-information/diabetes/overview/managing-diabetes. Accessed 2026 Mar 17

2026
[46]

Diabetes: what it is, causes, symptoms, treatment and types [Internet]

Cleveland Clinic. Diabetes: what it is, causes, symptoms, treatment and types [Internet]. 2023 Feb 17. Available from: https://my.clevelandclinic.org/health/diseases/7104-diabetes. Accessed 2026 Mar 17

2023
[47]

Clinical targets for continuous glucose monitoring data interpretation: recommendations from the international consensus on time in range

Battelino T, Danne T, Bergenstal RM, Amiel SA, Beck R, Biester T, et al. Clinical targets for continuous glucose monitoring data interpretation: recommendations from the international consensus on time in range. Diabetes Care. 2019;42(8):1593-1603. doi:10.2337/dci19-0028. PMID:31177185

work page doi:10.2337/dci19-0028 2019
[48]

Type 2 diabetes in adults: management [Internet]

NICE. Type 2 diabetes in adults: management [Internet]. 2015 Dec 2. Available from: https://www.nice.org.uk/guidance/ng28. Accessed 2026 Mar 17

2015
[49]

ElSayed NA, Aleppo G, Aroda VR, Bannuru RR, Brown FM, Bruemmer D, et al. 5. Facilitating positive health behaviors and well-being to improve health outcomes: standards of care in diabetes-2023. Diabetes Care. 2023;46(Suppl 1):S68-S96. doi:10.2337/dc23-S005. PMID:36507648

work page doi:10.2337/dc23-s005 2023
[50]

What is diabetes distress and burnout? [Internet]

Diabetes UK. What is diabetes distress and burnout? [Internet]. Available from: https://www.diabetes.org.uk/living-with-diabetes/emotional-wellbeing/diabetes-burnout. Accessed 2026 Mar 17

2026
[51]

Diabetes and your emotions [Internet]

Diabetes UK. Diabetes and your emotions [Internet]. Available from: https://www.diabetes.org.uk/living-with-diabetes/emotional-wellbeing. Accessed 2026 Mar 17

2026
[52]

10 tips to ease diabetes stress [Internet]

American Diabetes Association. 10 tips to ease diabetes stress [Internet]. Available from: https://diabetes.org/health-wellness/mental-health/ease-diabetes-care-stress. Accessed 2026 Mar 17

2026
[53]

MedDialog: large-scale medical dialogue datasets

Zeng G, Yang W, Ju Z, Yang M, Zhang J, Zhou H, et al. MedDialog: large-scale medical dialogue datasets. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA: Association for Computational Linguistics; 2020:9241-9250. doi:10.18653/v1/2020.emnlp-main.743

work page doi:10.18653/v1/2020.emnlp-main.743 2020
[54]

OpenAIEmbeddings integration [Internet]

LangChain. OpenAIEmbeddings integration [Internet]. Available from: https://docs.langchain.com/oss/python/integrations/text_embedding/openai. Accessed 2026 Mar 17

2026
[55]

Faiss [Internet]

Meta AI. Faiss [Internet]. Available from: https://ai.meta.com/tools/faiss/. Accessed 2026 Mar 17

2026
[56]

Application of chatbots to help patients self-manage diabetes: systematic review and meta-analysis

Wu Y, Zhang J, Ge P, Duan T, Zhou J, Wu Y, et al. Application of chatbots to help patients self-manage diabetes: systematic review and meta-analysis. J Med Internet Res. 2024;26:e60380. doi:10.2196/60380. PMID:39626235

work page doi:10.2196/60380 2024
[57]

A guideline of selecting and reporting intraclass correlation coefficients for reliability research

Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155-163. doi:10.1016/j.jcm.2016.02.012. PMID:27330520

[58] Huang FL. Using cluster bootstrapping to analyze nested data with a few clusters. Educ Psychol Meas. 2018;78(2):297-318

[59] Silveira LTY, Ferreira JC, Patino CM. Mixed-effects model: a useful statistical tool for longitudinal and cluster studies. J Bras Pneumol. 2023;49(2):e20230137. doi:10.36416/1806-3756/e20230137. PMID:37194822

[60] Ludbrook J. Should we use one-sided or two-sided P values in tests of significance? Clin Exp Pharmacol Physiol. 2013;40(6):357-361. doi:10.1111/1440-1681.12086. PMID:23551169

[61] Mishra P, Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive statistics and normality tests for statistical data. Ann Card Anaesth. 2019;22(1):67-72. doi:10.4103/aca.ACA_157_18. PMID:30648682

[62] ScienceDirect Topics. Wilcoxon signed ranks test [Internet]. Available from: https://www.sciencedirect.com/topics/medicine-and-dentistry/wilcoxon-signed-ranks-test. Accessed 2026 Mar 17

[63] Drton M, Xiao H. Wald tests of singular hypotheses. Bernoulli. 2016;22(1):38-59. doi:10.3150/14-BEJ620

[64] French RM. The Turing test: the first 50 years. Trends Cogn Sci. 2000;4(3):115-122. doi:10.1016/S1364-6613(00)01453-4. PMID:10689346

[65] Technology Networks. The binomial test [Internet]. 2024 Mar 26. Available from: http://www.technologynetworks.com/informatics/articles/the-binomial-test-366022. Accessed 2026 Mar 17

[66] Cruz Rivera S, Liu X, Chan AW, Denniston AK, Calvert MJ, et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med. 2020;26(9):1351-1363. doi:10.1038/s41591-020-1037-7. PMID:32908284

[67] MacFarland TW, Yates JM. Kruskal-Wallis H-test for oneway analysis of variance (ANOVA) by ranks. In: MacFarland TW, Yates JM, editors. Introduction to nonparametric statistics for the biological sciences using R. Cham, Switzerland: Springer International Publishing; 2016:177-211. doi:10.1007/978-3-319-30634-6_6. ISBN:978-3-319-30633-9

[68] Huang J, et al. Foundation models and intelligent decision-making: progress, challenges, and perspectives. Innovation (Camb). 2025;6(6):100948. doi:10.1016/j.xinn.2025.100948. PMID:40528892

[69] Kim J, et al. Artificial intelligence tools in supporting healthcare professionals for tailored patient care. NPJ Digit Med. 2025;8(1):210. doi:10.1038/s41746-025-01604-3. PMID:40240489

[70] Gaoudam N, Sakhamudi SK, Kamal B, Addla N, Reddy EP, Ambala M, et al. Wearable devices and AI-driven remote monitoring in cardiovascular medicine: a narrative review. Cureus. 2025;17(8):e90208. doi:10.7759/cureus.90208. PMID:40964568

[71] Canali S, Schiaffonati V, Aliverti A. Challenges and recommendations for wearable devices in digital health: data quality, interoperability, health equity, fairness. PLOS Digit Health. 2022;1(10):e0000104. doi:10.1371/journal.pdig.0000104. PMID:36812619

[72] Bent B, Goldstein BA, Kibbe WA, Dunn JP. Investigating sources of inaccuracy in wearable optical heart rate sensors. NPJ Digit Med. 2020;3:18. doi:10.1038/s41746-020-0226-6

[73] Li WJ, Li LZ. Artificial intelligence in mobile health applications: a comprehensive review of its role in diabetes care. World J Methodol. 2026;16(1):107488. doi:10.5662/wjm.v16.i1.107488. PMID:41809156

[74] Schrangl P, Reiterer F, Heinemann L, Freckmann G, del Re L. Limits to the evaluation of the accuracy of continuous glucose monitoring systems by clinical trials. Biosensors (Basel). 2018;8(2):50. doi:10.3390/bios8020050. PMID:29783669

[75] Diabetes UK. GLP-1 agonists [Internet]. Available from: https://www.diabetes.org.uk/about-diabetes/looking-after-diabetes/treatments/tablets-and-medication/glp-1. Accessed 2026 Mar 17

[76] Walter SR, Dunsmuir WTM, Westbrook JI. Inter-observer agreement and reliability assessment for observational studies of clinical work. J Biomed Inform. 2019;100:103317. doi:10.1016/j.jbi.2019.103317. PMID:31654801

Multimedia Appendix 1

Patient Name: Steven (ID: 1002)
Date: Based on CGM Data – May 2021

Demographics & Medical History
• Age: 36 years
• Sex: Male
• Height / Weight / BMI: 172 cm / 68 kg / 23.0
• Occupation: Graphic designer (mostly sedentary, 8+ hours/day screen time)
• Living Situation: Lives with partner; meals are prepared at home on weekdays, dine-out on weekends
• Diabetes Type: Type 1 Diabetes Mellitus (diagnosed at age 30)
• Dur...

Treatment & Medication
• Insulin Regimen:
  o Delivery: Continuous Subcutaneous Insulin Infusion (CSII) – Novolin R
  o Basal Insulin:
    ▪ Mean basal rate: 0.68 IU/h
    ▪ Median basal rate: 0.7 IU/h
  o Bolus Insulin:
    ▪ Total bolus injections recorded: 28
    ▪ Mean bolus dose: 5.46 IU
    ▪ Administered manually before meals based on experience (no auto calculator)
• Other...

CGM Monitoring Summary
Monitoring duration: 11 days
CGM wear time: 89.8%
Total glucose readings: 948

Metric | Value | Recommended Target*
Mean glucose | 136.0 mg/dL (7.6 mmol/L) | —
GMI (Estimated HbA1c) | 6.56% | —
Standard deviation | 59.9 mg/dL (3.3 mmol/L) | —
Coefficient of variation | 44.0% | <36%
Time in Range (70–180 mg/dL [3.9–10.0 mmol/L]) | 74.8% | >70%
Time Below Ran...
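Several values in the CGM Monitoring Summary follow from simple formulas, so they can be cross-checked against one another. The sketch below is illustrative and is not the authors' pipeline. It assumes readings in mg/dL and uses the widely used GMI formula (GMI % = 3.31 + 0.02392 × mean glucose in mg/dL), the 70–180 mg/dL target range, 18.0 as the mg/dL-per-mmol/L conversion factor, and a 15-minute sampling interval for wear time. The sampling interval is an assumption on our part, chosen because 948 readings out of 11 × 96 = 1,056 expected readings gives the reported 89.8%. All function names are hypothetical.

```python
"""Hypothetical sketch of standard CGM summary metrics (not the paper's code).

Assumptions: glucose in mg/dL, a 15-minute sampling interval for wear time,
and 18.0 as the mg/dL-per-mmol/L conversion factor.
"""
from statistics import mean, stdev

MGDL_PER_MMOLL = 18.0  # assumed conversion factor


def gmi_percent(mean_mgdl: float) -> float:
    """Glucose management indicator: 3.31 + 0.02392 x mean glucose (mg/dL)."""
    return 3.31 + 0.02392 * mean_mgdl


def cv_percent(sd_mgdl: float, mean_mgdl: float) -> float:
    """Coefficient of variation, expressed as a percentage of the mean."""
    return 100.0 * sd_mgdl / mean_mgdl


def wear_percent(n_readings: int, days: float, interval_min: float = 15.0) -> float:
    """Recorded readings as a share of expected readings (interval is assumed)."""
    expected = days * 24 * 60 / interval_min
    return 100.0 * n_readings / expected


def cgm_summary(readings_mgdl: list[float], days: float,
                interval_min: float = 15.0) -> dict[str, float]:
    """Compute summary metrics from raw CGM readings in mg/dL."""
    n = len(readings_mgdl)
    m = mean(readings_mgdl)
    sd = stdev(readings_mgdl)  # sample SD; population SD is nearly identical for large n
    return {
        "mean_mgdl": m,
        "mean_mmoll": m / MGDL_PER_MMOLL,
        "sd_mgdl": sd,
        "cv_pct": cv_percent(sd, m),
        "gmi_pct": gmi_percent(m),
        # In range: 70-180 mg/dL inclusive; below: <70; above: >180
        "tir_pct": 100.0 * sum(70 <= g <= 180 for g in readings_mgdl) / n,
        "tbr_pct": 100.0 * sum(g < 70 for g in readings_mgdl) / n,
        "tar_pct": 100.0 * sum(g > 180 for g in readings_mgdl) / n,
        "wear_pct": wear_percent(n, days, interval_min),
    }


if __name__ == "__main__":
    # Cross-check the aggregates reported for case 1002
    print(round(gmi_percent(136.0), 2))       # 6.56
    print(round(cv_percent(59.9, 136.0), 1))  # 44.0
    print(round(136.0 / MGDL_PER_MMOLL, 1))   # 7.6
    print(round(wear_percent(948, 11), 1))    # 89.8
```

Under these assumptions, the table's mean glucose and standard deviation alone reproduce the reported GMI, CV, and mmol/L values. Time in range, by contrast, requires the raw readings.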

Lifestyle & Daily Routine
• Diet:
  o Weekdays: three regular meals at home
    ▪ Breakfast (8:30 AM): oatmeal with milk, egg or toast
    ▪ Lunch (1:00 PM): rice/noodles with vegetables and meat
    ▪ Dinner (7:00 PM): usually stir-fried dishes with rice or soup
  o Weekends: dine out (hotpot, fast food, or noodles), heavier carbohydrate load
  o Occasional afternoon snac...

Self-Management Behavior
• Insulin Administration:
  o Comfortable using CSII independently
  o Adjusts basal settings when necessary (e.g., illness, heavy meals)
  o Does not use bolus calculator; estimates boluses based on food size/type
  o Fingerstick glucose checks: 2–3 times/week, especially to confirm CGM readings <70 mg/dL (<3.9 mmol/L) and to rule out fa...
