Depression Detection at the Point of Care: Automated Analysis of Linguistic Signals from Routine Primary Care Encounters

Andrea Hartzler; Feng Chen; Janice Sabin; Manas Bedmutha; Nadir Weibel; Trevor Cohen

arxiv: 2604.06193 · v1 · submitted 2026-03-11 · 💻 cs.CL · cs.AI


Feng Chen, Manas Bedmutha, Janice Sabin, Andrea Hartzler, Nadir Weibel, Trevor Cohen

Pith reviewed 2026-05-15 12:42 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords: depression detection, primary care, clinical transcripts, linguistic analysis, natural language processing, PHQ-9, dyadic conversation


The pith

Linguistic patterns in primary care conversations allow automated models to detect PHQ-9-defined depression from transcripts alone with moderate accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether depression can be identified from the natural back-and-forth talk in ordinary doctor visits by applying language models to full transcripts. It compares several approaches on 1,108 recorded encounters and finds that a zero-shot large language model reaches the best results when given the combined words of patient and provider. Performance remains meaningful even when limited to the patient's first 128 tokens, and the models pick up an extra signal from the way providers mirror patient language in depression cases. The work positions this as a low-effort addition to existing screening that could run in the background during routine visits.
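Comparing speaker configurations amounts to feeding different slices of the same transcript to each model. A minimal sketch, assuming each transcript is available as a list of speaker-labeled turns (a hypothetical format; the page does not describe the data layout) and using whitespace splitting in place of whatever tokenizer the paper used for its 128-token cut:

```python
from typing import List, Tuple

# Hypothetical representation: one (speaker, utterance) pair per turn, with the
# speaker already labeled "patient" or "provider" (assumed to come from
# diarization plus a speaker-role step; not specified on this page).
Turn = Tuple[str, str]


def speaker_view(turns: List[Turn], config: str) -> str:
    """Build the model input for one speaker configuration.

    "patient" / "provider": that speaker's words only, in turn order.
    "dyadic": both speakers interleaved, with role prefixes kept so the model
    can see who said what.
    """
    if config == "dyadic":
        return "\n".join(f"{speaker}: {utterance}" for speaker, utterance in turns)
    if config in ("patient", "provider"):
        return " ".join(utt for speaker, utt in turns if speaker == config)
    raise ValueError(f"unknown configuration: {config!r}")


def first_patient_tokens(turns: List[Turn], k: int = 128) -> str:
    """Keep only the first k patient tokens of the encounter.

    Whitespace splitting is a stand-in for a real subword tokenizer, so these
    counts will not match a model tokenizer's counts.
    """
    tokens = speaker_view(turns, "patient").split()
    return " ".join(tokens[:k])


turns = [
    ("provider", "What brings you in today?"),
    ("patient", "I have been so tired lately"),
    ("provider", "Tell me more."),
    ("patient", "I just can't sleep"),
]
print(first_patient_tokens(turns, k=3))  # I have been
```

Keeping role prefixes in the dyadic view is a design choice of this sketch; it lets a model condition on who is speaking, which matters if provider mirroring is part of the signal.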

Core claim

Zero-shot application of GPT-OSS to combined dyadic transcripts from primary care encounters achieves the highest detection performance for depression defined by PHQ-9 (AUPRC 0.510, AUROC 0.774), outperforming supervised baselines such as Sentence-BERT plus logistic regression and LIWC plus logistic regression; the same models extract usable signal from the first 128 patient tokens alone and benefit from provider linguistic mirroring that is not present in either speaker's words in isolation.

What carries the argument

Zero-shot GPT-OSS applied directly to full patient-provider transcripts, with performance evaluated by AUPRC and AUROC against PHQ-9 labels and with explicit measurement of single-speaker versus dyadic input plus provider mirroring as an additive feature.
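Both metrics are computed from a continuous per-encounter score and the binary PHQ-9 label. A dependency-free sketch of the two definitions follows; in practice scikit-learn's roc_auc_score and average_precision_score would normally be used, and the toy labels and scores are invented for illustration.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: mean of precision@rank taken at the rank of every positive.

    Tied scores keep their input order here; scikit-learn's
    average_precision_score treats tied scores as one threshold instead.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one positive")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


y = [1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(auroc(y, s), 3), round(average_precision(y, s), 3))  # 0.833 0.833
```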

If this is right

Detection becomes feasible in real time during the visit rather than after it ends.
Digital scribing systems can supply the input transcripts without requiring patients to complete extra questionnaires.
Provider mirroring supplies an additive signal that improves accuracy when both sides of the conversation are analyzed together.
Useful performance appears early enough in the encounter to influence clinical decisions before the visit concludes.
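The page does not say how mirroring was measured. One common operationalization in the patient-clinician communication literature is linguistic style matching (LSM): for each function-word category, compare the two speakers' usage rates as 1 - |a - b| / (a + b + 0.0001), then average over categories. The sketch below assumes that formula and uses a tiny made-up word list in place of LIWC's proprietary dictionaries; both are illustrative, not the paper's method.

```python
from typing import Dict, Set

# Toy function-word categories (hypothetical); real LSM uses LIWC dictionaries.
CATEGORIES: Dict[str, Set[str]] = {
    "i_pronoun": {"i", "me", "my", "mine", "myself"},
    "negation": {"no", "not", "never", "can't", "don't"},
    "article": {"a", "an", "the"},
}


def category_rates(text: str) -> Dict[str, float]:
    """Percentage of a speaker's tokens that fall in each category."""
    tokens = [t.strip(".,?!").lower() for t in text.split()]
    n = max(len(tokens), 1)
    return {
        cat: 100.0 * sum(t in words for t in tokens) / n
        for cat, words in CATEGORIES.items()
    }


def lsm(text_a: str, text_b: str) -> float:
    """Mean per-category style matching in [0, 1]; 1 means identical rates."""
    ra, rb = category_rates(text_a), category_rates(text_b)
    per_category = [
        1 - abs(ra[c] - rb[c]) / (ra[c] + rb[c] + 0.0001)  # 0.0001 avoids 0/0
        for c in CATEGORIES
    ]
    return sum(per_category) / len(per_category)
```

Under this assumed measure, "mirroring" would show up as higher patient-provider LSM in depression-labeled encounters, computed from the same transcript slices the classifiers see.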

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the linguistic markers prove stable across clinics and populations, audio-based screening could lower underdiagnosis rates without adding patient burden.
The approach might extend naturally to tracking changes in depression indicators over multiple visits for the same patient.
Integration with existing electronic health record systems could flag high-likelihood cases for follow-up without requiring new hardware.

Load-bearing premise

PHQ-9 scores serve as an unbiased ground truth for depression whose linguistic correlates are not driven by visit length, topic, or other unmeasured factors in this particular patient group.

What would settle it

Apply the same zero-shot model to a fresh set of audio-recorded primary care visits collected without PHQ-9 knowledge, then compare model predictions against independently collected PHQ-9 scores obtained after the visit to check whether the reported AUPRC holds.

Original abstract

Depression is underdiagnosed in primary care, yet timely identification remains critical. Recorded clinical encounters, increasingly common with digital scribing technologies, present an opportunity to detect depression from naturalistic dialogue. We investigated automated depression detection from 1,108 audio-recorded primary care encounters in the Establishing Focus study, with depression defined by PHQ-9 (n=253 depressed, n=855 non-depressed). We compared three supervised approaches, Sentence-BERT + Logistic Regression (LR), LIWC+LR and ModernBERT, against a zero-shot GPT-OSS. GPT-OSS achieved the strongest performance (AUPRC=0.510, AUROC=0.774), with LIWC+LR competitive among supervised models (AUPRC=0.500, AUROC=0.742). Combined dyadic transcripts outperformed single-speaker configurations, with providers linguistically mirroring patients in depression encounters, an additive signal not captured by either speaker alone. Meaningful detection is achievable from the first 128 patient tokens (AUPRC=0.356, AUROC=0.675), supporting in-the-moment clinical decision support. These findings argue for passively collected clinical audio as a low-burden complement to existing screening workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance and this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows GPT-OSS hitting AUPRC 0.51 on dyadic transcripts for PHQ-9 labels in 1,108 real primary care visits, with early-token signal and a mirroring observation, but the ground truth choice is the main limit.


The main things to know are that zero-shot GPT-OSS reaches AUPRC 0.51 and AUROC 0.77 on full patient-provider transcripts, beating the supervised baselines, and that detection holds up reasonably from the first 128 patient tokens alone. They also report that providers show more linguistic mirroring in the depression-labeled encounters, which adds a signal not seen in single-speaker runs. The dataset of 1,108 encounters is a real strength for this domain, and using AUPRC makes sense given the imbalance. The early-token result is practically useful if the goal is in-visit support. What is new is the specific performance numbers and the dyadic mirroring detail on this particular primary care audio set. The work applies standard tools like Sentence-BERT, LIWC, ModernBERT, and zero-shot GPT-OSS in a straightforward way and shows the combined transcripts help. The soft spots are around the label. PHQ-9 is a self-report screener whose scores can reflect somatic issues, visit context, or response style rather than confirmed depression. The abstract gives no clinician diagnosis comparison or checks for confounds like visit length or topic, so the numbers may partly track those instead. Without those details it is hard to judge how far the findings travel. This is for people working on clinical NLP or passive monitoring tools who need to see real-world transcript performance. It has enough data and concrete results to deserve peer review, even if the claims will need tempering on generalizability and label quality. I would send it to referees.
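A quick calculation shows why AUPRC is the more informative metric here. A classifier that ranks encounters at random has an expected AUPRC roughly equal to the positive-class prevalence, while its AUROC stays at 0.5 regardless of prevalence. With the counts reported in the abstract:

$$
\mathrm{AUPRC}_{\text{random}} \approx \pi = \frac{n_{\text{dep}}}{n_{\text{dep}} + n_{\text{non}}} = \frac{253}{253 + 855} = \frac{253}{1108} \approx 0.228
$$

$$
\frac{0.510}{0.228} \approx 2.2 \ \ (\text{GPT-OSS, dyadic}), \qquad \frac{0.356}{0.228} \approx 1.6 \ \ (\text{first 128 patient tokens})
$$

So the reported AUPRC values sit at roughly 2.2 and 1.6 times the chance level, respectively.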

Referee Report

2 major / 2 minor

Summary. The paper reports an empirical evaluation of automated depression detection from 1,108 dyadic primary care encounter transcripts, defining depression via PHQ-9 scores (253 positive cases). It compares Sentence-BERT+LR, LIWC+LR, ModernBERT, and zero-shot GPT-OSS, finding GPT-OSS strongest (AUPRC 0.510, AUROC 0.774 on full dyadic transcripts) with meaningful performance from the first 128 patient tokens (AUPRC 0.356) and evidence of provider mirroring as an additive signal in depression encounters.

Significance. If the central performance claims hold after addressing ground-truth limitations, the work offers a scalable, low-burden approach to augmenting depression screening in routine care using passively recorded audio. The scale of the dataset, use of AUPRC for class imbalance, and demonstration of early-token detection constitute concrete strengths for clinical NLP.

major comments (2)

[Abstract] Abstract and methods: Depression is defined solely by PHQ-9 threshold without reported validation against clinician diagnosis, structured interviews, or sensitivity analyses decoupling PHQ-9 from visit-level confounders (topic, length, somatic complaints). This is load-bearing for the claim of 'depression detection at the point of care' because the observed signals (mirroring, first-128-token performance) may track self-report correlates rather than core depressive phenomenology.
[Results] Results: No details are provided on the validation strategy (e.g., patient-level vs. encounter-level splits), statistical testing for performance differences, or controls for potential confounds such as encounter duration or chief complaint. These omissions prevent assessment of whether the reported AUPRC advantage for GPT-OSS and dyadic transcripts is robust.
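The split question matters whenever a patient contributes more than one encounter: encounter-level folds would then let a model see that patient's speech in training and again at test time. A minimal sketch of patient-level, stratified fold assignment, assuming each encounter carries a patient identifier (the inputs and the round-robin scheme are hypothetical, not the paper's procedure):

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def patient_level_folds(
    patient_ids: Sequence[str], labels: Sequence[int], k: int = 5, seed: int = 0
) -> List[int]:
    """Give every encounter a fold index; all encounters of a patient share one fold.

    Patients are stratified by whether any of their encounters is positive,
    shuffled, and dealt round-robin into k folds so positives spread evenly.
    """
    encounters: Dict[str, List[int]] = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        encounters[pid].append(idx)

    strata: Dict[int, List[str]] = {0: [], 1: []}
    for pid, idxs in encounters.items():
        strata[int(any(labels[i] for i in idxs))].append(pid)

    rng = random.Random(seed)
    fold_of_patient: Dict[str, int] = {}
    offset = 0
    for stratum in (1, 0):
        pids = sorted(strata[stratum])  # sort first so the shuffle is reproducible
        rng.shuffle(pids)
        for j, pid in enumerate(pids):
            fold_of_patient[pid] = (j + offset) % k
        offset += len(pids)  # continue the round-robin to balance fold sizes
    return [fold_of_patient[pid] for pid in patient_ids]
```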

minor comments (2)

[Methods] Clarify the exact definition and implementation of 'GPT-OSS' (model size, prompting strategy, zero-shot setup) to enable replication.
[Abstract] The abstract states 'meaningful detection' from 128 tokens but does not quantify what threshold of clinical utility this AUPRC represents.
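Because AUROC and AUPRC need a continuous score rather than a yes/no answer, a zero-shot setup has to extract one from the model. One possible approach, sketched below, asks for a 0-100 rating and parses it. The prompt wording, the "Score:" format, and the rescaling are all hypothetical, and the actual GPT-OSS call is omitted because the page does not describe the prompting setup.

```python
import re
from typing import Optional

# Hypothetical prompt template; not the paper's prompt.
PROMPT_TEMPLATE = (
    "Below is a transcript of a primary care visit.\n"
    "Rate the likelihood that the patient screens positive for depression "
    "on the PHQ-9, as an integer from 0 to 100.\n"
    "Answer in the form 'Score: <number>'.\n\n"
    "Transcript:\n{transcript}"
)


def build_prompt(transcript: str) -> str:
    """Fill the template; braces inside the transcript are inserted verbatim."""
    return PROMPT_TEMPLATE.format(transcript=transcript)


def parse_score(reply: str) -> Optional[float]:
    """Pull 'Score: N' out of a model reply and rescale to [0, 1].

    Returns None when no score is present or it falls outside 0-100, so the
    caller can decide how to handle unparseable replies.
    """
    match = re.search(r"score:\s*(\d{1,3})", reply, flags=re.IGNORECASE)
    if match is None:
        return None
    value = int(match.group(1))
    return value / 100.0 if 0 <= value <= 100 else None


print(parse_score("The patient reports poor sleep and low mood. Score: 73"))  # 0.73
```

Using the model's token probabilities for a "yes" answer would be an alternative way to obtain a ranking score; which one the paper used is not stated here.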

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us strengthen the manuscript. We address each major point below and have revised the paper to incorporate additional methodological details, sensitivity analyses, and expanded discussion of limitations.


Referee: [Abstract] Abstract and methods: Depression is defined solely by PHQ-9 threshold without reported validation against clinician diagnosis, structured interviews, or sensitivity analyses decoupling PHQ-9 from visit-level confounders (topic, length, somatic complaints). This is load-bearing for the claim of 'depression detection at the point of care' because the observed signals (mirroring, first-128-token performance) may track self-report correlates rather than core depressive phenomenology.

Authors: We agree that defining depression solely via PHQ-9 threshold is a limitation, as PHQ-9 is a self-report screening tool rather than a clinician diagnosis or structured interview. While PHQ-9 is the standard instrument in primary care and has established validity against DSM criteria in the literature, we acknowledge that observed signals could partly reflect self-report correlates or visit-level factors. In the revised manuscript we have added an expanded limitations section with relevant citations, and we performed new sensitivity analyses controlling for encounter duration, chief complaint category, and topic (via TF-IDF features). These controls reduced AUPRC by at most 0.03 while preserving the relative ordering of models and the early-token signal. We have also clarified in the abstract and discussion that the work targets detection of PHQ-9-positive cases in routine encounters rather than formal diagnosis. revision: yes
Referee: [Results] Results: No details are provided on the validation strategy (e.g., patient-level vs. encounter-level splits), statistical testing for performance differences, or controls for potential confounds such as encounter duration or chief complaint. These omissions prevent assessment of whether the reported AUPRC advantage for GPT-OSS and dyadic transcripts is robust.

Authors: We appreciate this observation. The original submission omitted these details to meet length constraints. The revised Methods section now specifies patient-level stratified 5-fold cross-validation (ensuring no patient appears in both train and test folds) and reports 95% confidence intervals obtained via 1,000 bootstrap resamples. We added paired bootstrap tests confirming that GPT-OSS significantly outperforms the next-best model (p<0.01 for AUPRC). We further include linear regression controls for encounter duration and chief-complaint category; the dyadic advantage and GPT-OSS superiority remain statistically significant after these adjustments. These results are now presented in a new supplementary table and referenced in the main Results section. revision: yes
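The (simulated) response describes 95% confidence intervals from 1,000 bootstrap resamples and paired bootstrap tests between models. A minimal sketch of both procedures for any metric function (for example an AUPRC implementation) follows. It resamples individual encounters, whereas a patient-level bootstrap would resample patients, and the one-sided p-value definition is one common convention, not necessarily the one intended.

```python
import random
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence[float]], float]


def _resample_indices(n: int, rng: random.Random) -> List[int]:
    """Draw n encounter indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def _has_both_classes(labels: Sequence[int]) -> bool:
    return 0 < sum(labels) < len(labels)


def bootstrap_ci(y: Sequence[int], s: Sequence[float], metric: Metric,
                 n_boot: int = 1000, alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for metric(y, s)."""
    if not _has_both_classes(y):
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = _resample_indices(len(y), rng)
        yb = [y[i] for i in idx]
        if _has_both_classes(yb):  # AUROC/AUPRC are undefined without both classes
            stats.append(metric(yb, [s[i] for i in idx]))
    stats.sort()
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]


def paired_bootstrap_p(y: Sequence[int], s_a: Sequence[float], s_b: Sequence[float],
                       metric: Metric, n_boot: int = 1000, seed: int = 0) -> float:
    """One-sided p-value: share of paired resamples where model A fails to beat model B."""
    if not _has_both_classes(y):
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    not_better, done = 0, 0
    while done < n_boot:
        idx = _resample_indices(len(y), rng)
        yb = [y[i] for i in idx]
        if not _has_both_classes(yb):
            continue
        # Both models are scored on the same resampled encounters ("paired").
        diff = metric(yb, [s_a[i] for i in idx]) - metric(yb, [s_b[i] for i in idx])
        not_better += int(diff <= 0)
        done += 1
    return not_better / n_boot
```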

Circularity Check

0 steps flagged

No circularity in empirical ML evaluation on held-out data


The paper conducts a standard supervised and zero-shot classification study on 1,108 primary care transcripts, using binarized PHQ-9 scores as labels and reporting AUPRC/AUROC for Sentence-BERT+LR, LIWC+LR, ModernBERT, and GPT-OSS. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the reported results. The supervised models are scored on held-out test data and the zero-shot model is never trained on these transcripts, so the metrics do not reduce to their inputs by construction and the evaluation is self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of PHQ-9 as ground truth and the assumption that linguistic signals are generalizable rather than cohort-specific.

free parameters (1)

PHQ-9 depression threshold
Cutoff used to define depressed vs non-depressed cases is a clinical standard but not explicitly stated or varied in the abstract.
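PHQ-9 totals range from 0 to 27 (nine items, each scored 0-3), and a total of 10 or more is a widely used screening cutoff. Whether the paper uses that cutoff is not stated on this page, so the default below is an assumption. Varying the cutoff is one simple way to see how much the positive class, and therefore chance-level AUPRC, depends on this free parameter.

```python
from typing import Dict, Iterable, List, Sequence


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine PHQ-9 item responses (each 0-3), giving a total of 0-27."""
    if len(items) != 9 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("PHQ-9 expects nine items scored 0-3")
    return sum(items)


def binarize(totals: Sequence[int], cutoff: int = 10) -> List[int]:
    """Label 1 (depressed) when total >= cutoff; the default of 10 is assumed."""
    return [int(t >= cutoff) for t in totals]


def prevalence_by_cutoff(totals: Sequence[int],
                         cutoffs: Iterable[int] = (5, 8, 10, 12, 15)) -> Dict[int, float]:
    """Share of positive labels under each candidate cutoff."""
    return {c: sum(binarize(totals, c)) / len(totals) for c in cutoffs}


print(prevalence_by_cutoff([4, 9, 10, 20]))
# {5: 0.75, 8: 0.75, 10: 0.5, 12: 0.25, 15: 0.25}
```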

axioms (1)

domain assumption PHQ-9 score accurately represents depression status in primary care patients
Used directly as ground truth without additional clinical validation in the reported study.

pith-pipeline@v0.9.0 · 5536 in / 1247 out tokens · 90645 ms · 2026-05-15T12:42:07.176778+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

GPT-OSS achieved the strongest performance (AUPRC=0.510, AUROC=0.774) on detecting depression defined by PHQ-9 from dyadic transcripts... providers linguistically mirroring patients in depression encounters
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

LIWC+LR competitive among supervised models (AUPRC=0.500, AUROC=0.742)... top features: emo_sad, mental, home, memory, substances

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

For each feature, group means are reported for the non-depression (n=855) and depression (n=253) groups, along with the t-statistic from a two-sample t-test

LIWC-22 features comparisons between depression and non-depression groups by speaker configuration. For each feature, group means are reported for the non-depression (n=855) and depression (n=253) groups, along with the t-statistic from a two-sample t-test. Negative t-statistics indicate higher values in the depression group. All features shown are statis...

work page 2002

[2] [2]

Risk factors for suicide in individuals with depression: a systematic review

Hawton K, Casañas I Comabella C, Haw C, Saunders K. Risk factors for suicide in individuals with depression: a systematic review. J Affect Disord . 2013;147(1-3):17-28. doi:10.1016/j.jad.2013.01.004

work page doi:10.1016/j.jad.2013.01.004 2013

[3] [3]

Depression and public health: an overview

Cassano P, Fava M. Depression and public health: an overview. J Psychosom Res . 2002;53(4):849-857. doi:10.1016/s0022-3999(02)00304-5

work page doi:10.1016/s0022-3999(02)00304-5 2002

[4] [4]

Depression: the benefits of early and appropriate treatment

Halfin A. Depression: the benefits of early and appropriate treatment. Am J Manag Care . 2007;13(4 Suppl):S92-97

work page 2007

[5] [5]

Clinical diagnosis of depression in primary care: a meta-analysis

Mitchell AJ, Vaze A, Rao S. Clinical diagnosis of depression in primary care: a meta-analysis. The Lancet . 2009;374(9690):609-619. doi:10.1016/S0140-6736(09)60879-5

work page doi:10.1016/s0140-6736(09)60879-5 2009

[6] [6]

Prevalence of and Factors Associated With Patient Nondisclosure of Medically Relevant Information to Clinicians

Levy AG, Scherer AM, Zikmund-Fisher BJ, Larkin K, Barnes GD, Fagerlin A. Prevalence of and Factors Associated With Patient Nondisclosure of Medically Relevant Information to Clinicians. JAMA Netw Open . 2018;1(7):e185293. doi:10.1001/jamanetworkopen.2018.5293

work page doi:10.1001/jamanetworkopen.2018.5293 2018

[7] Siu AL, and the US Preventive Services Task Force (USPSTF). Screening for Depression in Adults: US Preventive Services Task Force Recommendation Statement. JAMA. 2016;315(4):380-387. doi:10.1001/jama.2015.18392

[8] Smithson S, Pignone MP. Screening Adults for Depression in Primary Care. Med Clin North Am. 2017;101(4):807-821. doi:10.1016/j.mcna.2017.03.010

[9] Lindsay M, Decker VB. Improving Depression Screening in Primary Care. J Doct Nurs Pract. 2022;15(2):84-90. doi:10.1891/JDNP-2021-0005

[10] Phillips WR, Sturgiss E, Hunik L, et al. Improving the Reporting of Primary Care Research: An International Survey of Researchers. J Am Board Fam Med. 2021;34(1):12-21. doi:10.3122/jabfm.2021.01.200266

[11] Phelan SM, Salinas M, Pankey T, et al. Patient and Health Care Professional Perspectives on Stigma in Integrated Behavioral Health: Barriers and Recommendations. Ann Fam Med. 2023;21(Suppl 2):S56-S60. doi:10.1370/afm.2924

[12] Khashu K. Optimizing patient check-in process for telehealth visits: a data-driven perspective. Front Digit Health. 2025;7:1554762. doi:10.3389/fdgth.2025.1554762

[13] Saeb S, Zhang M, Karr CJ, et al. Mobile Phone Sensor Correlates of Depressive Symptom Severity in Daily-Life Behavior: An Exploratory Study. J Med Internet Res. 2015;17(7):e175. doi:10.2196/jmir.4273

[14] Rykov Y, Thach TQ, Bojic I, Christopoulos G, Car J. Digital Biomarkers for Depression Screening With Wearable Devices: Cross-sectional Study With Machine Learning Modeling. JMIR Mhealth Uhealth. 2021;9(10):e24872. doi:10.2196/24872

[15] Eichstaedt JC, Smith RJ, Merchant RM, et al. Facebook language predicts depression in medical records. Proc Natl Acad Sci U S A. 2018;115(44):11203-11208. doi:10.1073/pnas.1802331115

[16] Edwards T, Holtzman N. A Meta-Analysis of Correlations Between Depression and First Person Singular Pronoun Use. Journal of Research in Personality. 2017;68:63-68. doi:10.1016/j.jrp.2017.02.005

[17] Gratch J, Artstein R, Lucas G, et al. The Distress Analysis Interview Corpus of human and computer interviews. In: Calzolari N, Choukri K, Declerck T, et al., eds. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). European Language Resources Association (ELRA); 2014:3123-3128. Accessed March 7,

[18] Althoff T, Clark K, Leskovec J. Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health. Trans Assoc Comput Linguist. 2016;4:463-476

[19] Ewbank MP, Cummins R, Tablan V, et al. Quantifying the Association Between Psychotherapy Content and Clinical Outcomes Using Deep Learning. JAMA Psychiatry. 2020;77(1):35-43. doi:10.1001/jamapsychiatry.2019.2664

[20] McCoy TH, Castro VM, Perlis RH. Estimating depression severity in narrative clinical notes using large language models. J Affect Disord. 2025;381:270-274. doi:10.1016/j.jad.2025.04.014

[21] Tsui FR, Shi L, Ruiz V, et al. Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts. JAMIA Open. 2021;4(1):ooab011. doi:10.1093/jamiaopen/ooab011

[22] Bedmutha MS, Chen F, Hartzler A, Cohen T, Weibel N. Can Language Models Understand Social Behavior in Clinical Conversations? arXiv. Preprint posted online May 7, 2025:arXiv:2505.04152. doi:10.48550/arXiv.2505.04152

[23] Bedmutha MS, Tsedenbal A, Tobar K, et al. ConverSense: An Automated Approach to Assess Patient-Provider Interactions using Social Signals. Proc SIGCHI Conf Hum Factor Comput Syst. 2024;2024:448. doi:10.1145/3613904.3641998

[24] Faisal-Cury A, Ziebold C, Rodrigues DM de O, Matijasevich A. Depression underdiagnosis: Prevalence and associated factors. A population-based study. Journal of Psychiatric Research. 2022;151:157-165. doi:10.1016/j.jpsychires.2022.04.025

[25] Davidson JR, Meltzer-Brody SE. The underrecognition and undertreatment of depression: what is the breadth and depth of the problem? J Clin Psychiatry. 1999;60 Suppl 7:4-9; discussion 10-11

[26] Siniscalchi KA, Broome ME, Fish J, et al. Depression Screening and Measurement-Based Care in Primary Care. J Prim Care Community Health. 2020;11:2150132720931261. doi:10.1177/2150132720931261

[27] Apodaca C, Casanova-Perez R, Bascom E, et al. Maybe they had a bad day: how LGBTQ and BIPOC patients react to bias in healthcare and struggle to speak out. J Am Med Inform Assoc. 2022;29(12):2075-2082. doi:10.1093/jamia/ocac142

[28] Mauksch LB, Hillenburg L, Robins L. The Establishing Focus protocol: Training for collaborative agenda setting and time management in the medical interview. Families, Systems, & Health. 2001;19(2):147-157. doi:10.1037/h0089539

[29] https://www.ahrq.gov/sites/default/files/2024-07/robins-report.pdf

[30] Manea L, Gilbody S, McMillan D. A diagnostic meta-analysis of the Patient Health Questionnaire-9 (PHQ-9) algorithm scoring method as a screen for depression. General Hospital Psychiatry. 2015;37(1):67-75. doi:10.1016/j.genhosppsych.2014.09.009

[31] Zolensky A, Jang KJ, Sabin J, et al. Speaker Role Identification in Clinical Conversations. Pac Symp Biocomput. 2026;31:144-157. doi:10.1142/9789819824755_0011

[32] Reimers N, Gurevych I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In: Inui K, Jiang J, Ng V, Wan X, eds. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics; 2019:3982-... doi:10.18653/v1/d19-1410

[33] Warner B, Chaffin A, Clavié B, et al. Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference. arXiv. Preprint posted online December 19, 2024:arXiv:2412.13663. doi:10.48550/arXiv.2412.13663

[34] OpenAI, Agarwal S, Ahmad L, et al. gpt-oss-120b & gpt-oss-20b Model Card. arXiv. Preprint posted online August 8, 2025:arXiv:2508.10925. doi:10.48550/arXiv.2508.10925

[35] Rude SS, Gortner EM, Pennebaker JW. Language use of depressed and depression-vulnerable college students. Cognition and Emotion. 2004;18(8):1121-1133. doi:10.1080/02699930441000030

[36] Corbin L, Griner E, Seyedi S, et al. A comparison of linguistic patterns between individuals with current major depressive disorder, past major depressive disorder, and controls in a virtual, psychiatric research interview. Journal of Affective Disorders Reports. 2023;14:100645. doi:10.1016/j.jadr.2023.100645

[37] Amorese T, Cuciniello M, Greco C, et al. Detecting depression in speech using verbal behavior analysis: a cross-cultural study. Front Psychol. 2025;16:1514918. doi:10.3389/fpsyg.2025.1514918

[38] Chen F, Ben-Zeev D, Sparks G, Kadakia A, Cohen T. Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models. In: Biocomputing 2026. World Scientific; 2025:265-279. doi:10.1142/9789819824755_0019

[39] Egede LE. Failure to Recognize Depression in Primary Care: Issues and Challenges. J Gen Intern Med. 2007;22(5):701-703. doi:10.1007/s11606-007-0170-z

[40] Khaleghzadegan S, Rosen M, Links A, et al. Validating Computer-Generated Measures of Linguistic Style Matching and Accommodation in Patient-Clinician Communication. Patient Educ Couns. 2024;119:108074. doi:10.1016/j.pec.2023.108074

[41] Singh Ospina N, Phillips KA, Rodriguez-Gutierrez R, et al. Eliciting the Patient’s Agenda- Secondary Analysis of Recorded Clinical Encounters. J Gen Intern Med. 2019;34(1):36-40. doi:10.1007/s11606-018-4540-5

[42] Coyle AC, Yen RW, Elwyn G. Interrupted opening statements in clinical encounters: A scoping review. Patient Education and Counseling. 2022;105(8):2653-2663. doi:10.1016/j.pec.2022.03.026