Keyphrase Generative Representation of Youth Crisis Conversations Beyond Static Taxonomies

Abeer Badawi; Elham Dolatabadi; Jocelyn Rankin; Lydia Sequeira; Maia Norman; Will Aitken

arxiv: 2605.27546 · v1 · pith:4F7UWKBNnew · submitted 2026-05-26 · 💻 cs.CL · cs.HC


Pith reviewed 2026-06-29 18:31 UTC · model grok-4.3


keywords: conversations, youth, crisis, expert, generative, taxonomies, taxonomy, accuracy


The pith

A constrained LLM generates conversation-specific keyphrases that raise youth crisis topic retrieval accuracy from 0.25 to 0.70 and reveal themes missed by fixed taxonomies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that generating dynamic keyphrases from large volumes of youth SMS crisis conversations with a constrained LLM can extend beyond rigid label systems to better identify mental health concerns. It expands an existing 19-label taxonomy to 39 hierarchical labels, applies the generative method to 703,975 conversations, and validates results through expert review on 129 samples. A sympathetic reader would care because evolving youth distress language often escapes static categories, so improved detection could guide faster support for thousands of cases each year. The work reports concrete expert-rated gains in accuracy and clarity plus new surfaced themes such as immigration problems and caregiver burden.

Core claim

The paper claims that Keyphrase Generative Representation (KGR) using a constrained LLM produces concise keyphrases that accurately reflect conversation content in 81 percent of cases and improve clarity in 74 percent of cases. KGR enables a topic-retrieval workflow that reaches 0.70 accuracy compared with 0.25 for the manual analyst process and surfaces identity-linked themes absent from the original taxonomy, including immigration problems and caregiver burden. The expanded 39-label schema reaches 0.96 expert consensus reliability.

What carries the argument

Keyphrase Generative Representation (KGR), a constrained large language model that outputs concise, conversation-specific keyphrases from youth crisis SMS texts.
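The abstract does not say how the generation is constrained. As a minimal sketch of one possible reading, the snippet below applies output constraints after generation: it normalises candidate phrases, drops empty or overlong ones, removes duplicates, and caps the total. The function name `enforce_constraints`, the caps `MAX_PHRASES` and `MAX_WORDS`, and the toy candidate strings are all assumptions made for illustration, not details taken from the paper.

```python
import re

# Hypothetical limits; the abstract does not state the real constraints.
MAX_PHRASES = 5
MAX_WORDS = 4


def enforce_constraints(raw_phrases, max_phrases=MAX_PHRASES, max_words=MAX_WORDS):
    """Normalise, filter, and de-duplicate candidate keyphrases."""
    kept, seen = [], set()
    for phrase in raw_phrases:
        # Collapse internal whitespace and lowercase so duplicates compare equal.
        norm = re.sub(r"\s+", " ", phrase).strip().lower()
        # Skip empty strings, overlong phrases, and repeats.
        if not norm or len(norm.split()) > max_words or norm in seen:
            continue
        seen.add(norm)
        kept.append(norm)
        if len(kept) == max_phrases:
            break
    return kept


# Toy stand-in for raw model output (illustrative only).
candidates = [
    "Caregiver burden",
    "caregiver  burden",
    "",
    "feeling overwhelmed by looking after a sick parent",
    "Immigration problems",
]
filtered = enforce_constraints(candidates)
print(filtered)
```

A filter like this only enforces form. It cannot tell whether a phrase is faithful to the conversation, which is the property the expert review measures.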

If this is right

The expanded 39-label hierarchical schema reaches 0.96 expert consensus reliability.
81 percent of KGR keyphrases accurately reflect content and 74 percent improve clarity over original text.
KGR surfaces identity-linked themes such as immigration problems and caregiver burden that fixed taxonomies omit.
The KGR-supported topic-retrieval workflow raises accuracy from 0.25 to 0.70 over manual analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same generative approach could be tested on crisis text from adult populations or other languages to check whether it consistently uncovers culturally specific patterns.
Integrating KGR into live responder dashboards might let teams flag emerging issues without waiting for taxonomy updates.
Generated keyphrases could serve as seeds for automatically proposing new taxonomy labels from ongoing data streams.
The method might reduce reliance on any single fixed label set when applied to other high-volume social-service text archives.

Load-bearing premise

The 387 expert annotations on only 129 conversations provide a reliable proxy for performance across the full 703,975-conversation corpus without systematic LLM biases or hallucinations in mental-health contexts.

What would settle it

A fresh expert annotation round on a random sample of several thousand conversations drawn from the full corpus would show whether retrieval accuracy stays near 0.70 and whether keyphrases continue to avoid missing or distorting distress themes.
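One way such a sample could be drawn so that rare labels are not crowded out is a stratified draw. The sketch below is a hedged illustration, not the authors' procedure: it assumes each conversation carries a single primary label (the paper's labels may be multi-label), and the names `stratified_sample`, `label_of`, and `per_label`, along with the toy corpus, are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, per_label, seed=0):
    """Draw up to `per_label` records from each label stratum, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the draw can be audited
    strata = defaultdict(list)
    for rec in records:
        strata[label_of(rec)].append(rec)
    sample = []
    for label in sorted(strata):  # sorted for a deterministic order
        pool = strata[label]
        k = min(per_label, len(pool))  # small strata contribute all they have
        sample.extend(rng.sample(pool, k))
    return sample


# Toy corpus: 100 "grief" and 200 "anxiety" records (illustrative only).
corpus = [{"id": i, "label": "grief" if i % 3 == 0 else "anxiety"} for i in range(300)]
picked = stratified_sample(corpus, lambda r: r["label"], per_label=50)
print(len(picked))
```

Sampling the same number from every stratum over-represents rare labels, so any corpus-level accuracy estimate would need to reweight each stratum by its share of the full corpus.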

Figures

Figures reproduced from arXiv: 2605.27546 by Abeer Badawi, Elham Dolatabadi, Jocelyn Rankin, Lydia Sequeira, Maia Norman, Will Aitken.

**Figure 2.** The system architecture of the proposed Keyphrase Generative Representation (KGR)

**Figure 3.** Label distribution of 129 conversations annotated for the survey. (A) Expert-label distri…

**Figure 4.** Evaluation of multi-label classification performance. (a) Comparison of expert label…

**Figure 5.** Expert evaluation of representation approaches for youth crisis conversations across three…

Original abstract

Crisis Responders (CRs) rapidly assess thousands of youth SMS conversations each year to identify mental health concerns and guide support. Yet youth distress is increasingly expressed through evolving and context-specific language that often does not fit fixed-label taxonomies. This work analyzed 703,975 de-identified Kids Help Phone conversations (2018-2023) and expanded KHP's 19-label issue taxonomy into a 39-label hierarchical schema. We then introduce Keyphrase Generative Representation (KGR), a constrained LLM generating concise, conversation-specific keyphrases, evaluated across 129 conversations and 387 expert annotations. The expanded taxonomy achieved expert consensus reliability, with an accuracy of 0.96, and expert review found that 81% of keyphrases accurately reflected content and 74% improved clarity. KGR surfaced identity-linked themes absent from the fixed taxonomy, including immigration problems and caregiver burden, and supported a topic-retrieval workflow that increased accuracy from 0.25 to 0.70 (+0.45) over the manual analyst process. KGR marks a shift toward hybrid, interpretable generative representations that extend crisis response beyond static taxonomies to surface emerging and culturally grounded patterns of youth distress.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

KGR gives a workable way to expand taxonomies with constrained generation on crisis data, but the accuracy claims rest on a sample too small to support generalization to the full corpus.


The paper's main move is to take a large set of real youth crisis SMS logs, grow a 19-label taxonomy to 39, and then use a constrained LLM to generate conversation-specific keyphrases that surface themes the fixed labels miss. On their test set this also lifted topic retrieval accuracy from 0.25 to 0.70. The expert review on the generated phrases (81% accurate, 74% clearer) is straightforward and domain-relevant.

What the work does cleanly is stay close to the operational problem: responders need something more flexible than static categories when language shifts. The new labels for immigration and caregiver burden are concrete examples of what the method catches. The constrained generation step keeps the output short and tied to the text, which fits the use case.

The soft spot is the evaluation size. The headline retrieval gain and the keyphrase quality numbers come from 129 conversations and 387 annotations. That is a small slice of 703k total logs, and the abstract gives no sampling details or stratification by the original taxonomy. Without that, it is difficult to know whether the reported lift would hold across the full range of conversations or whether the LLM introduces systematic misses or additions in mental-health contexts. The 0.96 taxonomy agreement is reported but the protocol behind it is not visible either.
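To make the sample-size concern concrete, the sketch below computes Wilson score intervals for the reported proportions. The abstract does not give the denominator behind each figure, so both 129 (conversations) and 387 (annotations) are tried as assumed values. The calculation also treats observations as independent, which repeated annotations of the same conversation would violate, so these intervals are a rough guide rather than the paper's own error bars.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p_hat + z2 / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    return centre - half, centre + half


# Assumed denominators: the abstract does not say which one applies to each figure.
ASSUMED_SAMPLE_SIZES = (129, 387)
# Point estimates as reported in the abstract.
REPORTED = {
    "retrieval accuracy": 0.70,
    "keyphrases accurate": 0.81,
    "clarity improved": 0.74,
}

for name, p in REPORTED.items():
    for n in ASSUMED_SAMPLE_SIZES:
        lo, hi = wilson_interval(p, n)
        print(f"{name}: p={p:.2f}, n={n}: [{lo:.3f}, {hi:.3f}]")
```

Under the n = 129 assumption, the interval around 0.70 spans roughly fifteen percentage points. That is wide enough to matter for deployment decisions, although it still sits well above the 0.25 baseline.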

This is for groups working on applied NLP for crisis lines or mental-health text. Readers who need a practical bridge from static labels to generative representations will see the point, but anyone planning to deploy the numbers will want tighter validation first.

It should go to peer review. The application is grounded enough and the problem is real, even if the current evidence needs more on sampling and bias checks.

Referee Report

2 major / 0 minor

Summary. The manuscript analyzes 703,975 de-identified youth SMS crisis conversations (2018-2023) from Kids Help Phone, expands the existing 19-label taxonomy to a 39-label hierarchical schema, and introduces Keyphrase Generative Representation (KGR) via a constrained LLM to produce conversation-specific keyphrases. Expert evaluation on 129 conversations (387 annotations) reports 0.96 accuracy for the expanded taxonomy, 81% of KGR keyphrases accurately reflecting content, 74% improving clarity, identification of new themes (e.g., immigration problems, caregiver burden), and a topic-retrieval workflow raising accuracy from 0.25 to 0.70.

Significance. If the empirical results hold under proper validation, the work demonstrates a practical hybrid generative approach that can extend static taxonomies in mental-health crisis response by surfacing evolving, context-specific, and culturally grounded patterns. The scale of the analyzed corpus and direct expert annotation provide applied relevance for real-world deployment.

major comments (2)

[Abstract] The headline claim that KGR enables a topic-retrieval workflow increasing accuracy from 0.25 to 0.70 (+0.45) rests on expert review of only 129 conversations. No sampling strategy, stratification by the 39-label taxonomy, or representativeness argument is supplied, so the results cannot be taken as evidence that the method reliably surfaces emerging themes across the full 703,975-conversation corpus without systematic LLM bias.
[Abstract] The reported expert consensus reliability of 0.96 and the 81%/74% keyphrase figures are presented without any description of the evaluation protocol, number of experts, inter-annotator agreement computation, or error bars. This absence prevents verification of the central empirical claims that underwrite the shift to generative representations.
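As an illustration of the kind of inter-annotator agreement figure the second comment asks for, the sketch below computes Fleiss' kappa from per-item category counts. It is not the paper's protocol. The 387 annotations over 129 conversations work out to three per conversation, which is consistent with three raters, but the abstract does not confirm that. The rating matrix is toy data and the binary "accurate / not accurate" categories are an assumption.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])
    # Observed agreement: share of agreeing rater pairs per item, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_item) / n_items
    # Chance agreement from the overall category proportions.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_exp = sum(p * p for p in p_cat)
    # Undefined (division by zero) if every rating falls in a single category.
    return (p_bar - p_exp) / (1.0 - p_exp)


# Toy data: 5 items, 3 raters, columns = [accurate, not accurate].
toy_ratings = [[3, 0], [3, 0], [2, 1], [0, 3], [3, 0]]
kappa = fleiss_kappa(toy_ratings)
print(round(kappa, 3))
```

Reporting a chance-corrected statistic such as this alongside raw percent agreement would let readers judge how much of the 0.96 figure reflects real consensus rather than a skewed label distribution.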

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for identifying areas where the abstract requires greater transparency regarding sampling and evaluation details. We address the two major comments below and commit to revisions that strengthen the presentation of our empirical claims without overstating generalizability.


Referee: The headline claim that KGR enables a topic-retrieval workflow increasing accuracy from 0.25 to 0.70 (+0.45) rests on expert review of only 129 conversations. No sampling strategy, stratification by the 39-label taxonomy, or representativeness argument is supplied, so the results cannot be taken as evidence that the method reliably surfaces emerging themes across the full 703,975-conversation corpus without systematic LLM bias.

Authors: We agree that the abstract does not describe the sampling procedure or provide a representativeness argument, and that the reported accuracy gain is therefore limited to the evaluated subset. The 129 conversations were selected to enable feasible expert annotation; the topic-retrieval result is presented as an illustration on this sample rather than a claim for the entire corpus. In revision we will update the abstract to state the sample size explicitly, note the absence of a full stratification argument, and qualify the result as sample-based while discussing potential LLM biases in the discussion section. revision: yes

Referee: The reported expert consensus reliability of 0.96 and the 81%/74% keyphrase figures are presented without any description of the evaluation protocol, number of experts, inter-annotator agreement computation, or error bars. This absence prevents verification of the central empirical claims that underwrite the shift to generative representations.

Authors: The abstract summarizes the expert-evaluation outcomes but omits protocol details due to length constraints. The full manuscript contains the annotation protocol and expert count in the Methods section; however, we accept that these elements should be referenced in the abstract for immediate verifiability. We will revise the abstract to include a concise statement of the evaluation setup, the number of experts, and any available inter-annotator agreement or confidence information. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent empirical annotations


The paper presents an empirical workflow: taxonomy expansion on 703k conversations, KGR generation via constrained LLM, and evaluation via 387 expert annotations on 129 conversations yielding accuracy figures (0.96 for taxonomy, 81%/74% for keyphrases, 0.25-to-0.70 workflow gain). No equations, fitted parameters, self-definitional loops, or load-bearing self-citations are described that would reduce any prediction or result to its own inputs by construction. All reported metrics derive from external expert review rather than internal redefinition or renaming of known patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Training Therapeutic Judges and Multi-Agent Systems for Human-Aligned Mental Health Support
cs.CL 2026-06 unverdicted novelty 6.0

TheraJudge, trained via preference optimization on human annotations, reaches high clinician agreement (ICC 0.87-0.95) and, when used by TheraAgent, raises human-rated therapeutic quality by 0.43 points on a 5-point s...


Chekroud, A. M.et al.The promise of machine learning in predicting treatment outcomes in psychiatry.World Psychiatry20, 154–170 (2021)

2021

[16] [16]

B.et al.Machine learning and natural language processing in psychotherapy research: Alliance as example use case.Journal of counseling psychology67, 438 (2020)

Goldberg, S. B.et al.Machine learning and natural language processing in psychotherapy research: Alliance as example use case.Journal of counseling psychology67, 438 (2020)

2020

[17] [17]

& Bouneffouf, D

Lin, B., Cecchi, G. & Bouneffouf, D. Deep annotation of therapeutic working alliance in psychotherapy. InInternational workshop on health intelligence, 193–207 (Springer, 2023). URLhttps://doi.org/10.1007/978-3-031-36938-4_15

work page doi:10.1007/978-3-031-36938-4_15 2023

[18] [18]

D., Zech, J

Malgaroli, M., Hull, T. D., Zech, J. M. & Althoff, T. Natural language processing for mental health interventions: A systematic review and research framework.Translational Psychiatry 13, 309 (2023)

2023

[19] [19]

Zhang, Q.et al.Generative ai mental health chatbots as therapeutic tools: Systematic review and meta-analysis of their role in reducing mental health issues.Journal of Medical Internet Research27, e78238 (2025)

2025

[20] [20]

& Cambria, E

Ji, S., Zhang, T., Yang, K., Ananiadou, S. & Cambria, E. Rethinking large language models in mental health applications (2023). URLhttps://arxiv.org/abs/2311.11267.2311.11267

work page arXiv 2023

[21] [21]

ihealth: The ethics of artificial intelligence and big data in mental healthcare

Rubeis, G. ihealth: The ethics of artificial intelligence and big data in mental healthcare. Internet Interventions28, 100518 (2022). URLhttps://www.sciencedirect.com/science/ article/pii/S2214782922000252

2022

[22] [22]

Ai is gone: engagement and ethics in data-driven technology for mental health

Carr, S. Ai is gone: engagement and ethics in data-driven technology for mental health. Journal of Mental Health29, 125–130 (2020). URLhttps://doi.org/10.1080/09638237. 2020.1714011. PMID: 32000544,https://doi.org/10.1080/09638237.2020.1714011. 17

work page doi:10.1080/09638237 2020

[23] [23]

Torous, J.et al.The evolving field of digital mental health: current evidence and implemen- tation issues for smartphone apps, generative artificial intelligence, and virtual reality.World Psychiatry24, 156–174 (2025)

2025

[24] [24]

V., Swaroop, S., Simon, A

Mughal, S., McIlwaine, S. V., Swaroop, S., Simon, A. & Shah, J. L. Five years of youth engagement with kids help phone canada (part 1): Phone, chat, text, and peer-to-peer service usage nationally, provincially, and over time.Telemedicine and e-Health30, 788–794 (2024). Epub 2023 Sep 12

2024

[25] [25]

V., Swaroop, S., Simon, A

Mughal, S., McIlwaine, S. V., Swaroop, S., Simon, A. & Shah, J. L. Five years of youth engagement with kids help phone canada (part 2): Issues discussed over phone, chat, text, and peer-to-peer services by age range.Telemedicine and e-Health30, 795–804 (2024). Epub 2023 Sep 12

2024

[26] [26]

Vector institute for artificial intelligence (2025)

Vector Institute. Vector institute for artificial intelligence (2025). URLhttps:// vectorinstitute.ai/. Accessed: 2026-03-06

2025

[27] [27]

S.et al.Crisis text-line interventions: Evaluation of texters’ perceptions of effec- tiveness.Suicide and Life-Threatening Behavior52, 583–595 (2022)

Gould, M. S.et al.Crisis text-line interventions: Evaluation of texters’ perceptions of effec- tiveness.Suicide and Life-Threatening Behavior52, 583–595 (2022). Epub 2022 May 22

2022

[28] [28]

Meta llama 3.https://ai.meta.com/blog/meta-llama-3/(2024)

Meta AI. Meta llama 3.https://ai.meta.com/blog/meta-llama-3/(2024). Accessed: 2025-09-29

2024

[29] [29]

& Mago, V

Chandrasekaran, D. & Mago, V. Evolution of semantic similarity – a survey.arXiv preprint arXiv:2004.13820(2020). URLhttps://arxiv.org/abs/2004.13820. Accessed: 2026-01- 26

work page arXiv 2004

[30] [30]

A.et al.On evaluation metrics for medical applications of artificial intelligence

Hicks, S. A.et al.On evaluation metrics for medical applications of artificial intelligence. Scientific Reports12, 5979 (2022). URLhttps://www.ncbi.nlm.nih.gov/pmc/articles/ PMC8993826/

2022

[31] [31]

The Llama 3 Herd of Models

Grattafiori, A., Dubey, A., Jauhri, A.et al.The llama 3 herd of models (2024). URL https://arxiv.org/abs/2407.21783.2407.21783

work page internal anchor Pith review Pith/arXiv arXiv 2024

[32] [32]

Singhal, K.et al.Large language models encode clinical knowledge.Nature620, 172–180 (2023)

2023

[33] [33]

Van Veen, D.et al.Adapted large language models can outperform medical experts in clinical text summarization.Nature Medicine30, 1134–1142 (2024)

2024

[34] [34]

Bednarczyk, L.et al.Scientific evidence for clinical text summarization using large language models: Scoping review.Journal of Medical Internet Research27, e68998 (2025)

2025

[35] [35]

C.et al.Llm-aix: An open source pipeline for information extraction from un- structured medical text based on privacy preserving large language models.medRxiv(2024)

Wiest, I. C.et al.Llm-aix: An open source pipeline for information extraction from un- structured medical text based on privacy preserving large language models.medRxiv(2024). Preprint

2024

[36] [36]

S., Reid, M., Matsuo, Y

Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large lan- guage models are zero-shot reasoners. In Koyejo, S.et al.(eds.)Advances in Neural Information Processing Systems, vol. 35, 22199–22213 (Curran Associates, Inc., 2022). URLhttps://proceedings.neurips.cc/paper_files/paper/2022/file/ 8bb0d291acd4acf06ef112099c16f326-Paper-Conference.pdf. 18

2022

[37] [37]

A.et al.Artificial intelligence for mental health and mental illnesses: an overview

Graham, S. A.et al.Artificial intelligence for mental health and mental illnesses: an overview. Current Psychiatry Reports21, 116 (2019). URLhttps://link.springer.com/article/10. 1007/s11920-019-1094-0

2019

[38] [38]

& Kassam, A

Rose, D., Thornicroft, G., Pinfold, V. & Kassam, A. 250 labels used to stigmatise people with mental illness.BMC Health Services Research7, 97 (2007). URLhttps://link.springer. com/article/10.1186/1472-6963-7-97

work page doi:10.1186/1472-6963-7-97 2007

[39] [39]

URL https://doi.org/10.1038/s41746-023-00951-3

Swaminathan, A.et al.Natural language processing system for rapid detection and inter- vention of mental health crisis chat messages.npj Digital Medicine6, 213 (2023). URL https://doi.org/10.1038/s41746-023-00951-3

work page doi:10.1038/s41746-023-00951-3 2023

[40] [40]

URLhttps: //doi.org/10.3389/fpsyt.2023.1110527

Broadbent, M.et al.A machine learning approach to identifying suicide risk among text- based crisis counseling encounters.Frontiers in Psychiatry14, 1110527 (2023). URLhttps: //doi.org/10.3389/fpsyt.2023.1110527

work page doi:10.3389/fpsyt.2023.1110527 2023

[41] [41]

Self-labeling and its effects among adolescents diagnosed with mental disorders

Moses, T. Self-labeling and its effects among adolescents diagnosed with mental disorders. Social Science & Medicine68, 570–578 (2009). URLhttps://www.sciencedirect.com/ science/article/pii/S0277953608005790

2009

[42] [42]

Gureje, O., Lewis-Fern´ andez, R., Hall, B. J. & Reed, G. M. Cultural considerations in the classification of mental disorders: why and how in ICD-11.BMC Medicine18, 25 (2020). URLhttps://doi.org/10.1186/s12916-020-1493-4

work page doi:10.1186/s12916-020-1493-4 2020

[43] [43]

& Kirmayer, L

Lewis-Fern´ andez, R. & Kirmayer, L. J. Cultural concepts of distress and psychiatric disorders: Understanding symptom experience and expression in context.Transcultural Psychiatry56, 786–803 (2019). URLhttps://doi.org/10.1177/1363461519861795

work page doi:10.1177/1363461519861795 2019

[44] [44]

M.et al.Quantifying changes in the language used around mental health on twitter over 10 years: Observational study.JMIR Mental Health9, e33685 (2022)

Stupinski, A. M.et al.Quantifying changes in the language used around mental health on twitter over 10 years: Observational study.JMIR Mental Health9, e33685 (2022). URL https://doi.org/10.2196/33685

work page doi:10.2196/33685 2022

[45] [45]

Dinakar, K., Chen, J., Lieberman, H., Picard, R. W. & Filbin, R. Mixed-initiative real-time topic modeling & visualization for crisis counseling. InProceedings of the 20th International Conference on Intelligent User Interfaces (IUI ’15), 417–426 (Association for Computing Ma- chinery, 2015). URLhttps://doi.org/10.1145/2678025.2701395

work page doi:10.1145/2678025.2701395 2015

[46] [46]

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Althoff, T., Clark, K. & Leskovec, J. Large-scale analysis of counseling conversations: An application of natural language processing to mental health.Transactions of the Association for Computational Linguistics4, 463–476 (2016). URLhttps://doi.org/10.1162/tacl_a_ 00111

work page doi:10.1162/tacl_a_ 2016

[47] [47]

K.et al.Exploring the efficacy of large language models in summarizing mental health counseling sessions: Benchmark study.JMIR Mental Health11, e57306 (2024)

Adhikary, P. K.et al.Exploring the efficacy of large language models in summarizing mental health counseling sessions: Benchmark study.JMIR Mental Health11, e57306 (2024). URL https://doi.org/10.2196/57306

work page doi:10.2196/57306 2024

[48] [48]

URLhttps://doi.org/10.2196/58418

So, J.-h.et al.Aligning large language models for enhancing psychiatric interviews through symptom delineation and summarization: Pilot study.JMIR Formative Research8, e58418 (2024). URLhttps://doi.org/10.2196/58418. 19

work page doi:10.2196/58418 2024

[49] [49]

International statistical classification of diseases and re- lated health problems (ICD).https://www.who.int/standards/classifications/ classification-of-diseases(2026)

World Health Organization. International statistical classification of diseases and re- lated health problems (ICD).https://www.who.int/standards/classifications/ classification-of-diseases(2026). URLhttps://www.who.int/standards/ classifications/classification-of-diseases. Accessed: 2026-03-02

2026

[50] [50]

Intended use: SNOMED CT editorial guide (2026)

SNOMED International. Intended use: SNOMED CT editorial guide (2026). URL https://docs.snomed.org/snomed-ct-specifications/snomed-ct-editorial-guide/ readme/snomed-ct-introduction/intended-use. Accessed: 2026-03-02

2026

[51] [51]

National guidelines for a behav- ioral health coordinated system of crisis care (2025)

Substance Abuse and Mental Health Services Administration. National guidelines for a behav- ioral health coordinated system of crisis care (2025). URLhttps://library.samhsa.gov/ sites/default/files/national-guidelines-crisis-care-pep24-01-037.pdf. Accessed: 2026-03-02

2025

[52] [52]

& Haug, P

Meystre, S. & Haug, P. J. Automation of a problem list using natural language processing. BMC Medical Informatics and Decision Making5, 30 (2005). URLhttps://doi.org/10. 1186/1472-6947-5-30

2005

[53] [53]

& Gevaert, O

Zhan, X., Humbert-Droz, M., Mukherjee, P. & Gevaert, O. Structuring clinical text with AI: Old versus new natural language processing techniques evaluated on eight common cardiovas- cular diseases.Patterns2, 100289 (2021). URLhttps://doi.org/10.1016/j.patter.2021. 100289

work page doi:10.1016/j.patter.2021 2021

[54] [54]

URLhttps://doi.org/10.1001/ jamanetworkopen.2025.53174

Nguyen, D.et al.Performance of an intelligent messaging tool for clinical communi- cations.JAMA Network Open9, e2553174 (2026). URLhttps://doi.org/10.1001/ jamanetworkopen.2025.53174

work page arXiv 2026

[55] [55]

Automatica49(11), 3222–3233 (2013)

Sulieman, L.et al.Classifying patient portal messages using convolutional neural networks. Journal of Biomedical Informatics74, 59–70 (2017). URLhttps://doi.org/10.1016/j. jbi.2017.08.014

work page doi:10.1016/j 2017

[56] [56]

M., Fabbri, D., Denny, J

Cronin, R. M., Fabbri, D., Denny, J. C., Rosenbloom, S. T. & Jackson, G. P. A com- parison of rule-based and machine learning approaches for classifying patient portal mes- sages.International Journal of Medical Informatics105, 110–120 (2017). URLhttps: //doi.org/10.1016/j.ijmedinf.2017.06.004. 20

work page doi:10.1016/j.ijmedinf.2017.06.004 2017