Toward Zero-Egress Psychiatric AI: On-Device LLM Deployment for Privacy-Preserving Mental Health Decision Support
Pith reviewed 2026-05-10 05:15 UTC · model grok-4.3
The pith
A mobile app runs fine-tuned LLMs entirely locally to deliver psychiatric assessments without any patient data leaving the device.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The work presents a zero-egress, cross-platform mobile application in which three lightweight, fine-tuned, and quantized open-source LLMs are coordinated by an on-device orchestration layer for ensemble inference and consensus-based diagnostic reasoning. The system produces DSM-5-aligned assessments for differential diagnosis and symptom mapping, with accuracy reported as comparable to the server-side version and real-time inference latency on commodity hardware.
What carries the argument
An on-device orchestration layer that coordinates ensemble inference and consensus-based diagnostic reasoning among three quantized LLMs.
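The abstract does not spell out how the consensus mechanism works. A minimal majority-vote sketch, assuming each ensemble member returns a single diagnosis with a confidence score; the model names match the paper, but the `ModelOutput` structure, field names, and confidence tie-break rule are hypothetical:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class ModelOutput:
    """Diagnosis proposed by one ensemble member, with a confidence score."""
    model: str
    diagnosis: str
    confidence: float

def consensus(outputs):
    """Majority vote over proposed diagnoses, with a confidence tie-break.

    Returns the winning diagnosis and the fraction of models agreeing with it.
    """
    votes = Counter(o.diagnosis for o in outputs)
    top = max(votes.values())
    tied = [d for d, n in votes.items() if n == top]
    if len(tied) > 1:
        # Tie-break: pick the tied diagnosis with the highest summed confidence.
        winner = max(tied, key=lambda d: sum(o.confidence for o in outputs
                                             if o.diagnosis == d))
    else:
        winner = tied[0]
    return winner, votes[winner] / len(outputs)

outputs = [
    ModelOutput("gemma", "MDD", 0.82),
    ModelOutput("phi-3.5-mini", "MDD", 0.74),
    ModelOutput("qwen2", "GAD", 0.69),
]
diagnosis, agreement = consensus(outputs)  # "MDD" with 2/3 agreement
```

A real orchestration layer would also have to reconcile free-text model outputs into a shared label space before any vote, a step this sketch assumes away.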
If this is right
- Clinicians gain local access to differential diagnosis support and evidence-linked symptom mapping without data transmission.
- Patients can use self-screening features with built-in safeguards while data remains on-device.
- The platform becomes usable in operational environments that prohibit any external data flow.
- Real-time performance is maintained on standard mobile hardware rather than requiring specialized servers.
Where Pith is reading between the lines
- The design could extend to other privacy-sensitive medical domains by swapping the diagnostic focus while retaining the local orchestration layer.
- Long-term, repeated local use might allow the models to adapt to individual users through on-device updates without cloud involvement.
- Testing the system on diverse populations would reveal whether quantization introduces biases in specific demographic groups.
Load-bearing premise
The quantized and fine-tuned models retain enough diagnostic fidelity for DSM-5 assessments to match the accuracy of their full server-side versions.
What would settle it
A direct comparison of diagnostic outputs from the on-device system versus the server version on the same set of clinical cases, measuring agreement rates and specific error types.
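Such a comparison reduces to simple paired statistics. A sketch computing the two headline numbers, raw agreement rate and Cohen's kappa, on hypothetical paired outputs; the label set and case data are illustrative, not from the paper:

```python
from collections import Counter

def agreement_and_kappa(device_labels, server_labels):
    """Raw agreement rate and Cohen's kappa for two paired label sequences."""
    n = len(device_labels)
    observed = sum(a == b for a, b in zip(device_labels, server_labels)) / n
    # Chance agreement from each system's marginal label frequencies.
    pa, pb = Counter(device_labels), Counter(server_labels)
    expected = sum(pa[k] * pb[k] for k in pa) / (n * n)
    if expected == 1:
        return observed, 1.0
    return observed, (observed - expected) / (1 - expected)

# Hypothetical diagnoses on the same four clinical cases.
device = ["MDD", "MDD", "GAD", "PTSD"]
server = ["MDD", "GAD", "GAD", "PTSD"]
obs, kappa = agreement_and_kappa(device, server)  # obs = 0.75
```

Kappa matters here because with a skewed condition mix, a high raw agreement rate can be mostly chance agreement.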
Original abstract
Privacy represents one of the most critical yet underaddressed barriers to AI adoption in mental healthcare -- particularly in high-sensitivity operational environments such as military, correctional, and remote healthcare settings, where the risk of patient data exposure can deter help-seeking behavior entirely. Existing AI-enabled psychiatric decision support systems predominantly rely on cloud-based inference pipelines, requiring sensitive patient data to leave the device and traverse external servers, creating unacceptable privacy and security risks in these contexts. In this paper, we propose a zero-egress, on-device AI platform for privacy-preserving psychiatric decision support, deployed as a cross-platform mobile application. The proposed system extends our prior work on fine-tuned LLM consortiums for psychiatric diagnosis standardization by fundamentally re-architecting the inference pipeline for fully local execution -- ensuring that no patient data is transmitted to, processed by, or stored on any external server at any stage. The platform integrates a consortium of three lightweight, fine-tuned, and quantized open-source LLMs -- Gemma, Phi-3.5-mini, and Qwen2 -- selected for their compact architectures and proven efficiency on resource-constrained mobile hardware. An on-device orchestration layer coordinates ensemble inference and consensus-based diagnostic reasoning, producing DSM-5-aligned assessments for conditions. The platform is designed to assist clinicians with differential diagnosis and evidence-linked symptom mapping, as well as to support patient-facing self-screening with appropriate clinical safeguards. Initial evaluation demonstrates that the proposed zero-egress deployment achieves diagnostic accuracy comparable to its server-side predecessor while sustaining real-time inference latency on commodity mobile hardware.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a zero-egress on-device AI platform for privacy-preserving psychiatric decision support, implemented as a cross-platform mobile application. It extends prior work on fine-tuned LLM consortiums by deploying an ensemble of three quantized lightweight models (Gemma, Phi-3.5-mini, Qwen2) with an on-device orchestration layer for consensus-based, DSM-5-aligned diagnostic reasoning. The system targets differential diagnosis assistance and patient self-screening in sensitive settings (military, correctional, remote care) while ensuring no patient data leaves the device. The abstract asserts that initial evaluation shows diagnostic accuracy comparable to the server-side predecessor alongside real-time inference latency on commodity mobile hardware.
Significance. If the accuracy and latency claims are substantiated, the work would offer a concrete technical path to address a major adoption barrier for AI in mental healthcare: the privacy risk of data egress in high-stakes environments. Demonstrating a practical, fully local ensemble deployment using open-source models could enable safer clinician tools and self-screening applications without external servers, potentially increasing help-seeking behavior. The emphasis on consensus reasoning and clinical safeguards adds operational relevance beyond pure model compression.
major comments (2)
- [Abstract] Abstract: The central claim that 'initial evaluation demonstrates that the proposed zero-egress deployment achieves diagnostic accuracy comparable to its server-side predecessor' is presented without any quantitative metrics (accuracy, F1, Cohen's kappa), dataset description (size, conditions, ground-truth source), evaluation protocol, baselines, or error analysis. This absence leaves the primary performance assertion unsupported and prevents assessment of whether quantization and mobile constraints degrade DSM-5 diagnostic fidelity.
- [Abstract] Evaluation/Results section (implied by abstract claim): No details are provided on how the on-device ensemble was tested against the server-side predecessor, including any ablation on quantization effects, inter-rater agreement with clinicians, or statistical significance of the 'comparable' result. Without these, the claim that the architecture preserves diagnostic quality cannot be evaluated and is load-bearing for the paper's contribution.
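One standard way to assess the significance of a "comparable accuracy" claim on paired cases is McNemar's test on the discordant pairs. The paper names no test, so this exact binomial form is an assumed choice, not the authors' method:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from discordant-pair counts.

    b: cases the server-side model gets right and the on-device model misses;
    c: the reverse. Under the null of equal accuracy, each discordant pair
    favors either system with probability 1/2, so the p-value is a two-sided
    binomial tail probability.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, nothing to test
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if the server version wins 8 discordant cases and the on-device version wins 3, the exact p-value is about 0.23, so "comparable" would not be rejected at that sample size; this is precisely why the review asks for the counts.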
minor comments (2)
- [Architecture] The description of the orchestration layer and consensus mechanism would benefit from a high-level diagram or pseudocode to clarify how the three models coordinate DSM-5 symptom mapping and differential diagnosis.
- [Introduction] Explicit citation to the prior server-side LLM consortium paper should be added in the introduction to clearly delineate the novel on-device re-architecture from the earlier fine-tuning work.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments correctly identify that the abstract's performance claim requires quantitative substantiation to be evaluable. We will revise the manuscript to add a dedicated Evaluation section with metrics, dataset details, ablations, and analysis, while updating the abstract accordingly. Our point-by-point responses follow.
Point-by-point responses
Referee: [Abstract] Abstract: The central claim that 'initial evaluation demonstrates that the proposed zero-egress deployment achieves diagnostic accuracy comparable to its server-side predecessor' is presented without any quantitative metrics (accuracy, F1, Cohen's kappa), dataset description (size, conditions, ground-truth source), evaluation protocol, baselines, or error analysis. This absence leaves the primary performance assertion unsupported and prevents assessment of whether quantization and mobile constraints degrade DSM-5 diagnostic fidelity.
Authors: We agree that the abstract claim currently lacks supporting numbers and context. In revision we will expand the abstract to report key quantitative results (e.g., accuracy, F1, Cohen's kappa) and will add a new Evaluation section that fully describes the test dataset (size, conditions, ground-truth source), protocol, baselines, and error analysis so readers can assess any impact of quantization and on-device constraints. revision: yes
Referee: [Abstract] Evaluation/Results section (implied by abstract claim): No details are provided on how the on-device ensemble was tested against the server-side predecessor, including any ablation on quantization effects, inter-rater agreement with clinicians, or statistical significance of the 'comparable' result. Without these, the claim that the architecture preserves diagnostic quality cannot be evaluated and is load-bearing for the paper's contribution.
Authors: We accept this assessment. The current manuscript presents only a high-level claim. We will add a full Results/Evaluation section containing ablation studies on quantization, inter-rater agreement (Cohen's kappa) with clinicians and the server-side model, statistical significance tests, and error analysis. These additions will directly substantiate the comparability claim and allow evaluation of diagnostic fidelity preservation. revision: yes
Circularity Check
No significant circularity; architecture and claims are self-contained
full rationale
The paper presents an engineering description of an on-device LLM ensemble for psychiatric decision support, extending prior fine-tuned models via re-architecting for local execution. No mathematical derivations, equations, fitted parameters, or predictions appear in the provided text. The accuracy comparability claim is asserted from 'initial evaluation' without reducing by construction to the inputs or prior work; it is an evidentiary assertion rather than a self-referential loop. Self-citation of the authors' earlier LLM consortium work is present but does not bear load on any derivation chain, as the deployment pipeline stands independently. This matches the default expectation of no circularity for descriptive systems papers.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Lightweight open-source LLMs can be fine-tuned to produce DSM-5-aligned psychiatric assessments
- domain assumption Quantization preserves sufficient diagnostic accuracy for clinical decision support
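The abstract does not state the quantization scheme. As a toy illustration of why the second assumption needs checking, a symmetric 4-bit round-trip on synthetic weights, measuring the worst-case reconstruction error; the scheme and all values are illustrative, not the paper's:

```python
import random

def quantize_int4(weights):
    """Symmetric 4-bit quantization: map floats to integers in [-7, 7] plus a scale."""
    scale = max(abs(w) for w in weights) / 7
    if scale == 0:
        return [0] * len(weights), 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

random.seed(0)
weights = [random.gauss(0.0, 0.5) for _ in range(1000)]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
# With per-tensor symmetric rounding, the worst-case round-trip error
# is half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Bounding per-weight error says nothing by itself about downstream diagnostic fidelity, which is exactly the gap the load-bearing premise asks the evaluation to close.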
Forward citations
Cited by 1 Pith paper
- Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction: Small open-weight language models can self-optimize prompts for clinical named entity recognition in dental notes, reaching micro F1 of 0.864 after DPO on Qwen2.5-14B.
Reference graph
Works this paper leans on
- [1] World Health Organization, World mental health report: Transforming mental health for all, Tech. rep., WHO, Geneva, Switzerland (2022). https://www.who.int/publications/i/item/9789240049338
- [2] S. Saxena, G. Thornicroft, M. Knapp, H. Whiteford, Resources for mental health: scarcity, inequity, and inefficiency, The Lancet 370 (9590) (2007) 878–889. doi:10.1016/S0140-6736(07)61239-2
- [3] G. Thornicroft, et al., Undertreatment of people with major depressive disorder in 21 countries, The British Journal of Psychiatry 210 (2) (2016) 119–124. doi:10.1192/bjp.bp.116.188078
- [4] C. W. Hoge, et al., Combat duty in Iraq and Afghanistan, mental health problems, and barriers to care, New England Journal of Medicine 351 (1) (2004) 13–22. doi:10.1056/NEJMoa040603
- [5] P. Y. Kim, et al., Stigma, barriers to care, and use of mental health services among active duty and National Guard soldiers after combat, Psychiatric Services 62 (1) (2011) 27–34. doi:10.1176/ps.62.1.pss6201_0027
- [6] T. Greene, et al., Stigma and barriers to mental health treatment in the military, Military Medicine 175 (2) (2010) 86–91. doi:10.7205/MILMED-D-09-00120
- [7] Z. Guo, et al., Automated depression detection using deep learning and natural language processing, ACM Transactions on Computing for Healthcare 1 (3) (2020) 1–19. doi:10.1145/3372168
- [8] M. Shim, et al., Machine learning-based diagnostic models for psychiatric disorders: a systematic review, Journal of Psychiatric Research 133 (2021) 1–12. doi:10.1016/j.jpsychires.2020.12.019
- [10] Gemma Team, Google DeepMind, Gemma: Open models based on Gemini research and technology, arXiv preprint arXiv:2403.08295 (2024). https://arxiv.org/abs/2403.08295
- [11] M. Abdin, et al., Phi-3 technical report: A highly capable language model locally on your phone, arXiv preprint arXiv:2404.14219 (2024). https://arxiv.org/abs/2404.14219
- [12] A. Yang, et al., Qwen2 technical report, arXiv preprint arXiv:2407.10671 (2024). https://arxiv.org/abs/2407.10671
- [13] T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, QLoRA: Efficient finetuning of quantized LLMs, in: Advances in Neural Information Processing Systems (NeurIPS), 2023. https://arxiv.org/abs/2305.14314
- [14] M. Xu, et al., A survey of resource-efficient LLM and multimodal foundation models, arXiv preprint arXiv:2401.08092 (2024). https://arxiv.org/abs/2401.08092
- [15] S. Laskaridis, et al., MELTing point: Mobile evaluation of language transformers, arXiv preprint arXiv:2403.12844 (2024). https://arxiv.org/abs/2403.12844
- [16] American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), American Psychiatric Publishing, Arlington, VA, 2013. doi:10.1176/appi.books.9780890425596
- [17] World Health Organization, International classification of diseases, 11th revision (ICD-11), Tech. rep., WHO, Geneva, Switzerland (2019). https://icd.who.int
- [18] D. A. Regier, et al., DSM-5 field trials in the United States and Canada, part II: Test-retest reliability of selected categorical diagnoses, American Journal of Psychiatry 170 (1) (2013) 59–70. doi:10.1176/appi.ajp.2012.12070999
- [19] R. Freedman, et al., The initial field trials of DSM-5: new blooms and old thorns, American Journal of Psychiatry 170 (1) (2013) 1–5. doi:10.1176/appi.ajp.2012.12091189
- [20] K. S. Kendler, An historical framework for psychiatric nosology, Psychological Medicine 39 (12) (2009) 1935–1941. doi:10.1017/S0033291709005753
- [21] R. M. A. Hirschfeld, et al., Perceptions and impact of bipolar disorder: how far have we really come? Results of the National Depressive and Manic-Depressive Association 2000 survey, Journal of Clinical Psychiatry 64 (2) (2003) 161–174. doi:10.4088/JCP.v64n0209
- [22] E. Bandara, R. Gore, A. Yarlagadda, A. H. Clayton, P. Samuel, C. K. Rhea, S. Shetty, Standardization of psychiatric diagnoses – role of fine-tuned LLM consortium and OpenAI GPT-OSS reasoning LLM enabled decision support system, arXiv preprint arXiv:2510.25588 (2025)
- [23] A. Vaswani, et al., Attention is all you need, Advances in Neural Information Processing Systems (NeurIPS) 30 (2017). https://arxiv.org/abs/1706.03762
- [24] T. B. Brown, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems (NeurIPS) 33 (2020) 1877–1901. https://arxiv.org/abs/2005.14165
- [25] R. Gore, E. Bandara, S. Shetty, A. E. Musto, P. Rana, A. Valencia-Romero, C. Rhea, L. Tayebi, H. Richter, A. Yarlagadda, et al., Proof-of-TBI – fine-tuned vision language model consortium and OpenAI-o3 reasoning LLM-based medical diagnosis support system for mild traumatic brain injury (TBI) prediction, arXiv preprint arXiv:2504.18671 (2025)
- [26] M. Gaur, et al., Characterization of time-variant and time-invariant assessment of suicidality on Reddit using C-SSRS, PLOS ONE 16 (5) (2021) e0250448. doi:10.1371/journal.pone.0250448
- [27] N. Flemotomos, et al., Automated quality assessment of cognitive behavioral therapy sessions through extracting psycholinguistic features, in: Proceedings of Interspeech, 2021, pp. 4251–4255. doi:10.21437/Interspeech.2021-357
- [28] I. Y. Chen, et al., Ethical machine learning in healthcare, Annual Review of Biomedical Data Science 4 (2021) 123–144. doi:10.1146/annurev-biodatasci-092820-114757
- [29] E. Bandara, A. Hass, S. Shetty, R. Mukkamala, R. Gore, A. Rahman, S. H. Bouk, Deep-stride: Automated security threat modeling with vision-language models, in: 2025 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), 2025, pp. 1–7
- [30] GGML Contributors, GGUF: GPT-generated unified format (2023). https://github.com/ggerganov/ggml
- [31] G. Gerganov, llama.cpp: LLM inference in C/C++ (2023). https://github.com/ggerganov/llama.cpp
- [32] MLC Team, MLC LLM: Universal LLM deployment engine (2023). https://github.com/mlc-ai/mlc-llm
- [33] United States Congress, Health Insurance Portability and Accountability Act of 1996 (HIPAA), Public Law 104-191, United States Department of Health and Human Services, Washington, DC (1996)
- [34] European Parliament and Council, General Data Protection Regulation (GDPR), Regulation (EU) 2016/679, Official Journal of the European Union (2016). https://gdpr-info.eu
- [35] U.S. Department of Defense, DoD Instruction 8582.01: Privacy in the DoD (2012). https://www.esd.whs.mil/DD/
- [36] U.S. General Services Administration, FedRAMP: Federal Risk and Authorization Management Program (2011). https://www.fedramp.gov
- [37] B. Blobel, et al., Trustworthy, secure and privacy-protecting electronic health record systems, Methods of Information in Medicine 57 (2018) e47–e57. doi:10.3414/ME17-01-0048
- [38] P. S. Appelbaum, Privacy in psychiatric treatment: threats and responses, American Journal of Psychiatry 159 (11) (2015) 1809–1818. doi:10.1176/appi.ajp.159.11.1809
- [39] N. Rieke, et al., The future of digital health with federated learning, npj Digital Medicine 3 (1) (2020) 119. doi:10.1038/s41746-020-00323-1
- [40] J. C. Duchi, M. I. Jordan, M. J. Wainwright, Local privacy and statistical minimax rates, IEEE Symposium on Foundations of Computer Science (FOCS) (2013) 429–438. doi:10.1109/FOCS.2013.53
- [41] E. Bandara, A. Hass, R. Gore, S. Shetty, R. Mukkamala, S. H. Bouk, X. Liang, N. W. Keong, K. De Zoysa, A. Withanage, et al., Astride: A security threat modeling platform for agentic-AI applications, arXiv preprint arXiv:2512.04785 (2025)
- [42] Unsloth Contributors, Unsloth: Fast and memory-efficient LLM fine-tuning (2024). https://github.com/unslothai/unsloth
- [43] D. B. Acharya, K. Kuppan, B. Divya, Agentic AI: Autonomous intelligence for complex goals – a comprehensive survey, IEEE Access (2025)
- [44] E. Bandara, R. Gore, P. Foytik, S. Shetty, R. Mukkamala, A. Rahman, X. Liang, S. H. Bouk, A. Hass, S. Rajapakse, et al., A practical guide for designing, developing, and deploying production-grade agentic AI workflows, arXiv preprint arXiv:2512.08769 (2025)
- [45] A. Yehudai, L. Eden, A. Li, G. Uziel, Y. Zhao, R. Bar-Haim, A. Cohan, M. Shmueli-Scheuer, Survey on evaluation of LLM-based agents, arXiv preprint arXiv:2503.16416 (2025)
- [46] E. Bandara, R. Gore, X. Liang, S. Rajapakse, I. Kularathne, P. Karunarathna, P. Foytik, S. Shetty, R. Mukkamala, A. Rahman, et al., Agentsway – software development methodology for AI agents-based teams, arXiv preprint arXiv:2510.23664 (2025)
- [47] E. Bandara, T. Hewa, R. Gore, S. Shetty, R. Mukkamala, P. Foytik, A. Rahman, S. H. Bouk, X. Liang, A. Hass, et al., Towards responsible and explainable AI agents with consensus-driven reasoning, arXiv preprint arXiv:2512.21699 (2025)
- [48] E. Bandara, R. Gore, S. Shetty, S. Rajapakse, I. Kularathne, P. Karunarathna, R. Mukkamala, P. Foytik, S. H. Bouk, A. Rahman, et al., A practical guide to agentic AI transition in organizations, arXiv preprint arXiv:2602.10122 (2026)
- [49] K. Kroenke, R. L. Spitzer, J. B. W. Williams, The PHQ-9: Validity of a brief depression severity measure, Journal of General Internal Medicine 16 (9) (2001) 606–613. doi:10.1046/j.1525-1497.2001.016009606.x
- [50] F. W. Weathers, et al., PTSD checklist for DSM-5 (PCL-5), Tech. rep., National Center for PTSD (2013). https://www.ptsd.va.gov/professional/assessment/adult-sr/ptsd-checklist.asp
- [51] ARM Ltd., ARM TrustZone technology (2023). https://developer.arm.com/ip-products/security-ip/trustzone
- [52] Apple Inc., Apple platform security: Secure Enclave (2023). https://support.apple.com/guide/security/secure-enclave-sec59b0b31ff/web
- [53] E. Bandara, mental-reasoning: A psychiatric diagnostic conversational dataset for DSM-5 aligned LLM fine-tuning (2025). https://huggingface.co/datasets/lambdaeranga/mental-reasoning
- [54] E. J. Hu, et al., LoRA: Low-rank adaptation of large language models, arXiv preprint arXiv:2106.09685 (2021). https://arxiv.org/abs/2106.09685
- [55] E. Bandara, R. Gore, S. Shetty, R. Mukkamala, C. Rhea, A. Yarlagadda, S. Kaushik, L. De Silva, A. Maznychenko, I. Sokolowska, et al., Standardization of neuromuscular reflex analysis – role of fine-tuned vision-language model consortium and OpenAI GPT-OSS reasoning LLM enabled decision support system, arXiv preprint arXiv:2508.12473 (2025)
- [56] R. L. Spitzer, K. Kroenke, J. B. W. Williams, B. Löwe, A brief measure for assessing generalized anxiety disorder: the GAD-7, Archives of Internal Medicine 166 (10) (2006) 1092–1097. doi:10.1001/archinte.166.10.1092
- [57] R. M. A. Hirschfeld, et al., Development and validation of a screening instrument for bipolar spectrum disorder: the Mood Disorder Questionnaire, American Journal of Psychiatry 157 (11) (2000) 1873–1875. doi:10.1176/appi.ajp.157.11.1873
- [58] S. R. Kay, A. Fiszbein, L. A. Opler, The positive and negative syndrome scale (PANSS) for schizophrenia, Schizophrenia Bulletin 13 (2) (1987) 261–276. doi:10.1093/schbul/13.2.261
- [59] E. J. Topol, Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again, Basic Books, New York, NY, 2019
- [60] Suicide Prevention Resource Center, Safe messaging guidelines for suicide and mental health (2022). https://www.sprc.org/resources-programs/safe-messaging-guidelines
- [61] Y. Kim, et al., Promises and pitfalls of large language models in psychiatric diagnosis and knowledge tasks, The British Journal of Psychiatry (2024). doi:10.1192/bjp.2024.83
- [62] K. Singhal, et al., Large language models encode clinical knowledge, Nature 620 (2023) 172–180. doi:10.1038/s41586-023-06291-2
- [63]
- [64] others, MHINDR: A DSM-5 based mental health diagnosis and recommendation framework using LLM, arXiv preprint arXiv:2509.25992 (2025). https://arxiv.org/abs/2509.25992
- [65] O. Golan, et al., LLM questionnaire completion for automatic psychiatric assessment, in: Findings of EMNLP, 2024. doi:10.18653/v1/2024.findings-emnlp.23
- [66] others, Trustworthy AI psychotherapy: Multi-agent LLM workflow for counseling and explainable mental disorder diagnosis, arXiv preprint arXiv:2508.11398 (2025). https://arxiv.org/abs/2508.11398
- [67]
- [68] others, FedMentor: Domain-aware differential privacy for heterogeneous federated LLMs in mental health, arXiv preprint arXiv:2509.14275 (2025). https://arxiv.org/abs/2509.14275
- [70] S. Pati, et al., Privacy preservation for federated learning in health care, Patterns 5 (7) (2024). doi:10.1016/j.patter.2024.100974
- [71] others, Are we there yet? A measurement study of efficiency for LLM applications on mobile devices, arXiv preprint arXiv:2504.00002 (2025). https://arxiv.org/abs/2504.00002
- [72] B. Yang, et al., DRHouse: An LLM-empowered diagnostic reasoning system through harnessing outcomes from sensor data and expert knowledge, in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 8, 2024, pp. 1–29. doi:10.1145/3699771
- [73] others, Systematic review of large language models in mental health care, JMIR Mental Health 12 (2025) e78410. doi:10.2196/78410
- [74] others, The evolving field of digital mental health: current evidence and implementation issues for smartphone apps, generative artificial intelligence, and virtual reality, World Psychiatry (2025). doi:10.1002/wps.21307