pith. machine review for the scientific record.

arxiv: 2604.01533 · v2 · submitted 2026-04-02 · 📡 eess.AS


Validating Computational Markers of Depressive Behavior: Cross-Linguistic Speech-Based Depression Detection with Neurophysiological Validation

Alessandro Vinciarelli, Anna Esposito, Dongwei Li, Fuxiang Tao, Shuning Tang, Wei Ma, Xuri Ge


Pith reviewed 2026-05-13 21:18 UTC · model grok-4.3

classification 📡 eess.AS
keywords speech-based depression detection · cross-linguistic validation · EEG neurophysiological markers · emotional arousal · CDMA framework · computational mental health models · theta/alpha oscillations

The pith

A speech model detects depression across languages and its scores align with EEG patterns of emotional processing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether a speech-based depression detector trained on Italian data still works when applied to Chinese Mandarin speakers. It fuses read and spontaneous speech recorded under positive, neutral, and negative emotional conditions and shows that emotionally charged speech yields higher detection accuracy than neutral speech. The same model outputs are then compared with EEG recordings taken while participants view emotional faces, revealing correlations in the theta and alpha frequency bands. If these links hold, the approach supplies both cross-cultural robustness and direct biological grounding for an otherwise purely computational marker.

Core claim

The CDMA framework, previously validated on Italian, reaches an F1-score of up to 89.6% on a new Chinese dataset with simultaneous EEG recordings. Emotionally valenced speech outperforms neutral speech, while positive and negative conditions perform comparably, supporting the claim that arousal rather than valence polarity drives the signal. Model-derived depression estimates correlate with theta and alpha oscillatory activity during emotional face processing, matching established neural signatures of emotional dysregulation in depression.

What carries the argument

Cross-Data Multilevel Attention (CDMA) framework that fuses read and spontaneous speech across emotional valences and is validated by direct correlation of its outputs with EEG theta and alpha band power during emotional face viewing.
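The abstract does not describe the CDMA internals, so the fusion step can only be illustrated, not reproduced. A minimal sketch of bidirectional cross-attention between read and spontaneous speech embeddings, where all shapes, frame counts, and the mean-pooling choice are assumptions for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query_seq, key_seq):
    """One cross-attention pass: frames of one speech style attend
    to frames of the other (scaled dot-product attention)."""
    scores = query_seq @ key_seq.T / np.sqrt(query_seq.shape[1])
    return softmax(scores, axis=1) @ key_seq

rng = np.random.default_rng(0)
read_emb = rng.normal(size=(50, 64))    # 50 frames of read speech (hypothetical)
spont_emb = rng.normal(size=(80, 64))   # 80 frames of spontaneous speech

# Fuse in both directions, then pool each side to one utterance-level vector.
read_ctx = cross_attend(read_emb, spont_emb)
spont_ctx = cross_attend(spont_emb, read_emb)
fused = np.concatenate([read_ctx.mean(axis=0), spont_ctx.mean(axis=0)])
print(fused.shape)  # (128,)
```

The fused vector would then feed a depression classifier; the actual CDMA architecture is multilevel and likely more elaborate than this single attention pass.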

If this is right

  • The framework maintains comparable accuracy when moved from Italian to Chinese speakers.
  • Emotionally valenced speech improves detection performance over neutral speech.
  • Equal performance on positive and negative tasks supports the emotional arousal hypothesis over valence polarity.
  • Alignment with theta and alpha EEG markers supplies neurophysiological validation for the computational model.
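The abstract does not specify the EEG preprocessing behind the theta/alpha comparison. A minimal sketch of how band power in those ranges could be estimated with Welch's method, on synthetic data (the sampling rate, band edges, and window length are assumptions):

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, fs, lo, hi):
    """Average power spectral density within [lo, hi] Hz via Welch's method."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)
    mask = (freqs >= lo) & (freqs <= hi)
    return psd[mask].mean()

fs = 250                                  # assumed EEG sampling rate, Hz
t = np.arange(0, 4, 1 / fs)
rng = np.random.default_rng(1)
# Synthetic channel with a strong 10 Hz (alpha-band) component plus noise.
eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.normal(size=t.size)

theta = band_power(eeg, fs, 4, 8)
alpha = band_power(eeg, fs, 8, 13)
print(alpha > theta)  # True: the 10 Hz component dominates
```

Real pipelines would add artifact rejection, epoching around face onsets, and per-channel baselining, none of which is described in the available text.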

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Validated models of this type could support language-independent screening tools in diverse clinical populations.
  • The same correlation method could be applied to test computational markers for other conditions that involve emotional dysregulation.
  • Combining speech features with EEG could improve real-time monitoring once hardware becomes more portable.

Load-bearing premise

That correlations between the speech model outputs and EEG oscillations during emotional face processing specifically validate the depression-detection mechanism rather than reflecting general emotional processing differences.

What would settle it

Absence of significant correlation between the model's depression estimates and theta or alpha band oscillations in an independent replication that uses the same emotional face processing EEG task.
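That settling test is a correlation check, which is simple to state precisely. A minimal sketch of correlating model-derived depression logits with theta band power across participants, where the participant count, effect size, and variable names are invented for illustration:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n = 40                                    # hypothetical participant count
theta_power = rng.normal(size=n)          # per-participant theta band power
# Logits constructed here to share rank structure with theta power;
# in a replication they would come from the trained speech model.
logits = 0.8 * theta_power + 0.6 * rng.normal(size=n)

rho, p = spearmanr(logits, theta_power)
print(round(rho, 2), p < 0.05)
```

A non-significant rho in an independent cohort running the same emotional-face task would undercut the neurophysiological-validation claim; a significant one in both groups, but with no group difference in strength, would favor the referee's "general emotional processing" alternative.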

Figures

Figures reproduced from arXiv: 2604.01533 by Alessandro Vinciarelli, Anna Esposito, Dongwei Li, Fuxiang Tao, Shuning Tang, Wei Ma, Xuri Ge.

Figure 1. The figure illustrates the proposed framework. Purple arrow shows read speech processing; green, yellow, and red arrows show the positive, neutral, and negative conditions.
Figure 2. Time-frequency representations for the fearful facial expression condition.
Figure 3. Time-frequency representations for the sad facial expression condition.
Figure 5. Spearman correlations between model-derived depression logits and …
Original abstract

Speech-based depression detection has shown promise as an objective diagnostic tool, yet the cross-linguistic robustness of acoustic markers and their neurobiological underpinnings remain underexplored. This study extends Cross-Data Multilevel Attention (CDMA) framework, initially validated on Italian, to investigate these dimensions using a Chinese Mandarin dataset with Electroencephalography (EEG) recordings. We systematically fuse read speech with spontaneous speech across different emotional valences (positive, neutral, negative) to investigate whether emotional arousal is a more critical factor than valence polarity in enhancing detection performance in speech. Additionally, we establish the first neurophysiological validation for a speech-based depression model by correlating its predictions with neural oscillatory patterns during emotional face processing. Our results demonstrate strong cross-linguistic generalizability of the CDMA framework, achieving state-of-the-art performance (F1-score up to 89.6%) on the Chinese dataset, which is comparable to the previous Italian validation. Critically, emotionally valenced speech (both positive and negative) significantly outperformed neutral speech. This comparable performance between positive and negative tasks supports the emotional arousal hypothesis. Most importantly, EEG analysis revealed significant correlations between the model's speech-derived depression estimates and neural oscillatory patterns (theta and alpha bands), demonstrating alignment with established neural markers of emotional dysregulation in depression. This alignment, combined with the model's cross-linguistic robustness, not only supports that the CDMA framework's approach is a universally applicable and neurobiologically validated strategy but also establishes a novel paradigm for the neurophysiological validation of computational mental health models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript extends the Cross-Data Multilevel Attention (CDMA) framework, previously validated on Italian speech, to a Chinese Mandarin dataset that includes simultaneous EEG recordings. It reports state-of-the-art F1 scores up to 89.6% for depression detection, finds that emotionally valenced (positive or negative) speech outperforms neutral speech, interprets this as support for an emotional-arousal hypothesis, and claims the first neurophysiological validation of a speech-based depression model via significant correlations between the model's predictions and theta/alpha oscillations recorded during emotional face processing.

Significance. If the reported performance generalizes and the EEG correlations survive appropriate controls, the work would strengthen evidence for cross-linguistic acoustic markers of depression and introduce a concrete method for anchoring computational mental-health models in neural data; the cross-linguistic replication itself is a clear strength.

major comments (2)
  1. [Abstract / Results] Abstract and Results sections: the abstract states 'significant correlations' and 'SOTA F1-score up to 89.6%' yet supplies no sample sizes, exact statistical tests, degrees of freedom, p-values, or effect sizes; these omissions are load-bearing because the central claims of cross-linguistic robustness and neurophysiological validation cannot be evaluated without them.
  2. [Neurophysiological validation] Neurophysiological validation paragraph: the assertion that correlations between CDMA depression estimates and theta/alpha power during emotional face processing constitute validation of the speech-based detection mechanism (rather than shared sensitivity to general emotional dysregulation) requires explicit controls such as neutral-face baselines, non-emotional tasks, or a direct test that correlation strength differs between depressed and control groups; none of these are described, undermining the specificity of the 'neurobiologically validated' claim.
minor comments (1)
  1. [Abstract] Abstract: the sentence claiming the work 'establishes a novel paradigm' should be qualified until the validation procedure is shown to be replicable and specific.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us strengthen the manuscript. We address each major comment below and have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract / Results] Abstract and Results sections: the abstract states 'significant correlations' and 'SOTA F1-score up to 89.6%' yet supplies no sample sizes, exact statistical tests, degrees of freedom, p-values, or effect sizes; these omissions are load-bearing because the central claims of cross-linguistic robustness and neurophysiological validation cannot be evaluated without them.

    Authors: We agree that these statistical details are essential for evaluating the claims. Although the Methods and Results sections report the sample sizes (N for the Chinese cohort), cross-validation procedure, and correlation tests, we acknowledge the abstract was overly concise. In the revised manuscript we have expanded the abstract to include the sample size, the exact tests (Spearman correlation for the EEG links and 5-fold cross-validation for F1), degrees of freedom, p-values, and effect sizes. The Results section has been updated with the same details for full transparency. revision: yes

  2. Referee: [Neurophysiological validation] Neurophysiological validation paragraph: the assertion that correlations between CDMA depression estimates and theta/alpha power during emotional face processing constitute validation of the speech-based detection mechanism (rather than shared sensitivity to general emotional dysregulation) requires explicit controls such as neutral-face baselines, non-emotional tasks, or a direct test that correlation strength differs between depressed and control groups; none of these are described, undermining the specificity of the 'neurobiologically validated' claim.

    Authors: We appreciate the referee’s point on specificity. The emotional-face task was selected because theta/alpha changes during emotional processing are established markers of dysregulation in depression, and the model’s speech-derived scores correlate with these patterns. However, we agree that without explicit controls the interpretation remains open to the alternative of general emotional sensitivity. In the revision we have (1) rephrased the claim to “preliminary neurophysiological alignment” rather than full validation, (2) added a dedicated paragraph in the Discussion that explicitly acknowledges the absence of neutral-face baselines and group-difference tests, and (3) outlined these as necessary directions for future work. This change accurately reflects the current evidence while addressing the concern. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper reports empirical results from applying the CDMA framework to a new Chinese Mandarin speech dataset and correlating model outputs with EEG band power during an emotional face task. These steps rely on independent data collection, model training, and statistical correlation rather than any derivation that reduces a claimed prediction to a fitted parameter or self-referential definition. Prior Italian validation is cited as background but does not serve as the load-bearing justification for the new cross-linguistic or neurophysiological claims; the current findings stand on fresh measurements.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; model internals, training hyperparameters, and exact EEG preprocessing steps are not described, so free parameters and specific axioms cannot be enumerated from available text.

pith-pipeline@v0.9.0 · 5597 in / 1051 out tokens · 28256 ms · 2026-05-13T21:18:42.705090+00:00 · methodology

