End-to-End Machine Learning for Depressive State Classification via EEG and fNIRS

Mihoko Otake-Matsuura; Riki Sakurai; Shin'ichiro Kanoh; Simon Kojima; Tomasz M. Rutkowski

arXiv: 2606.11555 · v1 · pith:PS2TMPH4new · submitted 2026-06-10 · q-bio.NC · cs.AI · cs.LG

Riki Sakurai, Simon Kojima, Mihoko Otake-Matsuura, Shin'ichiro Kanoh, Tomasz M. Rutkowski

Pith reviewed 2026-06-27 07:52 UTC · model grok-4.3

classification: q-bio.NC · cs.AI · cs.LG

keywords: depressive state classification, EEG, fNIRS, machine learning, pilot study, biological signals, mental health diagnostics, objective assessment

The pith

The paper proposes an end-to-end machine learning framework that uses EEG and fNIRS signals to classify depressive states, piloted on eleven healthy students.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper notes that conventional psychiatric diagnosis depends on subjective interviews and self-reports prone to bias. It identifies EEG and fNIRS as sources of objective biological signals that could detect depressive states, including those unrecognized by the individual. The work presents a pilot study of eleven healthy students that applies machine learning to these signals to create a classification framework. This setup is positioned as an initial step toward automated tools that could aid clinical decisions, especially when distinguishing depression from dementia symptoms in older adults.

Core claim

This pilot study of eleven healthy students establishes a framework for biological signal-based depression detection, serving as a foundational step toward automated, objective diagnostic tools for clinical use.

What carries the argument

End-to-end machine learning pipeline that processes combined EEG and fNIRS recordings to classify depressive states.

If this is right

The approach supplies a quantitative method to evaluate mental health states beyond self-report.
It could identify latent depressive states that subjects do not recognize themselves.
The framework offers a route to differentiate depression from dementia in aging populations to support quality of life.
It provides a concrete basis for developing automated diagnostic tools in mental healthcare.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Validation on larger and clinically diagnosed groups would be required before the models could be considered reliable for patient use.
The signals might be combined with additional data streams such as behavioral measures to strengthen classification.
If the pipeline proves robust, it could support continuous monitoring applications outside laboratory settings.

Load-bearing premise

Recordings from eleven healthy students contain patterns representative of clinical depressive states that models can generalize to actual patients.

What would settle it

Testing the trained models on EEG and fNIRS recordings from clinically diagnosed depressed individuals and checking whether classification matches independent clinical diagnoses.
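The check described above can be sketched as a comparison between binary model outputs and independent clinical labels. This is a minimal illustration, not the authors' evaluation code: the function name `agreement_with_diagnosis`, the label encoding (1 = depressive, 0 = non-depressive), and the example vectors are all assumptions made for the sketch.

```python
from typing import Dict, Sequence


def agreement_with_diagnosis(
    predicted: Sequence[int], diagnosed: Sequence[int]
) -> Dict[str, float]:
    """Compare binary model output (1 = depressive) with clinical labels."""
    if len(predicted) != len(diagnosed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(predicted, diagnosed))
    tp = sum(1 for p, d in pairs if p == 1 and d == 1)
    tn = sum(1 for p, d in pairs if p == 0 and d == 0)
    fp = sum(1 for p, d in pairs if p == 1 and d == 0)
    fn = sum(1 for p, d in pairs if p == 0 and d == 1)
    n = tp + tn + fp + fn

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan

    # Cohen's kappa: agreement corrected for what chance alone would give.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else nan

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": observed,
        "kappa": kappa,
    }


# Hypothetical labels for eight held-out individuals (illustration only).
predicted = [1, 1, 0, 0, 1, 0, 0, 1]
diagnosed = [1, 1, 0, 0, 0, 0, 1, 1]
scores = agreement_with_diagnosis(predicted, diagnosed)
```

Reporting sensitivity and specificity separately, alongside a chance-corrected statistic such as kappa, matters here because clinical cohorts are often imbalanced and raw accuracy can look strong when a model simply predicts the majority class.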

Figures

Figures reproduced from arXiv: 2606.11555 by Mihoko Otake-Matsuura, Riki Sakurai, Shin'ichiro Kanoh, Simon Kojima, Tomasz M. Rutkowski.

**Figure 3.** Hybrid EEG-fNIRS sensor topography. Schematic representation…

**Figure 4.** Leave-one-subject-out cross-validation results. Comparison of classifi…
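For readers unfamiliar with the leave-one-subject-out scheme named in the Figure 4 caption, the splitting logic can be sketched as follows. This shows only the fold construction, under the assumption that every trial is tagged with a subject identifier; the identifiers, the trial count per subject, and the function name are hypothetical and are not taken from the paper.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices) per fold."""
    # Preserve first-seen order of subjects so folds are reproducible.
    unique_subjects = list(dict.fromkeys(subject_ids))
    for held_out in unique_subjects:
        # All trials of the held-out subject go to the test set, so no
        # data from that person can leak into training.
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical layout: 11 subjects with 4 recorded trials each.
subject_ids = [f"S{n:02d}" for n in range(1, 12) for _ in range(4)]
folds = list(leave_one_subject_out(subject_ids))
```

With eleven subjects this produces eleven folds, each tested on a person the model has never seen. That is the property that makes the scheme a fairer estimate of cross-subject generalization than shuffling trials at random.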

Original abstract

The escalating demand for mental healthcare, driven by rising societal stress, highlights the limitations of traditional psychiatric diagnostics. Conventional methods - relying primarily on clinical interviews and patient self-reports - are inherently vulnerable to subjective bias and the varying empirical judgment of practitioners. To address the need for quantitative evaluation, biological signal-based detection, including electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), has emerged as a promising objective alternative. Such technology is particularly vital for identifying latent depressive states that may be unrecognized by the subjects themselves. Furthermore, in aging populations, the high comorbidity between depression and dementia necessitates early differentiation to prevent mutual symptom exacerbation and maintain Quality of Life (QoL). This pilot study of eleven healthy students establishes a framework for biological signal-based depression detection, serving as a foundational step toward automated, objective diagnostic tools for clinical use.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper records EEG and fNIRS from eleven healthy students and applies standard classifiers, but it shows no results and includes no depressive cases, so the claim of a framework for depression detection does not hold.

The main takeaway is that this paper collects multimodal brain signals from eleven healthy students, runs routine machine learning classifiers on them, and presents the effort as establishing a framework for depressive state detection. The data and the claim do not line up.

The introduction correctly flags the limits of interview-based diagnosis and the value of objective signals, especially with comorbidity in older adults. Collecting both EEG and fNIRS in one session is a reasonable practical choice and the paper notes the need for quantitative tools. That part is straightforward and uncontroversial.

Nothing else stands out as new. The work applies existing classifiers to a well-studied target without new algorithms, derivations, or first-principles modeling. The abstract supplies no accuracy figures, no validation scheme, no baseline comparisons, and no subject criteria beyond the small healthy cohort.

The central weakness is the gap between experiment and stated goal. Healthy students supply no depressive labels, so the models can at best separate individuals within a normal group. They provide no evidence that the same signals would separate depressive from non-depressive states. A convenience sample of eleven controls is too narrow to support generalization claims, and the absence of any performance numbers leaves the technical execution untestable.
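To give a rough sense of how little eleven subjects can pin down, the sketch below computes a 95% Wilson score interval for a per-subject success rate. The count of 9 correct out of 11 is a made-up figure chosen purely for illustration; it is not a result reported in the paper, and `wilson_interval` is a helper written for this example.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(
    successes: int, n: int, confidence: float = 0.95
) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical outcome: 9 of 11 subjects classified correctly.
lower, upper = wilson_interval(9, 11)
```

Even for this favorable hypothetical outcome, the interval spans several tens of percentage points, which is why a cohort of this size cannot by itself support claims about generalization.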

This is the sort of early methods note that might interest someone setting up their own recording protocol, but it does not deliver usable evidence on the clinical target. I would not bring it to a reading group or cite it. It does not merit sending out for peer review until real patient data and concrete results are added.

Referee Report

2 major / 0 minor

Summary. The manuscript presents a pilot study recording EEG and fNIRS signals from eleven healthy students and claims to establish an end-to-end machine learning framework for biological signal-based classification of depressive states, positioned as a foundational step toward objective diagnostic tools that could address limitations of subjective psychiatric assessments and aid differentiation in aging populations with dementia comorbidity.

Significance. If the central experimental result were supported by appropriate data and validation, the work would address a genuine clinical need for quantitative mental-health biomarkers. The absence of any depressive-state labels or clinical subjects, however, means the reported framework cannot demonstrate separation of depressive from non-depressive states and therefore supplies no evidence toward the claimed clinical utility.

major comments (2)

[Abstract] The claim that recordings from eleven healthy students establish a framework for depressive-state classification is not supported by the described cohort, which contains no clinical subjects, no induced depressive states, and no validated depressive labels; any classifier can at best distinguish among healthy individuals.
[Abstract] No performance metrics, cross-validation procedure, baseline comparisons, or subject-selection criteria are supplied, so the central claim that an automated, objective diagnostic framework has been established rests on an unshown experimental result from a convenience sample.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed review and the opportunity to respond. We address the major comments on the abstract below, agreeing where revisions are needed to better align claims with the pilot nature of the study.

Referee: [Abstract] The claim that recordings from eleven healthy students establish a framework for depressive-state classification is not supported by the described cohort, which contains no clinical subjects, no induced depressive states, and no validated depressive labels; any classifier can at best distinguish among healthy individuals.

Authors: We agree that the cohort is restricted to healthy students without depressive labels or clinical subjects, so the work cannot demonstrate separation of depressive from non-depressive states. The manuscript frames the study as a pilot to develop the end-to-end ML pipeline on combined EEG-fNIRS signals. The abstract will be revised to state explicitly that the framework is validated only on healthy volunteers and that clinical data with depressive-state labels are required to support diagnostic claims.

Revision: yes

Referee: [Abstract] No performance metrics, cross-validation procedure, baseline comparisons, or subject-selection criteria are supplied, so the central claim that an automated, objective diagnostic framework has been established rests on an unshown experimental result from a convenience sample.

Authors: The full manuscript contains the experimental details, cross-validation approach, performance metrics, and subject criteria in the Methods and Results sections. To address the concern that these are not evident from the abstract, we will revise the abstract to summarize the key validation procedures, metrics, and the convenience-sample nature of the healthy cohort.

Revision: yes

standing simulated objections not resolved

The manuscript contains no data from subjects with depressive states or validated depressive labels, so it cannot supply evidence for classification of depressive versus non-depressive states or for the claimed clinical utility.

Circularity Check

0 steps flagged

No significant circularity; empirical pilot framework without derivation chain

The manuscript presents an empirical pilot study applying machine learning classifiers to EEG and fNIRS recordings from eleven healthy students, framed as establishing a framework for depressive state detection. No equations, parameter-fitting steps presented as predictions, self-citations invoked as uniqueness theorems, or ansatzes smuggled via prior work are described in the abstract or reader summary. The central claim reduces to an experimental demonstration on a convenience sample rather than any mathematical derivation that collapses to its inputs by construction. This is the most common honest outcome for applied ML papers lacking a formal derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the central claim rests on the unstated premise that student data can proxy clinical depression.

pith-pipeline@v0.9.1-grok · 5698 in / 1078 out tokens · 18871 ms · 2026-06-27T07:52:13.243096+00:00 · methodology

Reference graph

Works this paper leans on

11 extracted references · 1 canonical work page

[1] A. T. Beck, R. A. Steer, and G. Brown, “Beck depression inventory–II,” Psychological assessment, 1996.

[2] T. M. Rutkowski, M. S. Abe, T. Komendzinski, H. Sugimoto, S. Narebski, and M. Otake-Matsuura, “Machine learning approach for early onset dementia neurobiomarker using EEG network topology features,” Frontiers in Human Neuroscience, vol. 17, p. 1155194, 2023.

[3] T. M. Rutkowski, T. Komendzinski, and M. Otake-Matsuura, “Mild cognitive impairment prediction and cognitive score regression in the elderly using EEG topological data analysis and machine learning with awareness assessed in affective reminiscent paradigm,” Frontiers in Aging Neuroscience, vol. 15, p. 1294139, 2024.

[4] S. Kojima, R. Shiba, Y. Morimoto, K. Furukawa, S. Kanoh, T. Komendziński, M. Otake-Matsuura, and T. M. Rutkowski, “Spatial auditory soundscapes for developing digital neurobiomarkers or cognitive interventions in early-onset dementia based on EEG and fNIRS machine-learning analysis,” in 46th Annual International Conference of the IEEE Engineering in M... (2024)

[5] K. Elnaggar, M. M. El-Gayar, and M. Elmogy, “Depression Detection and Diagnosis Based on Electroencephalogram (EEG) Analysis: A Systematic Review,” Diagnostics (Basel), vol. 15, no. 2, p. 210, 2025.

[6] M. Ravanelli and Y. Bengio, “Speaker recognition from raw waveform with SincNet,” in 2018 IEEE Spoken Language Technology Workshop (SLT), pp. 1021–1028, IEEE, 2018.

[7] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with convolutional neural networks for EEG decoding and visualization,” Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017.

[8] B. Aristimunha, P. Guetschel, M. Wimpff, L. Gemein, C. Rommel, H. Banville, M. Sliwowski, D. Wilson, S. Brandt, T. Gnassounou, J. Paillard, B. Junqueira Lopes, S. Sedlar, T. Moreau, S. Chevallier, A. Gramfort, and R. T. Schirrmeister, “Braindecode: toolbox for decoding raw electrophysiological brain data with deep learning models.”

[9] S. Baron-Cohen, Mind Reading: The Interactive Guide to Emotions. London, UK: Jessica Kingsley Publishers, 2004.

[10] S. M. Mohammad, “NRC VAD Lexicon v2: norms for valence, arousal, and dominance for over 55k English terms,” arXiv preprint arXiv:2503.23547, 2025.

[11] T. Stenner, C. Boulay, et al., “sccn/liblsl: v1.16.2,” Zenodo, 2022.
