arxiv: 2605.28977 · v1 · pith:QMCOOXMQnew · submitted 2026-05-27 · 💻 cs.LG · cs.AI

Comparing Post-Hoc Explainable AI Methods for Interpreting Black-Box EEG Models in Depression Detection

Pith reviewed 2026-06-29 13:40 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords EEG · Major Depressive Disorder · post-hoc explainability · deep learning · attribution methods · depression detection · InceptionTime · black-box models

The pith

Different post-hoc explainability methods applied to an EEG-based deep learning model for depression detection produce partially overlapping relevance structures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains an InceptionTime model on EEG data to classify major depressive disorder and then applies five post-hoc attribution techniques: DeepSHAP, Integrated Gradients, GradCAM, Occlusion, and Permutation Feature Importance. It aggregates attributions globally across segments and subjects inside a subject-level stratified 5-fold cross-validation. The methods show recurring emphasis on frontal, temporal, and posterior regions, especially right-hemisphere channels, with stronger agreement among gradient- and perturbation-based approaches than with DeepSHAP. This partial convergence indicates that the model attends to consistent spatial patterns even though each explainer rests on different assumptions. Readers care because the result bears on whether black-box EEG classifiers can be inspected for meaningful neurophysiological content rather than data artifacts.
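To make the perturbation-based family concrete, the following is a minimal sketch of channel-level occlusion. It assumes a hypothetical `toy_model` standing in for the paper's InceptionTime classifier; the weights, the zero fill value, and all function names are illustrative assumptions rather than details taken from the paper.

```python
import math

def toy_model(segment):
    """Hypothetical stand-in classifier: returns a score in (0, 1) for one segment.

    `segment` is a list of channels, each a list of samples. The weights are
    invented for illustration and carry no neurophysiological meaning.
    """
    weights = [0.8, -0.2, 0.0]  # one made-up weight per channel
    score = sum(w * sum(abs(v) for v in ch) / len(ch)
                for w, ch in zip(weights, segment))
    return 1.0 / (1.0 + math.exp(-score))

def occlusion_importance(model, segment, fill=0.0):
    """Score each channel by the change in model output when it is blanked."""
    reference = model(segment)
    scores = []
    for c in range(len(segment)):
        occluded = [list(ch) for ch in segment]   # copy so the input is untouched
        occluded[c] = [fill] * len(segment[c])    # replace one channel with `fill`
        scores.append(reference - model(occluded))
    return scores

# Three channels, three samples each (invented values).
segment = [[1.0, -1.0, 2.0], [0.5, 0.5, -0.5], [3.0, 3.0, 3.0]]
scores = occlusion_importance(toy_model, segment)
```

A positive score means the output dropped when the channel was removed; a channel the toy model ignores scores zero. The choice of fill value is itself a methodological assumption of the kind the paper says shapes the resulting maps.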

Core claim

The evaluated methods revealed partially convergent attribution patterns, with recurring emphasis on frontal, temporal, and posterior EEG regions, particularly in the right hemisphere. Quantitative comparison demonstrated substantial agreement between gradient- and perturbation-based approaches, while DeepSHAP produced comparatively distinct attribution distributions. At the same time, variability between explainability methods highlighted the influence of methodological assumptions on the resulting explanations. Overall, the results suggest that different post-hoc explainability approaches capture partially overlapping relevance structures in EEG-based deep learning models for depression detection.

What carries the argument

Global attribution aggregation of five post-hoc methods (DeepSHAP, Integrated Gradients, GradCAM, Occlusion, Permutation Feature Importance) applied inside subject-level stratified 5-fold cross-validation to an InceptionTime classifier on EEG segments for MDD detection.
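The two ingredients named here can be sketched as follows. This is a rough illustration under stated assumptions: whole subjects are dealt round-robin into folds per class, and "global aggregation" is taken to mean the mean absolute attribution averaged first within each subject and then across subjects. The text above does not give the paper's exact aggregation rule, and every function name is hypothetical.

```python
import random
from collections import defaultdict

def subject_stratified_folds(subject_labels, k=5, seed=0):
    """Assign whole subjects to k folds while balancing labels across folds.

    `subject_labels` maps subject id -> class label. Every segment of a
    subject inherits that subject's fold, so no subject straddles a split.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for subject, label in sorted(subject_labels.items()):
        by_label[label].append(subject)
    fold_of = {}
    offset = 0
    for label in sorted(by_label):
        subjects = by_label[label]
        rng.shuffle(subjects)
        for i, subject in enumerate(subjects):
            fold_of[subject] = (i + offset) % k   # deal round-robin per class
        offset += len(subjects)
    return fold_of

def global_channel_importance(segment_attr):
    """Mean |attribution| per channel: within each subject, then across subjects.

    `segment_attr` maps subject id -> list of per-segment channel vectors.
    """
    subject_means = []
    for segments in segment_attr.values():
        n_channels = len(segments[0])
        subject_means.append([
            sum(abs(seg[c]) for seg in segments) / len(segments)
            for c in range(n_channels)
        ])
    return [sum(m[c] for m in subject_means) / len(subject_means)
            for c in range(len(subject_means[0]))]

# Invented example: 10 control and 10 MDD subjects, two channels of attributions.
labels = {f"hc{i}": 0 for i in range(10)}
labels.update({f"mdd{i}": 1 for i in range(10)})
folds = subject_stratified_folds(labels, k=5, seed=1)
importance = global_channel_importance(
    {"s1": [[1.0, -2.0], [3.0, 0.0]], "s2": [[-1.0, 1.0]]}
)
```

Averaging within subjects first keeps a subject with many segments from dominating the global map. Pooling all segments directly would be an equally plausible reading of the source.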

If this is right

  • Attribution patterns remain broadly consistent with prior EEG studies of MDD.
  • Methodological assumptions of each explainer visibly shape the resulting maps.
  • Post-hoc explainability is useful for inspecting black-box EEG classifiers yet remains limited for clinical biomarker claims.
  • The analysis is positioned as exploratory rather than evidence of definitive neurophysiological markers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the observed overlaps prove stable across datasets, ensembles of multiple explainers could yield more reliable spatial summaries than any single method.
  • The right-hemisphere emphasis could motivate targeted channel-selection experiments in future model training.
  • Standardized protocols for comparing explainers on the same EEG partitions would reduce variability introduced by methodological choices.

Load-bearing premise

The subject-level stratified 5-fold cross-validation and global attribution aggregation produce stable, method-independent relevance maps that reflect model behavior rather than artifacts of the chosen explainers or data partitioning.

What would settle it

The claim of partially overlapping relevance structures would be falsified if re-running the identical pipeline with a different random seed for the 5-fold splits, or adding a further explainer, yielded completely non-overlapping region rankings.
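One way to operationalize "non-overlapping region rankings" is a top-k set overlap between two importance maps. The sketch below uses the Jaccard index. The region names echo those in the summary, but the scores and the choice of k are invented for illustration and are not values from the paper.

```python
def top_k(importance, k):
    """Return the set of the k highest-scoring keys in an importance dict."""
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    return set(ranked[:k])

def jaccard(a, b):
    """|a & b| / |a | b|; 0.0 means the two top-k sets share nothing."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Invented scores for two hypothetical explainers (or two seeds of one explainer).
method_a = {"frontal": 0.9, "temporal": 0.7, "posterior": 0.6, "central": 0.1}
method_b = {"frontal": 0.2, "temporal": 0.8, "posterior": 0.5, "central": 0.6}

overlap = jaccard(top_k(method_a, 2), top_k(method_b, 2))
```

An overlap of exactly zero across reruns would correspond to the falsifying outcome described above. Intermediate values are what "partially overlapping" would look like under this particular metric.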

Figures

Figures reproduced from arXiv: 2605.28977 by Antonia Šarčević, Nikolina Frid.

Figure 1. (a) EEG 10–20 system with 19 channels; (b) EEG preprocessing protocol. Because EEG recordings are frequently contaminated by physiological and environmental artifacts, preprocessing was performed following the protocol described in [23]. The preprocessing was conducted in MATLAB R2021b using the EEGLAB toolbox. Signals were first filtered using a finite-impulse-response (FIR) band-pass filter between 0.1 and…
Figure 2. InceptionTime architecture for time-series classification adapted from [24].
Figure 3. Performance evaluation curves for the InceptionTime model.
Figure 4. Global channel importance profile generated using DeepSHAP, comparing mean attri…
Figure 5. Topographic mapping of DeepSHAP feature attributions, illustrating the spatial…
Figure 6. DeepSHAP regional and frequency-band heatmap matrix illustrating the interaction…
Figure 7. Global channel importance profile generated using Integrated Gradients (IG), com…
Figure 8. Topographic projection of Integrated Gradients (IG) feature attributions, illustrating…
Figure 9. Integrated Gradients (IG) regional and frequency-band attribution heatmap evaluat…
Figure 10. Integrated Gradients baseline stability analysis across the 5-fold cross-validation…
Figure 11. Multi-metric radar chart comparing Integrated Gradients, DeepSHAP, GradCAM,…
Figure 12. Inter-method consistency heatmap illustrating Kendall’s Tau cross-correlation values…
Figure 13. Final consensus and agreement-confidence profile highlighting the top 10 most influ…

Original abstract

Recent advances in deep learning have enabled increasingly accurate electroencephalography (EEG)-based classification of Major Depressive Disorder (MDD), but the decision-making processes of high-capacity models remain difficult to interpret. This study investigates multiple post-hoc explainability methods applied to an InceptionTime architecture trained for EEG-based MDD detection. The analysis includes Shapley-based, gradient-based, and perturbation-based attribution approaches: DeepSHAP, Integrated Gradients, GradCAM, Occlusion, and Permutation Feature Importance. Explainability analysis was performed within a subject-level stratified 5-fold cross-validation framework using global attribution aggregation across EEG segments and subjects. The evaluated methods revealed partially convergent attribution patterns, with recurring emphasis on frontal, temporal, and posterior EEG regions, particularly in the right hemisphere. Quantitative comparison demonstrated substantial agreement between gradient- and perturbation-based approaches, while DeepSHAP produced comparatively distinct attribution distributions. At the same time, variability between explainability methods highlighted the influence of methodological assumptions on the resulting explanations. Overall, the results suggest that different post-hoc explainability approaches capture partially overlapping relevance structures in EEG-based deep learning models for depression detection. Although the observed attribution patterns are broadly consistent with several previous EEG studies of MDD, the analysis should be interpreted as exploratory rather than evidence of definitive neurophysiological biomarkers or clinical applicability. The study highlights both the usefulness and limitations of post-hoc explainability for interpreting black-box EEG classifiers in psychiatric applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript compares post-hoc explainability methods (DeepSHAP, Integrated Gradients, GradCAM, Occlusion, Permutation Feature Importance) on an InceptionTime model for EEG-based MDD detection. Within a subject-level stratified 5-fold CV framework and global attribution aggregation, it reports partially convergent patterns across methods, with substantial agreement between gradient- and perturbation-based approaches, distinct distributions for DeepSHAP, and recurring emphasis on frontal/temporal/posterior regions (especially right hemisphere). The analysis is framed as exploratory.

Significance. If the partial overlaps are shown to be stable and not artifacts of partitioning or explainer assumptions, the work would usefully illustrate that different post-hoc methods capture overlapping but non-identical relevance structures in EEG deep-learning models, supporting the recommendation to employ multiple explainers rather than relying on any single technique for interpretability in psychiatric applications.

major comments (2)
  1. [Methods and Results] Methods/Results: The central claim that different explainers capture partially overlapping relevance structures depends on the subject-level stratified 5-fold CV plus global aggregation producing stable, method-independent maps. No per-fold attribution variance, inter-fold overlap statistics, or sensitivity analysis to the aggregation procedure is reported. If fold-to-fold maps differ substantially, the reported convergence between gradient/perturbation methods could be an artifact of the particular partition rather than a property of the InceptionTime model.
  2. [Abstract and Results] Abstract/Results: The abstract states that quantitative comparison showed 'substantial agreement' between gradient- and perturbation-based methods, yet the provided text supplies no numerical metrics (e.g., correlation coefficients, Dice overlap, or statistical tests), error bars, or details on how agreement was quantified. This absence makes it impossible to evaluate the strength or robustness of the convergence claim.
minor comments (1)
  1. [Discussion] The manuscript appropriately qualifies its conclusions as exploratory and not evidence of definitive biomarkers; this framing should be retained.
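The kind of agreement metric the referee asks for can be sketched with Kendall's tau, which the Figure 12 caption names as the inter-method consistency measure. The implementation below is a plain tau-b written from the textbook definition. It is not the authors' code, and the input vectors are invented per-channel importance scores.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (handles ties)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                 # tied in both lists: excluded everywhere
            elif dx == 0:
                ties_x += 1              # tied only in x
            elif dy == 0:
                ties_y += 1              # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Invented channel-importance vectors for two hypothetical explainers.
explainer_a = [0.1, 0.2, 0.3, 0.4]
explainer_b = [0.1, 0.3, 0.2, 0.4]
tau = kendall_tau_b(explainer_a, explainer_b)
```

Reporting such a coefficient per method pair and per fold, with its spread, would give a number that readers could weigh against the phrase "substantial agreement".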

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of robustness and quantification. We address each point below.

  1. Referee: [Methods and Results] Methods/Results: The central claim that different explainers capture partially overlapping relevance structures depends on the subject-level stratified 5-fold CV plus global aggregation producing stable, method-independent maps. No per-fold attribution variance, inter-fold overlap statistics, or sensitivity analysis to the aggregation procedure is reported. If fold-to-fold maps differ substantially, the reported convergence between gradient/perturbation methods could be an artifact of the particular partition rather than a property of the InceptionTime model.

    Authors: We agree that the absence of per-fold variance and overlap statistics leaves the stability of the reported patterns unverified. In the revised manuscript we will add (i) per-fold attribution variance maps, (ii) inter-fold Dice overlap coefficients for each explainer, and (iii) a sensitivity comparison of global maps obtained by mean versus median aggregation. These additions will allow readers to judge whether the observed gradient/perturbation convergence is robust across partitions. revision: yes

  2. Referee: [Abstract and Results] Abstract/Results: The abstract states that quantitative comparison showed 'substantial agreement' between gradient- and perturbation-based methods, yet the provided text supplies no numerical metrics (e.g., correlation coefficients, Dice overlap, or statistical tests), error bars, or details on how agreement was quantified. This absence makes it impossible to evaluate the strength or robustness of the convergence claim.

    Authors: The referee correctly notes that the manuscript does not supply explicit numerical metrics for the claimed agreement. We will revise both the abstract and results section to report average Pearson (and Spearman) correlations between attribution maps of gradient- versus perturbation-based methods, together with standard deviations across folds and appropriate statistical tests. These quantitative details will replace the qualitative phrase 'substantial agreement'. revision: yes
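The stability checks promised in the first response (inter-fold Dice overlap and mean versus median aggregation) could look roughly like the sketch below. The per-fold scores are invented, the top-k thresholding is an assumption about how attribution maps would be turned into sets, and none of the helper names come from the paper.

```python
from itertools import combinations
from statistics import mean, median

def dice(a, b):
    """Dice coefficient 2|a & b| / (|a| + |b|) between two sets."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

def top_k_channels(scores, k):
    """Indices of the k largest per-channel scores."""
    order = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return set(order[:k])

def inter_fold_dice(fold_scores, k):
    """Mean pairwise Dice overlap of top-k channel sets across folds."""
    tops = [top_k_channels(s, k) for s in fold_scores]
    return mean(dice(a, b) for a, b in combinations(tops, 2))

def aggregate(fold_scores, reducer):
    """Collapse per-fold channel scores with `reducer` (e.g. mean or median)."""
    return [reducer(channel) for channel in zip(*fold_scores)]

# Invented per-fold channel scores (3 folds x 4 channels), illustration only.
folds = [[0.9, 0.1, 0.5, 0.2],
         [0.8, 0.2, 0.6, 0.1],
         [0.1, 0.9, 0.7, 0.2]]
stability = inter_fold_dice(folds, k=2)
by_mean = aggregate(folds, mean)
by_median = aggregate(folds, median)
```

In this toy data one fold disagrees with the other two. The median map is less affected by that outlier fold than the mean map, which is the sensitivity the proposed mean-versus-median comparison is meant to expose.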

Circularity Check

0 steps flagged

No circularity: empirical comparison with no derivations or self-referential predictions

The paper conducts a direct empirical comparison of standard post-hoc explainability methods (DeepSHAP, Integrated Gradients, etc.) applied to a trained InceptionTime model on EEG data within subject-level stratified 5-fold CV. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim of partially overlapping attribution patterns is an observation from applying off-the-shelf tools, not a reduction to prior choices or definitions. The skeptic concern about unquantified fold stability is a potential evidentiary gap but does not constitute circularity under the defined criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations or theoretical claims; the study is purely empirical. No free parameters, axioms, or invented entities are introduced or required by the abstract.

Reference graph

Works this paper leans on

67 extracted references · 49 canonical work pages

  1. [1]

    GBD 2017 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet, 392(10159):1789–1858, 2018. doi: 10.1016/S0140-6...

  2. [2]

    Kenneth S. Kendler. The phenomenology of major depression and the representativeness and nature of DSM criteria. American Journal of Psychiatry, 173(8):771–780, 2016. doi: 10.1176/appi.ajp.2016.15121509. URL https://psychiatryonline.org/doi/abs/10.1176/appi.ajp.2016.15121509

  3. [3]

    Mark Zimmerman, William Ellison, Diane Young, Iwona Chelminski, and Kristy Dalrymple. How many different ways do patients meet the diagnostic criteria for major depressive disorder? Comprehensive Psychiatry, 56:29–34, 2015. ISSN 0010-440X. doi: 10.1016/j.comppsych.2014.09.007. URL https://www.sciencedirect.com/science/article/pii/S00104...

  4. [4]

    A. John Rush. The varied clinical presentations of major depressive disorder. The Journal of Clinical Psychiatry, 68(suppl 8):22132, 2007. URL https://pubmed.ncbi.nlm.nih.gov/17640152/

  5. [5]

    Ronald C. Kessler and Evelyn J. Bromet. The epidemiology of depression across cultures. Annual Review of Public Health, 34:119–138, 2013. doi: 10.1146/annurev-publhealth-031912-114409

  6. [6]

    Ronald C. Kessler. The potential of predictive analytics to provide clinical decision support in depression treatment planning. Current Opinion in Psychiatry, 31(1):32–39, 2018. ISSN 0951-7367. URL https://journals.lww.com/co-psychiatry/fulltext/2018/01000/the_potential_of_predictive_analytics_to_provide.7.aspx

  7. [7]

    Andreas Kurt Kaiser, Maria-Theresa Gnjezda, Stephanie Knasmüller, and Wolfgang Aichhorn. Electroencephalogram alpha asymmetry in patients with depressive disorders: current perspectives. Neuropsychiatric Disease and Treatment, 14:1493–1504, 2018. doi: 10.2147/NDT.S137776. URL https://www.tandfonline.com/doi/abs/10.2147/NDT.S137776. PMID: 29928121

  8. [8]

    Milena Čukić, Miodrag Stokić, Slavoljub Radenković, Miloš Ljubisavljević, Slobodan Simić, and Danka Savić. Nonlinear analysis of EEG complexity in episode and remission phase of recurrent depression. International Journal of Methods in Psychiatric Research, 29(2):e1816, 2020. doi: 10.1002/mpr.1816

  9. [9]

    Leif Simmatis, Emma E. Russo, Joseph Geraci, Irene E. Harmsen, Nardin Samuel, et al. Technical and clinical considerations for electroencephalography-based biomarkers for major depressive disorder. npj Mental Health Research, 2(18), 2023. doi: 10.1038/s44184-023-00038-7

  10. [10]

    Shumaila Aleem, Noor ul Huda, Rashid Amin, Samina Khalid, Sultan S. Alshamrani, and Abdullah Alshehri. Machine learning algorithms for depression: Diagnosis, insights, and research directions. Electronics, 11(7), 2022. ISSN 2079-9292. doi: 10.3390/electronics11071111. URL https://www.mdpi.com/2079-9292/11/7/1111

  11. [11]

    Fernando Soares de Aguiar Neto and João Luís Garcia Rosa. Depression biomarkers using non-invasive EEG: A review. Neuroscience & Biobehavioral Reviews, 105:83–93, 2019. ISSN 0149-7634. doi: 10.1016/j.neubiorev.2019.07.021. URL https://www.sciencedirect.com/science/article/pii/S0149763419303823

  12. [12]

    Behshad Hosseinifard, Mohammad Hassan Moradi, and Reza Rostami. Classifying depression patients and normal subjects using machine learning techniques and nonlinear features from EEG signal. Computer Methods and Programs in Biomedicine, 109(3):339–345, 2013. ISSN 0169-2607. doi: 10.1016/j.cmpb.2012.10.008. URL https://www.sciencedirect.co...

  13. [13]

    Wajid Mumtaz, Likun Xia, Syed Saad Azhar Ali, Mohd Azhar Mohd Yasin, Muhammad Hussain, and Aamir Saeed Malik. Electroencephalogram (EEG)-based computer-aided technique to diagnose major depressive disorder (MDD). Biomedical Signal Processing and Control, 31:108–115, 2017. ISSN 1746-8094. doi: 10.1016/j.bspc.2016.07.006. URL https://www.s...

  14. [14]

    Hanshu Cai, Jiashuo Han, Yunfei Chen, Xiaocong Sha, Ziyang Wang, Bin Hu, Jing Yang, Lei Feng, Zhijie Ding, Yiqiang Chen, and Jürg Gutknecht. A pervasive approach to EEG-based depression detection. Complexity, 2018(1):5238028, 2018. doi: 10.1155/2018/5238028

  15. [15]

    Egils Avots, Klāvs Jermakovs, Maie Bachmann, Laura Päeske, Cagri Ozcinar, and Gholamreza Anbarjafari. Ensemble approach for detection of depression using EEG features. Entropy, 24(2):211, 2022. doi: 10.3390/e24020211

  16. [16]

    Yuwen Wang, Yudan Peng, Mingxiu Han, Xinyi Liu, Haijun Niu, Jian Cheng, Suhua Chang, and Tao Liu. GCTNet: a graph convolutional transformer network for major depressive disorder detection based on EEG signals. Journal of Neural Engineering, 21(3):036042. doi: 10.1088/1741-2552/ad5048

  18. [18]

    Stefana Dutua and Alina Elena Sultana. Optimizing depression classification using combined datasets and hyperparameter tuning with Optuna. Sensors, 25(7):2083, 2025. doi: 10.3390/s25072083

  19. [19]

    Alan Jovic, Nikolina Frid, Karla Brkic, and Mario Cifrek. Interpretability and accuracy of machine learning algorithms for biomedical time series analysis – a scoping review. Biomedical Signal Processing and Control, 110:108153, 2025. ISSN 1746-8094. doi: 10.1016/j.bspc.2025.108153. URL https://www.sciencedirect.com/science/article/pii...

  20. [20]

    Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. ISSN 2522-5839. doi: 10.1038/s42256-019-0048-x

  21. [21]

    Ričards Marcinkevičs and Julia E. Vogt. Interpretable and explainable machine learning: A methods-centric overview with concrete examples. WIREs Data Mining and Knowledge Discovery, 13(3):e1493, 2023. doi: 10.1002/widm.1493. URL https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/widm.1493

  22. [22]

    Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1), 2021. ISSN 1099-4300. doi: 10.3390/e23010018. URL https://www.mdpi.com/1099-4300/23/1/18

  23. [23]

    Steven A. Hicks, Jonas L. Isaksen, Vajira Thambawita, Jonas Ghouse, Gustav Ahlberg, Allan Linneberg, Niels Grarup, Inga Strümke, Christina Ellervik, Morten Salling Olesen, Torben Hansen, Claus Graff, Niels-Henrik Holstein-Rathlou, Pål Halvorsen, Mary M. Maleckar, and Jørgen K. Kanters. Explaining deep neural networks for knowledge discovery in electroca...

  24. [24]

    Damir Mulc, Jaksa Vukojevic, Eda Kalafatic, Mario Cifrek, Domagoj Vidovic, and Alan Jovic. Opportunities and challenges for clinical practice in detecting depression using EEG and machine learning. Sensors, 25(2), 2025. ISSN 1424-8220. doi: 10.3390/s25020409

  25. [25]

    Hassan Ismail Fawaz, Benjamin Lucas, Germain Forestier, Charlotte Pelletier, Daniel F Schmidt, Jonathan Weber, Geoffrey I Webb, Lhassane Idoumghar, Pierre-Alain Muller, and François Petitjean. InceptionTime: Finding AlexNet for time series classification. Data Mining and Knowledge Discovery, 34(6):1936–1962, 2020. URL https://doi.org/10.1007/s10618-020-00710-y

  26. [26]

    Alejandro de la Torre-Luque and Xavier Bornas. Complexity and irregularity in the brain oscillations of depressive patients: A systematic review. Neuropsychiatry, 7(5):279–290. doi: 10.4172/Neuropsychiatry.1000232

  28. [28]

    Faranak Farzan, Sravya Atluri, Ye Mei, Sylvain Moreno, Andrea J. Levinson, Daniel M. Blumberger, and Zafiris J. Daskalakis. Brain temporal complexity in explaining the therapeutic and cognitive effects of seizure therapy. Brain, 140(4):1011–1025, 03 2017. ISSN 0006-8950. doi: 10.1093/brain/awx030

  29. [29]

    Zenas C. Chao, Daniel G. Dillon, Yi-Hung Liu, Elyssa M. Barrick, and Chien-Te Wu. Altered coordination between frontal delta and parietal alpha networks underlies anhedonia and depressive rumination in major depressive disorder. Journal of Psychiatry and Neuroscience, 47(6):E367–E378, 2022. doi: 10.1503/jpn.220046

  30. [30]

    Hanshu Cai, Yunfei Chen, Jiashuo Han, Xiangzi Zhang, and Bin Hu. Study on feature selection methods for depression detection using three-electrode EEG data. Interdisciplinary Sciences: Computational Life Sciences, 10(3):558–565, 2018. doi: 10.1007/s12539-018-0292-5

  31. [31]

    Zhijiang Wan, Hao Zhang, Jiajin Huang, Haiyan Zhou, Jie Yang, and Ning Zhong. Single-channel EEG-based machine learning method for prescreening major depressive disorder. International Journal of Information Technology & Decision Making, 18(05):1579–1603. doi: 10.1142/S0219622019500342

  33. [33]

    Shuting Sun, Jianxiu Li, Huayu Chen, Tao Gong, Xiaowei Li, and Bin Hu. A study of resting-state EEG biomarkers for depression recognition. arXiv preprint arXiv:2002.11039. doi: 10.48550/arXiv.2002.11039

  35. [35]

    Johannes Burchert, Thorben Werner, Vijaya Krishna Yalavarthi, Diego Coello de Portugal, Maximilian Stubbemann, and Lars Schmidt-Thieme. Are EEG sequences time series? EEG classification with time series models and joint subject training. arXiv preprint arXiv:2404.06966, 2024. URL https://arxiv.org/abs/2404.06966

  36. [36]

    Alireza Rafiei, Rasoul Zahedifar, Chiranjibi Sitaula, and Faezeh Marzbanrad. Automated detection of major depressive disorder with EEG signals: A time series classification using deep learning. IEEE Access, 10:73804–73817, 2022. URL https://ieeexplore.ieee.org/document/9828387

  37. [37]

    Reza Akbari Movahed, Gila Pirzad Jahromi, Shima Shahyad, and Gholam Hossein Meftahi. A major depressive disorder classification framework based on EEG signals using statistical, spectral, wavelet, functional connectivity, and nonlinear analysis. Journal of Neuroscience Methods, 358:109209, 2021. doi: 10.1016/j.jneumeth.2021.109209

  38. [38]

    Wei Liu, Kebin Jia, and Zhuozheng Wang. Graph-based EEG approach for depression prediction: integrating time-frequency complexity and spatial topology. Frontiers in Neuroscience, 18, 2024. ISSN 1662-453X. doi: 10.3389/fnins.2024.1367212. URL https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2024.1367212

  39. [39]

    Daniel C. Elton. Self-explaining AI as an Alternative to Interpretable AI. In Ben Goertzel, Aleksandr I. Panov, Alexey Potapov, and Roman Yampolskiy, editors, Artificial General Intelligence, pages 95–106, Cham, 2020. Springer International Publishing. doi: 10.1007/978-3-030-52152-3_10

  40. [40]

    Nadia Burkart and Marco F. Huber. A survey on the explainability of supervised machine learning. Journal of Artificial Intelligence Research, 70:245–317, 2021. doi: 10.1613/jair.1.12228

  41. [41]

    Pranav Rajpurkar, Emma Chen, Oishi Banerjee, and Eric J. Topol. AI in health and medicine. Nature Medicine, 28(1):31–38, 2022. ISSN 1546-170X. doi: 10.1038/s41591-021-01614-0. URL https://doi.org/10.1038/s41591-021-01614-0

  42. [42]

    Juan Manuel Durán and Karin Rolanda Jongsma. Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. Journal of Medical Ethics, 47(5):329–335, 2021. ISSN 0306-6800. doi: 10.1136/medethics-2020-106820

  43. [43]

    Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 4768–4777, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964. URL https://dl.acm.org/doi/10.5555/3295222.3295230

  44. [44]

    Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus H. Gross. Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations, 2017. URL https://api.semanticscholar.org/CorpusID:3728967

  45. [45]

    Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017. URL https://dl.acm.org/doi/10.5555/3305890.3306024

  46. [46]

    Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833. Springer, 2014. URL https://link.springer.com/chapter/10.1007/978-3-319-10590-1_53

  47. [47]

    Maksims Ivanovs, Roberts Kadikis, and Kaspars Ozols. Perturbation-based methods for explaining deep neural networks: A survey. Pattern Recognition Letters, 150:228–234, 2021. URL https://www.sciencedirect.com/science/article/pii/S0167865521002440

  48. [48]

    C Molnar. Interpretable machine learning. Lulu.com, 2020

  49. [49]

    The disagreement problem in explainable machine learning: A practitioner’s perspective.Trans

    Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, and Himabindu Lakkaraju. The disagreement problem in explainable machine learning: A practitioner’s perspective.Trans. Mach. Learn. Res., 2024, 2022. URLhttps://api. semanticscholar.org/CorpusID:246485817

  50. [50]

    Mahato and S

    S. Mahato and S. Paul. Classification of depression patients and normal subjects based on electroencephalogram (eeg) signal using alpha power and theta asymmetry.Journal of Medical Systems, 44(1):28, 2019. doi: 10.1007/s10916-019-1486-z

  51. [51]

    Y. Liu, C. Pu, S. Xia, D. Deng, X. Wang, and M. Li. Machine learning approaches for diagnosing depression using eeg: A review. Translational Neuroscience, 13(1):224–235. doi: 10.1515/tnsci-2022-0234

  53. [53]

    J. Zhu, Z. Wang, T. Gong, S. Zeng, X. Li, B. Hu, J. Li, S. Sun, and L. Zhang. An improved classification model for depression detection using eeg and eye tracking data. IEEE Transactions on Nanobioscience, 19(3):527–537, 2020. doi: 10.1109/TNB.2020.2990690

  54. [54]

    MDD patients and healthy controls EEG data (new)

    Wajid Mumtaz. MDD patients and healthy controls EEG data (new), November 2016. URL https://figshare.com/articles/dataset/EEG_Data_New/4244171. Figshare dataset

  55. [55]

    Learning important features through propagating activation differences

    Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In International conference on machine learning, pages 3145–3153. PMLR, 2017. URL https://dl.acm.org/doi/10.5555/3305890.3306006

  56. [56]

    Grad-cam: Visual explanations from deep networks via gradient-based localization

    Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. URL https://dl.acm.org/doi/10.1007/s11263-019-01228-7

  57. [57]

    Random forests

    Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001

  58. [58]

    Aaron Fisher, Cynthia Rudin, and Francesca Dominici. All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously.Journal of Machine Learning Research, 20(177):1–81, 2019. URL http://jmlr.org/papers/v20/18-760.html

  59. [59]

    C. T. Ekstrøm, T. A. Gerds, and A. K. Jensen. Sequential rank agreement methods for comparison of ranked lists. Biostatistics, 20(4):582–598, 2019. doi: 10.1093/biostatistics/kxy017

  60. [60]

    A graph signal processing approach to study high density eeg signals in patients with disorders of consciousness

    Sepehr Mortaheb, Jitka Annen, Camille Chatelle, Helena Cassol, Geraldine Martens, Aurore Thibaut, Olivia Gosseries, and Steven Laureys. A graph signal processing approach to study high density eeg signals in patients with disorders of consciousness. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC...

  61. [61]

    Affective style and affective disorders: Perspectives from affective neuroscience

    Richard J. Davidson. Affective style and affective disorders: Perspectives from affective neuroscience. Cognition and Emotion, 12(3):307–330, 1998. doi: 10.1080/026999398379628

  62. [62]

    John J. B. Allen, Philipp M. Keune, Michael Schönenberg, and Robin Nusslock. Frontal EEG alpha asymmetry and emotion: From neural underpinnings and methodological considerations to psychopathology and social cognition. Psychophysiology, 55(1):e13028, 2018. doi: https://doi.org/10.1111/psyp.13028

  63. [63]

    Altered beta band spatial-temporal interactions during negative emotional processing in major depressive disorder: An MEG study

    Yishan Du, Lingling Hua, Shui Tian, ZhongPeng Dai, Yi Xia, Shuai Zhao, HaoWen Zou, Xiaoqin Wang, Hao Sun, Hongliang Zhou, YingHong Huang, ZhiJian Yao, and Qing Lu. Altered beta band spatial-temporal interactions during negative emotional processing in major depressive disorder: An MEG study. Journal of Affective Disorders, 338:254–261. ISSN 0165-0327. doi: 10.1016/j.jad.2023.06.001

  65. [65]

    Chaolin Teng, Mengwei Wang, Wei Wang, Jin Ma, Min Jia, Min Wu, Yuanyuan Luo, Yu Wang, Yiyang Zhang, and Jin Xu. Abnormal properties of cortical functional brain network in major depressive disorder: Graph theory analysis based on electroencephalography-source estimates. Neuroscience, 506:80–90, 2022. ISSN 0306-4522. doi: https://doi.org/10.1016/j.neuro...

  66. [66]

    Composition of brain oscillations in ongoing EEG during major depression disorder

    Alexander A. Fingelkurts, Andrew A. Fingelkurts, Heikki Rytsälä, Kirsi Suominen, Erkki Isometsä, and Seppo Kähkönen. Composition of brain oscillations in ongoing EEG during major depression disorder. Neuroscience Research, 56(2):133–144, 2006. ISSN 0168-0102. doi: https://doi.org/10.1016/j.neures.2006.06.006

  67. [67]

    A benchmark for interpretability methods in deep neural networks

    Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim. A benchmark for interpretability methods in deep neural networks. Advances in neural information processing systems, 32, 2019. URL https://arxiv.org/abs/1806.10758