pith. machine review for the scientific record.

arxiv: 2604.23933 · v1 · submitted 2026-04-27 · 💻 cs.LG · eess.SP · q-bio.NC

Recognition: unknown

Robust and Clinically Reliable EEG Biomarkers: A Cross Population Framework for Generalizable Parkinson's Disease Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 04:41 UTC · model grok-4.3

classification 💻 cs.LG · eess.SP · q-bio.NC
keywords EEG biomarkers · Parkinson's disease detection · cross-population generalization · distribution shift · multi-site data · machine learning · biomarker stability

The pith

EEG biomarkers for Parkinson's disease detection become more accurate and stable when trained across multiple diverse populations rather than single cohorts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces an evaluation framework designed to measure how well EEG-based Parkinson's detection models perform when applied to new patient populations. By systematically testing every possible combination of training and testing groups from five separate cohorts, the work reveals that transfer between populations is not equal in both directions. Accuracy and the consistency of the identified biomarkers both rise as the number of distinct populations used for training grows, reaching 94.1 percent on completely held-out groups. A supporting theory based on optimizing risk over mixtures of populations explains why broader training sets produce more reliable features.

Core claim

The authors claim that training on data from multiple populations produces EEG representations that are robust to population shifts in Parkinson's disease detection. This is demonstrated through exhaustive cross-population testing showing asymmetric transfer and improved performance with greater diversity, supported by an analysis of mixture risk optimization that contracts the hypothesis space to favor generalizable solutions.

What carries the argument

A population-aware framework that uses n-gram expansion to generate all 75 directional train-test configurations across five cohorts, paired with nested cross-validation and channel selection to identify biomarkers without leakage between populations.
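For concreteness, the count of 75 configurations follows from pairing every non-empty proper training subset of the five cohorts with a single held-out test cohort not in that subset. A minimal sketch of the enumeration (the cohort names are placeholders, not the paper's labels):

```python
from itertools import combinations

cohorts = ["A", "B", "C", "D", "E"]  # placeholder names for the five cohorts

# Each configuration pairs a non-empty proper training subset with one
# held-out test cohort: sum over k of C(5, k) * (5 - k) = 20 + 30 + 20 + 5 = 75.
configs = [
    (train, test)
    for k in range(1, len(cohorts))
    for train in combinations(cohorts, k)
    for test in cohorts
    if test not in train
]

print(len(configs))  # 75
```

Directionality falls out naturally: training on {A} and testing on B is a different configuration from training on {B} and testing on A, which is what lets the framework observe asymmetric transfer.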

Load-bearing premise

The assumption that the five cohorts adequately represent the variety of real-world clinical data shifts and that the evaluation procedure completely prevents any information from the test population leaking into training.

What would settle it

A replication study on an additional independent cohort where single-population training achieves comparable or superior accuracy and biomarker stability to the multi-population approach, or detection of leakage in one of the 75 configurations.

Figures

Figures reproduced from arXiv: 2604.23933 by Arun Singh, KC Santosh, Longwei Wang, Martina Mancini, Md Rezwanul Akter Pallab, Nicholas R. Rasmussen, Rodrigue Rizk, Samuel Stuart.

Figure 1. Conceptual illustration of cross-population generalization …
Figure 2. Mathematical pipeline for channel selection. Raw multichannel …
Figure 3. Engineering diagram illustrating structured cross-population …
Figure 4. Low-dimensional projections of frame-level embeddings …
Figure 5. Performance trends across model configurations and …
Figure 6. Population-wise topological EEG channel selection frequencies …
read the original abstract

Developing robust and clinically reliable EEG biomarkers requires evaluation frameworks that explicitly address cross-population generalization in multi-site settings such as Parkinson's disease (PD) detection. Models trained under i.i.d. assumptions often capture population-specific artifacts rather than disease-relevant neural structure, leading to poor generalization across clinical cohorts. EEG further amplifies this challenge due to low signal-to-noise ratio and heterogeneous acquisition conditions. We propose a population-aware evaluation framework to assess the robustness and clinical reliability of EEG biomarkers under distribution shift. Using an n-gram expansion strategy, we enumerate all cross-population train-test configurations across five independent cohorts, resulting in 75 directional evaluations. A nested cross-validation design with integrated channel selection ensures prospective biomarker identification without population leakage. Results show that cross-population transfer is asymmetric and that both accuracy and biomarker stability improve with increasing training population diversity, achieving up to 94.1% accuracy on held-out cohorts. A theoretical analysis based on mixture risk optimization and hypothesis space contraction explains these trends, showing that multi-population training promotes population-robust representations. This work establishes a principled framework for learning robust, generalizable, and clinically reliable EEG biomarkers for multi-site biomedical applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a population-aware evaluation framework for EEG-based Parkinson's disease detection that enumerates all cross-population train-test splits across five independent cohorts via an n-gram expansion strategy, yielding 75 directional evaluations. It applies nested cross-validation with integrated channel selection to identify biomarkers prospectively without population leakage. Empirical results indicate asymmetric cross-population transfer, with both classification accuracy and biomarker stability improving as training population diversity increases, reaching a peak of 94.1% accuracy on held-out cohorts. A theoretical analysis invoking mixture risk optimization and hypothesis space contraction is presented to explain these trends by arguing that multi-population training yields more robust representations.

Significance. If the empirical trends and framework hold under rigorous validation, the work would contribute a systematic approach to assessing generalization in multi-site EEG studies for PD, with potential implications for other neurological biomarkers under distribution shift. The observation that diversity improves stability and the asymmetry finding could guide cohort selection in future clinical machine learning applications. The theoretical component, however, currently functions more as post-hoc interpretation than a predictive derivation with verifiable bounds.

major comments (3)
  1. [Abstract/Results] The reported peak accuracy of 94.1% and claims of improvement with population diversity are presented without error bars, confidence intervals, cohort sizes, class balances, or statistical significance tests. This omission is load-bearing for the central empirical claim, as it prevents evaluating whether observed gains exceed variability due to sample-size differences or acquisition artifacts across the five cohorts.
  2. [Theoretical Analysis] The explanation via mixture risk optimization and hypothesis space contraction is framed as accounting for the asymmetric transfer and stability gains, yet the abstract and description provide no explicit equations, assumptions (e.g., bounded hypothesis complexity or uniform convergence conditions), or proof sketches showing how multi-population mixtures produce contraction. This renders the link to the 75 directional evaluations unverifiable and risks circularity, as the theory interprets rather than independently predicts the trends.
  3. [Methods] The n-gram enumeration combined with nested cross-validation and channel selection is asserted to eliminate population leakage across all 75 evaluations, but without concrete details on the nesting structure (e.g., how channel selection is isolated from test-population information) or verification metrics for leakage, the robustness of the asymmetry and diversity findings cannot be confirmed against confounds such as cohort imbalance.
minor comments (2)
  1. [Abstract] The phrase 'up to 94.1% accuracy on held-out cohorts' does not specify which configuration or cohort achieves this value, making it hard to map the result onto the diversity trend.
  2. [Introduction/Methods] The manuscript would benefit from explicit statements of the five cohort sizes and acquisition parameters (e.g., channel counts, sampling rates) to allow readers to assess representativeness of the distribution shifts.
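The interval estimate requested in the first major comment is cheap to compute; a minimal sketch using a percentile bootstrap (the per-subject correctness flags here are synthetic stand-ins, not the paper's data):

```python
import random

random.seed(0)

# Synthetic per-subject correctness flags standing in for held-out predictions:
# a hypothetical 94% point accuracy over 100 subjects.
correct = [1] * 94 + [0] * 6

def bootstrap_ci(flags, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for accuracy over subjects."""
    accs = []
    for _ in range(n_boot):
        resample = [random.choice(flags) for _ in flags]
        accs.append(sum(resample) / len(resample))
    accs.sort()
    lo = accs[int((alpha / 2) * n_boot)]
    hi = accs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

lo, hi = bootstrap_ci(correct)
print(f"accuracy 0.94, 95% CI in [{lo:.2f}, {hi:.2f}]")
```

Resampling at the subject level (rather than the epoch level) is the relevant choice here, since epochs from the same subject are not independent.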

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and will incorporate revisions to enhance the statistical rigor, theoretical clarity, and methodological transparency of the work.

read point-by-point responses
  1. Referee: [Abstract/Results] The reported peak accuracy of 94.1% and claims of improvement with population diversity are presented without error bars, confidence intervals, cohort sizes, class balances, or statistical significance tests. This omission is load-bearing for the central empirical claim, as it prevents evaluating whether observed gains exceed variability due to sample-size differences or acquisition artifacts across the five cohorts.

    Authors: We agree that the current presentation of the 94.1% accuracy and diversity-related gains lacks essential statistical context. In the revised manuscript, we will add error bars (standard deviation across nested CV folds), 95% bootstrap confidence intervals, a supplementary table with exact cohort sizes and class balances for all five cohorts, and statistical significance tests (e.g., McNemar's test for pairwise comparisons and Wilcoxon signed-rank tests for diversity trends) to demonstrate that improvements exceed variability attributable to sample size or acquisition differences. revision: yes

  2. Referee: [Theoretical Analysis] The explanation via mixture risk optimization and hypothesis space contraction is framed as accounting for the asymmetric transfer and stability gains, yet the abstract and description provide no explicit equations, assumptions (e.g., bounded hypothesis complexity or uniform convergence conditions), or proof sketches showing how multi-population mixtures produce contraction. This renders the link to the 75 directional evaluations unverifiable and risks circularity, as the theory interprets rather than independently predicts the trends.

    Authors: The referee correctly identifies that the theoretical component is currently more interpretive than predictive. We will expand the Theoretical Analysis section with the explicit mixture risk equation R_mix(θ) = E_{P~μ}[R_P(θ)], the assumptions of bounded hypothesis complexity (finite VC dimension) and uniform convergence under mixture measures, and a proof sketch showing contraction of Rademacher complexity as the mixture support grows. This will provide a verifiable, forward link to the 75 directional evaluations rather than post-hoc interpretation. revision: yes

  3. Referee: [Methods] The n-gram enumeration combined with nested cross-validation and channel selection is asserted to eliminate population leakage across all 75 evaluations, but without concrete details on the nesting structure (e.g., how channel selection is isolated from test-population information) or verification metrics for leakage, the robustness of the asymmetry and diversity findings cannot be confirmed against confounds such as cohort imbalance.

    Authors: We acknowledge the need for greater methodological detail. The revised Methods section will include pseudocode and a diagram of the nested CV structure, clarifying that channel selection occurs only in inner training folds drawn exclusively from training populations (with the test population fully withheld). We will also add explicit leakage verification, including mutual information checks between selected channels and population identifiers, plus reporting of balanced accuracy to address potential cohort imbalance confounds. revision: yes
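The nesting discipline described in this response can be sketched as follows; the toy channel scores, the `select_channels` ranking, and the population labels are illustrative assumptions, not the paper's implementation:

```python
import random

random.seed(1)

# Toy data: each population maps to subjects, each subject to 8 channel scores.
populations = {
    p: {f"{p}-s{i}": [random.random() for _ in range(8)] for i in range(6)}
    for p in ["A", "B", "C", "D", "E"]
}

def select_channels(train_subjects, k=3):
    """Rank channels by mean score over *training* subjects only."""
    n_ch = len(next(iter(train_subjects.values())))
    means = [sum(v[c] for v in train_subjects.values()) / len(train_subjects)
             for c in range(n_ch)]
    return sorted(range(n_ch), key=lambda c: -means[c])[:k]

test_pop = "E"  # fully withheld population

# Channel selection sees only subjects drawn from the training populations.
train_subjects = {s: v for p, subjs in populations.items()
                  if p != test_pop for s, v in subjs.items()}
channels = select_channels(train_subjects)

# Leakage guard: no subject from the withheld population reaches selection.
assert all(not s.startswith(test_pop) for s in train_subjects)
```

In the full design this selection step would additionally sit inside the inner folds of the nested cross-validation, so that hyperparameters and channels are chosen without ever touching the outer test data.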

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain.

full rationale

The paper reports empirical results from enumerating 75 cross-population train-test configurations via n-gram expansion and nested CV with channel selection, observing asymmetric transfer and accuracy gains up to 94.1% with increased training diversity. It then states that a theoretical analysis based on mixture risk optimization and hypothesis space contraction explains these trends. No equations, self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text that would reduce the theoretical claims to the empirical inputs by construction. The theory functions as an interpretive explanation of observed patterns rather than a closed loop or ansatz smuggled via prior work. The derivation remains independent of the specific data fits.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard machine-learning assumptions about distribution shift and EEG signal separability, with no new free parameters or invented entities; the main additions are the evaluation protocol and explanatory theory.

axioms (2)
  • domain assumption EEG signals contain disease-relevant neural structure that can be separated from population-specific artifacts via appropriate cross-population evaluation.
    Core premise motivating the population-aware framework.
  • domain assumption Multi-population training contracts the hypothesis space toward population-robust representations.
    Invoked to explain why accuracy and stability improve with training diversity.
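Written out under the standard loss-based definition of population risk (an assumption here; only the expression R_mix(θ) = E_{P~μ}[R_P(θ)] from the simulated rebuttal appears in the source), the mixture risk behind the second axiom reads:

```latex
% Population risk under cohort P, and its mixture over a measure \mu on populations:
R_P(\theta) = \mathbb{E}_{(x,y)\sim P}\!\left[\ell\big(f_\theta(x),\, y\big)\right],
\qquad
R_{\mathrm{mix}}(\theta) = \mathbb{E}_{P\sim\mu}\!\left[R_P(\theta)\right].
```

A minimizer of R_mix must keep the loss small on every population carrying non-negligible μ-mass, which is the sense in which adding populations "contracts" the set of viable hypotheses toward population-robust ones.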

pith-pipeline@v0.9.0 · 5539 in / 1369 out tokens · 96439 ms · 2026-05-08T04:41:10.361826+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

58 extracted references · 7 canonical work pages · 2 internal anchors

  1. [1]

    Understanding Machine Learning: From Theory to Algorithms

    S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. Cambridge, UK: Cambridge University Press, 2014

  2. [2]

    V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998

  3. [3]

    Dataset shift in machine learning,

    J. Quionero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, “Dataset shift in machine learning,” MIT Press, 2009

  4. [4]

    A guide to cross-validation for artificial intelligence in medical imaging,

    T. J. Bradshaw, Z. Huemann, J. Hu, and A. Rahmim, “A guide to cross-validation for artificial intelligence in medical imaging,” Radiology: Artificial Intelligence, vol. 5, no. 4, p. e220232, 2023. [Online]. Available: https://doi.org/10.1148/ryai.220232

  5. [5]

    Shortcut learning in deep neural networks,

    R. Geirhos, J.-H. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann, “Shortcut learning in deep neural networks,” Nature Machine Intelligence, vol. 2, no. 11, pp. 665–673, 2020

  6. [6]

    Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study,

    J. R. Zech, M. A. Badgeley, M. Liu, A. B. Costa, J. J. Titano, and E. K. Oermann, “Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study,” PLOS Medicine, vol. 15, no. 11, p. e1002683, 2018

  7. [7]

    When more is less: Incorporating additional datasets can hurt performance by introducing spurious correlations,

    R. Compton, L. Zhang, A. Puli, and R. Ranganath, “When more is less: Incorporating additional datasets can hurt performance by introducing spurious correlations,” in Proceedings of Machine Learning Research, vol. 219, 2023, pp. 1–24

  8. [8]

    Bias in medical AI: Implications for clinical decision-making,

    J. L. Cross, M. A. Choma, and J. A. Onofrey, “Bias in medical AI: Implications for clinical decision-making,” PLOS Digital Health, vol. 3, no. 11, p. e0000651, 2024

  9. [9]

    Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans,

    M. Roberts, D. Driggs, M. Thorpe, J. Gilbey, M. Yeung, S. Ursprung, A. I. Aviles-Rivero, C. Etmann, C. McCague, L. Beer et al., “Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans,” Nature Machine Intelligence, vol. 3, no. 3, pp. 199–217, 2021

  10. [10]

    Benchmarking of EEG analysis techniques for Parkinson’s disease diagnosis: A comparison between traditional ML methods and foundation DL methods,

    D. Avola, A. Bernardini, G. Crocetti, A. Ladogana, M. Lezoche, M. Mancini, D. Pannone, and A. Ranaldi, “Benchmarking of EEG analysis techniques for Parkinson’s disease diagnosis: A comparison between traditional ML methods and foundation DL methods,” arXiv preprint arXiv:2507.13716, 2025

  11. [11]

    Resting-state EEG measures cognitive impairment in Parkinson’s disease,

    M. F. Anjum, A. I. Espinoza, R. C. Cole, A. Singh, P. May, E. Y. Uc, S. Dasgupta, and N. S. Narayanan, “Resting-state EEG measures cognitive impairment in Parkinson’s disease,” npj Parkinson’s Disease, vol. 10, no. 6, 2024

  12. [12]

    EEG mortality dataset in Parkinson’s disease,

    P. May, E. Y. Uc, and N. S. Narayanan, “EEG mortality dataset in Parkinson’s disease,” 2023. [Online]. Available: https://openneuro.org/datasets/ds007020

  13. [13]

    Parkinson’s disease detection from resting-state EEG signals using common spatial pattern, entropy, and machine learning techniques,

    M. Aljalal, S. A. Aldosari, K. AlSharabi, A. M. Abdurraqeeb, and F. A. Alturki, “Parkinson’s disease detection from resting-state EEG signals using common spatial pattern, entropy, and machine learning techniques,” Diagnostics, vol. 12, no. 5, p. 1033, 2022

  14. [14]

    Deep learning-based electroencephalography analysis: a systematic review,

    Y. Roy, H. Banville, I. Albuquerque, A. Gramfort, T. H. Falk, and J. Faubert, “Deep learning-based electroencephalography analysis: a systematic review,” Journal of Neural Engineering, vol. 16, no. 5, p. 051001, 2019

  15. [15]

    UC San Diego resting-state EEG data from patients with Parkinson’s disease,

    A. P. Rockhill, N. Jackson, J. George, A. R. Aron, and N. C. Swann, “UC San Diego resting-state EEG data from patients with Parkinson’s disease,” 2020. [Online]. Available: https://openneuro.org/datasets/ds002778

  16. [16]

    Mid-frontal theta activity is diminished during cognitive control in Parkinson’s disease,

    A. Singh, S. P. Richardson, N. Narayanan, and J. F. Cavanagh, “Mid-frontal theta activity is diminished during cognitive control in Parkinson’s disease,” Neuropsychologia, vol. 117, pp. 113–122. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0028393218302185

  18. [18]

    Frontal theta and beta oscillations during lower-limb movement in Parkinson’s disease,

    A. Singh, R. C. Cole, A. I. Espinoza, D. Brown, J. F. Cavanagh, and N. S. Narayanan, “Frontal theta and beta oscillations during lower-limb movement in Parkinson’s disease,” Clinical Neurophysiology, vol. 131, no. 3, pp. 694–702, 2020

  19. [19]

    Data leakage in deep learning studies of translational EEG,

    G. Brookshire, J. Kasper, N. M. Blauch, Y. C. Wu, R. Glatt, D. A. Merrill, S. Gerrol, K. J. Yoder, C. Quirk, and C. Lucero, “Data leakage in deep learning studies of translational EEG,” Frontiers in Neuroscience, vol. 18, p. 1373515, 2024

  20. [20]

    Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure,

    D. R. Roberts, V. Bahn, S. Ciuti, M. S. Boyce, J. Elith, G. Guillera-Arroita, S. Hauenstein, J. J. Lahoz-Monfort, B. Schröder, W. Thuiller et al., “Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure,” Ecography, vol. 40, no. 8, pp. 913–929, 2017

  21. [21]

    Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data,

    P. Schratz, J. Muenchow, E. Iturritxa, J. Richter, and A. Brenning, “Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data,” Ecological Modelling, vol. 406, pp. 109–120, 2019

  22. [22]

    Methodological issues in evaluating machine learning models for EEG seizure prediction: Good cross-validation accuracy does not guarantee generalization to new patients,

    S. Shafiezadeh, G. M. Duma, G. Mento, A. Danieli, L. Antoniazzi, F. Del Popolo Cristaldi, P. Bonanni, and A. Testolin, “Methodological issues in evaluating machine learning models for EEG seizure prediction: Good cross-validation accuracy does not guarantee generalization to new patients,” Applied Sciences, vol. 13, no. 7, p. 4262, 2023

  23. [23]

    A theory of learning from different domains,

    S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, “A theory of learning from different domains,” Machine Learning, vol. 79, no. 1, pp. 151–175, 2010

  24. [24]

    Domain adaptation with multiple sources,

    Y. Mansour, M. Mohri, and A. Rostamizadeh, “Domain adaptation with multiple sources,” Advances in Neural Information Processing Systems, vol. 21, 2008

  25. [25]

    Unbiased look at dataset bias,

    A. Torralba and A. A. Efros, “Unbiased look at dataset bias,” in CVPR 2011. IEEE, 2011, pp. 1521–1528

  26. [26]

    In search of lost domain generalization,

    I. Gulrajani and D. Lopez-Paz, “In search of lost domain generalization,” arXiv preprint arXiv:2007.01434, 2020

  27. [27]

    The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets,

    T. Saito and M. Rehmsmeier, “The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets,” PLoS ONE, vol. 10, no. 3, p. e0118432, 2015

  28. [28]

    Assessing the performance of prediction models: a framework for traditional and novel measures,

    E. W. Steyerberg, A. J. Vickers, N. R. Cook, T. Gerds, M. Gonen, N. Obuchowski, M. J. Pencina, and M. W. Kattan, “Assessing the performance of prediction models: a framework for traditional and novel measures,” Epidemiology, vol. 21, no. 1, pp. 128–138, 2010

  29. [29]

    Deep asymmetric transfer network for unbalanced domain adaptation,

    D. Wang, P. Cui, and W. Zhu, “Deep asymmetric transfer network for unbalanced domain adaptation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018

  30. [30]

    The ten–twenty electrode system of the International Federation,

    G. H. Klem, H. O. Lüders, H. H. Jasper, and C. Elger, “The ten–twenty electrode system of the International Federation,” Electroencephalography and Clinical Neurophysiology, vol. 52, pp. 3–6, 1999

  31. [31]

    American Clinical Neurophysiology Society guideline 3: A proposal for standard montages to be used in clinical EEG,

    J. N. Acharya, A. Hani, P. Thirumala, and T. N. Tsuchida, “American Clinical Neurophysiology Society guideline 3: A proposal for standard montages to be used in clinical EEG,” Journal of Clinical Neurophysiology, vol. 33, no. 4, pp. 312–316, 2016

  32. [32]

    Deep learning with convolutional neural networks for EEG decoding and visualization,

    R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with convolutional neural networks for EEG decoding and visualization,” Human Brain Mapping, 2017

  33. [33]

    Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models,

    W. Samek, T. Wiegand, and K.-R. Müller, “Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models,” ITU Journal: ICT Discoveries, 2017

  34. [34]

    Linear predictive coding distinguishes spectral EEG features of Parkinson’s disease,

    M. F. Anjum, S. Dasgupta, R. Mudumbai, A. Singh, J. F. Cavanagh, and N. S. Narayanan, “Linear predictive coding distinguishes spectral EEG features of Parkinson’s disease,” Parkinsonism & Related Disorders, vol. 79, pp. 79–85, 2020

  35. [35]

    Dynamical system based compact deep hybrid network for classification of Parkinson disease related EEG signals,

    S. A. A. Shah, L. Zhang, and A. Bais, “Dynamical system based compact deep hybrid network for classification of Parkinson disease related EEG signals,” Neural Networks, vol. 130, pp. 75–84, 2020

  36. [36]

    A novel Parkinson’s disease diagnosis index using higher-order spectra features in EEG signals,

    R. Yuvaraj, U. Rajendra Acharya, and Y. Hagiwara, “A novel Parkinson’s disease diagnosis index using higher-order spectra features in EEG signals,” Neural Computing and Applications, vol. 30, no. 4, pp. 1225–1235, 2018

  37. [37]

    A deep convolutional-recurrent neural network architecture for Parkinson’s disease EEG classification,

    S. Lee, R. Hussein, and M. J. McKeown, “A deep convolutional-recurrent neural network architecture for Parkinson’s disease EEG classification,” pp. 1–4, 2019

  38. [38]

    Multi-scale feature and multi-channel selection toward Parkinson’s disease diagnosis with EEG,

    H. Wu, J. Qi, E. Purwanto, X. Zhu, P. Yang, and J. Chen, “Multi-scale feature and multi-channel selection toward Parkinson’s disease diagnosis with EEG,” Sensors, vol. 24, no. 14, p. 4634, 2024

  39. [39]

    GEPD: GAN-enhanced generalizable model for EEG-based detection of Parkinson’s disease,

    Q. Zhang, R. Zhang, B. Zhu, J. Xiao, Y. Liu, X. Han, and Z. Wang, “GEPD: GAN-enhanced generalizable model for EEG-based detection of Parkinson’s disease,” in International Conference on Intelligent Computing. Springer, 2025, pp. 311–322

  40. [40]

    Consistent cross-validatory model-selection for dependent data: hv-block cross-validation,

    J. Racine, “Consistent cross-validatory model-selection for dependent data: hv-block cross-validation,” Journal of Econometrics, vol. 99, no. 1, pp. 39–61, 2000

  41. [41]

    blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models,

    R. Valavi, J. Elith, J. J. Lahoz-Monfort, and G. Guillera-Arroita, “blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models,” bioRxiv, p. 357798, 2018

  42. [42]

    Machine-learning-based diagnostics of EEG pathology,

    L. A. W. Gemein, R. T. Schirrmeister, P. Chrabaszcz, D. Wilson, J. Boedecker, A. Schulze-Bonhage, F. Hutter, and T. Ball, “Machine-learning-based diagnostics of EEG pathology,” NeuroImage, 2020

  43. [43]

    Channel selected stratified nested cross validation for clinically relevant EEG based Parkinson’s disease detection,

    N. R. Rasmussen, R. Rizk, L. Wang, A. Singh, and K. Santosh, “Channel selected stratified nested cross validation for clinically relevant EEG based Parkinson’s disease detection,” arXiv preprint arXiv:2601.05276, 2025

  44. [44]

    Elevated synchrony in Parkinson disease detected with electroencephalography,

    N. C. Swann, C. de Hemptinne, A. R. Aron, J. L. Ostrem, R. T. Knight, and P. A. Starr, “Elevated synchrony in Parkinson disease detected with electroencephalography,” Annals of Neurology, vol. 78, no. 5, pp. 742–750, Nov. 2015

  45. [45]

    Brain activity response to visual cues for gait impairment in Parkinson’s disease: an EEG study,

    S. Stuart, J. Wagner, S. Makeig, and M. Mancini, “Brain activity response to visual cues for gait impairment in Parkinson’s disease: an EEG study,” Neurorehabilitation and Neural Repair, vol. 35, no. 11, pp. 996–1009, 2021

  46. [46]

    DeepWhaleNet: climate change-aware FFT-based deep neural network for passive acoustic monitoring,

    N. Rasmussen, R. Rizk, O. Matoo, and K. Santosh, “DeepWhaleNet: climate change-aware FFT-based deep neural network for passive acoustic monitoring,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 38, no. 14, p. 2459014, 2024

  47. [47]

    Ecologically valid benchmarking and adaptive attention: Scalable marine bioacoustic monitoring,

    N. R. Rasmussen, R. Rizk, L. Wang, and K. Santosh, “Ecologically valid benchmarking and adaptive attention: Scalable marine bioacoustic monitoring,” arXiv preprint arXiv:2509.04682, 2025

  48. [48]

    Deep learning scaling is predictable, empirically,

    J. Hestness, S. Narang, N. Ardalani, G. Diamos, H. Jun, H. Kianinejad, M. M. A. Patwary, Y. Yang, and Y. Zhou, “Deep learning scaling is predictable, empirically,” arXiv preprint arXiv:1712.00409, 2017

  49. [49]

    Reconciling modern machine-learning practice and the classical bias–variance trade-off,

    M. Belkin, D. Hsu, S. Ma, and S. Mandal, “Reconciling modern machine-learning practice and the classical bias–variance trade-off,” Proceedings of the National Academy of Sciences, vol. 116, no. 32, pp. 15849–15854, 2019

  50. [50]

    A survey on transfer learning,

    S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2009

  51. [51]

    How transferable are features in deep neural networks?

    J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” Advances in Neural Information Processing Systems, vol. 27, 2014

  52. [52]

    Issues and recommendations from the OHBM COBIDAS MEEG committee for reproducible EEG and MEG research,

    C. Pernet, M. I. Garrido, A. Gramfort, N. Maurits, C. M. Michel, E. Pang, R. Salmelin, J. M. Schoffelen, P. A. Valdes-Sosa, and A. Puce, “Issues and recommendations from the OHBM COBIDAS MEEG committee for reproducible EEG and MEG research,” Nature Neuroscience, vol. 23, no. 12, pp. 1473–1483, 2020

  53. [53]

    Single-trial analysis and classification of ERP components—a tutorial,

    B. Blankertz, S. Lemm, M. Treder, S. Haufe, and K.-R. Müller, “Single-trial analysis and classification of ERP components—a tutorial,” NeuroImage, vol. 56, no. 2, pp. 814–825, 2011

  54. [54]

    Invariant risk minimization,

    M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz, “Invariant risk minimization,” arXiv preprint arXiv:1907.02893, 2019

  55. [55]

    An introduction to variable and feature selection,

    I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” Journal of Machine Learning Research, vol. 3, no. Mar, pp. 1157–1182, 2003

  56. [56]

    EEG-based classification of Parkinson’s disease with freezing of gait using midfrontal beta oscillations,

    S. Roy, J. Nuamah, T. J. Bosch, R. Barsainya, M. Scherer, T. Koeglsperger, K. Santosh, and A. Singh, “EEG-based classification of Parkinson’s disease with freezing of gait using midfrontal beta oscillations,” J. Integr. Neurosci., vol. 24, no. 6, 2025

  57. [57]

    Interpreting deep learning models for epileptic seizure detection on EEG signals,

    V. Gabeff, T. Teijeiro, M. Zapater, L. Cammoun, S. Rheims, P. Ryvlin, and D. Atienza, “Interpreting deep learning models for epileptic seizure detection on EEG signals,” Artificial Intelligence in Medicine, vol. 117, p. 102084, 2021

  58. [58]

    Microsoft Copilot: AI companion for productivity and research,

    Microsoft Corporation, “Microsoft Copilot: AI companion for productivity and research,” Online. Available: https://copilot.microsoft.com, 2025, accessed: Oct. 23, 2025. Developed by Microsoft as a large language model-based assistant for writing, coding, and research support.