arXiv: 2605.14066 · v1 · submitted 2026-05-13 · eess.AS · cs.AI · cs.CL · cs.SD

A Benchmark for Early-stage Parkinson's Disease Detection from Speech

Terry Yi Zhong, Cristian Tejedor-Garcia, Khiet P. Truong, Janna Maas, Louis ten Bosch, Bastiaan R. Bloem

Pith reviewed 2026-05-15 05:31 UTC · model grok-4.3

keywords: early-stage Parkinson's disease · speech-based detection · benchmark · speaker-independent split · replicable evaluation · Parkinson's speech tasks

The pith

A benchmark with speaker-independent splits standardizes evaluation of speech-based early Parkinson's detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the first standardized benchmark for detecting early-stage Parkinson's disease from speech. Prior work has been hard to compare because studies use different datasets, languages, tasks, protocols, and definitions of early disease. The benchmark supplies a speaker-independent split on researcher-accessible datasets that cover three common speech tasks and multiple training-resource settings. Multi-dimensional breakdowns by dataset, aggregation level, gender, and disease stage are included to enable detailed, replicable comparisons. This structure supplies a common reference point that can accelerate development of reliable, non-invasive early-detection methods.

Core claim

We propose the first benchmark for speech-based EarlyPD detection, with a speaker-independent split designed for fair and replicable cross-method evaluation on researcher-accessible datasets. The benchmark covers three common speech tasks and evaluates methods under different training-resource settings, together with multi-dimensional evaluation breakdowns by dataset, aggregation level, gender, and disease stage.

What carries the argument

Speaker-independent data split applied to three common speech tasks on publicly accessible datasets, enabling controlled training-resource experiments and fine-grained performance breakdowns.
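A speaker-independent split of the kind described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's released procedure: the function name, the 20% test fraction, the seed, and the toy task names are all assumptions made for the example.

```python
import random
from collections import defaultdict

def speaker_independent_split(recordings, test_fraction=0.2, seed=0):
    """Assign whole speakers, not single recordings, to train or test.

    `recordings` is a list of (speaker_id, recording_id) pairs.
    The fraction and seed are illustrative defaults, not the paper's values.
    """
    by_speaker = defaultdict(list)
    for speaker_id, recording_id in recordings:
        by_speaker[speaker_id].append(recording_id)

    speakers = sorted(by_speaker)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(speakers)

    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    split = {"train": [], "test": []}
    for speaker_id, recs in by_speaker.items():
        part = "test" if speaker_id in test_speakers else "train"
        split[part].extend((speaker_id, r) for r in recs)
    return split

# Hypothetical toy data: 5 speakers, 3 recordings (one per task) each.
toy = [(f"spk{i}", f"spk{i}_{task}")
       for i in range(5) for task in ("vowel", "ddk", "reading")]
split = speaker_independent_split(toy, test_fraction=0.2, seed=0)
train_speakers = {s for s, _ in split["train"]}
test_speakers = {s for s, _ in split["test"]}
```

Because the unit of assignment is the speaker, all recordings of one person move together, which is what keeps the test set free of voices seen during training.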

If this is right

Methods can be compared directly under identical data splits and task conditions.
Performance can be assessed across low- and high-resource training regimes.
Breakdowns by gender and disease stage reveal where current approaches succeed or fail.
Public availability of the splits encourages reproducible research and clinical translation.
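The breakdowns mentioned in the list above amount to computing the same metric on subsets of the test set. The sketch below uses plain accuracy on toy rows; the field names, the stage labels, and the choice of accuracy are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict

def accuracy_by_group(rows, key):
    """Accuracy per value of the metadata field `key`.

    Each row is a dict with 'label', 'pred', and the field named by `key`.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        hits[row[key]] += int(row["label"] == row["pred"])
    return {group: hits[group] / totals[group] for group in totals}

# Hypothetical predictions for four PD speakers (label 1 = PD).
rows = [
    {"gender": "F", "stage": "HY1", "label": 1, "pred": 1},
    {"gender": "F", "stage": "HY2", "label": 1, "pred": 0},
    {"gender": "M", "stage": "HY1", "label": 1, "pred": 1},
    {"gender": "M", "stage": "HY2", "label": 1, "pred": 1},
]
by_gender = accuracy_by_group(rows, "gender")
by_stage = accuracy_by_group(rows, "stage")
```

Reporting per-group scores next to the overall score makes it visible when a method's average hides a weak subgroup.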

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread adoption could reduce reliance on private or mismatched datasets that currently hinder progress.
The same split structure could be reused for longitudinal tracking of speech changes over time.
Mobile or web-based screening tools might eventually be validated against the benchmark before clinical trials.

Load-bearing premise

The chosen datasets and speech tasks represent real-world early-stage Parkinson's cases, and the speaker-independent split prevents leakage while supporting generalization to new patients.

What would settle it

A method that ranks highest on the benchmark yet shows no improvement over chance when tested on an independent clinical cohort of early-stage Parkinson's patients from a different recording environment would falsify the usefulness of the benchmark.

Original abstract

Early-stage Parkinson's disease (EarlyPD) detection from speech is clinically meaningful yet underexplored, and published results are hard to compare because studies differ in datasets, languages, tasks, evaluation protocols, and EarlyPD definitions. To address this issue, we propose the first benchmark for speech-based EarlyPD detection, with a speaker-independent split designed for fair and replicable cross-method evaluation on researcher-accessible datasets. The benchmark covers three common speech tasks and evaluates methods under different training-resource settings. We also present multi-dimensional evaluation breakdowns by dataset, aggregation level, gender, and disease stage to support fine-grained comparisons and clinical adoption. Our results provide a replicable reference and actionable insights, encouraging the adoption of this publicly available benchmark to advance robust and clinically meaningful EarlyPD detection from speech.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper proposes a practical benchmark for early PD speech detection with speaker-independent splits, but the split's leakage resistance needs explicit verification in the methods.

The core contribution here is a benchmark for early-stage Parkinson's detection from speech that uses speaker-independent splits on accessible datasets and covers three standard tasks with multi-dimensional breakdowns. That directly tackles the comparability problem across prior studies, which is a real pain point in the area. They also include evaluations under varying training data amounts and breakdowns by dataset, gender, and disease stage, which gives a replicable reference point others can build on. The proposal itself is logically clean with no circular claims or fitted parameters to worry about. Credit to them for focusing on fairness and clinical relevance rather than just another model tweak.

The soft spot is the speaker split. The value of the whole thing rests on every recording from a speaker landing in only one partition, with consistent speaker IDs across tasks and sessions. The abstract does not detail how that linkage is enforced or confirm it holds for all datasets, so the no-leakage and generalization claims are hard to assess without the full protocol. Dataset representativeness for real early PD cases is also untested against broader clinical distributions.

This is for speech researchers and clinicians who evaluate detection methods and want a shared testbed. A reader working on PD speech models would get concrete value from the splits and metrics for fair comparisons. It shows clear thinking on the fragmentation issue. I would bring it to a reading group as a maybe to walk through the split construction. I would not cite it in my own work unless I adopted the benchmark. It deserves peer review so the community can check the implementation details and tighten the protocol if needed.

Referee Report

2 major / 2 minor

Summary. The paper proposes the first benchmark for speech-based EarlyPD detection, featuring speaker-independent splits on researcher-accessible datasets, three common speech tasks, evaluations under varying training-resource settings, and multi-dimensional breakdowns by dataset, aggregation level, gender, and disease stage to enable fair, replicable cross-method comparisons.

Significance. If the speaker-independent splits are correctly implemented without leakage and the datasets adequately represent real-world EarlyPD cases, the benchmark would provide a much-needed standardized framework for comparing methods in an area where inconsistent protocols have hindered progress, supporting more robust and clinically meaningful research.

major comments (2)

Section 3.2 (Speaker-independent split definition): The protocol does not explicitly verify or demonstrate that every recording from a given speaker (across all sessions, tasks, and datasets) is assigned to exactly one partition; if speaker IDs are not globally consistent or if linkage is incomplete, the split permits leakage and the generalization claim does not hold.
Section 4.3 (Dataset characteristics and representativeness): No quantitative comparison is provided between the selected datasets' EarlyPD distributions (age, severity, language) and external clinical cohorts; without this, the claim that the benchmark supports clinically meaningful evaluation remains unanchored.
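The check requested in the first major comment can be illustrated with a short routine. This is a sketch under stated assumptions, not the authors' verification code: the function name and the `dataset:speaker_id` namespacing are hypothetical, and the namespacing assumes that no person was recorded in more than one dataset.

```python
from collections import defaultdict

def find_leaking_speakers(partitions):
    """Return {global_speaker_id: partition names} for speakers found in more than one partition.

    `partitions` maps a partition name to an iterable of (dataset, speaker_id) pairs.
    """
    seen = defaultdict(set)
    for name, records in partitions.items():
        for dataset, speaker_id in records:
            # Namespacing by dataset keeps apart local IDs that repeat across corpora.
            seen[f"{dataset}:{speaker_id}"].add(name)
    return {spk: sorted(parts) for spk, parts in seen.items() if len(parts) > 1}

# Hypothetical partitions: "001" is reused as a local ID in two corpora.
clean = {
    "train": [("corpusA", "001"), ("corpusB", "001")],
    "test": [("corpusA", "002")],
}
leaky = {
    "train": [("corpusA", "001")],
    "test": [("corpusA", "001"), ("corpusA", "002")],
}
clean_report = find_leaking_speakers(clean)
leaky_report = find_leaking_speakers(leaky)
```

An empty result means every speaker sits in exactly one partition. If the same person could appear in two datasets, the namespacing step would have to be replaced by a real cross-dataset identity mapping.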

minor comments (2)

Table 1: The column headers for training-resource settings are not fully defined in the caption, making it difficult to interpret the reported metrics without cross-referencing the text.
Section 5.1: The aggregation-level breakdown would benefit from an explicit statement of how per-speaker versus per-recording metrics are computed to avoid ambiguity in the reported scores.
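The ambiguity raised in the Section 5.1 comment is easy to see on a toy example. The sketch below computes ROC AUC once over recordings and once over speakers; averaging a speaker's recording scores is an assumed aggregation rule chosen for illustration, and the paper may define speaker-level scores differently.

```python
from collections import defaultdict

def auc(scores, labels):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def speaker_level(speaker_ids, scores, labels):
    """Average recording scores per speaker; assumes one label per speaker."""
    grouped = defaultdict(list)
    label_of = {}
    for spk, score, label in zip(speaker_ids, scores, labels):
        grouped[spk].append(score)
        label_of[spk] = label
    speakers = sorted(grouped)
    mean_scores = [sum(grouped[k]) / len(grouped[k]) for k in speakers]
    return mean_scores, [label_of[k] for k in speakers]

# Hypothetical scores: speaker pd1 has three recordings, one of them scored low.
speaker_ids = ["pd1", "pd1", "pd1", "pd2", "hc1", "hc2"]
scores = [0.9, 0.8, 0.2, 0.6, 0.5, 0.3]
labels = [1, 1, 1, 1, 0, 0]

recording_auc = auc(scores, labels)
speaker_auc = auc(*speaker_level(speaker_ids, scores, labels))
```

On this toy data the recording-level AUC is 0.75 and the speaker-level AUC is 1.0, so the two aggregation levels can rank the same model quite differently.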

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for recognizing the potential value of the proposed benchmark. We address each major comment below and describe the revisions we will incorporate to strengthen the manuscript.

Referee: Section 3.2 (Speaker-independent split definition): The protocol does not explicitly verify or demonstrate that every recording from a given speaker (across all sessions, tasks, and datasets) is assigned to exactly one partition; if speaker IDs are not globally consistent or if linkage is incomplete, the split permits leakage and the generalization claim does not hold.

Authors: We agree that explicit verification is essential to substantiate the no-leakage claim. In the revised manuscript we will expand Section 3.2 with (i) a step-by-step description of the global speaker-ID linkage procedure across all datasets and sessions, (ii) pseudocode of the verification routine, and (iii) tabulated results confirming that every speaker appears in exactly one partition. The accompanying code repository will be updated to expose this verification function so readers can reproduce the check.

revision: yes

Referee: Section 4.3 (Dataset characteristics and representativeness): No quantitative comparison is provided between the selected datasets' EarlyPD distributions (age, severity, language) and external clinical cohorts; without this, the claim that the benchmark supports clinically meaningful evaluation remains unanchored.

Authors: We acknowledge that a quantitative anchor to external cohorts would strengthen clinical relevance claims. Because the benchmark is deliberately restricted to researcher-accessible datasets, obtaining matched statistics from closed clinical cohorts would require new data-access agreements outside the present scope. In the revision we will add a dedicated limitations paragraph in Section 4.3 that (a) qualitatively situates the benchmark datasets against published clinical summaries (age, UPDRS ranges, language) and (b) explicitly flags the absence of quantitative external benchmarking as a limitation, recommending it as future work once broader data-sharing agreements exist.

revision: partial

Circularity Check

0 steps flagged

No circularity: benchmark proposal is self-contained with no derivations or self-referential reductions

The paper proposes an evaluation benchmark and speaker-independent data split for EarlyPD speech detection without any mathematical derivations, fitted parameters, or load-bearing self-citations. The core claim (first benchmark with replicable split on accessible datasets) is a direct methodological contribution whose validity rests on external dataset properties and standard ML practices rather than reducing to its own inputs by construction. No equations, ansatzes, or uniqueness theorems are invoked that collapse back to the paper's own definitions or prior self-citations. The speaker-independent split is presented as an engineering choice whose correctness is verifiable against the datasets themselves, not assumed via internal logic.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are involved, as this is a benchmark proposal paper focused on evaluation protocols rather than theoretical derivations or new postulated entities.

pith-pipeline@v0.9.0 · 5454 in / 983 out tokens · 50202 ms · 2026-05-15T05:31:28.461176+00:00 · methodology

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

[1]

Introduction. Parkinson’s disease (PD) is the second most prevalent neurodegenerative disorder, affecting over 10 million people worldwide [1]. Speech impairment can appear early, sometimes years before prominent motor symptoms, and typically worsens with disease progression [2, 3]. This has motivated a recent interest in speech-based PD detection as a sca...

[2]

A Benchmark for Early-stage Parkinson's Disease Detection from Speech (internal anchor, arXiv 2026)

Benchmark Setup. 2.1. Criteria for Early-Stage PD. In prior studies, the definition of EarlyPD has not been standardized. Some studies rely on the MDS-UPDRS [20], others use the H&Y scale [22], and many also consider time after diagnosis (TAD), but no consistent rule exists [20, 23, 24]. We adopt the eligibility criteria specified in [23]: (i) Hoehn & ...

[3]

Benchmark Protocol. In this section, we describe our proposed benchmark protocol for binary speech-based EarlyPD vs. HC detection. We will release all materials needed to replicate the benchmark. 3.1. Task Selection and Configuration. All experiments in this paper are trained in a single-task setting. In the open track, we run experiments separately on thr...

[4]

Experimental Setup. 4.1. Training Data Settings. We benchmark speech-based EarlyPD detection under four training-data settings. To isolate the effect of the PD speakers, we maintain the HC cohorts the same across all configurations: 1. AllPD (EarlyPD + non-EarlyPD): Train on the full set of PD speakers across all stages from the benchmark datasets. Table 1: Resu...

[5]

Results and Discussion. 5.1. Main Results. Table 1 presents the main benchmark results across all training settings, models, and tasks. We first examine the results following the comparisons defined in Section 4.1. For comparison (i), training exclusively on early-stage patients (EarlyPD) versus the matched subset (AllPD-sub) resulted in improvement on DDK ...

[6]

... and exhibits the lowest deltas in the multi-dimensional analysis (Tables 3 and 4), whereas vowel-based evaluation is consistently more challenging, in line with prior findings [11, 37]. Due to the limited support for spontaneous speech in existing open-source methods, it was not included in the present study. We encourage future work to benchmark spontane...

[7]

Conclusion. This paper presents the first benchmark for speech-based EarlyPD detection, addressing the long-standing lack of comparability across prior studies. This benchmark provides a transparent and well-controlled protocol under different training-resource settings, including open tracks to ensure full comparability and private tracks to study the ben...

[8]

Acknowledgments. This publication is part of the project Responsible AI for Voice Diagnostics (RAIVD) with file number NGF.1607.22.013 of the research program NGF AiNed Fellowship Grants, which is financed by the Dutch Research Council (NWO). This work used the Dutch national e-infrastructure with the support of the SURF Cooperative using grant no. EINF-1...

[9]

Generative AI Use Disclosure. Generative AI tools were used for editing and polishing the language and checking the grammar of this manuscript to improve clarity and readability. The core scientific content, including the proposed benchmark, results, analysis, discussion, and conclusion, was produced solely by the human authors. All authors take full respo...

[10]

B. R. Bloem, M. S. Okun, and C. Klein, “Parkinson’s disease,” The Lancet, vol. 397, no. 10291, pp. 2284–2303, 2021.

[11]

S. Skodda, W. Grönheit, N. Mancinelli, and U. Schlegel, “Progression of voice and speech impairment in the course of Parkinson’s disease: A longitudinal study,” Parkinson’s Disease, vol. 2013, no. 1, p. 389195, 2013.

[12]

K. M. Smith and D. N. Caplan, “Communication impairment in Parkinson’s disease: Impact of motor and cognitive symptoms on speech and language,” Brain and Language, vol. 185, pp. 38–46, 2018.

[13]

L. van Gelderen and C. Tejedor-Garcia, “Innovative speech-based deep learning approaches for Parkinson’s disease classification: A systematic review,” Applied Sciences, vol. 14, p. 7873, 2024.

[14]

M. A. Hossain, E. Traini, and F. Amenta, “Machine learning applications for diagnosing Parkinson’s disease via speech, language, and voice changes: A systematic review,” Inventions, vol. 10, no. 4, p. 48, 2025.

[15]

H. Sedigh Malekroodi, B.-i. Lee, and M. Yi, “Voice-based detection of Parkinson’s disease using machine and deep learning approaches: A systematic review,” Bioengineering, vol. 12, no. 11, p. 1279, 2025.

[16]

Y. Liu, M. K. Reddy, N. Penttila, T. Ihalainen, P. Alku, and O. Rasanen, “Automatic assessment of Parkinson’s disease using speech representations of phonation and articulation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 242–255, 2023.

[17]

C. Botelho, A. Abad, T. Schultz, and I. Trancoso, “Speech as a biomarker for disease detection,” IEEE Access, vol. 12, pp. 184487–184508, 2024.

[18]

M. La Quatra, J. R. Orozco-Arroyave, and M. S. Siniscalchi, “Bilingual dual-head deep model for Parkinson’s disease detection from speech,” in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, pp. 1–5.

[19]

Y. Rahmatallah, A. S. Kemp, A. Iyer, L. Pillai, L. J. Larson-Prior, T. Virmani, and F. Prior, “Pre-trained convolutional neural networks identify Parkinson’s disease from spectrogram images of voice samples,” Scientific Reports, vol. 15, no. 1, p. 7337, 2025.

[20]

T. Y. Zhong, C. Tejedor-Garcia, M. Larson, and B. R. Bloem, RECA-PD: A Robust Explainable Cross-Attention Method for Speech-Based Parkinson’s Disease Classification. Springer Nature Switzerland, Aug. 2025, pp. 343–355. [Online]. Available: http://dx.doi.org/10.1007/978-3-032-02548-7_29

[21]

F. Cao, A. P. Vogel, P. Gharahkhani, and M. E. Renteria, “Speech and language biomarkers for Parkinson’s disease prediction, early diagnosis and progression,” npj Parkinson’s Disease, vol. 11, no. 1, p. 57, 2025.

[22]

D. Quamar, V. Ambeth Kumar, M. Rizwan, O. Bagdasar, and M. Kadar, “Voice-based early diagnosis of Parkinson’s disease using spectrogram features and AI models,” Bioengineering, vol. 12, no. 10, p. 1052, 2025.

[23]

M. Shen, P. Mortezaagha, and A. Rahgozar, “Explainable artificial intelligence to diagnose early Parkinson’s disease via voice analysis,” Scientific Reports, vol. 15, no. 1, p. 11687, 2025.

[24]

A. Favaro, A. Butala, T. Thebaud, J. Villalba, N. Dehak, and L. Moro-Velázquez, “Unveiling early signs of Parkinson’s disease via a longitudinal analysis of celebrity speech recordings,” npj Parkinson’s Disease, vol. 10, no. 1, p. 207, 2024.

[25]

M. M. Hoehn and M. D. Yahr, “Parkinsonism: onset, progression, and mortality,” Neurology, vol. 17, no. 5, pp. 427–427, 1967.

[26]

L. Jeancolas, D. Petrovska-Delacrétaz, G. Mangone, B.-E. Benkelfat, J.-C. Corvol, M. Vidailhet, S. Lehéricy, and H. Benali, “X-vectors: new quantitative biomarkers for early Parkinson’s disease detection from speech,” Frontiers in Neuroinformatics, vol. 15, p. 578369, 2021.

[28]

H. Zebidi, Z. BenMessaoud, M. Frikha, and A. Hacine-Gharbi, “A multilingual speech analysis framework for robust and explainable early detection of Parkinson’s disease,” International Journal of Speech Technology, vol. 29, no. 1, p. 1, 2026.

[29]

P. Plantinga, B. Cordelle, D. Louër, M. Ravanaelli, and D. Klein, “Does language matter for early detection of Parkinson’s disease from speech?” in 2025 IEEE 35th International Workshop on Machine Learning for Signal Processing (MLSP), 2025, pp. 1–6.

[30]

C. G. Goetz, B. C. Tilley, S. R. Shaftman, G. T. Stebbins, S. Fahn, P. Martinez-Martin, W. Poewe, C. Sampaio, M. B. Stern, R. Dodel et al., “Movement disorder society-sponsored revision of the unified Parkinson’s disease rating scale (MDS-UPDRS): scale presentation and clinimetric testing results,” Movement disorders: official journal of the Movement Disord... (2008)

[31]

A. Suppa, G. Costantini, F. Asci, P. Di Leo, M. S. Al-Wardat, G. Di Lazzaro, S. Scalise, A. Pisani, and G. Saggio, “Voice in Parkinson’s disease: a machine learning study,” Frontiers in Neurology, vol. 13, p. 831428, 2022.

[32]

M. Burq, E. Rainaldi, K. C. Ho, C. Chen, B. R. Bloem, L. J. Evers, R. C. Helmich, L. Myers, W. J. Marks Jr, and R. Kapur, “Virtual exam for Parkinson’s disease enables frequent and reliable remote measurements of motor function,” NPJ Digital Medicine, vol. 5, no. 1, p. 65, 2022.

[33]

M. Rusko, R. Sabo, M. Trnka, A. Zimmermann, R. Malaschitz, E. Ružický, P. Brandoburová, V. Kevická, and M. Škorvánek, “EWA-DB, Slovak database of speech affected by neurodegenerative diseases,” medRxiv, pp. 2023–10, 2023.

[34]

J. C. Puerta-Acevedo, M. F. Alcalá-Durand, J. D. Arias-Londoño, and J. I. Godino-Llorente, “A survey of open voice and speech datasets for the screening and evaluation of Parkinson’s Disease,” in Automatic Assessment of Parkinsonian Speech. Springer Nature Switzerland, 2026, vol. 2646, pp. 31–50.

[35]

J. R. Orozco-Arroyave, J. D. Arias-Londoño, J. F. Vargas-Bonilla, M. C. González-Rátiva, and E. Nöth, “New Spanish speech corpus database for the analysis of people suffering from Parkinson’s disease,” in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 14). ELRA, 2014, pp. 342–347.

[36]

J. Mendes-Laureano, J. A. Gómez-García, A. Guerrero-López, E. Luque-Buzo, J. D. Arias-Londoño, F. J. Grandas-Pérez, and J. I. Godino-Llorente, “NeuroVoz: a Castillian Spanish corpus of parkinsonian speech,” Scientific Data, vol. 11, no. 1, p. 1367, 2024.

[37]

J. J. L. Maas, N. De Vries, B. Bloem, and J. Kalf, “Design of the PERSPECTIVE study: PERsonalized SPEeCh Therapy for actIVE conversation in Parkinson’s disease (randomized controlled trial),” Trials, vol. 23, no. 1, p. 274, 2022.

[38]

J. J. L. Maas, N. M. de Vries, J. IntHout, B. R. Bloem, and J. G. Kalf, “Effectiveness of remotely delivered speech therapy in persons with Parkinson’s disease–a randomised controlled trial,” EClinicalMedicine, vol. 76, 2024.

[39]

C. Ferri, J. Hernández-Orallo, and P. A. Flach, “A coherent interpretation of AUC as a measure of aggregated classification performance,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 657–664.

[40]

J. Li, “Area under the ROC Curve has the most consistent evaluation for binary classification,” PLOS ONE, vol. 19, no. 12, p. e0316019, Dec. 2024.

[41]

P. Schwab and W. Karlen, “PhoneMD: Learning to diagnose Parkinson’s disease from smartphone data,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 1118–1125.

[42]

M. La Quatra, J. R. Orozco-Arroyave, and M. S. Siniscalchi, “BDHPD Github Repository,” https://github.com/MorenoLaQuatra/BDHPD, 2025, accessed: 2026-03-04.

[43]

Y. Rahmatallah, A. S. Kemp, A. Iyer, L. Pillai, L. J. Larson-Prior, T. Virmani, and F. Prior, “PD-Voice GitHub Repository,” https://github.com/uams-tri/PD-Voice, 2025, accessed: 2026-03-04.

[44]

T. Y. Zhong, “RECA-PD Github Repository,” https://github.com/terryyizhongru/RECA-PD, 2025, accessed: 2026-03-04.

[45]

E. Postma and C. Tejedor-Garcia, “Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson’s Disease Speech Data,” in Interspeech 2025, 2025, pp. 4603–4607.

[46]

D. Gimeno-Gómez, C. Botelho, A. Pompili, A. Abad, and C. Martínez-Hinarejos, “Unveiling interpretability in self-supervised speech representations for Parkinson’s diagnosis,” IEEE Journal of Selected Topics in Signal Processing, 2025.