This record comes from PubMed

Ranking pre-trained speech embeddings in Parkinson's disease detection: Does Wav2Vec 2.0 outperform its 1.0 version across speech modes and languages?

Klempir, Ondrej (Department of Biomedical Informatics, Faculty of Biomedical Engineering, Czech Technical University in Prague, Sitna Square 3105, Kladno, Czech Republic)
Skryjova, Adela (Department of Biomedical Informatics, Faculty of Biomedical Engineering, Czech Technical University in Prague, Sitna Square 3105, Kladno, Czech Republic)
Tichopad, Ales (Department of Biomedical Technology, Faculty of Biomedical Engineering, Czech Technical University in Prague, Sitna Square 3105, Kladno, Czech Republic)
Krupicka, Radim (Department of Biomedical Informatics, Faculty of Biomedical Engineering, Czech Technical University in Prague, Sitna Square 3105, Kladno, Czech Republic)

Computational and Structural Biotechnology Journal. 2025;27:2584-2601. Epub 2025 Jun 7.

Journal abbreviation: Comput Struct Biotechnol J
ISSN: 2001-0370

Status: PubMed-not-MEDLINE
Language: English
Country: Netherlands
Media: electronic-ecollection

Document type: Journal Article

Persistent link: https://www.medvik.cz/link/pmid40586101

PubMed: 40586101
PubMed Central: PMC12206144
DOI: 10.1016/j.csbj.2025.06.022
PII: S2001-0370(25)00238-7

Keywords: Classification, Parkinson's disease, Speech modes, Wav2vec 1.0, Wav2vec 2.0
Publication type (MeSH): Journal Article

Speech and language technologies are effective tools for identifying the distinct speech changes associated with Parkinson's disease (PD), enabling earlier and more accurate diagnosis. Models built on recent advances in self-supervised speech pretraining, such as Wav2Vec, have outperformed traditional feature extraction methods. Although Wav2Vec 2.0 has been used successfully for PD detection, a rigorous quantitative comparison with Wav2Vec 1.0 is needed to evaluate its advantages, limitations, and applicability across the different speech modes used in PD assessment. This study systematically compares Wav2Vec 1.0 and Wav2Vec 2.0 embeddings on three multilingual datasets, using several classification approaches to distinguish the speech of healthy controls (HC) from PD-affected speech. Both Wav2Vec 1.0 and 2.0 were also benchmarked against traditional baseline features across diverse linguistic contexts, including spontaneous speech, non-spontaneous speech, and isolated vowels. The multicriteria TOPSIS method (Technique for Order of Preference by Similarity to Ideal Solution) was used to rank the feature extraction methods. Wav2Vec 2.0 ranked highest across speech modes: its first transformer layer performed best for classifying read text and monologue, and its feature extractor performed best for vowel-based classification. Wav2Vec 1.0, although generally outperformed by Wav2Vec 2.0, remained competitive while offering a more efficient alternative. Finally, combining selected layers from both architectures improved diagnostic accuracy in vowel-based classification. This comparative analysis highlights the strengths of both Wav2Vec architectures and informs their optimal use in PD detection.
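The abstract ranks feature extraction methods with TOPSIS. As an illustration of how such a ranking works, the following is a minimal pure-Python sketch of standard TOPSIS, assuming vector normalization and user-supplied criterion weights. This record does not state which criteria, weights, or scores the authors actually used, so the feature-set names, criteria, and numbers in the usage part are hypothetical placeholders, not results from the study.

```python
import math
from typing import List, Sequence


def topsis(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    benefit: Sequence[bool],
) -> List[float]:
    """Return the TOPSIS closeness coefficient (0..1) for each row (alternative).

    matrix[i][j] is the score of alternative i on criterion j.
    benefit[j] is True if larger values of criterion j are better,
    False if smaller values are better (a cost criterion).
    """
    n_rows = len(matrix)
    n_cols = len(weights)
    if n_rows == 0:
        return []
    if any(len(row) != n_cols for row in matrix) or len(benefit) != n_cols:
        raise ValueError("matrix, weights and benefit must agree in length")

    # 1. Vector-normalise each criterion column, then apply its weight.
    weighted = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        norm = math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_rows)))
        for i in range(n_rows):
            weighted[i][j] = weights[j] * (matrix[i][j] / norm if norm else 0.0)

    # 2. Ideal (best) and anti-ideal (worst) value for each criterion.
    ideal, anti = [], []
    for j in range(n_cols):
        col = [weighted[i][j] for i in range(n_rows)]
        ideal.append(max(col) if benefit[j] else min(col))
        anti.append(min(col) if benefit[j] else max(col))

    # 3. Euclidean distance to both reference points, then relative closeness.
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Usage with HYPOTHETICAL feature sets and placeholder scores (not study results).
# Criteria: accuracy (benefit), F1 score (benefit), embedding size (cost).
candidates = {
    "feature_set_A": [0.80, 0.78, 512],
    "feature_set_B": [0.85, 0.84, 1024],
    "feature_set_C": [0.70, 0.69, 100],
}
closeness = topsis(list(candidates.values()), weights=[0.4, 0.4, 0.2],
                   benefit=[True, True, False])
for name, c in sorted(zip(candidates, closeness), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {c:.3f}")
```

Vector normalization puts criteria on very different scales (for example, an accuracy in [0, 1] and an embedding dimension in the hundreds) on a comparable footing before weighting, and marking a criterion as a cost flips which end of its range counts as ideal.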
