Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis

2015; 2015: 316325. Epub 2015 Jun 2.

Language: English. Country: United States. Medium: print-electronic.

Document type: journal article, grant-supported research.

Persistent link: https://www.medvik.cz/link/pmid26136813

Because perceptual voice evaluation has low intra- and interrater reliability, it should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and measurements of connected speech were combined to model perceptual evaluation according to the German Roughness-Breathiness-Hoarseness (RBH) scheme. Fifty-eight connected speech samples (43 women and 15 men; age 48.7 ± 17.8 years) containing the German version of the text "The North Wind and the Sun" were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. For the human-machine correlation, Support Vector Regression was used to model the listeners' ratings from Laryngograph measurements of vocal fold cycle irregularities (CFx) and of the closed phases of vocal fold vibration (CQx), together with 33 features from a prosodic analysis module. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). CQx was a substantial contributor to the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for meaningful objective support of perceptual analysis.
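
The agreement figures in the abstract are reported as Pearson's r and Spearman's ρ. As a minimal sketch of how these two coefficients are defined, the following Python code computes both for a pair of rating series. It does not reproduce the study's Support Vector Regression or its Laryngograph and prosodic features; the function names and the two short rating lists are hypothetical and invented purely for illustration.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on the (tie-averaged) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data, NOT from the study: mean listener ratings per speaker
# and the corresponding outputs of some automatic model.
human_ratings = [0.2, 1.1, 0.8, 2.4, 1.9, 0.5]
machine_scores = [0.4, 0.9, 1.0, 2.0, 2.2, 0.3]

print("r   =", round(pearson_r(human_ratings, machine_scores), 2))
print("rho =", round(spearman_rho(human_ratings, machine_scores), 2))
```

Pearson's r measures linear agreement between the values themselves, while Spearman's ρ only compares rank orders, which may explain why the two figures can differ noticeably for the same model.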
