PURPOSE: We introduce a novel methodology for voice pathology detection using the publicly available Saarbrücken Voice Database and a robust feature set combining commonly used handcrafted acoustic features with two novel ones: pitch difference (relative variation in fundamental frequency) and NaN feature (failed fundamental frequency estimation). METHODS: We evaluate six machine learning (ML) algorithms (support vector machine, k-nearest neighbors, naive Bayes, decision tree, random forest, and AdaBoost) using grid search over feasible hyperparameters and 20 480 different feature subsets. For each ML algorithm, the top 1000 combinations of classification model and feature subset are validated with repeated stratified cross-validation. To address class imbalance, we apply the k-means synthetic minority oversampling technique to augment the training data. RESULTS: Our approach achieves 85.61%, 84.69%, and 85.22% unweighted average recall for females, males, and combined results, respectively. We intentionally omit accuracy as it is a highly biased metric for imbalanced data. CONCLUSION: Our study demonstrates that, with the proposed methodology and feature engineering, ML models applied to the simplest vocal task, a sustained utterance of the vowel /a:/, have potential for detecting various voice pathologies. To enable easier use of our methodology and to support our claims, we provide a publicly available GitHub repository with DOI 10.5281/zenodo.13771573. Finally, we provide a REFORMS checklist to enhance readability, reproducibility, and justification of our approach.
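The abstract above reports unweighted average recall (UAR) instead of accuracy. The following minimal sketch, which is not taken from the authors' repository, illustrates why: on an imbalanced set, a classifier that always predicts the majority class scores high accuracy but only chance-level UAR. The toy labels and the function names are hypothetical.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

def accuracy(y_true, y_pred):
    """Fraction of all samples that were labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced toy set: 90 pathological (1), 10 healthy (0).
y_true = [1] * 90 + [0] * 10
# A degenerate classifier that always answers "pathological".
y_pred = [1] * 100

print(accuracy(y_true, y_pred))                   # 0.9
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

For two classes, UAR is the mean of sensitivity and specificity, which is why it stays at 0.5 for any classifier that ignores the minority class.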
- Keywords
- Voice pathology detection, Voice disorder detection, Saarbrücken Voice Database, SVD, Machine learning, REFORMS,
- Publication type
- Journal Article MeSH
INTRODUCTION: Enhanced contact endoscopy (ECE) is a non-invasive technique used for the assessment of superficial vascular changes of mucosal lesions at high magnification. The aim of our study was to evaluate the clinical efficacy of ECE in an intraoperative setting. METHODS: Structured assessment of laryngeal mucosal lesions using enhanced endoscopy (narrow band imaging (NBI) and ECE) was performed in a prospective clinical trial. Lesions were classified according to the European Laryngological Society Classification into non-suspicious and suspicious. Evaluations by the endoscopic methods (NBI and ECE) were correlated with histopathology, which served as the gold standard. Sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), area under the curve, diagnostic odds ratio (DOR), Kappa, incremental yield, and Youden's index were calculated for NBI and ECE. RESULTS: A total of 110 patients with 136 lesions were enrolled: 50 benign non-neoplastic lesions, eight squamous cell papillomas, 45 dysplasias, and 33 invasive squamous cell cancers. Compared to NBI, ECE demonstrated higher sensitivity (91.0% vs 83.1%) and accuracy (90.4% vs 86.8%). NBI achieved higher specificity (91.8% vs 89.7%). PPV and NPV were 92.2% and 88.1% for ECE, and 93.1% and 80.4% for NBI. ECE showed greater overall diagnostic performance, with a DOR of 88.3 vs 55.2 and a Kappa index of 0.805 vs 0.736. CONCLUSIONS: ECE enhances diagnostic sensitivity and accuracy and represents a valuable addition to laryngeal cancer diagnostics.
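The diagnostic measures listed in the abstract above all follow from a 2x2 table of endoscopic rating against histopathology. The sketch below shows the standard formulas. The counts are hypothetical: they were chosen so that the rounded sensitivity, specificity, accuracy, PPV, NPV and kappa agree with the values reported for ECE, but they are not taken from the study's data, and the function name is an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard 2x2 diagnostic measures (test result vs gold standard)."""
    n = tp + fn + tn + fp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from the row and column totals.
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": acc,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "dor": (tp * tn) / (fp * fn),  # undefined if fp or fn is zero
        "youden": sens + spec - 1,
        "kappa": (acc - p_exp) / (1 - p_exp),
    }

# Hypothetical counts (78 suspicious and 58 non-suspicious lesions by
# histopathology); not taken from the study's tables.
m = diagnostic_metrics(tp=71, fn=7, tn=52, fp=6)

print(round(100 * m["sensitivity"], 1))  # 91.0
print(round(100 * m["specificity"], 1))  # 89.7
print(round(100 * m["accuracy"], 1))     # 90.4
print(round(100 * m["ppv"], 1))          # 92.2
print(round(100 * m["npv"], 1))          # 88.1
print(round(m["kappa"], 3))              # 0.805
```

The DOR and Youden's index computed from these illustrative counts need not match the published values exactly, since the true cell counts are not given in the abstract.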
Voice is a major means of communication for humans, non-human mammals and many other vertebrates like birds and anurans. The physical and physiological principles of voice production are described by two theories: the MyoElastic-AeroDynamic (MEAD) theory and the Source-Filter Theory (SFT). While MEAD employs a multiphysics approach to understand the motor control and dynamics of self-sustained vibration of vocal folds or analogous tissues, SFT predominantly uses acoustics to understand spectral changes of the source via linear propagation through the vocal tract. Because the two theories focus on different aspects of voice production, they are often applied separately in specific areas of science and engineering. Here, we argue that the MEAD and the SFT are linked integral aspects of a holistic theory of voice production, describing a dynamically coupled system. The aim of this manuscript is to provide a comprehensive review of both the MEAD and the SFT with its nonlinear extension, the latter of which suggests a number of conceptual similarities to sound production in brass instruments. We discuss the application of both theories to voice production of humans as well as of animals. An appraisal of voice production in the light of nonlinear dynamics supports the notion that it can be best described with a systems view, considering coupled systems rather than isolated contributions of individual sub-systems.
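The linear source-filter view described in the abstract above is commonly written as a product of spectra. The block below is a textbook-style sketch, not a formula quoted from the reviewed manuscript; the symbols S, T, R and P are labels chosen here for illustration.

```latex
% Assumed notation: S = glottal source spectrum, T = vocal tract transfer
% function, R = radiation characteristic, P = radiated sound spectrum.
P(f) = S(f)\, T(f)\, R(f)
\qquad\Longleftrightarrow\qquad
20\log_{10}|P(f)| = 20\log_{10}|S(f)| + 20\log_{10}|T(f)| + 20\log_{10}|R(f)|
```

In this linear form the source S is treated as independent of the tract T. The nonlinear extension mentioned in the abstract relaxes that independence, which is where the coupling with the MEAD description enters.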
- Keywords
- MEAD, Myoelastic-aerodynamic theory, Phonation, Source-filter coupling, Source-filter interactions, Source-filter theory, Voice,
- MeSH
- Speech Acoustics * MeSH
- Acoustics MeSH
- Models, Biological MeSH
- Biomechanical Phenomena MeSH
- Phonation * MeSH
- Vocal Cords * physiology MeSH
- Voice Quality * MeSH
- Humans MeSH
- Nonlinear Dynamics MeSH
- Elasticity MeSH
- Vibration MeSH
- Vocalization, Animal MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
Voice registers are assumed to be related to different laryngeal adjustments, but objective evidence has been insufficient. While chest register is usually associated with the lower pitch range, and head register with the higher pitch range, here we investigated a professional singer who claimed an ability to produce both these registers at every pitch, throughout her entire singing range. The singer performed separated phonations alternating between the two registers (further called chest-like and head-like) at all pitches from C3 (131 Hz) to C6 (1047 Hz). We monitored the vocal fold vibrations using high-speed video endoscopy and electroglottography. The microphone sound was recorded and used for blind listening tests performed by the three authors (insiders) and by six "naive" participants (outsiders). The outsiders correctly identified the registers in 64% of the cases, and the insiders in 89% of the cases. Objective analysis revealed larger closed quotient and vertical phase differences for the chest-like register within the lower range below G4 (<392 Hz), and also a larger closed quotient at the membranous glottis within the higher range above Bb4 (>466 Hz), but not between Ab4-A4 (415-440 Hz). The normalized amplitude quotient was consistently lower in the chest-like register throughout the entire range. The results indicate that the singer employed subtle laryngeal control mechanisms for the chest-like and head-like phonations on top of the traditionally recognized low-pitched chest and high-pitched head register phenomena. Across all pitches, the chest-like register was produced with more rapid glottal closure that was usually, but not necessarily, accompanied by stronger adduction of the membranous glottis. These register changes were not always easily perceivable by listeners, however.
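The note names and frequencies quoted in the abstract above can be reproduced with the equal-temperament formula. This is a small sketch under two assumptions the abstract does not state explicitly: tuning to A4 = 440 Hz and scientific pitch notation (C4 is middle C). The helper name `note_to_hz` is hypothetical.

```python
# Semitone offsets of the natural notes above C within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name, a4=440.0):
    """Convert a name such as 'Bb4' or 'C3' to Hz in 12-tone equal temperament.

    Assumes a single-digit octave and A4 = 440 Hz unless a4 is overridden.
    """
    letter, accidental, octave = name[0], name[1:-1], int(name[-1])
    semitone = NOTE_OFFSETS[letter] + accidental.count("#") - accidental.count("b")
    midi = 12 * (octave + 1) + semitone  # MIDI numbering, where A4 = 69
    return a4 * 2 ** ((midi - 69) / 12)

for name in ["C3", "G4", "Ab4", "A4", "Bb4", "C6"]:
    print(name, round(note_to_hz(name)))
# C3 131
# G4 392
# Ab4 415
# A4 440
# Bb4 466
# C6 1047
```

The rounded values match the frequencies given in parentheses in the abstract, which suggests the authors used the same convention.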
- Keywords
- Electroglottography, High-speed videoendoscopy, Singing range, Vocal fold oscillation, Voice registers,
- MeSH
- Acoustics * MeSH
- Video Recording MeSH
- Biomechanical Phenomena MeSH
- Adult MeSH
- Electrodiagnosis MeSH
- Phonation * MeSH
- Vocal Cords physiology MeSH
- Voice Quality * MeSH
- Laryngoscopy MeSH
- Larynx physiology MeSH
- Humans MeSH
- Vibration MeSH
- Singing * MeSH
- Sound Spectrography MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Case Reports MeSH
INTRODUCTION: Vibratory positive expiratory pressure (PEP) devices are now commonly used as a resource for voice therapy. PEP devices promote improved vocal economy with the added benefit of producing a massage effect in the vocal tract. Although the benefits of PEP devices for voice have already been demonstrated, their impact on the vocal source is still not very clear. This study assesses the impact of phonation into the Acapella Choice (a type of PEP device) on the voice. METHODS: Three normophonic subjects underwent high-speed videoendoscopy assessment while pressure, flow, and electroglottographic data were collected. RESULTS: Phonation into the Acapella device produces large changes in the pressure and flow profiles, consequently affecting the voice source. Specifically, when intraoral pressure increases as a consequence of the downward movement of the rocker arm in the Acapella device (reduction of the airflow outlet), phonation is hindered, as demonstrated by the lower amplitude of vocal fold vibration and weaker modulation of the pressure and flow values by the glottal cycle. When the rocker arm in the Acapella device opens (increasing the airflow outlet), the opposite trend is observed: vocal fold vibration is aided and the modulation of pressure and flow by the vocal cycle increases. Based on the pressure and flow signals, we can assume that the impedance of the vocal tract alternates between two dominant regimes: increased inertive reactance (aided vibration) and increased resistance (hindered vibration). CONCLUSIONS: PEP devices, such as the Acapella device, are efficient in modulating the pressure and flow profiles in the vocal tract, leading to the alternation of glottal vibration from aided to hindered. These changes in the glottal vibration can be considered an additional consequence of the massage effect caused by the Acapella device.
- Keywords
- Acapella, Positive expiratory pressure, Semioccluded vocal tract exercises, Shaker, Tube phonation, Voice therapy,
- MeSH
- Video Recording * MeSH
- Biomechanical Phenomena MeSH
- Time Factors MeSH
- Equipment Design * MeSH
- Adult MeSH
- Electrodiagnosis instrumentation MeSH
- Phonation * MeSH
- Glottis * physiology MeSH
- Vocal Cords physiology MeSH
- Voice Training * MeSH
- Voice Quality * MeSH
- Laryngoscopy instrumentation MeSH
- Humans MeSH
- Pressure * MeSH
- Positive-Pressure Respiration instrumentation MeSH
- Vibration * MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
This study aimed to find the optimal geometrical configuration of the vocal tract (VT) to increase the total acoustic energy output of the human voice in the 2-3.5 kHz frequency interval, the "singer's formant cluster" (SFC), for the vowels [a:] and [i:], considering epilaryngeal changes and the velopharyngeal opening (VPO). The study applied 3D volume models of the vocal and nasal tract based on computed tomography images of a female speaker. The epilaryngeal narrowing (EN) increased the total sound pressure level (SPL) and the SPL of the SFC by diminishing the frequency difference between acoustic resonances F3 and F4 for [a:] and between F2 and F3 for [i:]. The effect reached its maximum at the low pharynx/epilarynx cross-sectional area ratio of 11.4:1 for [a:] and 25:1 for [i:]. The acoustic results obtained with the model optimization are in good agreement with the results of an internationally recognized operatic alto singer. With the EN and the VPO, the VT input reactance was positive over the entire fo singing range (ca 75-1500 Hz). The VPO increased the strength of the SFC and diminished the SPL of F1 for both vowels, but with EN, the SPL decrease was compensated. The effect of EN is not linear and depends on the vowel. Both the EN and the VPO, alone and together, can support (singing) voice production.
- Keywords
- Finite element modeling, Singing voice, Vocal efficiency, Vocal loading, Vocal tract, Voice training and therapy,
- Publication type
- Journal Article MeSH
OBJECTIVES: This study aimed to estimate vocal loading in loud phonation of a vowel and two widely used semi-occluded vocal tract exercises (SOVTEs). Impact stress (IS) was estimated from glottal closing speed, and inertial forces from the second derivative of glottal opening and closing. STUDY DESIGN: Experimental study in vivo. METHODS: A vocally healthy male sustained the [o:] vowel with habitual loudness and loudly: (1) without a tube, (2) into a silicone "Lax Vox" type tube (35 cm in length, 10 mm in diameter) with the outer end submerged 10 cm in water, and (3) into a straw (length 12.6 cm, diameter 2.5 mm) with the outer end in air. He tried to use equal effort in all loud samples. High-speed video-laryngo-endoscopy was performed with a rigid scope. Oral air pressure (Poral) was registered in a mouthpiece through which the endoscope was inserted into the larynx and to which the tubes were attached air-tightly. RESULTS: Compared with vowel phonation at habitual loudness, the mean of maximal glottal width (max GW) increased by 44.1% for loud tube phonation and decreased by 1.8% for loud straw phonation, and the mean absolute value of the minimum GW time derivative (dmin) increased by 57.1% for the tube and by 29.5% for the straw, suggesting faster glottal closing. Compared with loud vowel phonation, max GW increased by 22.6% for loud tube phonation, while it decreased by 16.6% for loud straw phonation. For the tube, dmin decreased by 7.6% and for the straw by 23.8%. Maximal acceleration (ACC) and deceleration (DC) values were larger for the tube and smaller for the straw than the values for both vowel phonations. CONCLUSIONS: IS, deduced from dmin, increased in loud SOVTEs compared to vowel phonation at a conversational loudness, but remained lower in loud SOVTEs than in loud vowel phonation, particularly with a narrow straw, which also reduced inertial forces, as suggested by the reduced ACC and DC.
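The loading measures named in the abstract above are derivatives of the glottal width (GW) trace. The sketch below shows one way such quantities could be computed with finite differences. It is an illustration under stated assumptions, not the authors' procedure: the GW waveform is synthetic (a half-rectified sine), the frame rate and amplitudes are hypothetical, and ACC and DC are assumed here to be the maximum and minimum of the second derivative.

```python
import math

def central_diff(x, dt):
    """Central-difference derivative; returns the len(x) - 2 interior samples."""
    return [(x[i + 1] - x[i - 1]) / (2.0 * dt) for i in range(1, len(x) - 1)]

def loading_measures(gw, fs):
    """Summarize a glottal-width trace gw sampled at fs frames per second."""
    dt = 1.0 / fs
    velocity = central_diff(gw, dt)            # first derivative of GW
    acceleration = central_diff(velocity, dt)  # second derivative of GW
    return {
        "max_gw": max(gw),
        "d_min": min(velocity),    # most negative slope = fastest closing
        "acc": max(acceleration),  # assumed definition of ACC
        "dc": min(acceleration),   # assumed definition of DC
    }

def percent_change(new, reference):
    """Relative change of new with respect to reference, in percent."""
    return 100.0 * (new - reference) / reference

def synthetic_gw(amplitude, f0=100.0, fs=10000.0, cycles=3):
    """Hypothetical GW trace: half-rectified sine, closed phase at zero width."""
    n = int(cycles * fs / f0)
    return [amplitude * max(0.0, math.sin(2.0 * math.pi * f0 * i / fs))
            for i in range(n)]

habitual = loading_measures(synthetic_gw(1.0), fs=10000.0)
loud = loading_measures(synthetic_gw(1.5), fs=10000.0)

print(percent_change(loud["max_gw"], habitual["max_gw"]))
print(percent_change(abs(loud["d_min"]), abs(habitual["d_min"])))
```

Because the synthetic waveform only scales in amplitude, both percentage changes come out at about 50%. Real high-speed video data would typically need smoothing before differentiation, since finite differences amplify frame-to-frame noise.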
- Keywords
- Biomechanical loading, High-speed videoendoscopy, Phonation into a tube, Vocal fatigue,
- Publication type
- Journal Article MeSH
INTRODUCTION: The vocal characteristics of countertenors (CTTs) are poorly understood due to a lack of studies in this field. This study aims to explore differences among CTTs at various professional levels, examining both disparities and congruences in singing styles to better understand the CTT voice. MATERIALS AND METHODS: Four CTTs (one student, one amateur, and two professionals) sang "La giustizia ha già sull'arco" from Handel's Giulio Cesare, with concurrent videofluoroscopic, electroglottographic (EGG), and acoustic data collection. Auditory-perceptual analysis was employed to rate professional level. Acoustic analysis included LH1-LH2, formant cluster prominence, and vibrato analysis. EGG data were analyzed using FonaDyn software, while anatomical modifications were quantified using videofluoroscopic images. RESULTS: CTTs exhibited EGG contact quotient values surpassing typical levels for inexperienced falsettos. Their vibrato characteristics aligned with expectations for classical singing, whereas the singer's formant was not observed. Variations in supraglottic adjustments among CTTs underscored the diversity of techniques employed by CTT singers. CONCLUSIONS: CTTs exhibited vocal techniques that highlighted the influence of individual preferences, professional experience, and stylistic choices in shaping their singing characteristics. The data revealed discernible differences between professional and amateur CTTs, providing insights into the impact of varying levels of experience on vocal expression.
- Keywords
- Acoustic analysis, Countertenor, Singing formant, Singing voice, Vibrato, Western operatic,
- Publication type
- Journal Article MeSH
OBJECTIVES: Positive expiratory pressure (PEP) devices have become an additional therapeutic approach for treating voice disorders. Similar to water resistance therapy (WRT), phonation in a PEP device introduces a secondary source of vibration within the vocal tract. This investigation aimed to compare the effects of phonation using a PEP device and silicone tube phonation (STP) commonly used in WRT on the vocal mechanism during phonation. METHODS: Three normophonic subjects participated in the study. High-speed videoendoscopy, pressure, airflow, electroglottography, and acoustic recordings were collected. RESULTS: The results demonstrated that phonation using both the PEP device and silicone tube induced alterations in glottal behavior. The PEP device produced more pronounced and consistent pressure oscillations, impacting the glottal cycle and influencing parameters including contact quotient (CQ), fundamental frequency, glottal area, pressure, and airflow. The regular vibratory mechanism of the PEP device systematically modified the glottal cycle. In STP, regular bubbling at lower depths of submersion produced higher CQ values, supporting the efficacy of deep bubbling exercises for inducing glottal adduction. CONCLUSIONS: The findings suggest that phonation using PEP devices has a more pronounced impact on the vocal tract and glottis. It also provides a stronger massage effect that directly affects the glottal source. Phonation with a silicone tube produces similar results, although to a lesser extent and with lower regularity. These findings offer guidance in the selection of voice therapy devices.
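The contact quotient (CQ) mentioned in the abstract above is the fraction of a glottal cycle during which the vocal folds are in contact, as read from the electroglottographic signal. The abstract does not say how CQ was computed, so the sketch below uses a simple criterion-level threshold as an assumed method. The 50% level, the signal polarity (higher value means more contact) and the toy cycle are all hypothetical.

```python
def contact_quotient(egg_cycle, criterion=0.5):
    """Fraction of samples in one EGG cycle above a criterion level.

    The level is placed at `criterion` of the peak-to-peak amplitude.
    Assumes higher EGG values mean more vocal fold contact.
    """
    lo, hi = min(egg_cycle), max(egg_cycle)
    threshold = lo + criterion * (hi - lo)
    contacted = sum(1 for v in egg_cycle if v > threshold)
    return contacted / len(egg_cycle)

# Hypothetical single cycle of 10 samples: 4 open-phase, 6 contact-phase.
cycle = [0.0] * 4 + [1.0] * 6
print(contact_quotient(cycle))  # 0.6
```

With a real signal, the cycle boundaries would first have to be detected, and the chosen criterion level affects the resulting CQ, so values from different studies are only comparable when the method matches.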
- Keywords
- Acapella, Periodogram, Positive expiratory pressure, Semi-occluded vocal tract exercises, Shaker, Tube phonation, Vocal tract impedance, Water resistance therapy,
- Publication type
- Journal Article MeSH
INTRODUCTION: Neuromuscular electrical stimulation (NMES) is a complementary resource to voice therapy that can be used for the treatment of hypofunctional voice disorders. Although positive clinical studies have been reported, neutral and even potentially harmful effects of NMES are also described in the literature. Furthermore, in the studies examined by the authors, the use of different methods of NMES has been identified, which further contributes to the inconsistent results found among studies. Moreover, limited rationale is provided for the chosen NMES parameters such as electrode placement, frequency of NMES, and length of treatment. The aims of this pilot study were to investigate (a) the impact of different frequencies of NMES on glottal configuration and vocal fold vibration patterns and (b) changes in laryngeal configuration and vocal output across 12 minutes of NMES. METHOD: Three experiments were carried out looking at changes in laryngeal configuration and voice output using different imaging techniques (fibreoptic nasolaryngoscopy and high-speed video), acoustical analysis (F0, formant analysis, SPL, CPPS, and LHSR values), electroglottography (EGG), and Relative Fundamental Frequency (RFF) analyses. Glottal parameters and acoustical measures were recorded before, during, and after stimulation. Data were collected at rest and during phonation. RESULTS: Overall, the results showed global changes in laryngeal configuration from normal to hyperfunctional (i.e., increased RFF, SPL, CQ, and stiffness). Changes were more pronounced for lower frequencies of NMES and were significant within less than three minutes of application. CONCLUSION: NMES is an effective resource for the activation of intrinsic laryngeal muscles, producing significant levels of adduction within a few minutes of application. Lower NMES frequencies produced greater muscle activation when compared to higher frequencies.
- Keywords
- Laryngeal stiffness, Relative fundamental frequency, Neuromuscular electrical stimulation, NMES, TENS, Voice therapy, Voice treatment, Electrical stimulation,
- MeSH
- Electric Stimulation MeSH
- Voice * physiology MeSH
- Vocal Cords physiology MeSH
- Laryngeal Muscles physiology MeSH
- Humans MeSH
- Pilot Projects MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH