automatic speech recognition
Algorithms for automatic speech recognition may allow people with hearing impairment and complete hearing loss to perceive the information content of speech in an alternative way. The visual analyzer can be used by projecting the recognized words onto special glasses. The somatosensory analyzer can be stimulated by an array of electrodes placed on the trunk. A pilot study indicates the feasibility and relatively good tolerability of structured low-threshold electrical stimulation using a set of concentric carbon electrodes vapor-deposited on an inert flexible membrane, or alternatively of tactile or vibration stimulators, arranged in a 3x8 or 4x8 matrix to encode the characters of the alphabet by position. Progress achieved in automatic speech recognition allows good comprehension of the content of a message. The results give hope for transmitting speech in real time via the somatosensory pathway, using an array of stimulators placed on the skin of the trunk.
Algorithms for automatic speech recognition may be useful for presenting auditory stimuli through somatosensory stimulation to profoundly or severely deaf people. Recognized text may be presented on special glasses or through electrotactile stimulators. Multichannel electrocutaneous or pulsatile stimulation is essential for tactile presentation of recognized characters in sensory substitution through the sense of touch on the trunk. This pilot study presents a system for multichannel electrotactile stimulation built to study carbon electrodes placed on an inert flexible membrane. The system used an array of 3x8 or 4x8 stimulators, each representing one character. Initial testing of electrotactile character presentation was performed in a human subject. The efficiency of automatic speech recognition algorithms and the speed of character recognition on the trunk allow good understanding of the subject matter. The test results suggest that this promising method may allow deaf people to recognize speech in real time.
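The idea of encoding characters by position in the stimulator array can be illustrated with a minimal sketch. The abstract does not state which character is assigned to which stimulator, so the row-major a to z layout on a 4x8 grid, and the names `char_to_site` and `encode`, are assumptions made only for illustration.

```python
import string

ROWS, COLS = 4, 8  # 4x8 array mentioned in the abstract (32 sites)

def char_to_site(ch):
    """Return the (row, col) of the stimulator assumed to encode `ch`.

    Assumption: letters a..z are laid out row by row; anything else
    (digits, punctuation, spaces) has no site and yields None.
    """
    idx = string.ascii_lowercase.find(ch.lower())
    if idx < 0:
        return None
    return divmod(idx, COLS)

def encode(text):
    """Translate recognized text into a sequence of stimulator sites."""
    sites = (char_to_site(c) for c in text)
    return [s for s in sites if s is not None]

print(encode("Hi!"))  # [(0, 7), (1, 0)]
```

In a real device each site would additionally need timing and intensity parameters, which the abstract does not specify.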
An experiment was carried out to determine whether the level of the speech fluency disorder can be estimated by means of automatic acoustic measurements. These measures analyze, for example, the amount of silence in a recording or the number of abrupt spectral changes in a speech signal. All the measures were designed to take into account symptoms of stuttering. In the experiment, 118 audio recordings of read speech by Czech native speakers were employed. The results indicate that the human-made rating of the speech fluency disorder in read speech can be predicted on the basis of automatic measurements. The number of abrupt spectral changes in the speech segments turns out to be the most appropriate measure to describe the overall speech performance. The results also imply that there are measures with good results describing partial symptoms (especially fixed postures without audible airflow).
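One of the measures named above, the amount of silence in a recording, can be sketched as the share of short frames whose energy falls below a threshold. The frame length, the RMS threshold, and the function name `silence_ratio` are illustrative assumptions; the abstract does not describe how the authors computed their measure.

```python
def silence_ratio(samples, frame_len=400, threshold=0.01):
    """Fraction of non-overlapping frames whose RMS energy is below `threshold`.

    `samples` is a sequence of floats in [-1, 1]; both `frame_len` and
    `threshold` are hypothetical values chosen for the example.
    """
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    frames = [samples[i:i + frame_len] for i in starts]
    if not frames:
        return 0.0
    silent = 0
    for frame in frames:
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms < threshold:
            silent += 1
    return silent / len(frames)

# Synthetic signal: two silent frames followed by two loud frames.
signal = [0.0] * 800 + [0.5] * 800
print(silence_ratio(signal))  # 0.5
```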
- MeSH
- Speech Acoustics * MeSH
- Acoustics * MeSH
- Algorithms MeSH
- Analysis of Variance MeSH
- Time Factors MeSH
- Child MeSH
- Stuttering diagnosis physiopathology psychology MeSH
- Voice Quality * MeSH
- Humans MeSH
- Speech Production Measurement * MeSH
- Adolescent MeSH
- Young Adult MeSH
- Speech Perception * MeSH
- Pattern Recognition, Automated MeSH
- Sound Spectrography MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD) considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed, confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work represents a step forward in the development of computer-aided tools for the automatic assessment of dysarthric speech signals in multiple languages.
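To make the Bark-scaled bands mentioned above more concrete, the following sketch converts a frequency in hertz to the Bark scale and assigns it to one of 25 unit-wide bands. It uses the Zwicker and Terhardt approximation; the abstract does not say which Bark formula or band edges the authors used, so both the formula choice and the helper `bark_band` are assumptions.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_band(f_hz, n_bands=25):
    """Index (0-based) of the unit-wide Bark band containing `f_hz`.

    Assumption: bands are one Bark wide and the top band absorbs
    everything above it.
    """
    return min(int(hz_to_bark(f_hz)), n_bands - 1)

for f in (100, 1000, 4000, 20000):
    print(f, round(hz_to_bark(f), 2), bark_band(f))
```

Summing the spectral energy of an unvoiced frame within each band would then give a 25-dimensional energy vector per frame.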
- MeSH
- Speech Acoustics MeSH
- Reading MeSH
- Adult MeSH
- Phonetics MeSH
- Language * MeSH
- Middle Aged MeSH
- Humans MeSH
- Parkinson Disease diagnosis physiopathology MeSH
- Area Under Curve MeSH
- Speech physiology MeSH
- Recognition, Psychology MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographicals
- Czech Republic MeSH
- Germany MeSH
- Spain MeSH
The authors report on the computer program NEWTON Dictate, which is designed to convert dictated text into written form, and describe its possible use in forensic medicine for transcribing dictated autopsy findings. The specifics of forensic medicine and their influence on the transcription process are discussed; in particular, the authors analyze how the program can best be adapted so that transcription inaccuracies caused by the nature of autopsy work are kept to a minimum. This involves voice adaptation and the use of a suitable vocabulary. The role of the autopsy secretary is not called into question in the text.
The paper describes the computer program NEWTON Dictate, which is used for speech recognition and transcription. The possible uses of this program in forensic medicine are discussed, especially for the recognition and transcription of autopsy findings. The specific conditions of forensic medicine are introduced, with a focus on their influence on speech recognition and transcription. The authors analyze how the program can be improved to reduce the recognition and transcription mistakes that may occur during the autopsy workflow. Such improvement involves appropriate vocabulary usage and special vocal adaptation. The role of the autopsy secretary is acknowledged.
- MeSH
- Electronic Data Processing methods organization & administration instrumentation MeSH
- Databases, Factual utilization MeSH
- Humans MeSH
- Autopsy methods instrumentation MeSH
- Computers utilization MeSH
- Speech Recognition Software utilization MeSH
- Forensic Medicine methods organization & administration instrumentation MeSH
- Terminology as Topic MeSH
- Records as Topic MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Review MeSH
This paper discusses the impact of the classification method and feature selection on speech emotion recognition accuracy. Selecting the correct parameters in combination with the classifier is an important part of reducing the computational complexity of the system. This step is necessary especially for systems that will be deployed in real-time applications. Speech emotion recognition systems are being developed and improved because they are widely usable in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured for selections of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and feature groups for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system with respect to its accuracy and efficiency.
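Of the three classifiers compared above, k-nearest neighbours is the simplest to sketch. The example below is a generic majority-vote k-NN on toy two-dimensional feature vectors; the feature values, labels, and the name `knn_predict` are invented for illustration and are not taken from the paper or from the Berlin database.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features, e.g. (normalized mean pitch, normalized energy).
train = [
    ((0.0, 0.0), "neutral"), ((0.0, 1.0), "neutral"), ((1.0, 0.0), "neutral"),
    ((5.0, 5.0), "stress"), ((5.0, 6.0), "stress"), ((6.0, 5.0), "stress"),
]
print(knn_predict(train, (0.2, 0.1)))  # neutral
print(knn_predict(train, (5.5, 5.2)))  # stress
```

Because k-NN compares the query against every stored vector, reducing the number of features directly reduces the cost per decision, which is the kind of complexity trade-off the abstract refers to.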
- MeSH
- Algorithms * MeSH
- Databases, Factual MeSH
- Emotions physiology MeSH
- Voice Quality MeSH
- Humans MeSH
- Neural Networks, Computer MeSH
- Signal Processing, Computer-Assisted instrumentation MeSH
- Speech physiology MeSH
- ROC Curve MeSH
- Pattern Recognition, Automated * MeSH
- Pattern Recognition, Physiological physiology MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
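The inter-rater and human-machine agreement figures above are correlation coefficients. A minimal sketch of Pearson's r between perceptual ratings and automatic scores follows; the rating values are made up for the example, and the abstract does not state which correlation coefficient the authors used, so Pearson's r is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Hypothetical data: mean expert rating (5-point scale) vs. automatic score.
expert = [1.2, 2.0, 2.8, 3.6, 4.4]
machine = [1.0, 2.3, 2.5, 3.9, 4.1]
print(round(pearson_r(expert, machine), 2))
```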
- MeSH
- Speech Acoustics * MeSH
- Acoustics MeSH
- Hoarseness diagnosis physiopathology MeSH
- Chronic Disease MeSH
- Reading MeSH
- Adult MeSH
- Voice Quality * MeSH
- Middle Aged MeSH
- Humans MeSH
- Speech Production Measurement methods MeSH
- Adolescent MeSH
- Young Adult MeSH
- Signal Processing, Computer-Assisted * MeSH
- Predictive Value of Tests MeSH
- Regression Analysis MeSH
- Reproducibility of Results MeSH
- Pattern Recognition, Automated * MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Support Vector Machine MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
INTRODUCTION: Dysarthria, a motor speech disorder caused by muscle weakness or paralysis, severely impacts speech intelligibility and quality of life. The condition is prevalent in motor speech disorders such as Parkinson's disease (PD), atypical parkinsonism such as progressive supranuclear palsy (PSP), Huntington's disease (HD), and amyotrophic lateral sclerosis (ALS). Improving intelligibility is not only an outcome that matters to patients but can also play a critical role as an endpoint in clinical research and drug development. This study validates a digital measure for speech intelligibility, the ki: SB-M intelligibility score, across various motor speech disorders and languages following the Digital Medicine Society (DiMe) V3 framework. METHODS: The study used four datasets: healthy controls (HCs) and patients with PD, HD, PSP, and ALS from Czech, Colombian, and German populations. Participants' speech intelligibility was assessed using the ki: SB-M intelligibility score, which is derived from automatic speech recognition (ASR) systems. Verification with inter-ASR reliability and temporal consistency, analytical validation with correlations to gold standard clinical dysarthria scores in each disease, and clinical validation with group comparisons between HCs and patients were performed. RESULTS: Verification showed good to excellent inter-rater reliability between ASR systems and fair to good consistency. Analytical validation revealed significant correlations between the SB-M intelligibility score and established clinical measures for speech impairments across all patient groups and languages. Clinical validation demonstrated significant differences in intelligibility scores between pathological groups and healthy controls, indicating the measure's discriminative capability. DISCUSSION: The ki: SB-M intelligibility score is a reliable, valid, and clinically relevant tool for assessing speech intelligibility in motor speech disorders. It holds promise for improving clinical trials through automated, objective, and scalable assessments. Future studies should explore its utility in monitoring disease progression and therapeutic efficacy as well as add data from further dysarthrias to the validation.
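The abstract does not define how the ki: SB-M score is derived from ASR output. As a generic illustration only, one common ASR-based proxy for intelligibility is word accuracy, that is, one minus the word error rate between the text the speaker was asked to read and the ASR transcript. The function names below are hypothetical, and this sketch should not be read as the validated score.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (substitute/insert/delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def word_accuracy(reference, hypothesis):
    """1 - WER, clipped at 0; a rough proxy for intelligibility."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference text must not be empty")
    wer = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)

target = "the north wind and the sun"
print(word_accuracy(target, "the north win and sun"))
```

Running the same recording through several ASR systems and comparing the resulting scores is one way to examine the inter-ASR reliability that the verification step refers to.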
- Publication Type
- Journal Article MeSH
OBJECTIVE: Proper detection of cognitive impairment has become a challenge for the scientific community. Alzheimer's Disease (AD), the most common cause of dementia, has a high prevalence that is increasing at a fast pace towards epidemic level. In the not-so-distant future this could have a dramatic social and economic impact. In this scenario, an early and accurate diagnosis of AD could help to decrease its effects on patients, relatives and society. Over the last decades there have been useful advances not only in classic assessment techniques, but also in novel non-invasive screening methodologies. METHODS: Among these methods, automatic analysis of speech, one of the first skills damaged in AD patients, is a natural and useful low-cost tool for diagnosis. RESULTS: In this paper a non-linear multi-task approach based on automatic speech analysis is presented. Three tasks with different language complexity levels are analyzed, and promising results that encourage a deeper assessment are obtained. Automatic classification was carried out using a classic Multilayer Perceptron (MLP) and Deep Learning by means of Convolutional Neural Networks (CNN) (biologically inspired variants of MLPs) over the tasks with classic linear features, perceptual features, Castiglioni fractal dimension and Multiscale Permutation Entropy. CONCLUSION: Finally, the most relevant features are selected by means of the non-parametric Mann-Whitney U-test.
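The feature selection step can be illustrated with the statistic underlying the Mann-Whitney U-test. The sketch below computes U by direct pairwise comparison and turns it into a simple separation score for ranking features; it omits the p-value, and the feature names and values are invented, so this is an assumption-laden illustration of the idea, not the authors' procedure.

```python
def mann_whitney_u(a, b):
    """U statistic for sample `a`: number of pairs with a > b (ties count 0.5)."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def separation(a, b):
    """0 when the groups overlap completely, 1 when they are fully separated."""
    return abs(mann_whitney_u(a, b) / (len(a) * len(b)) - 0.5) * 2

# Hypothetical feature values for controls vs. patients.
features = {
    "pause_ratio": ([0.10, 0.12, 0.15], [0.30, 0.28, 0.35]),
    "mean_pitch": ([180, 200, 210], [190, 205, 195]),
}
ranked = sorted(features, key=lambda name: separation(*features[name]), reverse=True)
print(ranked)  # ['pause_ratio', 'mean_pitch']
```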
- MeSH
- Alzheimer Disease diagnosis MeSH
- Early Diagnosis MeSH
- Deep Learning MeSH
- Diagnosis, Computer-Assisted * methods MeSH
- Adult MeSH
- Cognitive Dysfunction diagnosis MeSH
- Cohort Studies MeSH
- Middle Aged MeSH
- Humans MeSH
- Speech Production Measurement MeSH
- Nonlinear Dynamics MeSH
- Neuropsychological Tests MeSH
- Speech * MeSH
- Pattern Recognition, Automated * methods MeSH
- Aged MeSH
- Speech Recognition Software MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Aim: The research of the dental section of the Centre of Biomedical Informatics focuses on the development of an electronic health record (EHR) for dentistry. Introduction: First, the electronic dental cross DentCross was constructed, which graphically represents the patient's dental data. It was supplemented with voice control by an automatic speech recognition (ASR) system and a voice synthesis (TTS) module. Methods: The work also aimed to make the system as comprehensive as possible and to automate it to a certain degree. For this reason it was extended with a special record form for temporomandibular joint disorders (TMD). Results: Based on experience with the older version, the knowledge base (KB) for TMD was structured differently. The diagnostic classification scheme of the American Academy of Orofacial Pain (AAOP) was used. The KB was created in the MUDR KB Editor application. On this basis, a relational database and the user recording application itself were created in the MUDR EHR and MUDRLite programs. Conclusion: The main advantage, however, is that the system (through a so-called "custom" component) determines the probable diagnosis of the disorder (AAOP classification) from typical data recorded in the electronic form during the examination. The component itself was implemented with the MS Visual Studio .NET 2003 development tool and is programmed in C#.
Background: The research of the dental section of the Centre of Biomedical Informatics focuses on developing an electronic health record (EHR) for dentistry. Objectives: First, an electronic dental cross, "DentCross", was constructed to represent a patient's dental data graphically. It was then supplemented with an automatic speech recognition (ASR) system and a voice synthesis (TTS) module. Methods: The main goal of this work was to make the system as comprehensive and automated as possible. For this reason, it was extended with a special record form for temporomandibular disorders (TMD). Results: Based on experience with the old version, the knowledge base (KB) for TMD was structured differently. The diagnostic classification scheme of the American Academy of Orofacial Pain (AAOP) was used. The KB was created in the MUDR KB Editor application. On this basis, a relational database was constructed and a user interface for data collection was developed on the MUDR and MUDRLite EHR systems. Conclusions: The main advantage of the system is that a "custom" component determines the probable diagnosis (AAOP classification) from the characteristic data recorded in the electronic form during the examination. The component was created with the MS Visual Studio .NET 2003 development tool and is programmed entirely in C#.
- Keywords
- temporomandibular joint disorders, DentCross, electronic health record, AAOP classification, MUDR EHR, MUDRLite, MUDR KB Editor
- MeSH
- Medical Records Systems, Computerized statistics & numerical data trends utilization MeSH
- Electronic Health Records statistics & numerical data trends utilization MeSH
- Financing, Organized MeSH
- Forms and Records Control methods statistics & numerical data utilization MeSH
- Humans MeSH
- Dentistry methods statistics & numerical data trends MeSH
- Check Tag
- Humans MeSH
PURPOSE: This study aimed to evaluate the reliability of different approaches for estimating the articulation rates in connected speech of Parkinsonian patients with different stages of neurodegeneration compared to healthy controls. METHOD: Monologues and reading passages were obtained from 25 patients with idiopathic rapid eye movement sleep behavior disorder (iRBD), 25 de novo patients with Parkinson's disease (PD), 20 patients with multiple system atrophy (MSA), and 20 healthy controls. The recordings were subsequently evaluated using eight syllable localization algorithms, and their performances were compared to a manual transcript used as a reference. RESULTS: The Google & Pyphen method, based on automatic speech recognition followed by hyphenation, outperformed the other approaches (automated vs. hand transcription: r > .87 for monologues and r > .91 for reading passages, p < .001) in precise feature estimates and resilience to dysarthric speech. The Praat script algorithm achieved sufficient robustness (automated vs. hand transcription: r > .65 for monologues and r > .78 for reading passages, p < .001). Compared to the control group, we detected a slow rate in patients with MSA and a tendency toward a slower rate in patients with iRBD, whereas the articulation rate was unchanged in patients with early untreated PD. CONCLUSIONS: The state-of-the-art speech recognition tool provided the most precise articulation rate estimates. If a speech recognizer is not accessible, the freely available Praat script based on simple intensity thresholding might still provide robust results even in severe dysarthria. Automated articulation rate assessment may serve as a natural, inexpensive biomarker for monitoring disease severity and for the differential diagnosis of Parkinsonism.
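The pipeline of speech recognition followed by syllable counting can be sketched as follows. Articulation rate is taken here as syllables per second of speech with pauses excluded. To keep the example self-contained, a crude vowel-group counter stands in for a real hyphenation dictionary such as Pyphen, and the transcript and durations are invented; none of this reproduces the authors' implementation.

```python
import re

def count_syllables(word):
    """Crude stand-in for dictionary hyphenation: count vowel groups."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def articulation_rate(transcript, total_s, pause_s):
    """Syllables per second of actual speech (total time minus pauses)."""
    words = re.findall(r"[A-Za-z']+", transcript)
    syllables = sum(count_syllables(w) for w in words)
    speaking_time = total_s - pause_s
    if speaking_time <= 0:
        raise ValueError("speaking time must be positive")
    return syllables / speaking_time

# Hypothetical ASR transcript of a 3 s recording containing 1 s of pauses.
print(articulation_rate("the north wind and the sun", total_s=3.0, pause_s=1.0))  # 3.0
```

The vowel-group heuristic is tuned to English spelling and will miscount many words; a language-specific hyphenation dictionary would be needed for Czech recordings.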
- MeSH
- Dysarthria diagnosis etiology MeSH
- Humans MeSH
- Multiple System Atrophy * MeSH
- Parkinson Disease * complications diagnosis MeSH
- Speech MeSH
- Reproducibility of Results MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH