Noise-robust speech triage
AL. Bartos, T. Cipr, DJ. Nelson, P. Schwarz, J. Banowetz, L. Jerabek
Language: English
Country: United States of America
Document type: journal article
PubMed
29716295
DOI
10.1121/1.5031029
- MeSH
- algorithms * MeSH
- noise * MeSH
- humans MeSH
- speech perception / physiology MeSH
- signal processing, computer-assisted MeSH
- signal-to-noise ratio * MeSH
- speech * MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
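The selection logic described in the abstract can be illustrated with a minimal sketch. It assumes an SNR estimate in dB is already available for each recording. The lattice values, the function names, and the labels `"voiced_envelope"` and `"feature_based"` are hypothetical placeholders; the abstract gives neither the actual lattice spacing nor any implementation details, so this is not the authors' code.

```python
# Hypothetical lattice of training SNR levels in dB (assumed values;
# the abstract does not state the actual levels or their spacing).
SNR_LATTICE_DB = [-10, -5, 0, 5, 10, 15, 20]

# Boundary between the two SAD regimes, taken from the abstract (+10 dB).
SAD_SWITCH_DB = 10.0


def nearest_lattice_snr(snr_db, lattice=SNR_LATTICE_DB):
    """Return the lattice level closest to the estimated SNR.

    The model pre-trained at this level would then be used, since matching
    training and test SNR is reported to give the best SID performance.
    """
    return min(lattice, key=lambda level: abs(level - snr_db))


def choose_sad_algorithm(snr_db):
    """Pick the SAD variant for the estimated SNR.

    The abstract specifies the envelope-based detector for
    -10 dB <= SNR < +10 dB and the feature-based one for SNR >= +10 dB.
    Using the envelope-based detector below -10 dB is an assumption.
    """
    if snr_db >= SAD_SWITCH_DB:
        return "feature_based"
    return "voiced_envelope"


def plan_processing(snr_db):
    """Combine both choices into one plan for a recording."""
    return {
        "sad": choose_sad_algorithm(snr_db),
        "model_snr_db": nearest_lattice_snr(snr_db),
    }


if __name__ == "__main__":
    for estimate in (-8.0, 3.9, 12.0):
        print(estimate, plan_processing(estimate))
```

Estimates outside the lattice are clamped to the nearest endpoint, which is a design choice of this sketch rather than something stated in the record.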
Naval Research Laboratory, Washington, DC 20375, USA
Phonexia Limited and Brno University of Technology, Brno, Czech Republic
Suzanne R. Miller Associates, Marriotsville, Maryland 21104, USA
United States Department of Defense, 9800 Savage Road, Fort Meade, Maryland 20755, USA
Citations provided by Crossref.org
- 000
- 00000naa a2200000 a 4500
- 001
- bmc19045471
- 003
- CZ-PrNML
- 005
- 20200120100744.0
- 007
- ta
- 008
- 200109s2018 xxu f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1121/1.5031029 $2 doi
- 035 __
- $a (PubMed)29716295
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a xxu
- 100 1_
- $a Bartos, Anthony L $u Suzanne R. Miller Associates, Marriotsville, Maryland 21104, USA.
- 245 10
- $a Noise-robust speech triage / $c AL. Bartos, T. Cipr, DJ. Nelson, P. Schwarz, J. Banowetz, L. Jerabek,
- 520 9_
- $a A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
- 650 12
- $a algoritmy $7 D000465
- 650 _2
- $a lidé $7 D006801
- 650 12
- $a hluk $7 D009622
- 650 _2
- $a počítačové zpracování signálu $7 D012815
- 650 12
- $a poměr signál - šum $7 D059629
- 650 12
- $a řeč $7 D013060
- 650 _2
- $a percepce řeči $x fyziologie $7 D013067
- 655 _2
- $a časopisecké články $7 D016428
- 700 1_
- $a Cipr, Tomas $u Phonexia Limited and Brno University of Technology, Brno, Czech Republic.
- 700 1_
- $a Nelson, Douglas J $u United States Department of Defense, 9800 Savage Road, Fort Meade, Maryland 20755, USA.
- 700 1_
- $a Schwarz, Petr $u Phonexia Limited and Brno University of Technology, Brno, Czech Republic.
- 700 1_
- $a Banowetz, John $u Naval Research Laboratory, Washington, DC 20375, USA.
- 700 1_
- $a Jerabek, Ladislav $u Suzanne R. Miller Associates, Marriotsville, Maryland 21104, USA.
- 773 0_
- $w MED00002959 $t The Journal of the Acoustical Society of America $x 1520-8524 $g Roč. 143, č. 4 (2018), s. 2313
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/29716295 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y a $z 0
- 990 __
- $a 20200109 $b ABA008
- 991 __
- $a 20200120101120 $b ABA008
- 999 __
- $a ok $b bmc $g 1483740 $s 1084144
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2018 $b 143 $c 4 $d 2313 $e - $i 1520-8524 $m The Journal of the Acoustical Society of America $n J Acoust Soc Am $x MED00002959
- LZP __
- $a Pubmed-20200109