The use of residual analysis to improve the error rate accuracy of machine translation

Sci Rep. 2024 Apr 23;14(1):9293. [epub] 20240423

Status: PubMed-not-MEDLINE · Language: English · Country: England, Great Britain · Medium: electronic

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid38654050

Grant support
APVV-18-0473 Slovak Research and Development Agency
VEGA-1/0821/21 Scientific Grant Agency

Links

PubMed 38654050
PubMed Central PMC11039693
DOI 10.1038/s41598-024-59524-3
PII: 10.1038/s41598-024-59524-3
Knihovny.cz E-resources

The aim of the study is to compare two approaches to machine translation, statistical and neural, using automatic MT error-rate metrics and residuals. We examined four available online MT systems (statistical Google Translate, neural Google Translate, and two European Commission MT tools, statistical mt@ec and neural eTranslation) through their products (MT outputs). We propose using residual analysis to improve the accuracy of the machine translation error rate. Residuals represent a new approach to comparing the quality of statistical and neural MT outputs. The study provides new insights into evaluating machine translation quality from English and German into Slovak through automatic error-rate metrics. In the category of prediction and syntactic-semantic correlativeness, statistical MT showed a significantly higher error rate than neural MT. Conversely, in the category of lexical semantics, neural MT showed a significantly higher error rate than statistical MT. The results indicate that relying solely on the reference translation when determining MT quality is insufficient; combined with residuals, however, it offers a more objective view of MT quality and facilitates the comparison of statistical and neural MT.
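The abstract does not spell out the exact residual procedure used in the paper. As a rough illustration only, the sketch below computes a segment-level word error rate (WER, one of the error-rate metrics the study's family of methods relies on) and then a simple residual, here taken as each segment's deviation from the system-level mean error rate so two systems can be compared segment by segment. The function names and the choice of residual definition are the author's own illustrative assumptions, not the paper's method.

```python
def wer(hypothesis: str, reference: str) -> float:
    """Word error rate: token-level Levenshtein distance divided by
    the reference length (insertions, deletions, substitutions)."""
    hyp, ref = hypothesis.split(), reference.split()
    # Dynamic-programming edit distance over tokens, one row at a time.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)


def residuals(error_rates: list[float]) -> list[float]:
    """Illustrative residual: each segment's error rate minus the
    mean error rate of the whole system output."""
    mean = sum(error_rates) / len(error_rates)
    return [e - mean for e in error_rates]
```

For example, `wer("a x c", "a b c")` is 1/3 (one substitution over three reference tokens), and the residuals of a system scoring `[0.2, 0.4]` are roughly `[-0.1, 0.1]`, flagging which segments are better or worse than that system's norm.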

References

Wu Y, Qin Y. Machine translation of English speech: Comparison of multiple algorithms. J. Intell. Syst. 2022;31:159–167.

Sharma S, et al. Machine translation systems based on classical-statistical-deep-learning approaches. Electronics (Basel) 2023;12:1716.

Zhou M, Duan N, Liu S, Shum HY. Progress in neural NLP: Modeling, learning, and reasoning. Engineering. 2020;6:275–290. doi: 10.1016/j.eng.2019.12.014. DOI

Liu S, Zhu W. An analysis of the evaluation of the translation quality of neural machine translation application systems. Appl. Artif. Intell. 2023;37:2214460. doi: 10.1080/08839514.2023.2214460. DOI

Ghorbani, B. et al. Scaling laws for neural machine translation. Preprint (2021).

Lee S, et al. A survey on evaluation metrics for machine translation. Mathematics. 2023;11:1006. doi: 10.3390/math11041006. DOI

Papineni, K., Roukos, S., Ward, T. & Zhu, W. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics 311–318 (Philadelphia, 2002).

Snover, M., Dorr, B., Schwartz, R., Micciulla, L. & Makhoul, J. A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas 223–231 (2006).

Lavie, A. Evaluating the output of machine translation systems. In Proceedings of Machine Translation Summit XIII: Tutorial Abstracts (Xiamen, China, 2011).

Tatman, R. Evaluating text output in NLP: BLEU at your own risk. Towards Data Science. https://towardsdatascience.com/evaluating-text-output-in-nlp-bleu-at-your-own-risk-e8609665a213 (2019).

Mathur, N., Baldwin, T. & Cohn, T. Tangled up in BLEU: Reevaluating the evaluation of automatic machine translation evaluation metrics. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 4984–4997 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2020). 10.18653/v1/2020.acl-main.448.

Callison-Burch, C., Koehn, P. & Osborne, M. Improved statistical machine translation using paraphrases. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics 17–24 (Association for Computational Linguistics, Morristown, NJ, USA, 2006). 10.3115/1220835.1220838.

Machacek, M. & Bojar, O. Results of the WMT14 metrics shared task. In Proceedings of the Ninth Workshop on Statistical Machine Translation 293–301 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2014). 10.3115/v1/W14-3336.

Stanojević, M., Kamran, A., Koehn, P. & Bojar, O. Results of the WMT15 metrics shared task. In Proceedings of the Tenth Workshop on Statistical Machine Translation 256–273 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2015). 10.18653/v1/W15-3031.

Bojar, O., Graham, Y. & Kamran, A. Results of the WMT17 metrics shared task. In Proceedings of the Second Conference on Machine Translation 489–513 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2017). 10.18653/v1/W17-4755.

Post, M. A call for clarity in reporting BLEU scores (2018).

Nießen, S., Och, F. J., Leusch, G. & Ney, H. An evaluation tool for machine translation: Fast evaluation for MT research. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC-2000) 39–45 (2000).

Popović, M. & Ney, H. Word error rates: Decomposition over POS classes and applications for error analysis. In Proceedings of the Second Workshop on Statistical Machine Translation 48–55 (Association for Computational Linguistics, Prague, Czech Republic, 2007).

Sai AB, Mohankumar AK, Khapra MM. A survey of evaluation metrics used for NLG systems. ACM Comput. Surv. 2023;55:1–39. doi: 10.1145/3485766. DOI

Popović, M. chrF++: Words helping character n-grams. In Proceedings of the Second Conference on Machine Translation 612–618 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2017). 10.18653/v1/W17-4770.

Wang, W., Peter, J.-T., Rosendahl, H. & Ney, H. CharacTer: Translation edit rate on character level. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers 505–510 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2016). 10.18653/v1/W16-2342.

Rei, R., Stewart, C., Farinha, A. C. & Lavie, A. COMET: A neural framework for MT evaluation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2685–2702 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2020). 10.18653/v1/2020.emnlp-main.213.

Alvarez-Vidal S, Oliver A. Assessing MT with measures of PE effort. Ampersand. 2023;11:100125. doi: 10.1016/j.amper.2023.100125. DOI

Marie, B., Fujita, A. & Rubino, R. Scientific credibility of machine translation research: A meta-evaluation of 769 papers. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) 7297–7306 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2021). 10.18653/v1/2021.acl-long.566.

Munkova D, Munk M, Benko Ľ, Hajek P. The role of automated evaluation techniques in online professional translator training. PeerJ Comput. Sci. 2021;7:e706. doi: 10.7717/peerj-cs.706. PubMed DOI PMC

Google. Google Translate API—Fast Dynamic Localization—Google Cloud Platform. https://cloud.google.com/translate/ (2016).

Koehn, P. Europarl: A parallel corpus for statistical machine translation. In Proceedings of the MT Summit vol. 5 79–86 (Phuket Island, 2005).

Wu, Y. et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (2016).

eTranslation. https://webgate.ec.europa.eu/etranslation (2023).

Turovsky, B. Found in translation: More accurate, fluent sentences in Google Translate. The Keyword Google Blog. https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/ (2016).

Sheshadri SK, Gupta D, Costa-Jussà MR. A voyage on neural machine translation for Indic languages. Procedia Comput. Sci. 2023;218:2694–2712. doi: 10.1016/j.procs.2023.01.242. DOI

Pinnis, M., Krišlauks, R., Deksne, D. & Miks, T. Evaluation of neural machine translation for highly inflected and small languages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) vol. 10762 LNCS 445–456 (Springer Verlag, 2018).

Yang K, Liu D, Qu Q, Sang Y, Lv J. An automatic evaluation metric for Ancient-Modern Chinese translation. Neural Comput. Appl. 2020 doi: 10.1007/s00521-020-05216-8. DOI

Fomicheva M, Specia L. Taking MT evaluation metrics to extremes: Beyond correlation with human judgments. Comput. Linguist. 2019;45:515–558. doi: 10.1162/coli_a_00356. DOI

Moghe, N., Sherborne, T., Steedman, M. & Birch, A. Extrinsic evaluation of machine translation metrics. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 13060–13078 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2023). 10.18653/v1/2023.acl-long.730.

Almahasees ZM. Assessing the translation of Google and Microsoft Bing in translating political texts from Arabic into English. Int. J. Lang. Lit. Linguist. 2017;3:1–4.

Almahasees ZM. Assessment of Google and Microsoft Bing translation of journalistic texts. Int. J. Lang. Lit. Linguist. 2018;4:231–235.

Marzouk S, Hansen-Schirra S. Evaluation of the impact of controlled language on neural machine translation compared to other MT architectures. Mach. Transl. 2019;33:179–203. doi: 10.1007/s10590-019-09233-w. DOI

Li M, Wang M. Optimizing automatic evaluation of machine translation with the ListMLE approach. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2019;18:1–18.

Singh SM, Singh TD. Low resource machine translation of English–Manipuri: A semi-supervised approach. Expert Syst. Appl. 2022;209:118187. doi: 10.1016/j.eswa.2022.118187. DOI

Shterionov D, et al. Human versus automatic quality evaluation of NMT and PBSMT. Mach. Transl. 2018;32:217–235. doi: 10.1007/s10590-018-9220-z. DOI

Tryhubyshyn, I., Tamchyna, A. & Bojar, O. Bad MT systems are good for quality estimation. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track 200–208 (Asia-Pacific Association for Machine Translation, Macau SAR, China, 2023).

Kosta, P. Targets, theory and methods of Slavic generative syntax: Minimalism, negation and clitics. In Slavic Languages. Slavische Sprachen. An International Handbook of their Structure, their History and their Investigation. Ein internationales Handbuch ihrer Struktur, ihrer Geschichte und ihrer Erforschung (eds. Kempgen, S., Kosta, P., Berger, T. & Gutschmidt, K.) 282–316 (Mouton de Gruyter, Berlin, New York, 2009).

Benko, Ľ. & Munková, D. Application of POS tagging in machine translation evaluation. In DIVAI 2016: 11th International Scientific Conference on Distance Learning in Applied Informatics, Sturovo, May 2–4, 2016 471–489 (Wolters Kluwer, ISSN 2464-7489, Sturovo, 2016).

Munková, D., Kapusta, J. & Drlík, M. System for post-editing and automatic error classification of machine translation. In DIVAI 2016: 11th International Scientific Conference on Distance Learning in Applied Informatics, Sturovo, May 2–4, 2016 571–579 (Wolters Kluwer, ISSN 2464-7489, Sturovo, 2016).

Munková, D., Munk, M., Benko, Ľ. & Absolon, J. From old fashioned “one size fits all” to tailor made online training. In Advances in Intelligent Systems and Computing vol. 916 365–376 (Springer Verlag, 2020).

Kapusta J, Benko Ľ, Munkova D, Munk M. Analysis of edit operations for post-editing systems. Int. J. Comput. Intell. Syst. 2021;14:197. doi: 10.1007/s44196-021-00048-3. DOI

Varga D, et al. Parallel corpora for medium density languages. Proc. RANLP. 2005;2005:590–596.

Benko, Ľ., Munkova, D., Munk, M., Benková, L. & Hájek, P. Dataset of evaluation error-rate metrics for journalistic texts EN/SK and DE/SK. Mendeley Data, V1 (2024).

Qi, P., Zhang, Y., Zhang, Y., Bolton, J. & Manning, C. D. Stanza: A python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations 101–108 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2020). 10.18653/v1/2020.acl-demos.14.

Munk M, Pilkova A, Benko L, Blazekova P, Svec P. Web usage analysis of Pillar 3 disclosed information by deposit customers in turbulent times. Expert Syst. Appl. 2021;185:115503. doi: 10.1016/j.eswa.2021.115503. DOI

Munkova D, Munk M, Benko L’, Stastny J. MT evaluation in the context of language complexity. Complexity. 2021;2021:1–15. doi: 10.1155/2021/2806108. DOI

Munkova D, Munk M, Welnitzova K, Jakabovicova J. Product and process analysis of machine translation into the inflectional language. Sage Open. 2021;11:215824402110545. doi: 10.1177/21582440211054501. DOI

Munk M, Munkova D, Benko L. Towards the use of entropy as a measure for the reliability of automatic MT evaluation metrics. J. Intell. Fuzzy Syst. 2018;34:3225–3233. doi: 10.3233/JIFS-169505. DOI

Vaňko, J. Kategoriálny rámec pre analýzu chýb strojového prekladu. In Mýliť sa je ľudské (ale aj strojové) (eds. Munkova, D. & Vaňko, J.) 83–100 (UKF v Nitre, Nitra, 2017).

Welnitzova, K. Post-editing of publicistic texts in the context of thinking and editing time. In 7th SWS International Scientific Conference on Arts and Humanities - ISCAH 2020, 25–27 August, 2020 (STEF92Technology, Sofia, 2020). 10.5593/sws.iscah.2020.7.1/s26.29.

Panisova, L. & Munkova, D. Peculiarities of machine translation of newspaper articles from English to Slovak. In Forlang: cudzie jazyky v akademickom prostredí: periodický zborník vedeckých príspevkov a odborných článkov z medzinárodnej vedeckej konferencie konanej 23.–24. júna 2021 281–290 (Technická univerzita, Kosice, Slovakia, 2021).

Skadiņš R, Goba K, Šics V. Improving SMT for Baltic languages with factored models. Front. Artif. Intell. Appl. 2010;219:125–132.

Bentivogli L, Bisazza A, Cettolo M, Federico M. Neural versus phrase-based MT quality: An in-depth analysis on English-German and English-French. Comput. Speech Lang. 2018;49:52–70. doi: 10.1016/j.csl.2017.11.004. DOI

Volkart, L., Bouillon, P. & Girletti, S. Statistical vs. neural machine translation: A comparison of MTH and DeepL at Swiss post’s language service. In Proceedings of the 40th Conference Translating and the Computer 145–150 (London, UK, 2018).

Jassem K, Dwojak T. Statistical versus neural machine translation—A case study for a medium size domain-specific bilingual corpus. Poznan Stud. Contemp. Linguist. 2019;55:491–515. doi: 10.1515/psicl-2019-0018. DOI

Hasan, Md. A., Alam, F., Chowdhury, S. A. & Khan, N. Neural vs statistical machine translation: Revisiting the Bangla-English language pair. In 2019 International Conference on Bangla Speech and Language Processing (ICBSLP) 1–5 (IEEE, 2019). 10.1109/ICBSLP47725.2019.201502.

Benkova L, Munkova D, Benko Ľ, Munk M. Evaluation of English–Slovak neural and statistical machine translation. Appl. Sci. 2021;11:2948. doi: 10.3390/app11072948. DOI
