

2 citations in PubMed

Most cited article - PubMed ID 35431364

Why was this cited? Explainable machine learning applied to COVID-19 research literature

Scientometrics. 2022 ; 127 (5) : 2313-2349. [epub] 20220409

ISSN 0138-9130

Article

Multi-class classification of COVID-19 documents using machine learning algorithms

Rabby, Gollam
Department of Information and Knowledge Engineering, Prague University of Economics and Business, Prague, Czech Republic
Berka, Petr
Department of Information and Knowledge Engineering, Prague University of Economics and Business, Prague, Czech Republic

Journal of intelligent information systems. 2023 ; 60 (2) : 571-591. [epub] 20221129

J Intell Inf Syst
ISSN 0925-9902

In most biomedical research paper corpora, document classification is a crucial task. Especially during the global epidemic, researchers across a variety of fields have needed to identify the relevant scientific papers accurately and quickly amid a flood of biomedical publications. Document classification can also help learners and researchers assign a research paper to an appropriate category and find relevant papers within a very short time. A biomedical document classifier needs to go beyond a "general" text classifier, because it need not depend only on the text itself (i.e., on titles and abstracts) but can also use other information, such as entities extracted using medical taxonomies, or bibliometric data. The main objective of this research was to determine which types of information (features) and which representation methods influence the biomedical document classification task. To this end, we ran several experiments with conventional text classification methods using different kinds of features extracted from titles, abstracts, and bibliometric data. These procedures include data cleaning, feature engineering, and multi-class classification. Eleven variants of input data tables were created and analyzed using ten machine learning algorithms. We also evaluated the data efficiency and interpretability of these models, as these are essential properties of any biomedical research paper classification system intended specifically for handling the COVID-19 health crisis. Our major findings are that TF-IDF representations outperform the entity extraction methods and that the abstract alone provides sufficient information for correct classification. Among the machine learning algorithms used, Random Forest and a Neural Network (BERT) achieved the best performance across the various forms of document representation.
Our results lead to a concrete guideline for practitioners on biomedical document classification.

Keywords
COVID-19, Machine learning algorithms, Multi-class classification, Text mining
Publication type
Journal Article
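The abstract above reports that TF-IDF representations outperformed entity-extraction features for multi-class classification of abstracts. As a minimal, dependency-free sketch of what a TF-IDF representation looks like in a multi-class setting, the code below builds smoothed, L2-normalized TF-IDF vectors and assigns each document to the class with the most similar centroid (cosine similarity). The nearest-centroid rule is a swapped-in classifier chosen for brevity, not the Random Forest or BERT models evaluated in the paper; the tokenizer, the toy abstracts and the category labels (`prevention`, `treatment`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer (an assumption): lowercase, split on non-alphanumerics.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def fit_idf(docs):
    # Smoothed inverse document frequency: idf(t) = ln((1 + N) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def l2_normalize(vec):
    # Scale a sparse vector (dict) to unit length; an empty vector stays empty.
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf(doc, idf):
    # Raw term frequency times idf, L2-normalized; terms unseen during fitting are dropped.
    tf = Counter(tokenize(doc))
    return l2_normalize({t: c * idf[t] for t, c in tf.items() if t in idf})


def cosine(a, b):
    # Both vectors are L2-normalized, so their dot product is the cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class NearestCentroid:
    """Assigns a document to the class whose normalized centroid is most similar."""

    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        sums = {}
        for doc, label in zip(docs, labels):
            acc = sums.setdefault(label, Counter())
            for t, v in tfidf(doc, self.idf).items():
                acc[t] += v
        # Normalizing the per-class sum gives the same direction as the class mean.
        self.centroids = {label: l2_normalize(acc) for label, acc in sums.items()}
        return self

    def predict(self, doc):
        vec = tfidf(doc, self.idf)
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))


# Hypothetical toy abstracts and category labels, for illustration only.
train = [
    ("Vaccine efficacy trial with antibody response in adults", "prevention"),
    ("Mask wearing and social distancing reduce transmission", "prevention"),
    ("Remdesivir treatment outcomes in hospitalised patients", "treatment"),
    ("Antiviral drug therapy reduced mortality in patients", "treatment"),
]
model = NearestCentroid().fit([d for d, _ in train], [label for _, label in train])
print(model.predict("New antiviral therapy for hospitalised patients"))  # → treatment
```

Because the vectors are sparse dictionaries, only terms that actually occur contribute to the similarity; a library implementation would normally use sparse matrices instead.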

Article

Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph

Journal of biomedical semantics. 2023 Nov 28 ; 14 (1) : 18. [epub] 20231128

J Biomed Semantics
ISSN 2041-1480

Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. This work also examines the influential scholarly document prediction task over categorized scholarly documents. We also introduce a new approach that enhances the document representation with a domain-independent knowledge graph to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation method works better than the others. Among the various machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction when a domain-independent knowledge graph, specifically DBpedia, was used to enhance the document representation for documents with categorical scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method, which we enhance with the direct type (RDF type) and unqualified relations from DBpedia. In this experiment, we found no impact of the enhanced document representation on scholarly document category classification, but we did find an effect on influential scholarly document prediction with categorical data.

Keywords
COVID-19, Domain-independent knowledge graph, Influential scholarly document prediction, Machine learning algorithms, Text mining, World Health Organization
MeSH
Algorithms
COVID-19 *
Language
Humans
Pattern Recognition, Automated *
Machine Learning
Check Tag
Humans
Publication type
Journal Article
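The abstract above describes enhancing a BOW representation with each entity's direct type (RDF type) and unqualified relations from DBpedia. A minimal sketch of that idea, assuming entity mentions have already been linked, appends type and relation tokens to the word counts so that a downstream classifier sees them as extra features. The `ENTITY_TYPES` and `ENTITY_RELATIONS` dictionaries, the `rel:` prefix and the sample sentence are hypothetical; no DBpedia query is made here, and the sketch does not claim to reproduce the paper's exact feature construction.

```python
from collections import Counter

# Hypothetical stand-ins for DBpedia lookups. A real pipeline would link entity
# mentions in the text and query DBpedia for each entity's rdf:type and related
# resources; these hand-written dictionaries only illustrate the shape of the data.
ENTITY_TYPES = {
    "remdesivir": ["dbo:Drug"],
    "influenza": ["dbo:Disease"],
}
ENTITY_RELATIONS = {
    "remdesivir": ["rel:Antiviral_drug"],
}


def tokenize(text):
    # Simplification: lowercase, split on non-alphanumerics; single-word entities only.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def bow(text):
    # Plain bag-of-words: token -> count.
    return Counter(tokenize(text))


def enhanced_bow(text, types=ENTITY_TYPES, relations=ENTITY_RELATIONS):
    # Start from the plain bag of words, then add the type and relation tokens of
    # every recognized entity, weighted by how often the entity occurs.
    counts = bow(text)
    for token, n in list(counts.items()):  # snapshot: counts is modified in the loop
        for extra in types.get(token, []) + relations.get(token, []):
            counts[extra] += n
    return counts


doc = "Remdesivir shortened recovery; remdesivir was well tolerated."
print(enhanced_bow(doc))
```

Keeping the knowledge-graph features as separate, prefixed tokens means an ordinary BOW-based classifier can consume them without any change to its input format.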
