
Author: Chalupova, Eliska

2 records in Medvik

Article

ENNGene: an Easy Neural Network model building tool for Genomics

Chalupová, Eliška: Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia; Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
Vaculík, Ondřej: Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia; Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
Poláček, Jakub: Faculty of Informatics, Masaryk University, Brno, Czechia
Jozefov, Filip: Faculty of Informatics, Masaryk University, Brno, Czechia
Majtner, Tomáš: Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
Alexiou, Panagiotis: Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia; panagiotis.alexiou@ceitec.muni.cz

BMC Genomics. 2022;23(1):248. Published 2022-03-31.
ISSN 1471-2164

BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning (DL) as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. DL methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics. Even so, developing and training such models is beyond the abilities of most researchers in the field.

RESULTS: Here we present ENNGene (Easy Neural Network model building tool for Genomics). This tool simplifies the training of custom CNN or hybrid CNN-RNN models on genomic data through an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure. It also performs all the necessary preprocessing steps, so simple input such as genomic coordinates is sufficient. The user selects and fully customizes the network architecture, from the number and types of layers to the precise set-up of each layer. ENNGene then handles all steps of training and evaluating the model. It exports useful metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, which gives the user a graphical representation of the attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset. These models quickly reach the state of the art, and they improve the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein.

CONCLUSIONS: The role of DL in big data analysis in the near future is indisputable, so it is important to make it available to a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL. With it, they can gain better insights into, and extract important information from, the large amounts of data available in the field.
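The abstract names two technical steps: turning sequence input into something a network can consume, and attributing a prediction back to each input position with Integrated Gradients. The sketch below is minimal and framework-free, and it illustrates both ideas under stated assumptions; it is not ENNGene's implementation. The abstract does not specify ENNGene's sequence encoding, so one-hot encoding is an assumption here. A hypothetical toy logistic model (`model`, with made-up weights) stands in for a trained CNN so that its gradient can be written analytically. The all-zero baseline and the `steps` count are also assumptions.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """One-hot encode a nucleotide sequence as rows of 4 floats (A, C, G, T/U).

    Assumed encoding for illustration; ambiguous bases such as N stay all-zero.
    """
    rows = []
    for base in seq.upper().replace("U", "T"):
        row = [0.0] * 4
        if base in BASES:
            row[BASES.index(base)] = 1.0
        rows.append(row)
    return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, weights, bias):
    """Hypothetical toy model standing in for a trained network:
    F(x) = sigmoid(bias + sum_ij w_ij * x_ij)."""
    z = bias + sum(w * v for w_row, x_row in zip(weights, x) for w, v in zip(w_row, x_row))
    return sigmoid(z)


def model_grad(x, weights, bias):
    """Analytic gradient of the toy model: dF/dx_ij = F(x) * (1 - F(x)) * w_ij."""
    f = model(x, weights, bias)
    slope = f * (1.0 - f)
    return [[slope * w for w in w_row] for w_row in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients:
    attr_ij = (x_ij - x'_ij) * integral over a in [0, 1] of dF/dx_ij(x' + a * (x - x'))."""
    grad_sum = [[0.0] * len(row) for row in x]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [[b + alpha * (v - b) for v, b in zip(x_row, b_row)]
                 for x_row, b_row in zip(x, baseline)]
        for i, g_row in enumerate(model_grad(point, weights, bias)):
            for j, g in enumerate(g_row):
                grad_sum[i][j] += g
    return [[(v - b) * g / steps for v, b, g in zip(x_row, b_row, s_row)]
            for x_row, b_row, s_row in zip(x, baseline, grad_sum)]


def per_position(attr):
    """Collapse the 4 channels into one attribution score per sequence position."""
    return [sum(row) for row in attr]


if __name__ == "__main__":
    x = one_hot("ACGUN")
    baseline = [[0.0] * 4 for _ in x]             # all-zero baseline (assumption)
    weights = [[0.8, -0.3, 0.5, 0.1] for _ in x]  # hypothetical weights
    attr = integrated_gradients(x, baseline, weights, bias=0.2)
    print([round(a, 4) for a in per_position(attr)])
    # Completeness: attributions should sum to F(x) - F(baseline), up to discretization error.
    print(sum(per_position(attr)), model(x, weights, 0.2) - model(baseline, weights, 0.2))
```

The per-position scores are what a graphical attribution track would plot. The completeness check offers a quick sanity test of any Integrated Gradients implementation: the attributions should sum to the change in model output between the baseline and the input.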

Article

Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci

Scientific Reports. 2020;10(1):9486. Published 2020-06-11.
ISSN 2045-2322

Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks (CNNs) are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of CNNs that can learn patterns associated with user-defined sets of genomic regions and scan large genomic areas for novel regions exhibiting similar characteristics. We demonstrate that MuStARD is a generic method that can be trained on different classes of human small RNA genomic loci without the need for domain-specific knowledge. This is possible because of the automated feature and background selection processes built into the model. We also demonstrate that MuStARD can identify functional elements across species by predicting mouse small RNAs (pre-miRNAs and snoRNAs) with models trained on the human genome. MuStARD can be used to filter small RNA-Seq datasets to identify novel small RNA loci, both intra- and inter-species. Three use cases demonstrate this: human, mouse, and fly pre-miRNA prediction. MuStARD is easy to deploy and to extend to a variety of genomic classification questions. Code and trained models are freely available at gitlab.com/RBP_Bioinformatics/mustard.
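The abstract describes scanning large genomic areas with a convolutional model for regions that resemble a training set. The minimal sketch below illustrates only that scanning idea, on sequence alone. MuStARD's actual model is a multi-branch CNN that also uses secondary structure and conservation, and its code lives in the repository cited above; none of that code is reproduced here. The single filter is hand-built from a hypothetical motif ("GGAC") as a stand-in for learned weights. The helper names, window size, step, and threshold are likewise illustrative assumptions.

```python
BASES = "ACGT"


def one_hot(seq):
    """One-hot encode a nucleotide sequence (A, C, G, T/U); ambiguous bases stay all-zero."""
    rows = []
    for base in seq.upper().replace("U", "T"):
        row = [0.0] * 4
        if base in BASES:
            row[BASES.index(base)] = 1.0
        rows.append(row)
    return rows


def motif_kernel(motif, match=1.0, mismatch=-1.0):
    """Hypothetical hand-built filter rewarding an exact motif (stand-in for a learned filter)."""
    return [[match if b == base else mismatch for b in BASES] for base in motif.upper()]


def conv1d_max(x, kernel):
    """Valid 1D convolution of a one-hot sequence with one filter, then global max pooling."""
    k = len(kernel)
    if len(x) < k:
        raise ValueError("sequence is shorter than the filter")
    best = float("-inf")
    for start in range(len(x) - k + 1):
        score = sum(kernel[i][j] * x[start + i][j] for i in range(k) for j in range(4))
        best = max(best, score)
    return best


def scan(genome, window, step, kernel, threshold):
    """Slide a fixed-size window along a long sequence; report (start, end, score) hits."""
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        score = conv1d_max(one_hot(genome[start:start + window]), kernel)
        if score >= threshold:
            hits.append((start, start + window, score))
    return hits


if __name__ == "__main__":
    genome = "A" * 8 + "GGAC" + "A" * 8  # toy 20-nt sequence with one motif occurrence
    print(scan(genome, window=8, step=4, kernel=motif_kernel("GGAC"), threshold=4.0))
```

Overlapping windows, where the step is smaller than the window, keep a site from being missed when it straddles a window boundary. The trade-off is that one site can be reported by several adjacent windows, which a real pipeline would then merge.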
