ENNGene: an Easy Neural Network model building tool for Genomics

E. Chalupová, O. Vaculík, J. Poláček, F. Jozefov, T. Majtner, P. Alexiou

Author: Chalupová, Eliška; Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia; Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
Author: Vaculík, Ondřej; Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia; Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
Author: Poláček, Jakub; Faculty of Informatics, Masaryk University, Brno, Czechia
Author: Jozefov, Filip; Faculty of Informatics, Masaryk University, Brno, Czechia
Author: Majtner, Tomáš; Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
Author: Alexiou, Panagiotis; Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia; panagiotis.alexiou@ceitec.muni.cz; ORCID https://orcid.org/0000000334377482

BMC Genomics. 2022;23(1):248. Published 2022-03-31.

BMC Genomics
ISSN 1471-2164

Language: English. Country: United Kingdom

Document type: journal article

Persistent link: https://www.medvik.cz/link/bmc22019081

Grant support
867414 H2020 Spreading Excellence and Widening Participation
CZ.02.2.69/0.0/0.0/18 053/0016952 Masarykova Univerzita

Full text online

NLK BioMedCentral from 2000-01-12
BioMedCentral Open Access from 2000
Directory of Open Access Journals from 2000
Free Medical Journals from 2000
PubMed Central from 2000
ProQuest Central from 2009-01-01
Open Access Digital Library from 2000-07-01
Open Access Digital Library from 2000-01-01
Medline Complete (EBSCOhost) from 2000-01-01
Health & Medicine (ProQuest) from 2009-01-01
ROAD: Directory of Open Access Scholarly Resources from 2000
Springer Nature OA/Free Journals from 2000-12-01

PubMed 35361122
DOI 10.1186/s12864-022-08414-x

BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning (DL) as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field.

RESULTS: Here we present ENNGene, an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, so that simple input such as genomic coordinates is sufficient. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then handles all steps of training and evaluating the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of the attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein.

CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available to a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into, and extract important information from, the large amounts of data available in the field.
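
The abstract names Integrated Gradients as the method used to attribute a prediction to individual input positions. As a rough illustration only, and not ENNGene's own code, the sketch below applies the technique to a toy logistic model that stands in for a trained network. The random weights, the all-zero baseline, the example sequence, and every function name are assumptions made for this example.

```python
import math
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a nucleotide string as a list of 4-element one-hot rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def model(x, w):
    """Toy stand-in for a trained network: logistic score of a weighted sum."""
    z = sum(wi * xi for w_row, x_row in zip(w, x) for wi, xi in zip(w_row, x_row))
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x, w):
    """Analytic gradient of the toy model with respect to each input cell."""
    s = model(x, w)
    return [[s * (1.0 - s) * wi for wi in w_row] for w_row in w]

def integrated_gradients(x, baseline, w, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    total = [[0.0] * len(row) for row in x]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight path from the baseline to the actual input.
        point = [[b + alpha * (xi - b) for xi, b in zip(x_row, b_row)]
                 for x_row, b_row in zip(x, baseline)]
        grad = model_grad(point, w)
        for i, g_row in enumerate(grad):
            for j, g in enumerate(g_row):
                total[i][j] += g
    # Scale the averaged gradient by the input's distance from the baseline.
    return [[(xi - b) * t / steps for xi, b, t in zip(x_row, b_row, t_row)]
            for x_row, b_row, t_row in zip(x, baseline, total)]

random.seed(0)
seq = "ACGTGCA"                                   # hypothetical input sequence
x = one_hot(seq)
baseline = [[0.0] * len(ALPHABET) for _ in seq]   # all-zero reference (assumption)
w = [[random.gauss(0.0, 1.0) for _ in ALPHABET] for _ in seq]  # made-up weights

attributions = integrated_gradients(x, baseline, w)
per_position = [sum(row) for row in attributions]  # one score per sequence position
```

Summing `per_position` should approximately recover `model(x, w) - model(baseline, w)`, the completeness property of Integrated Gradients, which makes a convenient sanity check when swapping in a real model and its gradients.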

Central European Institute of Technology Masaryk University Brno Czechia

Faculty of Informatics Masaryk University Brno Czechia

Faculty of Science National Centre for Biomolecular Research Masaryk University Brno Czechia


000: 00000naa a2200000 a 4500

001: bmc22019081

003: CZ-PrNML

005: 20220804135337.0

007: ta

008: 220720s2022 xxk f 000 0|eng||

009: AR

024 7_: $a 10.1186/s12864-022-08414-x $2 doi

035 __: $a (PubMed)35361122

040 __: $a ABA008 $b cze $d ABA008 $e AACR2

041 0_: $a eng

044 __: $a xxk

100 1_: $a Chalupová, Eliška $u Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia

245 10: $a ENNGene: an Easy Neural Network model building tool for Genomics / $c E. Chalupová, O. Vaculík, J. Poláček, F. Jozefov, T. Majtner, P. Alexiou

520 9_: $a BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. RESULTS: Here we present ENNGene-Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.

650 _2: $a genomics $7 D023281

650 12: $a machine learning $7 D000069550

650 12: $a neural networks $7 D016571

650 _2: $a protein secondary structure $7 D017433

655 _2: $a journal articles $7 D016428

700 1_: $a Vaculík, Ondřej $u Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia

700 1_: $a Poláček, Jakub $u Faculty of Informatics, Masaryk University, Brno, Czechia

700 1_: $a Jozefov, Filip $u Faculty of Informatics, Masaryk University, Brno, Czechia

700 1_: $a Majtner, Tomáš $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia

700 1_: $a Alexiou, Panagiotis $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia. panagiotis.alexiou@ceitec.muni.cz $1 https://orcid.org/0000000334377482

773 0_: $w MED00008181 $t BMC genomics $x 1471-2164 $g Vol. 23, no. 1 (2022), p. 248

856 41: $u https://pubmed.ncbi.nlm.nih.gov/35361122 $y Pubmed

910 __: $a ABA008 $b sig $c sign $y p $z 0

990 __: $a 20220720 $b ABA008

991 __: $a 20220804135330 $b ABA008

999 __: $a ok $b bmc $g 1822617 $s 1170324

BAS __: $a 3

BAS __: $a PreBMC

BMC __: $a 2022 $b 23 $c 1 $d 248 $e 20220331 $i 1471-2164 $m BMC genomics $n BMC Genomics $x MED00008181

GRA __: $a 867414 $p H2020 Spreading Excellence and Widening Participation

GRA __: $a CZ.02.2.69/0.0/0.0/18 053/0016952 $p Masarykova Univerzita

LZP __: $a Pubmed-20220720
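
The field listing above follows a regular `TAG INDICATORS: $code value ...` layout, so it can be split into tags, indicators, and subfields mechanically. The sketch below is one possible way to do that for this display format only; it is not a parser for binary MARC 21 records, and the function name `parse_field` and the returned dictionary keys are assumptions made for the example.

```python
import re

# Tag, optional two-character indicators, then the field body after ": ".
FIELD_RE = re.compile(r"^(?P<tag>\w{3})(?: (?P<ind>\S{2}))?: (?P<body>.*)$")
# A subfield is "$" + one code character + space + value, up to the next subfield.
SUBFIELD_RE = re.compile(r"\$(\w) (.*?)(?= \$\w |$)")

def parse_field(line):
    """Split one displayed field line into tag, indicators, and subfields."""
    match = FIELD_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a field line: {line!r}")
    body = match.group("body")
    subfields = SUBFIELD_RE.findall(body) if body.startswith("$") else []
    return {
        "tag": match.group("tag"),
        "indicators": match.group("ind"),          # None for control fields
        "subfields": subfields,                    # list of (code, value) pairs
        "value": None if subfields else body,      # raw body of control fields
    }

doi_field = parse_field("024 7_: $a 10.1186/s12864-022-08414-x $2 doi")
control_field = parse_field("001: bmc22019081")
```

Keeping subfields as an ordered list of pairs rather than a dictionary preserves repeated codes, such as the two `$u` affiliations in field 100.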
