ENNGene: an Easy Neural Network model building tool for Genomics
E. Chalupová, O. Vaculík, J. Poláček, F. Jozefov, T. Majtner, P. Alexiou
Language: English; Country: United Kingdom
Document type: journal articles
Grant support
867414
H2020 Spreading Excellence and Widening Participation
CZ.02.2.69/0.0/0.0/18 053/0016952
Masarykova Univerzita
NLK
BioMedCentral, from 2000-12-01
BioMedCentral Open Access, from 2000
Directory of Open Access Journals, from 2000
Free Medical Journals, from 2000
PubMed Central, from 2000
ProQuest Central, from 2009-01-01
Open Access Digital Library, from 2000-01-01
Open Access Digital Library, from 2000-07-01
Medline Complete (EBSCOhost), from 2000-01-01
Health & Medicine (ProQuest), from 2009-01-01
ROAD: Directory of Open Access Scholarly Resources, from 2000
Springer Nature OA/Free Journals, from 2000-12-01
- MeSH
- Genomics MeSH
- Neural Networks, Computer * MeSH
- Protein Structure, Secondary MeSH
- Machine Learning * MeSH
- Publication Type
- Journal Article MeSH
BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning (DL) as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. RESULTS: Here we present ENNGene: an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of the attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. 
We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.
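The abstract mentions Integrated Gradients as the method used to attribute a prediction to individual input positions. Below is a minimal sketch of that technique under a loud assumption: a hypothetical toy logistic model with made-up weights (`WEIGHTS`, `BIAS`, `model`, `model_grad` are illustrative names) stands in for a trained CNN. It is not ENNGene's code. The path integral from a baseline to the input is approximated with a midpoint Riemann sum.

```python
import math

# Hypothetical toy model (NOT ENNGene's network): a single logistic unit
# over a 4-feature input. Weights are made up purely for illustration.
WEIGHTS = [0.8, -0.5, 1.2, 0.3]
BIAS = -0.1


def model(x):
    """Return sigmoid(w . x + b) for one input vector."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x):
    """Analytic gradient of the toy model with respect to each input feature."""
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a(x - x')) da
    using a midpoint Riemann sum with `steps` evaluation points."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(model_grad(point)):
            totals[i] += g
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


if __name__ == "__main__":
    x = [1.0, 0.0, 1.0, 1.0]
    baseline = [0.0, 0.0, 0.0, 0.0]
    attributions = integrated_gradients(x, baseline)
    print([round(a, 4) for a in attributions])
    # Completeness check: attributions should sum to F(x) - F(baseline).
    print(round(sum(attributions), 4), round(model(x) - model(baseline), 4))
```

The completeness property (attributions summing to F(x) - F(x')) serves as a quick sanity check on the approximation. An all-zero baseline is a common choice but an assumption here. A feature equal to its baseline value always receives zero attribution.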
Central European Institute of Technology Masaryk University Brno Czechia
Faculty of Informatics Masaryk University Brno Czechia
Faculty of Science National Centre for Biomolecular Research Masaryk University Brno Czechia
Citations provided by Crossref.org
- 000
- 00000naa a2200000 a 4500
- 001
- bmc22019081
- 003
- CZ-PrNML
- 005
- 20220804135337.0
- 007
- ta
- 008
- 220720s2022 xxk f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1186/s12864-022-08414-x $2 doi
- 035 __
- $a (PubMed)35361122
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a xxk
- 100 1_
- $a Chalupová, Eliška $u Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- 245 10
- $a ENNGene: an Easy Neural Network model building tool for Genomics / $c E. Chalupová, O. Vaculík, J. Poláček, F. Jozefov, T. Majtner, P. Alexiou
- 520 9_
- $a BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning (DL) as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. RESULTS: Here we present ENNGene: an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of the attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. 
We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.
- 650 _2
- $a Genomics $7 D023281
- 650 12
- $a Machine Learning $7 D000069550
- 650 12
- $a Neural Networks, Computer $7 D016571
- 650 _2
- $a Protein Structure, Secondary $7 D017433
- 655 _2
- $a Journal Article $7 D016428
- 700 1_
- $a Vaculík, Ondřej $u Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- 700 1_
- $a Poláček, Jakub $u Faculty of Informatics, Masaryk University, Brno, Czechia
- 700 1_
- $a Jozefov, Filip $u Faculty of Informatics, Masaryk University, Brno, Czechia
- 700 1_
- $a Majtner, Tomáš $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- 700 1_
- $a Alexiou, Panagiotis $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia. panagiotis.alexiou@ceitec.muni.cz $1 https://orcid.org/0000000334377482
- 773 0_
- $w MED00008181 $t BMC genomics $x 1471-2164 $g Vol. 23, no. 1 (2022), p. 248
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/35361122 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y p $z 0
- 990 __
- $a 20220720 $b ABA008
- 991 __
- $a 20220804135330 $b ABA008
- 999 __
- $a ok $b bmc $g 1822617 $s 1170324
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2022 $b 23 $c 1 $d 248 $e 20220331 $i 1471-2164 $m BMC genomics $n BMC Genomics $x MED00008181
- GRA __
- $a 867414 $p H2020 Spreading Excellence and Widening Participation
- GRA __
- $a CZ.02.2.69/0.0/0.0/18 053/0016952 $p Masarykova Univerzita
- LZP __
- $a Pubmed-20220720