
ENNGene: an Easy Neural Network model building tool for Genomics

E. Chalupová, O. Vaculík, J. Poláček, F. Jozefov, T. Majtner, P. Alexiou

BMC Genomics. 2022 ; 23 (1) : 248. [pub] 20220331

Language: English; Country: United Kingdom

Document type: journal article

Persistent link   https://www.medvik.cz/link/bmc22019081

Grant support
867414 H2020 Spreading Excellence and Widening Participation
CZ.02.2.69/0.0/0.0/18 053/0016952 Masarykova Univerzita

BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning (DL) as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field.

RESULTS: Here we present ENNGene, an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein.

CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.
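
The abstract mentions Integrated Gradients as the way each input position receives an attribution level. As a rough illustration of that idea only, the sketch below applies Integrated Gradients to a one-hot encoded DNA sequence. It is not ENNGene's code: `toy_model`, its weights, the all-zero baseline, and the helper names are hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
import math

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def toy_model(x, weights, bias):
    """Hypothetical stand-in for a trained network: a logistic score."""
    z = bias + sum(w * v for w_row, x_row in zip(weights, x)
                   for w, v in zip(w_row, x_row))
    return 1.0 / (1.0 + math.exp(-z))

def toy_gradient(x, weights, bias):
    """Analytic gradient of toy_model with respect to every input entry."""
    s = toy_model(x, weights, bias)
    return [[s * (1.0 - s) * w for w in w_row] for w_row in weights]

def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    avg_grad = [[0.0] * len(ALPHABET) for _ in x]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight path from the baseline to the input.
        point = [[b + alpha * (v - b) for v, b in zip(x_row, b_row)]
                 for x_row, b_row in zip(x, baseline)]
        grad = toy_gradient(point, weights, bias)
        for i, g_row in enumerate(grad):
            for j, g in enumerate(g_row):
                avg_grad[i][j] += g / steps
    # Scale the path-averaged gradient by (input - baseline).
    return [[(v - b) * g for v, b, g in zip(x_row, b_row, g_row)]
            for x_row, b_row, g_row in zip(x, baseline, avg_grad)]

def per_position(attributions):
    """Collapse the four channel attributions into one score per position."""
    return [sum(row) for row in attributions]

# Illustrative values only; a real model would supply its own gradients.
weights = [[0.8, -0.2, 0.1, 0.0],
           [0.0, 1.5, -0.3, 0.2],
           [-0.5, 0.0, 0.4, 0.1],
           [0.3, -0.1, 0.0, -1.2]]
x = one_hot("ACGT")
baseline = [[0.0] * len(ALPHABET) for _ in x]
scores = per_position(integrated_gradients(x, baseline, weights, bias=-0.5))
print(scores)
```

A useful sanity check is the completeness property: the per-position scores should sum to approximately the difference between the model output on the input and on the baseline.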

000      
00000naa a2200000 a 4500
001      
bmc22019081
003      
CZ-PrNML
005      
20220804135337.0
007      
ta
008      
220720s2022 xxk f 000 0|eng||
009      
AR
024    7_
$a 10.1186/s12864-022-08414-x $2 doi
035    __
$a (PubMed)35361122
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a xxk
100    1_
$a Chalupová, Eliška $u Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
245    10
$a ENNGene: an Easy Neural Network model building tool for Genomics / $c E. Chalupová, O. Vaculík, J. Poláček, F. Jozefov, T. Majtner, P. Alexiou
520    9_
$a BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. RESULTS: Here we present ENNGene-Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. 
We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.
650    _2
$a genomics $7 D023281
650    12
$a machine learning $7 D000069550
650    12
$a neural networks (computer) $7 D016571
650    _2
$a protein secondary structure $7 D017433
655    _2
$a journal articles $7 D016428
700    1_
$a Vaculík, Ondřej $u Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
700    1_
$a Poláček, Jakub $u Faculty of Informatics, Masaryk University, Brno, Czechia
700    1_
$a Jozefov, Filip $u Faculty of Informatics, Masaryk University, Brno, Czechia
700    1_
$a Majtner, Tomáš $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
700    1_
$a Alexiou, Panagiotis $u Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia. panagiotis.alexiou@ceitec.muni.cz $1 https://orcid.org/0000000334377482
773    0_
$w MED00008181 $t BMC genomics $x 1471-2164 $g Vol. 23, no. 1 (2022), p. 248
856    41
$u https://pubmed.ncbi.nlm.nih.gov/35361122 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y p $z 0
990    __
$a 20220720 $b ABA008
991    __
$a 20220804135330 $b ABA008
999    __
$a ok $b bmc $g 1822617 $s 1170324
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2022 $b 23 $c 1 $d 248 $e 20220331 $i 1471-2164 $m BMC genomics $n BMC Genomics $x MED00008181
GRA    __
$a 867414 $p H2020 Spreading Excellence and Widening Participation
GRA    __
$a CZ.02.2.69/0.0/0.0/18 053/0016952 $p Masarykova Univerzita
LZP    __
$a Pubmed-20220720
