4 citations in PubMed

Most cited article - PubMed ID 32528107

Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci

Scientific reports. 2020 Jun 11 ; 10 (1) : 9486. [epub] 20200611

Sci Rep
ISSN 2045-2322

Article

Genomic benchmarks: a collection of datasets for genomic sequence classification

Grešová, Katarína: Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czechia
Martinek, Vlastimil: Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czechia
Čechák, David: Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czechia
Šimeček, Petr: Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia; petr.simecek@ceitec.muni.cz
Alexiou, Panagiotis: Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia

BMC genomic data. 2023 May 01 ; 24 (1) : 25. [epub] 20230501

BMC Genom Data
ISSN 2730-6844

BACKGROUND: Recently, deep neural networks have been successfully applied in many biological fields. In 2020, a deep learning model AlphaFold won the protein folding competition with predicted structures within the error tolerance of experimental methods. However, this solution to the most prominent bioinformatic challenge of the past 50 years has been possible only thanks to a carefully curated benchmark of experimentally predicted protein structures. In Genomics, we have similar challenges (annotation of genomes and identification of functional elements) but currently, we lack benchmarks similar to protein folding competition.

RESULTS: Here we present a collection of curated and easily accessible sequence classification datasets in the field of genomics. The proposed collection is based on a combination of novel datasets constructed from the mining of publicly available databases and existing datasets obtained from published articles. The collection currently contains nine datasets that focus on regulatory elements (promoters, enhancers, open chromatin region) from three model organisms: human, mouse, and roundworm. A simple convolution neural network is also included in a repository and can be used as a baseline model. Benchmarks and the baseline model are distributed as the Python package 'genomic-benchmarks', and the code is available at https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks.

CONCLUSIONS: Deep learning techniques revolutionized many biological fields but mainly thanks to the carefully curated benchmarks. For the field of Genomics, we propose a collection of benchmark datasets for the classification of genomic sequences with an interface for the most commonly used deep learning libraries, implementation of the simple neural network and a training framework that can be used as a starting point for future research. The main aim of this effort is to create a repository for shared datasets that will make machine learning for genomics more comparable and reproducible while reducing the overhead of researchers who want to enter the field, leading to healthy competition and new discoveries.
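
Sequence classifiers such as the baseline convolutional network mentioned in this abstract need numeric input, and one-hot encoding is a common way to provide it. The abstract does not say which encoding the 'genomic-benchmarks' package uses, so the sketch below is an assumption for illustration only. The helper name `one_hot` and the choice to map unknown symbols to an all-zero row are hypothetical and are not taken from the package.

```python
# Illustrative sketch only: one_hot is a hypothetical helper,
# not a function of the genomic-benchmarks package.
ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T.

    Symbols outside the alphabet (for example N) become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = ALPHABET.find(base)
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for base, row in zip("ACGTN", one_hot("ACGTN")):
        print(base, row)
```

The resulting matrix (sequence length by 4) is the kind of input a one-dimensional convolution can scan for short motifs.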

Keywords
Benchmark, Convolutional neural network, Dataset, Deep learning, Genomics
MeSH
Benchmarking * MeSH
Chromatin MeSH
Genomics methods MeSH
Humans MeSH
Mice MeSH
Neural Networks, Computer * MeSH
Machine Learning MeSH
Animals MeSH
Check Tag
Humans MeSH
Mice MeSH
Animals MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Names of Substances
Chromatin MeSH

Article

miRBind: A Deep Learning Method for miRNA Binding Classification

Genes. 2022 Dec 09 ; 13 (12) : . [epub] 20221209

Genes (Basel)
ISSN 2073-4425

The binding of microRNAs (miRNAs) to their target sites is a complex process, mediated by the Argonaute (Ago) family of proteins. The prediction of miRNA:target site binding is an important first step for any miRNA target prediction algorithm. To date, the potential for miRNA:target site binding is evaluated using either co-folding free energy measures or heuristic approaches, based on the identification of binding 'seeds', i.e., continuous stretches of binding corresponding to specific parts of the miRNA. The limitations of both these families of methods have produced generations of miRNA target prediction algorithms that are primarily focused on 'canonical' seed targets, even though unbiased experimental methods have shown that only approximately half of in vivo miRNA targets are 'canonical'. Herein, we present miRBind, a deep learning method and web server that can be used to accurately predict the potential of miRNA:target site binding. We trained our method using seed-agnostic experimental data and show that our method outperforms both seed-based approaches and co-fold free energy approaches. The full code for the development of miRBind and a web server are freely available.
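
The seed-based heuristic that this abstract contrasts with miRBind can be sketched in a few lines. The example assumes the commonly used definition of a 6-mer seed (miRNA nucleotides 2 to 7, perfectly Watson-Crick paired with the target); the abstract itself does not specify positions. The function names and both sequences are illustrative, and this is the baseline idea, not the miRBind model.

```python
# Illustrative sketch of a seed-match heuristic; not the miRBind method.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def normalize(sequence):
    """Upper-case the sequence and write it in the RNA alphabet."""
    return sequence.upper().replace("T", "U")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (unknown bases become N)."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(normalize(rna)))


def has_seed_match(mirna, target, start=2, end=7):
    """True if the target contains a perfect match to the miRNA seed.

    start and end are 1-based, inclusive miRNA positions (assumed: 2..7).
    """
    seed = normalize(mirna)[start - 1:end]
    return reverse_complement(seed) in normalize(target)


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"        # illustrative miRNA sequence
    target = "AAACUAUACAACCUACUACCUCAAA"    # illustrative target site
    print(has_seed_match(mirna, target))
    print(has_seed_match(mirna, "AAAAAAAAAA"))
```

A check like this can only report 'canonical' sites, which is why a method trained on seed-agnostic data can recover targets that the heuristic misses.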

Keywords
CLASH, convolutional neural network, miRNA binding, miRNA:target prediction
MeSH
Algorithms MeSH
Argonaute Proteins genetics metabolism MeSH
Deep Learning * MeSH
MicroRNAs * genetics metabolism MeSH
Computational Biology methods MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Names of Substances
Argonaute Proteins MeSH
MicroRNAs * MeSH

Article

ENNGene: an Easy Neural Network model building tool for Genomics

BMC genomics. 2022 Mar 31 ; 23 (1) : 248. [epub] 20220331

BMC Genomics
ISSN 1471-2164

BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field.

RESULTS: Here we present ENNGene: an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein.

CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.
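
Integrated Gradients, the attribution method named in this abstract, assigns each input feature the product of its distance from a baseline and the average gradient along the straight path from the baseline to the input. The sketch below is a generic illustration of that idea and not ENNGene's code: the toy model, the all-zero baseline, the midpoint-rule integration, and the finite-difference gradient are all assumptions made to keep the example dependency-free.

```python
# Illustrative sketch of Integrated Gradients on a toy function;
# not taken from ENNGene. A real network would supply exact gradients.
def numeric_gradient(f, point, eps=1e-6):
    """Central-difference estimate of the gradient of f at point."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=200):
    """Approximate the path integral with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numeric_gradient(f, point)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def toy_model(v):
    """Hypothetical stand-in for a trained network's output score."""
    return v[0] * v[0] + 3.0 * v[1] + v[0] * v[2]


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5]
    baseline = [0.0, 0.0, 0.0]
    attributions = integrated_gradients(toy_model, x, baseline)
    print(attributions)
    print(sum(attributions), toy_model(x) - toy_model(baseline))
```

The two numbers on the last printed line should agree closely: the attributions sum to the difference between the model output at the input and at the baseline, which is the completeness property that makes per-position attribution plots interpretable.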

Keywords
Convolutional Neural Network, Deep Learning, Evolutionary Conservation Score, GUI, RNA Secondary Structure, Recurrent Neural Network
MeSH
Genomics MeSH
Neural Networks, Computer * MeSH
Protein Structure, Secondary MeSH
Machine Learning * MeSH
Publication type
Journal Article MeSH

Article

PENGUINN: Precise Exploration of Nuclear G-Quadruplexes Using Interpretable Neural Networks

Frontiers in genetics. 2020 ; 11 : 568546. [epub] 20201027

Front Genet
ISSN 1664-8021

G-quadruplexes (G4s) are a class of stable nucleic acid secondary structures that are known to play a role in a wide spectrum of genomic functions, such as DNA replication and transcription. The classical understanding of G4 structure points to four variable length guanine strands joined by variable length nucleotide stretches. Experiments using G4 immunoprecipitation and sequencing have produced a large number of highly probable G4-forming genomic sequences. The expense and technical difficulty of experimental techniques highlight the need for computational approaches to G4 identification. Here, we present PENGUINN, a machine learning method based on convolutional neural networks that learns the characteristics of G4 sequences and accurately predicts G4s, outperforming state-of-the-art methods. We provide both a standalone implementation of the trained model and a web application that can be used to evaluate sequences for their G4 potential.
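
The classical G4 description in this abstract (four guanine runs joined by variable-length loops) is often turned into a regular-expression scan. The sketch below assumes the widely used bounds of at least three guanines per run and loops of one to seven nucleotides; those numbers are not stated in the abstract. This is the pattern-matching approach, not the PENGUINN network, and the function name and test sequence are illustrative.

```python
# Illustrative sketch of a classical G4 motif scan; not the PENGUINN model.
import re

# Assumed motif: G runs of length >= 3, loops of 1 to 7 nucleotides.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(sequence):
    """Return (start, end, motif) for each non-overlapping forward-strand hit."""
    return [
        (match.start(), match.end(), match.group())
        for match in G4_PATTERN.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    for hit in find_g4_motifs("ATGGGTTAGGGTTAGGGTTAGGGAT"):
        print(hit)
```

A scan like this covers only the given strand and only sequences that fit the fixed pattern, which is one reason a learned model trained on experimentally derived G4 sequences can be more flexible.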

Keywords
G quadruplex, bioinformatics and computational biology, deep neural network, genomic, imbalanced data classification, machine learning, web application
Publication type
Journal Article MeSH
