Most cited article - PubMed ID 33193663

2 citations in PubMed

PENGUINN: Precise Exploration of Nuclear G-Quadruplexes Using Interpretable Neural Networks

Frontiers in Genetics. 2020; 11: 568546. [Epub 2020 Oct 27]

Source: Front Genet, ISSN 1664-8021

Article

Genomic benchmarks: a collection of datasets for genomic sequence classification

Author: Grešová, Katarína. Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czechia
Author: Martinek, Vlastimil. Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czechia
Author: Čechák, David. Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czechia
Author: Šimeček, Petr. Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia. petr.simecek@ceitec.muni.cz
Author: Alexiou, Panagiotis. Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia

BMC Genomic Data. 2023 May 1; 24(1): 25. [Epub 2023 May 1]

Source: BMC Genom Data, ISSN 2730-6844

BACKGROUND: Recently, deep neural networks have been successfully applied in many biological fields. In 2020, the deep learning model AlphaFold won the protein folding competition with predicted structures within the error tolerance of experimental methods. However, this solution to the most prominent bioinformatic challenge of the past 50 years was possible only thanks to a carefully curated benchmark of experimentally determined protein structures. In genomics, we have similar challenges (annotation of genomes and identification of functional elements), but we currently lack benchmarks similar to the protein folding competition.

RESULTS: Here we present a collection of curated and easily accessible sequence classification datasets in the field of genomics. The proposed collection is based on a combination of novel datasets constructed from the mining of publicly available databases and existing datasets obtained from published articles. The collection currently contains nine datasets that focus on regulatory elements (promoters, enhancers, open chromatin regions) from three model organisms: human, mouse, and roundworm. A simple convolutional neural network is also included in the repository and can be used as a baseline model. Benchmarks and the baseline model are distributed as the Python package 'genomic-benchmarks', and the code is available at https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks.

CONCLUSIONS: Deep learning techniques have revolutionized many biological fields, but mainly thanks to carefully curated benchmarks. For the field of genomics, we propose a collection of benchmark datasets for the classification of genomic sequences with an interface for the most commonly used deep learning libraries, an implementation of a simple neural network, and a training framework that can be used as a starting point for future research. The main aim of this effort is to create a repository for shared datasets that will make machine learning for genomics more comparable and reproducible while reducing the overhead for researchers who want to enter the field, leading to healthy competition and new discoveries.
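
As a rough illustration of what a convolutional baseline does with a sequence classification dataset, the sketch below one-hot encodes a DNA string and slides a single filter over it, keeping the maximum activation. It does not use the 'genomic-benchmarks' package or its API; every name here (`one_hot`, `conv1d_max`, `tata_filter`) is hypothetical, and the hand-written filter is only a toy stand-in for a filter that a real network would learn from data.

```python
# Illustrative only: this is NOT the genomic-benchmarks API.
# All function and variable names are hypothetical.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def conv1d_max(encoded, kernel):
    """Slide one filter over the encoded sequence and return the maximum activation."""
    width = len(kernel)
    if len(encoded) < width:
        raise ValueError("sequence is shorter than the filter")
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        # Dot product between the window and the filter, position by position.
        score = sum(x * w
                    for row, kernel_row in zip(window, kernel)
                    for x, w in zip(row, kernel_row))
        scores.append(score)
    return max(scores)


# A toy filter that responds most strongly to the motif "TATA".
tata_filter = one_hot("TATA")

print(conv1d_max(one_hot("GGCTATAAGG"), tata_filter))  # sequence containing TATA
print(conv1d_max(one_hot("GGCCGGCCGG"), tata_filter))  # sequence without A or T
```

In a trained network, many such filters would be learned jointly and followed by further layers; this sketch only shows the encoding and the sliding-window scoring step.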

Keywords
Benchmark, Convolutional neural network, Dataset, Deep learning, Genomics
MeSH
benchmarking *
chromatin
genomics / methods
humans
mice
neural networks, computer *
machine learning
animals
Check Tag
humans
mice
animals
Publication type
Journal Article
Research Support, Non-U.S. Gov't
Substance names
chromatin

Article

Using Attribution Sequence Alignment to Interpret Deep Learning Models for miRNA Binding Site Prediction

Biology. 2023 Feb 26; 12(3). [Epub 2023 Feb 26]

Source: Biology (Basel), ISSN 2079-7737

MicroRNAs (miRNAs) are small non-coding RNAs that play a central role in the post-transcriptional regulation of biological processes. miRNAs regulate transcripts through direct binding involving the Argonaute protein family. The exact rules of binding are not known, and several in silico miRNA target prediction methods have been developed to date. Deep learning has recently revolutionized miRNA target prediction. However, the higher predictive power comes with a decreased ability to interpret increasingly complex models. Here, we present a novel interpretation technique, called attribution sequence alignment, for miRNA target site prediction models that can interpret such deep learning models on a two-dimensional representation of miRNA and putative target sequence. Our method produces a human-readable visual representation of miRNA:target interactions and can be used as a proxy for the further interpretation of biological concepts learned by the neural network. We demonstrate applications of this method in the clustering of experimental data into binding classes, as well as using the method to narrow down predicted miRNA binding sites on long transcript sequences. Importantly, the presented method works with any neural network model trained on a two-dimensional representation of interactions and can be easily extended to further domains such as protein-protein interactions.
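
The abstract does not specify how the two-dimensional representation is built, so the following is only one plausible sketch, under the assumption that each cell marks whether a miRNA nucleotide and a target nucleotide can form a Watson-Crick pair (G:U wobble pairs are deliberately ignored). The names `WC_PAIRS` and `pairing_matrix` are hypothetical, and the code covers only the input encoding, not attribution sequence alignment itself.

```python
# Illustrative only: one possible 2D encoding of a miRNA:target pair.
# Assumption (not from the abstract): a cell is 1 when the two nucleotides
# form a Watson-Crick pair, 0 otherwise. Wobble pairs are not counted.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_matrix(mirna, target):
    """Return a len(mirna) x len(target) matrix of 0/1 pairing indicators."""
    # Accept DNA-style input by treating T as U.
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    return [[1 if (m, t) in WC_PAIRS else 0 for t in target] for m in mirna]


# Rows follow the miRNA, columns follow the putative target site.
for row in pairing_matrix("UGAG", "CUCA"):
    print(row)
```

A matrix of this kind can be treated like a single-channel image, which is why attribution scores computed on it can be laid out over the same grid.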

Keywords
CLASH, deep learning, interpretation, miRNA target prediction, visualization
Publication type
Journal Article
