Author: Alexiou, Panagiotis

8 records in Medvik

Article

Deep learning and direct sequencing of labeled RNA captures transcriptome dynamics

Authors:
Martinek, Vlastimil (Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA; Central European Institute of Technology, Masaryk University, 625 00 Brno, Czech Republic; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic)
Martin, Jessica (Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA; Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA)
Belair, Cedric (Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA)
Payea, Matthew J (Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA)
Malla, Sulochan (Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA)
Alexiou, Panagiotis (Centre for Molecular Medicine & Biobanking, University of Malta, MSD 2080 Msida, Malta)
Maragkakis, Manolis (Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA)

NAR genomics and bioinformatics. 2024 ; 6 (3) : lqae116. [pub] 20240829

NAR Genom Bioinform
ISSN 2631-9268

In eukaryotes, genes produce a variety of distinct RNA isoforms, each with potentially unique protein products, coding potential or regulatory signals such as poly(A) tail and nucleotide modifications. Assessing the kinetics of RNA isoform metabolism, such as transcription and decay rates, is essential for unraveling gene regulation. However, it is currently impeded by lack of methods that can differentiate between individual isoforms. Here, we introduce RNAkinet, a deep convolutional and recurrent neural network, to detect nascent RNA molecules following metabolic labeling with the nucleoside analog 5-ethynyl uridine and long-read, direct RNA sequencing with nanopores. RNAkinet processes electrical signals from nanopore sequencing directly and distinguishes nascent from pre-existing RNA molecules. Our results show that RNAkinet prediction performance generalizes in various cell types and organisms and can be used to quantify RNA isoform half-lives. RNAkinet is expected to enable the identification of the kinetic parameters of RNA isoforms and to facilitate studies of RNA metabolism and the regulatory elements that influence it.

Publication type
journal articles

Article

Genomic benchmarks: a collection of datasets for genomic sequence classification

BMC genomic data. 2023 ; 24 (1) : 25. [pub] 20230501

BMC Genom Data
ISSN 2730-6844

BACKGROUND: Recently, deep neural networks have been successfully applied in many biological fields. In 2020, a deep learning model AlphaFold won the protein folding competition with predicted structures within the error tolerance of experimental methods. However, this solution to the most prominent bioinformatic challenge of the past 50 years has been possible only thanks to a carefully curated benchmark of experimentally determined protein structures. In Genomics, we have similar challenges (annotation of genomes and identification of functional elements) but currently, we lack benchmarks similar to the protein folding competition. RESULTS: Here we present a collection of curated and easily accessible sequence classification datasets in the field of genomics. The proposed collection is based on a combination of novel datasets constructed from the mining of publicly available databases and existing datasets obtained from published articles. The collection currently contains nine datasets that focus on regulatory elements (promoters, enhancers, open chromatin regions) from three model organisms: human, mouse, and roundworm. A simple convolutional neural network is also included in the repository and can be used as a baseline model. Benchmarks and the baseline model are distributed as the Python package 'genomic-benchmarks', and the code is available at https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks . CONCLUSIONS: Deep learning techniques have revolutionized many biological fields, but mainly thanks to carefully curated benchmarks. For the field of Genomics, we propose a collection of benchmark datasets for the classification of genomic sequences with an interface for the most commonly used deep learning libraries, an implementation of a simple neural network and a training framework that can be used as a starting point for future research. The main aim of this effort is to create a repository for shared datasets that will make machine learning for genomics more comparable and reproducible while reducing the overhead of researchers who want to enter the field, leading to healthy competition and new discoveries.

Article

miRBind: A Deep Learning Method for miRNA Binding Classification

Genes. 2022 ; 13 (12) : . [pub] 20221209

ISSN 2073-4425

The binding of microRNAs (miRNAs) to their target sites is a complex process, mediated by the Argonaute (Ago) family of proteins. The prediction of miRNA:target site binding is an important first step for any miRNA target prediction algorithm. To date, the potential for miRNA:target site binding has been evaluated using either co-folding free energy measures or heuristic approaches based on the identification of binding 'seeds', i.e., continuous stretches of binding corresponding to specific parts of the miRNA. The limitations of both these families of methods have produced generations of miRNA target prediction algorithms that are primarily focused on 'canonical' seed targets, even though unbiased experimental methods have shown that only approximately half of in vivo miRNA targets are 'canonical'. Herein, we present miRBind, a deep learning method and web server that can be used to accurately predict the potential of miRNA:target site binding. We trained our method using seed-agnostic experimental data and show that it outperforms both seed-based approaches and co-folding free energy approaches. The full code for the development of miRBind and a web server are freely available.
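
To illustrate the seed-based heuristic that the abstract contrasts with miRBind, here is a minimal sketch of a contiguous seed-match check. It is not taken from miRBind: the function names and the default seed window (miRNA positions 2 to 7, a common convention) are assumptions made for illustration only.

```python
# Watson-Crick complement table for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def has_seed_match(mirna, target, start=2, end=7):
    """Check whether the miRNA seed pairs contiguously somewhere in the target.

    start/end are 1-based, inclusive miRNA positions; the 2-7 default is an
    assumed convention, not a parameter reported in the abstract.
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    seed = mirna[start - 1:end]
    return reverse_complement(seed) in target
```

A check like this accepts only perfectly paired seed sites, which is the 'canonical' bias the abstract describes; a seed-agnostic model is meant to score sites that this test would reject.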

MeSH
Algorithms
Argonaute Proteins: genetics, metabolism
Deep Learning *
MicroRNAs *: genetics, metabolism
Computational Biology: methods
Publication type
journal articles
grant-supported research

Article

ENNGene: an Easy Neural Network model building tool for Genomics

BMC genomics. 2022 ; 23 (1) : 248. [pub] 20220331

BMC Genomics
ISSN 1471-2164

BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. RESULTS: Here we present ENNGene, an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of the attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available to a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.

Article

DIANA-miRGen v4: indexing promoters and regulators for more than 1500 microRNAs

Nucleic acids research. 2021 ; 49 (D1) : D151-D159. [pub] 20210108

Nucleic Acids Res
ISSN 1362-4962

Deregulation of microRNA (miRNA) expression plays a critical role in the transition from a physiological to a pathological state. Accurate identification of miRNA promoters in multiple cell types is fundamental to understanding and characterizing the mechanisms underlying both physiological and pathological conditions. DIANA-miRGen v4 (www.microrna.gr/mirgenv4) provides cell-type-specific miRNA transcription start sites (TSSs) for over 1500 miRNAs, retrieved from the analysis of >1000 cap analysis of gene expression (CAGE) samples corresponding to 133 tissues, cell lines and primary cells available in the FANTOM repository. MiRNA TSS locations were associated with transcription factor binding site (TFBS) annotations for >280 TFs, derived from analyzing the majority of ENCODE ChIP-Seq datasets. For the first time, clusters of cell types having common miRNA TSSs are characterized and provided through a user-friendly interface with multiple layers of customization. DIANA-miRGen v4 significantly improves our understanding of miRNA biogenesis regulation at the transcriptional level by providing a unique integration of high-quality annotations for hundreds of cell-specific miRNA promoters with experimentally derived TFBSs.

MeSH
Molecular Sequence Annotation
Cell Line
Databases, Nucleic Acid *
Transcription, Genetic
Genome *
Internet
Humans
MicroRNAs: genetics, metabolism
Transcription Initiation Site
Primary Cell Culture
Promoter Regions, Genetic *
Base Sequence
Software *
Transcription Factors: genetics, metabolism
Protein Binding
Check Tag
Humans
Publication type
journal articles
grant-supported research

Article

PENGUINN: Precise Exploration of Nuclear G-Quadruplexes Using Interpretable Neural Networks

Frontiers in genetics. 2020 ; 11 (-) : 568546. [pub] 20201027

Front Genet
ISSN 1664-8021

G-quadruplexes (G4s) are a class of stable nucleic acid secondary structures that are known to play a role in a wide spectrum of genomic functions, such as DNA replication and transcription. The classical understanding of G4 structure points to four variable-length guanine strands joined by variable-length nucleotide stretches. Experiments using G4 immunoprecipitation and sequencing have produced a large number of highly probable G4-forming genomic sequences. The expense and technical difficulty of experimental techniques highlight the need for computational approaches to G4 identification. Here, we present PENGUINN, a machine learning method based on convolutional neural networks that learns the characteristics of G4 sequences and accurately predicts G4s, outperforming state-of-the-art methods. We provide both a standalone implementation of the trained model and a web application that can be used to evaluate sequences for their G4 potential.
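
The classical G4 description in the abstract (four guanine runs joined by variable-length loops) can be expressed as a simple pattern search. The sketch below is a motif-scanning baseline, not PENGUINN itself; the run length (3 or more guanines) and loop length (1 to 7 nucleotides) are assumed conventions, and the names are hypothetical.

```python
import re

# Assumed bounds: guanine runs of 3+ bases, loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_candidates(sequence):
    """Return (start, end, match) for each non-overlapping motif hit.

    Coordinates are 0-based and end-exclusive; only the given strand
    is scanned.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]
```

A fixed pattern of this kind misses G4s that deviate from the classical layout, which is the gap a model trained on experimentally derived sequences is intended to address.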

Publication type
journal articles

Article

CDK9 activity is critical for maintaining MDM4 overexpression in tumor cells

Cell death & disease. 2020 ; 11 (9) : 754. [pub] 20200915

Cell Death Dis
ISSN 2041-4889

The identification of the essential role of cyclin-dependent kinases (CDKs) in the control of cell division has prompted the development of small-molecule CDK inhibitors as anticancer drugs. For many of these compounds, the precise mechanism of action in individual tumor types remains unclear as they simultaneously target different classes of CDKs - enzymes controlling the cell cycle progression as well as CDKs involved in the regulation of transcription. CDK inhibitors are also capable of activating p53 tumor suppressor in tumor cells retaining wild-type p53 gene by modulating MDM2 levels and activity. In the current study, we link, for the first time, CDK activity to the overexpression of the MDM4 (MDMX) oncogene in cancer cells. Small-molecule drugs targeting the CDK9 kinase, dinaciclib, flavopiridol, roscovitine, AT-7519, SNS-032, and DRB, diminished MDM4 levels and activated p53 in A375 melanoma and MCF7 breast carcinoma cells with only a limited effect on MDM2. These results suggest that MDM4, rather than MDM2, could be the primary transcriptional target of pharmacological CDK inhibitors in the p53 pathway. CDK9 inhibitor atuveciclib downregulated MDM4 and enhanced p53 activity induced by nutlin-3a, an inhibitor of p53-MDM2 interaction, and synergized with nutlin-3a in killing A375 melanoma cells. Furthermore, we found that human pluripotent stem cell lines express significant levels of MDM4, which are also maintained by CDK9 activity. In summary, we show that CDK9 activity is essential for the maintenance of high levels of MDM4 in human cells, and drugs targeting CDK9 might restore p53 tumor suppressor function in malignancies overexpressing MDM4.

Article

Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci

Scientific reports. 2020 ; 10 (1) : 9486. [pub] 20200611

Sci Rep
ISSN 2045-2322

Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of Convolutional Neural Networks that can learn patterns associated with user-defined sets of genomic regions and scan large genomic areas for novel regions exhibiting similar characteristics. We demonstrate that MuStARD is a generic method that can be trained on different classes of human small RNA genomic loci, without the need for domain-specific knowledge, due to the automated feature and background selection processes built into the model. We also demonstrate the ability of MuStARD to identify functional elements across species by predicting mouse small RNAs (pre-miRNAs and snoRNAs) using models trained on the human genome. MuStARD can be used to filter small RNA-Seq datasets for identification of novel small RNA loci, within and across species, as demonstrated in three use cases of human, mouse, and fly pre-miRNA prediction. MuStARD is easy to deploy and extend to a variety of genomic classification questions. Code and trained models are freely available at gitlab.com/RBP_Bioinformatics/mustard.
