PENGUINN: Precise Exploration of Nuclear G-Quadruplexes Using Interpretable Neural Networks

2020; 11: 568546. [epub] 20201027

Status: PubMed-not-MEDLINE. Language: English. Country: Switzerland. Medium: electronic-ecollection

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid33193663

G-quadruplexes (G4s) are a class of stable nucleic acid secondary structures known to play a role in a wide spectrum of genomic functions, such as DNA replication and transcription. The classical understanding of G4 structure points to four variable-length guanine strands joined by variable-length nucleotide stretches. G4 immunoprecipitation and sequencing experiments have produced a large number of highly probable G4-forming genomic sequences. The expense and technical difficulty of experimental techniques highlight the need for computational approaches to G4 identification. Here, we present PENGUINN, a machine learning method based on convolutional neural networks that learns the characteristics of G4 sequences and accurately predicts G4s, outperforming state-of-the-art methods. We provide both a standalone implementation of the trained model and a web application that can be used to evaluate sequences for their G4 potential.
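The classical description of four guanine runs joined by loops is often approximated with a regular expression. The following is a minimal sketch of that baseline, assuming runs of at least three guanines and loops of one to seven nucleotides; these thresholds are common in the literature but are not stated in this record. It is a plain pattern scan, not PENGUINN's convolutional model, and the name `find_classical_g4` is hypothetical.

```python
import re

# Assumed thresholds (not taken from this record):
MIN_RUN = 3    # minimum guanines per run
MAX_LOOP = 7   # maximum nucleotides per loop

# One guanine run, then three repeats of (loop + guanine run).
G4_PATTERN = re.compile(
    r"G{%d,}(?:[ACGT]{1,%d}G{%d,}){3}" % (MIN_RUN, MAX_LOOP, MIN_RUN)
)


def find_classical_g4(sequence):
    """Return (start, end, match) for each non-overlapping motif hit.

    Only the given strand is scanned; a G4 on the opposite strand would
    appear here as a C-rich motif and is not reported.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    for hit in find_classical_g4("TTGGGAGGGTGGGCGGGTT"):
        print(hit)
```

A fixed pattern like this misses G4s with long loops or imperfect guanine runs, which is one reason learned models are applied to sequences derived from G4 sequencing experiments.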

Latest 20 citations...

Genomic benchmarks: a collection of datasets for genomic sequence classification

2023 May 01; 24 (1): 25. [epub] 20230501

Using Attribution Sequence Alignment to Interpret Deep Learning Models for miRNA Binding Site Prediction

2023 Feb 26; 12 (3). [epub] 20230226