PENGUINN: Precise Exploration of Nuclear G-Quadruplexes Using Interpretable Neural Networks
Status PubMed-not-MEDLINE Language English Country Switzerland Medium electronic-ecollection
Document type journal article
PubMed
33193663
PubMed Central
PMC7653191
DOI
10.3389/fgene.2020.568546
- Keywords
- G quadruplex, bioinformatics and computational biology, deep neural network, genomic, imbalanced data classification, machine learning, web application
- Publication type
- journal article (MeSH)
G-quadruplexes (G4s) are a class of stable nucleic acid secondary structures known to play a role in a wide spectrum of genomic functions, such as DNA replication and transcription. The classical understanding of G4 structure points to four variable-length guanine strands joined by variable-length nucleotide stretches. G4 immunoprecipitation and sequencing experiments have produced a large number of highly probable G4-forming genomic sequences. The expense and technical difficulty of experimental techniques highlight the need for computational approaches to G4 identification. Here, we present PENGUINN, a machine learning method based on convolutional neural networks that learns the characteristics of G4 sequences and accurately predicts G4s, outperforming state-of-the-art methods. We provide both a standalone implementation of the trained model and a web application that can be used to evaluate sequences for their G4 potential.
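The classical description of a G4 (four guanine runs joined by variable-length loops) can be illustrated with a simple pattern search. The sketch below is not PENGUINN's neural network. It is the kind of motif-based baseline that such models are meant to improve on. The length bounds (runs of at least 3 guanines, loops of 1 to 7 nucleotides) are a commonly used convention and are assumed here, since the abstract does not state them. The function name `find_canonical_g4` is hypothetical, and only the G-rich strand is scanned.

```python
import re

# Assumed bounds (not given in the abstract): guanine runs of >= 3,
# connecting loops of 1-7 nucleotides, four runs in total.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_canonical_g4(seq):
    """Return (start, end, motif) for each non-overlapping canonical G4 motif.

    Coordinates are 0-based, end-exclusive. Only the given strand is
    scanned; the C-rich complement is ignored in this sketch.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence containing one four-run motif flanked by thymines.
    for hit in find_canonical_g4("TTGGGAGGGTGGGAGGGTT"):
        print(hit)
```

A fixed pattern like this misses G4s that deviate from the canonical form, which is one reason to use a model trained on experimentally derived sequences.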