Most cited article - PubMed ID 33341900
SARS-CoV-2 hot-spot mutations are significantly enriched within inverted repeats and CpG island loci
Retroviruses are among the most extensively studied viral families, both historically and in contemporary research. They are investigated primarily in the fields of viral oncogenesis, reverse transcription mechanisms, and other infection-specific aspects. These include the integration of endogenous retroviruses (ERVs) into host genomes, a process widely exploited in genetic engineering, and the ongoing search for HIV/AIDS treatments. G-quadruplexes (G4s) have emerged as potential targets in antiviral therapy and have been identified in important regulatory regions of viral genomes. In this study, we examine the presence of potential G-quadruplex-forming sequences (PQS) across all currently available unique retroviral genomes. Because retroviral genomes typically consist of single-stranded RNA (ssRNA), we also investigated whether the localization of PQSs is strand-dependent. This is particularly relevant because antisense transcripts have been detected in HIV, and ERV integration into the host genome involves reverse transcription of the positive-strand genomic ssRNA into double-stranded DNA (dsDNA), which implicates both strands in this process. We show that in most mammalian retroviruses, including human retroviruses, PQSs are significantly more prevalent on the negative (antisense) strand, with some notable exceptions such as HIV-1. In sharp contrast, avian retroviruses exhibit a higher prevalence of PQSs on the positive (sense) strand.
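The strand question in this abstract rests on G4Hunter scoring, which is named in the keywords. The sketch below is a minimal illustration of that scoring, assuming the commonly used settings of a 25-nt window and a 1.2 threshold (the study's exact parameters are not given here). Each G is scored by the length of its G-run (capped at 4), each C by the negative of its C-run length, and every other base 0; a window mean of at least +1.2 suggests a PQS on the input (sense) strand, while a mean of at most -1.2 marks a C-rich window and therefore a candidate PQS on the complementary (antisense) strand. The function names are hypothetical.

```python
def g4hunter_base_scores(seq: str) -> list[int]:
    """Score each base: G-run length (max 4) for G, minus C-run length for C, 0 otherwise."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        # Find the end of the homopolymer run that starts at i.
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run = min(j - i, 4)
        value = run if seq[i] == "G" else -run if seq[i] == "C" else 0
        scores[i:j] = [value] * (j - i)
        i = j
    return scores


def g4hunter_windows(seq: str, window: int = 25, threshold: float = 1.2):
    """Return (start, mean score, strand) for each window with |mean| >= threshold.

    window=25 and threshold=1.2 are assumed defaults, not the study's stated values.
    '+' = G-rich on the input strand; '-' = C-rich on the input strand, i.e. a
    candidate PQS on the complementary strand.
    """
    scores = g4hunter_base_scores(seq)
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if abs(mean) >= threshold:
            hits.append((start, round(mean, 3), "+" if mean > 0 else "-"))
    return hits


# A telomere-like G-rich motif padded to a single 25-nt window.
print(g4hunter_windows("GGGTTAGGGTTAGGGTTAGGGAAAA"))  # [(0, 1.44, '+')]
# Its reverse complement scores with the opposite sign (PQS on the other strand).
print(g4hunter_windows("TTTTCCCTAACCCTAACCCTAACCC"))  # [(0, -1.44, '-')]
```

Recomputing each window sum keeps the sketch short; a running sum would make it linear in genome length.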
- Keywords
- Bioinformatics, G-quadruplex, G4Hunter, Persistent infection, Retroviral genome
- MeSH
- Endogenous Retroviruses genetics MeSH
- G-Quadruplexes * MeSH
- Genome, Viral * MeSH
- Humans MeSH
- Retroviridae * genetics MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Nucleic acids are not only static carriers of genetic information but also play vital roles in controlling cellular lifecycles through their fascinating structural diversity [...].
- MeSH
- DNA * chemistry metabolism MeSH
- Nucleic Acid Conformation * MeSH
- Humans MeSH
- RNA * chemistry metabolism MeSH
- Computational Biology * methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Introductory Journal Article MeSH
- Editorial MeSH
- Names of Substances
- DNA * MeSH
- RNA * MeSH
Noncanonical secondary structures in nucleic acids have been studied intensively in recent years. Important biological roles of cruciform structures formed by inverted repeats (IRs) have been demonstrated in diverse organisms, including humans. Using Palindrome analyser, we analyzed IRs in all accessible bacterial genome sequences to determine their frequencies, lengths, and localizations. IR sequences were identified in all species, but their frequencies differed significantly across evolutionary groups. We detected 242,373,717 IRs across all 1,565 bacterial genomes. The highest mean IR frequency was detected in the Tenericutes (61.89 IRs/kbp) and the lowest in the Alphaproteobacteria (27.08 IRs/kbp). IRs were abundant near genes and around regulatory, tRNA, transfer-messenger RNA (tmRNA), and rRNA regions, pointing to the importance of IRs in basic cellular processes such as genome maintenance, DNA replication, and transcription. Moreover, organisms with high IR frequencies were more likely to be endosymbiotic, antibiotic-producing, or pathogenic, whereas those with low IR frequencies were far more likely to be thermophilic. This first comprehensive analysis of IRs in all available bacterial genomes demonstrates their genomic ubiquity, nonrandom distribution, and enrichment in genomic regulatory regions.
IMPORTANCE: Our manuscript reports, for the first time, a complete analysis of inverted repeats in all fully sequenced bacterial genomes. Thanks to the availability of unique computational resources, we were able to statistically evaluate the presence and localization of these important regulatory sequences in bacterial genomes. This work revealed a strong enrichment of these sequences in regulatory regions and provides researchers with a valuable tool for their manipulation.
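The IR frequencies above (IRs/kbp) were obtained with Palindrome analyser. The sketch below is not that tool's algorithm; it is a naive enumeration, assuming perfect stems of 6-30 bp, spacers of 0-10 bp and no mismatches, that reports every (start, stem, spacer) triple whose right arm is the reverse complement of its left arm, then divides the count by the sequence length in kbp. Because it also reports shorter IRs nested inside longer ones, its counts are not directly comparable to the published values. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (assumes A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, min_stem=6, max_stem=30, max_spacer=10):
    """Naive sketch: list perfect IRs as (start, stem_length, spacer_length), 0-based.

    Parameter defaults are illustrative assumptions, not Palindrome analyser settings.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n):
        for stem in range(min_stem, max_stem + 1):
            left = seq[start:start + stem]
            for spacer in range(max_spacer + 1):
                end = start + 2 * stem + spacer
                if end > n:
                    break  # larger spacers only run further past the end
                right = seq[start + stem + spacer:end]
                if right == revcomp(left):
                    hits.append((start, stem, spacer))
    return hits


def ir_frequency_per_kbp(seq, **params):
    """Number of reported IRs per 1,000 bp of sequence."""
    return len(find_inverted_repeats(seq, **params)) / (len(seq) / 1000)


example = "TTTTACGGTACCCTACCGTTTTT"  # 6-bp stem, 3-nt spacer, 6-bp stem
print(find_inverted_repeats(example))            # [(4, 6, 3)]
print(round(ir_frequency_per_kbp(example), 2))   # 43.48
```

The triple loop is fine for short sequences but far too slow for whole bacterial genomes, which is one reason dedicated tools are used for analyses at the scale described above.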
- Keywords
- Palindrome analyser, bacteria domain, bacterial genome analysis, inverted repeats
- MeSH
- Bacteria genetics MeSH
- Phylogeny MeSH
- Genomics * MeSH
- Humans MeSH
- DNA Replication * MeSH
- Base Sequence MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Hepatitis B virus (HBV) is one of the most dangerous human pathogenic viruses and is found in all corners of the world. Recent sequencing of ancient HBV genomes revealed that these viruses have accompanied humanity for several millennia. As G-quadruplexes are considered potential therapeutic targets in virology, we examined potential G-quadruplex-forming sequences (PQS) in modern and ancient HBV genomes. Our analyses showed the presence of PQS in all 232 tested HBV genomes, with a total of 1258 motifs and an average frequency of 1.69 PQS per kbp. Notably, the PQS with the highest G4Hunter score in the reference genome is also the most highly conserved. Interestingly, the density of PQS motifs is lower in ancient HBV genomes than in their modern counterparts (1.5 and 1.9 PQS per kbp, respectively). This modern frequency of 1.90 PQS per kbp is very close to the PQS frequency of the human genome (1.93 PQS per kbp) calculated with identical parameters. This indicates that the PQS content of HBV has increased over time, approaching the PQS frequency of the human genome. No statistically significant differences were found between the PQS densities of HBV lineages from different continents. These results, which constitute the first paleogenomic analysis of G4 propensity, are in agreement with our hypothesis that, for viruses causing chronic infections, PQS frequencies tend to converge evolutionarily with those of their hosts, as a kind of 'genetic camouflage' to both hijack host cell transcriptional regulatory systems and avoid recognition as foreign material.
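The densities in this abstract are motif counts normalized by sequence length in kbp. As a quick consistency check, the sketch below pools the reported 1258 motifs over the 232 genomes, assuming (our assumption, not a figure from the abstract) a typical HBV genome length of about 3.2 kb. Whether the published 1.69 PQS per kbp was pooled like this or averaged per genome is not stated here.

```python
def pqs_per_kbp(n_pqs: int, total_length_bp: float) -> float:
    """PQS density: motif count divided by sequence length in kilobase pairs."""
    return n_pqs / (total_length_bp / 1000)


N_GENOMES = 232           # from the abstract
N_MOTIFS = 1258           # from the abstract
ASSUMED_GENOME_BP = 3200  # assumption: approximate HBV genome length

pooled = pqs_per_kbp(N_MOTIFS, N_GENOMES * ASSUMED_GENOME_BP)
print(f"{N_MOTIFS / N_GENOMES:.2f} PQS per genome, {pooled:.2f} PQS per kbp")
# 5.42 PQS per genome, 1.69 PQS per kbp
```

Under this assumption, the gap between ancient and modern genomes (1.5 versus 1.9 PQS per kbp) corresponds to roughly one extra motif per genome of about 3.2 kb.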
- MeSH
- Biological Evolution MeSH
- G-Quadruplexes * MeSH
- Genome, Human MeSH
- Genomics MeSH
- Humans MeSH
- Paleontology MeSH
- Hepatitis B virus * genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Epigenetics deals with changes in gene expression that are not caused by modifications of the primary sequence of nucleic acids. Such changes include not only DNA/RNA methylation but also other reversible modifications, as well as histone modifications and RNA interference. In addition, under particular conditions (such as specific ion concentrations or protein-induced stabilization), the right-handed double-stranded DNA helix (B-DNA) can form noncanonical structures, commonly described as "non-B DNA" structures. These include, for example, cruciforms, i-motifs, triplexes, and G-quadruplexes. Their formation often leads to significant differences in replication and transcription rates. Noncanonical RNA structures have also been shown to play important roles in translation regulation and in the biology of noncoding RNAs. In human and animal studies, the frequency and dynamics of noncanonical DNA and RNA structures are being intensively investigated, especially in cancer research and neurodegenerative diseases. In contrast, noncanonical DNA and RNA structures in plants have long been on the fringes of interest, and only a few studies have dealt with their formation, regulation, and physiological importance for plant stress responses. Herein, we present a review focused on the main fields of plant epigenetics and their possible roles in stress responses and signaling, with special attention to noncanonical DNA and RNA structures.
- Keywords
- Acetylation, Chromatin, Epigenetics, G-quadruplex, Gene expression, Histone, Methylation, Non-B DNA, Stress signaling
- MeSH
- DNA genetics chemistry MeSH
- Epigenesis, Genetic MeSH
- G-Quadruplexes * MeSH
- Humans MeSH
- Nucleic Acids * MeSH
- RNA genetics chemistry MeSH
- Plants genetics MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA MeSH
- Nucleic Acids * MeSH
- RNA MeSH
Non-B nucleic acid structures have emerged as key contributors to genetic variation in SARS-CoV-2. Herein, we investigated the presence of defining spike protein mutations falling within inverted repeats (IRs) for 18 SARS-CoV-2 variants, discussed the potential roles of G-quadruplexes (G4s) in SARS-CoV-2 biology, and identified potential pseudoknots within the SARS-CoV-2 genome. Surprisingly, the number of defining spike protein mutations arising within IRs varied widely between variants, and these mutations were more likely to occur in the stem region of the predicted hairpin stem-loop secondary structure. Notably, mutations implicated in ACE2 binding and propagation (e.g., ΔH69/V70, N501Y, and D614G) were likely to occur within IRs, whilst mutations involved in antibody neutralization and reduced vaccine efficacy (e.g., T19R, ΔE156, ΔF157, R158G, and G446S) were rarely found within IRs. We also predicted that RNA pseudoknots could predominantly be found within, or next to, 29 mutations found in the SARS-CoV-2 spike protein. Finally, the Omicron variants BA.2, BA.4, BA.5, BA.2.12.1, and BA.2.75 appear to have lost two of the predicted G4-forming sequences found in other variants. These were located in nsp2 and in the sequence complementary to the conserved stem-loop II-like motif (S2M) in the 3' untranslated region (UTR). Taken together, non-B nucleic acid structures likely play an integral role in SARS-CoV-2 evolution and genetic diversity.
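The finding that spike mutations fall preferentially in the stem of the predicted hairpin amounts to classifying each mutated nucleotide against an IR's stem-spacer-stem layout. A minimal sketch with hypothetical names, assuming 0-based nucleotide coordinates (amino-acid changes such as N501Y would first have to be mapped to genome positions, which is not shown) and assuming that a position covered by several IRs counts as "stem" if it lies in any stem:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InvertedRepeat:
    start: int   # 0-based position of the first base of the left stem
    stem: int    # length of each stem arm
    spacer: int  # length of the loop between the arms

    @property
    def end(self) -> int:
        """Exclusive end coordinate of the whole IR."""
        return self.start + 2 * self.stem + self.spacer


def classify_position(pos: int, ir: InvertedRepeat) -> str:
    """Return 'stem', 'loop', or 'outside' for one nucleotide position."""
    if not ir.start <= pos < ir.end:
        return "outside"
    offset = pos - ir.start
    in_loop = ir.stem <= offset < ir.stem + ir.spacer
    return "loop" if in_loop else "stem"


def tally_mutations(positions, irs):
    """Count mutated positions by hairpin region; 'stem' wins over 'loop' on overlap (assumption)."""
    counts = Counter()
    for pos in positions:
        labels = {classify_position(pos, ir) for ir in irs}
        if "stem" in labels:
            counts["stem"] += 1
        elif "loop" in labels:
            counts["loop"] += 1
        else:
            counts["outside"] += 1
    return counts


ir = InvertedRepeat(start=4, stem=6, spacer=3)  # covers positions 4..18; loop is 10..12
print(tally_mutations([5, 11, 15, 30], [ir]))
```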
- Keywords
- G-quadruplex, SARS-CoV-2, adaptation, inverted repeats, mutation, pseudoknot, spike protein
- MeSH
- 3' Untranslated Regions MeSH
- COVID-19 * genetics MeSH
- Genomics MeSH
- Spike Glycoprotein, Coronavirus genetics MeSH
- Humans MeSH
- Nucleic Acids * MeSH
- SARS-CoV-2 genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- 3' Untranslated Regions MeSH
- Spike Glycoprotein, Coronavirus MeSH
- Nucleic Acids * MeSH
- spike protein, SARS-CoV-2 MeSH
The current monkeypox virus (MPXV) strain differs from the strain that arose in 2018 by 50+ single nucleotide polymorphisms (SNPs) and is mutating much faster than expected. The cytidine deaminase apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3 (APOBEC3) was hypothesized to be driving this increased mutation rate. APOBEC has recently been shown to preferentially mutate cruciform DNA secondary structures formed by inverted repeats (IRs). IRs were also recently identified as mutation hot spots in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and we aimed to determine whether IRs are likewise mutation hot spots within MPXV genomes. We found that MPXV genomes were replete with IR sequences. Of the 50+ SNPs identified in the 2022 outbreak strain, 63.9% had arisen within IR regions of the 2018 reference strain (MT903344.1). Notably, IR sequences found in the 2018 reference strain were significantly lost over time, with an average of 32.5% of these sequences conserved in the 2022 MPXV genomes. This evidence was highly indicative that mutations were arising within IRs. These data provide further support for the hypothesis that APOBEC may be driving MPXV mutation and highlight the need for greater surveillance of IRs in MPXV genomes to detect new mutations.
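Both headline numbers in this abstract are simple proportions: the share of SNP positions that fall inside IR intervals of the reference genome, and the share of reference IR sequences still present in a later genome. A minimal sketch, assuming half-open [start, end) intervals and treating an IR as conserved when its exact sequence still occurs somewhere in the later genome (the study's own matching criterion may differ); function names and toy data are hypothetical:

```python
def fraction_in_intervals(positions, intervals):
    """Share of positions lying inside at least one half-open [start, end) interval."""
    inside = sum(1 for p in positions if any(s <= p < e for s, e in intervals))
    return inside / len(positions)


def fraction_conserved(reference_ir_seqs, later_genome):
    """Share of reference IR sequences that still occur verbatim in a later genome (assumed criterion)."""
    kept = sum(1 for ir in reference_ir_seqs if ir in later_genome)
    return kept / len(reference_ir_seqs)


snps = [5, 12, 30, 41]              # toy SNP coordinates
ir_intervals = [(0, 10), (25, 35)]  # toy IR intervals in the reference
print(fraction_in_intervals(snps, ir_intervals))  # 0.5

reference_irs = ["ACGGTACCCTACCGT", "GAATTC"]     # toy reference IR sequences
later = "TTTTACGGTACCCTACCGTTTTT"                 # toy later genome
print(fraction_conserved(reference_irs, later))   # 0.5
```

For genome-scale data, a sorted interval list with binary search (or an interval tree) would replace the linear scan over intervals.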
- Keywords
- APOBEC, evolution, inverted repeats, monkeypox, mutation
- MeSH
- COVID-19 * MeSH
- Humans MeSH
- Mutation MeSH
- SARS-CoV-2 MeSH
- Monkeypox virus * genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
Cruciforms occur when inverted repeat sequences in double-stranded DNA adopt intra-strand hairpins on opposing strands. Biophysical and molecular studies of these structures confirm their characterization as four-way junctions and have demonstrated that several factors influence their stability, including overall chromatin structure and DNA supercoiling. Here, we review our understanding of the processes that influence the formation and stability of cruciforms in genomes, covering the range of sequences shown to have biological significance. Repetitive DNA sequences are challenging to sequence accurately, but recent advances in sequencing methods have deepened our understanding of the abundance of inverted repeats in genomes from all forms of life. We highlight that, in the majority of genomes, inverted repeats are present in higher numbers than expected by chance. It is therefore becoming clear that inverted repeats play important roles in regulating many aspects of DNA metabolism, including replication, gene expression, and recombination. Cruciforms are targets for many architectural and regulatory proteins, including topoisomerases, p53, Rif1, and others. Notably, some of these proteins can induce the formation of cruciform structures when they bind to DNA. Inverted repeat sequences also influence genome evolution, and growing evidence highlights their significance in several human diseases, suggesting that inverted repeat sequences and/or DNA cruciforms could be useful therapeutic targets in some cases.
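The first sentence of this abstract has a simple sequence-level consequence: if the right arm of an IR is the reverse complement of the left arm, then the reverse complement of the whole IR is again an IR with the same stem and spacer lengths, so both strands can fold into a hairpin and together form the four-way junction. The check below illustrates only this sequence requirement (perfect stems, hypothetical function names); whether a cruciform actually extrudes depends on factors such as supercoiling and chromatin, which it does not model.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (assumes A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(seq: str, stem: int, spacer: int) -> bool:
    """True if seq = left arm + spacer + reverse complement of the left arm."""
    if len(seq) != 2 * stem + spacer:
        return False
    return seq[stem + spacer:] == revcomp(seq[:stem])


top = "ACGGTACCCTACCGT"  # 6-bp stems around a 3-nt spacer
bottom = revcomp(top)    # the opposing strand, read 5'->3'
print(bottom, is_inverted_repeat(top, 6, 3), is_inverted_repeat(bottom, 6, 3))
# ACGGTAGGGTACCGT True True
```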
- Keywords
- DNA base sequence, DNA structure, DNA supercoiling, cruciform, epigenetics, genome stability, inverted repeat, replication, transcription
- MeSH
- DNA genetics MeSH
- Nucleic Acid Conformation MeSH
- DNA, Cruciform MeSH
- Humans MeSH
- Nucleic Acids * MeSH
- Inverted Repeat Sequences MeSH
- Repetitive Sequences, Nucleic Acid genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
- Names of Substances
- DNA MeSH
- DNA, Cruciform MeSH
- Nucleic Acids * MeSH
In a recently published paper, we found that SARS-CoV-2 hot-spot mutations are significantly associated with inverted repeat loci and CG dinucleotides. However, fast-spreading strains with new mutations (the so-called mink farm, England, and Japan mutations) have recently been described. We used the new datasets to check the positions of mutation sites in the genomes of the new SARS-CoV-2 strains. Using the open-access Palindrome analyser tool, we found that mutations in these new strains are significantly enriched in inverted repeat loci.
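A claim that mutations are significantly enriched in IR loci can be framed as a one-sided test: if a fraction p of the genome lies within IRs, how surprising is it that k of n mutation sites fall there? The sketch below uses an exact binomial tail probability, which is our illustrative choice; the letter does not say which test was used, and treating mutation sites as independent is itself an assumption. Names and toy values are hypothetical.

```python
from math import comb


def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by the union of half-open [start, end) intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or touching: extend the block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


def binomial_enrichment_p(k: int, n: int, p: float) -> float:
    """One-sided P(X >= k) for X ~ Binomial(n, p): chance of k or more hits in IRs."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_ir = covered_fraction([(0, 10), (5, 15), (20, 25)], 100)  # toy genome: 0.2 covered
print(p_ir, binomial_enrichment_p(k=6, n=10, p=p_ir))
```

Merging overlapping IR intervals before computing coverage matters, because IRs frequently overlap and summing their lengths directly would overstate p.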
- Keywords
- SARS-CoV-2, inverted repeats, mutations
- MeSH
- COVID-19 virology MeSH
- Genome, Viral MeSH
- Humans MeSH
- Mutation * MeSH
- SARS-CoV-2 genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Letter MeSH
- Research Support, Non-U.S. Gov't MeSH
The genomic diversity of SARS-CoV-2 has been a focus during the ongoing COVID-19 pandemic. Here, we analyzed the distribution and character of emerging mutations in a data set comprising more than 95,000 virus genomes covering eight major SARS-CoV-2 lineages in the GISAID database, including genotypes arising during COVID-19 therapy. Globally, C>U transitions and G>U transversions were the most represented mutations, accounting for the majority of single-nucleotide variations. Mutational spectra were influenced neither by the time the virus had been circulating in its host nor by medical treatment. At the amino acid level, we observed an approximately 2-fold excess of substitutions toward hydrophobic amino acids over the reverse. However, most of the spike (S) protein mutations that define variants of interest lead to hydrophilic amino acids, counteracting the global trend. The C>U and G>U substitutions altered codons towards increased amino acid hydrophobicity in more than 80% of cases. This bias is explained by differences in the codon composition of amino acids with contrasting biochemical properties. Mutation asymmetries apparently influence the biochemical features of SARS-CoV-2 proteins, which may affect protein-protein interactions, fusion of viral and cellular membranes, and virion assembly.
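The analysis above tallies substitutions by type and asks how they shift codon-encoded hydrophobicity. A minimal sketch in the RNA alphabet: transitions exchange a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/U), every other change is a transversion, and the hydrophobicity change of a codon is scored on the Kyte-Doolittle scale. That scale is an assumption, since the abstract does not name one, and only a handful of codons from the standard genetic code are included for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}

# Small excerpt of the standard genetic code (RNA codons -> one-letter amino acids).
CODON_TO_AA = {"UCU": "S", "UUU": "F", "GCU": "A", "GUU": "V", "CCU": "P", "CUU": "L"}

# Kyte-Doolittle hydropathy values for those amino acids (assumed scale).
KYTE_DOOLITTLE = {"S": -0.8, "F": 2.8, "A": 1.8, "V": 4.2, "P": -1.6, "L": 3.8}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-nucleotide change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("not a substitution")
    pair = {ref, alt}
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


def mutation_spectrum(changes):
    """Count substitution types such as 'C>U' from (ref, alt) pairs."""
    return Counter(f"{ref}>{alt}" for ref, alt in changes)


def hydropathy_shift(codon: str, position: int, alt: str) -> float:
    """Kyte-Doolittle change caused by replacing codon[position] with alt."""
    mutant = codon[:position] + alt + codon[position + 1:]
    return KYTE_DOOLITTLE[CODON_TO_AA[mutant]] - KYTE_DOOLITTLE[CODON_TO_AA[codon]]


spectrum = mutation_spectrum([("C", "U"), ("C", "U"), ("G", "U")])
print(spectrum.most_common(), substitution_class("C", "U"), substitution_class("G", "U"))
# C>U at the second codon position: Ser->Phe, Ala->Val, Pro->Leu all become more hydrophobic.
for codon in ("UCU", "GCU", "CCU"):
    print(codon, round(hydropathy_shift(codon, 1, "U"), 2))
```

These three codons happen to support the direction of the reported bias; a real analysis would score every possible substitution over the full codon table and the observed mutation counts.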
- Keywords
- SARS-CoV-2, amino acid hydrophobicity, apolipoprotein B mRNA editing enzyme (APOBEC), coronavirus, evolution, genetic variation, mutability
- MeSH
- Alleles MeSH
- Amino Acids chemistry genetics MeSH
- COVID-19 virology MeSH
- APOBEC Deaminases MeSH
- Phylogeny MeSH
- Genetic Variation MeSH
- Genome, Viral * MeSH
- Genotype MeSH
- Spike Glycoprotein, Coronavirus chemistry genetics MeSH
- Hydrophobic and Hydrophilic Interactions * MeSH
- Host-Pathogen Interactions MeSH
- Protein Interaction Domains and Motifs MeSH
- Polymorphism, Single Nucleotide MeSH
- Humans MeSH
- Evolution, Molecular MeSH
- Mutation * MeSH
- SARS-CoV-2 genetics MeSH
- Amino Acid Substitution MeSH
- Protein Binding MeSH
- Viral Proteins chemistry genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Amino Acids MeSH
- APOBEC Deaminases MeSH
- Spike Glycoprotein, Coronavirus MeSH
- Viral Proteins MeSH