Most cited article - PubMed ID 32719673
In-Depth Bioinformatic Analyses of Nidovirales Including Human SARS-CoV-2, SARS-CoV, MERS-CoV Viruses Suggest Important Roles of Non-canonical Nucleic Acid Structures in Their Lifecycles
Hepatitis delta virus (HDV) is a highly unusual RNA satellite virus that depends on the presence of hepatitis B virus (HBV) to be infectious. Its compact and variable single-stranded RNA genome is classified into eight major genotypes that are distributed unevenly across continents. The significance of noncanonical secondary structures such as G-quadruplexes (G4s) is increasingly recognized at the DNA and RNA levels, particularly for transcription, replication, and translation. G4s are formed from guanine-rich sequences and have been identified in the vast majority of viral, eukaryotic, and prokaryotic genomes. In this study, we analyzed the G4 propensity of HDV genomes by using G4Hunter. Unlike HBV, which has a G4 density similar to that of the human genome, HDV displays a significantly higher number of potential quadruplex-forming sequences (PQS), with a density more than four times greater than that of the human genome. This finding suggests a critical role for G4s in HDV, especially given that the PQS regions are conserved across HDV genotypes. Furthermore, the prevalence of G4-forming sequences may represent a promising target for therapeutic interventions to control HDV replication.
- Publication type
- Journal Article MeSH
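The PQS densities reported in the abstract above come from G4Hunter. As a rough illustration of how such a score can be computed, here is a minimal sketch of G4Hunter-style scoring as it is commonly described: each G in a run scores the run length (capped at 4), each C scores the negative of that, and the scores are averaged over a sliding window. The 25 nt window and the 1.2 threshold are assumptions, since the abstract does not state its parameters, and the function names are illustrative. This is not the G4Hunter code itself.

```python
from itertools import groupby


def g4hunter_base_scores(seq):
    """Per-base scores: G runs get +min(run, 4), C runs -min(run, 4), others 0."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def window_scores(seq, window=25):
    """Mean per-base score of every full window (window size is an assumption)."""
    scores = g4hunter_base_scores(seq)
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    means = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means


def count_pqs(seq, window=25, threshold=1.2):
    """Count runs of consecutive windows whose |mean score| reaches the threshold."""
    regions = 0
    previous_hit = False
    for score in window_scores(seq, window):
        hit = abs(score) >= threshold
        if hit and not previous_hit:
            regions += 1
        previous_hit = hit
    return regions


def pqs_per_kbp(seq, window=25, threshold=1.2):
    """PQS density normalized to 1000 nt, the unit used in the abstracts."""
    return 1000 * count_pqs(seq, window, threshold) / len(seq)
```

Merging consecutive hit windows into one region is a simplification; a density computed this way is only comparable between genomes when the same window and threshold are used for all of them.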
Noncanonical secondary structures in nucleic acids have been studied intensively in recent years. Important biological roles of cruciform structures formed by inverted repeats (IRs) have been demonstrated in diverse organisms, including humans. Using Palindrome analyser, we analyzed IRs in all accessible bacterial genome sequences to determine their frequencies, lengths, and localizations. IR sequences were identified in all species, but their frequencies differed significantly across various evolutionary groups. We detected 242,373,717 IRs in all 1,565 bacterial genomes. The highest mean IR frequency was detected in the Tenericutes (61.89 IRs/kbp) and the lowest mean frequency was found in the Alphaproteobacteria (27.08 IRs/kbp). IRs were abundant near genes and around regulatory, tRNA, transfer-messenger RNA (tmRNA), and rRNA regions, pointing to the importance of IRs in such basic cellular processes as genome maintenance, DNA replication, and transcription. Moreover, we found that organisms with high IR frequencies were more likely to be endosymbiotic, antibiotic producing, or pathogenic. On the other hand, those with low IR frequencies were far more likely to be thermophilic. This first comprehensive analysis of IRs in all available bacterial genomes demonstrates their genomic ubiquity, nonrandom distribution, and enrichment in genomic regulatory regions. IMPORTANCE Our manuscript reports for the first time a complete analysis of inverted repeats in all fully sequenced bacterial genomes. Thanks to the availability of unique computational resources, we were able to statistically evaluate the presence and localization of these important regulatory sequences in bacterial genomes. This work revealed a strong abundance of these sequences in regulatory regions and provides researchers with a valuable tool for their manipulation.
- Keywords
- Palindrome analyser, bacteria domain, bacterial genome analysis, inverted repeats
- MeSH
- Bacteria genetics MeSH
- Phylogeny MeSH
- Genomics * MeSH
- Humans MeSH
- DNA Replication * MeSH
- Base Sequence MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
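The inverted-repeat frequencies in the abstract above (IRs per kbp) can be illustrated with a small sketch. It looks for perfect inverted repeats, that is, a stem followed within a short spacer by its reverse complement, and normalizes the count per 1000 bp. The stem length, maximum spacer, and the restriction to perfect matches are assumptions made for the example; the abstract does not give the settings used with Palindrome analyser, and this code is not that tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=6, max_spacer=10):
    """Return (start, stem, spacer) for each perfect IR with the given stem length."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * stem + 1):
        left = seq[start:start + stem]
        if set(left) - set("ACGT"):
            continue  # skip stems containing ambiguous bases such as N
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            right_start = start + stem + spacer
            right = seq[right_start:right_start + stem]
            if len(right) < stem:
                break  # ran past the end of the sequence
            if right == target:
                hits.append((start, stem, spacer))
    return hits


def ir_per_kbp(seq, stem=6, max_spacer=10):
    """IR frequency normalized to 1000 bp."""
    return 1000 * len(find_inverted_repeats(seq, stem, max_spacer)) / len(seq)
```

For example, `find_inverted_repeats("ACGTGATTTTCACGT")` reports a single repeat with a 6 bp stem and a 3 bp spacer starting at position 0.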
Hepatitis B virus (HBV) is one of the most dangerous human pathogenic viruses found in all corners of the world. Recent sequencing of ancient HBV viruses revealed that these viruses have accompanied humanity for several millennia. As G-quadruplexes are considered to be potential therapeutic targets in virology, we examined G-quadruplex-forming sequences (PQS) in modern and ancient HBV genomes. Our analyses showed the presence of PQS in all 232 tested HBV genomes, with a total number of 1258 motifs and an average frequency of 1.69 PQS per kbp. Notably, the PQS with the highest G4Hunter score in the reference genome is the most highly conserved. Interestingly, the density of PQS motifs is lower in ancient HBV genomes than in their modern counterparts (1.5 and 1.9 PQS per kbp, respectively). This modern frequency of 1.90 is very close to the PQS frequency of the human genome (1.93) using identical parameters. This indicates that the PQS content in HBV increased over time to become closer to the PQS frequency in the human genome. No statistically significant differences were found between PQS densities in HBV lineages found in different continents. These results, which constitute the first paleogenomics analysis of G4 propensity, are in agreement with our hypothesis that, for viruses causing chronic infections, their PQS frequencies tend to converge evolutionarily with those of their hosts, as a kind of 'genetic camouflage' to both hijack host cell transcriptional regulatory systems and to avoid recognition as foreign material.
- MeSH
- Biological Evolution MeSH
- G-Quadruplexes * MeSH
- Genome, Human MeSH
- Genomics MeSH
- Humans MeSH
- Paleontology MeSH
- Hepatitis B virus * genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Cruciforms occur when inverted repeat sequences in double-stranded DNA adopt intra-strand hairpins on opposing strands. Biophysical and molecular studies of these structures confirm their characterization as four-way junctions and have demonstrated that several factors influence their stability, including overall chromatin structure and DNA supercoiling. Here, we review our understanding of processes that influence the formation and stability of cruciforms in genomes, covering the range of sequences shown to have biological significance. It is challenging to accurately sequence repetitive DNA sequences, but recent advances in sequencing methods have deepened understanding about the amounts of inverted repeats in genomes from all forms of life. We highlight that, in the majority of genomes, inverted repeats are present in higher numbers than is expected from a random occurrence. It is, therefore, becoming clear that inverted repeats play important roles in regulating many aspects of DNA metabolism, including replication, gene expression, and recombination. Cruciforms are targets for many architectural and regulatory proteins, including topoisomerases, p53, Rif1, and others. Notably, some of these proteins can induce the formation of cruciform structures when they bind to DNA. Inverted repeat sequences also influence the evolution of genomes, and growing evidence highlights their significance in several human diseases, suggesting that the inverted repeat sequences and/or DNA cruciforms could be useful therapeutic targets in some cases.
- Keywords
- DNA base sequence, DNA structure, DNA supercoiling, cruciform, epigenetics, genome stability, inverted repeat, replication, transcription
- MeSH
- DNA genetics MeSH
- Nucleic Acid Conformation MeSH
- DNA, Cruciform MeSH
- Humans MeSH
- Nucleic Acids * MeSH
- Inverted Repeat Sequences MeSH
- Repetitive Sequences, Nucleic Acid genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
- Substances
- DNA MeSH
- DNA, Cruciform MeSH
- Nucleic Acids * MeSH
Parasitic helminths are highly prevalent in humans, infecting ∼2 billion people worldwide and causing inflammatory responses, malnutrition and anemia, which are the primary causes of morbidity. In addition, helminth infections of cattle have a significant economic impact on livestock production, milk yield and fertility. The etiological agents of helminth infections are mainly Nematodes (roundworms) and Platyhelminths (flatworms). G-quadruplexes (G4) are unusual nucleic acid structures formed by G-rich sequences that can be recognized by specific G4 ligands. Here we used the G4Hunter Web Tool to identify and compare potential G4 sequences (PQS) in the nuclear and mitochondrial genomes of various helminths to identify G4 ligand targets. PQS are nonrandomly distributed in these genomes and often located in the proximity of genes. Unexpectedly, a Nematode, Ascaris lumbricoides, was found to be highly enriched in stable PQS. This species can tolerate high-stability G4 structures, which are not counter-selected at all, in stark contrast to most other species. We experimentally confirmed G4 formation for sequences found in four different parasitic helminths. Small molecules able to selectively recognize G4 were found to bind to Schistosoma mansoni G4 motifs. Two of these ligands demonstrated potent activity against both larval and adult stages of this parasite.
- MeSH
- Helminths genetics MeSH
- G-Quadruplexes * MeSH
- Genome MeSH
- Nematoda * genetics MeSH
- Humans MeSH
- Ligands MeSH
- Parasites genetics MeSH
- Platyhelminths * genetics MeSH
- Cattle MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Cattle MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substances
- Ligands MeSH
G-quadruplexes are four-stranded nucleic acid structures occurring in the genomes of all living organisms and viruses. It is increasingly evident that these structures play important molecular roles, generally by modulating gene expression and overall genome integrity. For a long period, G-quadruplexes have been studied specifically in the context of human promoters, telomeres, and associated diseases (cancers, neurological disorders). Several G-quadruplex-binding proteins are known, providing promising targets for influencing G-quadruplex-related processes in organisms. Nonetheless, in plants, only a small number of G-quadruplex-binding proteins have been described to date. Thus, we aimed to bioinformatically inspect the available protein sequences to find the best protein candidates with the potential to bind G-quadruplexes. Two similar glycine- and arginine-rich G-quadruplex-binding motifs were described in humans. The first is the so-called "RGG motif" (RRGDGRRRGGGGRGQGGRGRGGGFKG), and the second, which has been described recently, is known as the "NIQI motif" (RGRGRGRGGGSGGSGGRGRG). Using this general knowledge, we searched for plant proteins containing the above-mentioned motifs, using two independent approaches (BLASTp and FIMO scanning), and revealed many proteins containing the G4-binding motif(s). Our research also revealed the core proteins involved in G4 folding and resolving in green plants, algae, and the key plant model organism, Arabidopsis thaliana. The discovered protein candidates were annotated using STRINGdb and sorted by their molecular and physiological roles in simple schemes. Our results point to the significant role of G4-binding proteins in the regulation of gene expression in plants.
- Keywords
- G-quadruplex folding, G-quadruplex resolving, G-quadruplex-binding proteins, NIQI, RGG motif, regulation of gene expression
- Publication type
- Journal Article MeSH
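The motif search described in the abstract above was done with BLASTp and FIMO. As a much simpler stand-in, and not a reimplementation of either tool, the sketch below slides each motif along a protein sequence and reports the best ungapped match by fraction of identical residues. The two motif strings are taken from the abstract; the function name, the scoring by plain identity, and the example protein are assumptions made for illustration.

```python
# Motif sequences as quoted in the abstract.
RGG_MOTIF = "RRGDGRRRGGGGRGQGGRGRGGGFKG"
NIQI_MOTIF = "RGRGRGRGGGSGGSGGRGRG"


def best_ungapped_match(protein, motif):
    """Return (identity, offset) of the best ungapped placement of motif in protein.

    identity is the fraction of identical residues; offset is -1 if the
    protein is shorter than the motif or no residue matches anywhere.
    """
    best = (0.0, -1)
    for offset in range(len(protein) - len(motif) + 1):
        window = protein[offset:offset + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        identity = matches / len(motif)
        if identity > best[0]:
            best = (identity, offset)
    return best


if __name__ == "__main__":
    # Hypothetical protein fragment with the NIQI motif embedded at offset 3.
    example = "MSK" + NIQI_MOTIF + "LLE"
    print(best_ungapped_match(example, NIQI_MOTIF))
```

Plain identity ignores substitution scores and gaps, which is why dedicated tools are preferable for real searches; the sketch only shows the basic idea of scanning for a fixed motif.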
G-quadruplexes have long been perceived as rare and physiologically unimportant nucleic acid structures. However, several studies have revealed their importance in molecular processes, suggesting their possible role in replication and gene expression regulation. Pathways involving G-quadruplexes are intensively studied, especially in the context of human diseases, while their involvement in gene expression regulation in plants remains largely unexplored. Here, we conducted a bioinformatic study and performed a complex circular dichroism measurement to identify a stable G-quadruplex in the gene RPB1, coding for the RNA polymerase II large subunit. We found that this G-quadruplex-forming locus is highly evolutionarily conserved amongst plants sensu lato (Archaeplastida) that share a common ancestor more than one billion years old. Finally, we discussed a new hypothesis regarding G-quadruplexes interacting with UV light in plants to potentially form an additional layer of the regulatory network.
- Keywords
- UV light, circular dichroism, evolution, nucleic acids, plant science
- MeSH
- Arabidopsis chemistry genetics radiation effects MeSH
- Circular Dichroism MeSH
- Phylogeny MeSH
- G-Quadruplexes * radiation effects MeSH
- Glaucophyta chemistry genetics radiation effects MeSH
- Evolution, Molecular MeSH
- Gene Expression Regulation, Plant genetics MeSH
- Rhodophyta chemistry genetics radiation effects MeSH
- RNA Polymerase II chemistry genetics MeSH
- Plant Proteins chemistry genetics radiation effects MeSH
- Plants chemistry genetics radiation effects MeSH
- Amino Acid Sequence MeSH
- Sequence Alignment MeSH
- Ultraviolet Rays MeSH
- Computational Biology MeSH
- Publication type
- Journal Article MeSH
- Substances
- RNA Polymerase II MeSH
- Plant Proteins MeSH
Fungal infections cause >1 million deaths annually and the emergence of antifungal resistance has prompted the exploration for novel antifungal targets. Quadruplexes are four-stranded nucleic acid secondary structures, which can regulate processes such as transcription, translation, replication and recombination. They are also found in genes linked to virulence in microbes, and ligands that bind to quadruplexes can eliminate drug-resistant pathogens. Using a computational approach, we quantified putative quadruplex-forming sequences (PQS) in 1359 genomes across the fungal kingdom and explored their presence in genes related to virulence, drug resistance and biological processes associated with pathogenicity in Aspergillus fumigatus. Here we present the largest analysis of PQS in fungi and identify significant heterogeneity of these sequences throughout phyla, genera and species. PQS were genetically conserved in Aspergillus spp. and frequently pathogenic species appeared to contain fewer PQS than their lesser/non-pathogenic counterparts. GO-term analysis identified that PQS-containing genes were involved in processes linked with virulence such as zinc ion binding, the biosynthesis of secondary metabolites and regulation of transcription in A. fumigatus. Although the genome frequency of PQS was lower in A. fumigatus, PQS could be found enriched in genes involved in virulence, and genes upregulated during germination and hypoxia. Moreover, PQS were found in genes involved in drug resistance. Quadruplexes could have important roles within fungal biology and virulence, but their roles require further elucidation.
- Keywords
- Aspergillus fumigatus, Fungi, G-quadruplexes, drug resistance, i-motifs, in-silico, virulence
- MeSH
- Algorithms MeSH
- Antifungal Agents pharmacology MeSH
- Ascomycota MeSH
- Aspergillus fumigatus genetics MeSH
- Aspergillus MeSH
- Drug Resistance, Fungal drug effects MeSH
- Genome, Fungal drug effects MeSH
- Genome, Viral MeSH
- Transcriptome MeSH
- Virulence MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
- Substances
- Antifungal Agents MeSH
The importance of G-quadruplex-based regulation of gene expression in viruses may point to its potential use in therapeutic targeting. Here, we present analyses of the occurrence of putative G-quadruplex-forming sequences (PQS) in all reference viral dsDNA genomes and evaluate their dependence on PQS occurrence in host organisms using the G4Hunter tool. PQS frequencies differ across host taxa without regard to GC content. The overlay of PQS with annotated regions reveals the localization of PQS in specific regions. While abundance in some, such as repeat regions, is shared by all groups, others are unique. There is abundance within introns of Eukaryota-infecting viruses, but depletion of PQS in introns of bacteria-infecting viruses. We reveal a significant positive correlation between PQS frequencies in dsDNA viruses and corresponding hosts from archaea, bacteria, and eukaryotes. A strong relationship between PQS in a virus and its host indicates their close coevolution and evolutionarily reciprocal mimicking of genome organization.
- Keywords
- G-quadruplex, G4Hunter, bioinformatics, coevolution, dsDNA, host, virus
- MeSH
- Archaea virology MeSH
- Bacteria virology MeSH
- DNA genetics MeSH
- G-Quadruplexes * MeSH
- Genome, Viral * MeSH
- Genome MeSH
- Humans MeSH
- Gene Expression Regulation MeSH
- Viral Proteins genetics MeSH
- Viruses genetics MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substances
- DNA MeSH
- Viral Proteins MeSH
SARS-CoV-2 is an intensively investigated virus from the order Nidovirales (Coronaviridae family) that causes COVID-19 disease in humans. Through enormous scientific effort, thousands of viral strains have been sequenced to date, thereby creating a strong background for deep bioinformatics studies of the SARS-CoV-2 genome. In this study, we inspected high-frequency mutations of SARS-CoV-2 and carried out systematic analyses of their overlay with inverted repeat (IR) loci and CpG islands. The main conclusion of our study is that SARS-CoV-2 hot-spot mutations are significantly enriched within both IRs and CpG island loci. This points to their role in genomic instability and may predict further mutational drive of the SARS-CoV-2 genome. Moreover, CpG islands are strongly enriched upstream from viral ORFs and thus could play important roles in transcription and the viral life cycle. We hypothesize that hypermethylation of these loci will decrease the transcription of viral ORFs and could therefore limit the progression of the disease.
- Keywords
- CpG methylation, SARS-CoV-2, hot spot, inverted repeats
- MeSH
- COVID-19 virology MeSH
- CpG Islands * MeSH
- Genome, Viral MeSH
- Humans MeSH
- DNA Methylation MeSH
- Mutation * MeSH
- SARS-CoV-2 genetics MeSH
- Protein Binding MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
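The abstract above refers to CpG islands without stating how they were defined. One widely used definition is a window of at least 200 nt with GC content above 50% and an observed/expected CpG ratio above 0.6. The sketch below applies those thresholds, which are an assumption here and not necessarily the criteria used in the study; the function names are illustrative.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC-content, and obs/exp thresholds to one window."""
    return (
        len(seq) >= min_len
        and gc_fraction(seq) > min_gc
        and cpg_obs_exp(seq) > min_ratio
    )
```

Scanning a genome would mean calling `looks_like_cpg_island` on successive windows and merging adjacent positives; for an RNA genome, the sequence would first need to be written in the DNA alphabet (U replaced by T).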