Most cited article - PubMed ID 27603574
Palindrome analyser - A new web-based server for predicting and evaluating inverted repeats in nucleotide sequences
With continuous advances in DNA sequencing methods, access to high-quality genomic information for all living organisms is ever-increasing. However, to interpret this information effectively and formulate hypotheses, users often require higher-level programming skills. Therefore, web-based tools are becoming increasingly popular. CpG island regions in genomes are often found in gene promoters and are prone to DNA methylation, with their methylation status determining whether a gene is expressed. Notably, understanding the biological impact of CpX modifications on genomic regulation is becoming increasingly important as these modifications have been associated with diseases such as cancer and neurodegeneration. However, there is currently no easy-to-use, scalable tool to detect and quantify CpX islands in full genomes. We have developed a Java-based web server for CpX island analyses that benefits from the DNA Analyzer Web server environment and overcomes several limitations. For a pilot demonstration study, we selected the well-described model organism Drosophila melanogaster. Subsequent analysis of the obtained CpX islands revealed several interesting and previously undescribed phenomena. One of them is that nearly half of the long CpG islands were located on chromosome X, and that long CpA and CpT islands were significantly overrepresented in the subcentromeric regions of autosomes (chr2 and chr3) and also on chromosome Y. Genome-wide overlays of predicted CpX islands revealed their co-occurrence with various (epi)genomic features comprising cytosine methylation, accessible chromatin, transposable elements, and binding of transcription factors and other proteins. CpX Hunter is freely available as a web tool at: https://bioinformatics.ibp.cz/#/analyse/cpg.
- Keywords
- CpA islands, CpG islands, CpT islands, Drosophila, dinucleotide, genome analyses, web server
- MeSH
- CpG Islands * MeSH
- Drosophila melanogaster * genetics MeSH
- Genome, Insect * MeSH
- Internet MeSH
- DNA Methylation MeSH
- Software * MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
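The CpX island abstract above does not state its detection criteria. As a rough illustration only, the sketch below computes the widely used observed/expected dinucleotide ratio for a C followed by a chosen base X in sliding windows. The function names, the window size, the step, and the 0.6 threshold are assumptions made for this example and are not taken from CpX Hunter.

```python
def cpx_obs_exp(window: str, x: str = "G") -> float:
    """Observed/expected ratio of the dinucleotide C+x in one window.

    ratio = (count of Cx) * N / (count of C * count of x)
    """
    window = window.upper()
    n = len(window)
    c_count = window.count("C")
    x_count = window.count(x)
    if c_count == 0 or x_count == 0:
        return 0.0
    # Count every position where C is immediately followed by x.
    observed = sum(
        1 for i in range(n - 1) if window[i] == "C" and window[i + 1] == x
    )
    return observed * n / (c_count * x_count)


def scan_cpx(seq: str, x: str = "G", window: int = 200, step: int = 100,
             min_ratio: float = 0.6):
    """Return (start, end, ratio) for windows whose ratio reaches min_ratio.

    window, step and min_ratio are illustrative defaults (assumptions).
    """
    hits = []
    last_start = max(len(seq) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = seq[start:start + window]
        ratio = cpx_obs_exp(chunk, x)
        if ratio >= min_ratio:
            hits.append((start, start + len(chunk), round(ratio, 3)))
    return hits
```

Calling `scan_cpx(sequence, x="A")` or `scan_cpx(sequence, x="T")` applies the same ratio to CpA or CpT. A real island caller would normally also merge adjacent windows and apply length and base-composition filters.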
Nucleic acids are not only static carriers of genetic information but also play vital roles in controlling cellular lifecycles through their fascinating structural diversity [...].
- MeSH
- DNA * chemistry metabolism MeSH
- Nucleic Acid Conformation * MeSH
- Humans MeSH
- RNA * chemistry metabolism MeSH
- Computational Biology * methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Introductory Journal Article MeSH
- Editorial MeSH
- Substances
- DNA * MeSH
- RNA * MeSH
Noncanonical secondary structures in nucleic acids have been studied intensively in recent years. Important biological roles of cruciform structures formed by inverted repeats (IRs) have been demonstrated in diverse organisms, including humans. Using Palindrome analyser, we analyzed IRs in all accessible bacterial genome sequences to determine their frequencies, lengths, and localizations. IR sequences were identified in all species, but their frequencies differed significantly across various evolutionary groups. We detected 242,373,717 IRs in all 1,565 bacterial genomes. The highest mean IR frequency was detected in the Tenericutes (61.89 IRs/kbp) and the lowest mean frequency was found in the Alphaproteobacteria (27.08 IRs/kbp). IRs were abundant near genes and around regulatory, tRNA, transfer-messenger RNA (tmRNA), and rRNA regions, pointing to the importance of IRs in such basic cellular processes as genome maintenance, DNA replication, and transcription. Moreover, we found that organisms with high IR frequencies were more likely to be endosymbiotic, antibiotic producing, or pathogenic. On the other hand, those with low IR frequencies were far more likely to be thermophilic. This first comprehensive analysis of IRs in all available bacterial genomes demonstrates their genomic ubiquity, nonrandom distribution, and enrichment in genomic regulatory regions. IMPORTANCE: Our manuscript reports for the first time a complete analysis of inverted repeats in all fully sequenced bacterial genomes. Thanks to the availability of unique computational resources, we were able to statistically evaluate the presence and localization of these important regulatory sequences in bacterial genomes. This work revealed a strong abundance of these sequences in regulatory regions and provides researchers with a valuable tool for their manipulation.
- Keywords
- Palindrome analyser, bacteria domain, bacterial genome analysis, inverted repeats
- MeSH
- Bacteria genetics MeSH
- Phylogeny MeSH
- Genomics * MeSH
- Humans MeSH
- DNA Replication * MeSH
- Base Sequence MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
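The bacterial genome abstract above reports inverted repeat (IR) frequencies in IRs per kbp. The following is a minimal brute-force sketch of how such a number can be obtained: it finds perfect IRs with a fixed stem length and a bounded spacer, then normalizes the count by sequence length. It is not Palindrome analyser's algorithm. The function names and the `stem` and `max_spacer` defaults are assumptions chosen for illustration, and mismatches are not allowed here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int = 6, max_spacer: int = 10):
    """Return (position, stem, spacer) for every perfect inverted repeat.

    An IR is a stem of `stem` bases followed, after 0..max_spacer bases,
    by the reverse complement of that stem. Defaults are assumptions.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem + 1):
        left = seq[i:i + stem]
        if set(left) - set("ACGT"):
            continue  # skip stems containing N or other ambiguity codes
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            j = i + stem + spacer
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, stem, spacer))
    return hits


def ir_frequency_per_kbp(seq: str, **kwargs) -> float:
    """Number of IRs per 1000 bases of sequence."""
    if not seq:
        return 0.0
    return 1000 * len(find_inverted_repeats(seq, **kwargs)) / len(seq)
```

For example, `find_inverted_repeats("GAATTC", stem=3, max_spacer=0)` reports the single palindrome starting at position 0. The nested loops scale with sequence length times spacer range, so this form suits short sequences rather than whole genomes.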
Non-B nucleic acid structures have arisen as key contributors to genetic variation in SARS-CoV-2. Herein, we investigated the presence of defining spike protein mutations falling within inverted repeats (IRs) for 18 SARS-CoV-2 variants, discussed the potential roles of G-quadruplexes (G4s) in SARS-CoV-2 biology, and identified potential pseudoknots within the SARS-CoV-2 genome. Surprisingly, there was a large variation in the number of defining spike protein mutations arising within IRs between variants and these were more likely to occur in the stem region of the predicted hairpin stem-loop secondary structure. Notably, mutations implicated in ACE2 binding and propagation (e.g., ΔH69/V70, N501Y, and D614G) were likely to occur within IRs, whilst mutations involved in antibody neutralization and reduced vaccine efficacy (e.g., T19R, ΔE156, ΔF157, R158G, and G446S) were rarely found within IRs. We also predicted that RNA pseudoknots could predominantly be found within, or next to, 29 mutations found in the SARS-CoV-2 spike protein. Finally, the Omicron variants BA.2, BA.4, BA.5, BA.2.12.1, and BA.2.75 appear to have lost two of the predicted G4-forming sequences found in other variants. These were found in nsp2 and the sequence complementary to the conserved stem-loop II-like motif (S2M) in the 3' untranslated region (UTR). Taken together, non-B nucleic acid structures likely play an integral role in SARS-CoV-2 evolution and genetic diversity.
- Keywords
- G-quadruplex, SARS-CoV-2, adaptation, inverted repeats, mutation, pseudoknot, spike protein
- MeSH
- 3' Untranslated Regions MeSH
- COVID-19 * genetics MeSH
- Genomics MeSH
- Spike Glycoprotein, Coronavirus genetics MeSH
- Humans MeSH
- Nucleic Acids * MeSH
- SARS-CoV-2 genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substances
- 3' Untranslated Regions MeSH
- Spike Glycoprotein, Coronavirus MeSH
- Nucleic Acids * MeSH
- spike protein, SARS-CoV-2 MeSH Browser
The current monkeypox virus (MPXV) strain differs from the strain arising in 2018 by 50+ single nucleotide polymorphisms (SNPs) and is mutating much faster than expected. The cytidine deaminase apolipoprotein B messenger RNA editing enzyme, catalytic subunit B (APOBEC3) was hypothesized to be driving this increased mutation. APOBEC has recently been identified to preferentially mutate cruciform DNA secondary structures formed by inverted repeats (IRs). IRs were recently identified as hot spots for mutation in severe acute respiratory syndrome coronavirus 2, and we aimed to identify whether IRs were also hot spots for mutation within MPXV genomes. We found that MPXV genomes were replete with IR sequences. Of the 50+ SNPs identified in the 2022 outbreak strain, 63.9% of these were found to have arisen within IR regions in the 2018 reference strain (MT903344.1). Notably, IR sequences found in the 2018 reference strain were significantly lost over time, with an average of 32.5% of these sequences being conserved in the 2022 MPXV genomes. This evidence was highly indicative that mutations were arising within IRs. This data provides further support to the hypothesis that APOBEC may be driving MPXV mutation and highlights the necessity for greater surveillance of IRs of MPXV genomes to detect new mutations.
- Keywords
- APOBEC, evolution, inverted repeats, monkeypox, mutation
- MeSH
- COVID-19 * MeSH
- Humans MeSH
- Mutation MeSH
- SARS-CoV-2 MeSH
- Monkeypox virus * genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
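The MPXV abstract above reports the share of SNPs that fall inside IR regions. A minimal sketch of that kind of calculation follows, assuming IR regions are given as half-open `(start, end)` intervals and SNPs as integer positions in the same coordinate system. The function names and the coordinate convention are assumptions for this example, and no data from the study are reproduced.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_in_intervals(positions, intervals) -> float:
    """Fraction of positions that lie inside at least one interval."""
    if not positions:
        return 0.0
    merged = merge_intervals(intervals)
    starts = [s for s, _ in merged]
    inside = 0
    for p in positions:
        # Index of the last merged interval that starts at or before p.
        k = bisect_right(starts, p) - 1
        if k >= 0 and p < merged[k][1]:
            inside += 1
    return inside / len(positions)
```

Merging first keeps a SNP that sits in two overlapping IRs from being counted twice, and the binary search keeps the lookup fast for long interval lists.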
G-quadruplexes (G4s) have been long considered rare and physiologically unimportant in vitro curiosities, but recent methodological advances have proved their presence and functions in vivo. Moreover, in addition to their functional relevance in bacteria and animals, including humans, their importance has been recently demonstrated in evolutionarily distinct plant species. In this study, we analyzed the genome of Pisum sativum (garden pea, or the so-called green pea), a unique member of the Fabaceae family. Our results showed that this genome contained putative G4 sequences (PQSs). Interestingly, these PQSs were located nonrandomly in the nuclear genome. We also found PQSs in mitochondrial (mt) and chloroplast (cp) DNA, and we experimentally confirmed G4 formation for sequences found in these two organelles. The frequency of PQSs for nuclear DNA was 0.42 PQSs per thousand base pairs (kbp), in the same range as for cpDNA (0.53/kbp), but significantly lower than what was found for mitochondrial DNA (1.58/kbp). In the nuclear genome, PQSs were mainly associated with regulatory regions, including 5'UTRs, and upstream of the rRNA region. In contrast to genomic DNA, PQSs were located around RNA genes in cpDNA and mtDNA. Interestingly, PQSs were also found within and around specific transposable elements such as TIR and LTR, pointing to a role of PQSs in the spread of these elements in nuclear DNA. The nonrandom localization of PQSs uncovered their evolutionary and functional significance in the Pisum sativum genome.
- Keywords
- G-quadruplex, G4 propensity, chloroplast DNA, sequence prediction
- MeSH
- 5' Untranslated Regions MeSH
- G-Quadruplexes * MeSH
- Genome, Plant MeSH
- Pisum sativum genetics MeSH
- Humans MeSH
- Base Sequence MeSH
- DNA Transposable Elements genetics MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Substances
- 5' Untranslated Regions MeSH
- DNA Transposable Elements MeSH
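The Pisum sativum abstract above expresses PQS density as PQSs per kbp but does not name the prediction method. As an illustration of the unit only, the sketch below counts matches of the common textbook motif (four runs of at least three G separated by loops of 1 to 7 bases) and its C-rich mirror for the opposite strand. The regular expression, the loop limits, and the function names are assumptions for this example and may differ from the scoring used in the study.

```python
import re

# Textbook G4 motif on the given strand (an assumption, not the study's method).
PQS_G = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")
# The same motif seen from the opposite strand appears as C runs.
PQS_C = re.compile(r"(?:C{3,}[ACGT]{1,7}){3}C{3,}")


def count_pqs(seq: str) -> int:
    """Count non-overlapping motif matches on both strands."""
    seq = seq.upper()
    g_hits = sum(1 for _ in PQS_G.finditer(seq))
    c_hits = sum(1 for _ in PQS_C.finditer(seq))
    return g_hits + c_hits


def pqs_per_kbp(seq: str) -> float:
    """Motif matches per 1000 bases of sequence."""
    if not seq:
        return 0.0
    return 1000 * count_pqs(seq) / len(seq)
```

Because `finditer` returns non-overlapping matches, long G-rich tracts are counted once rather than once per possible folding, which is one of several reasonable counting conventions.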
Cruciforms occur when inverted repeat sequences in double-stranded DNA adopt intra-strand hairpins on opposing strands. Biophysical and molecular studies of these structures confirm their characterization as four-way junctions and have demonstrated that several factors influence their stability, including overall chromatin structure and DNA supercoiling. Here, we review our understanding of processes that influence the formation and stability of cruciforms in genomes, covering the range of sequences shown to have biological significance. It is challenging to accurately sequence repetitive DNA sequences, but recent advances in sequencing methods have deepened understanding about the amounts of inverted repeats in genomes from all forms of life. We highlight that, in the majority of genomes, inverted repeats are present in higher numbers than is expected from a random occurrence. It is, therefore, becoming clear that inverted repeats play important roles in regulating many aspects of DNA metabolism, including replication, gene expression, and recombination. Cruciforms are targets for many architectural and regulatory proteins, including topoisomerases, p53, Rif1, and others. Notably, some of these proteins can induce the formation of cruciform structures when they bind to DNA. Inverted repeat sequences also influence the evolution of genomes, and growing evidence highlights their significance in several human diseases, suggesting that the inverted repeat sequences and/or DNA cruciforms could be useful therapeutic targets in some cases.
- Keywords
- DNA base sequence, DNA structure, DNA supercoiling, cruciform, epigenetics, genome stability, inverted repeat, replication, transcription
- MeSH
- DNA genetics MeSH
- Nucleic Acid Conformation MeSH
- DNA, Cruciform MeSH
- Humans MeSH
- Nucleic Acids * MeSH
- Inverted Repeat Sequences MeSH
- Repetitive Sequences, Nucleic Acid genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
- Substances
- DNA MeSH
- DNA, Cruciform MeSH
- Nucleic Acids * MeSH
BACKGROUND: The plastid genomes of the green algal order Chlamydomonadales tend to expand their non-coding regions, but this phenomenon is poorly understood. Here we shed new light on organellar genome evolution in Chlamydomonadales by studying a previously unknown non-photosynthetic lineage. We established cultures of two new Polytoma-like flagellates, defined their basic characteristics and phylogenetic position, and obtained complete organellar genome sequences and a transcriptome assembly for one of them. RESULTS: We discovered a novel deeply diverged chlamydomonadalean lineage that has no close photosynthetic relatives and represents an independent case of photosynthesis loss. To accommodate these organisms, we establish the new genus Leontynka, with two species (L. pallida and L. elongata) distinguishable through both their morphological and molecular characteristics. Notable features of the colourless plastid of L. pallida deduced from the plastid genome (plastome) sequence and transcriptome assembly include the retention of ATP synthase, thylakoid-associated proteins, the carotenoid biosynthesis pathway, and a plastoquinone-based electron transport chain, the latter two modules having an obvious functional link to the eyespot present in Leontynka. Most strikingly, the ~362 kbp plastome of L. pallida is by far the largest among the non-photosynthetic eukaryotes investigated to date due to an extreme proliferation of sequence repeats. These repeats are also present in coding sequences, with one repeat type found in the exons of 11 out of 34 protein-coding genes, with up to 36 copies per gene, thus affecting the encoded proteins. The mitochondrial genome of L. pallida is likewise exceptionally large, with its >104 kbp surpassed only by the mitogenome of Haematococcus lacustris among all members of Chlamydomonadales hitherto studied. It is also bloated with repeats, though entirely different from those in the L. pallida plastome, which contrasts with the situation in H. lacustris where both the organellar genomes have accumulated related repeats. Furthermore, the L. pallida mitogenome exhibits an extremely high GC content in both coding and non-coding regions and, strikingly, a high number of predicted G-quadruplexes. CONCLUSIONS: With its unprecedented combination of plastid and mitochondrial genome characteristics, Leontynka pushes the frontiers of organellar genome diversity and is an interesting model for studying organellar genome evolution.
- Keywords
- Chlamydomonadales, G-quadruplex, GC content, Green algae, Mitochondrial genome, Non-photosynthetic algae, Plastid genome, Repeat expansion
- MeSH
- Chlorophyceae * MeSH
- Chlorophyta * genetics MeSH
- Photosynthesis genetics MeSH
- Phylogeny MeSH
- Genome, Plastid * MeSH
- Evolution, Molecular MeSH
- Plastids MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
R-loops are common non-B nucleic acid structures formed by a three-stranded nucleic acid composed of an RNA-DNA hybrid and a displaced single-stranded DNA (ssDNA) loop. Because aberrant R-loop formation leads to increased mutagenesis, hyper-recombination, rearrangements, and transcription-replication collisions, it is regarded as important in human diseases. Therefore, its prevalence and distribution in genomes are studied intensively. However, in silico tools for R-loop prediction are limited, and therefore, we have developed the R-loop tracker tool, which was implemented as a part of the DNA Analyser web server. This new tool focuses on (1) prediction of R-loops in genomic DNA without length and sequence limitations; (2) integration of R-loop tracker results with other tools for nucleic acids analyses, including Genome Browser; (3) internal cross-evaluation of in silico results with experimental data, where available; (4) easy export and correlation analyses with other genome features and markers; and (5) enhanced visualization outputs. Our new R-loop tracker tool is freely accessible on the web pages of DNA Analyser tools, and its implementation on the web-based server allows effective analyses not only for DNA segments but also for full chromosomes and genomes.
- Keywords
- RNA–DNA hybrid, non-B structure, sequence analysis
- MeSH
- Algorithms * MeSH
- DNA chemistry genetics MeSH
- Genomics methods MeSH
- Internet statistics & numerical data MeSH
- Humans MeSH
- Genomic Instability * MeSH
- R-Loop Structures * MeSH
- Software MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substances
- DNA MeSH
In a recently published paper, we found that SARS-CoV-2 hot-spot mutations are significantly associated with inverted repeat loci and CG dinucleotides. However, fast-spreading strains with new mutations (so-called mink farm mutations, England mutations and Japan mutations) have been recently described. We used the new datasets to check the positioning of mutation sites in genomes of the new SARS-CoV-2 strains. Using the open-access Palindrome analyser tool, we found mutations in these new strains to be significantly enriched in inverted repeat loci.
- Keywords
- SARS-CoV-2, inverted repeats, mutations
- MeSH
- COVID-19 virology MeSH
- Genome, Viral MeSH
- Humans MeSH
- Mutation * MeSH
- SARS-CoV-2 genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Letter MeSH
- Research Support, Non-U.S. Gov't MeSH