MOTIVATION: Transposable elements (TEs) in eukaryotes often get inserted into one another, forming sequences that become a complex mixture of full-length elements and their fragments. The reconstruction of full-length elements and the order in which they have been inserted is important for genome and transposon evolution studies. However, the accumulation of mutations and genome rearrangements over evolutionary time makes this process error-prone and decreases the efficiency of software aiming to recover all nested full-length TEs. RESULTS: We created software that uses a greedy recursive algorithm to mine increasingly fragmented copies of full-length LTR retrotransposons in assembled genomes and other sequence data. The software called TE-greedy-nester considers not only sequence similarity but also the structure of elements. This new tool was tested on a set of natural and synthetic sequences and its accuracy was compared to similar software. We found TE-greedy-nester to be superior in a number of parameters, namely computation time and full-length TE recovery in highly nested regions. AVAILABILITY AND IMPLEMENTATION: http://gitlab.fi.muni.cz/lexa/nested. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
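The greedy recursive idea described in this abstract can be illustrated with a toy sketch. This is not the TE-greedy-nester implementation: the `LTR` and `DOMAIN` strings, the regular-expression detector, and the function name `recover_nested` are all hypothetical stand-ins for the similarity and structure scoring that the real tool performs. The sketch only shows the control flow: find an intact element, excise it, join the flanks, and search again so that an element interrupted by the insertion becomes contiguous and detectable.

```python
import re

# Hypothetical markers, chosen only for this toy example.
LTR = "TGTTA"      # stands in for a long terminal repeat
DOMAIN = "GGCCGG"  # stands in for an intact internal protein domain

# An "intact element" here is LTR ... DOMAIN ... LTR with no further LTR
# copy between the two terminal repeats (tempered dot excludes LTR starts).
NO_LTR = r"(?:(?!" + LTR + r").)*?"
ELEMENT = re.compile(LTR + NO_LTR + DOMAIN + NO_LTR + LTR)


def recover_nested(seq):
    """Return elements in order of recovery (most recent insertion first).

    Each round takes the first intact element found, removes it, and
    recurses on the joined flanking sequence.
    """
    match = ELEMENT.search(seq)
    if match is None:
        return []
    remainder = seq[:match.start()] + seq[match.end():]
    return [match.group()] + recover_nested(remainder)


if __name__ == "__main__":
    inner = LTR + "cc" + DOMAIN + "cc" + LTR
    # The inner element is inserted into the middle of the outer domain.
    nested = "tt" + LTR + "aa" + DOMAIN[:3] + inner + DOMAIN[3:] + "aa" + LTR + "tt"
    for rank, element in enumerate(recover_nested(nested), start=1):
        print(rank, element)
```

In this toy input the outer element is invisible to the detector until the inner one has been excised, which is the situation the recursive strategy is meant to handle.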
BACKGROUND: Transposable elements form a significant proportion of eukaryotic genomes. Recently, Lexa et al. (Nucleic Acids Res 42:968-978, 2014) reported that plant long terminal repeat (LTR) retrotransposons often contain potential quadruplex sequences (PQSs) in their LTRs and experimentally confirmed their ability to adopt four-stranded DNA conformations. RESULTS: Here, we searched for PQSs in human retrotransposons and found that PQSs are specifically localized in the 3'-UTR of LINE-1 elements, in LTRs of HERV elements and are strongly accumulated in specific regions of SVA elements. Circular dichroism spectroscopy confirmed that most PQSs had adopted monomolecular or bimolecular guanine quadruplex structures. Evolutionarily young SVA elements contained more PQSs than older elements and their propensity to form quadruplex DNA was higher. Full-length L1 elements contained more PQSs than truncated elements; the highest proportion of PQSs was found inside transpositionally active L1 elements (PA2 and HS families). CONCLUSIONS: Conservation of quadruplexes at specific positions of transposable elements implies their importance in their life cycle. The increasing quadruplex presence in evolutionarily young LINE-1 and SVA families makes these elements important contributors toward present genome-wide quadruplex distribution.
- MeSH
- Long Interspersed Nucleotide Elements MeSH
- Alu Elements MeSH
- Endogenous Retroviruses MeSH
- G-Quadruplexes * MeSH
- Genomics MeSH
- Humans MeSH
- Chromosome Mapping MeSH
- Repetitive Sequences, Nucleic Acid MeSH
- DNA Transposable Elements * MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Retrotransposons with long terminal repeats (LTR) form a significant proportion of eukaryotic genomes, especially in plants. They have gag and pol genes and several regulatory regions necessary for transcription and reverse transcription. We searched for potential quadruplex-forming sequences (PQSs) and potential triplex-forming sequences (PTSs) in 18 377 full-length LTR retrotransposons collected from 21 plant species. We found that PQSs were often located in LTRs, both upstream and downstream of promoters from which the whole retrotransposon is transcribed. Upstream-located guanine PQSs were dominant in the minus DNA strand, whereas downstream-located guanine PQSs prevailed in the plus strand, indicating their role at both the transcriptional and post-transcriptional levels. Our circular dichroism spectroscopy measurements confirmed that these PQSs readily adopted guanine quadruplex structures; some of them were parallel-stranded, while others were antiparallel-stranded. PQSs often formed doublets at a mutual distance of up to 400 bp. PTSs were most abundant in the 3'UTR (but were also present in the 5'UTR). We discuss the potential role of quadruplexes and triplexes as regulators of various processes in the LTR retrotransposon life cycle and as potential recombination sites during post-insertional retrotransposon-based genome rearrangements.
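A strand-aware PQS scan of the kind summarized above can be sketched as follows. The abstract does not state which search pattern was used, so the pattern here (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) is an assumption based on a commonly used folding rule, and the function name `find_pqs` is hypothetical. A C-rich match on the given strand is reported as a guanine PQS on the minus strand.

```python
import re

# Assumed pattern, not taken from the abstract:
# G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ on the plus strand,
# and the same rule with C runs for guanine PQSs on the minus strand.
LOOP = r"[ACGT]{1,7}"
PQS_PLUS = re.compile(LOOP.join([r"G{3,}"] * 4))
PQS_MINUS = re.compile(LOOP.join([r"C{3,}"] * 4))


def find_pqs(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping hits.

    Coordinates are 0-based, end-exclusive, on the supplied strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", PQS_PLUS), ("-", PQS_MINUS)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


if __name__ == "__main__":
    example = "TT" + "GGGAGGGAGGGAGGG" + "TTTT" + "CCCTCCCTCCCTCCC" + "AA"
    for start, end, strand in find_pqs(example):
        print(start, end, strand, example[start:end])
```

Recording the strand alongside each hit is what makes it possible to compare upstream and downstream positions relative to a promoter, as the abstract does.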
MOTIVATION: Current methods for identification of potential triplex-forming sequences in genomes and similar sequence sets rely primarily on detecting homopurine and homopyrimidine tracts. Procedures capable of detecting sequences supporting imperfect, but structurally feasible intramolecular triplex structures are needed for better sequence analysis. RESULTS: We modified an algorithm for detection of approximate palindromes, so as to account for the special nature of triplex DNA structures. From the available literature, we conclude that approximate triplexes tolerate two classes of errors. One, analogous to mismatches in duplex DNA, involves nucleotides in triplets that do not readily form Hoogsteen bonds. The other class involves geometrically incompatible neighboring triplets hindering proper alignment of strands for optimal hydrogen bonding and stacking. We tested the statistical properties of the algorithm, as well as its correctness when confronted with known triplex sequences. The proposed algorithm satisfactorily detects sequences with intramolecular triplex-forming potential. Its complexity is directly comparable to palindrome searching. AVAILABILITY: Our implementation of the algorithm is available at http://www.fi.muni.cz/lexa/triplex as source code and a web-based search tool. The source code compiles into a library providing searching capability to other programs, as well as into a stand-alone command-line application based on this library. CONTACT: lexa@fi.muni.cz SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
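For context, the baseline that this abstract contrasts itself with, detection of perfect homopurine and homopyrimidine tracts, can be sketched in a few lines. This is not the authors' approximate-palindrome-based algorithm and tolerates no errors of either class; the function name `find_homo_tracts` and the default minimum length of 10 are assumptions made for illustration.

```python
import re


def find_homo_tracts(seq, min_len=10):
    """Return (start, end, kind) for perfect purine or pyrimidine runs.

    Coordinates are 0-based, end-exclusive. min_len is an arbitrary
    illustrative threshold, not a value taken from the abstract.
    """
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    tracts = []
    for match in pattern.finditer(seq.upper()):
        kind = "purine" if match.group()[0] in "AG" else "pyrimidine"
        tracts.append((match.start(), match.end(), kind))
    return tracts


if __name__ == "__main__":
    example = "TT" + "AGAGGAGAAG" + "CCTTCTCTCT" + "GA"
    for start, end, kind in find_homo_tracts(example):
        print(start, end, kind, example[start:end])
```

A single interrupting base splits a tract under this rule, which is the limitation that motivates a search tolerant of imperfect triplexes.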
- MeSH
- Algorithms MeSH
- Base Pair Mismatch MeSH
- DNA chemistry metabolism MeSH
- Escherichia coli K12 genetics MeSH
- Genome MeSH
- Nucleic Acid Conformation MeSH
- Humans MeSH
- Inverted Repeat Sequences MeSH
- Likelihood Functions MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA methods MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- MeSH
- Algorithms MeSH
- Genome MeSH
- Sequence Analysis, DNA methods MeSH
- Sequence Alignment mortality MeSH
- Software MeSH
- Publication Type
- Comparative Study MeSH