Most cited article - PubMed ID 20001252
MOTIVATION: Clustering of genetic sequences is one of the key parts of bioinformatics analyses. The resulting phylogenetic trees help answer many research questions, including tracing the history of species, studying past migration, and tracing the source of a virus outbreak. At the same time, biologists increasingly provide data as raw reads or only as contig-level assemblies. Tools that can process such data without supervision are therefore needed. RESULTS: In this paper, we present a tool for reference-free phylogeny capable of handling data for which no mature-level assembly is available. The tool allows distance calculation for raw reads, contigs, and combinations of the two. It provides an estimate of the Levenshtein distance between the sequences, which in turn estimates the number of mutations between the organisms. Compared with previous research, the novelty of the method lies in a newly proposed combination of the read and contig measures, a new method for read-contig mapping, and an efficient embedding of contigs.
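For readers unfamiliar with the measure named in the abstract, the following is a minimal sketch of the exact Levenshtein distance (the minimum number of single-character insertions, deletions, and substitutions) computed with the classic dynamic program. It is not the tool's method: the abstract describes an estimate of this distance for reads and contigs, whereas this sketch computes the exact value on short strings. The function name `levenshtein` is a hypothetical name chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact Levenshtein distance via a two-row dynamic program.

    Illustrative only: quadratic in sequence length, so it is not
    practical for whole genomes, raw read sets, or contig collections.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory

    # previous[j] holds the distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("ACGTAC", "ACTTAC")` returns 1, corresponding to a single substitution between the two sequences.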
- Keywords
- Contigs, Levenshtein distance, Phylogeny, Reads, Sequence similarity
- Publication type
- Journal Article MeSH
Satellite repeats are major sequence constituents of centromeres in many plant and animal species. Within a species, a single family of satellite sequences typically occupies centromeres of all chromosomes and is absent from other parts of the genome. Due to their common origin, sequence similarities exist among the centromere-specific satellites in related species. Here, we report a remarkably different pattern of centromere evolution in the plant tribe Fabeae, which includes genera Pisum, Lathyrus, Vicia, and Lens. By immunoprecipitation of centromeric chromatin with CENH3 antibodies, we identified and characterized a large and diverse set of 64 families of centromeric satellites in 14 species. These families differed in their nucleotide sequence, monomer length (33-2,979 bp), and abundance in individual species. Most families were species-specific, and most species possessed multiple (2-12) satellites in their centromeres. Some of the repeats that were shared by several species exhibited promiscuous patterns of centromere association, being located within CENH3 chromatin in some species, but apart from the centromeres in others. Moreover, FISH experiments revealed that the same family could assume centromeric and noncentromeric positions even within a single species. Taken together, these findings suggest that Fabeae centromeres are not shaped by the coevolution of a single centromeric satellite with its interacting CENH3 proteins, as proposed by the centromere drive model. This conclusion is also supported by the absence of pervasive adaptive evolution of CENH3 sequences retrieved from Fabeae species.
- Keywords
- CENH3, ChIP-seq, centromere evolution, plant chromosomes, satellite DNA
- MeSH
- Centromere chemistry MeSH
- Species Specificity MeSH
- Fabaceae genetics MeSH
- Genetic Variation * MeSH
- DNA, Satellite chemistry MeSH
- Selection, Genetic MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
- Substance names
- DNA, Satellite MeSH