BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences, which make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, appropriate data-mining tools are needed to fully utilize these sequencing data for repeat characterization. RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole-genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. Information about cluster sizes was used to assess the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, which differ in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed software tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis, and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
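The idea of clustering reads by similarity and reading genome proportions off cluster sizes can be illustrated with a toy Python sketch. It is not the authors' implementation: read similarity is approximated here by the number of shared k-mers, and clusters are taken to be plain connected components, whereas the study describes its own similarity-based partitioning. The function names and the parameters `k` and `min_shared` are hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set:
    """Return the set of k-mers (substrings of length k) in one read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads: dict, k: int = 5, min_shared: int = 3) -> list:
    """Cluster reads as connected components of a read-similarity graph.

    An edge joins two reads that share at least `min_shared` k-mers, a crude
    stand-in (assumption) for a pairwise sequence-similarity hit.
    """
    profiles = {name: kmers(seq, k) for name, seq in reads.items()}
    parent = {name: name for name in reads}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All-to-all comparison: fine for a toy example, quadratic in read count
    for a, b in combinations(reads, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for name in reads:
        groups.setdefault(find(name), set()).add(name)
    # Largest clusters first, as in repeat-abundance tables
    return sorted(groups.values(), key=len, reverse=True)


def genome_proportions(clusters: list, total_reads: int) -> list:
    """Share of all reads in each cluster, used as a proxy for genome proportion."""
    return [len(cluster) / total_reads for cluster in clusters]


reads = {
    "r1": "ACGTACGTGG", "r2": "CGTACGTGGA", "r6": "GTACGTGGAT",
    "r3": "TTTTGGGCCC", "r4": "TTTGGGCCCA", "r5": "AAAAACCCCC",
}
clusters = cluster_reads(reads)
print(clusters)
print(genome_proportions(clusters, len(reads)))
```

With low-pass, randomly sampled reads, the fraction of reads falling into a cluster approximates the fraction of the genome occupied by that repeat family, which is why cluster sizes alone are enough for the quantification step.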
One of the major challenges in modern biology is the use of large omics datasets to characterize complex processes such as the cell response to infection. These challenges are even greater when analyses must compare different species, including model and non-model organisms. To address these challenges, graph theory was applied to characterize the protein response of tick vector cells and human cells to infection with Anaplasma phagocytophilum, the causative agent of human granulocytic anaplasmosis. A network of interacting proteins and cell processes was prepared, clustered into biological pathways and ranked with indexes representing the topology of the proteome. The results demonstrated that networks of functionally interacting proteins represented in both infected and uninfected cells can describe the complete set of host cell processes and metabolic pathways, providing a deeper view of the comparative host cell response to pathogen infection. They also showed that changes in the tick proteome were driven by modifications in protein representation in response to A. phagocytophilum infection. Pathogen infection had a greater impact on the tick proteome than on the human proteome. Since most proteins were linked to several cell processes, changes in protein representation simultaneously affected different biological pathways. The method made it possible to distinguish cell processes affected by pathogen infection from those that remained unaffected. The results supported the view that human neutrophils, but not tick cells, limit pathogen infection through differential representation of ras-related proteins. This methodological approach could be applied to other host-pathogen models to identify key host-derived proteins in the response to infection that may be used to develop novel control strategies for arthropod-borne pathogens.
- MeSH
- Anaplasma phagocytophilum growth & development MeSH
- Anaplasmosis pathology MeSH
- Biological Phenomena MeSH
- Cell Line MeSH
- Arthropod Vectors * MeSH
- Host-Pathogen Interactions * MeSH
- Ticks MeSH
- Humans MeSH
- Protein Interaction Maps MeSH
- Proteins analysis MeSH
- Proteome analysis MeSH
- Models, Theoretical * MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
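The protein-process network in the abstract above can be pictured as a bipartite graph. The Python sketch below is an illustration under stated assumptions, not the study's method: node degree stands in for the unnamed topological indexes, and a process counts as "affected" if at least one linked protein changed in representation. Protein and process names are hypothetical.

```python
from collections import defaultdict


def build_network(links):
    """Index a bipartite protein-process network in both directions."""
    by_protein = defaultdict(set)
    by_process = defaultdict(set)
    for protein, process in links:
        by_protein[protein].add(process)
        by_process[process].add(protein)
    return by_protein, by_process


def rank_by_degree(index):
    """Rank nodes by degree (number of neighbours), ties broken by name."""
    return sorted(((node, len(nbrs)) for node, nbrs in index.items()),
                  key=lambda item: (-item[1], item[0]))


def split_processes(by_process, changed_proteins):
    """Separate processes linked to >= 1 changed protein from untouched ones."""
    affected = {proc for proc, prots in by_process.items() if prots & changed_proteins}
    return affected, set(by_process) - affected


# Hypothetical links: protein P1 participates in two processes
links = [("P1", "glycolysis"), ("P1", "apoptosis"),
         ("P2", "glycolysis"), ("P3", "translation")]
by_protein, by_process = build_network(links)
print(rank_by_degree(by_protein))  # [('P1', 2), ('P2', 1), ('P3', 1)]
print(split_processes(by_process, {"P1"}))
```

Because P1 is linked to two processes, a change in its representation marks both as affected, mirroring the abstract's point that one protein change can touch several pathways at once.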
A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers use next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. In addition to providing data for phylogenetic inference, this method yields a wealth of data for comparative studies of genome evolution.
- MeSH
- DNA, Plant genetics MeSH
- Drosophila classification genetics MeSH
- Phylogeny * MeSH
- Genome genetics MeSH
- Genes, Insect genetics MeSH
- Magnoliopsida genetics MeSH
- Repetitive Sequences, Nucleic Acid genetics MeSH
- Cluster Analysis MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
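Treating repeat abundances as continuous characters means each species becomes a vector of genome proportions, one per repeat cluster. The Python sketch below shows one simple way to turn such vectors into a tree; it is a swapped-in technique (UPGMA on Euclidean distances), not necessarily the tree-building method used in the study, and the taxa and abundance values are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two repeat-abundance profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma_topology(profiles):
    """Return a Newick topology (no branch lengths) built by UPGMA clustering."""
    size = {name: 1 for name in profiles}  # active cluster -> number of taxa
    names = list(profiles)
    dist = {frozenset((a, b)): euclidean(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    while len(size) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        na, nb = size.pop(a), size.pop(b)
        merged = f"({a},{b})"
        # Average linkage: size-weighted mean distance to every remaining cluster
        for c in size:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        # Drop all distances involving the two clusters just merged
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        size[merged] = na + nb
    return next(iter(size)) + ";"


# Hypothetical genome proportions (%) of three repeat clusters in four taxa
profiles = {"A": [10.0, 5.0, 1.0], "B": [11.0, 5.0, 1.0],
            "C": [2.0, 8.0, 6.0], "D": [2.0, 9.0, 6.0]}
print(upgma_topology(profiles))  # ((A,B),(C,D));
```

UPGMA assumes roughly clock-like change and is used here only for brevity; the same abundance matrix could equally be fed to neighbour-joining or to maximum-likelihood methods for continuous characters.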
MOTIVATION: Repetitive DNA makes up large portions of plant and animal nuclear genomes, yet it remains the least-characterized genome component in most species studied so far. Although the recent availability of high-throughput sequencing data provides the necessary resources for in-depth investigation of genomic repeats, its utility is hampered by the lack of specialized bioinformatics tools and appropriate computational resources that would enable biologically oriented researchers to run large-scale repeat analyses. RESULTS: Here we present RepeatExplorer, a collection of software tools for the characterization of repetitive elements, accessible via a web interface. A key component of the server is a computational pipeline that uses a graph-based sequence clustering algorithm to facilitate de novo repeat identification without the need for reference databases of known elements. Because the algorithm uses short sequences randomly sampled from the genome as input, it is ideal for analyzing next-generation sequence reads. Additional tools are provided to aid in classification of identified repeats, investigate phylogenetic relationships of retroelements and perform comparative analysis of repeat composition between multiple species. The server allows users to analyze several million sequence reads, which typically results in the identification of most high- and medium-copy repeats in higher plant genomes.
- MeSH
- Algorithms MeSH
- DNA chemistry MeSH
- Eukaryota genetics MeSH
- Phylogeny MeSH
- Genome MeSH
- Internet MeSH
- Repetitive Sequences, Nucleic Acid * MeSH
- Sequence Analysis, DNA * MeSH
- Cluster Analysis MeSH
- Software * MeSH
- High-Throughput Nucleotide Sequencing * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
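For the comparative analysis mentioned above, reads from several species are clustered together and each cluster is then broken down by species of origin. The Python sketch below assumes, hypothetically, that each read ID starts with a fixed-length species code; the codes `PS` and `GM`, the read IDs and the read totals are illustrative only.

```python
from collections import Counter


def species_counts(clusters, prefix_len=2):
    """Count reads per species in each cluster of a pooled multi-species run.

    Assumes (hypothetically) that every read ID begins with a fixed-length
    species code, e.g. "PS_0001" -> "PS".
    """
    return [Counter(read_id[:prefix_len] for read_id in cluster) for cluster in clusters]


def per_species_fractions(clusters, reads_per_species, prefix_len=2):
    """For each cluster, the fraction of each species' input reads it contains."""
    return [
        {sp: counts.get(sp, 0) / total for sp, total in reads_per_species.items()}
        for counts in species_counts(clusters, prefix_len)
    ]


clusters = [["PS_0001", "PS_0002", "GM_0001"], ["GM_0002", "GM_0003"]]
print(per_species_fractions(clusters, {"PS": 4, "GM": 5}))
# [{'PS': 0.5, 'GM': 0.2}, {'PS': 0.0, 'GM': 0.4}]
```

Dividing by each species' own input read count keeps the per-cluster values comparable even when species contribute different numbers of reads; a value of 0 flags a repeat absent from, or undetected in, that species.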
Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
- MeSH
- DNA, Plant genetics MeSH
- Genome, Plant * MeSH
- Pisum sativum genetics MeSH
- In Situ Hybridization, Fluorescence MeSH
- Consensus Sequence MeSH
- Zea mays genetics MeSH
- Magnoliopsida genetics MeSH
- Chromosome Mapping methods MeSH
- Metaphase MeSH
- Computer Graphics MeSH
- Cyperaceae genetics MeSH
- DNA, Satellite classification genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Cluster Analysis MeSH
- Software * MeSH
- Vicia faba genetics MeSH
- Publication type
- Journal Article MeSH
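The two TAREAN ideas in the abstract above, a circular structure as the signature of a tandem repeat and a consensus monomer built from the most frequent k-mers, can be combined in a simplified Python sketch. This is not TAREAN's actual algorithm: it greedily walks from the most frequent k-mer through its most frequent successor (overlap of k-1 bases) and reports a monomer only if the walk returns to its start. The function names, `k` and `max_steps` are hypothetical.

```python
from collections import Counter
from typing import Optional


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    return Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))


def circular_consensus(reads, k, max_steps=10_000) -> Optional[str]:
    """Greedy walk over the most frequent k-mers with (k-1)-base overlaps.

    Returns a monomer sequence if the walk closes a circle back to its start
    (the tandem-repeat signature), otherwise None.
    """
    counts = count_kmers(reads, k)
    if not counts:
        return None
    start = counts.most_common(1)[0][0]
    current, monomer = start, []
    for _ in range(max_steps):
        monomer.append(current[0])  # each step adds one base to the monomer
        successors = [current[1:] + base for base in "ACGT" if current[1:] + base in counts]
        if not successors:
            return None  # dead end: the walk is linear, not circular
        current = max(successors, key=lambda kmer: counts[kmer])
        if current == start:
            return "".join(monomer)
    return None


array = "ACGGTCAT" * 5  # toy tandem array of an 8-bp monomer
reads = [array[i:i + 12] for i in range(0, 29, 3)]
print(circular_consensus(reads, k=4))  # some rotation of ACGGTCAT
print(circular_consensus(["ACGTTGCAAGT"], k=4))  # None
```

Because a satellite array has no natural start, the reconstructed monomer is returned as a rotation of the true repeat unit; real data would also need tolerance for sequencing errors and monomer variants, which this sketch ignores.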
RepeatExplorer2 is a novel version of a computational pipeline that uses graph-based clustering of next-generation sequencing reads for characterization of repetitive DNA in eukaryotes. The clustering algorithm facilitates repeat identification in any genome by using relatively small quantities of short sequence reads, and additional tools within the pipeline perform automatic annotation and quantification of the identified repeats. The pipeline is integrated into the Galaxy platform, which provides a user-friendly web interface for script execution and documentation of the results. Compared to the original version of the pipeline, RepeatExplorer2 provides automated annotation of transposable elements, identification of tandem repeats and enhanced visualization of analysis results. Here, we present an overview of the RepeatExplorer2 workflow and provide procedures for its application to (i) de novo repeat identification in a single species, (ii) comparative repeat analysis in a set of species, (iii) development of satellite DNA probes for cytogenetic experiments and (iv) identification of centromeric repeats based on ChIP-seq data. Each procedure takes approximately 2 d to complete. RepeatExplorer2 is available at https://repeatexplorer-elixir.cerit-sc.cz .
- MeSH
- DNA Probes chemistry genetics MeSH
- DNA chemistry genetics MeSH
- Genomics methods MeSH
- Humans MeSH
- Repetitive Sequences, Nucleic Acid MeSH
- Sequence Analysis, DNA methods MeSH
- Cluster Analysis MeSH
- Software MeSH
- DNA Transposable Elements MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
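The abstract does not say how ChIP-seq data are matched to clusters when identifying centromeric repeats. One common approach, assumed here, is to compare each cluster's share of ChIP reads with its share of input (control) reads. The function name, cluster IDs and counts below are hypothetical.

```python
def chip_input_ratios(chip_counts, input_counts, chip_total, input_total):
    """Per-cluster ChIP/input ratio of read fractions.

    Each count is divided by its library's total read number so that
    libraries of different depth are comparable; ratios well above 1 flag
    clusters enriched in the ChIP sample.
    """
    ratios = {}
    for cluster, n_input in input_counts.items():
        if n_input == 0:
            continue  # ratio undefined without input reads
        chip_fraction = chip_counts.get(cluster, 0) / chip_total
        ratios[cluster] = chip_fraction / (n_input / input_total)
    return ratios


ratios = chip_input_ratios({"CL1": 300, "CL2": 50}, {"CL1": 100, "CL2": 100}, 1000, 1000)
print(max(ratios, key=ratios.get))  # CL1
```

A cluster such as the hypothetical CL1, three times more frequent among ChIP reads than among input reads, would be a candidate centromeric repeat worth checking by other means.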
PDBsum, http://www.ebi.ac.uk/pdbsum, is a website providing numerous pictorial analyses of each entry in the Protein Data Bank. It portrays the structural features of all proteins, DNA and ligands in the entry, as well as depicting the interactions between them. The latest features, described here, include annotation of human protein sequences with their naturally occurring amino acid variants, dynamic graphs showing the relationships between related protein domain architectures, analyses of ligand binding clusters across different experimental determinations of the same protein, analyses of tunnels in proteins and new search options.
- MeSH
- Databases, Protein * MeSH
- Genetic Variation MeSH
- Internet MeSH
- Protein Conformation * MeSH
- Humans MeSH
- Ligands MeSH
- Computer Graphics MeSH
- Proteins chemistry genetics MeSH
- Drug Design MeSH
- Cluster Analysis MeSH
- Protein Structure, Tertiary MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
Repetitive sequences are ubiquitous components of all eukaryotic genomes. They contribute to genome evolution and the regulation of gene transcription. However, the uncontrolled activity of repetitive sequences can negatively affect genome function and stability. Therefore, repetitive DNAs are embedded in a highly repressive heterochromatic environment in plant cell nuclei. Here, we analyzed the sequence, composition and epigenetic makeup of peculiar non-pericentromeric heterochromatic segments in the genome of the Australian crucifer Ballantinia antipoda. By combining high-throughput sequencing, graph-based clustering and cytogenetics, we found that the heterochromatic segments consist of a mixture of unique sequences and an A-T-rich 174 bp satellite repeat (BaSAT1). BaSAT1 occupies about 10% of the B. antipoda nuclear genome in >250 000 copies. Unlike many other highly repetitive sequences, BaSAT1 repeats are hypomethylated; this contrasts with the normal patterns of DNA methylation in the B. antipoda genome. Detailed analysis of several copies revealed that these non-methylated BaSAT1 repeats were also devoid of the heterochromatic histone mark H3K9me2. However, the factors that determine the methylation status of BaSAT1 repeats remain unknown. In summary, we show that even highly repetitive sequences can exist in a hypomethylated state in the plant nuclear genome.
- MeSH
- Arabidopsis genetics MeSH
- Tracheophyta chemistry genetics metabolism MeSH
- Epigenesis, Genetic MeSH
- Phylogeny MeSH
- Genome, Plant MeSH
- Heterochromatin genetics metabolism MeSH
- Histones chemistry metabolism MeSH
- DNA Methylation genetics MeSH
- DNA, Satellite chemistry genetics metabolism MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
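The two abundance figures given for BaSAT1 are linked by simple arithmetic. As a back-of-envelope sketch (the symbols are introduced here, not taken from the study): with genome proportion $p$, haploid genome size $G$ in bp and monomer length $L$, the copy number is approximately

```latex
N \approx \frac{p\,G}{L}
\quad\Longrightarrow\quad
G \approx \frac{N\,L}{p}
  = \frac{2.5 \times 10^{5} \times 174\ \mathrm{bp}}{0.10}
  \approx 4.35 \times 10^{8}\ \mathrm{bp}
```

Since the copy number is stated as a lower bound (>250 000) and assuming full-length 174 bp monomers, taking both figures at face value implies a nuclear genome of at least roughly 435 Mb. This is only a consistency check on the stated numbers, not a genome size reported in the abstract.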
BACKGROUND: The banana family (Musaceae) includes a genetically diverse group of species and their diploid and polyploid hybrids that are widely cultivated in the tropics. In spite of their socio-economic importance, knowledge of Musaceae genomes is largely limited to draft genome assemblies of two species, Musa acuminata and M. balbisiana. Here we aimed to complement this information by analyzing the repetitive genome fractions of six species selected to represent various phylogenetic groups within the family. RESULTS: Low-pass sequencing of M. acuminata, M. ornata, M. textilis, M. beccarii, M. balbisiana, and Ensete gilletii genomes was performed using a 454/Roche platform. Sequence reads were analyzed for their overall intra- and interspecific similarities, and all major repeat families were quantified using graph-based clustering. Maximus/SIRE and Angela lineages of Ty1/copia long terminal repeat (LTR) retrotransposons and the chromovirus lineage of Ty3/gypsy elements were found to make up most of the highly repetitive DNA in all species (14-34.5% of the genome). However, quantitative differences and sequence variation were detected in classified repeat families as well as in the bulk of total repetitive DNA. These differences were most pronounced between species from different taxonomic sections of the Musaceae family, whereas pairs of closely related species (M. acuminata/M. ornata and M. beccarii/M. textilis) shared similar populations of repetitive elements. CONCLUSIONS: This study provided the first insight into the composition and sequence variation of repetitive parts of Musaceae genomes. It allowed the identification of repetitive sequences specific to a single species or a group of species, which can be utilized as molecular markers in breeding programs, and it generated computational resources that will be instrumental in repeat masking and annotation in future genome assembly projects.
- MeSH
- Musaceae classification genetics MeSH
- DNA, Plant analysis genetics MeSH
- Phylogeny MeSH
- Genetic Variation MeSH
- Genome, Plant * MeSH
- Evolution, Molecular MeSH
- Repetitive Sequences, Nucleic Acid * MeSH
- Sequence Analysis, DNA MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
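Percentages such as "14-34.5% of the genome" come from summing the sizes of clusters that share an annotation. The Python sketch below illustrates that aggregation step; the lineage labels are borrowed from the abstract, while the read counts and the function name are hypothetical, and read share is assumed to approximate genome share because reads are sampled at random.

```python
from collections import defaultdict


def lineage_percentages(annotated_clusters, total_reads):
    """Sum read counts of annotated clusters per lineage, as % of analysed reads."""
    totals = defaultdict(int)
    for lineage, n_reads in annotated_clusters:
        totals[lineage] += n_reads
    return {lineage: 100 * n / total_reads for lineage, n in totals.items()}


# Hypothetical (annotation, read count) pairs for individual clusters
annotated = [("Ty1/copia Angela", 1200), ("Ty3/gypsy chromovirus", 500),
             ("Ty1/copia Angela", 300), ("Ty1/copia Maximus/SIRE", 1000)]
print(lineage_percentages(annotated, total_reads=10_000))
# {'Ty1/copia Angela': 15.0, 'Ty3/gypsy chromovirus': 5.0, 'Ty1/copia Maximus/SIRE': 10.0}
```

Several clusters can belong to one lineage (here two Angela clusters), so per-lineage totals are only meaningful after this summing step.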
... Origin of Life Problem, 288 -- Autocatalytic Sets of Catalytic Polymers, 298 -- Growth on the Infinite Graph ... ... Regulatory Systems of Prokaryotes and Eukaryotes, 412 -- An Ensemble Theory Based on Random Directed Graphs ...
1st ed. 709 p. : ill.
- Keywords
- Biology, Evolution, Phylogeny
- MeSH
- Biological Evolution MeSH
- Biology MeSH
- Phylogeny MeSH
- Evolution, Molecular MeSH
- Origin of Life MeSH
- Conspectus
- General genetics. General cytogenetics. Evolution
- NML Fields
- molecular biology, molecular medicine