Query: Graph clustering

25 records in Medvik

Article online

Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

... RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence ...

BMC Bioinformatics. 2010; 11: 378. [pub] 2010-07-15

BMC Bioinformatics
ISSN 1471-2105
Medvik
Source

BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
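The graph-based partitioning described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: k-mer Jaccard similarity stands in for real pairwise read similarity, connected components stand in for the actual clustering step, and the names `cluster_reads` and `genome_proportions`, the value of `k`, and the `threshold` are all hypothetical choices made for illustration.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_reads(reads, k=4, threshold=0.5):
    """Group read indices into connected components of a similarity graph.

    Nodes are reads; an edge joins two reads whose k-mer Jaccard
    similarity is at least `threshold` (an assumed similarity measure).
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(r, k) for r in reads]
    for i, j in combinations(range(len(reads)), 2):
        if jaccard(profiles[i], profiles[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, mirroring the use of cluster size as a
    # proxy for repeat abundance.
    return sorted(clusters.values(), key=len, reverse=True)


def genome_proportions(clusters, n_reads):
    """Estimate each cluster's share of the genome from its read count."""
    return [len(c) / n_reads for c in clusters]


if __name__ == "__main__":
    toy_reads = ["ACGTACGTAC", "CGTACGTACG", "TTTTGGGGCC", "TTTGGGGCCC"]
    found = cluster_reads(toy_reads)
    print(found, genome_proportions(found, len(toy_reads)))
```

The all-pairs comparison is quadratic in the number of reads, so a sketch like this only suits toy inputs; it is meant to show the idea of reading repeat proportions off cluster sizes, nothing more.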

MeSH
DNA, Plant / genetics
Genome, Plant
Glycine max / genetics
Pisum sativum / genetics
Chromosome Mapping
Repetitive Sequences, Nucleic Acid
Sequence Analysis, DNA
Cluster Analysis
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article online

Use of Graph Theory to Characterize Human and Arthropod Vector Cell Protein Response to Infection With Anaplasma phagocytophilum

... To address these challenges, the graph theory was applied to characterize the tick vector and human cell ...

Frontiers in Cellular and Infection Microbiology. 2018; 8: 265. [pub] 2018-08-03

Front Cell Infect Microbiol
ISSN 2235-2988
Medvik
Source

One of the major challenges in modern biology is the use of large omics datasets for the characterization of complex processes such as cell response to infection. These challenges are even bigger when analyses need to be performed for comparison of different species including model and non-model organisms. To address these challenges, graph theory was applied to characterize the tick vector and human cell protein response to infection with Anaplasma phagocytophilum, the causative agent of human granulocytic anaplasmosis. A network of interacting proteins and cell processes, clustered in biological pathways and ranked with indexes representing the topology of the proteome, was prepared. The results demonstrated that networks of functionally interacting proteins represented in both infected and uninfected cells can describe the complete set of host cell processes and metabolic pathways, providing a deeper view of the comparative host cell response to pathogen infection. The results also demonstrated that changes in the tick proteome were driven by modifications in protein representation in response to A. phagocytophilum infection. Pathogen infection had a higher impact on the tick proteome than on the human proteome. Since most proteins were linked to several cell processes, the changes in protein representation simultaneously affected different biological pathways. The method allowed discerning cell processes that were affected by pathogen infection from those that remained unaffected. The results supported the conclusion that human neutrophils, but not tick cells, limit pathogen infection through differential representation of ras-related proteins. This methodological approach could be applied to other host-pathogen models to identify host-derived key proteins in response to infection that may be used to develop novel control strategies for arthropod-borne pathogens.

MeSH
Anaplasma phagocytophilum / growth & development
Anaplasmosis / pathology
Biological Phenomena
Cell Line
Arthropod Vectors *
Host-Pathogen Interactions *
Ticks
Humans
Protein Interaction Maps
Proteins / analysis
Proteome / analysis
Models, Theoretical *
Animals
Check Tag
Humans
Animals
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article online

Genomic repeat abundances contain phylogenetic signal

... demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based ...

Systematic Biology. 2015; 64 (1): 112-26.

Syst Biol
ISSN 1076-836X
Medvik
Source

A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution.
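As a rough illustration of treating repeat abundances as continuously varying characters, the sketch below converts per-species read counts for each repeat cluster into proportions and compares species by Euclidean distance. The abstract does not state how the trees are inferred, so the distance measure is a swapped-in simplification; the function names and the toy species labels are hypothetical.

```python
import math


def to_proportions(counts):
    """Scale raw cluster read counts so they sum to 1 for one species."""
    total = sum(counts)
    return [c / total for c in counts]


def distance_matrix(abundances):
    """Pairwise Euclidean distances between species abundance profiles.

    `abundances` maps a species name to a list of read counts, one per
    repeat cluster, with clusters in the same order for every species.
    Returns the sorted species names and a square matrix of distances.
    """
    names = sorted(abundances)
    props = {name: to_proportions(abundances[name]) for name in names}
    matrix = [[math.dist(props[a], props[b]) for b in names] for a in names]
    return names, matrix


if __name__ == "__main__":
    # Toy counts, invented purely to exercise the functions.
    toy = {"sp_A": [50, 30, 20], "sp_B": [48, 32, 20], "sp_C": [10, 10, 80]}
    names, matrix = distance_matrix(toy)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
```

A matrix like this could be passed to any distance-based tree-building method; whether that matches the analysis in the article is not something the abstract settles.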

MeSH
DNA, Plant / genetics
Drosophila / classification / genetics
Phylogeny *
Genome / genetics
Genes, Insect / genetics
Magnoliopsida / genetics
Repetitive Sequences, Nucleic Acid / genetics
Cluster Analysis
Animals
Check Tag
Animals
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article online

RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads

... A key component of the server is the computational pipeline using a graph-based sequence clustering algorithm ...

Bioinformatics. 2013; 29 (6): 792-3.

ISSN 1367-4811
Medvik
Source

MOTIVATION: Repetitive DNA makes up large portions of plant and animal nuclear genomes, yet it remains the least-characterized genome component in most species studied so far. Although the recent availability of high-throughput sequencing data provides necessary resources for in-depth investigation of genomic repeats, its utility is hampered by the lack of specialized bioinformatics tools and appropriate computational resources that would enable large-scale repeat analysis to be run by biologically oriented researchers. RESULTS: Here we present RepeatExplorer, a collection of software tools for characterization of repetitive elements, which is accessible via a web interface. A key component of the server is the computational pipeline using a graph-based sequence clustering algorithm to facilitate de novo repeat identification without the need for reference databases of known elements. Because the algorithm uses short sequences randomly sampled from the genome as input, it is ideal for analyzing next-generation sequence reads. Additional tools are provided to aid in classification of identified repeats, investigate phylogenetic relationships of retroelements and perform comparative analysis of repeat composition between multiple species. The server allows analysis of several million sequence reads, which typically results in identification of most high and medium copy repeats in higher plant genomes.

MeSH
Algorithms
DNA / chemistry
Eukaryota / genetics
Phylogeny
Genome
Internet
Repetitive Sequences, Nucleic Acid *
Sequence Analysis, DNA *
Cluster Analysis
Software *
High-Throughput Nucleotide Sequencing *
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article online

TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads

... The pipeline first employs graph-based sequence clustering to identify groups of reads that represent ...

Nucleic Acids Research. 2017; 45 (12): e111.

Nucleic Acids Res
ISSN 1362-4962
Medvik
Source

Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
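The idea of rebuilding a repeat monomer from the most frequent k-mers can be sketched as a greedy walk over overlapping k-mers that stops when it returns to its starting point. This is a simplified stand-in, not the TAREAN algorithm: TAREAN detects circularity in read cluster graphs, whereas this sketch looks for a cycle among k-mers, and `greedy_monomer`, `k` and `max_len` are hypothetical names and parameters.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_monomer(reads, k=5, max_len=1000):
    """Try to reconstruct a tandem-repeat monomer from k-mer counts.

    Starts at the most frequent k-mer and repeatedly steps to the most
    frequent k-mer that overlaps the current one by k-1 bases. If the
    walk returns to the start, one period of the walked sequence is
    returned (some rotation of the monomer). Returns None when the walk
    dead-ends or exceeds max_len, for example on non-tandem sequence.
    """
    counts = count_kmers(reads, k)
    if not counts:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    start = max(sorted(counts), key=counts.get)
    walked = start
    current = start
    while len(walked) <= max_len:
        suffix = current[1:]
        candidates = [(counts[suffix + b], b) for b in "ACGT" if counts[suffix + b] > 0]
        if not candidates:
            return None
        _, base = max(candidates)
        current = suffix + base
        if current == start:
            # The walk closed a cycle: drop the k-1 overlapping bases.
            return walked[:len(walked) - (k - 1)]
        walked += base
    return None


if __name__ == "__main__":
    array = "AACGTCTG" * 4  # toy tandem array, invented for illustration
    toy_reads = [array[i:i + 12] for i in range(0, 20, 3)]
    print(greedy_monomer(toy_reads, k=5))
```

Real reads carry sequencing errors and monomer variants, so a greedy walk like this can branch incorrectly; it only conveys why tandem repeats produce a closed path while dispersed sequence does not.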

Article

Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2

... RepeatExplorer2 is a novel version of a computational pipeline that uses graph-based clustering of next-generation ...

Nature Protocols. 2020; 15 (11): 3745-3776. [pub] 2020-10-23

Nat Protoc
ISSN 1750-2799
Medvik
Source

RepeatExplorer2 is a novel version of a computational pipeline that uses graph-based clustering of next-generation sequencing reads for characterization of repetitive DNA in eukaryotes. The clustering algorithm facilitates repeat identification in any genome by using relatively small quantities of short sequence reads, and additional tools within the pipeline perform automatic annotation and quantification of the identified repeats. The pipeline is integrated into the Galaxy platform, which provides a user-friendly web interface for script execution and documentation of the results. Compared to the original version of the pipeline, RepeatExplorer2 provides automated annotation of transposable elements, identification of tandem repeats and enhanced visualization of analysis results. Here, we present an overview of the RepeatExplorer2 workflow and provide procedures for its application to (i) de novo repeat identification in a single species, (ii) comparative repeat analysis in a set of species, (iii) development of satellite DNA probes for cytogenetic experiments and (iv) identification of centromeric repeats based on ChIP-seq data. Each procedure takes approximately 2 d to complete. RepeatExplorer2 is available at https://repeatexplorer-elixir.cerit-sc.cz .

MeSH
DNA Probes / chemistry / genetics
DNA / chemistry / genetics
Genomics / methods
Humans
Repetitive Sequences, Nucleic Acid
Sequence Analysis, DNA / methods
Cluster Analysis
Software
DNA Transposable Elements
High-Throughput Nucleotide Sequencing / methods
Animals
Check Tag
Humans
Animals
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article online

PDBsum additions

... annotation of human protein sequences with their naturally occurring amino acid variants, dynamic graphs ...

Nucleic Acids Research. 2014; 42 (Database issue): D292-6.

Nucleic Acids Res
ISSN 1362-4962
Medvik
Source

PDBsum, http://www.ebi.ac.uk/pdbsum, is a website providing numerous pictorial analyses of each entry in the Protein Data Bank. It portrays the structural features of all proteins, DNA and ligands in the entry, as well as depicting the interactions between them. The latest features, described here, include annotation of human protein sequences with their naturally occurring amino acid variants, dynamic graphs showing the relationships between related protein domain architectures, analyses of ligand binding clusters across different experimental determinations of the same protein, analyses of tunnels in proteins and new search options.

MeSH
Databases, Protein *
Genetic Variation
Internet
Protein Conformation *
Humans
Ligands
Computer Graphics
Proteins / chemistry / genetics
Drug Design
Cluster Analysis
Protein Structure, Tertiary
Check Tag
Humans
Publication type
Journal Article
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

Article online

Genome invasion by a hypomethylated satellite repeat in Australian crucifer Ballantinia antipoda

... By the combination of high throughput sequencing, graph-based clustering and cytogenetics, we found that ...

Plant Journal. 2019; 99 (6): 1066-1079. [pub] 2019-06-28

Plant J
ISSN 1365-313X
Medvik
Source

Repetitive sequences are ubiquitous components of all eukaryotic genomes. They contribute to genome evolution and the regulation of gene transcription. However, the uncontrolled activity of repetitive sequences can negatively affect genome functions and stability. Therefore, repetitive DNAs are embedded in a highly repressive heterochromatic environment in plant cell nuclei. Here, we analyzed the sequence, composition and the epigenetic makeup of peculiar non-pericentromeric heterochromatic segments in the genome of the Australian crucifer Ballantinia antipoda. By the combination of high throughput sequencing, graph-based clustering and cytogenetics, we found that the heterochromatic segments consist of a mixture of unique sequences and an A-T-rich 174 bp satellite repeat (BaSAT1). BaSAT1 occupies about 10% of the B. antipoda nuclear genome in >250 000 copies. Unlike many other highly repetitive sequences, BaSAT1 repeats are hypomethylated; this contrasts with the normal patterns of DNA methylation in the B. antipoda genome. Detailed analysis of several copies revealed that these non-methylated BaSAT1 repeats were also devoid of heterochromatic histone H3K9me2 methylation. However, the factors decisive for the methylation status of BaSAT1 repeats remain currently unknown. In summary, we show that even highly repetitive sequences can exist as hypomethylated in the plant nuclear genome.

MeSH
Arabidopsis / genetics
Tracheophyta / chemistry / genetics / metabolism
Epigenesis, Genetic
Phylogeny
Genome, Plant
Heterochromatin / genetics / metabolism
Histones / chemistry / metabolism
DNA Methylation / genetics
DNA, Satellite / chemistry / genetics / metabolism
High-Throughput Nucleotide Sequencing
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article online

Genome-wide analysis of repeat diversity across the family Musaceae

... overall intra- and inter-specific similarities and, all major repeat families were quantified using graph-based ...

PLoS One. 2014; 9 (6): e98918. [pub] 2014-06-16

ISSN 1932-6203
Medvik
Source

BACKGROUND: The banana family (Musaceae) includes a genetically diverse group of species and their diploid and polyploid hybrids that are widely cultivated in the tropics. In spite of their socio-economic importance, knowledge of Musaceae genomes is largely limited to draft genome assemblies of two species, Musa acuminata and M. balbisiana. Here we aimed to complement this information by analyzing repetitive genome fractions of six species selected to represent various phylogenetic groups within the family. RESULTS: Low-pass sequencing of M. acuminata, M. ornata, M. textilis, M. beccarii, M. balbisiana, and Ensete gilletii genomes was performed using a 454/Roche platform. Sequence reads were subjected to analysis of their overall intra- and inter-specific similarities, and all major repeat families were quantified using graph-based clustering. Maximus/SIRE and Angela lineages of Ty1/copia long terminal repeat (LTR) retrotransposons and the chromovirus lineage of Ty3/gypsy elements were found to make up most of the highly repetitive DNA in all species (14-34.5% of the genome). However, there were quantitative differences and sequence variations detected for classified repeat families as well as for the bulk of total repetitive DNA. These differences were most pronounced between species from different taxonomic sections of the Musaceae family, whereas pairs of closely related species (M. acuminata/M. ornata and M. beccarii/M. textilis) shared similar populations of repetitive elements. CONCLUSIONS: This study provided the first insight into the composition and sequence variation of repetitive parts of Musaceae genomes. It allowed identification of repetitive sequences specific for a single species or a group of species that can be utilized as molecular markers in breeding programs and generated computational resources that will be instrumental in repeat masking and annotation in future genome assembly projects.

MeSH
Musaceae / classification / genetics
DNA, Plant / analysis / genetics
Phylogeny
Genetic Variation
Genome, Plant *
Evolution, Molecular
Repetitive Sequences, Nucleic Acid *
Sequence Analysis, DNA
Computational Biology / methods
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Book

The origins of order : self-organization and selection in evolution

... Origin of Life Problem, 288 Autocatalytic Sets of Catalytic Polymers, 298 -- Growth on the Infinite Graph ... ... Regulatory Systems of Prokaryotes and Eukaryotes, 412 -- An Ensemble Theory Based on Random Directed Graphs ...

Kauffman, Stuart A., 1939-
Author

New York : Oxford University Press, 1993

1st ed. 709 p. : ill.

Conspectus
General genetics. General cytogenetics. Evolution
NLK subject fields
molecular biology, molecular medicine
