Search query: graph-based clustering

17 records in Medvik

Online article

Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

... RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence ...

BMC Bioinformatics. 2010; 11: 378. [pub] 2010-07-15

BMC Bioinformatics
ISSN 1471-2105

BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
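
The partitioning step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the shared k-mer (Jaccard) score stands in for the read similarity search, connected components stand in for the paper's graph partitioning, and the names `similarity`, `cluster_reads`, `genome_proportions` and the `threshold` value are hypothetical. Reads are nodes, an edge joins two reads whose similarity passes the threshold, and the share of reads in each cluster approximates the genome proportion of that repeat family.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of k-mers in a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of k-mer sets (a stand-in for alignment-based similarity)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def cluster_reads(reads, threshold=0.5, k=4):
    """Group reads into connected components of the similarity graph."""
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # An edge exists when two reads are similar enough; merge their components.
    for i, j in combinations(range(len(reads)), 2):
        if similarity(reads[i], reads[j], k) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, as they correspond to the most abundant repeats.
    return sorted(clusters.values(), key=len, reverse=True)


def genome_proportions(clusters, n_reads):
    """Fraction of all reads falling into each cluster."""
    return [len(c) / n_reads for c in clusters]
```

With low-pass, randomly sampled reads, cluster size scales with repeat abundance, which is why the read fraction per cluster can serve as an abundance estimate. The all-against-all comparison here is quadratic and only suitable for toy inputs.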

MeSH
DNA, plant / genetics
genome, plant
Glycine max / genetics
peas / genetics
chromosome mapping
repetitive sequences, nucleic acid
sequence analysis, DNA
cluster analysis
Publication type
journal article
research support, non-U.S. gov't

Online article

Genomic repeat abundances contain phylogenetic signal

... demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based ...

Systematic Biology. 2015; 64(1): 112-126.

Syst Biol
ISSN 1076-836X

A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution.

MeSH
DNA, plant / genetics
Drosophila / classification, genetics
phylogeny *
genome / genetics
genes, insect / genetics
Magnoliopsida / genetics
repetitive sequences, nucleic acid / genetics
cluster analysis
animals
Check tag
animals
Publication type
journal article
research support, non-U.S. gov't

Online article

RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads

... A key component of the server is the computational pipeline using a graph-based sequence clustering algorithm ...

Bioinformatics. 2013; 29(6): 792-793.

ISSN 1367-4811

MOTIVATION: Repetitive DNA makes up large portions of plant and animal nuclear genomes, yet it remains the least-characterized genome component in most species studied so far. Although the recent availability of high-throughput sequencing data provides necessary resources for in-depth investigation of genomic repeats, its utility is hampered by the lack of specialized bioinformatics tools and appropriate computational resources that would enable large-scale repeat analysis to be run by biologically oriented researchers. RESULTS: Here we present RepeatExplorer, a collection of software tools for characterization of repetitive elements, which is accessible via a web interface. A key component of the server is the computational pipeline using a graph-based sequence clustering algorithm to facilitate de novo repeat identification without the need for reference databases of known elements. Because the algorithm uses short sequences randomly sampled from the genome as input, it is ideal for analyzing next-generation sequence reads. Additional tools are provided to aid in classification of identified repeats, investigate phylogenetic relationships of retroelements and perform comparative analysis of repeat composition between multiple species. The server allows analysis of several million sequence reads, which typically results in identification of most high- and medium-copy repeats in higher plant genomes.

MeSH
algorithms
DNA / chemistry
Eukaryota / genetics
phylogeny
genome
internet
repetitive sequences, nucleic acid *
sequence analysis, DNA *
cluster analysis
software *
high-throughput nucleotide sequencing *
Publication type
journal article
research support, non-U.S. gov't

Online article

Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2

... RepeatExplorer2 is a novel version of a computational pipeline that uses graph-based clustering of next-generation ...

Nature Protocols. 2020; 15(11): 3745-3776. [pub] 2020-10-23

Nat Protoc
ISSN 1750-2799

RepeatExplorer2 is a novel version of a computational pipeline that uses graph-based clustering of next-generation sequencing reads for characterization of repetitive DNA in eukaryotes. The clustering algorithm facilitates repeat identification in any genome by using relatively small quantities of short sequence reads, and additional tools within the pipeline perform automatic annotation and quantification of the identified repeats. The pipeline is integrated into the Galaxy platform, which provides a user-friendly web interface for script execution and documentation of the results. Compared to the original version of the pipeline, RepeatExplorer2 provides automated annotation of transposable elements, identification of tandem repeats and enhanced visualization of analysis results. Here, we present an overview of the RepeatExplorer2 workflow and provide procedures for its application to (i) de novo repeat identification in a single species, (ii) comparative repeat analysis in a set of species, (iii) development of satellite DNA probes for cytogenetic experiments and (iv) identification of centromeric repeats based on ChIP-seq data. Each procedure takes approximately 2 d to complete. RepeatExplorer2 is available at https://repeatexplorer-elixir.cerit-sc.cz .

MeSH
DNA probes / chemistry, genetics
DNA / chemistry, genetics
genomics / methods
humans
repetitive sequences, nucleic acid
sequence analysis, DNA / methods
cluster analysis
software
DNA transposable elements
high-throughput nucleotide sequencing / methods
animals
Check tag
humans
animals
Publication type
journal article
research support, non-U.S. gov't

Online article

TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads

... The pipeline first employs graph-based sequence clustering to identify groups of reads that represent ...

Nucleic Acids Research. 2017; 45(12): e111.

Nucleic Acids Res
ISSN 1362-4962

Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
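
The idea of recovering a repeat monomer from frequent k-mers and a circular graph structure can be sketched as follows. This is a simplified illustration, not TAREAN's algorithm: it replaces the read-level cluster graph with a k-mer graph and uses a greedy walk, and the function names `count_kmers` and `reconstruct_monomer` as well as the `k` and `max_len` defaults are hypothetical. Starting from the most frequent k-mer, the walk repeatedly follows the most frequent successor; returning to the start signals a circular path, whose length gives the monomer size.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads of one cluster."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def reconstruct_monomer(reads, k=4, max_len=1000):
    """Greedy walk over frequent k-mers; return a monomer if the walk closes a cycle."""
    counts = count_kmers(reads, k)
    if not counts:
        return None
    start = counts.most_common(1)[0][0]
    path = [start]
    current = start
    for _ in range(max_len):
        # Candidate successors overlap the current k-mer by k-1 bases.
        candidates = [current[1:] + base for base in "ACGT"]
        nxt = max(candidates, key=lambda km: counts.get(km, 0))
        if counts.get(nxt, 0) == 0:
            return None  # dead end: no circular structure found
        if nxt == start:
            # Cycle closed: the first base of each k-mer spells one monomer.
            return "".join(km[0] for km in path)
        path.append(nxt)
        current = nxt
    return None  # no cycle through the start k-mer within max_len steps
```

The returned sequence is one rotation of the monomer, since a tandem array has no defined starting position. A greedy walk is fragile when monomers contain internal repeats or when sequence variants compete, so it should be read only as an illustration of why tandem repeats yield circular graphs.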

Online article

Genome invasion by a hypomethylated satellite repeat in Australian crucifer Ballantinia antipoda

... By the combination of high throughput sequencing, graph-based clustering and cytogenetics, we found that ...

Plant Journal. 2019; 99(6): 1066-1079. [pub] 2019-06-28

Plant J
ISSN 1365-313X

Repetitive sequences are ubiquitous components of all eukaryotic genomes. They contribute to genome evolution and the regulation of gene transcription. However, the uncontrolled activity of repetitive sequences can negatively affect genome functions and stability. Therefore, repetitive DNAs are embedded in a highly repressive heterochromatic environment in plant cell nuclei. Here, we analyzed the sequence, composition and the epigenetic makeup of peculiar non-pericentromeric heterochromatic segments in the genome of the Australian crucifer Ballantinia antipoda. By the combination of high throughput sequencing, graph-based clustering and cytogenetics, we found that the heterochromatic segments consist of a mixture of unique sequences and an A-T-rich 174 bp satellite repeat (BaSAT1). BaSAT1 occupies about 10% of the B. antipoda nuclear genome in >250 000 copies. Unlike many other highly repetitive sequences, BaSAT1 repeats are hypomethylated; this contrasts with the normal patterns of DNA methylation in the B. antipoda genome. Detailed analysis of several copies revealed that these non-methylated BaSAT1 repeats were also devoid of heterochromatic histone H3K9me2 methylation. However, the factors decisive for the methylation status of BaSAT1 repeats remain currently unknown. In summary, we show that even highly repetitive sequences can exist as hypomethylated in the plant nuclear genome.

MeSH
Arabidopsis / genetics
vascular plants / chemistry, genetics, metabolism
epigenesis, genetic
phylogeny
genome, plant
heterochromatin / genetics, metabolism
histones / chemistry, metabolism
DNA methylation / genetics
DNA, satellite / chemistry, genetics, metabolism
high-throughput nucleotide sequencing
Publication type
journal article
research support, non-U.S. gov't

Book

An introduction to bioinformatics algorithms

... 6.11 Gene Prediction -- 6.12 Statistical Approaches to Gene Prediction -- 6.13 Similarity-Based ... ... Algorithms -- 8.1 Graphs -- 8.2 Graphs and Genetics -- 8.3 DNA Sequencing -- 8.4 Shortest ... ... and Trees -- 10.1 Gene Expression Analysis -- 10.2 Hierarchical Clustering -- 10.3 k-Means ... ... Clustering -- 10.4 Clustering and Corrupted Cliques -- 10.5 Evolutionary Trees -- 10.6 Distance-Based ... ... -- 10.9 Character-Based Tree Reconstruction -- 10.10 Small Parsimony Problem -- 10.11 Large ...

Cambridge, Massachusetts ; London, England : The MIT Press, 2004

Computational molecular biology series

[1st ed.] xviii, 435 p. : ill.

Book

The origins of order : self-organization and selection in evolution

... Origin of Life Problem -- Autocatalytic Sets of Catalytic Polymers -- Growth on the Infinite Graph ... ... Components in the Genetic Regulatory Systems of Prokaryotes and Eukaryotes -- An Ensemble Theory Based ... ... on Random Directed Graphs -- Summary -- 12. ...

Kauffman, Stuart A., 1939-
Author

New York : Oxford University Press, 1993

1st ed. 709 p. : ill.

Conspectus
General genetics. General cytogenetics. Evolution
NLK subject fields
molecular biology, molecular medicine

Online article

Genome-wide analysis of repeat diversity across the family Musaceae

... overall intra- and inter-specific similarities and, all major repeat families were quantified using graph-based ...

PLoS One. 2014; 9(6): e98918. [pub] 2014-06-16

ISSN 1932-6203

BACKGROUND: The banana family (Musaceae) includes a genetically diverse group of species and their diploid and polyploid hybrids that are widely cultivated in the tropics. In spite of their socio-economic importance, the knowledge of Musaceae genomes is basically limited to draft genome assemblies of two species, Musa acuminata and M. balbisiana. Here we aimed to complement this information by analyzing repetitive genome fractions of six species selected to represent various phylogenetic groups within the family. RESULTS: Low-pass sequencing of M. acuminata, M. ornata, M. textilis, M. beccarii, M. balbisiana, and Ensete gilletii genomes was performed using a 454/Roche platform. Sequence reads were subjected to analysis of their overall intra- and inter-specific similarities, and all major repeat families were quantified using graph-based clustering. Maximus/SIRE and Angela lineages of Ty1/copia long terminal repeat (LTR) retrotransposons and the chromovirus lineage of Ty3/gypsy elements were found to make up most of the highly repetitive DNA in all species (14-34.5% of the genome). However, there were quantitative differences and sequence variations detected for classified repeat families as well as for the bulk of total repetitive DNA. These differences were most pronounced between species from different taxonomic sections of the Musaceae family, whereas pairs of closely related species (M. acuminata/M. ornata and M. beccarii/M. textilis) shared similar populations of repetitive elements. CONCLUSIONS: This study provided the first insight into the composition and sequence variation of repetitive parts of Musaceae genomes. It allowed identification of repetitive sequences specific to a single species or a group of species that can be utilized as molecular markers in breeding programs and generated computational resources that will be instrumental in repeat masking and annotation in future genome assembly projects.

MeSH
Musaceae / classification, genetics
DNA, plant / analysis, genetics
phylogeny
genetic variation
genome, plant *
evolution, molecular
repetitive sequences, nucleic acid *
sequence analysis, DNA
computational biology / methods
Publication type
journal article
research support, non-U.S. gov't

Online article

Large-scale cortico-subcortical functional networks in focal epilepsies: The role of the basal ganglia

... The complete weighted network was analyzed based on correlation matrices of 90 and 194 regions. ...

NeuroImage. Clinical. 2017; 14: 28-36. [pub] 2016-12-18

Neuroimage Clin
ISSN 2213-1582

OBJECTIVES: The aim was to describe the contribution of basal ganglia (BG) thalamo-cortical circuitry to the whole-brain functional connectivity in focal epilepsies. METHODS: Interictal resting-state fMRI recordings were acquired in 46 persons with focal epilepsies. Of these 46, 22 had temporal lobe epilepsy: 9 left temporal (LTLE), 13 right temporal (RTLE); 15 had frontal lobe epilepsy (FLE); and 9 had parietal/occipital lobe epilepsy (POLE). There were 20 healthy controls. The complete weighted network was analyzed based on correlation matrices of 90 and 194 regions. The network topology was quantified on a global and regional level by measures based on graph theory, and connection-level changes were analyzed by the partial least squares method. RESULTS: In all patient groups except RTLE, the shift of the functional network topology away from random was observed (normalized clustering coefficient and characteristic path length were higher in patient groups than in controls). Links contributing to this change were found in the cortico-subcortical connections. Weak connections (low correlations) consistently contributed to this modification of the network. The importance of regions changed: decreases in the subcortical areas and both decreases and increases in the cortical areas were observed in node strength, clustering coefficient and eigenvector centrality in patient groups when compared to controls. Node strength decreases of the basal ganglia, i.e. the putamen, caudate, and pallidum, were displayed in LTLE, FLE, and POLE. The connectivity within the basal ganglia-thalamus circuitry was not disturbed; the disturbance concerned the connectivity between the circuitry and the cortex. SIGNIFICANCE: Focal epilepsies affect large-scale brain networks beyond the epileptogenic zones. Cortico-subcortical functional connectivity disturbance was displayed in LTLE, FLE, and POLE. Significant changes in the resting-state functional connectivity between cortical and subcortical structures suggest an important role of the BG and thalamus in focal epilepsies.
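
The two global measures named in this abstract, clustering coefficient and characteristic path length, can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as a dictionary of neighbor sets, whereas the study analyzed weighted networks and normalized the measures against random networks; both of those steps are omitted here, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))


def mean_clustering(adj):
    """Average local clustering coefficient over all nodes."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")
```

For example, a triangle with one pendant node, `{0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}`, has high clustering inside the triangle and none at the pendant node. Higher values of both measures than in a comparable random graph correspond to the "shift away from random" topology the abstract reports.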
