Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

2010 Jul 15; 11: 378. Epub 2010 Jul 15.

Language: English. Country: England, United Kingdom. Medium: electronic.

Document type: journal article, grant-supported work

Persistent link: https://www.medvik.cz/link/pmid20633259

BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization.

RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families.

CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
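
The idea of similarity-based partitioning can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the partitioning algorithm, so the sketch simply treats reads as graph vertices, pairwise similarity hits as edges, and takes connected components (found with union-find) as clusters. The function names `cluster_reads` and `genome_proportions`, the read identifiers, and the toy similarity pairs are all hypothetical. The share of reads in each cluster is used as a rough estimate of a repeat family's genomic proportion, which is reasonable only under the assumption of unbiased low-pass sampling.

```python
def cluster_reads(read_ids, similarity_pairs):
    """Partition reads into clusters, assuming clusters are the connected
    components of the read-similarity graph (a simplification)."""
    parent = {r: r for r in read_ids}

    def find(x):
        # Follow parent links to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each similarity hit is an edge that merges two components.
    for a, b in similarity_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for r in read_ids:
        clusters.setdefault(find(r), []).append(r)
    # Largest clusters first: these correspond to the most abundant repeats.
    return sorted(clusters.values(), key=len, reverse=True)


def genome_proportions(clusters, total_reads):
    """Estimate each cluster's genome share as its fraction of all reads."""
    return [len(c) / total_reads for c in clusters]


# Toy data (hypothetical): 8 reads, with hits linking r1-r4 and r5-r6.
reads = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
pairs = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r5", "r6")]

clusters = cluster_reads(reads, pairs)
proportions = genome_proportions(clusters, len(reads))

for cluster, share in zip(clusters, proportions):
    print(f"{len(cluster)} reads, {share:.1%} of sample: {cluster}")
```

In this toy input, reads r7 and r8 have no similarity hits and end up as singletons, which is how low-copy or unique sequences would behave in such a graph. Connected components will merge families that share even a single chance hit, so a real analysis would likely need a community-detection step on top of this.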
