TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads

2017 Jul 07; 45 (12): e111.

Language: English. Country: England, Great Britain. Medium: print.

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid28402514

Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
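The k-mer step described in the abstract can be illustrated with a minimal Python sketch. It is not the TAREAN implementation: the function names (`count_kmers`, `greedy_monomer`), the parameters `k` and `max_len`, the greedy walk and the toy reads are all assumptions made for illustration. The sketch counts k-mers in a set of reads, starts from the most frequent one, repeatedly extends by the most frequent overlapping k-mer, and reports a monomer when the walk returns to its starting point, which mirrors the circular structure expected for a tandem repeat.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_monomer(reads, k=5, max_len=1000):
    """Hypothetical helper: walk from the most frequent k-mer, always taking
    the most frequent successor, and stop when the walk closes into a cycle.

    Returns one rotation of the monomer, or None if no cycle is found.
    """
    counts = count_kmers(reads, k)
    if not counts:
        return None
    start = counts.most_common(1)[0][0]
    current = start
    visited = {start}
    monomer = []
    while len(monomer) < max_len:
        # Successors overlap the current k-mer by k-1 bases.
        best_count, base = max((counts[current[1:] + b], b) for b in "ACGT")
        if best_count == 0:
            return None  # dead end: the sequence is not circular
        monomer.append(current[0])
        current = current[1:] + base
        if current == start:
            return "".join(monomer)  # cycle closed: one full monomer
        if current in visited:
            return None  # looped back to a k-mer other than the start
        visited.add(current)
    return None


# Toy data (assumed): overlapping reads sampled from a tandem array.
array = "AACGTCTG" * 6
reads = [array[i:i + 15] for i in range(0, len(array) - 15 + 1, 3)]
print(greedy_monomer(reads, k=5))  # prints one rotation of the 8 bp monomer
```

The greedy walk only works when each (k-1)-mer within the monomer is unique and the reads are free of sequence variants. Real data contain both repeats within monomers and sequence variation, so this sketch should be read as a picture of the idea rather than a usable detector.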


