TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads
Language: English. Country: England, United Kingdom. Medium: print
Document type: journal article
PubMed: 28402514
PubMed Central: PMC5499541
DOI: 10.1093/nar/gkx257
PII: 3574061
- MeSH terms
- DNA, plant: genetics
- genome, plant *
- pea (Pisum sativum): genetics
- in situ hybridization, fluorescence
- consensus sequence
- maize (Zea mays): genetics
- Magnoliopsida: genetics
- chromosome mapping: methods
- metaphase
- computer graphics
- Cyperaceae: genetics
- DNA, satellite: classification, genetics
- base sequence
- sequence analysis, DNA
- cluster analysis
- software *
- Vicia faba: genetics
- Publication type
- journal article
- Substance names
- DNA, plant
- DNA, satellite
Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
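The consensus step described in the abstract (rebuilding a repeat monomer from the most frequent k-mers of a read cluster) can be illustrated with a small sketch. This is not the TAREAN implementation: the function names `kmer_counts` and `greedy_monomer`, the greedy extension rule, and the toy monomer are all assumptions chosen for illustration, and the example uses error-free simulated reads.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_monomer(counts, k, max_len=10000):
    """Hypothetical greedy walk: start at the most frequent k-mer and keep
    extending by the most frequent overlapping k-mer until the walk returns
    to the start, which indicates a circular (tandem) structure."""
    start = counts.most_common(1)[0][0]
    seq = start
    current = start
    while len(seq) - k < max_len:
        suffix = current[1:]
        # Candidate extensions share a (k-1)-base overlap with the current k-mer.
        candidates = [(counts[suffix + b], b) for b in "ACGT" if counts[suffix + b] > 0]
        if not candidates:
            return None  # dead end: no circular path found
        _, base = max(candidates)
        current = suffix + base
        if current == start:
            # The cycle is closed; drop the k-1 bases that repeat the start.
            return seq[:len(seq) - k + 1]
        seq += base
    return None  # no cycle within max_len extensions


# Toy data (assumed): overlapping error-free reads sampled from a tandem array.
monomer = "ACGGTCAATG"
array = monomer * 6
reads = [array[i:i + 12] for i in range(0, len(array) - 12 + 1, 3)]
consensus = greedy_monomer(kmer_counts(reads, 5), 5)
print(consensus)
```

Because reads sample a tandem array at arbitrary offsets, the reconstructed sequence is a rotation of the monomer rather than the monomer in a fixed phase. Real sequencing data contain errors and sequence variants, so a practical reconstruction needs more than this single greedy walk.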