Improving Illumina assemblies with Hi-C and long reads: An example with the North African dromedary

2019 Jul; 19(4): 1015-1026. [Epub 2019-05-17]

Language: English. Country: England, United Kingdom. Medium: print-electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid30972949

Grant support
17-00-00146 Russian Foundation for Basic Research
P 29623 Austrian Science Fund FWF - Austria
RPG-2017-287 Leverhulme Trust
P29623-B25 Austrian Science Fund
16-14-10009 Russian Science Foundation

Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate-pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, both chromosome conformation capture techniques, such as Hi-C and Dovetail Genomics Chicago libraries, and long-read sequencing, such as Pacific Biosciences and Oxford Nanopore, help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high-quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high-quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi-C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing almost chromosome-level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and then scaffolds were comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi-C libraries increased the length of the longest scaffold over 12-fold, from 9.71 Mbp to 124.99 Mbp, and the scaffold N50 over 50-fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long-read sequencing.
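
As a rough illustration of the contiguity statistics quoted in the abstract, the following Python sketch computes a scaffold N50 and the fold changes implied by the reported before and after values. The function names and the toy list of scaffold lengths are assumptions made for illustration and are not part of the study's pipeline; only the four Mbp values are taken from the abstract.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_change(before, after):
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return after / before


# Hypothetical toy scaffold lengths (arbitrary units), not data from the paper.
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_scaffolds)

# Values reported in the abstract, in Mbp.
longest_fold = fold_change(9.71, 124.99)  # longest scaffold
n50_fold = fold_change(1.48, 75.02)       # scaffold N50

print(f"toy N50: {toy_n50}")
print(f"longest scaffold fold change: {longest_fold:.2f}")
print(f"scaffold N50 fold change: {n50_fold:.2f}")
```

Under this definition the two ratios come out between 12 and 13 and between 50 and 51, consistent with the "over 12-fold" and "over 50-fold" wording of the abstract.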

GENBANK
GCA_000803125.1, GCA_000767585.1, GCA_000767855.1, GCA_000311805.2, GCA_000003055.3, SRP014573, GCF_000767855.1, GCF_000003055.6, GCF_000311805.1, GCF_000164845.2
