Improving Illumina assemblies with Hi-C and long reads: An example with the North African dromedary
Language: English. Country: England, United Kingdom. Medium: print-electronic
Document type: journal article
Grant support
17-00-00146: Russian Foundation for Basic Research
P 29623: Austrian Science Fund FWF - Austria
RPG-2017-287: Leverhulme Trust
P29623-B25: Austrian Science Fund
16-14-10009: Russian Science Foundation
PubMed
30972949
PubMed Central
PMC6618069
DOI
10.1111/1755-0998.13020
- Keywords
- chromosome conformation capture, chromosome mapping, dromedary, genome annotation, genome assembly, scaffolding
- MeSH
- Genome*
- Genomics / methods
- Desert Climate
- Sequence Analysis, DNA / methods
- Camelus / genetics
- Computational Biology / methods
- Animals
- Check Tag
- Animals
- Publication type
- Journal Article
Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate-pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, both chromosome conformation capture techniques, such as Hi-C and Dovetail Genomics Chicago libraries, and long-read sequencing, such as Pacific Biosciences and Oxford Nanopore, help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high-quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high-quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi-C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing almost chromosome-level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and then scaffolds were comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi-C libraries increased the longest scaffold over 12-fold, from 9.71 Mbp to 124.99 Mbp, and the scaffold N50 over 50-fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long-read sequencing.
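For readers unfamiliar with the scaffold N50 statistic quoted in the abstract, the following is a minimal Python sketch of how N50 is conventionally computed and how the reported fold increases follow from the stated values. It is not the authors' code. The function names `n50` and `fold_change` and the toy scaffold lengths are illustrative assumptions; only the four Mbp figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_change(before, after):
    """Ratio of a statistic after an upgrade to its value before."""
    return after / before


# Toy scaffold lengths (hypothetical, not dromedary data):
# total = 150, half = 75, and 50 + 40 = 90 >= 75, so N50 = 40.
toy_scaffolds = [10, 20, 30, 40, 50]
toy_n50 = n50(toy_scaffolds)

# Values in Mbp as stated in the abstract.
longest_fold = fold_change(9.71, 124.99)  # longest scaffold
n50_fold = fold_change(1.48, 75.02)       # scaffold N50

print(f"toy N50: {toy_n50}")
print(f"longest scaffold fold increase: {longest_fold:.2f}")
print(f"scaffold N50 fold increase: {n50_fold:.2f}")
```

Because N50 depends on both the length distribution and the total assembly size, a fold change in N50 is best read alongside the total length, which the abstract reports as growing by 99.32 Mbp after gap filling.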
Department of Biostatistics University of Oslo Oslo Norway
Department of Mathematics and Statistics University of Helsinki Helsinki Finland
Intelligent Systems Laboratory University of Bristol Bristol UK
Microsatellite markers of the major histocompatibility complex genomic region of domestic camels
Innate and Adaptive Immune Genes Associated with MERS-CoV Infection in Dromedaries
A Deadly Cargo: Gene Repertoire of Cytotoxic Effector Proteins in the Camelidae
The Major Histocompatibility Complex of Old World Camels-A Synopsis
Natural Killer Cell Receptor Genes in Camels: Another Mammalian Model
GENBANK
GCA_000803125.1, GCA_000767585.1, GCA_000767855.1, GCA_000311805.2, GCA_000003055.3, SRP014573, GCF_000767855.1, GCF_000003055.6, GCF_000311805.1, GCF_000164845.2