A haplotype-resolved, de novo genome assembly for the wood tiger moth (Arctia plantaginis) through trio binning
Language: English; Country: United States; Media: print
Document type: Journal Article; Research Support, Non-U.S. Gov't
Grant support
339873
European Research Council - International
207492/Z/17/Z
Wellcome Trust - United Kingdom
BB/R007500/1
Biotechnology and Biological Sciences Research Council - United Kingdom
WT207492
Wellcome Trust - United Kingdom
BB/M011194/1
Biotechnology and Biological Sciences Research Council - United Kingdom
WT206194
Wellcome Trust - United Kingdom
PubMed
32808665
PubMed Central
PMC7433188
DOI
10.1093/gigascience/giaa088
PII: 5893772
- Keywords
- Lepidoptera, annotation, genome assembly, population genomics, trio binning, wood tiger moth, Arctia plantaginis
- MeSH
- Wood
- Genome
- Genomics
- Haplotypes
- Humans
- Moths *
- Animals
- Check Tag
- Humans
- Animals
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
BACKGROUND: Diploid genome assembly is typically impeded by heterozygosity because it introduces errors when haplotypes are collapsed into a consensus sequence. Trio binning offers an innovative solution that exploits heterozygosity for assembly. Short, parental reads are used to assign parental origin to long reads from their F1 offspring before assembly, enabling complete haplotype resolution. Trio binning could therefore provide an effective strategy for assembling highly heterozygous genomes, which are traditionally problematic, such as insect genomes. This includes the wood tiger moth (Arctia plantaginis), which is an evolutionary study system for warning colour polymorphism.
FINDINGS: We produced a high-quality, haplotype-resolved assembly for Arctia plantaginis through trio binning. We sequenced a same-species family (F1 heterozygosity ∼1.9%) and used parental Illumina reads to bin 99.98% of offspring Pacific Biosciences reads by parental origin, before assembling each haplotype separately and scaffolding with 10X linked reads. Both assemblies are contiguous (mean scaffold N50: 8.2 Mb) and complete (mean BUSCO completeness: 97.3%), with annotations and 31 chromosomes identified through karyotyping. We used the assembly to analyse genome-wide population structure and relationships between 40 wild resequenced individuals from 5 populations across Europe, revealing the Georgian population as the most genetically differentiated with the lowest genetic diversity.
CONCLUSIONS: We present the first invertebrate genome to be assembled via trio binning. This assembly is one of the highest quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling heterozygous genomes. Using our assembly, we provide genomic insights into the geographic population structure of A. plantaginis.
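The binning step summarised in the abstract (parental short reads assign each offspring long read to a parent before assembly) can be illustrated with a toy sketch. This is not the authors' pipeline. It is a minimal illustration that assumes exact k-mer matching and a simple majority vote, and it ignores reverse complements, sequencing errors, and any normalisation of counts that a production tool may apply. All function names and the example sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets.

    K-mers shared by both parents carry no information about
    parental origin, so they are discarded.
    """
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign one offspring read by majority of parent-specific k-mers."""
    ks = kmers(read, k)
    m = len(ks & mat_only)  # hits to maternal-only k-mers
    p = len(ks & pat_only)  # hits to paternal-only k-mers
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # tie, or no informative k-mers


# Hypothetical toy data: the parents differ at a single base.
maternal_reads = ["ACGTACGTAA"]
paternal_reads = ["ACGTTCGTAA"]
K = 4  # far smaller than would be used on real data

mat_only, pat_only = parent_specific_kmers(maternal_reads, paternal_reads, K)
offspring_reads = ["CGTACGT", "CGTTCGT", "ACGT"]
bins = [bin_read(r, mat_only, pat_only, K) for r in offspring_reads]
print(bins)
```

Each bin of reads would then be assembled separately, which is what yields one assembly per parental haplotype. Higher heterozygosity produces more parent-specific k-mers, so fewer reads are left unassigned.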
Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
St John's College, University of Cambridge, St John's Street, Cambridge CB2 1TP, UK
Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK