• This record comes from PubMed

A haplotype-resolved, de novo genome assembly for the wood tiger moth (Arctia plantaginis) through trio binning

2020 Aug 01; 9(8).

Language: English; Country: United States; Media: print

Document type: Journal Article, Research Support, Non-U.S. Gov't

Grant support
339873 European Research Council - International
207492/Z/17/Z Wellcome Trust - United Kingdom
BB/R007500/1 Biotechnology and Biological Sciences Research Council - United Kingdom
WT207492 Wellcome Trust - United Kingdom
BB/M011194/1 Biotechnology and Biological Sciences Research Council - United Kingdom
WT206194 Wellcome Trust - United Kingdom

BACKGROUND: Diploid genome assembly is typically impeded by heterozygosity because it introduces errors when haplotypes are collapsed into a consensus sequence. Trio binning offers an innovative solution that exploits heterozygosity for assembly. Short, parental reads are used to assign parental origin to long reads from their F1 offspring before assembly, enabling complete haplotype resolution. Trio binning could therefore provide an effective strategy for assembling highly heterozygous genomes, which are traditionally problematic, such as insect genomes. This includes the wood tiger moth (Arctia plantaginis), which is an evolutionary study system for warning colour polymorphism.

FINDINGS: We produced a high-quality, haplotype-resolved assembly for Arctia plantaginis through trio binning. We sequenced a same-species family (F1 heterozygosity ∼1.9%) and used parental Illumina reads to bin 99.98% of offspring Pacific Biosciences reads by parental origin, before assembling each haplotype separately and scaffolding with 10X linked reads. Both assemblies are contiguous (mean scaffold N50: 8.2 Mb) and complete (mean BUSCO completeness: 97.3%), with annotations and 31 chromosomes identified through karyotyping. We used the assembly to analyse genome-wide population structure and relationships between 40 wild resequenced individuals from 5 populations across Europe, revealing the Georgian population as the most genetically differentiated with the lowest genetic diversity.

CONCLUSIONS: We present the first invertebrate genome to be assembled via trio binning. This assembly is one of the highest quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling heterozygous genomes. Using our assembly, we provide genomic insights into the geographic population structure of A. plantaginis.
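The binning step described in the abstract (short parental reads assigning parental origin to long offspring reads) can be illustrated with a toy k-mer comparison. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the function names, the k-mer size, and the toy sequences are all hypothetical, and the sketch omits details that real tools need, such as reverse complements and filtering of k-mers caused by sequencing errors.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    # k-mers shared by both parents carry no information about origin
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring long read to the parent whose specific k-mers it shares most."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # tie, or no parent-specific k-mers in the read


# Toy example (hypothetical sequences): the parents differ only at the 3' end.
K = 4
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTACGTCC"], K)
offspring_reads = ["TACGTAAG", "TACGTCCA", "ACGTACG"]
bins = [bin_read(r, mat_only, pat_only, K) for r in offspring_reads]
print(bins)
```

Each bin of reads would then be assembled separately, which is why higher heterozygosity helps: more sites differ between the parents, so more offspring reads contain at least one parent-specific k-mer.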


